Learning-based Framework for Transit Assignment Modeling Under Information Provision

Mohamed Wahba 1 and Amer Shalaby 2
Abstract
The modeling of service dynamics has been the focus of recent developments in the field of transit assignment modeling. The emerging focus on dynamic service modeling requires a corresponding shift in transit demand modeling to represent appropriately the dynamic behaviour of passengers and their responses to Intelligent Transportation Systems technologies. This paper presents the theoretical development of a departure time and transit path choice model based on the Markovian Decision Process. This model is the core of the MIcrosimulation Learning-based Approach to TRansit ASsignment (MILATRAS). Passengers, while traveling, move to different locations in the transit network at different points in time (e.g. at stop, on board), representing a stochastic process. This stochastic process is partly dependent on the transit service performance and partly controlled by the transit rider's trip choices. It can be analyzed as a Markovian Decision Process, in which actions are rewarded and hence passengers' optimal policies for maximizing the trip utility can be estimated. The proposed model is classified as a bounded rational model, with a constant utility term and a stochastic choice rule. The model is appropriate for modeling information provision since it distinguishes between an individual's experience with the service performance and information provided about system dynamics.

Keywords Travel choice, Markovian decision process, Learning, Information provision
1 M. Wahba, Adjunct Professor, Civil Engineering Department, University of Toronto, 35 St. George Street, Toronto, Ontario, Canada M5S 1A4, email: [email protected]
2 A. Shalaby, Associate Professor, Civil Engineering Department, University of Toronto, 35 St. George Street, Toronto, Ontario, Canada M5S 1A4
Introduction

This paper presents the theoretical development of a departure time and transit path choice model based on the Markovian Decision Process. This model is the core of MILATRAS, the MIcrosimulation Learning-based Approach to TRansit ASsignment (Wahba 2008; Wahba and Shalaby 2009). The proposed model considers multiple dimensions of the transit path choice problem; such dimensions were either simplified (e.g. stop choice) or ignored (e.g. departure time) in most previous approaches. The developed assignment procedure is capable of modeling the day-to-day and within-day dynamics of the transit service, as well as passengers' responses under information provision.

Nuzzolo et al. (2001) presented a dynamic stochastic path choice model which considers variations in transit services and passengers' learning and adaptation. The model is called "doubly dynamic" as it accounts for within-day and day-to-day variations. Passengers are assumed to make a discrete choice of trip decisions based on travel disutility and the desired arrival time at the destination. Nuzzolo et al. (2003) report on the theoretical foundation of the schedule-based approach to modeling travelers' behaviour in transit networks. In the assignment procedure, the path choice model employs Logit/Probit formulations to calculate the choice probabilities. The run choice is varied in both within-day and day-to-day assignment based on an exponential updating mechanism of run attributes. This is not considered a learning-driven approach, as the learning process does not follow an explicit learning algorithm and is done at an aggregate level using a common pool of information that is accessed by all travelers.

Recently, traffic assignment procedures have implemented learning-based models, which have been shown to result in different and more realistic assignments relative to conventional methods (Nakayama et al. 1999; Arentze and Timmermans 2003; Ettema et al. 2005). With an agent-based representation, individual travelers are explicitly modeled as cognitive agents. This has led to the shift to an agent-based modeling framework that has been successfully and aggressively applied in activity-based models (Timmermans et al. 2003; Salvini and Miller 2003; Roorda and Miller 2006) and in traffic assignment models with information provision (de Palma and Marchal 2002; Rossetti and Liu 2005; Ettema et al. 2005). As such, the next generation of dynamic transit assignment algorithms should consider adopting learning-based and multi-agent microsimulation concepts for consistency with emerging traffic assignment models and for integration with state-of-the-art activity-based modeling frameworks. In response, the authors developed MILATRAS (Wahba 2008; Wahba and Shalaby 2009) for the modeling of day-to-day and within-day dynamics of the transit assignment problem.

The proposed transit path choice model within MILATRAS is inspired by the non-equilibrium framework for assignment procedures proposed by Cascetta and Cantarella (1991) and the agent-based, learning-driven path choice model presented by Ettema et al. (2005). It applies similar concepts to the transit assignment problem and passenger behavioural modeling, taking into consideration the distinctive features of public transport networks and transit rider travel behaviour.

Cascetta and Cantarella (1991) addressed the lack of explicit modeling of day-to-day and within-day variations in supply and demand within static equilibrium procedures, where a fixed-point solution is searched. They argued that the equilibrium framework is not structurally compatible with observed phenomena, such as habitual behaviour, random variations in demand and network conditions, and transient states of network conditions following modifications. They proposed a process model that generates fixed-point or steady-state arc flows (approximating the mean of the distribution of the corresponding stochastic process model and comparable to Stochastic User Equilibrium outputs) with the explicit representation of day-to-day adjustments in travelers' decisions and within-day variations of demand. The evolution of the path flows in successive days is then modeled as a Markovian stochastic process. Cantarella and Cascetta (1995) proposed conditions for the existence and uniqueness of stationary probability distributions for the stochastic process model. The model, however, did not consider en-route replanning due to variations in network conditions or information provision. Examples of research efforts in this direction include Davis and Nihan (1993), Watling (1996), and Hazelton and Watling (2004).

Ettema et al. (2005) represent learning and experience at the individual level for the auto-assignment problem. They promote the use of microsimulation methods to model system-level phenomena, such as congestion, based on individual-level choices. They used a microsimulation platform to account for the effect of travel time uncertainties on departure time choices, thus allowing for the modeling of day-to-day adjustments of travelers' behaviour. They used mental models to represent the traveler's experience of network conditions. Travelers are assumed to be decision-makers who decide on the trip departure time from the origin, for a fixed route. The study, therefore, did not consider other en-route trip choices, e.g. route choice. For a more comprehensive exposition of methodological developments in transit assignment models over the past few decades, see Wahba (2008).
Markovian Decision Process

Markov Chains and the Ergodic Theory

A Markov Chain is an ordered sequence of system state occurrences from a stochastic process, whose system variables satisfy the Markov property. The Markov property means that the conditional probability distribution of future states of the process (i.e. values of the random variable at future time instances), given the present state and all past states, depends only upon the present state and not on any past state. The present state of the system is sufficient and necessary to predict the future state of the process. This is sometimes referred to as the 'memoryless property' of the Markov process.

The Ergodic Theory provides the foundation for studying stochastic processes that have the Markovian property. It establishes the conditions under which a Markov Chain can be analysed to determine its transition probabilities and steady-state behaviour. The Ergodic Theory states that if a Markov Chain is ergodic, then a unique steady-state distribution exists and is independent of the initial conditions (Petersen 1990). A Markov Chain is called 'ergodic' if, and only if, the chain is irreducible, positive-recurrent and aperiodic. The positive-recurrent and aperiodic characteristics guarantee that the steady-state distribution exists, while irreducibility guarantees that the steady-state distribution is unique and independent of initial conditions.

Actions, Rewards and Decision Making in Markov Chains

In Markov processes, the outcome of the stochastic process is entirely random; it is not controlled but rather observed. Situations where the underlying stochastic process has the Markov property and outcomes are partly random and partly under the control of a decision maker are studied using Markov Decision Processes (MDPs). A Markov Decision Process (MDP) is a discrete-time stochastic process in which a decision-maker partly controls the transition probabilities through deliberate choices at discrete points in time. In an MDP, in addition to the system state representation, at each state there are several actions from which the decision maker must choose. This decision-making behaviour, consequently, influences the transition probability from one system state to another. Associated with the decision-making capability is the concept of a reward. Rewards can be earned for each state visited, or for each state-action pair implemented. The decision-maker in an MDP has the goal of optimizing some cumulative function of the rewards, implying that the decision-maker acts rationally. The main difference between Markov Processes and Markov Decision Processes is the addition of actions (allowing choices) and rewards (giving motivation).

If the reward and return functions are specified, then the MDP becomes a (stochastic) optimization problem where the decision variables are the transition probabilities and the objective function is the maximization of the expected return. When the underlying Markov Process is ergodic, a unique steady-state distribution exists. This means that a unique state-transition probability matrix from state i to state j, Pr(i,j), exists, which ensures that the above formulated optimization problem has a solution and that it is unique (i.e. a global optimum). A solution, i.e. a feasible transition probability matrix, represents a policy π that the agent can follow, where π → Pr(i,j). This policy guides the agent as to what actions to choose at any given state, regardless of prior history. The optimal solution represents the optimal policy π* that maximizes the expected return.

Reinforcement Learning (RL) represents an approach to estimate the policy that optimizes the expected return. RL uses a formal framework defining the interaction between a goal-directed agent and its environment in terms of states, actions and rewards (Sutton and Barto 1998). It is a computational approach to model goal-directed learning and decision-making. It uses exploitation of experience and exploration of available actions to converge to the policy that yields the maximum utility. The agent will then choose actions at each state according to the optimal policy π* returned. The elements of the optimal policy can then be used to study the agent's choices and explain the agent's preferences in various environments.
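To make the link between states, actions, rewards and policy estimation concrete, the following minimal sketch (illustrative only, not part of the original formulation) applies tabular temporal-difference Q-learning to a toy state-action set; the state names, reward values and parameters are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

# Toy deterministic MDP: each state-action pair leads to (next state, immediate reward).
# State/action names, rewards and parameters are illustrative assumptions only.
TRANSITIONS = {
    ("origin", "leave_early"): ("stop_A", -2.0),
    ("origin", "leave_late"):  ("stop_B", -1.0),
    ("stop_A", "board_route_1"): ("destination", -20.0),
    ("stop_B", "board_route_2"): ("destination", -30.0),
}
ACTIONS = defaultdict(list)
for (state, action) in TRANSITIONS:
    ACTIONS[state].append(action)

ALPHA, GAMMA, EPSILON = 0.3, 1.0, 0.1   # learning rate, discount factor, exploration rate
Q = defaultdict(float)                  # state-action value table

def choose(state):
    """Epsilon-greedy action choice over the actions available at this state."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS[state])
    return max(ACTIONS[state], key=lambda a: Q[(state, a)])

for episode in range(2000):             # repeated trips ~ day-to-day iterations
    state = "origin"
    while state != "destination":
        action = choose(state)
        next_state, reward = TRANSITIONS[(state, action)]
        future = max((Q[(next_state, a)] for a in ACTIONS[next_state]), default=0.0)
        # Temporal-difference update of the state-action value
        Q[(state, action)] += ALPHA * (reward + GAMMA * future - Q[(state, action)])
        state = next_state

# The greedy policy extracted from Q approximates the optimal policy pi*.
policy = {s: max(acts, key=lambda a: Q[(s, a)]) for s, acts in ACTIONS.items()}
print(policy)   # expected: leave_early then board_route_1 (total cost 22 < 31)
```

In the transit context developed below, the same mechanics operate on generalized travel costs rather than abstract rewards.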
The Transit Path Choice Model

Transit Path Choice Problem as a Markovian Decision Process

Travelers are assumed to be goal-directed intelligent agents. A traveler has a goal of minimizing the travel cost (e.g. time, money, inconvenience) and, for some trip purposes, a goal of maximizing the probability of arriving at a desired arrival time. In order to achieve their goals, travelers follow policies to optimize the utility of their trip. These policies manifest themselves through the observed travel behaviour (i.e. trip choices) of individual passengers. Assuming that travelers are rational, it is logical to expect that they follow their optimal policy. Based on the notion that actions receive rewards (or penalties), it is also expected that travelers value their choices according to a utility function. Although this utility function is not directly observed by the modeler, it is known to the individual traveler. Note that the term "return" or "value", in the context of transit path choice modeling, refers to an estimate of a random variable and is meant to reflect an expected value or an expected return.

In the transit assignment context, a transit user is faced with various types of choices during the trip. A passenger needs to decide on a departure time, an origin stop, a run associated with a route to board, and possibly a connection to transfer at. In addition, in a multimodal system, access and egress mode choices to and from the transit service may be included. For a recurring trip, such as a home-based work trip, a passenger settles on the trip choices by trying (or judging) different options until certain path choices prove to be optimal for this particular trip's objectives. This optimization process is based on the passenger's cumulative experience with the transit service performance. We are interested in the process through which passengers settle on their trip choices. This is important to be able to model the shifts in trip choices when changes to the transit service are introduced.

A passenger at the origin has a choice set of different combinations of departure time and path choices. Over time, a passenger chooses the combination of departure time, origin stop, run (or route), and transfer stop that minimizes the perceived travel cost. In order to find this combination, the passenger must have valued (tried or judged) all other possible choices and found that the chosen combination outperforms all other options, with reference to a utility function. At the origin, the passenger has the objective of minimizing the travel cost for the trip. At a transfer stop, the passenger's objective is still to minimize the travel cost for the remainder of the trip, regardless of prior choices. This does not mean that previous choices are not important; rather, it is the relative impact of prior choices on future decisions that matters, not the actual choices themselves. This influence on future choices is expressed in the value of p (the choice probability for one alternative). The value of p also depends on the transit service performance. The outcome of the passenger's choice, in turn, affects the transit service performance; for example, boarding a vehicle reduces the available capacity on the chosen run by 1.

The change in the location of the transit rider during the trip represents a stochastic process, with multiple states and transition probabilities between states. Traveling passengers move to different locations in the transit network at different points in time (e.g. at stop, on board). The corresponding state of a stochastic process in this case is represented by the location of the transit rider, which takes a value out of a state space. The state space represents all possible locations for a transit rider; it consists of possible origin stop choices, route choices, transfer stop choices and destination stop choices. A passenger at a transfer stop has made three transitions: from origin location to origin-stop location (associated with an origin stop choice), from origin-stop location to onboard-route location (associated with a route/run choice), and from onboard-route location to transfer-stop location (associated with a transfer stop choice). Each transition depends on the current state of the passenger (i.e. the location of the passenger) and on the transit service performance. The passenger would need to make (at least) two more transitions to reach the trip destination: from transfer-stop location to onboard-route location (associated with a route/run choice) and from onboard-route location to destination-stop location and the final destination. Given the present state (i.e. location), the passenger decides on future transitions. This resembles a stochastic process with the Markov property. It should be stated that the location information alone might not be sufficient to decide on future transitions; information related to previous transitions, such as fare paid, remaining time to the scheduled arrival time, or real-time information about the transit system performance, is important. Instead of having to memorize previous transitions, the present state (or the system state) should summarize all information that is assumed to be available to the passenger at any time t.
This Markovian stochastic process is partly dependent on the transit service performance and partly controlled by the transit rider. It can be analyzed as a Markovian Decision Process (MDP). In an MDP, actions are rewarded and hence optimal policies can be estimated. For an origin stop choice, there is an immediate cost expressed as the value of the travel cost to access the origin stop (time and money). There is also a future value of a specific stop choice, expressed in the expected travel cost of the subsequent available route and transfer connection choices. For a route choice, there is an immediate cost expressed as the value of waiting time. A future cost, associated with a route choice, is related to the value of possible transfer connections and the probability of arriving at the desired arrival time.

Assuming passengers are rational, it is logical to expect passengers to follow their optimal policy and to optimize their trip return or cost. The effect of such an optimal policy is observed through disaggregate individual choices, Pr(i,j)^observed*, and aggregate route loads, L^observed*. The value of p (the choice probability for one alternative) represents one cell in the state-transition probability matrix Pr(i,j)^observed*. If the underlying Markov process is ergodic, then there exists a unique optimal policy π* that allows passengers to optimize their return. Associated with π* is a steady-state transition probability matrix Pr(i,j)*. Passengers devise their optimal policies based on a value function for the state-action evaluation. While the optimal policy π* and its effect through route loads are observed, the only unknown to the modeler is the value function. By reconstructing the transit path choice problem as an MDP, the value function used by individual passengers can be estimated (Wahba 2008). This is similar to the process of Inverse Reinforcement Learning (IRL) (Russell 1998).

Mental Model Structure

In order to ensure the uniqueness and existence of the optimal policy π* in the reconstructed MDP, the underlying Markov process needs to be ergodic. To show that a Markov Chain is ergodic, its state-action transition diagram has to be irreducible, positive-recurrent, and aperiodic.
The state-action transition diagram for the transit path choice problem is represented by the 'mental model' of the relevant parts of the transit network. For simplicity of analysis, and without losing generality, assume that the system state is represented only by the location of the passenger during the trip. At each state s, there is a set of possible actions a_i ∈ A(s) available to each passenger. When a passenger is at state s and decides to take action a, this is referred to as the state-action pair (s,a). Action a moves the passenger from state s to state s', expressed as a: s → s'. The reward from, or the value of, a state-action pair is measured based on the travel cost associated with choosing (s,a).

The schematic representation of state transitions in the underlying stochastic process is shown in Figure 1, where the system state variable represents the location of the passenger during the transit trip. The state space has 7 possible states: Origin O, Departure Time T, Origin Stop G, On-board V, Off-Stop F, On-Stop N, and Destination Stop D. The underlying stochastic process has the Markov property, since the present location of the passenger is necessary and sufficient for predicting the future location of the passenger, regardless of prior locations.

Figure 1 is about here

For any transit trip, the number of states is finite (i.e. possible locations are finite). The state OGN is a strong state; that is, any state j (where j ≠ OGN) is reachable from state OGN in a finite number of steps. With the existence of one strong state, a Markov Chain with a finite number of states is guaranteed to be irreducible. Therefore, the underlying Markov process for the transit path choice problem is irreducible.

The state OGN is a positive-recurrent state, since after leaving OGN the probability of returning to OGN in any number of steps is 1. In a Markov Chain with a finite number of states, the existence of one positive-recurrent state guarantees that the Markov Chain is positive-recurrent. Therefore, the underlying Markov process for the transit path choice problem is positive-recurrent.

The state OGN has a possible self-loop transition, representing the change in departure time choice from the origin. Similarly, the OGS and ONS states have a possible self-loop transition, representing the no-boarding decision. Such states with a possible self-loop transition are considered aperiodic. With the irreducibility property and a finite number of states, the existence of one aperiodic state means that the underlying Markov process for the transit path choice problem is aperiodic. The proposed framework, using the mental model structure outlined above, therefore ensures that the Markov Chain representing the transit path choice problem is ergodic (Wahba 2008).

The ergodic property in the transportation context means that each possible combination of path choices can be tried by iterative dynamics (i.e. all path options can be tried through making the same trip repeatedly over days). It also means that there exists a unique optimal policy π* that will optimize the return of the transit trip (or minimize the travel cost in this regard) given a value function. Alternatively, given an observed optimal policy π*, a value function can be estimated so as to regenerate the observed optimal policy (or observed choices). If the form of the value function is assumed, then its parameters can be calibrated to maximize the likelihood of reproducing the observed choices.

When the objective is to find the probability of choosing action a at state s, P_s(a), Reinforcement Learning (RL) techniques provide a systematic procedure for estimating these probabilities. RL algorithms attempt to find a policy π*: S → A that the agent follows and that gives the probability P_s(a)*. In particular, the Temporal-Difference (TD) Q-Learning algorithm (Watkins 1989) is adopted in this framework. In RL terminology, passengers need to follow a policy that defines the passenger's behaviour (i.e. trip choices) at a given time. Passengers need a reward function which signals the immediate reward of a specific state-action pair.
For instance, the immediate reward of boarding a bus i from a stop j is represented by the experienced waiting time for bus i at stop j. A value function calculates the accumulated reward over time of a specific state-action pair. For example, the value of boarding a bus i from a stop j is calculated as the expected travel time (out-of-vehicle and in-vehicle time) to the destination, starting from this stop and boarding this bus. Figure 2 depicts the value function components for choosing to access a stop from the origin: the immediate return associated with the travel cost to access the stop (e.g. access time, access fare, parking inconvenience, etc.) and the future reward associated with the travel cost to complete the trip (e.g. waiting time, in-vehicle time, fare, number of transfers, deviation from desired arrival time) after arriving at the stop. The value function ensures that state-action pairs with a short-term high reward but a long-term low value are not preferred.

Figure 2 is about here

In the transit assignment context, a Q-value represents the state-action utility (called hereafter the Generalized Cost, GC). The generalized cost for a passenger is given in Eq. 1 and Eq. 2:

GC(s,a) = \sum_{g=1}^{k} \beta_g \hat{X}_g + \gamma \min_{a' \in A(s')} GC(s',a')     (1)

GC(s,a) = (1 - \lambda)\, GC(s,a) + \lambda \left[ \sum_{g=1}^{k} \beta_g \hat{X}_g + \gamma \min_{a' \in A(s')} GC(s',a') \right]     (2)
where
- \hat{X}_g is a passenger-experienced alternative-specific (state-action) attribute;
- \sum_{g=1}^{k} \beta_g \hat{X}_g represents the immediate reward (or cost) for the state-action pair;
- \gamma \min_{a' \in A(s')} GC(s',a') represents the long-term expected reward for choosing this state-action pair, where s' is the state reached by action a; \gamma is a discount factor, which determines the importance of future rewards;
- \lambda represents the learning rate, which determines to what extent the newly acquired experience will overwrite the old experience;
- A(s) represents the set of attractive actions at state s.
Note that X_g (e.g. X_WT: waiting time) is a random variable, and its realization at time step t depends on the transit network conditions and the choices of other passengers. Therefore, the GC value of a specific state-action pair is a random variable, and the GC function acts as a replacement for the unobserved value function.

The choice rule follows a mixed {ε-greedy, SoftMax} action-choice model, with a (1 − ε) probability of exploitation, an (ε) probability of exploration, and a SoftMax model (Sutton and Barto 1998), such that Eq. 3 is for exploiting and Eq. 4 is for exploring.

Exploiting, with probability (1 − ε):

P_s(a) = 1 \;\text{if}\; GC(s,a) = \min_{a' \in A(s)} GC(s,a'), \;\text{and}\; 0 \;\text{otherwise}     (3)

Exploring, with probability (ε):

P_s(a) = \frac{V(s,a)}{\sum_{a' \in A(s)} V(s,a')}, \quad \text{where } V(s,a) = g\!\left(GC(s,a)\right)     (4)
The mixed {ε-greedy, SoftMax} method means that agents behave greedily most of the time (i.e. they exploit current knowledge to maximize the immediate reward) and, every once in a while, with probability (ε), they select an action at random using the SoftMax rule. The SoftMax method is used to convert action values into action probabilities, by varying the action probabilities as a weighted function of the estimated value. This means that the greedy action is given the highest selection probability, but all other actions are ranked and weighted according to their value estimates.
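As an illustration of Eqs. 1-4, the following sketch (an assumption-laden toy implementation, not the MILATRAS code) updates a generalized-cost table with exponential smoothing and draws a choice with the mixed ε-greedy/SoftMax rule; the cost-to-weight transformation g(·) is assumed here to be a negative exponential.

```python
import math
import random

LAMBDA = 0.3   # learning rate: weight given to the newly experienced cost
GAMMA = 0.9    # discount factor: importance of downstream (future) cost
EPSILON = 0.1  # probability of exploring instead of exploiting

def update_gc(gc, state, action, immediate_cost, next_state, next_actions):
    """Eq. 2: blend old experience with the newly experienced generalized cost."""
    future = min((gc.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    new_experience = immediate_cost + GAMMA * future
    old = gc.get((state, action), new_experience)   # first visit: adopt the experience
    gc[(state, action)] = (1.0 - LAMBDA) * old + LAMBDA * new_experience

def choose_action(gc, state, actions):
    """Eqs. 3-4: exploit the minimum-GC action, or explore with SoftMax weights."""
    costs = {a: gc.get((state, a), 0.0) for a in actions}
    if random.random() >= EPSILON:                        # exploiting, prob. (1 - epsilon)
        return min(costs, key=costs.get)
    weights = {a: math.exp(-c) for a, c in costs.items()}  # assumed g(GC) = exp(-GC)
    total = sum(weights.values())
    r, cum = random.random() * total, 0.0
    for a, w in weights.items():                           # SoftMax draw over weights
        cum += w
        if r <= cum:
            return a
    return a

# Illustrative use: a stop with two candidate routes (names and costs are assumptions).
gc = {}
a = choose_action(gc, "stop_G", ["route_1", "route_2"])
update_gc(gc, "stop_G", a, immediate_cost=6.0, next_state="onboard", next_actions=["alight_X"])
```

Since lower GC is better here, exploitation picks the minimum-cost action and the exponential weighting gives cheaper actions larger exploration probabilities.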
Learning-based Departure Time and Path Choice Model

The underlying hypothesis is that individual passengers are expected to adjust their behaviour (i.e. trip choices) according to their experience with the transit system performance. Individual passengers base their daily travel decisions on the accumulated experience gathered from repetitively traveling through the transit network on consecutive days. Travelers' behaviour, therefore, is modeled as a dynamic process of repetitively making decisions and updating perceptions, according to a learning process. This decision-making process is based on a mental model of the transit network conditions. The learning and decision-making processes of passengers are assumed in the proposed model to follow Reinforcement Learning (RL) principles for experience updating and choice making. The mental model tree-like structure is an efficient representation of the state-action table traditionally developed for Q-learning problems (see Fig. 3). The probability of deciding on action a at state s is based on the accumulated experience, captured by the GC value. The GC(s,a) is updated every time state s is visited and action a is chosen. The GC(s,a) has two components: an immediate reward for choosing action a at state s, and an estimated cumulative future reward for this specific state-action pair. This GC value represents the passenger's experience with the transit network conditions.

Figure 3 is about here

Previous studies refer to the path choice problem as the decision passengers make to board a vehicle departing from a stop, at a specific point in time, from a set of attractive paths serving the origin and destination for that passenger; see Hickman and Bernstein (1997) for definitions of the static and dynamic path choice problems. Why a passenger is at this stop (i.e. the origin stop choice) at this point in time (i.e. the departure time choice) is not addressed adequately in the literature as part of the path choice problem. Often, the travel time on paths including a transfer depends on when the passenger begins the trip, as paths which are attractive at one time may be less so later if there is a smaller probability of making particular connections. The proposed learning-based choice model therefore considers the departure time choice, the stop choice and the run (or sequence of runs) choice.
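To picture the tree-like mental model mentioned above (Fig. 3), the following minimal sketch (an illustration under assumed state and action names, not the MILATRAS data structure) nests the GC table by choice level, so that each departure time holds its own stop experiences, each stop its own route experiences, and so on.

```python
# Illustrative nested mental model: GC values keyed by the choice hierarchy.
# All names and numbers are assumptions for illustration only.
mental_model = {
    "07:45": {                     # departure time choice (state T)
        "gc": 41.0,                # accumulated GC of leaving at 07:45
        "stops": {
            "stop_G1": {           # origin stop choice (state G)
                "gc": 39.5,
                "routes": {
                    "route_12": {"gc": 33.0, "off_stops": {"stop_F3": 12.0}},
                    "route_7":  {"gc": 36.5, "off_stops": {"stop_F9": 15.0}},
                },
            },
        },
    },
}

# Reading an experience is a walk down the tree, mirroring the trip's transitions.
gc_route_12 = mental_model["07:45"]["stops"]["stop_G1"]["routes"]["route_12"]["gc"]
```

Each leaf would be refreshed with the exponential-smoothing rule of Eq. 2 whenever the corresponding state-action pair is experienced.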
Pre-Trip Choice Modeling

The departure time and origin stop choices are assumed to be at-home choices (i.e. pre-trip), in which passenger agents consider available information obtained from previous trips, in addition to pre-trip information provision (if any). Once a passenger agent arrives at a stop, the bus run choice is considered an adaptive choice, in which, besides previous information, the passenger considers developments that occur during the trip and additional en-route information. At the origin, a passenger develops an initial (tentative) travel plan based on his updated mental model (historical experience and pre-trip information, if provided). This travel plan includes a departure time, an origin stop (which are fixed) and a (tentative) route (or sequence of routes). The initial plan reflects the passenger's preferences and expectations, and he would follow it if reality (en-route dynamics) matches his expectations to a great extent (which may be different from published static schedule information).
Departure Time Choice

At the origin (state O), on iteration d, the departure time choice is modeled according to the mixed {ε-greedy, SoftMax} action choice rule outlined in the previous section. Being at state O, a specific departure time choice signals an immediate reward and a future return for a passenger agent Z, which can be written as, on iteration d:

GC_Z^{d}(O,t) = (1 - \lambda)\, GC_Z^{d-1}(O,t) + \lambda \left[ \kappa_{t,Z}^{d} + \gamma \min_{g \in A_Z(T)} GC_Z^{d}(T,g) \right]     (5)

where t: O → T and \kappa_{t,Z}^{d} = \beta_t \varphi_{tZ}, and
- \varphi_{tZ} is a variable representing the utility of leaving the origin at time t (e.g. auto availability, activity-schedule fitting value). This is an input to the transit path choice model from activity-based models, if available.
- \gamma \min_{g \in A_Z(T)} GC_Z^{d}(T,g) is the actual travel cost, which is not known on day d before the trip starts.

After completing the trip with departure time t, the GC value for agent Z at iteration d − 1 is updated to the GC value at iteration d, reflecting the passenger agent's experience with this state-action pair. It is important to note that the choice at iteration d is based partly on the most updated experience from iteration d − 1. Without information provision, passengers have no means of knowing the actual travel cost of their trip on iteration d before starting the trip; however, they utilize what they have learned to provide an estimate of the travel cost. If the estimated GC value at iteration d proves to be close to the actual travel cost, then the associated choice is reinforced in future iterations.

When a passenger agent is at state t: O → T (i.e. after deciding on a departure time, state T), the origin stop choice follows a similar procedure to the mixed {ε-greedy, SoftMax} action choice model, where Eq. 6 is for exploring alternatives and Eq. 7 is for exploiting past experience:

P_T^{d}(g) = \frac{f\!\left(GC^{d-1}(T,g)\right)}{\sum_{i \in A(T)} f\!\left(GC^{d-1}(T,i)\right)}     (6)

P_T^{d}(g) = 1 \;\text{iff}\; GC^{d-1}(T,g) = \min_{i \in A(T)} GC^{d-1}(T,i), \;\text{and}\; 0 \;\text{otherwise}     (7)
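To illustrate the mechanics of Eq. 5 with purely illustrative numbers (not taken from the paper), suppose an agent's stored experience for a given departure time is GC_Z^{d-1}(O,t) = 40 generalized-cost units, the learning rate is λ = 0.3, the discount factor is γ = 1, the departure utility term contributes κ = 5, and the trip actually experienced on day d costs min_g GC_Z^d(T,g) = 38. Then

GC_Z^{d}(O,t) = 0.7 × 40 + 0.3 × (5 + 1 × 38) = 28 + 12.9 = 40.9.

A day that turns out cheaper than expected would pull the stored value down instead, so repeated trips move the experience toward the performance actually delivered by the service at that departure time.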
Origin Stop Choice

The origin stop choice has an immediate cost (e.g. access time) and a future value represented by the travel cost associated with the possible attractive route choice set from the origin stop. The generalized cost for an origin stop (state G), for a passenger agent on iteration d, is updated as follows:

GC^{d}(T,g) = (1 - \lambda)\, GC^{d-1}(T,g) + \lambda \left[ \kappa_g^{d} + \gamma \min_{r \in A(S),\, s \equiv g} GC^{d}(S,r) \right]     (8)

where g: T → S and \kappa_g^{d} = \sum_{r=1}^{k} \beta_r X_r, and
- X_r represents the immediate travel cost associated with stop g; for example, access time and access cost to travel to stop g. The modeling of access mode choice may be introduced at this level by varying X_r as a function of the access mode.
- \beta_r can be an alternative-specific parameter; for example, access time for train stations is perceived differently from that for bus stops. \beta_r may also be a general parameter for all alternatives, such as the fare parameter that reflects the perceived monetary cost associated with a specific choice g.
It is worth mentioning that the estimated travel cost of choosing stop g is associated with a specific departure time t. This permits the representation of different transit network conditions at different times during the modeling period.
En-Route Choice Modeling

Route / Run Choice

When a passenger arrives at a stop s ∈ {g, n} (g: origin stop, n: transfer stop), an initial route choice is made. This route choice follows the mixed {ε-greedy, SoftMax} action choice model. The value of choosing a run from route r at a stop s is updated based on the following:

GC^{d}(S,r) = (1 - \lambda)\, GC^{d-1}(S,r) + \lambda \left[ \kappa_r^{d} + \gamma \min_{f \in A(V)} GC^{d}(V,f) \right]     (9)

where r: S → V and \kappa_r^{d} = \sum_{j=1}^{k} \beta_j X_j, and
- X_j represents the immediate travel cost associated with a run from route r; for example, the waiting time for run r (of route R), the crowding level of run r, etc.
- \beta_j can be an alternative-specific parameter; for example, waiting time for trains is perceived differently from waiting time for buses.
- \gamma \min_{f \in A(V)} GC^{d}(V,f) is the service performance of route r on iteration d, which is unknown before making a route/run choice.
This decision-making and experience-updating mechanism reflects previous experiences at iteration d − 1 of the chosen route and does not take into consideration the within-day dynamics of the transit service on iteration d. In the proposed framework, the choice rule is therefore used in conjunction with a number of SWITCH choice rules: 'SWITCH AND WAIT', 'SWITCH AND BOARD', 'SWITCH AND STAY', and 'SWITCH AND ALIGHT'. The first two choice rules are concerned with the adaptive behaviour when passengers are at stops, while the latter two are used to model the adaptive behaviour when passengers are on board a transit vehicle.

For a passenger waiting at stop s ∈ {g, n}, there is an initial route choice r. When a transit vehicle from the initially chosen route r arrives at stop s, the passenger's decision-making process follows the 'SWITCH AND WAIT' choice rule. On iteration d, if the transit service of route r, considering within-day dynamics, performs in accordance with or better than the passenger's expected travel cost for boarding route r from stop s, then the passenger will choose to board the initially chosen route-run r and this choice will be enforced. Without provision of information, a passenger waiting at stop s when a vehicle arrives from the initially chosen route r has no knowledge of the actual performance of other attractive routes r′ ∈ A_S on iteration d except through previous experiences, summarized up to iteration d − 1. Therefore, when the service performs according to (or better than) expectations, there is no need to re-evaluate the initial choice, as the experiences for all attractive routes were considered in making the initial route choice.

For a transit vehicle arriving at a stop s from the initially chosen route r for a passenger, when within-day dynamics result in the transit service for route r performing not in accordance with (i.e. worse than) the passenger's expectations, a reconsideration of the initial route choice is warranted. In this case, the passenger's expectation of the service performance of route r is updated and the route choice mechanism is applied with κ_r^d + γ min_{f∈A(V)} GC^{d−1}(V,f) as a replacement of GC^{d−1}(S,r) for route r. Without information provision, the passenger's expectations of the service performance of all other attractive routes remain a function of previously accumulated experiences, GC^{d−1}(S,i), i ≠ r. If the initial route choice still outperforms the other attractive routes (based on experience summarized up to iteration d − 1 for the other route options), the 'SWITCH AND WAIT' choice rule will result in the passenger choosing the initial route choice. A change in route choice using the 'SWITCH AND WAIT' choice rule is observed if the within-day dynamics of route r have resulted in another attractive route r′ outperforming route r. The passenger then waits (hence the name of the routine) for a run from route r′ to arrive.

Regardless of the output of the 'SWITCH AND WAIT' choice rule, the passenger's accumulated experience with route r at stop s, GC^d(S,r), is updated based on Eq. 9, since κ_r^d has already been experienced.
For a passenger waiting at stop s ∈ {g, n} with an initial route choice r ∈ A_S, the 'SWITCH AND BOARD' choice rule is applied when a transit vehicle arrives from a route r′ ∈ A_S that is not the initially chosen route r. The updated travel cost for route r′, κ_{r′}^d + γ min_{f∈A(V)} GC^{d−1}(V,f), is calculated. If this updated travel cost matches the expected value, GC^{d−1}(S,r′), then there is no need to reconsider the initial route choice r. If the updated travel cost for route r′ is improved, then the initial route choice r needs reconsideration. In this situation, the updated travel cost for route r′ is compared only to GC^{d−1}(S,r), without considering other attractive routes, since without information provision the passenger's expectations regarding their performance (GC^{d−1}(S,i), i ≠ r, r′) remain the same, and the performance of those other routes was estimated to be inferior to the initial route choice r. If the updated travel cost of route r′ becomes more attractive, then the 'SWITCH AND BOARD' choice rule results in the passenger choosing to board route r′. Otherwise, the passenger continues to wait for the arrival of route r (the initial route choice).
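The two at-stop rules can be summarized in the following sketch (a simplified reading of the text above, with assumed function and variable names rather than the MILATRAS implementation): when a vehicle arrives, the passenger compares the freshly observed cost of the arriving route against stored expectations and either boards or keeps waiting.

```python
def updated_cost(kappa_today, gamma, min_future_gc_prev):
    """Observed immediate cost today plus the discounted stored downstream cost."""
    return kappa_today + gamma * min_future_gc_prev

def switch_and_wait(cost_r_today, gc_prev, initial_route):
    """Vehicle of the initially chosen route arrives: board unless today's observed
    cost makes some other attractive route (judged on stored experience) look better."""
    best_other = min((c for route, c in gc_prev.items() if route != initial_route),
                     default=float("inf"))
    return "board" if cost_r_today <= best_other else "wait_for_better_route"

def switch_and_board(cost_other_today, gc_prev, initial_route):
    """Vehicle of another attractive route arrives: board it only if its observed cost
    beats the stored expectation for the initially chosen route."""
    return "board_other" if cost_other_today < gc_prev[initial_route] else "keep_waiting"

# Illustrative stop-level expectations (values are assumptions).
gc_prev = {"route_12": 33.0, "route_7": 36.5}
print(switch_and_wait(updated_cost(9.0, 1.0, 26.0), gc_prev, "route_12"))   # 35.0 vs 36.5 -> board
print(switch_and_board(updated_cost(4.0, 1.0, 30.0), gc_prev, "route_12"))  # 34.0 vs 33.0 -> keep_waiting
```

In both cases the stored GC of the experienced route is updated afterwards with Eq. 9, regardless of the decision taken.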
Transfer Choice

When a passenger is on board a transit vehicle V of route r, there are two possibilities. First, route r reaches the trip destination and no more travel choices are required (except the egress mode choice, if any). In this case, the choice set A_V has only one element, des, representing the destination stop associated with route r. Based on the mental model structure, there can be only one destination stop associated with each route. The immediate travel cost of deciding on alighting at a destination stop can be thought of as, for example, the in-vehicle time and fare to reach the destination stop. There is also a future travel cost associated with this decision; this cost includes egress time and egress monetary cost. When the passenger arrives at the destination, the schedule delay is calculated, representing the deviation from the scheduled arrival time. Since no more trip decisions are needed at the destination, the destination location represents the terminating state of the underlying stochastic process. At this state, the recursive calculations terminate and are propagated backwards to update expectations for the departure time choice made at the beginning of the trip. The GC(V,des) is updated based on the following equation:

GC^{d}(V,des) = (1 - \lambda)\, GC^{d-1}(V,des) + \lambda \left[ \kappa_{des}^{d} + \gamma\, \tau_{des}^{d} \right]     (10)
where des: V → D, \kappa_{des}^{d} = \sum_{j=1}^{k} \beta_j X_j, and \tau_{des}^{d} = \sum_{a=1}^{k} \beta_a X_a, and
- X_j represents the immediate travel cost associated with the destination stop; for example, the in-vehicle time to reach the destination stop and the fare to reach the destination stop.
- \beta_j can be an alternative-specific parameter; for example, in-vehicle time for subways is perceived differently from in-vehicle time for buses.
- X_a represents the future travel cost associated with the destination stop; for example, egress time and egress monetary cost to reach the trip destination from the destination stop. It also represents the schedule delay associated with trip choices.
- \beta_a reflects the various perceptions with regard to future travel cost components, for instance the early/late schedule delay values.
It is worth mentioning that the destination stop choice des is linked to a route r. The route choice is associated with a stop choice s, which is in turn associated with a departure time t. This means that the experience for path choices is relevant to the departure time choice. It is not uncommon that passengers make different trip decisions for different origin departure time choices. The transit service performance is observed to be time-dependent and passengers have time-dependent expectations about path choices, and hence about the departure time choice. In this formulation, the schedule delay is dependent on both the departure time and the trip choices. This can be noted from Eq. 5: the schedule delay does not directly appear in the travel cost calculation for the departure time choice and is implicitly included, through recursive computations, in the term γ min_{g∈A_Z(T)} GC_Z(T,g).
For on-board passengers, a transfer connection may be needed to reach the trip destination. In such situations, a passenger decides on an initial off-stop choice upon boarding the transit vehicle V of route r following the same mixed {ε − greedy, SoftMax} action-choice model. The GC value for the off-stop choice is updated as follows:
GC^{d}(V,f) = (1 - \lambda)\, GC^{d-1}(V,f) + \lambda \left[ \kappa_f^{d} + \gamma \min_{n \in A(f)} GC^{d}(F,n) \right]     (11)

where f: V → F and \kappa_f^{d} = \sum_{j=1}^{k} \beta_j X_j, and
- X_j represents the immediate travel cost associated with off-stop f; for example, the in-vehicle time to reach off-stop f, the number of transfers to reach the destination from off-stop f, etc.
- \beta_j can be an alternative-specific parameter (in-vehicle time for subways is perceived differently from in-vehicle time for buses) or a general parameter (transfer penalty).
This initial off-stop choice is based on the generalized cost of previous experiences, and does not reflect the within-day dynamics of the transit service. The 'SWITCH AND STAY' and 'SWITCH AND ALIGHT' choice rules capture on-board passengers' adaptive behaviour in response to service dynamics. When the transit vehicle arrives at the initially chosen off-stop f, and when the current experience for the chosen off-stop matches expectations, the passenger's choice is to alight at the initially chosen off-stop f according to the 'SWITCH AND STAY' choice rule. In other situations, passengers may experience an unexpected travel cost associated with the initial off-stop choice f (e.g. an unexpected delay). This warrants the re-evaluation of the initial off-stop choice, since it may not be the optimal choice anymore when compared with other attractive off-stop options. When the transit vehicle arrives at a feasible off-stop f′ ∈ A_V, f′ ≠ f (where f is the passenger's initial off-stop choice), the 'SWITCH AND ALIGHT' choice rule is applied and the passenger adjusts the en-route transfer choices according to service dynamics (either by alighting at the current stop f′ or staying on board until arrival at the initially chosen off-stop f).

Regardless of the decisions made according to the 'SWITCH AND STAY' ('SWITCH AND ALIGHT') choice rule, the passenger's accumulated experience with off-stop f (off-stop f′), GC^d(V,f) (GC^d(V,f′)), is updated based on Eq. 11, since the corresponding immediate cost has already been experienced.

When a passenger alights at an off-stop f ∈ A(V), a decision concerning the transfer on-stop, n ∈ A_F, is made. This decision is based on the expected travel cost associated with each possible transfer on-stop. These expectations reflect previous experiences, and choices are made following a decision-making behaviour similar to that outlined for previous choices. When a passenger arrives at a transfer on-stop n, an initial route choice is made. The boarding decision is dependent on the within-day dynamics of the transit service, as explained before. There could be an immediate penalty associated with the choice of a transfer on-stop n, such as inconvenience due to walking or crossing multiple intersections to reach the on-stop n. There is also a future travel cost related to the transfer on-stop choice, expressed by the expected utility of attractive routes r ∈ A_S, s ≡ n. The travel cost associated with a transfer on-stop n choice is updated as:
GC^{d}(F,n) = (1 - \lambda)\, GC^{d-1}(F,n) + \lambda \left[ \kappa_n^{d} + \gamma \min_{r \in A(S),\, s \equiv n} GC^{d}(S,r) \right]     (12)

where n: F → S and \kappa_n^{d} = \sum_{j=1}^{k} \beta_j X_j, and
- X_j represents the immediate travel cost associated with an on-stop choice n; for example, the transfer walk time.
- \beta_j reflects the passenger's perception of the travel cost associated with on-stop choices, e.g. a transfer penalty.
In summary, the proposed model treats individual passengers as decision makers, who choose a departure time, an origin stop, a destination stop and a route (or a sequence of routes) between a given origin-destination pair each day. This decision-making behaviour, as explained, consists of a two-step process: making choices and updating perceptions. When information on service performance (via a real-time information provision system or information sharing with other users) is not provided, the step of making choices precedes perception updating. Perception updating occurs only when the state-action pair is visited. Pre-trip decisions are treated as fixed choices, while en-route choices are adaptive.
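The two-step day-to-day process can be pictured with the following self-contained sketch (class names, option labels and costs are illustrative assumptions, and the exploration rule is reduced to a purely greedy choice for brevity): agents repeatedly choose from accumulated GC values, experience the trip, and update their perceptions.

```python
class Network:
    """Stand-in service: returns an experienced trip cost for a chosen plan (illustrative)."""
    def simulate_trip(self, departure_time, stop):
        return {"07:45": 41.0, "08:00": 44.0}[departure_time] + {"stop_G1": 0.0, "stop_G2": 2.5}[stop]

class Agent:
    """Stand-in passenger with a flat GC table; real agents hold the nested mental model."""
    def __init__(self):
        self.gc = {}
    def choose(self, options, prefix):
        # Greedy choice; unvisited options default to 0 (optimistic), so each is tried once.
        return min(options, key=lambda o: self.gc.get((prefix, o), 0.0))
    def update(self, key, experienced, lam=0.3):
        old = self.gc.get(key, experienced)
        self.gc[key] = (1 - lam) * old + lam * experienced   # Eq. 2 style smoothing

network, agent = Network(), Agent()
for day in range(50):                                     # day-to-day learning loop
    t = agent.choose(["07:45", "08:00"], "dep")           # pre-trip departure time choice
    g = agent.choose(["stop_G1", "stop_G2"], t)           # pre-trip origin stop choice
    cost = network.simulate_trip(t, g)                    # within-day trip experience
    agent.update(("dep", t), cost)                        # perception updating after the trip
    agent.update((t, g), cost)
print(agent.choose(["07:45", "08:00"], "dep"))            # settles on the cheaper departure time
```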
Notes on the Proposed Model

A Note on the Modeling of Information Provision

The benefits of Advanced Traveler Information Systems (ATIS) applications can be assessed by comparing the path choice behaviour and time savings of informed passengers vs. non-informed ones. However, conventional transit assignment models (e.g. strategy-based models) assume that passengers have full information about the network conditions and infinite information-processing capabilities; this is referred to as the "perfect knowledge of network" assumption. These models are not appropriate for evaluating information provision policies, since information regarding network conditions would be assumed to be available to all passengers. The emergence and increased deployment of ATIS make it practically important to relax the assumption of perfect information or perfect knowledge in transit assignment studies.

The system performance on iteration d is dependent on the choices of other passengers; these choices are not known, to any passenger, a priori. In the proposed model, and with no provision of real-time information, passengers do not have perfect knowledge about the system performance on iteration d. Instead, passengers form their expectations about the performance of the transit service based on accumulated experiences summarized up to iteration d − 1 (see Fig. 2).

Under information provision, perception update precedes decision making (see Fig. 4). That is, the 'mental model' content is updated before making choices: κ_a^d can now be obtained for all actions a ∈ A_S, and P_S^d(a) is computed as f(GC^d(s,a)), compared to P_S^d(a) = f(GC^{d−1}(s,a)) when real-time information is not provided. This is also reflected in the SWITCH choice rule calculations. For example, the 'SWITCH AND WAIT' choice rule compares κ_r^d + γ min_{f∈A(V)} GC^{d−1}(V,f) and GC^{d−1}(S,i), i ≠ r, when information is not provided; GC^{d−1}(S,i), i ≠ r, represents passengers' expectations for the unknown performance of other attractive routes in iteration d. With information provision, κ_i^d, i ≠ r, is now obtainable (see Fig. 4).
Figure 4 is about here

Based on the above, information provision will become effective when:
- great variability in service performance exists and average GC values are not representative of state-action pair values;
- non-recurrent congestion occurs – due, for example, to traffic incidents – where GC^{d−1}(S,i), i ≠ r, are not representative of the within-day network dynamics in iteration d; and
- non-familiar travelers' expectations are not consistent with the network performance.

The proposed framework allows for the evaluation of the impact of pre-trip and en-route information provision by comparing passengers' choice profiles (trip choices and usage of the SWITCH choice rules) in the case of absence of information and the case of information provision.
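The ordering difference described above (perception update before vs. after the choice) can be sketched as follows; this is an interpretive illustration with assumed names and an assumed blending of provided information into the mental model, not the framework's code.

```python
def choose_without_information(gc_prev, actions, choice_rule):
    """No real-time information: choose from day (d-1) experience, update only afterwards."""
    return choice_rule(gc_prev, actions)

def choose_with_information(gc_prev, actions, provided_info, choice_rule, lam=0.3):
    """Information provision: fold the provided service-performance estimates into the
    mental model first, then apply the same choice rule to the refreshed GC values."""
    gc_updated = dict(gc_prev)
    for action, reported_cost in provided_info.items():
        old = gc_updated.get(action, reported_cost)
        gc_updated[action] = (1 - lam) * old + lam * reported_cost
    return choice_rule(gc_updated, actions)

greedy = lambda gc, actions: min(actions, key=lambda a: gc.get(a, float("inf")))

gc_prev = {"route_12": 33.0, "route_7": 36.5}            # experience up to day d-1
info = {"route_12": 45.0}                                # e.g. a reported delay on route 12 today
print(choose_without_information(gc_prev, ["route_12", "route_7"], greedy))       # route_12
print(choose_with_information(gc_prev, ["route_12", "route_7"], info, greedy))    # route_7
```

With provided information the delayed route is avoided on the day it matters, which is exactly the behavioural difference the choice-profile comparison is meant to capture.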
A Note on the Classification of the Proposed Choice Model

The decision rule (or choice rule) in the proposed model is stochastic (the mixed {ε-greedy, SoftMax} action choice model), allowing exploitation and exploration of the individual's experience. The utility (or generalized travel cost) term is formed by the passenger, and it has no random error component (see Eq. 2). Therefore, the proposed model can be classified as a bounded rational model, with a constant utility term and a stochastic decision rule. Random utility models, on the other hand, are characterised by a deterministic decision rule and a stochastic utility term. When the decision-making process of transit travelers (see Fig. 3) is modeled as a nested-logit model (or a cross-nested logit model), the parameter estimation process in the decision-making tree is performed from the bottom nest all the way up to the root. At the run choice-nest, for example, the parameter estimation procedure forgets (or does not consider) upper-level choice-nests (e.g. the stop choice-nest) and takes into consideration further nests, assuming that the information needed to make the run choice is preserved at the run choice-nest level. This is in many ways analogous to the memoryless property of the MDP.
Framework Applications

Wahba and Shalaby (2011) report on a full-scale implementation of MILATRAS, including the proposed departure time and path choice model of this paper, using the Toronto Transit Commission (TTC) system as a case study. It presents a large-scale real-world application; the TTC AM network has 500 branches with more than 10,000 stops and about 332,000 passenger-agents. The choice model parameters were calibrated such that the entropy of the simulated route loads was optimized with reference to the observed route loads, and validated with individual choices. The modeled route loads, based on the calibrated parameters, closely approximate the distribution underlying the observed loads. In this application, transit passengers were assumed to plan their transit trips based on their experience with the transportation network, with no prior (or perfect) knowledge of service performance.

Wang et al. (2010) compared the proposed framework, MILATRAS, with two conventional transit assignment approaches, namely the EMME/2 Transit Assignment Module and MADITUC. MILATRAS was shown to perform comparably and to produce base-case steady-state run loads and stop loading profiles. MILATRAS, however, presents a policy-sensitive platform for modeling and evaluating transit-ITS deployments, a real challenge to conventional tools. The proposed framework was used to investigate the impact of various scenarios of information provision on transit riders' departure time and path choices, and on network performance. The results show that, for a medium-size transit network with low- to medium-frequency services, the stop and departure time choices seem to be more important than the run choice, as they significantly affect the trip time.

Another area of application is emissions modeling. The proposed framework, MILATRAS, was integrated within a multi-modal framework to evaluate transit emissions at a link-based mesoscopic level for the TTC network using time-dependent network loading profiles (Lau et al. 2010).

Another study (Kucirek et al. 2012) investigated fare-based assignment for inter-municipal, cross-regional trips using three different operators (with different fares), using the proposed framework and comparing assignment results to surveyed trip-record data and to results from a fare-based travel demand forecasting model implemented in EMME/2. The results show that the conventional approach over-predicts the number of transfers between operators, while MILATRAS is capable of accurately estimating transfers between operators and capturing fare interactions between local (e.g. Mississauga Transit) and premium (e.g. GO Transit) transit services.
Future Research

A mode choice component will be integrated into the overall modeling framework, using the learning-based approach. Future efforts will be directed to the modeling of passengers' travel choices in a multimodal network, incorporating the access and egress mode choices. Also, the growing trend in using smart cards (e.g. the Presto Card in the Greater Toronto Area) provides a rich source of data on transit departure time and path choices in particular and on multimodal trips in general. Such data sources will present great opportunities for calibrating and validating dynamic transit path choice models.
Acknowledgment

This research was supported by the Natural Sciences and Engineering Research Council (NSERC) and the Ontario Graduate Scholarship (OGS) Program.
References

Arentze TA, Timmermans HJP (2003) Modeling learning and adaptation processes in activity-travel choice: A framework and numerical experiments. Transportation 30:37-62. doi:10.1023/A:1021290725727
Cantarella GE, Cascetta E (1995) Dynamic Processes and Equilibrium in Transportation Networks: Towards a Unifying Theory. Transportation Science 29:305-329. doi:10.1287/trsc.29.4.305
Cascetta E, Cantarella GE (1991) A day-to-day and within-day dynamic stochastic assignment model. Transportation Research Part A 25:277-291. doi:10.1016/0191-2607(91)90144-F
Davis GA, Nihan NL (1993) Large population approximations of a general stochastic traffic assignment model. Operations Research 41:169-178. doi:10.1287/opre.41.1.169
de Palma A, Marchal F (2002) Real Cases Applications of the Fully Dynamic METROPOLIS Tool-Box: An Advocacy for Large-Scale Mesoscopic Transportation Systems. Networks and Spatial Economics 2:347-369. doi:10.1023/A:1020847511499
Ettema D, Tamminga G, Timmermans HJP, Arentze T (2005) A micro-simulation model system of departure time using a perception updating model under travel time uncertainty. Transportation Research Part A 39:325-344. doi:10.1016/j.tra.2004.12.002
Hazelton M, Watling D (2004) Computation of equilibrium distributions of Markov traffic assignment models. Transportation Science 38:331-342. doi:10.1287/trsc.1030.0052
Hickman M, Bernstein D (1997) Transit Service and Path Choice Models in Stochastic and Time-Dependent Networks. Transportation Science 31:129-146. doi:10.1287/trsc.31.2.129
Kucirek P, Wahba M, Miller EJ (2012) Fare-based Transit Assignment Models: Comparing the Accuracy of Strategy-based Aggregate Model EMME, Against Microsimulation Model MILATRAS, using Trips Crossing the Mississauga-Toronto Boundary. University of Toronto DMG Research Reports. http://www.dmg.utoronto.ca/pdf/reports/research/kucirekp_gta_transitfares.pdf
Lau J, Hatzopoulou M, Wahba M, Miller EJ (2010) An integrated multi-model evaluation of transit bus emissions in Toronto. Journal of Transportation Research Record 2216:1-9. doi:10.3141/2216-01
Nakayama S, Kitamura R, Fujii S (1999) Drivers' Learning and Network Behaviour: A Dynamic Analysis of the Driver-Network System as a Complex System. Transportation Research Record 1676:30-36. doi:10.3141/1676-04
Nuzzolo A, Russo F, Crisalli U (2001) A Doubly Dynamic Schedule-based Assignment Model for Transit Networks. Transportation Science 35(3):268-285. doi:10.1287/trsc.35.3.268.10149
Nuzzolo A, Russo F, Crisalli U (2003) Transit Network Modelling: the schedule-based dynamic approach. FrancoAngeli, Milan
Petersen K (1990) Ergodic Theory, 1st edn. Cambridge University Press, Cambridge
Roorda MJ, Miller EJ (2006) Assessing Transportation Policy Using an Activity-Based Microsimulation Model of Travel Demand. ITE Journal 76:16-21
Rossetti RJF, Liu R (2005) A multi-agent approach to assess drivers' responses to pre-trip information systems. Journal of Intelligent Transportation Systems 9:1-10. doi:10.1080/15472450590912529
Russell S (1998) Learning agents for uncertain environments (extended abstract). In: COLT '98: Proceedings of the eleventh annual conference on Computational learning theory. doi:10.1145/279943.279964
Salvini P, Miller EJ (2003) ILUTE: An Operational Prototype of a Comprehensive Microsimulation Model of Urban Systems. Networks and Spatial Economics 5:217-234. doi:10.1007/s11067-005-2630-5
Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction. The MIT Press, Cambridge
Timmermans HJP, Arentze T, Ettema D (2003) Learning and adaptation behaviour: empirical evidence and modeling issues. Proceedings ITS Conference, Eindhoven
Wahba M (2008) MILATRAS: MIcrosimulation Learning-based Approach to Transit Assignment. Dissertation, University of Toronto
Wahba M, Shalaby A (2009) MILATRAS: a new modeling framework for the transit assignment problem. In: Wilson NHM, Nuzzolo A (eds) Schedule-based Modeling of Transportation Networks: theory and applications. Springer, New York, pp 171-194
Wahba M, Shalaby A (2011) Large-scale application of MILATRAS: case study of the Toronto transit network. Transportation 38:889-908. doi:10.1007/s11116-011-9358-5
Wang J, Wahba M, Miller EJ (2010) Comparison of Agent-Based Transit Assignment Procedure with Conventional Approaches. Journal of Transportation Research Record 2175:47-56. doi:10.3141/2175-06
Watkins CJCH (1989) Learning from Delayed Rewards. Dissertation, University of Cambridge
Watling D (1996) Asymmetric problems and stochastic process models of traffic assignment. Transportation Research Part B 30:339-357. doi:10.1016/0191-2615(96)00006-9
Fig. 1 Schematic representation of the state transitions for the transit path choice problem
Fig. 2 Value function components for an origin stop choice
Fig. 3 The mental model structure with different types of choices
Fig. 4 Modeling of Traveler Behaviour (a) without information provision and (b) with information provision