A Statistical Modeling Approach to Airline Revenue Management
Sheela Siddappa (1), Dirk Günther (2), Jay M. Rosenberger (1), Victoria C. P. Chen (1)
(1) Department of Industrial and Manufacturing Systems Engineering, The University of Texas at Arlington, Campus Box 19017, Arlington, TX 76019, USA
(2) Sabre Research Group, Bebelstrasse 15, 58453 Witten, GERMANY
June 30, 2006
Abstract
Revenue management (RM) aims to maximize a company’s revenue by allocating the right seat to the right customer. In this paper, we present an approach based on a Markov decision process (MDP) formulation. Our approach involves an off-line phase that derives a policy for accepting/rejecting customer booking requests, and an on-line phase that conducts the actual decisions as the booking requests arrive. To enable a computationally tractable solution method, the off-line phase consists of three components: (1) identification of realistic ranges of remaining seat capacity at different points in time, (2) solutions to deterministic and stochastic linear programming problems that provide upper and lower bounds, respectively, on the MDP value function, and (3) estimation of the upper and lower bound value functions using statistical modeling. This value function approximation is then used to determine the RM accept/reject policy. Prior versions of this statistical modeling approach have employed remaining seat capacity ranges from zero to the capacity of the aircraft. In reality, actual remaining capacities are near capacity when the booking process begins and near zero when the flights depart. Thus, our modified version uses realistic ranges to enable a more accurate statistical model, leading to a better RM policy.
Before deregulation in 1979, airlines were managed by the Civil Aeronautics Board (Bailey et al. 1985), which dictated the routes to be flown and the fares to be charged to customers. Costs were passed on to passengers, with guaranteed profit levels for the airlines. Carriers simply accepted passengers on a first-come, first-served basis. There were limited booking classes, and there was little control over revenues. After deregulation, there was tremendous growth in the number of certified airlines and high pressure on pricing. Airline carriers began to explore ways to compete effectively, and different approaches in revenue management (RM) evolved. Since then, airlines have expanded their efforts in RM to increase their revenue.
Revenue management, also known as yield management, is defined as “selling the right seat at the right time to the right passenger for the right price” (Ben 1995). RM is applied in various transportation sectors, such as auto rentals, ferries, rail, tour operators, cargo, and cruises. Other areas, such as hotels and resorts, extended-stay hotels, health care, manufacturing apparel, and companies that produce perishable goods, also use RM (Bodily and Pfeifer 1973).
Airlines have developed a complex and diverse fare structure. They offer a variety of fares to serve different classes of customers and use restrictions to establish different classes of service. For example, customers who buy tickets well in advance (21 days) get a lower price than customers who buy on the day of departure, so seats on the same flight are sold at different fares to different customers. Airline carriers compete to expand and explore RM to improve their revenue. American Airlines, for example, reported an increase in revenue of 5% due to improved RM methods in 1992, which translated to $1.4 billion over a 3-year period (Smith et al. 1992).
1 Revenue Management Process Overview
A leg is a flight that travels non-stop from an origin to a destination. In an airline reservation system, customers request a particular itinerary, which consists of one or more legs. The booking process typically starts three months prior to the date of departure. A customer makes a booking request by bidding a price for a desired itinerary. Once the request is placed, an airline representative uses a computer reservation system to decide if the request is to be accepted or rejected. The customer’s requested price is compared with a threshold calculated by the airline that represents the fair market value. If the customer’s bid is higher than the airline’s threshold value, then the request is accepted; otherwise, the request is rejected. Airlines usually update their prices at specific dates during the booking process, called reading dates. These dates get closer to each other as the day of departure gets closer (see Figure 1). Booking requests rejected for any demand class due to unavailability or filled seats are called spilled demand.
Figure 1 about here.
1.1 Problem Definition
Airline RM deals with managing the inventory of the demand classes offered on each itinerary, so as to maximize revenue. In this paper, we concentrate on the seat inventory control problem, also called the yield management problem. Given the flight capacities and schedule, can we accept the request placed by a customer at time τ during the booking process? Our approach seeks to achieve a better RM policy by using statistical methods to approximate the value functions of a Markov decision process (MDP). The value function approximation is conducted off-line and makes use of deterministic and stochastic linear programming problems from the RM literature. In the next section, we provide the relevant background on airline RM. The statistical modeling approaches are described in Sections 3 and 4. Our computational results on a real airline hub are presented in Section 5, and concluding remarks are given in Section 6.
2 RM Methodology
The research in this paper is based on the statistical modeling approach of Chen et al. (2003) and Günther (1998). They formulated the RM model as an MDP, similar to that of Lautenbacher and Stidham (1999). Traditionally, MDPs have been solved using stochastic dynamic programming (SDP), which provides a superior RM policy but is computationally intensive. Hence, Chen et al. (2003) developed a new statistical modeling approach, motivated by the orthogonal array (OA) and multivariate adaptive regression splines (MARS) SDP method of Chen et al. (1999), to estimate upper and lower bounds of the MDP value functions. The bounds are estimated at reading dates, which divide the booking period into smaller intervals called reading periods.
2.1 RM Literature and Background
Littlewood (1972) was the first to address the RM problem of computing booking limits for a single leg with two demand classes. His rule states that discount seats should be sold as long as the revenue from a low fare passenger is greater than or equal to the product of the marginal revenue from a full fare passenger and the probability that full fare demand exceeds the remaining capacity. Belobaba (1987) extended this rule to multiple demand classes and introduced the term EMSR (Expected Marginal Seat Revenue). The EMSR method produces optimal booking limits for the two-demand class problem, and it is easy to implement.
2.1.1 RM as an MDP
In the airline booking process, the decision to accept or reject a current booking request (BR) depends on the remaining seat capacity, the time the request was placed, the itinerary and demand class requested, and other characteristics of the current request, but it does not depend on decisions made about previous booking requests. Hence, RM can be classified as an MDP (Lautenbacher and Stidham 1999). The MDP formulation for the RM problem divides the three-month booking period into $t_{MDP}$ time intervals, with at most one booking request per interval. These intervals are indexed in decreasing order, $i = t_{MDP}, \ldots, 1, 0$, where $i = 1$ denotes the first interval immediately preceding departure, and $i = 0$ is at departure. Each reading period can have multiple booking requests, while each MDP interval can have at most one booking request. The state vector $x_i$ holds the remaining leg capacities at the beginning of time interval $i$. Let $p_i^f(g)$ denote the probability that a request for $g$ seats for demand class $f$ occurs in time interval $i$; $p_i(0)$ denotes the probability of no booking requests in time interval $i$; and $G_f$ is the maximum size of a group request for demand class $f$. Suppose the booking process is in state $x$ at the beginning of time interval $i$. If a booking request for $g$ seats that arrives during time interval $i$ is accepted, then a new state $x'$ is reached at the beginning of time interval $i - 1$, where $x'$ subtracts $g$ seats from the legs involved in the requested itinerary and is greater than or equal to zero. Let $F_i(x)$, for $x \ge 0$, denote the optimal value function, the maximum expected revenue collected over time intervals $i$ through departure when the system is at state $x$ at the beginning of time interval $i$. Then $F_0(x) = 0$ for all $x$. Thus, the MDP value functions can be written as:
\[
F_i(x) = \sum_{f=1}^{m} \sum_{g=1}^{G_f} p_i^f(g)
\begin{cases}
\max\{ g r_f + F_{i-1}(x'),\; F_{i-1}(x) \}, & \text{if } x' \ge 0, \\
F_{i-1}(x), & \text{otherwise.}
\end{cases}
\]
The fair market value (FMV) of a group of requested seats is defined as the difference between the value function after rejecting the request and the value function after accepting it: $\mathrm{FMV} = F_{i-1}(x) - F_{i-1}(x')$, where $x' \ge 0$.
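The recursion and the FMV definition can be illustrated with a minimal sketch for a single leg. This is not the authors' implementation: the fares, the request probabilities (taken to be the same in every interval), and the function names are hypothetical, and the sketch assumes that when no request arrives, with probability $p_i(0)$, the state simply carries over to the next interval.

```python
from functools import lru_cache

# Hypothetical single-leg instance; every number below is an illustrative assumption.
FARES = {1: 400.0, 2: 150.0}   # r_f: fare per seat for demand class f
# P_REQUEST[(f, g)]: probability p_i^f(g) of a request for g seats in class f,
# assumed here to be the same in every interval i.
P_REQUEST = {(1, 1): 0.10, (2, 1): 0.25, (2, 2): 0.05}


@lru_cache(maxsize=None)
def value(i: int, x: int) -> float:
    """F_i(x): maximum expected revenue from interval i through departure."""
    if i == 0:
        return 0.0  # F_0(x) = 0 for all x
    reject = value(i - 1, x)
    total = 0.0
    for (f, g), p in P_REQUEST.items():
        if x - g >= 0:
            accept = g * FARES[f] + value(i - 1, x - g)
            total += p * max(accept, reject)
        else:
            total += p * reject  # not enough seats: the request must be rejected
    # Assumption: with probability p_i(0) no request arrives and the state is unchanged.
    p_none = 1.0 - sum(P_REQUEST.values())
    return total + p_none * reject


def fair_market_value(i: int, x: int, g: int) -> float:
    """FMV = F_{i-1}(x) - F_{i-1}(x') with x' = x - g >= 0."""
    if x - g < 0:
        raise ValueError("request exceeds remaining capacity")
    return value(i - 1, x) - value(i - 1, x - g)
```

Even for one leg the table of values grows with the number of intervals times the capacity; for a network the state is a vector of leg capacities, which is what makes exact SDP computationally intensive.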
2.1.2 Dynamic Programming Approach
Ladany and Bedi (1977) and Hersh and Ladany (1978) developed dynamic programming formulations to allocate seats for two flight legs. They discuss overbooking and cancellations for flights with one intermediate stop. Ladany and Bedi (1977) simplified the approach by removing all conditioning on current bookings. Rothstein (1971) formulated the RM problem as a nonhomogeneous Markovian sequential decision process considering overbooking. Lee and Hersh (1993) developed a discrete-time dynamic model to find an optimal booking policy. Their analysis showed that for problems with more than two booking classes and no multiple seat bookings, the optimal booking policy can be reduced to two sets of critical values: (1) booking capacity and (2) decision periods. Lautenbacher and Stidham (1999) solved the single-leg problem without overbooking using a discrete-time MDP. They link the dynamic model (customers of different demand classes book at the same time) and the static model (demand for different demand classes arrives separately in a predetermined order) through a dynamic program common to both. Subramanian et al. (1999) took overbooking, cancellations, and no-shows into consideration while solving the seat allocation problem using an MDP for a single flight leg with multiple demand classes. They showed that: (1) booking limits need not be monotonic in the time remaining until departure, (2) it can be optimal to accept a low demand class passenger and reject a high demand class passenger because of differing cancellation refunds, and (3) the optimal policy depends upon both the total capacity and the remaining capacity of the flight. Zhang and Cooper (2005) formulated a simultaneous seat-inventory control problem for a set of parallel flights between a common origin and destination, with dynamic customer choice among the flights, as an extension of the classic multiperiod, single-flight “block demand” revenue management model. They proposed simulation-based techniques for solving the stochastic optimization problem.
2.1.3 Bid Price Approach
Bid pricing is practiced by most airlines. In the bid price approach, a threshold, or bid price, is assigned to each flight leg. If the fare of a customer’s booking request is greater than or equal to the sum of the bid prices along the desired itinerary, then the request is accepted; otherwise it is rejected (see Figure 2). Consider the following example: suppose a customer bids $1220 for a desired itinerary. Table 1 gives the threshold values set by the airline for each of the legs on the itinerary. Since the total price bid by the customer ($1220) is greater than the sum of the bid prices for the legs ($1109), the request is accepted.
Figure 2 about here.
Table 1 about here.
Bid pricing is easy to implement and requires storage of only a single bid price for each flight leg. It gives a nested, itinerary- and demand-class-specific control policy, and it makes the inventory easy to manage. Despite these advantages, good bid prices are difficult to estimate, and frequent revisions are required, with re-optimization and re-forecasting. Some of the methods used to estimate bid prices are discussed in Sections 2.1.4 and 2.1.5.
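The acceptance rule in the example can be written as a short sketch. The leg names and the split of the 1109 total across two legs are hypothetical, since the per-leg values of Table 1 are not reproduced in the text; only the bid of 1220 and the total of 1109 come from the example.

```python
from typing import Dict, Iterable


def accept_request(fare: float, itinerary_legs: Iterable[str],
                   bid_prices: Dict[str, float]) -> bool:
    """Accept when the fare is at least the sum of the leg bid prices."""
    threshold = sum(bid_prices[leg] for leg in itinerary_legs)
    return fare >= threshold


# Hypothetical per-leg bid prices; only their sum (1109) comes from the text.
BID_PRICES = {"leg_1": 409.0, "leg_2": 700.0}

decision = accept_request(1220.0, ["leg_1", "leg_2"], BID_PRICES)
```

Only one number per leg has to be stored, which is the storage advantage noted above; the difficulty lies in choosing good values for those numbers.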
2.1.4 Deterministic Bid Price Approach
Define the following notation:
• $T$ = the total number of reading dates, indexed by $t$, where $t = T$ represents the first reading date.
• $r$ = the vector of fares associated with each demand class.
• $u$ = a seat allocation decision vector for all demand classes.
• $x_t$ = a vector of remaining seat capacities at reading date $t$.
• $d_t$ = a vector of remaining demand at reading date $t$.
• $A$ = a 0-1 itinerary-leg matrix, with an entry of one if the itinerary includes the leg.
Given the number and position of reading dates, the flight schedule, and the capacities, a deterministic (DET) linear programming problem is solved at reading date $t$ to obtain the seat allocations aggregated over reading dates $t$ through departure. It is modeled as follows.
\[
\begin{aligned}
\text{(DET)} \quad \max \quad & r u && (1) \\
\text{s.t.} \quad & A u \le x_t && (2) \\
& 0 \le u \le E[d_t]. && (3)
\end{aligned}
\]
The dual of this problem will provide bid prices for each flight leg. Gallego and Van Ryzin (1994) used a network model to compute bid prices. Every time a request is accepted, the remaining seat capacity is updated, and at reading dates t < T the updated capacity values are used to solve (DET) to generate new bid prices. Results show that revenue increases with the number of reading dates. The higher the number of reading dates, the higher the accuracy of the bid prices, and hence, more revenue can be captured. However, more reading dates require more computation.
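To make the link between (DET) and bid prices concrete, the following minimal sketch solves the special case of a single leg, where the linear program reduces to filling seats in decreasing order of fare and the dual price of the capacity constraint is the fare of the class in which capacity runs out. The single-leg restriction, the function name, and the numbers are assumptions made for illustration; a network instance would need a general LP solver.

```python
from typing import List, Tuple


def solve_det_single_leg(fares: List[float], expected_demand: List[float],
                         capacity: float) -> Tuple[List[float], float, float]:
    """Return (allocation u, revenue r u, bid price) for one leg."""
    order = sorted(range(len(fares)), key=lambda f: fares[f], reverse=True)
    allocation = [0.0] * len(fares)
    remaining = float(capacity)
    bid_price = 0.0  # zero when capacity exceeds total expected demand
    for f in order:
        take = min(expected_demand[f], remaining)
        allocation[f] = take
        remaining -= take
        if take < expected_demand[f]:
            # Capacity binds inside class f: one more seat would earn fares[f].
            bid_price = fares[f]
            break
    revenue = sum(r * u for r, u in zip(fares, allocation))
    return allocation, revenue, bid_price


# Hypothetical instance: three demand classes sharing 100 seats.
u, revenue, bid_price = solve_det_single_leg(
    fares=[400.0, 250.0, 100.0], expected_demand=[30, 40, 60], capacity=100)
```

Re-solving at a later reading date amounts to calling the function again with the updated remaining capacity and remaining expected demand.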
2.1.5 Stochastic Bid Price Approach
In addition to the notation above, define the following:
• $u_{ft}$ = a seat allocation decision vector for demand class $f$ at reading date $t$, where $u_t$ is the corresponding vector for all demand classes.
• $d_{ft}$ = a random variable of the demand for demand class $f$ on reading date $t$.
A stochastic model, also called the probabilistic nonlinear programming (PNLP) model, is considered when demand is a random variable. This model is also referred to as the stochastic (STOCH) network model and is given below.
\[
\begin{aligned}
\text{(STOCH)} \quad \max \quad & \sum_{\tau=1}^{t} \sum_{f=1}^{m} r_f\, E[\min(d_{f\tau}, u_{f\tau})] && (4) \\
\text{s.t.} \quad & A \left( \sum_{\tau=1}^{t} u_{\tau} \right) \le x_t && (5) \\
& u_{ft} \ge 0. && (6)
\end{aligned}
\]
Again, the dual will provide bid prices for each flight leg. Talluri and Van Ryzin (1999) analyzed a randomized version of deterministic linear programming to compute network bid prices. Their method is more difficult to implement than the (DET) method. It consists of simulating the itinerary demand with a sequence of realizations and solving (DET) to allocate capacities to itineraries for each realization. The dual prices from the sequence are averaged to form a bid price approximation. Hersh and Ladany (1978) presented a two-stage stochastic programming model to overcome the shortcomings of the (DET) and (STOCH) models. The first stage allocates capacity to all the demand classes, and the second stage models capacity utilization. Their simulation results show that this provides better revenue improvements than a linear programming approach. They also prove that their approach is prone to less error than the linear programming method.
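The randomized procedure described above (simulate demand, solve (DET) for each realization, average the duals) can be sketched for a single leg as follows. This is an illustration under stated assumptions rather than the published algorithm: the leg is treated in isolation so that the dual of (DET) can be read off greedily, demand per class is drawn from a Poisson distribution, and all names and numbers are hypothetical.

```python
import math
import random
from typing import List


def det_bid_price(fares: List[float], demand: List[float], capacity: float) -> float:
    """Dual price of the capacity constraint of single-leg (DET)."""
    remaining = capacity
    for fare, d in sorted(zip(fares, demand), reverse=True):
        if d > remaining:
            return fare  # capacity runs out inside this fare class
        remaining -= d
    return 0.0  # capacity is not binding


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def randomized_bid_price(fares: List[float], mean_demand: List[float],
                         capacity: float, n_samples: int = 200, seed: int = 0) -> float:
    """Average the (DET) dual price over simulated demand realizations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        realization = [sample_poisson(rng, lam) for lam in mean_demand]
        total += det_bid_price(fares, realization, capacity)
    return total / n_samples
```

For example, `randomized_bid_price([400.0, 250.0, 100.0], [30, 40, 60], 100)` averages 200 dual prices, each of which is one of the three fares or zero.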
2.1.6 Hybrid Approach
Curry (1990) combined both the EMSR and mathematical programming approaches. The EMSR approach accounts for computerized reservation system nesting, but only controls seat inventory, by controlling leg bookings. Mathematical programming handles realistically large problems and accounts for multiple origin-destination (OD) itineraries and side constraints. Curry developed equations to solve the RM problem, when demand classes are nested on an OD itinerary, and inventory is not shared among the ODs. Cooper and Homem-de Mello (2006) studied policies that combine both mathematical programming and MDP methods. They employed a simple allocation policy when far from the time of departure and developed a detailed decision rule close to departure. They used sampling-based stochastic optimization methods to solve the formulation. The solution was capable of using deterministic optimization techniques. They employed an MDP solution for a portion of the booking process rather than approximations of MDP value functions. Their results showed that the hybrid policies perform well for two-leg problems, but their approach cannot be used for larger networks. Bertsimas and Boer (2005) developed an algorithm that addressed different issues, like demand uncertainty, nesting and the dynamic nature of the booking process. They combined a stochastic gradient algorithm and approximate dynamic programming ideas to improve the initial booking limits. Talluri and Van Ryzin have worked on developing discrete choice models for RM since 2000. Their aim is to capture the “buy-up” and “buy-down” behavior of the customers (Talluri and Van Ryzin to appear).
3 The Statistical Modeling Approach to RM
Motivated by the successful application of orthogonal array experimental designs and multivariate adaptive regression splines in stochastic dynamic programming (Chen et al. 1999), Chen et al. (2003) proposed an MDP-based OA-MARS approach to RM. In this approach, the RM problem is solved in two parts, off-line and on-line. The off-line part, or statistical modeling module, derives the RM accept/reject policy, while the on-line part, or booking module, simulates the actual decisions. Their model assumes:
1. The booking process starts ninety days before the day of departure.
2. Flight capacities and the flight schedule are known.
3. There is no overbooking or cancellation.
3.1 Statistical Modeling Module
The steps involved in this module are:
1. The reading dates are chosen, and remaining seat capacities are initially set equal to the flight capacities for the flight legs.
2. An OA experimental design is constructed to provide discretized coverage of the remaining seat capacity state space. The state space ranges from zero to the plane capacities of the flights in the network.
3. For each of the discretization points, the (DET) model and the (STOCH) model are solved. The (DET) model is proved to provide an upper bound on the MDP value function, denoted by $F_t^U(x)$ (Talluri and Van Ryzin 1998), and the (STOCH) model is proved to provide a lower bound, denoted by $F_t^L(x)$ (Günther 1998). This loop is repeated at all reading dates.
4. For each reading date, a MARS approximation is fit separately to estimate the (DET) and (STOCH) revenues over the entire state space. Thus, a total of $2T$ different MARS approximations are generated.
Figure 3 illustrates the procedure followed in the statistical modeling module. The resulting statistical models $\hat{F}_t^L$ and $\hat{F}_t^U$ are then made available to the on-line booking module.
Figure 3 about here.
3.2 Booking Module
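A fair market value for a booking request of group size $g$, for demand class $f$ at time $\tau$, is estimated using
\[
\begin{aligned}
\text{Pessimistic} &= \hat{F}_{\tau}^{L}(x) - \hat{F}_{\tau}^{U}(x') && (7) \\
\text{Optimistic} &= \hat{F}_{\tau}^{U}(x) - \hat{F}_{\tau}^{L}(x') && (8) \\
\text{Fair Market Value} &= \frac{\text{Pessimistic} + \text{Optimistic}}{2} && (9)
\end{aligned}
\]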
As shown in Figure 4, the RM policy is defined as, “accept the booking request only if the requested fare is greater than the fair market value.”
Figure 4 about here.
In the next section, we identify appropriate ranges for remaining seat capacities to reduce the modeling domain and enable more accurate MARS approximations. The statistical modeling and booking modules described in Sections 3.1 and 3.2 are modified as follows. The off-line phase consists of a revised statistical modeling module that conducts a preprocessing simulation to identify realistic ranges of the remaining seat capacity state variables and then builds statistical models of the (DET) and (STOCH) revenue functions to estimate bounds on the value function of the MDP. The on-line phase uses the statistical models from the off-line phase in the RM policy to make the booking decisions, as in the original statistical modeling approach.
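The booking rule built from the pessimistic and optimistic estimates can be sketched as below. The two bound functions stand in for the fitted lower-bound and upper-bound statistical models; their closed forms, the scalar single-leg state, and the function names are hypothetical choices made only so the example runs.

```python
from typing import Callable

Bound = Callable[[int], float]


def fair_market_value(lower: Bound, upper: Bound, x: int, x_after: int) -> float:
    """Average of the pessimistic and optimistic value differences."""
    pessimistic = lower(x) - upper(x_after)
    optimistic = upper(x) - lower(x_after)
    return 0.5 * (pessimistic + optimistic)


def accept(requested_fare: float, lower: Bound, upper: Bound, x: int, g: int) -> bool:
    """Accept only if seats remain and the fare exceeds the fair market value."""
    x_after = x - g
    if x_after < 0:
        return False
    return requested_fare > fair_market_value(lower, upper, x, x_after)


# Hypothetical stand-ins for the fitted lower and upper bound models at one reading date.
def lower_bound(x: int) -> float:
    return 100.0 * min(x, 40)


def upper_bound(x: int) -> float:
    return 120.0 * min(x, 50)


fmv = fair_market_value(lower_bound, upper_bound, x=10, x_after=8)
```

Averaging the two differences is symmetric in the bounds, so the estimate equals half the sum of the lower-bound and upper-bound differences between x and x'.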
4 Revised Statistical Modeling Module
In the revised version of the statistical modeling module, realistic ranges of the remaining capacity state variable are generated, instead of the same ranges (from zero to capacity) throughout the booking period. Intuitively, these ranges should be close to the capacity at the beginning of the booking period and move closer to zero towards departure.
4.1 Generation of Realistic State Space
In the statistical modeling module of Section 3.1, the state space remains the same for all the reading dates. Hence, the design points are spread out over a wider region than required. We know that, in practice, one is unlikely to find an empty flight on the day of departure or a full flight 90 days prior to the day of departure. In order to be more realistic, we estimate the realistic ranges for each reading date. These are called trust regions.
Demand scenarios are generated based on real data. Remaining flight capacity is initialized to the actual flight capacity. The (DET) model, as described in Section 2.1.4, is employed at the reading dates to generate bid prices. The RM policy, which states, “accept booking request only if the fare is greater than the bid price,” is used to make decisions on accepting/rejecting the request. Upon accepting the request, remaining seat capacity is updated to remaining capacity minus the booking request’s group size $g$. At each reading date, an optimization model is solved to obtain updated bid prices, and the process repeats until the flight departs. Demand scenarios are simulated many times, and at the end of each reading date, remaining seat capacities are recorded. Figure 5 shows the generation of the trust regions. Remaining seat capacities obtained at each reading date over the entire simulation are used to determine the maximum and minimum capacities at those reading dates.
Figure 5 about here.
To estimate the number of simulation runs needed to obtain good realistic ranges, a simulation with an initial sample size of $s = 30$ was run. The resulting data were used to estimate the standard deviation of remaining capacity, $\sigma$. The desired sample size was then estimated using a confidence interval approach,
\[
s = \left\lceil \left( \frac{2 z_{\alpha/2}\, \sigma}{E} \right)^{2} \right\rceil, \tag{10}
\]
where E is 5% of the expected value of the sample plus or minus the confidence coefficient times the standard error. A total of 85 simulation runs were conducted.
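As a worked illustration of Equation (10), the sketch below computes the required number of runs from a pilot estimate of σ. The confidence level (α = 0.05), the pilot values, and the function name are assumptions; they are not the values behind the 85 runs reported above.

```python
import math
from statistics import NormalDist


def required_runs(sigma: float, interval_width: float, alpha: float = 0.05) -> int:
    """s = ceil((2 * z_{alpha/2} * sigma / E)^2), with E passed as interval_width."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal critical value
    return math.ceil((2.0 * z * sigma / interval_width) ** 2)


# Hypothetical pilot estimate: sigma of 10 seats and a target E of 5 seats.
runs = required_runs(sigma=10.0, interval_width=5.0)
```

Because the run count scales with the square of σ/E, halving E quadruples the number of simulation runs.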
4.2 Approximation of the Value Functions
The remaining seat capacity state spaces are set according to the empirically derived realistic ranges. Similar to step 2 of the statistical modeling module in Section 3.1, an OA experimental design is employed to identify discretization points in the realistic state spaces for each reading date. Otherwise, steps 3 and 4 in Section 3.1 remain essentially the same (see Figure 3). As in Sections 2.1.4 and 2.1.5, the (DET) model is used to provide an upper bound on the MDP value function, and the (STOCH) model is used to provide a lower bound. Solving the deterministic model is a straightforward LP, but there are different approaches for solving the stochastic model. The next section describes the approach used in this paper.
4.3 Solving the Stochastic Network Optimization Model
In the (STOCH) model, consider the term $E[\min(d_{ft}, u_{ft})]$ of the objective function. Olinick and Rosenberger (2003) showed that this function is concave. Expanding it using a Taylor series about a constant $u_0 \ge 0$, we have
\[
E[\min(d_{ft}, u_{ft})] = E[\min(d_{ft}, u_0)] + (u_{ft} - u_0)\, \nabla E[\min(d_{ft}, u_0)] + o(\|u_{ft} - u_0\|^2). \tag{11}
\]
From the definition of expected value we know that, for any discrete random variable $L$, $E[L] = \sum_{l} l\, p(l)$. Let $b(d_{ft}) = \min(d_{ft}, u_0)$. For simplicity, let $d_{ft} = d$ for the purposes of the derivation. Hence,
\[
E[\min(d, u_0)] = \sum_{d=0}^{u_0} d\, p(d) + \sum_{d=u_0+1}^{\infty} u_0\, p(d), \tag{12}
\]
where $p(d)$ is the probability of demand $d$. Demand is assumed to follow a compound Poisson process with arrival rate $\lambda$. Let $H$ be the cumulative distribution function for the Poisson distribution and $h$ be the probability mass function for the Poisson distribution. Hence,
\[
\begin{aligned}
E[\min(d, u_0)] &= \sum_{d=0}^{u_0} d\, e^{-\lambda} \lambda^{d}/d! + u_0 \sum_{d=u_0+1}^{\infty} e^{-\lambda} \lambda^{d}/d! && (13) \\
&= \lambda \sum_{d=1}^{u_0} e^{-\lambda} \lambda^{d-1}/(d-1)! + u_0 \sum_{d=u_0+1}^{\infty} e^{-\lambda} \lambda^{d}/d! && (14) \\
&= \lambda \sum_{d=0}^{u_0-1} e^{-\lambda} \lambda^{d}/d! + u_0 \left[ 1 - \sum_{d=0}^{u_0} e^{-\lambda} \lambda^{d}/d! \right] && (15) \\
&= \lambda H(u_0 - 1) + u_0 [1 - H(u_0)]. && (16)
\end{aligned}
\]
Using finite differences, $\nabla E[\min(d, u_0)]$ is estimated by
\[
\begin{aligned}
\nabla E[\min(d, u_0)] &= [(u_0 + 1) - (u_0 + 1) H(u_0 + 1) + \lambda H(u_0)] - [u_0 - u_0 H(u_0) + \lambda H(u_0 - 1)] && (17) \\
&= \lambda h(u_0) - u_0 h(u_0 + 1) - H(u_0 + 1) + 1. && (18)
\end{aligned}
\]
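The closed form (16) and the finite-difference expression (18) can be checked numerically. The sketch below is only a verification aid under the plain Poisson assumption used in the formulas; the function names and the test values of λ and u0 are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """h(k): Poisson probability mass function (zero for k < 0)."""
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_cdf(k: int, lam: float) -> float:
    """H(k): Poisson cumulative distribution function (zero for k < 0)."""
    return sum(poisson_pmf(j, lam) for j in range(k + 1))


def expected_min_closed(u0: int, lam: float) -> float:
    """Equation (16): lam * H(u0 - 1) + u0 * [1 - H(u0)]."""
    return lam * poisson_cdf(u0 - 1, lam) + u0 * (1.0 - poisson_cdf(u0, lam))


def expected_min_direct(u0: int, lam: float, terms: int = 400) -> float:
    """Equation (12) summed term by term, truncated after `terms` values of d."""
    return sum(min(d, u0) * poisson_pmf(d, lam) for d in range(terms))


def gradient(u0: int, lam: float) -> float:
    """Equation (18): lam*h(u0) - u0*h(u0 + 1) - H(u0 + 1) + 1."""
    return (lam * poisson_pmf(u0, lam) - u0 * poisson_pmf(u0 + 1, lam)
            - poisson_cdf(u0 + 1, lam) + 1.0)
```

Since λh(u0) = (u0 + 1)h(u0 + 1) for the Poisson distribution, expression (18) also equals 1 − H(u0), the probability that demand exceeds u0, which gives a quick sanity check on the gradient.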
Define $w_{ft}$ as the decision vector for the expected number of passengers for demand class $f$ at reading date $t$, such that $w_{ft} \le E[\min(d_{ft}, u_{ft})]$.
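Substituting all the above into the original (STOCH) model, the final (STOCH) model obtained is as below.
\[
\begin{aligned}
\max \quad & \sum_{\tau=1}^{t} \sum_{f=1}^{m} r_f\, w_{f\tau} && (19) \\
\text{s.t.} \quad & A \left( \sum_{\tau=1}^{t} u_{\tau} \right) \le x_t && (20)
\end{aligned}
\]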
wf τ − uf τ [λh(u0 ) − u0 h(u0 + 1) − H(u0 + 1) + 1] ≤ λH(u0 − 1)− u0 [λh(u0 ) − u0 h(u0 + 1) − H(u0 + 1) + H(u0 )] ∀f = 1, 2, ...m, ∀τ = 0, 1, ...t, ∀u0 ∈