Dynamic Assortment Optimization with a Multinomial Logit Choice Model and Capacity Constraint
Paat Rusmevichientong∗ (Cornell University), Zuo-Jun Max Shen† (UC Berkeley), David B. Shmoys‡ (Cornell University)
April 7, 2008
Abstract
The paper considers a stylized model of a dynamic assortment optimization problem, where given a limited capacity constraint, we must decide the assortment of products to offer to customers to maximize the profit. Our model is motivated by the problem faced by retailers of stocking products on a shelf with limited capacities and by the problem of placing a limited number of ads on a web page. We assume that each customer chooses to purchase the product (or to click on the ad) that maximizes her utility. We use the multinomial logit choice model to represent demand. However, we do not know the demand for each product. We can learn the demand distribution by offering different product assortments, observing resulting selections, and inferring the demand distribution from past selections and assortment decisions. We present an adaptive policy for joint parameter estimation and assortment optimization. To evaluate our proposed policy, we define a benchmark profit as the maximum expected profit that we can earn if we know the underlying demand distribution in advance. We show that the running average expected profit generated by our policy converges to the benchmark profit and establish its convergence rate. Numerical experiments based on sales data from an online retailer indicate that our policy performs well, generating over 90% of the optimal profit after less than two days of sales.
∗ School of Operations Research and Information Engineering, Cornell University, Ithaca, NY 14853, USA. E-mail: [email protected]
† Department of Industrial Engineering and Operations Research, University of California–Berkeley, 4129 Etcheverry Hall, Berkeley, CA 94720, USA. E-mail: [email protected]
‡ School of Operations Research and Information Engineering and Department of Computer Science, Cornell University, Ithaca, NY 14853, USA. E-mail: [email protected]

1. Motivation and Problem Formulation
Companies have realized the importance of offering products that are tailored to the demand of customers in each region. For instance, Wal-mart stocks specific lines of clothes targeted exclusively to certain groups of customers (Zimmerman (2006)). Car manufacturers are well-known for
understanding the importance of learning customer preferences and offering the right mix of products at the right locations (O’Connell (2005)). Given limited resources such as shelf capacity and marketing budget, a retailer must determine which assortment of products to offer to customers to maximize profits. In some cases, the retailer does not know the demand a priori, and in fact, the probability that a customer will purchase a product will depend on the assortment offered. The retailer can learn the demand by offering different assortments, observing purchases, and estimating the underlying demand from past sales and assortment decisions.
In addition to applications in retail and consumer product industries, the problem of demand learning and finding a profitable assortment also has applications in online advertising. Figure 1 shows the front page of the NY Times online (as of March 4, 2008). Highlighted items in the figure correspond to advertisements, which appear in the top and the right-hand column of the web page. In this setting, the capacity constraint represents the limited number of locations on the web page where the ads can appear. The demand for each product corresponds to the number of customers who click on the ad. Since many ads are similar, the probability that a customer will click on any one ad will likely depend on the assortment of ads shown to the customers. In general, we often do not know these probabilities in advance and must estimate their values from observed click-through data. Given limited online real estate and unknown demand for each ad, we must decide on the assortment of ads that will generate the most profit, adjusting our assortment decisions and refining our demand estimates over time as new data arrive.
Motivated by the problems of product stocking and ad placement mentioned above, we formulate a stylized dynamic assortment optimization model that captures some of the issues commonly present in these problems, including the capacity constraint, the uncertainty in the demand distribution, and the dependence of the purchase or selection probability on the assortment decision. Our formulation assumes that we have N products indexed by 1, 2, . . . , N. For each i = 1, . . . , N, let $w_i > 0$ denote the marginal profit of product i, where we assume that through appropriate scaling $\max_{\ell = 1, \ldots, N} w_\ell \geq 1$. Due to a capacity constraint, we can offer at most C products to the customers. The goal is to determine the subset of at most C products that will yield the maximum profit. To account for dependence of the selection probability on the assortment decision, we will assume a multinomial logit (MNL) choice model, which is one of the most popular and most commonly used choice models in economics, marketing, and operations management. The MNL model was first introduced by McFadden (1974) and has been applied in a very large number of applications (see Ben-Akiva and Lerman (1985), Anderson et al. (1992), Mahajan and van Ryzin (1998), and the references therein). Under the MNL choice model, each customer chooses the product that
Figure 1: Sample advertisements (highlighted in circles) shown in the front page of the NY Times. The ads can appear only in certain locations on the web page. Given a limited online real estate, we must decide on the most profitable set of ads to display to customers.
maximizes her utility, where for each i = 0, 1, . . . , N, the utility $U_i$ associated with product i (with 0 corresponding to the option of no purchase) is given by
$$U_i = \mu_i + \zeta_i,$$
where $\mu_i \in \mathbb{R}$ denotes the mean utility that the customer assigns to product i and $\zeta_0, \ldots, \zeta_N$ are independent and identically distributed random variables having a Gumbel distribution with location parameter 0 and scale parameter 1; that is, $P\{\zeta_i \leq x\} = e^{-e^{-x}}$ for each $x \in \mathbb{R}$. To simplify our exposition and without loss of generality, we will set $\mu_0 = 0$. Under the MNL choice model, it is a well-known result (see, for example, McFadden (1974)) that, for each assortment $S \subseteq \{1, \ldots, N\}$, the probability $\theta_j(S)$ that the customer chooses product j is given by
$$\theta_j(S) = \begin{cases} e^{\mu_j} \big/ \left(1 + \sum_{k \in S} e^{\mu_k}\right), & \text{if } j \in S, \\ 1 \big/ \left(1 + \sum_{k \in S} e^{\mu_k}\right), & \text{if } j = 0, \\ 0, & \text{otherwise.} \end{cases}$$
To facilitate our exposition, we define $v_i = e^{\mu_i}$ for all i. Then, the expected profit f(S) associated with the assortment S is given by
$$f(S) = \sum_{j \in S} w_j \theta_j(S) = \frac{\sum_{j \in S} w_j v_j}{1 + \sum_{j \in S} v_j}.$$
The assortment optimization problem, which we refer to as the Capacitated Multinomial Logit Assortment Problem (CAPACITATED MNL), is defined as follows:
$$Z^* = \max_{S \subseteq \{1, \ldots, N\} \,:\, |S| \leq C} f(S), \tag{1}$$
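The choice probabilities and the expected profit defined above can be checked numerically. The following is a minimal Python sketch, not part of the paper: the function names `choice_probabilities` and `expected_profit` and the numerical values are hypothetical and chosen only for illustration.

```python
import math

def choice_probabilities(mu, S):
    """MNL probabilities theta_j(S) for each j in S, plus the no-purchase option 0.

    mu maps a product index to its mean utility; mu_0 = 0 is implicit.
    """
    v = {j: math.exp(mu[j]) for j in S}     # v_j = exp(mu_j)
    denom = 1.0 + sum(v.values())           # the leading 1 is exp(mu_0) = 1
    theta = {j: v[j] / denom for j in S}
    theta[0] = 1.0 / denom                  # probability of no purchase
    return theta

def expected_profit(w, mu, S):
    """f(S) = sum over j in S of w_j * theta_j(S)."""
    theta = choice_probabilities(mu, S)
    return sum(w[j] * theta[j] for j in S)

# Hypothetical data, for illustration only.
mu = {1: math.log(0.5), 2: 0.0, 3: math.log(2.0)}   # so v = (0.5, 1.0, 2.0)
w = {1: 3.0, 2: 2.0, 3: 1.0}
S = {1, 3}
theta = choice_probabilities(mu, S)
print(theta, expected_profit(w, mu, S))
```

With these made-up values, the probabilities over S and the no-purchase option sum to one, and the expected profit equals (3 · 0.5 + 1 · 2) / (1 + 0.5 + 2) = 1, in agreement with the closed form for f(S).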
and we denote the optimal solution by $S_C^*$, that is, $Z^* = f(S_C^*)$. We are interested in the setting where the mean utilities $\mu_i$ are unknown, and we have to infer these values by offering different assortments and estimating the parameters based on the resulting selections. In this case, we aim to develop an adaptive policy that generates a sequence of assortments $S_1, S_2, \ldots$ such that $|S_t| \leq C$ for each t, and $S_t$ depends only on the customer selections and assortment decisions in the prior t − 1 periods. As a performance measure, we will consider the running average expected profit $\frac{1}{T} \sum_{t=1}^{T} E[f(S_t)]$, and we want this average to converge to $Z^*$ as the number of time periods T increases to infinity.
1.1 Contributions and Organization of the Paper
Our approach to the dynamic assortment optimization problem consists of two parts. In the first part, which we refer to as Static Optimization, we assume that the mean utilities $\mu_1, \ldots, \mu_N$ are known in advance, and focus on finding a solution to the CAPACITATED MNL problem given in Equation (1). We present a polynomial-time algorithm for finding the optimal assortment and establish structural properties of the solution. In the second part, which we refer to as Dynamic Optimization, we leverage the insight gained from the static optimization problem and adapt the algorithm to the setting where the mean utilities are unknown and must be estimated from historical data. We now describe the contributions and key insights of the paper.
Capacitated vs. Unconstrained Static Optimization: When there is no capacity constraint (that is, C = N), Gallego et al. (2004) and Liu and Van Ryzin (2004) showed that the optimal solution to the static optimization problem in Equation (1) can be found among the following N assortments:
$$\Lambda_1 = \{\lambda_1\}, \quad \Lambda_2 = \{\lambda_1, \lambda_2\}, \quad \Lambda_3 = \{\lambda_1, \lambda_2, \lambda_3\}, \quad \ldots, \quad \Lambda_N = \{\lambda_1, \ldots, \lambda_N\},$$
where the permutation $\lambda = (\lambda_1, \ldots, \lambda_N)$ represents the ordering of the products based on their marginal profits, that is, $w_{\lambda_1} \geq w_{\lambda_2} \geq \cdots \geq w_{\lambda_N}$. Further, they showed that the optimal assortment can be found by performing a sequential search, starting with $\Lambda_1$, and sequentially adding one product at a time (in descending order of marginal profit) until the first time that the profit decreases. Therefore, the optimal assortment is the first $\Lambda_i$ such that $f(\Lambda_i) > f(\Lambda_{i+1})$. This is a powerful and compelling result because it enables us to restrict our search to at most N subsets instead of $2^N$. The solution to the unconstrained problem suggests that, when we have a capacity constraint C, the optimal solution might involve the top C products with the highest marginal profits. As shown in Example 1 in Section 2.1, however, this intuition is false. In fact, the optimal assortment can actually contain the product with the lowest marginal profit. Moreover, one might hypothesize that the optimal solution to the problem with a smaller capacity constraint is a subset of the optimal solution for the problem with a larger capacity constraint, that is, $S_{C_1}^* \subseteq S_{C_2}^*$ whenever $C_1 \leq C_2$. As shown in the example, this conjecture turns out to be false as well. The example demonstrates that the constrained optimization problem is significantly more difficult, and thus, we must be careful in applying our intuition from the unconstrained problem. In Section 6.2, we compare the expected profit of the optimal assortment and the assortment generated under the greedy heuristic of Gallego et al. (2004) and Liu and Van Ryzin (2004), using actual sales data from an online retailer. We show that when the capacity constraint is 10, the optimal expected profit is 14% higher than the profit of the assortment generated by the greedy heuristic.
A Polynomial-time Algorithm for Static Optimization: In Section 2.2, we describe a polynomial-time algorithm for solving the CAPACITATED MNL problem that has a running time of $O(N^2)$, which we refer to as the StaticMNL algorithm. The algorithm generates a sequence $\langle A^0, A^1, \ldots, A^K \rangle$ of assortments that is guaranteed to contain an optimal solution, where $K \leq \binom{N+1}{2} - 1 = O(N^2)$. We emphasize that the ordering among the sets $A^\ell$ is not based on the ordering of the marginal profits. The algorithm is based on an ingenious solution to a more general problem of optimizing a rational objective function by Megiddo (1979). Given the widespread use of the MNL choice model, we believe it is important to bring the community’s attention to this method, and we have provided a detailed description of the algorithm as it applies to our setting.
New Structural Results for the Optimal Solution: Although we cannot sort the products in descending order of marginal profits as in the unconstrained optimization problem, the optimal solution to the CAPACITATED MNL problem still exhibits many structural properties that are similar to its unconstrained counterpart. By exploiting the specific structure of our multinomial logit choice model, we derive a new upper bound associated with the sequence $\langle A^0, A^1, \ldots, A^K \rangle$.
We show that the number of distinct subsets among $A^0, A^1, \ldots, A^K$ is at most C(N + C − 1) (Proposition 3). Thus, if the capacity C is fixed, this result implies that finding the optimal assortment requires searching through O(N) subsets, roughly the same number as in the unconstrained problem. Furthermore, we show in Proposition 4 that the expected profit associated with the sequence $\langle A^0, A^1, \ldots, A^K \rangle$ is unimodal in the sense that there exists an index $m^*$ such that
$$f(A^1) \leq f(A^2) \leq \cdots \leq f(A^{m^*}) \quad \text{and} \quad f(A^{m^*}) \geq f(A^{m^*+1}) \geq \cdots \geq f(A^K).$$
This result enables us to identify the optimal assortment quickly. To our knowledge, these represent the first structural results associated with the CAPACITATED MNL problem.
Parameter Estimation and Error Bounds for Maximum Likelihood Estimates: In Section 3, as the first step toward solving the dynamic optimization problem, we introduce a parameter estimation technique based on maximum likelihood estimation (MLE). Traditional applications of MLE to the MNL choice model have assumed that we can offer all N products to each customer at once. The utilities of all products are then estimated from the resulting sales data. Due to the capacity constraint in our setting, however, we can offer at most C products. It turns out that, by offering overlapping assortments, we can still estimate the necessary parameters for computing the optimal assortment. In addition, we develop new error bounds that express the quality of our parameter estimates (based on MLE) as a function of the number of customers. In Theorems 5 and 6, we show that if we offer our assortments to T independent customers, then for any $\epsilon > 0$, the probability that our estimates of the utilities differ from their true values by more than $\epsilon$ is $O(e^{-\epsilon^4 T})$. To the best of our knowledge, this is the first such error bound for this problem.
The error bound from Theorem 6, combined with the key algorithmic insight from the static optimization problem (Proposition 2), enables us to establish the following key result. Suppose we estimate the mean utilities based on observations from T i.i.d. customers, and we use these estimated utilities as inputs for the StaticMNL algorithm. In Corollary 7, we show that the output of the StaticMNL algorithm will be exactly the same as if we had used the true mean utilities as inputs, with a very high probability of at least $1 - O(e^{-T})$.
An Adaptive Algorithm for Dynamic Optimization: In Section 4, we combine the results from Sections 2 and 3 to develop an adaptive algorithm for joint parameter estimation and optimization. Our proposed algorithm extends the algorithm for the static optimization problem to handle the setting when the mean utilities µi ’s are unknown and must be estimated from historical sales and assortment decisions. The proposed policy generates a sequence of assortments S1 , S2 , . . . such that
the T-period average expected profit $\frac{1}{T} \sum_{t=1}^{T} E[f(S_t)]$ converges to $Z^*$ at the rate of $O\left((\log T)^2 / T\right)$ (see Theorem 9 and Corollary 10 for more details).
Numerical Experiments: We test our proposed policy using actual sales data from an online retailer. The results indicate that our policy performs well, yielding over 90% of the optimal expected profit after about two days of sales.
1.2 Literature Review
The literature on assortment optimization can be divided into two areas – static and dynamic optimization – and we believe that our research contributes to both of these areas. Kok et al. (2006) provide an excellent review of the current state-of-the-art in the assortment optimization literature. The problem of static assortment planning has been studied extensively in the literature. Most models assume that the total display space is limited and the demand of a particular product depends on the allocated shelf space. The focus here is to develop optimization algorithms for incremental changes to the assortment by adding or dropping products, or by adjusting the space allocated to different products (see, for example, Corstjens and Doyle (1981, 1983), Bultez and Naert (1988), Bultez et al. (1989), and Dreze et al. (1994)). Some recent assortment models are built within the multinomial logit framework. Chong et al. (2001) consider brand-level assortment decisions using a nested MNL model including measurements associated with each brand in a product category. They use a local improvement heuristic based on pairwise interchanges of product variants to modify the assortment and achieve a substantial increase in profit. Cachon et al. (2005) and Gaur and Honhon (2006) explicitly account for customer search in the assortment planning process and show that ignoring customer search in demand estimation can lead to sub-optimal assortment decisions and lower expected profits. Smith (2006) generalizes previous research by allowing general heterogeneous customer preferences and by incorporating customer’s choice of retailer within the multinomial logit framework. He applies his model to a commercially obtained data on customer preferences for DVD players. One of the major findings from this paper is that including heterogeneity in customer preferences plays an important role in choosing the optimal assortment. 
Several papers with operations management focus also consider inventory decisions in product assortment problems. These papers typically assume a certain product substitution structure when stockouts occur (see Ryzin and Mahajan (1999), Smith and Agrawal (2000), Mahajan and Ryzin (2001), Kok and Fisher (2004), Cachon et al. (2005), and Honhon et al. (2006)). Most of these
papers assume that customers are homogenous with the same expected attractiveness ranking for products across all customers. Another related stream of research deals with product line design issues. A product may have a set of attributes where each attribute can have different levels, and a product line can have a set of different attribute level configurations. Thus, a product line can satisfy more customers than a single product would. Green and Krieger (1985) were among the first to model the product line design problem. Followup work includes McBride and Zufryden (1988) (using an integer programming method), Kohli and Sukumar (1990) (dynamic programming), Dobson and Kalish (1993) and Nair et al. (1995) (heuristic for an extended Green and Krieger model), and Alexouda and Paparizos (2001) and Fligler et al. (2006) (genetic algorithms). Chen and Hausman (2001) use choice-based conjoint analysis to model customer preferences. They show that the conjoint approach has some special mathematical properties that lead to an efficient optimal algorithm for the product line selection problem. Yano and Dobson (1998) provide a comprehensive review of the literature in this area. None of the above papers consider demand learning, thus their assortment problems are static, not dynamic. All of those papers begin with the presumption that we know the parameters of the underlying customer choice model. To the best of our knowledge, the only paper on product assortment optimization with demand learning is the work by Caro and Gallien (2006). Caro and Gallien (2006) apply a finite-horizon multi-armed bandit model with Bayesian learning to a dynamic assortment problem for seasonal goods. Several plays per stage are allowed in their model to accommodate the situation where a company can make product design and assortment decisions during the testing cycle when actual demand information becomes available. 
They derive a closed-form dynamic index policy that captures the key exploration versus exploitation tradeoffs. Connections to Multiarmed Bandit Problems: The work by Caro and Gallien (2006) is part of the literature on multi-armed bandit problems. In a multi-armed bandit problem, we wish to choose an arm (or a decision) with the highest expected reward, but we do not know the reward distribution associated with each arm a priori. In their seminal paper, Lai and Robbins (1985) propose an adaptive index-based policy whose running average expected reward converges to the maximum reward. Subsequent work has refined and extended the original analysis. Examples include the work by Lai (1987) (considers a Bayesian setting with prior distribution on the reward of each arm); Agrawal (1995) (considers an index formula that depends only on the running average reward of each arm); Auer et al. (1995) (develop a probabilistic decision rule for the bandit problem in adversarial settings); Auer et al. (2002) (provide a finite-time analysis of the multi-armed bandit problem).
We can formulate our problem as an instance of the multi-armed bandit problem by viewing each assortment (corresponding to the set of products to be offered at each store) as an arm whose expected profit is unknown. We can then apply multi-armed bandit algorithms such as those of Lai and Robbins (1985) and Auer et al. (2002) to generate a sequence of assortments over time whose average expected profit converges to the optimal profit. Unfortunately, in this formulation, the number of possible decisions is $O(N^C)$, which grows exponentially with the capacity constraint. All known generic multi-armed bandit algorithms have running times and convergence rates that scale linearly with the number of decisions, making them impractical when the capacity constraint C is large.
An alternative formulation of our problem is to view each product as an arm in a bandit problem. In each period, we can play up to C arms, corresponding to offering an assortment of size C or less to the customers. Variations of the multi-armed bandit problem where we can play multiple arms in each period have been extensively studied by Anantharam et al. (1987a) and Anantharam et al. (1987b). However, all of the literature in this area assumes that the reward of playing multiple arms is the sum of the rewards of the individual arms, which is not applicable to our setting because of the structure of the multinomial logit choice model. Moreover, it is not clear whether the policies proposed in the literature will be effective here. We note that there has been a resurgence of interest in multi-armed bandit problems with a very large number of arms. Examples include Berry et al. (1997), who consider a Bayesian setting with infinitely many Bernoulli arms, and Kleinberg (2004, 2005), who considers applications to network routing and single-product pricing. However, most of these works are very problem-specific, and they are not applicable to our assortment optimization problem.
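To get a feel for how quickly the number of arms grows when each assortment is treated as a separate arm, the short Python sketch below counts the nonempty assortments of size at most C. The function name `number_of_assortments` and the choice N = 100 are hypothetical and serve only as an illustration.

```python
from math import comb

def number_of_assortments(N, C):
    """Number of nonempty assortments containing at most C of the N products."""
    return sum(comb(N, k) for k in range(1, C + 1))

# Hypothetical catalogue of N = 100 products and a few capacities.
for C in (1, 2, 5, 10):
    print(C, number_of_assortments(100, C))
```

Even for a modest catalogue, the count grows by orders of magnitude as C increases, which is why treating every assortment as its own arm quickly becomes impractical.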
Connection to Economics Literature: Motivated by applications in operations management and online advertising, we focus on the assortment optimization problem under a multinomial logit choice model. Our formulation can be viewed as a special case of the general problem of solving and estimating the dynamic stochastic discrete choice model in economics. See, for example, Eckstein and Wolpin (1989), Hotz and Miller (1993), Keane and Wolpin (1994), Paap and Franses (2000) and the references therein. Much of the work in this area focuses on finding efficient estimation procedures and approximation techniques for solving large-scale dynamic programming problems. To our knowledge, our paper is the first application to the assortment optimization with a capacity constraint.
1.3 Additional Applications and Discussion of the Problem Formulation
In Section 1, we describe potential applications of our model to the online ads placement problem and to the assortment problem in traditional “brick-and-mortar” retail stores. We now highlight additional potential applications of our model and discuss the strengths and weaknesses of the formulation. Although e-commerce companies such as Amazon.com can offer customers a very large assortment of products, we believe our model can still be applied in the recommendation of products for a generic customer whose purchase history is unknown. Examples of such a customer include a new customer to the web site or a customer who does not have an identifiable “cookie” associated with his or her web browser. Figure 2 shows an example of book recommendations by Amazon.com, which in this case, is limited to about 15 books per page. Given the limited space to display products on the web page, the company might wish to display the assortment of products that maximizes the total profit. This problem also fits within our formulation.
Figure 2: An example of book recommendations shown on Amazon.com. Since the web page can display only a limited number of products, we want to display the assortment of products that will give the highest profit.
We now discuss the limitations of our formulation. Although the MNL choice model has been used successfully in many applications, it exhibits the Independence of Irrelevant Alternatives (IIA) property, under which the ratio of the selection probabilities between two alternatives is independent of the assortment containing them. In certain applications, this property is not consistent with the actual customer choice behavior. For more discussion of this topic and extensions of the MNL model that alleviate this problem, the reader is referred to Ben-Akiva and Lerman (1985) and Kok et al. (2006). Our formulation assumes that each customer independently subscribes to the same choice model. This assumption is appropriate when we have a single market and the customers within the market are relatively homogeneous. Extensions of our model to the multi-store setting are beyond the scope of this paper. We also assume that the cost of changing an assortment from one customer to the next is negligible. We believe that this assumption is reasonable in the online setting where the cost of changing the ads or product recommendations on the web page is minimal. In settings where there are significant costs associated with switching product assortments, however, our model might not be appropriate. In addition, we implicitly assume that we have enough supply of each product to ignore all inventory considerations.
2. Static Optimization: Algorithm and Structure of the Optimal Solution
This section focuses on developing a solution to the CAPACITATED MNL problem given in Equation (1), under the assumption that the mean utilities µ1 , . . . , µN are known in advance. In Section 2.1, we present two examples which show that, when we have a capacity constraint, the solution to the assortment optimization problem does not necessarily follow the intuition suggested by its unconstrained counterpart. Then, in Section 2.2, we present a polynomial-time algorithm for this problem, which is based on the work by Megiddo (1979). By exploiting the special structure of the MNL choice model, we derive new structural results associated with the optimal assortments, which are presented in Sections 2.3 and 2.4. We will rely on these structural insights extensively when we develop an adaptive policy for the dynamic optimization problem in Section 4.
2.1 Contrast to the Unconstrained Optimization Problem
As mentioned in Section 1.1, Gallego et al. (2004) and Liu and Van Ryzin (2004) showed that the unconstrained optimization problem (when C = N) can be solved by simply sorting the products in descending order of the marginal profits. As shown in the following example, this result is not true when we have a capacity constraint.
Example 1. Consider four products with
$$w_1 = 9.5, \quad w_2 = 9.0, \quad w_3 = 7.0, \quad w_4 = 4.5 \qquad \text{and} \qquad v_1 = 0.2, \quad v_2 = 0.6, \quad v_3 = 0.3, \quad v_4 = 5.2.$$
The expected profit for each assortment is shown in Table 1. In this case, the optimal assortment for each value of C is given by
$$S_1^* = \{4\}, \quad S_2^* = \{2, 4\}, \quad S_3^* = \{1, 2, 3\}, \quad S_4^* = \{1, 2, 3, 4\}.$$
Note that product 4, which has the lowest marginal profit, is included in the optimal assortment when C = 1 and C = 2, but it is not included in $S_3^*$. Yet, it reappears in $S_4^*$ when there is no capacity constraint (C = 4). Moreover, note that although $S_1^* \subseteq S_2^*$, we observe that $S_2^* \not\subseteq S_3^*$.

S      f(S)      S        f(S)      S           f(S)      S              f(S)
{1}    1.583     {1, 2}   4.056     {1, 2, 3}   4.476*    {1, 2, 3, 4}   4.493*
{2}    3.375     {1, 3}   2.667     {1, 2, 4}   4.386
{3}    1.615     {1, 4}   3.953     {1, 3, 4}   4.090
{4}    3.774*    {2, 3}   3.947     {2, 3, 4}   4.352
                 {2, 4}   4.235*
                 {3, 4}   3.923

Table 1: Expected profits for different assortments in Example 1. The optimal assortment for each value of C is marked with an asterisk.
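The entries of Table 1 and the optimal assortments in Example 1 can be reproduced by exhaustive enumeration. The Python sketch below is only a verification aid under the data of Example 1; the helper names `f`, `best_assortment`, and `top_by_profit` are hypothetical, and the brute-force search is not the algorithm proposed in the paper.

```python
from itertools import combinations

# Data of Example 1.
w = {1: 9.5, 2: 9.0, 3: 7.0, 4: 4.5}
v = {1: 0.2, 2: 0.6, 3: 0.3, 4: 5.2}

def f(S):
    """Expected profit of assortment S under the MNL model."""
    return sum(w[j] * v[j] for j in S) / (1.0 + sum(v[j] for j in S))

def best_assortment(C):
    """Exhaustive search over all nonempty assortments of size at most C."""
    candidates = (frozenset(S) for k in range(1, C + 1) for S in combinations(w, k))
    return max(candidates, key=f)

def top_by_profit(C):
    """Naive heuristic: offer the C products with the highest marginal profits."""
    return frozenset(sorted(w, key=w.get, reverse=True)[:C])

optimal = {C: best_assortment(C) for C in range(1, 5)}
for C, S in optimal.items():
    naive = top_by_profit(C)
    print(C, sorted(S), round(f(S), 3), sorted(naive), round(f(naive), 3))
```

For C = 1 and C = 2 the naive heuristic is strictly worse than the optimal assortment, and the optimal assortments are not nested as C grows, which is the point the example makes.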
2.2 A Polynomial-time Algorithm
We now present a polynomial-time algorithm for solving the CAPACITATED MNL optimization problem based on an ingenious solution proposed by Megiddo (1979). The key idea underlying our algorithm is the observation that we can express the optimal profit $Z^*$ from Equation (1) as follows:
$$Z^* = \max \left\{ \lambda \in \mathbb{R} : \exists X \subseteq \{1, \ldots, N\},\ |X| \leq C, \text{ and } f(X) \geq \lambda \right\} = \max \left\{ \lambda \in \mathbb{R} : \exists X \subseteq \{1, \ldots, N\},\ |X| \leq C, \text{ and } \sum_{i \in X} v_i (w_i - \lambda) \geq \lambda \right\},$$
where the final equality follows from the definition of the profit function f(·). The inputs to the algorithm consist of the vector $v = (v_1, v_2, \ldots, v_N) \in \mathbb{R}_+^N$, where $v_i = e^{\mu_i}$ for i = 1, . . . , N, and the marginal profits $w = (w_1, \ldots, w_N) \in \mathbb{R}_+^N$, along with the capacity constraint C. Given these inputs, we then construct N + 1 linear functions $h_0, h_1, \ldots, h_N$, where for each $\lambda \in \mathbb{R}$,
$$h_0(\lambda) = 0 \quad \text{and} \quad h_i(\lambda) = v_i (w_i - \lambda), \quad \text{for } i = 1, \ldots, N.$$
Note that $Z^* = \max \left\{ \lambda \in \mathbb{R} : \exists X \subseteq \{1, \ldots, N\},\ |X| \leq C, \text{ and } \sum_{i \in X} h_i(\lambda) \geq \lambda \right\}$. As suggested by the expression for $Z^*$, the geometry of the collection of lines $h_0(\cdot), h_1(\cdot), \ldots, h_N(\cdot)$ will play a prominent role in our proposed algorithm.
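The characterization of $Z^*$ above can be explored numerically before introducing the line-sweep algorithm. For a fixed λ, the maximum of $\sum_{i \in X} h_i(\lambda)$ over sets with at most C elements is attained by the (at most C) largest positive values $h_i(\lambda)$, and this maximum minus λ is decreasing in λ. The Python sketch below uses that monotonicity to locate $Z^*$ by bisection. This is a simple alternative shown for illustration only, not the StaticMNL algorithm of the paper, and the function names are hypothetical; the data are those of Example 1.

```python
def best_value_at(lam, v, w, C):
    """Max over |X| <= C of sum_{i in X} h_i(lam), where h_i(lam) = v_i * (w_i - lam)."""
    h = sorted((vi * (wi - lam) for vi, wi in zip(v, w)), reverse=True)
    return sum(x for x in h[:C] if x > 0.0)   # top C lines that lie above h_0 = 0

def optimal_profit_by_bisection(v, w, C, tol=1e-10):
    # g(lam) = best_value_at(lam) - lam satisfies g(0) >= 0 and g(max w) < 0.
    lo, hi = 0.0, max(w)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if best_value_at(mid, v, w, C) >= mid:
            lo = mid          # some feasible X achieves f(X) >= mid
        else:
            hi = mid          # no feasible X achieves f(X) >= mid
    return lo

# Data of Example 1.
v = [0.2, 0.6, 0.3, 5.2]
w = [9.5, 9.0, 7.0, 4.5]
print(optimal_profit_by_bisection(v, w, C=2))
```

Bisection only approximates $Z^*$ to within the tolerance, whereas the approach described next works with the finitely many intersection points of the lines and therefore identifies the optimal assortment exactly.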
We wish to find the ordering of the products based on the values of $h_i(\lambda)$ as λ ranges over all real values. To do this, we first compute the intersection point between any two lines. For each $\{i, j\} \subseteq \{0, 1, \ldots, N\}$, let I(i, j) denote the point on the x-axis where lines $h_i(\cdot)$ and $h_j(\cdot)$ intersect, that is, $I(0, j) = I(j, 0) = w_j$ for each j = 1, . . . , N, and for each $\{i, j\} \subseteq \{1, 2, \ldots, N\}$ such that $v_i \neq v_j$,
$$h_i(I(i, j)) = h_j(I(i, j)) \quad \Leftrightarrow \quad I(i, j) = \frac{v_i w_i - v_j w_j}{v_i - v_j}.$$
We then sort the intersection points in an increasing order of I(i, j)’s. Let (i1 , j1 ) , . . . , (iK , jK ) denote this ordering, that is, 0 ≤ i` < j` ≤ N for each ` = 1, . . . , K and −∞ ≡ I(i0 , j0 ) < I(i1 , j1 ) ≤ I(i2 , j2 ) ≤ · · · ≤ I(iK , jK ) < I(iK+1 , jK+1 ) ≡ +∞, where we have added the two end points I(i0 , j0 ) and I(iK+1 , jK+1 ) to facilitate our exposi tion. By definition, within the interval I (it , jt ) , I (it+1 , jt+1 ) , none of the N + 1 lines intersect each other, and thus the ordering of the N + 1 lines remains constant. For t = 0, 1, . . . , K, the algorithm will maintain the following four pieces of information associated with the interval I (it , jt ) , I (it+1 , jt+1 ) : 1. The ordering σ t of the N lines. 2. The set Gt corresponding to the first C elements according to the ordering σ t . 3. The set B t of products whose values have become negative. 4. The assortment At = Gt \ B t . Starting with an ordering of the products based on vi ’s (corresponding to λ = −∞), each time we encounter an intersection point, we update σ t , Gt , B t , At . We continue this process until we have exhausted all the intersection points. The algorithm then outputs a sequence of quadruplets
t t t t σ , G , B , A : t = 0, 1, . . . , K . The formal description of our algorithm – which we refer to as the StaticMNL(v, w, C) algorithm – is stated below. StaticMNL (v, w, C) N INPUT: v = (v1 , . . . , vN ) ∈ RN + , w = (w1 , . . . , wN ) ∈ R+ , and C ∈ Z+ .
INITIALIZATION: For $i = 0, 1, \ldots, N$, let a linear function $h_i : \mathbb{R} \to \mathbb{R}$ be defined by: for each $\lambda \in \mathbb{R}$, $h_0(\lambda) = 0$, and for $i = 1, \ldots, N$, $h_i(\lambda) = v_i(w_i - \lambda)$. For each $\{i, j\} \subseteq \{0, 1, \ldots, N\}$ with $i \ne j$, let $I(i,j)$ denote the point on the x-axis where lines $h_i(\cdot)$ and $h_j(\cdot)$ intersect. Let $(i_1, j_1), \ldots, (i_K, j_K)$ denote the ordering of the intersection points, that is, $i_\ell < j_\ell$ for each $\ell = 1, \ldots, K$ and
$$-\infty \equiv I(i_0, j_0) < I(i_1, j_1) \le I(i_2, j_2) \le \cdots \le I(i_K, j_K) < I(i_{K+1}, j_{K+1}) \equiv +\infty.$$
Also, let $\sigma^0 = \left(\sigma^0_1, \ldots, \sigma^0_N\right)$ denote an ordering of the products based on the $v_i$'s, that is, $v_{\sigma^0_1} \ge v_{\sigma^0_2} \ge \cdots \ge v_{\sigma^0_N}$.

[Figure 3: An illustration of the StaticMNL algorithm with three products, showing the lines $v_1(w_1 - \lambda)$, $v_2(w_2 - \lambda)$, and $v_3(w_3 - \lambda)$.]
DESCRIPTION: Let $A^0 = G^0 = \left\{\sigma^0_1, \ldots, \sigma^0_C\right\}$ and $B^0 = \emptyset$. For $t = 1, \ldots, K$, construct the permutation $\sigma^t$ and sets $A^t$ and $G^t$ recursively as follows:
• If $i_t \ne 0$, let the permutation $\sigma^t$ be obtained from $\sigma^{t-1}$ by transposing $i_t$ and $j_t$, and set $B^t = B^{t-1}$.
• If $i_t = 0$, let $\sigma^t = \sigma^{t-1}$ and $B^t = B^{t-1} \cup \{j_t\}$.
• Let $G^t = \left\{\sigma^t_1, \ldots, \sigma^t_C\right\}$ and $A^t = G^t \setminus B^t$.
OUTPUT: The sequence $\left\langle \left(\sigma^t, G^t, B^t, A^t\right) : t = 0, 1, \ldots, K \right\rangle$.
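The steps of StaticMNL can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `static_mnl` and `expected_profit` are hypothetical, products are labeled 1 to N, the intersection points are assumed to be distinct, and the profit function is assumed to be $f(A) = \sum_{i \in A} w_i v_i / (1 + \sum_{i \in A} v_i)$, which is consistent with Equation (2).

```python
from itertools import combinations


def expected_profit(assortment, v, w):
    """Assumed profit f(A) = sum_{i in A} w_i v_i / (1 + sum_{i in A} v_i)."""
    num = sum(w[i - 1] * v[i - 1] for i in assortment)
    den = 1.0 + sum(v[i - 1] for i in assortment)
    return num / den


def static_mnl(v, w, C):
    """Return the assortments A^0, ..., A^K (products are labeled 1..N)."""
    n = len(v)
    # Intersection points stored as (lambda, i, j) with i < j.
    # i == 0 is the zero line h_0, so I(0, j) = w_j.
    points = [(w[j - 1], 0, j) for j in range(1, n + 1)]
    for i, j in combinations(range(1, n + 1), 2):
        if v[i - 1] != v[j - 1]:  # parallel lines never intersect
            lam = (v[i - 1] * w[i - 1] - v[j - 1] * w[j - 1]) / (v[i - 1] - v[j - 1])
            points.append((lam, i, j))
    points.sort()

    # sigma^0: products in decreasing order of v (the ordering at lambda = -infinity).
    sigma = sorted(range(1, n + 1), key=lambda i: -v[i - 1])
    negative = set()  # B^t: products whose line has dropped below zero
    assortments = [frozenset(sigma[:C])]
    for _, i, j in points:
        if i == 0:
            negative.add(j)
        else:
            a, b = sigma.index(i), sigma.index(j)
            sigma[a], sigma[b] = sigma[b], sigma[a]  # transpose i_t and j_t
        assortments.append(frozenset(sigma[:C]) - negative)
    return assortments


if __name__ == "__main__":
    v, w = [1.0, 2.0, 3.0], [3.0, 2.0, 1.5]
    sequence = static_mnl(v, w, C=2)
    best = max(sequence, key=lambda a: expected_profit(a, v, w))
    print(sorted(best), expected_profit(best, v, w))
```

For the illustrative inputs above (not taken from the paper), the best assortment in the generated sequence is {1, 2}, with expected profit 1.75.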
The following example illustrates the application of the StaticMNL algorithm to a setting with three products.

Example 2. Consider an example with 3 products whose corresponding collection of lines $h_1(\cdot)$, $h_2(\cdot)$, and $h_3(\cdot)$ is given in Figure 3. In this example, the products are ordered so that $v_1 < v_2 < v_3$ and we have six intersection points with
$$I(2,3) < I(1,3) < I(0,3) < I(1,2) < I(0,2) < I(0,1).$$
Suppose that we have a capacity constraint $C = 2$. The output of the StaticMNL algorithm is then given in the following table. We note that, for each $t$, the assortment $A^t$ corresponds to the top two lines whose values are nonnegative within the interval $\left(I(i_t, j_t), I(i_{t+1}, j_{t+1})\right)$.
t    I(i_t, j_t)    σ^t          G^t       B^t          A^t
0    −∞             (3, 2, 1)    {3, 2}    ∅            {3, 2}
1    I(2, 3)        (2, 3, 1)    {2, 3}    ∅            {2, 3}
2    I(1, 3)        (2, 1, 3)    {2, 1}    ∅            {2, 1}
3    I(0, 3)        (2, 1, 3)    {2, 1}    {3}          {2, 1}
4    I(1, 2)        (1, 2, 3)    {1, 2}    {3}          {1, 2}
5    I(0, 2)        (1, 2, 3)    {1, 2}    {3, 2}       {1}
6    I(0, 1)        (1, 2, 3)    {1, 2}    {3, 2, 1}    ∅
The main result of this section is stated in Theorem 1 which establishes the running time and the correctness of the StaticMNL(v, w, C) algorithm. A general proof of this result appeared in Megiddo (1979). For completeness, we provide a short proof specific to our problem here.
Theorem 1. Let $A^0, A^1, \ldots, A^K$ denote the sequence of assortments generated by the StaticMNL(v, w, C) algorithm. Then, $Z^* = \max\left\{ f\left(A^i\right) : i = 0, 1, \ldots, K \right\}$. Also, the running time of the algorithm is of order $O(N^2)$.

Proof. The running time of the algorithm is proportional to the number of intersection points among the $N+1$ lines $h_0(\cdot), h_1(\cdot), \ldots, h_N(\cdot)$, which is at most $\binom{N+1}{2} = O(N^2)$. To establish the correctness, note that from our earlier discussion at the beginning of Section 2.2, we have that
$$Z^* = \max\left\{ \lambda \in \mathbb{R} : \max_{X : |X| \le C} \sum_{i \in X} v_i (w_i - \lambda) \ge \lambda \right\}. \tag{2}$$
However, for each $\lambda \in \mathbb{R}$, the solution to the optimization problem $\max_{X : |X| \le C} \sum_{i \in X} v_i (w_i - \lambda)$ corresponds to the top $C$ lines (among $h_j(\lambda) = v_j(w_j - \lambda)$) whose values are nonnegative. The ordering $\sigma^0$ in the description of the StaticMNL(v, w, C) algorithm corresponds to the ordering of the $v_j(w_j - \lambda)$'s when $\lambda = -\infty$. Moreover, by our construction, $I(i,j)$ represents the point where lines $h_i(\lambda) = v_i(w_i - \lambda)$ and $h_j(\lambda) = v_j(w_j - \lambda)$ intersect (if $1 \le i, j \le N$), or the point where the value of $v_j(w_j - \lambda)$ becomes zero (if $i = 0$). Thus, for each $\lambda$ strictly between $I(i_t, j_t)$ and $I(i_{t+1}, j_{t+1})$, the ordering of the values $v_j(w_j - \lambda)$ must remain constant and their values do not change sign. Therefore, for each $\lambda$ strictly between $I(i_t, j_t)$ and $I(i_{t+1}, j_{t+1})$, the optimization problem given in Equation (2) will always have the same optimal solution, corresponding to the assortment $A^t$. Since the collection of assortments $A^0, A^1, \ldots, A^K$ encompasses the optimal solution to the optimization problem given in Equation (2) for all values of $\lambda \in \mathbb{R}$, this collection of subsets must contain an optimal solution to the CAPACITATED MNL assortment optimization problem.
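The fixed-point characterization in Equation (2) can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper's algorithm: it assumes all $v_i > 0$ and $w_i \ge 0$, uses the hypothetical helper names `top_c_value` and `optimal_profit`, and finds $Z^*$ by bisection, relying on the fact that the inner maximum is non-increasing in $\lambda$.

```python
def top_c_value(lam, v, w, C):
    """g(lam) = max over |X| <= C of sum_{i in X} v_i (w_i - lam).

    The maximum is attained by the (at most C) largest positive values.
    """
    positive = sorted(
        (vi * (wi - lam) for vi, wi in zip(v, w) if wi > lam), reverse=True
    )
    return sum(positive[:C])


def optimal_profit(v, w, C, iterations=200):
    """Largest lam with g(lam) >= lam, located by bisection on [0, max(w)]."""
    lo, hi = 0.0, max(w)  # g(0) >= 0, and g(max(w)) = 0 <= max(w)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if top_c_value(mid, v, w, C) >= mid:
            lo = mid  # mid is still feasible in Equation (2)
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(optimal_profit([1.0, 2.0, 3.0], [3.0, 2.0, 1.5], C=2))
```

With the illustrative inputs shown, the bisection converges to 1.75, the profit of the assortment {1, 2}.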
From the description of the StaticMNL(v, w, C) algorithm, we observe the following crucial insight. The correctness of the algorithm depends only on the correct initial ordering $\sigma^0$ of the mean utilities and the ordering of the intersection points $I(i_k, j_k)$. Suppose that we only have approximations $\hat\mu_j$ of the true mean utilities. We can still obtain an optimal solution, provided that the ordering of the utilities $\hat\mu_j$ and the ordering of the intersection points $\hat I(i_k, j_k)$ (computed using the estimated utilities $\hat\mu_j$) coincide with the true orderings. This observation is summarized in the following proposition. The result of Proposition 2 will form a basis of our proposed adaptive algorithm in Section 4.

Before we proceed to the statement of the proposition, let us introduce the following notation. Let $\hat\mu = (\hat\mu_1, \ldots, \hat\mu_N) \in \mathbb{R}^N$ denote an approximation to the true mean utilities $\mu_1, \ldots, \mu_N$. For each $i = 0, 1, \ldots, N$, let $\hat v_i = e^{\hat\mu_i}$ and let $\hat h_i : \mathbb{R} \to \mathbb{R}$ be defined by: $\hat h_0(\lambda) = 0$ and for $i \ge 1$, $\hat h_i(\lambda) = \hat v_i (w_i - \lambda)$ for all $\lambda \in \mathbb{R}$. For each $\{i, j\} \subseteq \{0, 1, \ldots, N\}$, let $\hat I(i,j)$ denote the intersection point of the lines defined by $\hat h_i$ and $\hat h_j$.

Proposition 2. If the ordering of the estimated utilities $\hat\mu_j$ and the ordering of the estimated intersection points $\hat I(i,j)$ coincide with the orderings based on the $\mu_j$'s and $I(i,j)$'s, respectively, then both the StaticMNL(v, w, C) and the StaticMNL($\hat v$, w, C) algorithms generate the same sequence of assortments as outputs.

2.3
An Upper Bound on the Number of Distinct Assortments
The number of assortments generated by the StaticMNL(v, w, C) algorithm is upper bounded by $\binom{N+1}{2} = O(N^2)$, representing the maximum number of intersection points among $N+1$ lines. By exploiting the special structure of our logit choice model, we can show that the number of distinct assortments is actually at most $C(N - C + 1)$. When $C$ is fixed, this represents an improvement by a factor of $N$ over the trivial bound. This result is stated in Proposition 3, whose proof is given in Appendix A. Before we proceed to the statement of the proposition, let us introduce the following assumption that will be used throughout the paper. We emphasize that Assumption 1 is introduced primarily to simplify our exposition and to facilitate the discussion of the key ideas without having to worry about degenerate cases. In Section 5, we will discuss extensions of our results to settings where this assumption may fail.

Assumption 1. (a) The products are indexed so that $v_1 < v_2 < \cdots < v_N$. (b) The intersection points are distinct, that is, for any $(i,j) \ne (s,t)$, $I(i,j) \ne I(s,t)$.
Assumption 1(a) requires the values of the $v_i$'s to be distinct, while Assumption 1(b) requires that the marginal profits $w_i$ are distinct, and no three lines among $h_1(\cdot), \ldots, h_N(\cdot)$ can intersect at the same point. As a consequence of Assumption 1, we observe that every pair of lines $h_i(\cdot)$ and $h_j(\cdot)$ will intersect each other, and thus, the number of intersection points $K$ is exactly $\binom{N+1}{2}$. In addition, we also have a strict ordering of the intersection points, that is,
$$-\infty \equiv I(i_0, j_0) < I(i_1, j_1) < I(i_2, j_2) < \cdots < I(i_K, j_K) < I(i_{K+1}, j_{K+1}) \equiv +\infty,$$
and the collection of intervals $\left\{ \left[ I(i_t, j_t), I(i_{t+1}, j_{t+1}) \right) : 0 \le t \le K \right\}$ partitions the real line.
Proposition 3. Let $A^0, A^1, \ldots, A^K$ denote the sequence of assortments generated by the StaticMNL(v, w, C) algorithm. Under Assumption 1, the number of distinct non-empty assortments generated by the StaticMNL(v, w, C) algorithm is at most $C(N - C + 1)$; that is, there are at most $C(N - C + 1)$ distinct non-empty subsets among $A^0, A^1, \ldots, A^K$.

If the capacity $C$ is fixed, the above proposition shows that finding the optimal assortment requires us to search through $O(N)$ sets, which is comparable to the unconstrained case. Of course, the ordering of the assortments $A^0, A^1, \ldots, A^K$ is not based on the ordering of the marginal profits, but rather on the ordering of the intersection points. Given the result of Proposition 3, throughout the rest of the paper, we will assume that the sequence $A^0, A^1, \ldots, A^K$ generated by our StaticMNL algorithm has a length of at most $C(N - C + 1)$.

2.4
Unimodality of the Sequence of Expected Profits
Proposition 3 limits the number of assortments that we have to search to find the optimal solution to $O(N)$ subsets. Using the special structure of our problem, we can further show that the sequence of expected profits associated with the assortments generated by the StaticMNL(v, w, C) algorithm is unimodal in the sense given in Proposition 4 below. The proof of this result appears in Appendix B.

Proposition 4. Let $A^0, A^1, \ldots, A^K$ denote the sequence of assortments generated by the StaticMNL(v, w, C) algorithm. Then, under Assumption 1, the sequence $f\left(A^0\right), f\left(A^1\right), \ldots, f\left(A^K\right)$ is unimodal, that is, there exists $0 \le m^* \le K$ such that
$$f\left(A^0\right) \le f\left(A^1\right) \le \cdots \le f\left(A^{m^*}\right) \quad \text{and} \quad f\left(A^{m^*}\right) \ge f\left(A^{m^*+1}\right) \ge \cdots \ge f\left(A^K\right).$$

Given the sequence of assortments $A^0, A^1, \ldots, A^K$, if we can evaluate the expected profit $f(\cdot)$ exactly, we can apply the standard Golden Ratio or Fibonacci search (see Press et al. (1999), for example) to find the optimal assortment using only $O(\log(NC))$ iterations (since there are $O(NC)$ distinct assortments). Of course, since the true mean utilities are unknown, we cannot evaluate $f(\cdot)$ exactly. As shown in Section 4.1, the structure associated with the sequence of expected profits still allows us to develop an efficient search algorithm for identifying the optimal assortment.
3.
Parameter Estimation
In practice, the true mean utilities $\mu_1, \ldots, \mu_N$ are unknown and we must estimate their values from data. A standard approach for parameter estimation in the multinomial logit model is based on maximum likelihood estimation (MLE). Under this approach, we offer all products to each customer, observe their selections, and estimate the mean utilities $\hat\mu_j$ by maximizing the likelihood function. Since we have a capacity $C$, we cannot offer all products simultaneously to the customers. However, we can still obtain an efficient parameter estimation technique by leveraging the result of Theorem 1 and Proposition 2. Recall from Section 2.2 that the intersection point $I(i,j)$ of the lines defined by $h_i(\lambda) := v_i(w_i - \lambda)$ and $h_j(\lambda) := v_j(w_j - \lambda)$ is given by
$$I(i,j) = \frac{w_i v_i - w_j v_j}{v_i - v_j} = \frac{w_i \theta_i(S) - w_j \theta_j(S)}{\theta_i(S) - \theta_j(S)}, \tag{3}$$
where for each $S \subseteq \{1, \ldots, N\}$ such that $\{i, j\} \subseteq S$, we define
$$\theta_i(S) = \frac{v_i}{1 + \sum_{\ell \in S} v_\ell} \quad \text{and} \quad \theta_j(S) = \frac{v_j}{1 + \sum_{\ell \in S} v_\ell}.$$
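As a quick numerical check of Equation (3), the sketch below computes $I(i,j)$ once from the weights $v$ and once from the selection probabilities $\theta(S)$, for two different assortments $S$ that both contain $i$ and $j$. The function names and the example values are illustrative assumptions, not taken from the paper.

```python
def choice_probabilities(S, v):
    """theta_i(S) = v_i / (1 + sum_{l in S} v_l); products are labeled 1..N."""
    denom = 1.0 + sum(v[l - 1] for l in S)
    return {i: v[i - 1] / denom for i in S}


def intersection_from_v(i, j, v, w):
    """First expression in Equation (3), using the weights v directly."""
    return (w[i - 1] * v[i - 1] - w[j - 1] * v[j - 1]) / (v[i - 1] - v[j - 1])


def intersection_from_theta(i, j, theta, w):
    """Second expression in Equation (3), using only selection probabilities."""
    return (w[i - 1] * theta[i] - w[j - 1] * theta[j]) / (theta[i] - theta[j])


if __name__ == "__main__":
    v, w = [1.0, 2.0, 3.0], [3.0, 2.0, 1.5]
    for S in ({1, 2}, {1, 2, 3}):
        theta = choice_probabilities(S, v)
        print(sorted(S), intersection_from_v(1, 2, v, w),
              intersection_from_theta(1, 2, theta, w))
```

The common denominator $1 + \sum_{\ell \in S} v_\ell$ cancels in the ratio, which is why the value does not depend on which assortment $S$ is offered.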
Note that the parameters $\theta_i(S)$ and $\theta_j(S)$ correspond to the probabilities that a customer will select product $i$ and product $j$, respectively, from the assortment $S$. Thus, to estimate the intersection points $I(i,j)$, we do not need to offer all products to the customers. We only need to offer an assortment that contains both products $i$ and $j$, and observe the resulting selections. Also, note that for each assortment $S$ such that $\{i,j\} \subseteq S$, we have that $\mu_i \le \mu_j$ if and only if $\theta_i(S) \le \theta_j(S)$. Therefore, we can also estimate the orderings of the mean utilities $\mu_j$ based on the ordering of the corresponding selection probabilities.

These observations lead us to consider the following parameter estimation technique. Consider a collection of subsets $\mathcal{E}$ such that for each $E \in \mathcal{E}$, $|E| = C$ and for each pair of products $i$ and $j$ with $i \ne j$, there exists an assortment $E \in \mathcal{E}$ where $\{i,j\} \subseteq E$, that is,
$$\mathcal{E} = \left\{ E \subseteq \{1, \ldots, N\} : |E| = C \text{ and for all } i \ne j,\ \{i,j\} \subseteq X \text{ for some } X \in \mathcal{E} \right\}. \tag{4}$$
It is easy to verify that we only need at most $5(N/C)^2$ such subsets, that is, $|\mathcal{E}| \le 5(N/C)^2$. The following example shows a construction of the collection $\mathcal{E}$.
Example 3. Consider an example with $N = 6$ and $C = 3$. In this case, we can define $\mathcal{E}$ as follows:
$$\mathcal{E} = \{\{1,2,3\}, \{4,5,6\}, \{1,2,4\}, \{1,2,5\}, \{1,2,6\}, \{1,3,4\}, \{3,5,6\}\}.$$
We can show that for all $i \ne j$, there exists $E \in \mathcal{E}$ such that $\{i,j\} \subseteq E$.

For each assortment $E \in \mathcal{E}$, we will offer it to $T$ independent customers. Let $C_1(E), \ldots, C_T(E)$ denote the choices selected by each of the $T$ customers when offered the assortment $E$, where $C_t(E) \in \{0\} \cup E$ with 0 corresponding to the option of no purchase. For each $x \in \mathbb{R}^N$, let $x_E = (x_j : j \in E) \in \mathbb{R}^C$ denote the vector whose coordinates correspond to the indices in $E$. For each $x_E \in \mathbb{R}^C$, the likelihood function $\mathrm{lik}\left(x_E \mid C_1(E), \ldots, C_T(E)\right)$ is given by:
$$\mathrm{lik}\left(x_E \mid C_1(E), \ldots, C_T(E)\right) = \prod_{t=1}^{T} \Pr\left\{ C_t(E) \mid \mu_E = x_E \right\} = \prod_{t=1}^{T} \frac{e^{x_{C_t(E)}}}{1 + \sum_{\ell \in E} e^{x_\ell}},$$
where we define $x_0 = 0$. Let the (negative) log-likelihood $\ell : \mathbb{R}^C \to \mathbb{R}$ be defined by: for each $x_E \in \mathbb{R}^C$,
$$\ell\left(x_E \mid C_1(E), \ldots, C_T(E)\right) = -\frac{1}{T} \ln \mathrm{lik}\left(x_E \mid C_1(E), \ldots, C_T(E)\right) = \frac{1}{T} \sum_{t=1}^{T} \ln\left( \frac{1}{e^{x_{C_t(E)}} \big/ \left(1 + \sum_{\ell \in E} e^{x_\ell}\right)} \right).$$
By construction, $\ell(\cdot)$ is a non-negative function. The maximum likelihood estimate $\hat\mu_E = \left(\hat\mu_i\left(C_1(E), \ldots, C_T(E)\right) : i \in E\right)$ is given by
$$\hat\mu_E = \arg\min_{x_E \in \mathbb{R}^C} \ell\left(x_E \mid C_1(E), \ldots, C_T(E)\right). \tag{5}$$
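To make Equation (5) concrete, the following sketch minimizes the negative log-likelihood for a single assortment by plain gradient descent. It is an illustration only: the function names, step size, and iteration count are assumptions, and no particular optimizer is prescribed by the paper. The sketch ignores any bound on the utilities. When every option in $\{0\} \cup E$ is chosen at least once, the unconstrained minimizer matches the empirical frequencies, so $\hat\mu_i = \ln(n_i / n_0)$, where $n_i$ counts the selections of option $i$; this gives a convenient check.

```python
import math
from collections import Counter


def neg_log_likelihood(x, choices):
    """Average negative log-likelihood; x maps product -> utility, 0 = no purchase."""
    log_denom = math.log(1.0 + sum(math.exp(u) for u in x.values()))
    # x_0 = 0 by convention, so the no-purchase option contributes utility 0.
    return sum(log_denom - x.get(c, 0.0) for c in choices) / len(choices)


def fit_mle(assortment, choices, step=1.0, iterations=2000):
    """Gradient descent on the negative log-likelihood (illustrative settings)."""
    counts = Counter(choices)
    T = len(choices)
    x = {i: 0.0 for i in assortment}
    for _ in range(iterations):
        denom = 1.0 + sum(math.exp(u) for u in x.values())
        # d(NLL)/dx_i = model probability of i minus empirical frequency of i.
        grad = {i: math.exp(x[i]) / denom - counts[i] / T for i in assortment}
        x = {i: x[i] - step * grad[i] for i in assortment}
    return x


if __name__ == "__main__":
    observed = [0] * 50 + [1] * 30 + [2] * 20  # hypothetical selections
    estimate = fit_mle([1, 2], observed)
    print(estimate, math.log(30 / 50), math.log(20 / 50))
```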
We note that it is a well-known result (see, for example, Ben-Akiva and Lerman (1985)) that the function $\ell$ is globally convex and $\hat\mu_E$ can be computed using many commercially available optimizer packages (see, for example, Bierlaire (2003)). To facilitate our analysis, throughout this write-up, we will make the following assumption regarding the mean utilities $\mu_1, \ldots, \mu_N$.

Assumption 2. The manager has prior knowledge of an upper bound $M > 0$ such that $|\mu_i| \le M$ for all $i$.

Note that Assumption 2 requires only that we have prior knowledge of the range of the mean utilities $\mu_i$; it does not require that we know their actual values. We emphasize that Assumption 2 is introduced primarily to simplify our exposition and the mathematical proofs. All of the results in this section remain true without this assumption. In Section 5, we discuss how to extend our results to the setting when the manager does not know the range of the mean utilities in advance.
Given Assumption 2, it suffices to restrict the domain of the optimization problem in Equation (5) to $[-M, M]^C$. Given the maximum likelihood estimates of the mean utilities $\hat\mu(E) = (\hat\mu_i(E) : i \in E)$, define the estimated selection probabilities $\hat\theta(E) = \left(\hat\theta_i(E) : i \in E\right)$ as follows: for each $i \in E$,
$$\hat\theta_i(E) = \frac{e^{\hat\mu_i(E)}}{1 + \sum_{\ell \in E} e^{\hat\mu_\ell(E)}}.$$
Note that $\hat\theta(E)$ is a random vector that depends on the selections made by the customers who were offered the assortment $E$. The following theorem establishes an error bound on the difference between the estimated selection probabilities $\hat\theta(E)$ and the true values $\theta(E)$ as a function of the sample size. The proof of this result appears in Appendix C.

Theorem 5. Under Assumption 2, if the assortment $E \in \mathcal{E}$ is offered to $T$ independent customers, then for each $0 < \epsilon < 1$,
$$P\left\{ \sum_{k \in E} \left| \hat\theta_k(E) - \theta_k(E) \right| \ge \epsilon \right\} \le 8 \left( \frac{256eB}{\epsilon^2} \ln \frac{256eB}{\epsilon^2} \right)^{C+1} e^{-\epsilon^4 T / 4096 B^2},$$
where $B = 2M + \ln(1 + C)$.

The above result shows that the errors in our estimated probabilities decrease to zero exponentially fast with the number of customers. We emphasize that the above error bound is a worst-case error bound that holds for any arbitrary distribution and it is extremely conservative. Thus, the results of the above theorem and that of Theorem 6 below provide only a qualitative assessment of how our estimates scale with the number of samples.

For each $E \in \mathcal{E}$ and $\{i,j\} \subseteq E$ with $i \ne j$, we can define an estimated intersection point $\hat I_E(i,j)$ as follows: if $\hat\theta_i(E) \ne \hat\theta_j(E)$,
$$\hat I_E(i,j) = \frac{\hat\theta_i(E)\, w_i - \hat\theta_j(E)\, w_j}{\hat\theta_i(E) - \hat\theta_j(E)};$$
otherwise, set $\hat I_E(i,j)$ to some arbitrary value. Also, define $\hat I_E(0,j) = \hat I_E(j,0) = w_j$ for each $j$. Let us also introduce the following notation that will be used throughout the rest of the paper. Define $\alpha > 0$, $\eta > 0$ and $\xi > 0$ as follows:
$$\alpha \equiv \min\left\{ |I(i,j) - I(s,t)| > 0 : 0 \le i, j, s, t \le N \text{ and } (i,j) \ne (s,t) \right\},$$
$$\eta \equiv \min\left\{ |\theta_i(E) - \theta_j(E)| : E \in \mathcal{E} \text{ and } \{i,j\} \subseteq E \text{ with } i \ne j \right\},$$
$$\xi \equiv \min\{\alpha, \eta\}/2. \tag{6}$$
Note that $\alpha$ corresponds to the minimum distance between any two intersection points, which is strictly positive by Assumption 1. Moreover, under Assumption 1, the parameter $\eta$ is positive because $\mu_i \ne \mu_j$, and thus, $\theta_i(E) \ne \theta_j(E)$ for all $i \ne j$ and $E \in \mathcal{E}$.
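For a small instance, the separation parameters in Equation (6) can be computed directly. The sketch below is illustrative only: `separation_parameters` is a hypothetical helper, the inputs are made-up values satisfying Assumption 1 (distinct $v_i$ and distinct intersection points), and the collection of assortments is passed in explicitly.

```python
from itertools import combinations


def separation_parameters(v, w, collection):
    """Return (alpha, eta, xi) from Equation (6); products are labeled 1..N."""
    n = len(v)
    # Intersection points: I(0, j) = w_j, and I(i, j) for 1 <= i < j <= N.
    points = [w[j - 1] for j in range(1, n + 1)]
    for i, j in combinations(range(1, n + 1), 2):
        points.append(
            (w[i - 1] * v[i - 1] - w[j - 1] * v[j - 1]) / (v[i - 1] - v[j - 1])
        )
    alpha = min(abs(p - q) for p, q in combinations(points, 2) if p != q)

    eta = float("inf")
    for E in collection:
        denom = 1.0 + sum(v[l - 1] for l in E)
        for i, j in combinations(sorted(E), 2):
            # |theta_i(E) - theta_j(E)| = |v_i - v_j| / (1 + sum_{l in E} v_l)
            eta = min(eta, abs(v[i - 1] - v[j - 1]) / denom)
    return alpha, eta, min(alpha, eta) / 2.0


if __name__ == "__main__":
    v, w = [1.0, 2.0, 3.0], [3.0, 2.0, 1.5]
    print(separation_parameters(v, w, [{1, 2}, {1, 3}, {2, 3}]))
```

In this toy instance the intersection points are 0.5, 0.75, 1, 1.5, 2 and 3, so $\alpha = 1/4$, while $\eta = 1/6$ (attained by the assortment {2, 3}), giving $\xi = 1/12$.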
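The main result of this section is given in the following theorem, which establishes an error bound for the estimated intersection points. The proof of this result appears in Appendix D.

Theorem 6. Under Assumptions 1 and 2, there exist $a_1 > 0$ and $a_2 > 0$ such that if each assortment in $\mathcal{E}$ is offered to $T$ independent customers, then for each $0 < \epsilon < 1$,
$$P\left\{ \max_{E \in \mathcal{E}} \max\left\{ \max_{\{i,j\} \subseteq E : i \ne j} \left| I(i,j) - \hat I_E(i,j) \right|,\ \max_{i \in E} \left| \hat\theta_i(E) - \theta_i(E) \right| \right\} > \epsilon \right\} \le 40 \left(\frac{N}{C}\right)^2 \left( \frac{a_1}{\epsilon^2} \ln \frac{a_1}{\epsilon^2} \right)^{C+1} e^{-a_2 \epsilon^4 T},$$
where $B = 2M + \ln(1 + C)$ and we can take $a_1$ and $a_2$ to be any numbers such that
$$a_1 \ge \frac{10^4 eB \left(\max_\ell w_\ell\right)^2}{\eta^4} \quad \text{and} \quad a_2 \le \frac{\eta^8}{6 \times 10^6 \times B^2 \left(\max_\ell w_\ell\right)^2}.$$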
Let us emphasize the importance of Theorem 6. From the discussion at the end of Section 2.2 and as summarized in Proposition 2, we know the following key fact. If the ordering of the estimated mean utilities and the ordering of the estimated intersection points coincide with the true (yet unknown) orderings, then the sequence of assortments generated by the StaticMNL algorithm under the estimated utilities will be exactly the same as the output under the true mean utilities.

Consider the event that $\left|\hat\theta_i(E) - \theta_i(E)\right| \le \xi$ and $\left|\hat I_E(i,j) - I(i,j)\right| \le \xi$ for all $E \in \mathcal{E}$ and for all $\{i,j\} \subseteq E$. Theorem 6 tells us that, after observing the selections from $T$ independent customers, this event will happen with a very high probability of at least $1 - O\left(e^{-\xi^4 T}\right)$. When this event happens, it follows that the ordering of the intersection points based on $\hat I_E(i,j)$ and the ordering of the mean utilities based on $\hat\theta_i(E)$ will coincide with the true orderings. From Proposition 2, we know that when this happens, we are guaranteed that the outputs of the StaticMNL algorithm, using the estimated utilities as inputs, will be exactly the same as if we had used the true mean utilities as inputs. This result is summarized in the following corollary.

Before we proceed to the statement of the corollary, let us first introduce the following notation. Let $\mathcal{A} = \langle A^0, A^1, \ldots, A^K \rangle$ denote the sequence of assortments generated by the StaticMNL algorithm under the true (yet unknown) mean utilities. Suppose that each assortment in $\mathcal{E}$ is offered to $T$ independent customers. Let $\hat{\mathcal{A}}(T) = \left\langle \hat A^0(T), \hat A^1(T), \ldots, \hat A^{K'}(T) \right\rangle$ denote the output of the StaticMNL algorithm when we use as inputs the estimated utilities based on the observed selections of the $T$ customers. Note that $\hat{\mathcal{A}}(T)$ is a random variable whose outcome depends on the selections of the customers.

Corollary 7. Suppose we use the estimated utilities computed from the observations of $T$ independent customers as inputs into the StaticMNL algorithm. Then, under Assumptions 1 and 2, with probability at least $1 - O\left(e^{-\xi^4 T}\right)$, the output of the StaticMNL algorithm (with estimated utilities as inputs) will be exactly the same as the output under the true mean utilities, that is,
$$\Pr\left\{ \hat{\mathcal{A}}(T) = \mathcal{A} \right\} \ge 1 - 40 \left(\frac{N}{C}\right)^2 \left( \frac{a_1}{\xi^2} \ln \frac{a_1}{\xi^2} \right)^{C+1} e^{-a_2 \xi^4 T},$$
where the constants $a_1$ and $a_2$ are given in Theorem 6, and the parameter $\xi$ is defined in Equation (6).

We note that the right hand side of the inequality in Corollary 7 increases to one as the number of customers $T$ increases. As before, we emphasize that the bound in the above corollary represents the worst-case error bound, and it provides only a qualitative understanding of how the errors behave as the number of customers increases. The constants appearing in the above corollary are too large to be of practical value. In practice, we observe that our estimates converge significantly faster than what is predicted by the corollary.
4.
Dynamic Optimization
To develop an algorithm for dynamic assortment optimization, we will integrate the following results: our polynomial-time algorithm for the static optimization problem from Section 2.2, the structural insight from Sections 2.3 and 2.4, and the parameter estimation technique from the previous section. We now provide an overview of our proposed adaptive policy. A more detailed description of the algorithm and its analysis are given in Section 4.2. To simplify our discussion, we will assume throughout this section that in each period, we offer an assortment to a single customer and observe his or her selection. We introduce this assumption primarily to simplify our exposition. Our analysis easily extends to settings where a single assortment is offered to multiple customers.

Our policy operates in cycles. Each cycle $t \ge 1$ consists of an exploration phase followed by an exploitation phase. During the exploration, we offer each assortment $E \in \mathcal{E}$ to a single customer and observe his or her selection. At the end of the exploration phase of the $t$th cycle, we will have a total of $t$ observations from the past $t$ exploration phases associated with each assortment $E \in \mathcal{E}$. We then compute the maximum likelihood estimates based on these observations, and determine the estimated ordering of the mean utilities and the estimated ordering of the intersection points.

We then begin the exploitation phase of cycle $t$ by applying the StaticMNL algorithm, using the estimated orderings as inputs. The StaticMNL algorithm will generate a sequence of assortments $\hat{\mathcal{A}}^t = \left\langle \hat A^t_1, \ldots, \hat A^t_{L_t} \right\rangle$ as an output with $L_t = O(N^2)$. By Corollary 7, we know that with a very high probability, the sequence $\hat{\mathcal{A}}^t$ will contain the true optimal assortment $S^*_C$ with $f\left(S^*_C\right) = Z^*$ (see Equation (1)).

We cannot, however, evaluate the profit function $f(\cdot)$ directly because we do not know the true mean utilities $\mu_j$. However, we can view each assortment $\hat A^t_k$ in the collection $\hat{\mathcal{A}}^t$ as an arm in a multiarmed bandit problem (with a total of $O(N^2)$ arms), and apply generic all-purpose multiarmed bandit algorithms (for example, Lai and Robbins (1985) and Auer et al. (2002)) to find the assortment that gives the maximum expected profit. Since the number of arms $L_t$ is only $O(N^2)$, these generic multiarmed bandit algorithms will quickly identify the optimal arm. By exploiting the unimodality of the sequence of profits (Proposition 4), we can identify the optimal assortment even more quickly by using a sampling-based version of the classic Golden Ratio Search for unimodal functions. During our experiments, we found that the sampling-based golden ratio search performs significantly better than the traditional multiarmed bandit algorithms. All of the numerical experiments in Section 6 are conducted exclusively using the sampling-based golden ratio search. In Section 4.1, we provide a more detailed comparison between the efficiency of the bandit algorithms and the sampling-based golden ratio search.

4.1
Sampling-Based Golden Ratio Search Algorithm for Finding the Optimal Assortment Without Knowing the Profit Function
Motivated by the discussion from the previous section, we consider the problem of finding the optimal assortment from a sequence of assortments $D^1, \ldots, D^L$. In our adaptive algorithm, this sequence of assortments will correspond to the output of the StaticMNL algorithm, with our estimates of the mean utilities as its inputs. If we can evaluate the profit function $f(\cdot)$ exactly, this is a simple search problem. Of course, we cannot compute $f(\cdot)$ exactly since we do not know the true mean utilities. For each assortment $S$, we can only obtain an estimate of $f(S)$ by offering the assortment to a customer and observing her selection.

The problem of finding an optimal assortment when the profit function is unknown is an instance of the well-known multiarmed bandit problem (see, for example, Lai and Robbins (1985); Auer et al. (2002) for more details). Given the assortments $D^1, \ldots, D^L$ and a time horizon $T$, we wish to find a strategy $\phi = (j_1, j_2, \ldots, j_T)$ for trying the different assortments in order to maximize the total expected profit $\sum_{t=1}^{T} E\left[f\left(D^{j_t}\right)\right]$ over $T$ periods. In this case, $j_t \in \{1, \ldots, L\}$ denotes the index of the assortment chosen in period $t$ and it can depend only on the observed selections in the previous $t - 1$ periods.

By formulating our problem as an instance of the multiarmed bandit, we can apply many of the standard algorithms that have been developed for this problem. One such algorithm is the UCB1 algorithm introduced by Auer et al. (2002). The algorithm maintains an average reward associated with each assortment. In each period, the algorithm computes an index associated with each assortment that corresponds to the sum of the average profit and an additional term that reflects how often the assortment has been tried. Then, the algorithm chooses the assortment with the highest index. Auer et al. (2002) establish the following lower bound on the total expected profit earned after $T$ periods under the UCB1 algorithm:
$$E\left[ \sum_{t=1}^{T} f\left(D^{j_t}\right) \right] \ge \left( \max_{1 \le i \le L} f\left(D^i\right) \right) T - F_{ucb1} - H_{ucb1} \ln T, \tag{7}$$
where $F_{ucb1} = 5L \left(\max_\ell w_\ell\right) \left(\max_{1 \le \ell \le L} \Delta_\ell\right)$ and $H_{ucb1} = 8L \left(\max_\ell w_\ell\right) \max_{\ell : \Delta_\ell > 0} 1/\Delta_\ell$, and for each $\ell = 1, \ldots, L$, $\Delta_\ell = \max_{i = 1, \ldots, L} f\left(D^i\right) - f\left(D^\ell\right)$.

The above result shows that the average expected profit under the UCB1 algorithm, $\frac{1}{T} \sum_{t=1}^{T} E\left[f\left(D^{j_t}\right)\right]$, converges to the maximum expected profit at the rate of $O(\log T / T)$. However, we note that the constants $F_{ucb1}$ and $H_{ucb1}$ that appear in the performance bound for the UCB1 algorithm increase linearly with the number of assortments $L$. For our application, the number of assortments generated by the StaticMNL algorithm is of order $C(N - C + 1)$ by Proposition 3. Thus, the performance and the error associated with the UCB1 algorithm will scale linearly with the number of products $N$. By exploiting the unimodality of the sequence of expected profits (Proposition 4), we can develop a search algorithm whose error scales only with $\log(N)$ instead of $N$. When the number of products $N$ is quite large, this can yield a significant improvement in the performance of our algorithm.

Recall from Proposition 4 that given the sequence of assortments $\langle A^0, A^1, \ldots, A^K \rangle$ generated by the StaticMNL algorithm, the sequence of profits $\left\langle f\left(A^0\right), f\left(A^1\right), \ldots, f\left(A^K\right) \right\rangle$ is unimodal. If we can evaluate the $f\left(A^i\right)$'s exactly, we can apply the classical Golden Ratio or Fibonacci search algorithms (see, for example, Press et al. (1999)), which will find the maximum in $O(\log K)$ steps. For each assortment $S$, although we cannot evaluate $f(S)$ exactly, we can offer the assortment $S$ to a sample of customers and compute the average profit from the resulting sales. If the number of customers is sufficiently large, then the average profit should serve as a good proxy for the true expected value $f(S)$. This suggests that we can adapt the classical Golden Ratio search to our setting by replacing the actual expected profit with the sample average.
The resulting algorithm, which we refer to as the Sampling-Based Golden Ratio Search algorithm, is given below. We note that the algorithm requires as its input the minimum number of samples $W_{GRS}$ for each assortment. This represents the minimum number of customers to whom we must have offered an assortment and observed their selections.

Sampling-Based Golden Ratio Search
INPUTS: A unimodal sequence of assortments $D^1, D^2, \ldots, D^L$, a time horizon $T$, and a minimum number of samples $W_{GRS}$ for each assortment.
INITIALIZATION: We perform the standard Golden Ratio search. Whenever we need to compare the values of two assortments $D^s$ and $D^t$, we check to see if each assortment has been offered to at least $W_{GRS}$ independent customers. If not, then offer each of these assortments to the customers until we have at least $W_{GRS}$ observations for each assortment. If we have enough data, compare the two assortments based on the average profits obtained and proceed as in the classical Golden Ratio Search algorithm.
DESCRIPTION: At the end of the initialization phase, we are left with a single assortment; offer that assortment until the end of the horizon.

The idea of using the sample average as an approximation of the true expectation has been applied in many applications (see, for example, Shapiro (2003), Levi et al. (2005), and Swamy and Shmoys (2005) for notable successes). We present the algorithm and its analysis primarily for the sake of completeness. We will use this algorithm in the next section when we present an adaptive policy for generating a sequence of assortments. The following proposition establishes a performance bound associated with our sampling-based golden ratio search; the proof appears in Appendix E.

Let a sequence of assortments $D^1, \ldots, D^L$ and a time horizon $T$ be given. Let $\beta = \min_{i,j : f(D^i) \ne f(D^j)} \left| f\left(D^i\right) - f\left(D^j\right) \right|$, and let $\left\langle D^{j_t} : 1 \le t \le T \right\rangle$ denote the sequence of assortments chosen by the Sampling-Based Golden Ratio Search algorithm.

Proposition 8. If the associated sequence of profits $f\left(D^1\right), \ldots, f\left(D^L\right)$ is unimodal and the number of samples $W_{GRS}$ associated with each assortment is chosen so that $W_{GRS} = 4 (\ln T) \left(\max_\ell w_\ell\right) / \beta^2$, then the total expected profit after $T$ periods satisfies
$$E\left[ \sum_{t=1}^{T} f\left(D^{j_t}\right) \right] \ge \left( \max_{i = 1, \ldots, L} f\left(D^i\right) \right) T - F_g - H_g \ln T,$$
where the constants $F_g$ and $H_g$ are given by:
$$F_g = 13 \left(\max_\ell w_\ell\right) \ln(2L) \quad \text{and} \quad H_g = \frac{34 \left(\max_\ell w_\ell\right)^2 \ln(2L)}{\beta^2}.$$
Let us contrast the performance of our Sampling-Based Golden Ratio Search with the traditional multiarmed bandit UCB1 algorithm given in Equation (7). From Proposition 8, we observe that the average expected profit over $T$ periods under the golden ratio search converges to the optimal expected profit at the rate of $O(\log T / T)$, as does the UCB1 algorithm. However, the constants $F_g$ and $H_g$ appearing in the performance bound increase only with the logarithm of the number of assortments, as opposed to the linear dependence in the UCB1 algorithm. In our application, this translates to a dependence of $\log(NC)$ in the error bound. When the number of products $N$ is large, this can result in significant improvements in performance. In our numerical experiments, we adaptively adjust the value of $\beta$ using our current estimate of the mean utilities.
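The search phase of the Sampling-Based Golden Ratio Search described above can be sketched as follows. This is a simplified illustration under several assumptions: `sample_profit(k)` is a hypothetical callable returning the profit observed from one customer who is offered the assortment with (0-based) index `k`; probe positions are placed at golden-ratio fractions of the current index range with integer rounding, so probes are reused through cached sample averages rather than through the exact classical recurrence; and offering the returned assortment for the rest of the horizon is left to the caller.

```python
import math


def sampling_golden_search(num_assortments, sample_profit, min_samples):
    """Return (index of the selected assortment, number of customers sampled)."""
    ratio = (3.0 - math.sqrt(5.0)) / 2.0  # 1 - 1/phi, roughly 0.382
    totals = [0.0] * num_assortments
    counts = [0] * num_assortments

    def average(k):
        # Offer assortment k until it has at least min_samples observations.
        while counts[k] < min_samples:
            totals[k] += sample_profit(k)
            counts[k] += 1
        return totals[k] / counts[k]

    lo, hi = 0, num_assortments - 1
    while lo < hi:
        step = int((hi - lo) * ratio)
        m1, m2 = lo + step, hi - step  # two interior probes with m1 < m2
        if average(m1) < average(m2):
            lo = m1 + 1  # the peak lies strictly to the right of m1
        else:
            hi = m2 - 1  # the peak lies strictly to the left of m2
    return lo, sum(counts)


if __name__ == "__main__":
    profits = [1.0, 3.0, 5.0, 7.0, 6.0, 4.0, 2.0]  # hypothetical unimodal values
    # Noise-free "customers" for illustration: each sample equals the true profit.
    print(sampling_golden_search(len(profits), lambda k: profits[k], min_samples=3))
```

With noise-free samples and a strictly unimodal sequence, each comparison discards a part of the range that cannot contain the peak, so the search ends at the maximizer. With real customers, the comparisons are only correct with high probability, which is what the choice of $W_{GRS}$ in Proposition 8 controls.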
4.2
An Adaptive Algorithm for Joint Parameter Estimation and Assortment Optimization
We now provide a detailed description of our dynamic policy for joint parameter estimation and assortment optimization, which we refer to as the Adaptive Assortment algorithm. As mentioned at the beginning of Section 4, our algorithm proceeds in cycles; each cycle consists of an exploration phase followed by an exploitation phase. In the exploration phase, we refine our estimates of the mean utilities by offering a collection of assortments to customers, observing their selections, and estimating the utilities from the sales data. Then, in the exploitation phase, we generate a sequence of assortments by applying the StaticMNL algorithm from Section 2.2, using our estimates of the mean utilities as a proxy for the true values. Given the sequence of assortments output by the StaticMNL algorithm, we then search for the one that gives the highest expected profit using the Sampling-Based Golden Ratio Search algorithm introduced in the previous section. Once we identify the "optimal" assortment, we offer it to a pre-specified number of customers to complete the exploitation phase. A formal description of the algorithm is given below. As we mentioned earlier, we will assume for simplicity that in each period, we offer a single assortment to a single customer.

Adaptive Assortment
INPUTS: A vector of marginal profits $w = (w_1, \ldots, w_N)$ and a fixed capacity $C$. For any $t \ge 1$, let $V_t$ denote the number of periods associated with the exploitation phase of the $t$th cycle.
INITIALIZATION: Construct a collection of assortments $\mathcal{E}$ so that for each $E \in \mathcal{E}$, $|E| = C$ and for each $i \ne j$, there exists $E \in \mathcal{E}$ such that $\{i,j\} \subseteq E$ (see Equation (4) for more details).
DESCRIPTION: The algorithm operates in cycles. Each cycle $t \ge 1$ consists of two phases that are defined as follows:
• Phase 1 (Exploration): This phase will consist of a total of $|\mathcal{E}|$ periods. For each assortment $E \in \mathcal{E}$, we offer it to a single customer and let $C_t(E) \in \{0\} \cup E$ denote the selection made by the customer.
Let $\hat\mu^t(E) = \left(\hat\mu^t_i(E) : i \in E\right)$ denote the maximum likelihood estimate of the mean utilities based on the selections $C_1(E), \ldots, C_t(E)$ made in Phase 1 of the previous $t$ cycles (see Equation (5)). Also, let $\hat\theta^t(E) = \left(\hat\theta^t_i(E) : i \in E\right)$ denote the estimated selection probabilities.
For any $i \ne j$, find $E \in \mathcal{E}$ such that $\{i,j\} \subseteq E$ and compute the estimated intersection point $\hat I^t_E(i,j)$ based on the estimated selection probabilities $\hat\theta^t(E)$. Also, estimate the ordering between $\mu_i$ and $\mu_j$ based on $\hat\theta^t_i(E)$ and $\hat\theta^t_j(E)$.
b t = it , j t : 0 ≤ it < j t ≤ N denote an ordering of the estimated intersection points Let D ` ` ` ` IbEt (i, j)’s, and let σ bt denote the ordering of the estimated utilities. • Phase 2 (Exploitation): Apply the StaticMNL algorithm from Section 2.2 using as inputs b t and σ bt based on our estimated utilities computed from the the estimated orderings D D E bt , . . . , A bt exploration phase. Let Abt = A denote the sequence of assortments generated 1 Lt by the StaticMNL algorithm. Using the collection of assortments Abt as inputs, apply the Sampling-Based Golden Ratio Search algorithm for the next Vt periods. OUTPUTS: Let Zit : t ≥ 1, 1 ≤ i ≤ |E| + Vt denote the sequence of profits generated, where Zit denote the profit in period i of the tth cycle. Let w∗ = max`=1,...,N w` . The following theorem establishes an upper bound on the difference between the optimal expected profit Z ∗ and the average expected profit under the Adaptive Assortment algorithm for any choice of cycle length Vt . Theorem 9. Under Assumptions 1 and 2, for any M ≥ 1, the difference between the maximum expected profit Z ∗ and the average expected profit per period after M cycles under the Adaptive Assortment algorithm is given by PM P|E|+Vt t P E Zi A3 M A1 A2 M t=1 i=1 ∗ t=1 ln Vt + PM + , ≤ PM Z − PM PM |E| M + t=1 Vt t=1 Vt t=1 Vt t=1 Vt where A1 = 40w
∗
N C
2
a1 a1 ln ξ2 ξ2
C+1 X ∞
−a2 ξ 4 t
Vt e
and
A2 = 5
t=1
N C
2
w∗ + Fg
and
A3 = Hg ,
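where the constants $F_g$ and $H_g$ are given in Proposition 8 while the constants $a_1$ and $a_2$ are given in Theorem 6, and $\xi$ is defined in Equation (6). As an immediate corollary of the above theorem, we can choose the cycle length $V_t$ so that $V_t = \exp\left\{ (a_2 \xi^4 - \epsilon) \cdot t \right\}$ for each $t$, for some $0 < \epsilon < a_2 \xi^4$. Then, we have that
\[
\sum_{t=1}^{\infty} V_t e^{-a_2 \xi^4 t} \;\leq\; \sum_{t=1}^{\infty} e^{-\epsilon t} \;\leq\; \frac{1}{1 - e^{-\epsilon}},
\]
which implies that $A_1 < \infty$. Moreover,
\[
\frac{\sum_{t=1}^{M} \ln V_t}{\sum_{t=1}^{M} V_t}
\;\leq\; \frac{\sum_{t=1}^{M} (a_2 \xi^4 - \epsilon)\, t}{\sum_{t=1}^{M} \exp\left\{ (a_2 \xi^4 - \epsilon)\, t \right\}}
\;\leq\; \frac{(a_2 \xi^4 - \epsilon) M^2}{\int_0^M \exp\left\{ (a_2 \xi^4 - \epsilon)\, t \right\} dt}
\;=\; \frac{(a_2 \xi^4 - \epsilon)^2 M^2}{\exp\left\{ (a_2 \xi^4 - \epsilon) M \right\} - 1}
\;=\; O\left( \frac{\log^2 \left( \sum_{t=1}^{M} V_t \right)}{\sum_{t=1}^{M} V_t} \right),
\]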
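which implies that, for large values of $T$, the average expected profit of our Adaptive Assortment algorithm after $T$ periods converges to $Z^*$ at the rate of $O\left( (\log T)^2 / T \right)$. This result is summarized in the following corollary.

Corollary 10. If we choose the cycle length $V_t$ so that $V_t = O\left( \exp\left\{ (a_2 \xi^4 - \epsilon) \cdot t \right\} \right)$ for all $t$, where $0 < \epsilon < a_2 \xi^4$, then the average expected profit after $T$ periods under the Adaptive Assortment algorithm converges to $Z^*$ at the rate of $O\left( (\log T)^2 / T \right)$.

We are now ready to give a proof of Theorem 9.

Proof. Let $\mathcal{A} = \langle A_0, A_1, \ldots, A_K \rangle$ denote the sequence of assortments generated by the StaticMNL algorithm under the true (yet unknown) mean utilities. Recall that $\hat{\mathcal{A}}^t = \langle \hat{A}^t_1, \ldots, \hat{A}^t_{L_t} \rangle$ denotes the output of the StaticMNL algorithm when the inputs are the estimated mean utilities computed from the observations of $t$ independent customers at the end of the $t$th exploration phase. The sequence $\hat{\mathcal{A}}^t$ is a random variable that depends on the customer selections. It follows immediately from Corollary 7 that
\[
\Pr\left\{ \hat{\mathcal{A}}^t = \mathcal{A} \right\} \;\geq\; 1 - 40 \left( \frac{N}{C} \right)^2 \left( \frac{a_1}{\xi^2} \ln \frac{a_1}{\xi^2} \right)^{C+1} e^{-a_2 \xi^4 t}.
\]
Since the profit earned in any period is always nonnegative, for each cycle $t = 1, 2, \ldots, M$,
\[
\sum_{i=1}^{|\mathcal{E}| + V_t} \mathbb{E}\left[ Z^t_i \right]
\;\geq\; \Pr\left\{ \hat{\mathcal{A}}^t = \mathcal{A} \right\} \sum_{i=|\mathcal{E}|+1}^{|\mathcal{E}| + V_t} \mathbb{E}\left[ Z^t_i \,\middle|\, \hat{\mathcal{A}}^t = \mathcal{A} \right]
\;\geq\; \Pr\left\{ \hat{\mathcal{A}}^t = \mathcal{A} \right\} \left( V_t Z^* - F_g - H_g \ln V_t \right),
\]
where the final inequality follows from the fact that when the event $\hat{\mathcal{A}}^t = \mathcal{A}$ occurs, the sequence of profits is unimodal by Proposition 4 and the performance guarantee of the Sampling-Based Golden Ratio Search (Proposition 8) holds. Using the lower bound on $\Pr\left\{ \hat{\mathcal{A}}^t = \mathcal{A} \right\}$, we obtain
\[
\begin{aligned}
\sum_{i=1}^{|\mathcal{E}| + V_t} \mathbb{E}\left[ Z^t_i \right]
&\geq V_t Z^* - F_g - H_g \ln V_t - 40 \left( \frac{N}{C} \right)^2 \left( \frac{a_1}{\xi^2} \ln \frac{a_1}{\xi^2} \right)^{C+1} V_t Z^* e^{-a_2 \xi^4 t} \\
&= \left( V_t + |\mathcal{E}| \right) Z^* - |\mathcal{E}| Z^* - F_g - H_g \ln V_t - 40 Z^* \left( \frac{N}{C} \right)^2 \left( \frac{a_1}{\xi^2} \ln \frac{a_1}{\xi^2} \right)^{C+1} V_t e^{-a_2 \xi^4 t} \\
&\geq \left( V_t + |\mathcal{E}| \right) Z^* - 40 w^* \left( \frac{N}{C} \right)^2 \left( \frac{a_1}{\xi^2} \ln \frac{a_1}{\xi^2} \right)^{C+1} V_t e^{-a_2 \xi^4 t} - 5 \left( \frac{N}{C} \right)^2 w^* - F_g - H_g \ln V_t,
\end{aligned}
\]
where the last inequality follows from the fact that $Z^* \leq w^*$. Therefore, the average expected profit per period after $M$ cycles satisfies
\[
\frac{\sum_{t=1}^{M} \sum_{i=1}^{|\mathcal{E}| + V_t} \mathbb{E}\left[ Z^t_i \right]}{|\mathcal{E}| M + \sum_{t=1}^{M} V_t}
\;\geq\; Z^* - \frac{A_1}{\sum_{t=1}^{M} V_t} - \frac{A_2 M}{\sum_{t=1}^{M} V_t} - \frac{A_3 \sum_{t=1}^{M} \ln V_t}{\sum_{t=1}^{M} V_t},
\]
which is the desired result.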
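The chain of inequalities behind Corollary 10 can be checked numerically. The sketch below is only an illustration: the function name is ours, and the rate c, which stands in for $a_2 \xi^4 - \epsilon$, is set to an arbitrary value because the constants $a_2$ and $\xi$ depend on the problem instance. It compares the ratio $\sum_t \ln V_t / \sum_t V_t$ with the closed-form bound $c^2 M^2 / (e^{cM} - 1)$ for the geometric schedule $V_t = e^{ct}$.

```python
import math


def schedule_check(c, num_cycles):
    """Return (ratio, bound) for the schedule V_t = exp(c * t), t = 1..num_cycles.

    ratio = sum_t ln(V_t) / sum_t V_t
    bound = c^2 M^2 / (exp(c M) - 1), the closed form from the corollary's derivation.
    The rate c is a hypothetical stand-in for a_2 * xi^4 - epsilon.
    """
    lengths = [math.exp(c * t) for t in range(1, num_cycles + 1)]
    ratio = sum(math.log(v) for v in lengths) / sum(lengths)
    bound = (c * num_cycles) ** 2 / (math.exp(c * num_cycles) - 1.0)
    return ratio, bound


for m in (5, 10, 20, 40):
    ratio, bound = schedule_check(c=0.5, num_cycles=m)
    print(f"M={m:2d}  ratio={ratio:.3e}  bound={bound:.3e}")
```

For every M the ratio stays below the bound, and both shrink rapidly as M grows, which is the behavior that the $O((\log T)^2 / T)$ rate summarizes.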
5. Extensions and Future Research

In this section, we discuss extensions of results from the previous section and additional ideas for improving the performance of our proposed Adaptive Assortment algorithm. These extensions will be incorporated in our implementation of the algorithm in the next section.

5.1 Allowing Linear-In-Parameters Utility Expression
Thus far, our model assumes that the mean utilities $\mu_i$ are independent and must be estimated individually for each product. In many applications, we often assume that the mean utility $\mu_i$ is given by a linear combination of the features associated with product $i$, that is, for $i = 1, 2, \ldots, N$,
\[
\mu_i = \sum_{\ell=1}^{K} \beta_\ell \phi_{i,\ell},
\]
where for each $i = 1, \ldots, N$, $\phi_i = (\phi_{i,1}, \ldots, \phi_{i,K}) \in \mathbb{R}^K$ is a $K$-dimensional vector that represents the features of the $i$th product. Examples of product features might include its price, customer reviews, or brand. We assume the feature vectors $\phi_1, \ldots, \phi_N$ are known in advance, so we only need to estimate the coefficients $\beta_1, \ldots, \beta_K$. When $K \leq C$, all of the results that we have developed in Sections 3 and 4 easily extend to this setting. We omit the details due to space constraints.

5.2 Allowing For the Same Utilities (Relaxing Assumption 1)
Recall that Assumption 1 requires that the utilities of all products and the corresponding intersection points are distinct. As mentioned before, this assumption is used primarily to simplify our discussion and to avoid degenerate cases. We emphasize that the correctness of the StaticMNL algorithm in Section 2.2 does not depend on Assumption 1. Also, by slightly perturbing the utility value or the marginal profit of each product, we can always guarantee that the problem instance satisfies this assumption.

5.3 Removing Bounds on the Utilities (Relaxing Assumption 2)
Assumption 2 requires that we know upper and lower bounds on the true mean utilities µi ’s. The reason for this assumption is purely theoretical. The sampling complexity bound derived in Theorems 5 and 6 requires that our likelihood function is bounded. In practice, this is almost never an issue since the likelihood function in our problem is strictly concave, and thus, the maximum is guaranteed to exist. By using a more complex argument and exploiting the concavity of the likelihood function, it is possible to relax Assumption 2. For more details, the reader is referred to Niemiro (1992).
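To make the role of the bounds concrete, the sketch below estimates the mean utilities for a single assortment from selection counts. It does not reproduce Equation (5); it relies instead on a standard fact about the MNL model with $\mu_0 = 0$ and one free parameter per product: the maximum likelihood estimate of the selection probabilities is the vector of empirical frequencies, so the estimated utility of product $i$ is $\ln(n_i / n_0)$, where $n_i$ and $n_0$ count the customers who chose product $i$ and the no-purchase option. The function name, the counts, and the interval [lo, hi] are hypothetical, and the clipping step is a crude safeguard rather than the constrained estimator analyzed in Theorems 5 and 6.

```python
import math


def mnl_utilities_from_counts(counts, no_purchase_count, lo=-10.0, hi=10.0):
    """Estimate mean utilities for one assortment from selection counts.

    counts: dict mapping product id -> number of customers who chose it.
    no_purchase_count: number of customers who chose option 0 (no purchase).
    lo, hi: hypothetical bounds playing the role of Assumption 2.
    """
    estimates = {}
    for product, n_i in counts.items():
        if n_i == 0:
            mu = lo  # ln(0) is -infinity, so fall back to the lower bound
        elif no_purchase_count == 0:
            mu = hi  # nobody declined to purchase, so the ratio is unbounded
        else:
            mu = math.log(n_i / no_purchase_count)
        estimates[product] = min(hi, max(lo, mu))
    return estimates


# Hypothetical counts for three products offered to 1,000 customers.
est = mnl_utilities_from_counts({1: 26, 2: 11, 3: 0}, no_purchase_count=963)
print(est)
```

The estimate is finite as soon as every option, including no purchase, has been selected at least once; the bounds only come into play in the remaining cases, such as product 3 above.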
6. Numerical Experiments

We describe the numerical experiments that we perform to validate our proposed algorithms. In Section 6.1, we describe our motivation, the dataset used in our experiment, and our model of the mean utilities. Then, in Section 6.2, we consider the static CAPACITATED MNL optimization problem, and contrast the optimal assortment with the assortment computed under the greedy heuristic of Gallego et al. (2004) and Liu and Van Ryzin (2004). Then, in Section 6.3, we assume the mean utilities are unknown and apply our proposed Adaptive Assortment algorithm.

6.1 Motivation, Dataset, and Model
Before we can evaluate the performance of both the StaticMNL and Adaptive Assortment algorithms, we need to identify a set of products and specify their mean utilities. To help us understand the range of utility values that we might encounter in actual applications, we estimate the utilities using data on DVD sales at a large online retailer. We consider DVDs that were sold during a three-month period from July 1, 2005 through September 30, 2005. During this period, the retailer sold over 4.3 million DVDs, spanning 51,764 DVD titles. To simplify our analysis, we restrict our attention to customers who have purchased DVDs that account for the top 33% of the total sales, and assume that each customer purchases at most one DVD. This gives us a total of 1,409,261 customers in our dataset. We consider 200 products, corresponding to the 200 best-selling DVDs. These 200 DVDs account for about 65% of the total sales among our customers. We observe that the best-selling DVD in our dataset was purchased by only about 2.6% of the customers. In fact, among the top ten best-selling DVDs, each one was sold to only around 1.1% - 2.6% of the customers. Thus, only a small fraction of the customers purchased each DVD. We assume that the mean utility of each DVD is a linear combination of its attributes; that is, for each $i = 1, 2, \ldots, 200$, the mean utility $\mu_i$ of the $i$th DVD is given by $\mu_i = \beta_0 + \sum_{\ell=1}^{K} \beta_\ell \phi_{i,\ell}$, where $\phi_i = (\phi_{i,1}, \ldots, \phi_{i,K})$ represents the features of the $i$th DVD. We set the utility associated with the option of no purchase to 0 ($\mu_0 = 0$). The attributes of each DVD that we consider are the average selling price and customer reviews. We obtain data on customer reviews of each DVD from the Amazon.com web site through a publicly available interface via Amazon.com E-Commerce Services (http://aws.amazon.com). Each visitor at the Amazon.com web site can provide a review and a rating for each DVD.
The rating is on a scale of 1 to 5, with 5 representing the most favorable review. Each review can be voted by other visitors as either “helpful” or “not helpful”. For each DVD, we consider all reviews up until June 30, 2005, and compute features such as the average
rating, the proportion of reviews that give a 5 rating, the average number of helpful votes received by each review, and so on. We then perform statistical model selection and parameter estimation to determine the most relevant attributes and the corresponding coefficients. We use the software BIOGEME developed by Bierlaire (2003) to estimate the coefficients and perform model selection. The two most relevant attributes are the total number of votes received by the reviews of the product and the average number of “helpful” votes per review. We estimate that for each DVD $i = 1, \ldots, 200$,
\[
\mu_i = -4.82 + 3.74 \times 10^{-5} \times \phi_{1,i} + 5.74 \times 10^{-3} \times \phi_{2,i} \qquad (8)
\]
where
$\phi_{1,i}$ = Total Number of Votes Received by All Reviews of DVD $i$,
$\phi_{2,i}$ = Average Number of Helpful Votes Received by Each Review of DVD $i$.

All estimated coefficients ($-4.82$, $3.74 \times 10^{-5}$, and $5.74 \times 10^{-3}$) are statistically significant with p-values of 0.00, 0.03, and 0.06, respectively. We also checked for any correlation between the two product features, and found no statistically significant correlation. Figure 4 shows a histogram of the estimated utility across the 200 best-selling DVDs, along with descriptive statistics. The average utility is approximately -4.67, with a maximum of -3.55 and a minimum of -4.82. We note that since the utility of the option of no purchase is set to 0 and the fraction of customers who purchase any one DVD is at most 2.6%, the mean utility of each DVD will be negative.

[Figure 4 plot. Title: Histogram of the Estimated Utility for 200 Best-Selling DVDs. Horizontal axis: Estimated Utility, in bins from below -4.80 to above -4.40. Vertical axis: % of Total DVDs (0% to 35%). Descriptive statistics shown in the plot: Average = -4.67, Std Dev = 0.18, Max = -3.55, Min = -4.82, Median = -4.72.]
Figure 4: Histogram of the estimated utility based on Equation (8) for the 200 best-selling DVDs.
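As a worked check of Equation (8), the sketch below evaluates the fitted utility for three DVDs, using the review statistics quoted in Section 6.2. The function name is ours, and the inputs are the rounded figures given in the text (“over 32,000” votes and “about” 487 and 901 votes), so the outputs are only approximate.

```python
def estimated_utility(total_votes, avg_helpful_votes):
    """Equation (8): mean utility from the two review-based features."""
    return -4.82 + 3.74e-5 * total_votes + 5.74e-3 * avg_helpful_votes


# (total votes on all reviews, average helpful votes per review), as quoted in Section 6.2.
examples = {
    "Star Wars Trilogy": (32000, 8.74),
    "Fraggle Rock - Complete First Season": (487, 157),
    "The Muppet Show - Season One": (901, 191),
}

for title, (votes, helpful) in examples.items():
    print(f"{title}: {estimated_utility(votes, helpful):.2f}")
```

The results are roughly -3.57, -3.90, and -3.69, in line with the utilities of -3.55, -3.90, and -3.69 listed in Table 2; the small gap for Star Wars Trilogy is consistent with 32,000 being a lower bound on its vote count.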
In our dataset, it turns out that the average selling price of each DVD is not a significant factor in determining customer purchases. We hypothesize that this occurs because for the 200 best-selling DVDs, the customers are not price sensitive and the sales volumes are driven primarily by customer reviews. Validating this hypothesis remains an open research problem. We emphasize that our model of the mean utility is quite simplistic and it is unlikely to capture all the details and factors that affect each customer’s purchasing decision. Rather, our goal is to obtain a rough estimate of the range of utility values that one might encounter in actual applications. Developing a more sophisticated choice model is a subject of ongoing research and is beyond the scope of this paper.

6.2 Static Optimization
Using the estimated utility of each DVD computed from our sales data (see Equation (8)), we first consider the optimal assortment under the StaticMNL algorithm. In this setting, we set the marginal profit $w_i$ of each DVD to its average selling price during the three-month period from which we collect the sales data. We then contrast the result with the assortment computed under the greedy algorithm of Gallego et al. (2004) and Liu and Van Ryzin (2004), which simply picks the $C$ most expensive DVDs. Table 2 contrasts the optimal assortment computed by the StaticMNL algorithm with the greedy assortment when we set the capacity to 10. In this case, the “greedy assortment” corresponds to the 10 most expensive DVDs. The optimal assortment yields an increase of about 14% (= (7.31 - 6.43)/6.43) in the expected profit. We note that the first seven DVDs in the optimal and the greedy assortments coincide. The remaining three DVDs in the optimal assortment consist of Star Wars Trilogy, Fraggle Rock (Season One), and The Muppet Show (Season One). Although their selling prices are lower than those of the 10 most expensive DVDs, they are quite popular and have high estimated utilities. The Star Wars Trilogy DVD had over 7,300 reviews and these reviews received over 32,000 votes, with an average of 8.74 helpful votes per review. This represents the largest number of votes received by any DVD in our dataset. Fraggle Rock and The Muppet Show received significantly fewer total votes (only about 487 and 901, respectively). However, a very large number of customers considered these reviews to be quite helpful, and these two DVDs garnered an average of 157 and 191 helpful votes per review, representing the two highest average numbers of helpful votes across all DVDs. These factors contribute to the selection of these three DVDs as part of the optimal assortment. Figure 5 shows the characteristics of the optimal assortment as the capacity $C$ increases from 1 to 20.
For each value of $C$, the greedy assortment corresponds to the $C$ most expensive DVDs. The solid-circle line in Figure 5 shows the improvement in the expected profit (relative to the greedy assortment) as the capacity $C$ increases. For small values of $C$, the increases in expected profit can be quite significant. The dash-diamond line in the figure shows the number of DVDs in the optimal assortments that are not in the greedy assortment.

Greedy Assortment (Top 10 Most Expensive DVDs): Expected Profit = $6.43

Title                                                                    Estimated Utility   Average Selling Price ($)
24 - Seasons 1-3                                                         -4.74               115.49
The Boston Red Sox 2004 World Series Collector’s Edition                 -4.74               92.03
Star Trek Enterprise - The Complete Second Season                        -4.70               91.67
Band of Brothers                                                         -4.47               79.35
The Lord of the Rings - The Motion Picture Trilogy                       -4.42               77.94
Six Feet Under - The Complete Fourth Season                              -4.71               70.12
The Sopranos - The Complete Fifth Season                                 -4.70               64.97
Shelley Duvall’s Faerie Tale Theatre - The Complete Collection Gift Set  -4.62               49.95
The O.C. - The Complete Second Season                                    -4.73               48.97
Thundercats - Season One, Volume One                                     -4.73               46.12

Optimal Assortment: Expected Profit = $7.31

Title                                                                    Estimated Utility   Average Selling Price ($)
24 - Seasons 1-3                                                         -4.74               115.49
The Boston Red Sox 2004 World Series Collector’s Edition                 -4.74               92.03
Star Trek Enterprise - The Complete Second Season                        -4.70               91.67
Band of Brothers                                                         -4.47               79.35
The Lord of the Rings - The Motion Picture Trilogy                       -4.42               77.94
Six Feet Under - The Complete Fourth Season                              -4.71               70.12
The Sopranos - The Complete Fifth Season                                 -4.70               64.97
Star Wars Trilogy                                                        -3.55               45.45
Fraggle Rock - Complete First Season                                     -3.90               31.49
The Muppet Show - Season One                                             -3.69               27.99

Table 2: Expected profits under the greedy and the optimal assortments.

6.3 Dynamic Optimization
In this section, we assume that the coefficients in the underlying utility model in Equation (8) are not known in advance, and we must adaptively estimate the coefficients $\beta_0$, $\beta_1$, and $\beta_2$ based on observed sales and assortment decisions. Figure 6 shows the running average expected profit over time under our proposed adaptive policy, when averaged over 100 problem instances, each with a capacity of 10. In this experiment, we assume that each period consists of a single customer and use the Sampling-Based Golden Ratio Search algorithm introduced in Section 4.1. We observe that the average expected profit approaches 90% of the optimal profit after we have offered assortments to about 30,000 customers, representing approximately two days of total sales (since we have 1,409,261 customers over a three-month period). In fact, the running average profit surpasses the expected profit of the greedy assortment after about one day of sales.
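The expected profits in Table 2, and hence the greedy benchmark against which the running average is compared, can be reproduced approximately from the tabulated utilities and prices. The sketch below assumes the usual MNL expected-profit expression with the no-purchase utility set to 0, that is, $\sum_{i \in S} w_i e^{\mu_i} / (1 + \sum_{i \in S} e^{\mu_i})$; the function and variable names are ours.

```python
import math


def expected_profit(assortment):
    """Expected profit per customer under MNL with no-purchase utility 0.

    assortment: list of (mean_utility, marginal_profit) pairs.
    """
    weights = [math.exp(mu) for mu, _ in assortment]
    numerator = sum(price * weight for (_, price), weight in zip(assortment, weights))
    return numerator / (1.0 + sum(weights))


# (estimated utility, average selling price) pairs copied from Table 2.
common = [(-4.74, 115.49), (-4.74, 92.03), (-4.70, 91.67), (-4.47, 79.35),
          (-4.42, 77.94), (-4.71, 70.12), (-4.70, 64.97)]
greedy = common + [(-4.62, 49.95), (-4.73, 48.97), (-4.73, 46.12)]
optimal = common + [(-3.55, 45.45), (-3.90, 31.49), (-3.69, 27.99)]

greedy_profit = expected_profit(greedy)
optimal_profit = expected_profit(optimal)
print(f"greedy  = {greedy_profit:.2f}")
print(f"optimal = {optimal_profit:.2f}")
print(f"greedy / optimal = {greedy_profit / optimal_profit:.3f}")
```

With the two-decimal utilities from the table, the greedy profit comes out at about $6.43 and the optimal profit within a cent or so of the reported $7.31, so the greedy assortment earns roughly 88% of the optimal profit.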
[Figure 5 plot. Title: Characteristics of the Optimal Assortment as the Capacity Increases. Horizontal axis: Capacity (1 to 20). Vertical axes: Improvement in Profit Relative to the Greedy Assortment (%), from 0.0% to 30.0%, and Number of DVDs in the Optimal Assortment That Are NOT in the Greedy Assortment, from 0 to 7.]
Figure 5: Improvements in the expected profit (relative to the greedy assortment) as the capacity increases, and the number of DVDs in the optimal assortment that do not appear in the greedy assortment.
7. Conclusion
We study a dynamic assortment optimization problem under a multinomial logit choice model and capacity constraints. We present an adaptive algorithm that combines parameter estimation and assortment optimization, and analyze its convergence properties. Numerical experiments on sales data from an online retailer indicate that the proposed policy performs well. There are several interesting future research directions. For example, it would be interesting to consider more realistic choice models. It would also be interesting to explore how we can incorporate additional constraints in the problem formulation, such as the total capital investment or inventory constraints.

Acknowledgement

The authors would like to thank Candi Yano for her helpful comments and suggestions on an earlier draft of our manuscript, which inspired us to consider the multinomial logit choice model. We are grateful for all of her help. We would like to thank Mike Todd for insightful comments on linear algebra and for pointers to implementations of the Golden Ratio Search and Newton-Raphson algorithms, and Yinyu Ye for helpful discussions on the problem. We also would like to thank Miao Song for all of her help with the BIOGEME software. This research is supported in part by the National Science Foundation through grants DMS-0732196, CMMI-0621433, and CMMI-0727640.
[Figure 6 plot. Title: Run. Avg. Expected Profit Under the Adaptive Assortment Algorithm for 200 Best Selling DVDs, Capacity = 10, 100 Problem Instances (dashed lines around the solid line represent the 95% confidence interval). Horizontal axis: Number of Customers (thousands), from 0 to 100. Vertical axis: Run. Avg. Profit (% of Optimal), from 85.0% to 94.0%. Annotations: Expected Profit Under Greedy Assortment; Avg. # of Customers Per Day ≈ 15,000.]
Figure 6: Running average expected profit under the Adaptive Assortment algorithm.
References

Agrawal, R. 1995. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability 27:1054–1078.
Alexouda, G., and K. Paparizos. 2001. A Genetic Algorithm Approach to the Product Line Design Problem Using the Seller’s Return Criterion: An Extensive Comparative Computational Study. European Journal of Operational Research 134:165–178.
Anantharam, V., P. Varaiya, and J. Walrand. 1987a. Asymptotically efficient adaptive allocation rules for the multi-armed bandit problem with multiple plays, Part I: I.I.D. rewards. IEEE Transactions on Automatic Control AC-32 (11): 968–976.
Anantharam, V., P. Varaiya, and J. Walrand. 1987b. Asymptotically efficient adaptive allocation rules for the multi-armed bandit problem with multiple plays, Part II: Markovian rewards. IEEE Transactions on Automatic Control AC-32 (11): 977–982.
Anderson, S., A. de Palma, and J. F. Thisse. 1992. Discrete choice theory of product differentiation. MIT Press.
Auer, P., N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. 1995. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms.
Auer, P., N. Cesa-Bianchi, and P. Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47:235–256.
Ben-Akiva, M., and S. Lerman. 1985. Discrete choice analysis: Theory and application to travel demand. MIT Press.
Berry, D. A., R. W. Chen, A. Zame, D. C. Heath, and L. A. Shepp. 1997. Bandit Problems With Infinitely Many Arms. The Annals of Statistics 25 (5): 2103–2116.
Bierlaire, M. 2003. BIOGEME: A free package for the estimation of discrete choice models. In Proceedings of the 3rd Swiss Transportation Research Conference.
Bultez, A., E. Gijsbrechts, P. Naert, and P. vanden Abeele. 1989. Asymmetric Cannibalism in Retail Assortments. Journal of Retailing 65 (2): 153–192.
Bultez, A., and P. Naert. 1988. SHARP: Shelf Allocation for Retailers Profit. Marketing Science 7:211–231.
Cachon, G., C. Terwiesch, and Y. Xu. 2005. Assortment Planning in the Presence of Consumer Search. Manufacturing and Service Operations Management 7:330–346.
Caro, F., and J. Gallien. 2006. Dynamic Assortment with Demand Learning for Seasonal Consumer Goods. To appear in Management Science.
Chen, K. D., and W. H. Hausman. 2001. Technical Note - Mathematical Properties of the Optimal Product Line Selection Problem Using Choice-Based Conjoint Analysis. Management Science 46 (2): 327–332.
Chong, J. K., T. H. Ho, and C. S. Tang. 2001. A modeling framework for category assortment planning. Manufacturing Service Operation Management 3 (3): 191–210.
Corstjens, M., and P. Doyle. 1981. A Model for Optimizing Retail Space Allocations. Journal of Marketing 27:822–833.
Corstjens, M., and P. Doyle. 1983. A Dynamic Model for Strategically Allocating Retail Space. Journal of the Operational Research Society 34 (10): 943–951.
Cover, T. M., and J. A. Thomas. 2006. Elements of information theory. Wiley.
Dobson, G., and S. Kalish. 1993. Heuristics for Pricing and Positioning a Product Line Using Conjoint and Cost Data. Management Science 39:160–175.
Dreze, X., S. J. Hoch, and M. E. Purk. 1994. Shelf Management and Space Elasticity. Journal of Retailing 70:301–326.
Eckstein, Z., and K. I. Wolpin. 1989. The Specification and Estimation of Dynamic Stochastic Discrete Choice Models: A Survey. The Journal of Human Resources 24 (4): 562–598.
Fligler, A., G. E. Fruchter, and R. Winer. 2006. Optimal Product Line Design: A Genetic Algorithm Approach to Mitigate Cannibalization. To appear in Journal of Optimization Theory and Applications.
Gallego, G., G. Iyengar, R. Phillips, and A. Dubey. 2004. Managing Flexible Products on a Network. Working Paper.
Gaur, V., and D. Honhon. 2006. Assortment Planning and Inventory Decisions Under a Locational Choice Model. Management Science 52 (10): 1528–1543.
Green, P. E., and A. M. Krieger. 1985. Models and Heuristics for Product Line Selection. Marketing Science 4 (1): 1–19.
Haussler, D. 1992. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications. Information and Computation 100.
Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58:13–30.
Honhon, D., V. Gaur, and S. Seshadri. 2006. Assortment Planning and Inventory Management Under Dynamic Substitution. Working paper.
Hotz, V. J., and R. A. Miller. 1993. Conditional Choice Probabilities and the Estimation of Dynamic Models. The Review of Economic Studies 60 (3): 497–529.
Keane, M. P., and K. I. Wolpin. 1994. The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence. The Review of Economics and Statistics 76 (4): 648–672.
Kleinberg, R. 2004. Nearly Tight Bounds for the Continuum-Armed Bandit Problem. In Advances in Neural Information Processing Systems.
Kleinberg, R. 2005. Online decision problems with large strategy sets. Ph.D. Dissertation, MIT.
Kohli, R., and R. Sukumar. 1990. Heuristics for Product-Line Design Using Conjoint Analysis. Management Science 36 (12): 1464–1478.
Kok, A. G., and M. Fisher. 2004. Demand Estimation and Assortment Optimization Under Substitution: Methodology and Application. To appear in Operations Research.
Kok, A. G., M. Fisher, and R. Vaidyanathan. 2006. Assortment planning: Review of literature and industry practice. In Retail Supply Chain Management: Kluwers.
Lai, T. L. 1987. Adaptive Treatment Allocation and the Multi-Armed Bandit Problem. The Annals of Statistics 15 (3): 1091–1114.
Lai, T. L., and H. Robbins. 1985. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6:4–22.
Levi, R., R. O. Roundy, and D. B. Shmoys. 2005. Computing Provably Near-Optimal Sampling-Based Policies for Stochastic Inventory Control Models. Technical Report 1427, School of OR&IE, Cornell University. To appear in MOR.
Liu, Q., and G. Van Ryzin. 2004. On the Choice-Based Linear Programming Model for Network Revenue Management. To appear in MSOM.
Mahajan, S., and G. V. Ryzin. 2001. Stocking Retail Assortments under Dynamic Substitution. Operations Research 49:334–351.
Mahajan, S., and G. van Ryzin. 1998. Retail inventories and consumer choice. In Quantitative Models for Supply Chain Management, ed. S. Tayur, R. Ganeshan, and M. Magazine, 491–551: Kluwer Academic Publishers.
McBride, R. D., and S. Zufryden. 1988. An Integer Programming Approach To the Optimal Product Line Selection Problem. Marketing Science 7:126–140.
McFadden, D. 1974. Conditional Logit Analysis of Qualitative Choice Behavior. In Frontiers in Econometrics, 105–142: Academic Press.
Megiddo, N. 1979. Combinatorial Optimization with Rational Objective Functions. Mathematics of Operations Research 4 (4): 414–424.
Nair, S. K., L. S. Thakur, and K. Wen. 1995. Near Optimal Solutions For Product Line Design And Selection: Beam Search Heuristics. Management Science 41 (5): 767–785.
Niemiro, W. 1992. Asymptotics for M-Estimators Defined by Convex Minimization. The Annals of Statistics 20 (3): 1514–1533.
O’Connell, P. 2005. Shaping Cars for the Chinese. Business Week. November 21. Available at http://www.businessweek.com/magazine/content/05_47/b3960010.htm.
Paap, R., and P. H. Franses. 2000. A Dynamic Multinomial Probit Model for Brand Choice with Different Long-Run and Short-Run Effects of Marketing-Mix Variables. Journal of Applied Econometrics 15 (6): 717–744.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, et al. 1999. Numerical recipes in C, the art of scientific computing. Cambridge University Press.
Ryzin, G. V., and S. Mahajan. 1999. On the Relationship Between Inventory Costs and Variety Benefits in Retail Assortments. Management Science 45:1496–1509.
Shapiro, A. 2003. Stochastic Programming. In Handbook in Operations Research and Management Science: Elsevier.
Smith, S. 2006. Optimizing Retail Assortments for Diverse Preferences. Working paper.
Smith, S. A., and N. Agrawal. 2000. Management of Multi-item Retail Inventories Systems with Demand Substitution. Operations Research 48:50–64.
Swamy, C., and D. B. Shmoys. 2005. Sampling-based Approximation Algorithms for Multi-stage Stochastic Optimization. In Proceedings of the 46th Annual IEEE Symposium on the Foundations of Computer Science.
Yano, C. A., and G. Dobson. 1998. Profit Optimizing Product Line Design, Selection and Pricing with Manufacturing Cost Considerations: A Survey. In Product Variety Management: Research and Advances, ed. T.-H. Ho and C. Tang, 145–176: Kluwer Academic Publisher.
Zimmerman, A. 2006. Thinking Local. To Boost Sale, Wal-Mart Drops One-Size-Fits-All Approach. Wall Street Journal. September 7, 2006.
A. Proof of Proposition 3

The proof of Proposition 3 makes use of the following four lemmas, which give us additional insight into the properties of the sets generated in each iteration of the StaticMNL(v, w, C) algorithm. The first lemma (whose proof appears in Appendix A.1) provides a simple characterization of the ordering of the intersection points among any three lines.

Lemma A.1. Under Assumption 1, for all $0 \leq i < j < k \leq N$, one of the following conditions must hold: either $I(i, j) < I(i, k) < I(j, k)$ or $I(j, k) < I(i, k) < I(i, j)$.
The result of Lemma A.1 allows us to establish the following characterization of the sequence of permutations generated by the StaticMNL(v, w, C) algorithm. We note that given a permutation $\tau : \{1, \ldots, N\} \to \{1, \ldots, N\}$, $x$ is adjacent to $y$ under $\tau$ if $\left| \tau^{-1}(x) - \tau^{-1}(y) \right| = 1$. The proof of this lemma appears in Appendix A.2.
Lemma A.2. Let $\sigma^0, \ldots, \sigma^K$ denote the sequence of permutations generated by the StaticMNL(v, w, C) algorithm. Then, under Assumption 1, for each $t = 1, 2, \ldots, K$, either $\sigma^t = \sigma^{t-1}$ or $\sigma^t$ is obtained by transposing two adjacent items under $\sigma^{t-1}$.

The next lemma shows that if a set $G_t$ (corresponding to the first $C$ elements in the permutation $\sigma^t$) is distinct from all the previous sets, then one of its elements must be strictly smaller. The proof of this result appears in Appendix A.3.

Lemma A.3. Let $G_0, \ldots, G_K$ denote the sequence of sets generated by the StaticMNL(v, w, C) algorithm. Under Assumption 1, for each $t = 1, 2, \ldots, K$, if $G_t \neq G_{t-1}$, then $G_t$ is obtained from $G_{t-1}$ by a transposition between $j_t = \sigma^{t-1}_C$ and $i_t = \sigma^{t-1}_{C+1}$ with $0 < i_t < j_t$, that is, $G_t = \left( G_{t-1} \setminus \{j_t\} \right) \cup \{i_t\}$ and $\sum_{\ell \in G_t} \ell < \sum_{\ell \in G_{t-1}} \ell$.

The next lemma shows that the size of the assortments generated by the StaticMNL algorithm is nonincreasing and establishes the number of distinct assortments with sizes less than $C$. The proof appears in Appendix A.4.

Lemma A.4. Let $A_0, \ldots, A_K$ denote the sequence of assortments generated by the StaticMNL(v, w, C) algorithm. Then, under Assumption 1, for each $s < C$, there is exactly one distinct assortment of size $s$.

We are now ready to give a proof of Proposition 3.
Proof. Let $A_0, \ldots, A_K$ denote the sequence of assortments generated by the StaticMNL(v, w, C) algorithm. By Lemma A.4, we know that there are exactly $C - 1$ distinct non-empty assortments of size $C - 1$ or less. Thus, it suffices to count the number of distinct assortments of size $C$. Note that if $A_t$ is an assortment of size $C$, it must be the case that $A_t = G_t$. Therefore, the number of distinct assortments of size $C$ is bounded above by the number of distinct subsets among $G_0, G_1, \ldots, G_K$. Under Assumption 1, we know that, among $h_1(\cdot), \ldots, h_N(\cdot)$, each of the $\binom{N}{2}$ pairs of lines will intersect each other. Moreover, by Assumption 1, we have that $\sigma^0 = (N, N-1, \ldots, 2, 1)$, corresponding to the order of the $h_i(\lambda)$’s at $\lambda = -\infty$, and $\sigma^K = (1, 2, \ldots, N-1, N)$ for $\lambda = +\infty$. Note that
\[
\begin{aligned}
\sum_{\ell \in G_0} \ell - \sum_{\ell \in G_K} \ell
&= \{N + (N-1) + \cdots + (N-C+1)\} - \{1 + 2 + \cdots + C\} \\
&= NC - 2\left( 1 + 2 + \cdots + (C-1) \right) - C = NC - C(C-1) - C = C(N-C).
\end{aligned}
\]
By Lemma A.3, whenever $G_t$ is distinct from $G_{t-1}$, the total value $\sum_{\ell \in G_t} \ell$ is strictly less than $\sum_{\ell \in G_{t-1}} \ell$. Thus, in addition to $G_0$, there can be at most $C(N-C)$ distinct subsets among the $G_t$’s. Therefore, the number of distinct subsets among $G_0, G_1, \ldots, G_K$ is at most $C(N-C) + 1$. Thus, the maximum number of distinct non-empty assortments is at most $C(N-C) + 1 + (C-1) = C(N-C+1)$, which is the desired result.
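The arithmetic in this proof is easy to verify mechanically. The following sketch, with function names of our own choosing, checks the identity for the gap between the label sums of $G_0$ and $G_K$ and the final count $C(N-C+1)$ over a range of values of $N$ and $C$.

```python
def sum_gap(n, c):
    """Sum of the C largest labels minus the sum of the C smallest labels in {1, ..., N}."""
    top = sum(range(n - c + 1, n + 1))  # N + (N-1) + ... + (N-C+1)
    bottom = sum(range(1, c + 1))       # 1 + 2 + ... + C
    return top - bottom


def max_distinct_assortments(n, c):
    """Bound from the proof: C(N-C) + 1 sets of size C plus C - 1 smaller assortments."""
    return c * (n - c) + 1 + (c - 1)


for n in range(2, 30):
    for c in range(1, n + 1):
        assert sum_gap(n, c) == c * (n - c)
        assert max_distinct_assortments(n, c) == c * (n - c + 1)

print(max_distinct_assortments(200, 10))
```

For the setting of Section 6, with N = 200 and C = 10, the bound evaluates to 1,910 candidate assortments.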
A.1 Proof of Lemma A.1
Figure 7: A geometric proof of Lemma A.1. (The figure shows the lines $h_i(\cdot)$ and $h_k(\cdot)$, two dashed options for the line $h_j(\cdot)$, labelled Option 1 and Option 2, and the intersection points $I(i,j)$, $I(i,k)$, and $I(j,k)$.)

Proof. As shown in Figure 7, the proof of this lemma follows from a simple geometric intuition. By Assumption 1, we have that $v_i < v_j < v_k$, which implies that the line $h_i(\cdot)$ will intersect with $h_k(\cdot)$. Since $v_j$ is between $v_i$ and $v_k$, there are two possible options for the line $h_j(\cdot)$, shown as the two dashed lines in Figure 7. In both cases, we observe that $I(i,k)$ is always between $I(i,j)$ and $I(j,k)$, giving the desired result. A more algebraic proof is also straightforward. We omit the details due to space constraints.
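One way to see the omitted algebra, offered here as an illustration rather than as the authors' argument: with $I(i,j) = (v_i w_i - v_j w_j)/(v_i - v_j)$, the point $I(i,k)$ is a weighted average of $I(i,j)$ and $I(j,k)$ with weight $\alpha = (v_j - v_i)/(v_k - v_i) \in (0,1)$, so it must lie between them. The sketch below checks this identity on random triples; the function names and the sampling ranges are hypothetical.

```python
import random

def intersection(vi, wi, vj, wj):
    """lambda at which h_i(lambda) = v_i (w_i - lambda) meets h_j(lambda)."""
    return (vi * wi - vj * wj) / (vi - vj)

def lemma_a1_holds(v, w, tol=1e-9):
    """v = (v_i, v_j, v_k) strictly increasing; w holds the matching profits."""
    (vi, vj, vk), (wi, wj, wk) = v, w
    Iij = intersection(vi, wi, vj, wj)
    Ijk = intersection(vj, wj, vk, wk)
    Iik = intersection(vi, wi, vk, wk)
    # I(i,k) = alpha * I(i,j) + (1 - alpha) * I(j,k), alpha = (v_j - v_i)/(v_k - v_i).
    alpha = (vj - vi) / (vk - vi)
    convex = alpha * Iij + (1 - alpha) * Ijk
    scale = max(1.0, abs(Iij), abs(Ijk))
    between = min(Iij, Ijk) - tol * scale <= Iik <= max(Iij, Ijk) + tol * scale
    return between and abs(convex - Iik) <= tol * scale

random.seed(0)
checks = []
for _ in range(1000):
    vi = random.uniform(0.1, 1.0)
    vj = vi + random.uniform(0.1, 1.0)   # enforce v_i < v_j < v_k with a gap
    vk = vj + random.uniform(0.1, 1.0)
    w = tuple(random.uniform(1.0, 10.0) for _ in range(3))
    checks.append(lemma_a1_holds((vi, vj, vk), w))
print(all(checks))
```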
A.2
Proof of Lemma A.2
Proof. We will prove the lemma by induction on $t$. Consider the case when $t = 1$. By Assumption 1, we know that $\sigma^0 = (N, N-1, \ldots, 2, 1)$. We want to show that either $\sigma^1 = \sigma^0$ or $\sigma^1$ is obtained by transposing two adjacent products. Suppose that $\sigma^1 \neq \sigma^0$. It then follows from the definition that $\sigma^1$ is obtained from $\sigma^0$ by a transposition of $i_1$ and $j_1$, where $I(i_1, j_1)$ corresponds to the smallest intersection point. Since this is the first intersection point, it follows from Lemma A.1 that we must have $j_1 = i_1 + 1$, which is the desired result.

Now, suppose that the result holds for $\ell = 1, 2, \ldots, t$; that is, for each $s = 1, \ldots, t$, $\sigma^s$ is either equal to $\sigma^{s-1}$ or is obtained from $\sigma^{s-1}$ by transposing two adjacent products. We will now establish the result for $\sigma^{t+1}$ via proof by contradiction.

Suppose on the contrary that $\sigma^{t+1} \neq \sigma^t$ and $\sigma^{t+1}$ is obtained from $\sigma^t$ by transposing $i_t$ and $j_t$, which are not adjacent under $\sigma^t$. This implies there is another product $m$ that is between $i_t$ and $j_t$ under $\sigma^t$, that is, $i_t = \sigma_u^t$, $m = \sigma_v^t$, and $j_t = \sigma_w^t$ where either $1 \le u < v < w \le N$ or $1 \le w < v < u \le N$.

We first claim that we cannot have $1 \le u < v < w \le N$. This follows because we know from Assumption 1(a) that $\sigma^0 = (N, N-1, \ldots, 2, 1)$. If $1 \le u < v < w \le N$ holds, this implies that we have $\sigma^t = (\ldots, i_t, \ldots, m, \ldots, j_t, \ldots)$. Since $i_t < j_t$ and the transpositions in all of the previous iterations involve adjacent items by induction, it must be the case that $i_t$ and $j_t$ have switched places once before, that is, we must have already encountered the intersection point $I(i_t, j_t)$ in the earlier iterations. Contradiction!

So, we have $1 \le w < v < u \le N$, which implies that $\sigma^t = (\ldots, j_t, \ldots, m, \ldots, i_t, \ldots)$. By construction, we know that $i_t < j_t$. Thus, there are three cases to consider for the value of $m$.

Case 1: $m < i_t < j_t$. In this case, $m$ is smaller than $i_t$ but appears earlier in the ordering $\sigma^t$. Since $\sigma^0 = (N, N-1, \ldots, 2, 1)$ and all previous transpositions involve adjacent items by induction, it must be the case that $i_t$ and $m$ interchanged their positions in an earlier iteration, implying that $I(m, i_t) < I(i_t, j_t)$. It thus follows from Lemma A.1 that $I(m, i_t) < I(m, j_t) < I(i_t, j_t)$, implying that we must have already encountered the intersection point $I(m, j_t)$ before this iteration. By induction, this means that $m$ and $j_t$ should have interchanged their places and we must have $m$ before $j_t$ (note that $m \ge 1$). This contradicts our definition of $\sigma^t$!

Case 2: $i_t < m < j_t$. In this case, it follows from Lemma A.1 that either

$$I(i_t, m) < I(i_t, j_t) < I(m, j_t) \quad \text{or} \quad I(m, j_t) < I(i_t, j_t) < I(i_t, m).$$

We will show that either one of the above conditions leads to a contradiction. Suppose that $I(i_t, m) < I(i_t, j_t) < I(m, j_t)$. This implies that we must have encountered the intersection point $I(i_t, m)$ before the current iteration. Therefore, by induction, $m$ and $i_t$ should have switched places and we must have $m$ after $i_t$ in $\sigma^t$. This again contradicts our definition of $\sigma^t$! Now, suppose that $I(m, j_t) < I(i_t, j_t) < I(i_t, m)$. Then, we must have encountered the intersection point $I(m, j_t)$ before the current iteration. Therefore, by induction, $m$ and $j_t$ should have switched places and we must have $m$ before $j_t$ in $\sigma^t$. This again contradicts our definition of $\sigma^t$!

Case 3: $i_t < j_t < m$. The proof for this case is similar to Case 1 and we omit the details.

Thus, all three cases lead to contradictions. Therefore, it must be the case that $\sigma^{t+1}$ is obtained from $\sigma^t$ by transposing two adjacent products. This completes the induction.
A.3
Proof of Lemma A.3
Proof. By definition, if $G^t \neq G^{t-1}$, it must be the case that we encounter an intersection point $I(i_t, j_t)$ with $0 < i_t < j_t$. In this case, $\sigma^t$ is obtained from $\sigma^{t-1}$ by transposition of $i_t$ and $j_t$. Moreover, we know from Lemma A.2 that $i_t$ is adjacent to $j_t$. Note that $G^t$ and $G^{t-1}$ correspond to the first $C$ elements of the orderings $\sigma^t$ and $\sigma^{t-1}$, respectively. Thus, in order for $G^t$ to be different from $G^{t-1}$, it must be the case that either 1) $\sigma_C^{t-1} = j_t$ and $\sigma_{C+1}^{t-1} = i_t$, or 2) $\sigma_C^{t-1} = i_t$ and $\sigma_{C+1}^{t-1} = j_t$.

We will first show that option 2) is not feasible. Recall that under Assumption 1, we have $\sigma^0 = (N, N-1, \ldots, 2, 1)$. Since $i_t < j_t$ and every transposition occurs only between adjacent items (by Lemma A.2), if $\sigma_C^{t-1} = i_t$ and $\sigma_{C+1}^{t-1} = j_t$, then it must be the case that $i_t$ and $j_t$ have interchanged places before; that is, we have already encountered the intersection point $I(i_t, j_t)$ in the earlier iterations. This is a contradiction!

Therefore, we can only have $\sigma_C^{t-1} = j_t$ and $\sigma_{C+1}^{t-1} = i_t$. In this case, we have $G^t = G^{t-1} \setminus \{j_t\} \cup \{i_t\}$, which implies that $\sum_{\ell \in G^t} \ell - \sum_{\ell \in G^{t-1}} \ell = i_t - j_t < 0$, which is the desired result.
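Lemmas A.2 and A.3 can be illustrated with a small sweep. The sketch below is a numerical check under assumptions, not part of the proofs: it labels products $1, \ldots, N$ with $v_1 < \cdots < v_N$, assumes $h_i(\lambda) = v_i(w_i - \lambda)$, and records the ordering of the lines on each interval between consecutive pairwise intersection points. The names `sweep_orderings` and `is_adjacent_swap` and the random instance are hypothetical.

```python
import itertools
import random

def sweep_orderings(v, w):
    """Orderings sigma^0, sigma^1, ... of products 1..N by decreasing h_i(lambda),
    one per interval between consecutive pairwise intersection points."""
    n = len(v)
    pts = sorted((v[i] * w[i] - v[j] * w[j]) / (v[i] - v[j])
                 for i, j in itertools.combinations(range(n), 2))
    probes = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    return [sorted(range(1, n + 1),
                   key=lambda i: v[i - 1] * (w[i - 1] - lam), reverse=True)
            for lam in probes]

def is_adjacent_swap(s, t):
    """True if t is s with exactly one pair of neighbouring entries exchanged."""
    diff = [p for p in range(len(s)) if s[p] != t[p]]
    return (len(diff) == 2 and diff[1] == diff[0] + 1
            and s[diff[0]] == t[diff[1]] and s[diff[1]] == t[diff[0]])

# Hypothetical instance in general position, with v sorted increasingly.
random.seed(1)
N, C = 6, 3
v = sorted(random.uniform(0.1, 2.0) for _ in range(N))
w = [random.uniform(1.0, 10.0) for _ in range(N)]

orderings = sweep_orderings(v, w)
swaps_ok = all(is_adjacent_swap(s, t) for s, t in zip(orderings, orderings[1:]))
label_sums = [sum(s[:C]) for s in orderings]   # sum of labels in G^t
print(swaps_ok, label_sums)
```

The label sums of the top-$C$ sets should never increase along the sweep, and their total drop from first to last ordering should equal $C(N-C)$, which is the quantity used in the proof of Proposition 3.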
A.4
Proof of Lemma A.4
Proof. By definition, $|A^0| = C$. Let $\theta$ denote the smallest index such that $|A^\theta| < C$. Note that by definition, $|A^s| = C$ for all $0 \le s < \theta$, which implies that $A^s = G^s$. Thus, by Lemma A.3, if $A^s \neq A^{s-1}$, we must have $A^s = A^{s-1} \setminus \{j_s\} \cup \{i_s\}$. To complete the proof of Lemma A.4, it suffices to show the following two results:

(a) $A^\theta = A^{\theta-1} \setminus \{j_\theta\}$

(b) For any $1 \le t \le K$, if $|A^{t-1}| < C$, then either $A^t = A^{t-1}$ or $A^t = A^{t-1} \setminus \{j_t\}$

We will first prove part (a). The set $A^\theta$ is created when we encounter the $\theta$th intersection point $I(i_\theta, j_\theta)$. We claim that we must have $i_\theta = 0$. Suppose on the contrary that we have $0 < i_\theta < j_\theta$. In this case, $B^\theta = B^{\theta-1}$. Thus, for $|A^\theta| < C = |A^{\theta-1}|$, it must be the case that $G^\theta \neq G^{\theta-1}$. It follows from Lemma A.3 that $G^\theta$ is obtained from $G^{\theta-1}$ by transposition of $j_\theta = \sigma_C^{\theta-1}$ and $i_\theta = \sigma_{C+1}^{\theta-1}$ with $0 < i_\theta < j_\theta$. Since $|A^\theta| < C$, it follows that $B^\theta \ni i_\theta$, which implies that we must have already encountered the intersection $I(0, i_\theta)$ earlier. By Lemma A.1,

$$I(0, i_\theta) < I(0, j_\theta) < I(i_\theta, j_\theta),$$

which implies that the intersection point $I(0, j_\theta)$ appeared before the current iteration, and thus, $B^{\theta-1} \ni j_\theta$, which implies that $|A^{\theta-1}| < C$. Contradiction! Therefore, $i_\theta = 0$. Then, we know that $\sigma^\theta = \sigma^{\theta-1}$, $G^\theta = G^{\theta-1}$, and $B^\theta = B^{\theta-1} \cup \{j_\theta\}$. Since $|A^{\theta-1}| = C > |A^\theta|$, $G^{\theta-1} \cap B^{\theta-1} = \emptyset$ and $G^\theta \cap B^\theta \neq \emptyset$. Since only $j_\theta$ is added to the set $B^{\theta-1}$, we have $G^\theta \cap B^\theta = \{j_\theta\}$, and therefore,

$$A^\theta = G^\theta \setminus B^\theta = G^\theta \setminus \{j_\theta\} = G^{\theta-1} \setminus \{j_\theta\} = A^{\theta-1} \setminus \{j_\theta\},$$

which is the desired result.

To prove part (b), consider $A^{t-1}$ such that $|A^{t-1}| = r < C$ for some $r$. Note that by definition, $\sigma^{t-1} = (\sigma_1^{t-1}, \ldots, \sigma_N^{t-1})$ represents the ordering of the lines $h_1(\cdot), \ldots, h_N(\cdot)$ during the interval $[I(i_{t-1}, j_{t-1}), I(i_t, j_t))$ with

$$h_{\sigma_1^{t-1}}(I(i_t, j_t)) \ge h_{\sigma_2^{t-1}}(I(i_t, j_t)) \ge \cdots \ge h_{\sigma_N^{t-1}}(I(i_t, j_t)). \qquad (9)$$

Since $G^{t-1} = \{\sigma_1^{t-1}, \ldots, \sigma_C^{t-1}\}$ and $|A^{t-1}| = r < C$, $\sigma_{r+1}^{t-1} \in B^{t-1}$, which implies that

$$h_{\sigma_r^{t-1}}(I(i_t, j_t)) \ge 0 > h_{\sigma_{r+1}^{t-1}}(I(i_t, j_t)) \ge \cdots \ge h_{\sigma_N^{t-1}}(I(i_t, j_t)).$$

Since the functions $h_i(\cdot)$ are strictly decreasing, we have $B^{t-1} = \{\sigma_{r+1}^{t-1}, \sigma_{r+2}^{t-1}, \ldots, \sigma_N^{t-1}\}$.

We will now show that either $A^t = A^{t-1}$ or $A^t = A^{t-1} \setminus \{j_t\}$. Consider the $t$th intersection point $I(i_t, j_t)$. There are two cases to consider: $i_t = 0$ and $i_t \ge 1$.

Suppose that $i_t = 0$. Then, we claim that $j_t = \sigma_r^{t-1}$. Since $B^{t-1} = \{\sigma_{r+1}^{t-1}, \sigma_{r+2}^{t-1}, \ldots, \sigma_N^{t-1}\}$, we know that $j_t \in \{\sigma_1^{t-1}, \ldots, \sigma_r^{t-1}\}$. If, on the contrary, $j_t = \sigma_k^{t-1}$ where $k < r$, this means that $I(0, \sigma_k^{t-1}) < I(0, \sigma_r^{t-1})$; that is, the value associated with the line $h_{\sigma_r^{t-1}}$ remains nonnegative. However, by the ordering in (9), it must be the case that $0 > h_{\sigma_k^{t-1}}(I(i_t, j_t)) \ge h_{\sigma_r^{t-1}}(I(i_t, j_t))$. Contradiction! Therefore, $j_t = \sigma_r^{t-1}$, which implies that $A^t = A^{t-1} \setminus \{\sigma_r^{t-1}\}$, which is the desired result.

On the other hand, suppose that $i_t \ge 1$. In this case, we have $B^t = B^{t-1}$. We claim that we must either have $\{i_t, j_t\} \subset B^t$ or $\{i_t, j_t\} \subset (B^t)^c$; that is, either both elements are in $B^t$ or both are not. This will show that $A^t = A^{t-1}$, which is the desired result. To prove this, note that since $0 < i_t < j_t$, it follows from Lemma A.1 that either $I(0, i_t) < I(0, j_t) < I(i_t, j_t)$ or $I(i_t, j_t) < I(0, j_t) < I(0, i_t)$. If $I(0, i_t) < I(0, j_t) < I(i_t, j_t)$ holds, then the intersection points $I(0, i_t)$ and $I(0, j_t)$ appear in the earlier iterations, implying that $\{i_t, j_t\} \subset B^t$. On the other hand, if $I(i_t, j_t) < I(0, j_t) < I(0, i_t)$, then $i_t \notin B^t$ and $j_t \notin B^t$.
B.
Proof of Proposition 4
Let us introduce the following notation that will be used throughout the rest of this section. Let the function $g : \mathbb{R} \to \mathbb{R}$ be defined by:

$$g(\lambda) = \max\left\{ \sum_{\ell \in X} v_\ell (w_\ell - \lambda) \;:\; X \subseteq \{1, 2, \ldots, N\} \text{ and } |X| \le C \right\}.$$

The following lemma establishes important properties of the function $g$ and its relationship with the optimal profit $Z^*$.
Lemma B.1. Under Assumption 1, we have the following properties.
• The function $g$ is piecewise linear, non-increasing, convex, and continuous.
• For any $\lambda \in \mathbb{R}$, if $\lambda \in [I(i_t, j_t), I(i_{t+1}, j_{t+1}))$ for some $t = 0, \ldots, K$, then $g(\lambda) = \sum_{\ell \in A^t} v_\ell (w_\ell - \lambda)$.
• $Z^* = \max\{\lambda : g(\lambda) \ge \lambda\}$.

Proof. Observe that $g(\lambda)$ can be expressed as a solution of the following linear program:

$$g(\lambda) = \max\left\{ \sum_{\ell=1}^{N} z_\ell v_\ell (w_\ell - \lambda) \;:\; \sum_{i=1}^{N} z_i \le C \ \text{ and } \ 0 \le z_i \le 1 \text{ for all } i \right\}.$$

The first part of Lemma B.1 follows immediately from standard results in linear programming.

To establish the second part of the lemma, note that to compute $g(\lambda)$, we would order the products in descending order of $h_i(\lambda) = v_i(w_i - \lambda)$ and choose the first $C$ products with non-negative values. By definition, we know that within the interval $[I(i_t, j_t), I(i_{t+1}, j_{t+1}))$, none of the $N + 1$ lines $h_0(\cdot), h_1(\cdot), \ldots, h_N(\cdot)$ intersects each other. Thus, for $\lambda \in [I(i_t, j_t), I(i_{t+1}, j_{t+1}))$, the ordering of the $h_i(\lambda)$ is given by $\sigma^t$ and the set of the first $C$ items with non-negative values is simply $A^t$. This is exactly what we need to show.

The final part of Lemma B.1 follows from the following argument. By definition of the profit function $f$, there exists $X \subset \{1, \ldots, N\}$ with $|X| \le C$ such that $f(X) \ge \lambda$ if and only if $\sum_{i \in X} v_i (w_i - \lambda) \ge \lambda$, which occurs if and only if $\max_{A : |A| \le C} \sum_{i \in A} v_i (w_i - \lambda) \ge \lambda$, which is equivalent to $g(\lambda) \ge \lambda$. Therefore, $Z^* = \max\{\lambda : g(\lambda) \ge \lambda\}$.

The next lemma provides an expression for the change in the profit for the sequence of assortments generated by the StaticMNL algorithm. For $x \in \mathbb{R}$, we let

$$\mathrm{sign}(x) = \begin{cases} +1, & \text{if } x > 0, \\ 0, & \text{if } x = 0, \\ -1, & \text{if } x < 0. \end{cases}$$
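The characterization $Z^* = \max\{\lambda : g(\lambda) \ge \lambda\}$ in Lemma B.1 suggests a simple numerical check. The sketch below is an illustration under assumptions, not the StaticMNL algorithm: it evaluates $g(\lambda)$ by summing the at most $C$ largest positive values $v_i(w_i - \lambda)$, finds $Z^*$ by bisection on $g(\lambda) - \lambda$ (which is continuous and strictly decreasing), and compares the result with brute-force enumeration of $f(X) = \sum_{i \in X} v_i w_i / (1 + \sum_{i \in X} v_i)$. The instance values and function names are hypothetical, and $w_i > 0$ is assumed so that $[0, \max_i w_i]$ brackets $Z^*$.

```python
import itertools

def g(lam, v, w, C):
    """g(lambda): sum of the (at most) C largest positive values v_i (w_i - lambda)."""
    h = sorted((vi * (wi - lam) for vi, wi in zip(v, w)), reverse=True)
    return sum(x for x in h[:C] if x > 0)

def optimal_profit(v, w, C, iters=200):
    """Z* = max{lambda : g(lambda) >= lambda}, located by bisection."""
    lo, hi = 0.0, max(w)   # assumes w_i > 0: g(0) >= 0 and g(max w) = 0 < max w
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid, v, w, C) >= mid:
            lo = mid
        else:
            hi = mid
    return lo

def brute_force(v, w, C):
    """Maximum of f(X) over all non-empty X with |X| <= C."""
    def f(X):
        return sum(v[i] * w[i] for i in X) / (1.0 + sum(v[i] for i in X))
    return max(f(X) for k in range(1, C + 1)
               for X in itertools.combinations(range(len(v)), k))

# Hypothetical instance.
v = [0.3, 0.7, 1.1, 1.6, 2.0]
w = [9.0, 6.5, 5.0, 3.0, 2.5]
C = 2
z_star = optimal_profit(v, w, C)
z_brute = brute_force(v, w, C)
print(z_star, z_brute)
```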
Lemma B.2. Let $A^0, \ldots, A^K$ denote the sequence of assortments generated by the StaticMNL(v, w, C) algorithm. Then, under Assumption 1, for each $t = 1, 2, \ldots, K$, if $A^t \neq A^{t-1}$,

$$\mathrm{sign}\left( f(A^t) - f(A^{t-1}) \right) = \mathrm{sign}\left( g(I(i_t, j_t)) - I(i_t, j_t) \right).$$
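Proof. There are two cases to consider: $|A^t| = C$ and $|A^t| < C$. Suppose that $|A^t| = C$. It then follows from Lemma A.4 that $A^t = A^{t-1} \setminus \{j_t\} \cup \{i_t\}$ with $1 \le i_t < j_t$. Let $X = A^{t-1} \setminus \{i_t, j_t\}$. Then, we have

$$
\begin{aligned}
f(A^t) - f(A^{t-1})
&= \frac{\sum_{\ell \in X} w_\ell v_\ell + w_{i_t} v_{i_t}}{1 + \sum_{\ell \in X} v_\ell + v_{i_t}} - \frac{\sum_{\ell \in X} w_\ell v_\ell + w_{j_t} v_{j_t}}{1 + \sum_{\ell \in X} v_\ell + v_{j_t}} \\
&= \frac{\left(1 + \sum_{\ell \in X} v_\ell\right)\left(w_{i_t} v_{i_t} - w_{j_t} v_{j_t}\right) - \sum_{\ell \in X} w_\ell v_\ell \left(v_{i_t} - v_{j_t}\right) + \left(w_{i_t} - w_{j_t}\right) v_{i_t} v_{j_t}}{\left(1 + \sum_{\ell \in X} v_\ell + v_{i_t}\right)\left(1 + \sum_{\ell \in X} v_\ell + v_{j_t}\right)} \\
&= \frac{\left(v_{i_t} - v_{j_t}\right) \cdot \left\{ \left(1 + \sum_{\ell \in X} v_\ell\right) \frac{w_{i_t} v_{i_t} - w_{j_t} v_{j_t}}{v_{i_t} - v_{j_t}} - \sum_{\ell \in X} w_\ell v_\ell + \frac{\left(w_{i_t} - w_{j_t}\right) v_{i_t} v_{j_t}}{v_{i_t} - v_{j_t}} \right\}}{\left(1 + \sum_{\ell \in X} v_\ell + v_{i_t}\right)\left(1 + \sum_{\ell \in X} v_\ell + v_{j_t}\right)} \\
&= \frac{\left(v_{i_t} - v_{j_t}\right) \cdot \left\{ \left(1 + \sum_{\ell \in X} v_\ell\right) I(i_t, j_t) - \sum_{\ell \in X} w_\ell v_\ell - h_{i_t}(I(i_t, j_t)) \right\}}{\left(1 + \sum_{\ell \in X} v_\ell + v_{i_t}\right)\left(1 + \sum_{\ell \in X} v_\ell + v_{j_t}\right)} \\
&= \frac{-\left(v_{i_t} - v_{j_t}\right) \cdot \left\{ -I(i_t, j_t) + h_{i_t}(I(i_t, j_t)) + \sum_{\ell \in X} v_\ell \left(w_\ell - I(i_t, j_t)\right) \right\}}{\left(1 + \sum_{\ell \in X} v_\ell + v_{i_t}\right)\left(1 + \sum_{\ell \in X} v_\ell + v_{j_t}\right)} \\
&= \frac{-\left(v_{i_t} - v_{j_t}\right) \cdot \left\{ -I(i_t, j_t) + \sum_{\ell \in A^t} h_\ell(I(i_t, j_t)) \right\}}{\left(1 + \sum_{\ell \in X} v_\ell + v_{i_t}\right)\left(1 + \sum_{\ell \in X} v_\ell + v_{j_t}\right)},
\end{aligned}
$$

where the fourth equality follows from the fact that

$$I(i_t, j_t) = \frac{w_{i_t} v_{i_t} - w_{j_t} v_{j_t}}{v_{i_t} - v_{j_t}} \quad \text{and} \quad h_{i_t}(I(i_t, j_t)) = \frac{-\left(w_{i_t} - w_{j_t}\right) v_{i_t} v_{j_t}}{v_{i_t} - v_{j_t}}.$$

Note that the denominator is always positive. Also, since $i_t < j_t$, it follows from Assumption 1 that $-(v_{i_t} - v_{j_t})$ is always positive. Thus, the sign of $f(A^t) - f(A^{t-1})$ is determined by $-I(i_t, j_t) + \sum_{\ell \in A^t} h_\ell(I(i_t, j_t))$, which is equal to $-I(i_t, j_t) + g(I(i_t, j_t))$ by Lemma B.1. This is the desired result. A similar argument applies to the case when $|A^t| < C$.

We are now ready to give a proof of Proposition 4.

Proof. Let $m^* \in \{0, 1, \ldots, K\}$ denote the largest index such that

$$I(i_1, j_1) \le \cdots \le I(i_{m^*}, j_{m^*}) \le Z^* < I(i_{m^*+1}, j_{m^*+1}) < \cdots < I(i_K, j_K).$$

Consider any $t = 1, \ldots, m^*$. If $A^t \neq A^{t-1}$, then it follows from Lemma B.2 that

$$\mathrm{sign}\left( f(A^t) - f(A^{t-1}) \right) = \mathrm{sign}\left( g(I(i_t, j_t)) - I(i_t, j_t) \right) \ge 0,$$

where the inequality holds because $I(i_t, j_t) \le Z^*$ and $Z^* = \max\{\lambda : g(\lambda) \ge \lambda\}$ together imply that $g(I(i_t, j_t)) \ge I(i_t, j_t)$. Thus, the sequence $\langle f(A^t) : t = 0, 1, \ldots, K \rangle$ is non-decreasing for $t = 0, 1, \ldots, m^*$.

On the other hand, consider any $t \ge m^* + 1$. In this case, we have $I(i_t, j_t) > Z^*$, which implies that $g(I(i_t, j_t)) < I(i_t, j_t)$. Using the same arguments as above, we can show that $\mathrm{sign}\left( f(A^t) - f(A^{t-1}) \right) \le 0$, and thus, the sequence $\langle f(A^t) : t = 0, 1, \ldots, K \rangle$ is non-increasing for $t \ge m^* + 1$. So, we have that

$$f(A^0) \le f(A^1) \le \cdots \le f(A^{m^*}) \quad \text{and} \quad f(A^{m^*}) \ge f(A^{m^*+1}) \ge \cdots \ge f(A^K),$$

which is the desired result.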
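The unimodality asserted by Proposition 4 can be observed on a small example. The following sketch is illustrative only and makes several assumptions: it takes $h_i(\lambda) = v_i(w_i - \lambda)$, $I(i,0) = w_i$, and $I(i,j) = (v_i w_i - v_j w_j)/(v_i - v_j)$, builds the assortment on each interval between consecutive intersection points by probing its midpoint, and then tests whether the profits rise to a single peak and fall afterwards. The helper names and the random instance (assumed to be in general position) are hypothetical.

```python
import itertools
import random

def profit(X, v, w):
    """MNL expected profit f(X); the empty assortment earns 0."""
    return sum(v[i] * w[i] for i in X) / (1.0 + sum(v[i] for i in X))

def assortment_sequence(v, w, C):
    """Assortments A^0, A^1, ..., A^K in order of increasing lambda."""
    n = len(v)
    pts = set(w)   # I(i, 0) = w_i
    for i, j in itertools.combinations(range(n), 2):
        pts.add((v[i] * w[i] - v[j] * w[j]) / (v[i] - v[j]))
    pts = sorted(pts)
    probes = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    seq = []
    for lam in probes:
        h = [v[i] * (w[i] - lam) for i in range(n)]
        top = sorted(range(n), key=lambda i: h[i], reverse=True)[:C]
        seq.append(frozenset(i for i in top if h[i] > 0))
    return seq

def is_unimodal(values, tol=1e-12):
    """Non-decreasing up to the first maximum, non-increasing afterwards."""
    peak = values.index(max(values))
    rising = all(b >= a - tol for a, b in zip(values[:peak], values[1:peak + 1]))
    falling = all(b <= a + tol for a, b in zip(values[peak:], values[peak + 1:]))
    return rising and falling

random.seed(2)
v = sorted(random.uniform(0.1, 2.0) for _ in range(6))
w = [random.uniform(1.0, 10.0) for _ in range(6)]
profits = [profit(A, v, w) for A in assortment_sequence(v, w, C=3)]
print(is_unimodal(profits))
```

Unimodality is what makes a golden-ratio style search over the sequence of assortments meaningful, since a single peak can be located by comparing a small number of candidates.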
C.
Proof of Theorem 5
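The proof of Theorem 5 relies on the following series of lemmas. Let us first introduce the following notation. For each $E \in \mathcal{E}$, let $f_E : (\{0\} \cup E) \times \mathbb{R}^C \to \mathbb{R}$ be defined as follows: for each $\ell \in \{0\} \cup E$ and $x_E \in \mathbb{R}^C$,

$$f_E(\ell, x_E) = \ln\left( \frac{1}{e^{x_\ell} \big/ \sum_{k \in \{0\} \cup E} e^{x_k}} \right) = \ln\left( \frac{1}{e^{x_\ell} \big/ \left(1 + \sum_{k \in E} e^{x_k}\right)} \right),$$

where we define $x_0 = 0$. Recall that the utility $U_\ell$ that each customer assigns to the option $\ell$ is given by $U_\ell = \mu^*_\ell + \zeta_\ell$, where $\mu^*_0 = 0$ and the $\zeta_\ell$'s are i.i.d. Gumbel random variables with mean zero and scaling parameter one. Let the random variable $C(E)$ denote the option that has the maximum utility given the assortment $E$; that is, $C(E) = \arg\max_{\ell \in E} U_\ell$. For each $x_E \in \mathbb{R}^C$, let $g_E(x_E)$ be defined by

$$
\begin{aligned}
g_E(x_E) := E\left[ f_E(C(E), x_E) \right]
&= \sum_{\ell \in \{0\} \cup E} P\{C(E) = \ell\}\, f_E(\ell, x_E) \\
&= \sum_{\ell \in \{0\} \cup E} \frac{e^{\mu_\ell}}{\sum_{k \in \{0\} \cup E} e^{\mu_k}} \ln\left( \frac{1}{e^{x_\ell} \big/ \sum_{k \in \{0\} \cup E} e^{x_k}} \right).
\end{aligned}
$$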
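The following standard lemma establishes a relationship between the true mean utilities $\mu_i$'s and the function $g_E$. For the proof, the reader is referred to Cover and Thomas (2006).

Lemma C.1. For each $E \in \mathcal{E}$, $g_E(\mu_E) = \min_{x_E \in \mathbb{R}^C} g_E(x_E)$, and for each $x_E \in \mathbb{R}^C$,

$$\sum_{\ell \in \{0\} \cup E} \left| \frac{e^{\mu_\ell}}{\sum_{k \in \{0\} \cup E} e^{\mu_k}} - \frac{e^{x_\ell}}{\sum_{k \in \{0\} \cup E} e^{x_k}} \right| \le 2 \sqrt{g_E(x_E) - g_E(\mu_E)}.$$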
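Lemma C.1 can be sanity-checked numerically. In the sketch below, which is an illustration under assumptions rather than a proof, `choice_probs` computes multinomial logit choice probabilities over $\{0\} \cup E$ with $x_0 = 0$, `g_E` evaluates the expected log-loss $\sum_\ell P_\mu(\ell) \ln(1/P_x(\ell))$, and the difference $g_E(x_E) - g_E(\mu_E)$ is then the Kullback-Leibler divergence between the two choice distributions. The function names, the constant `M = 2.0`, and the random test vectors are hypothetical.

```python
import math
import random

def choice_probs(x):
    """MNL choice probabilities over {0} U E, with x_0 = 0 for the no-purchase option."""
    z = [1.0] + [math.exp(xi) for xi in x]
    total = sum(z)
    return [zi / total for zi in z]

def g_E(x, mu):
    """g_E(x) = E[f_E(C(E), x)] = sum_l P_mu(l) * ln(1 / P_x(l))."""
    p, q = choice_probs(mu), choice_probs(x)
    return sum(pl * math.log(1.0 / ql) for pl, ql in zip(p, q))

def l1_gap(x, mu):
    """Sum over options of |P_mu(l) - P_x(l)|, the left-hand side of Lemma C.1."""
    return sum(abs(pl - ql) for pl, ql in zip(choice_probs(mu), choice_probs(x)))

random.seed(3)
M, C = 2.0, 4          # hypothetical bound on utilities and assortment size
results = []
for _ in range(500):
    mu = [random.uniform(-M, M) for _ in range(C)]
    x = [random.uniform(-M, M) for _ in range(C)]
    excess = g_E(x, mu) - g_E(mu, mu)            # equals KL(P_mu || P_x) >= 0
    results.append((excess, l1_gap(x, mu)))
print(max(gap - 2 * math.sqrt(max(e, 0.0)) for e, gap in results))
```

The printed quantity should be non-positive up to rounding, in line with the bound in the lemma.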
The next lemma provides an upper bound on the difference between $E[f_E(C(E), \hat{\mu})]$ and $E[f_E(C(E), \mu^*)]$ for each assortment $E \in \mathcal{E}$. Before we proceed to the lemma, we need the following definitions (see Haussler (1992) for more details).

Definition 1. For any $T \subset \mathbb{R}^d$, $T$ is full if there exists some translation of $T$ that intersects all $2^d$ orthants of $\mathbb{R}^d$, that is, there exists $\bar{x} \in \mathbb{R}^d$ such that $\{\mathrm{sign}(\bar{x} + y) : y \in T\} = \{0, 1\}^d$, where for each vector $z \in \mathbb{R}^d$, $\mathrm{sign}(z) = (\mathrm{sign}(z_1), \ldots, \mathrm{sign}(z_d))$ and $\mathrm{sign}(\cdot)$ is the usual sign function, which is 1 if the argument is positive and zero otherwise.

Definition 2. Let $\mathcal{F}$ be a family of functions from a set $Z$ to $\mathbb{R}$. For any sequence $\bar{z} = (z_1, \ldots, z_d)$ of points in $Z$, let $\mathcal{F}|_{\bar{z}} = \{(f(z_1), \ldots, f(z_d)) : f \in \mathcal{F}\} \subset \mathbb{R}^d$. The pseudo-dimension of $\mathcal{F}$ is the largest $d$ such that there exists a sequence $\bar{z}$ of $d$ points in $Z$ for which $\mathcal{F}|_{\bar{z}}$ is full.

For our application, we will be interested in the collection of functions $\mathcal{G}_E$ defined as follows:

$$\mathcal{G}_E \equiv \left\{ f_E(\cdot, x_E) : \{0\} \cup E \to \mathbb{R}_+ \;\middle|\; x_E \in \mathbb{R}^C \cap [-M, M]^C \right\},$$

where $M$ is the constant given in Assumption 2. Note that each function in $\mathcal{G}_E$ is indexed by a $C$-dimensional vector $x_E$. The following lemma shows that the pseudo-dimension of $\mathcal{G}_E$ is at most $C + 1$.

Lemma C.2. For each $E \in \mathcal{E}$, the pseudo-dimension of $\mathcal{G}_E$ is at most $C + 1$.
Proof. To establish this result, it suffices to show that for any $(z_1, z_2, \ldots, z_{C+2}) \in (\{0\} \cup E)^{C+2}$, the set of vectors

$$\left\{ \left( f(z_1, x_E), f(z_2, x_E), \ldots, f(z_{C+2}, x_E) \right) : x_E \in \mathbb{R}^C \cap [-M, M]^C \right\} \subset \mathbb{R}^{C+2}$$

cannot be full; that is, no translation of this set can intersect all orthants in $\mathbb{R}^{C+2}$. To prove this, note that since the set $\{0\} \cup E$ has only $C + 1$ elements, at least two of the $z_s$'s must coincide. By relabeling if necessary, we can assume without loss of generality that $z_1 = z_2 = \ell$ for some $\ell \in \{0\} \cup E$. However, since $z_1 = z_2$, we have that $f(z_1, x_E) = f(z_2, x_E)$ for all $x_E \in \mathbb{R}^C$. Thus, the set

$$\left\{ \left( f(z_1, x_E), f(z_2, x_E) \right) : x_E \in \mathbb{R}^C \cap [-M, M]^C \right\}$$

forms a line in the plane and no translation of this set can possibly intersect all four orthants in $\mathbb{R}^2$. Thus, the set cannot be full.

The following result follows from Theorem 11 in Haussler (1992).

Lemma C.3. Let $\mathcal{F}$ be a collection of functions from $Z$ into $[0, B]$ with pseudo-dimension $d$. Let $W_1, W_2, \ldots, W_T$ be $T$ independent and identically distributed random variables drawn according to any distribution from the set $Z$. Then, for each $\epsilon > 0$,

$$\Pr\left\{ \sup_{f \in \mathcal{F}} \left| E_W[f(W)] - \frac{1}{T} \sum_{i=1}^{T} f(W_i) \right| \ge \epsilon \right\} \le 8 \left( \frac{32eB}{\epsilon} \ln \frac{32eB}{\epsilon} \right)^{d} e^{-\epsilon^2 T / 64B^2},$$

where $W$ denotes a random variable having the same distribution as the $W_i$'s.

We will now apply Lemma C.3 to our problem and establish the following result.

Lemma C.4. Under Assumption 2, for each $E \in \mathcal{E}$ and $\epsilon > 0$,

$$\Pr\left\{ g_E(\hat{\mu}_E) - g_E(\mu_E) \ge \epsilon \right\} \le 8 \left( \frac{64eB}{\epsilon} \ln \frac{64eB}{\epsilon} \right)^{C+1} e^{-\epsilon^2 T / 256B^2},$$

where $B = 2M + \ln(C + 1)$ and the random variable $\hat{\mu}_E$ denotes the maximum likelihood estimate defined in Equation (5) based on a sample of $T$ customers, and the probability is taken with respect to the random variables $C_1(E), \ldots, C_T(E)$.

Proof. Let $E \in \mathcal{E}$ be given. To apply the result of Lemma C.3, we will consider the following collection of functions

$$\mathcal{G}_E \equiv \left\{ f_E(\cdot, x_E) : \{0\} \cup E \to \mathbb{R}_+ \;\middle|\; x_E \in \mathbb{R}^C \cap [-M, M]^C \right\}.$$

Note that since $x_E \in \mathbb{R}^C \cap [-M, M]^C$, it follows that for each $\ell \in \{0\} \cup E$,

$$f_E(\ell, x_E) = \ln \sum_{k \in \{0\} \cup E} e^{x_k - x_\ell} \le \ln\left( 1 + C e^{2M} \right) \le \ln\left( e^{2M} (C + 1) \right) = 2M + \ln(C + 1).$$
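Note that

$$
\begin{aligned}
g_E(\hat{\mu}_E) - g_E(\mu_E)
&= \left( E\left[ f_E(C(E), \hat{\mu}_E) \right] - \frac{1}{T} \sum_{t=1}^{T} f_E(C_t(E), \hat{\mu}_E) \right) \\
&\quad + \left( \frac{1}{T} \sum_{t=1}^{T} f_E(C_t(E), \hat{\mu}_E) - \frac{1}{T} \sum_{t=1}^{T} f_E(C_t(E), \mu_E) \right) \\
&\quad + \left( \frac{1}{T} \sum_{t=1}^{T} f_E(C_t(E), \mu_E) - E\left[ f_E(C(E), \mu_E) \right] \right) \\
&\le \left( E\left[ f_E(C(E), \hat{\mu}_E) \right] - \frac{1}{T} \sum_{t=1}^{T} f_E(C_t(E), \hat{\mu}_E) \right) + \left( \frac{1}{T} \sum_{t=1}^{T} f_E(C_t(E), \mu_E) - E\left[ f_E(C(E), \mu_E) \right] \right) \\
&\le 2 \sup_{x_E \in \mathbb{R}^C \cap [-M, M]^C} \left| E\left[ f_E(C(E), x_E) \right] - \frac{1}{T} \sum_{t=1}^{T} f_E(C_t(E), x_E) \right|,
\end{aligned}
$$

where the first inequality follows from Equation (5), which implies that

$$\frac{1}{T} \sum_{t=1}^{T} f_E(C_t(E), \hat{\mu}_E) = \min_{x_E \in \mathbb{R}^C \cap [-M, M]^C} \frac{1}{T} \sum_{t=1}^{T} f_E(C_t(E), x_E).$$

The final inequality follows from the fact that both $\hat{\mu}_E$ and $\mu_E$ are in $\mathbb{R}^C \cap [-M, M]^C$. Therefore,

$$
\begin{aligned}
\Pr\left\{ g_E(\hat{\mu}_E) - g_E(\mu_E) \ge \epsilon \right\}
&\le \Pr\left\{ \sup_{x_E \in \mathbb{R}^C \cap [-M, M]^C} \left| E\left[ f_E(C(E), x_E) \right] - \frac{1}{T} \sum_{t=1}^{T} f_E(C_t(E), x_E) \right| \ge \frac{\epsilon}{2} \right\} \\
&\le 8 \left( \frac{64eB}{\epsilon} \ln \frac{64eB}{\epsilon} \right)^{C+1} e^{-\epsilon^2 T / 256B^2},
\end{aligned}
$$

where $B = 2M + \ln(C + 1)$. The last inequality follows from Lemma C.3 and the fact that the collection of functions $\mathcal{G}_E$ has a pseudo-dimension of at most $C + 1$.

Finally, here is the proof of Theorem 5.

Proof. It follows from Lemma C.1 that

$$P\left\{ \sum_{k \in \{0\} \cup E} \left| \hat{\theta}_k(E) - \theta_k(E) \right| \ge \epsilon \right\} \le \Pr\left\{ g_E(\hat{\mu}_E) - g_E(\mu_E) \ge \frac{\epsilon^2}{4} \right\} \le 8 \left( \frac{256eB}{\epsilon^2} \ln \frac{256eB}{\epsilon^2} \right)^{C+1} e^{-\epsilon^4 T / 4096B^2},$$

where the last inequality follows from Lemma C.4.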
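To get a feel for the scale of this bound, the following sketch evaluates the right-hand sides of Lemma C.4 and of the Theorem 5 bound, confirms that the latter is the former evaluated at $\epsilon^2/4$, and solves for a sample size $T$ at which the Theorem 5 bound drops below a target $\delta$. The function names, the target `delta`, and the parameter values are hypothetical choices made for illustration.

```python
import math

def lemma_c4_bound(eps, T, C, M):
    """Right-hand side of Lemma C.4 with B = 2M + ln(C + 1)."""
    B = 2 * M + math.log(C + 1)
    a = 64 * math.e * B / eps
    return 8 * (a * math.log(a)) ** (C + 1) * math.exp(-eps**2 * T / (256 * B**2))

def theorem5_bound(eps, T, C, M):
    """Right-hand side of the Theorem 5 bound (Lemma C.4 evaluated at eps^2 / 4)."""
    B = 2 * M + math.log(C + 1)
    a = 256 * math.e * B / eps**2
    return 8 * (a * math.log(a)) ** (C + 1) * math.exp(-eps**4 * T / (4096 * B**2))

def samples_needed(eps, delta, C, M):
    """Smallest integer T for which theorem5_bound(eps, T, C, M) <= delta."""
    B = 2 * M + math.log(C + 1)
    a = 256 * math.e * B / eps**2
    log_terms = (C + 1) * math.log(a * math.log(a)) + math.log(8 / delta)
    return math.ceil(4096 * B**2 / eps**4 * log_terms)

eps, delta, C, M = 0.5, 0.05, 2, 1.0      # hypothetical parameter values
T_req = samples_needed(eps, delta, C, M)
print(T_req, theorem5_bound(eps, T_req, C, M))
```

The required $T$ grows like $B^2/\epsilon^4$ up to logarithmic factors, which reflects the worst-case nature of the bound rather than typical estimation accuracy.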
D.
Proof of Theorem 6
The proof of Theorem 6 makes use of the following result. Recall that

$$\eta = \min_{E \in \mathcal{E}} \; \min_{\{i,j\} \subseteq E : i \neq j} \left| \theta_i(E) - \theta_j(E) \right|.$$

Under Assumption 1, we know that $\eta > 0$.
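Lemma D.1. For each $E \in \mathcal{E}$, if $\max_{k \in E} \left| \hat{\theta}_k(E) - \theta_k(E) \right| \le \eta/3$, then

$$\max_{i \in E, j \in E : i \neq j} \left| I(i, j) - \hat{I}_E(i, j) \right| \le \frac{6 \max\{w_i, w_j\}}{\eta^2} \sum_{k \in E} \left| \hat{\theta}_k(E) - \theta_k(E) \right|.$$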
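Proof. Let $i \in E$ and $j \in E$ be given. Since $\max_{k \in E} \left| \hat{\theta}_k(E) - \theta_k(E) \right| \le \eta/3$, it follows from the definition of $\eta$ that $\hat{\theta}_i(E) \neq \hat{\theta}_j(E)$. Thus, by definition,

$$
\begin{aligned}
\left| I(i, j) - \hat{I}_E(i, j) \right|
&= \left| \frac{\theta_i(E) w_i - \theta_j(E) w_j}{\theta_i(E) - \theta_j(E)} - \frac{\hat{\theta}_i(E) w_i - \hat{\theta}_j(E) w_j}{\hat{\theta}_i(E) - \hat{\theta}_j(E)} \right| \\
&\le w_i \left| \frac{\theta_i(E)}{\theta_i(E) - \theta_j(E)} - \frac{\hat{\theta}_i(E)}{\hat{\theta}_i(E) - \hat{\theta}_j(E)} \right| + w_j \left| \frac{\theta_j(E)}{\theta_i(E) - \theta_j(E)} - \frac{\hat{\theta}_j(E)}{\hat{\theta}_i(E) - \hat{\theta}_j(E)} \right| \\
&= \frac{w_i \left| \theta_i(E) \hat{\theta}_j(E) - \hat{\theta}_i(E) \theta_j(E) \right|}{\left| \theta_i(E) - \theta_j(E) \right| \cdot \left| \hat{\theta}_i(E) - \hat{\theta}_j(E) \right|} + \frac{w_j \left| \theta_j(E) \hat{\theta}_i(E) - \hat{\theta}_j(E) \theta_i(E) \right|}{\left| \theta_i(E) - \theta_j(E) \right| \cdot \left| \hat{\theta}_i(E) - \hat{\theta}_j(E) \right|} \\
&\le \frac{2 \max\{w_i, w_j\} \left( \left| \theta_i(E) - \hat{\theta}_i(E) \right| + \left| \theta_j(E) - \hat{\theta}_j(E) \right| \right)}{\left| \theta_i(E) - \theta_j(E) \right| \cdot \left| \hat{\theta}_i(E) - \hat{\theta}_j(E) \right|} \\
&\le \frac{6 \max\{w_i, w_j\} \left( \left| \theta_i(E) - \hat{\theta}_i(E) \right| + \left| \theta_j(E) - \hat{\theta}_j(E) \right| \right)}{\eta^2},
\end{aligned}
$$

where the last inequality follows from the fact that since $\left| \hat{\theta}_i(E) - \theta_i(E) \right|$ and $\left| \hat{\theta}_j(E) - \theta_j(E) \right|$ are at most $\eta/3$, $\left| \hat{\theta}_i(E) - \hat{\theta}_j(E) \right| \ge \eta/3$.

Here is the proof of Theorem 6.

Proof. Note that by our construction $I(i, 0) = \hat{I}(i, 0) = w_i$ for all $i$. By the union bound,

$$
\begin{aligned}
& P\left\{ \max_{E \in \mathcal{E}} \max\left\{ \max_{\{i,j\} \subseteq E : i \neq j} \left| I(i, j) - \hat{I}_E(i, j) \right|, \; \max_{i \in E} \left| \theta_i(E) - \hat{\theta}_i(E) \right| \right\} > \epsilon \right\} \\
&\quad \le \sum_{E \in \mathcal{E}} P\left\{ \max_{\{i,j\} \subseteq E : i \neq j} \left| I(i, j) - \hat{I}_E(i, j) \right| > \epsilon \ \text{ OR } \ \max_{i \in E} \left| \theta_i(E) - \hat{\theta}_i(E) \right| > \epsilon \right\} \\
&\quad \le \sum_{E \in \mathcal{E}} P\left\{ \max_{\{i,j\} \subseteq E : i \neq j} \left| I(i, j) - \hat{I}_E(i, j) \right| > \epsilon \ \text{ OR } \ \max_{i \in E} \left| \theta_i(E) - \hat{\theta}_i(E) \right| > \min\{\epsilon, \eta/3\} \right\} \\
&\quad \le \sum_{E \in \mathcal{E}} P\left\{ \sum_{i \in E} \left| \theta_i(E) - \hat{\theta}_i(E) \right| > \min\left\{ \epsilon, \frac{\eta}{3}, \frac{\eta^2 \epsilon}{6 \max_\ell w_\ell} \right\} \right\} \\
&\quad \le \sum_{E \in \mathcal{E}} P\left\{ \sum_{i \in E} \left| \theta_i(E) - \hat{\theta}_i(E) \right| > \frac{\eta^2 \epsilon}{6 \max_\ell w_\ell} \right\},
\end{aligned}
$$

where the last inequality follows from the fact that $\max_\ell w_\ell \ge 1$. Note that the third inequality follows from the fact that the event

$$\left\{ \max_{\{i,j\} \subseteq E : i \neq j} \left| I(i, j) - \hat{I}_E(i, j) \right| > \epsilon \ \text{ OR } \ \max_{i \in E} \left| \theta_i(E) - \hat{\theta}_i(E) \right| > \min\{\epsilon, \eta/3\} \right\}$$

is a subset of the event

$$\left\{ \sum_{i \in E} \left| \theta_i(E) - \hat{\theta}_i(E) \right| > \min\left\{ \epsilon, \frac{\eta}{3}, \frac{\eta^2 \epsilon}{6 \max_\ell w_\ell} \right\} \right\}.$$

To see this, note that if $\max_{i \in E} \left| \theta_i(E) - \hat{\theta}_i(E) \right| \le \min\{\epsilon, \eta/3\}$ and $\max_{\{i,j\} \subseteq E : i \neq j} \left| I(i, j) - \hat{I}_E(i, j) \right| > \epsilon$, then it follows from Lemma D.1 that

$$\epsilon < \max_{\{i,j\} \subseteq E : i \neq j} \left| I(i, j) - \hat{I}_E(i, j) \right| \le \frac{6 \max_\ell w_\ell}{\eta^2} \sum_{i \in E} \left| \theta_i(E) - \hat{\theta}_i(E) \right|,$$

which gives the desired result. From Theorem 5, we have that for each $E \in \mathcal{E}$,

$$P\left\{ \sum_{i \in E} \left| \theta_i(E) - \hat{\theta}_i(E) \right| > \frac{\eta^2 \epsilon}{6 \max_\ell w_\ell} \right\} \le 8 \left( \frac{9216 eB (\max_\ell w_\ell)^2}{\epsilon^2 \eta^4} \ln \frac{9216 eB (\max_\ell w_\ell)^2}{\epsilon^2 \eta^4} \right)^{C+1} \exp\left\{ - \frac{\epsilon^4 \eta^8 T}{5308416\, B^2 (\max_\ell w_\ell)^4} \right\}.$$

Since $|\mathcal{E}| \le 5(N/C)^2$, the desired result follows.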
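Lemma D.1, which supplies the third inequality in the proof above, can be spot-checked for a single pair of products. The sketch below is a numerical illustration under assumptions: it takes $\eta$ to be the gap $|\theta_i - \theta_j|$ of the one pair being tested (rather than the minimum over all assortments), draws estimation errors of magnitude at most $\eta/3$, and compares the perturbation of $I(i,j) = (\theta_i w_i - \theta_j w_j)/(\theta_i - \theta_j)$ with the bound $6 \max\{w_i, w_j\} (|d_i| + |d_j|)/\eta^2$. All names and sampling ranges are hypothetical.

```python
import random

def crossing(theta_i, w_i, theta_j, w_j):
    """I(i, j) expressed through choice probabilities, as in the proof of Lemma D.1."""
    return (theta_i * w_i - theta_j * w_j) / (theta_i - theta_j)

def check_pair(theta_i, theta_j, w_i, w_j, d_i, d_j):
    """d_i, d_j are estimation errors with |d| <= eta / 3, eta = |theta_i - theta_j|."""
    eta = abs(theta_i - theta_j)
    lhs = abs(crossing(theta_i, w_i, theta_j, w_j)
              - crossing(theta_i + d_i, w_i, theta_j + d_j, w_j))
    rhs = 6 * max(w_i, w_j) / eta**2 * (abs(d_i) + abs(d_j))
    return lhs <= rhs * (1 + 1e-9) + 1e-12

random.seed(4)
outcomes = []
for _ in range(1000):
    theta_i = random.uniform(0.05, 0.5)               # probabilities in (0, 1)
    theta_j = theta_i + random.uniform(0.01, 0.4)     # gap of at least 0.01
    eta = theta_j - theta_i
    w_i, w_j = random.uniform(1.0, 10.0), random.uniform(1.0, 10.0)
    d_i = random.uniform(-eta / 3, eta / 3)
    d_j = random.uniform(-eta / 3, eta / 3)
    outcomes.append(check_pair(theta_i, theta_j, w_i, w_j, d_i, d_j))
print(all(outcomes))
```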
E.
Proof of Proposition 8
Proof. Let $r = (\sqrt{5} - 1)/2$ denote the golden ratio and let $w^* = \max_{1 \le \ell \le N} w_\ell$. Since the classical Golden Ratio search reduces the size of the target set by a factor of $r$ in each iteration, the maximum number of iterations is at most $\lceil \ln L / \ln(1/r) \rceil$. For $k = 1, \ldots, \lceil \ln L / \ln(1/r) \rceil$, let $G_k$ denote the event that we make the wrong decision in the $k$th iteration; that is, we make a decision that is different from what we would have done if we could evaluate the profit function $f(\cdot)$ exactly.

Consider the first iteration and suppose we are comparing assortments $D^s$ and $D^t$ for some $s$ and $t$. The comparison is based on the sample average of $W_{GRS}$ samples for each assortment. Let $X_1^s, \ldots, X_{W_{GRS}}^s$ denote the observed profits associated with the selections of the $W_{GRS}$ customers who were offered the assortment $D^s$. Similarly, let $X_1^t, \ldots, X_{W_{GRS}}^t$ denote the observed profits associated with $D^t$. By definition, we know that $E[X_i^s] = f(D^s)$ and $E[X_i^t] = f(D^t)$ for all $1 \le i \le W_{GRS}$. Suppose that $f(D^s) > f(D^t)$. Then, we can bound the probability of error in the first iteration as follows:

$$
\begin{aligned}
P\{G_1\}
&= P\left\{ \frac{1}{W_{GRS}} \sum_{i=1}^{W_{GRS}} X_i^s < \frac{1}{W_{GRS}} \sum_{i=1}^{W_{GRS}} X_i^t \right\} \\
&\le P\left\{ \frac{1}{W_{GRS}} \sum_{i=1}^{W_{GRS}} \left( X_i^t - f(D^t) \right) + \frac{1}{W_{GRS}} \sum_{i=1}^{W_{GRS}} \left( f(D^s) - X_i^s \right) > f(D^s) - f(D^t) \right\} \\
&\le P\left\{ \frac{1}{W_{GRS}} \sum_{i=1}^{W_{GRS}} \left( X_i^t - f(D^t) \right) + \frac{1}{W_{GRS}} \sum_{i=1}^{W_{GRS}} \left( f(D^s) - X_i^s \right) > \beta \right\} \\
&\le P\left\{ \frac{1}{W_{GRS}} \sum_{i=1}^{W_{GRS}} \left( X_i^t - f(D^t) \right) > \frac{\beta}{2} \right\} + P\left\{ \frac{1}{W_{GRS}} \sum_{i=1}^{W_{GRS}} \left( f(D^s) - X_i^s \right) > \frac{\beta}{2} \right\} \\
&\le 2 \exp\left\{ - \frac{\beta^2 W_{GRS}}{4 w^*} \right\},
\end{aligned}
$$

where the last inequality follows from the classical Chernoff-Hoeffding Inequality (Hoeffding (1963)). Using a similar argument, it is easy to verify that the probability of error $P\{G_k\}$ in the $k$th iteration is bounded above by $2k \exp\left\{ - \frac{\beta^2 W_{GRS}}{4 w^*} \right\}$.
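To determine an upper bound on the regret after $T$ periods, we note that the total number of samples that we used before we are left with a single assortment is at most

$$W_{GRS} \cdot \left( \left\lceil \frac{\ln L}{\ln(1/r)} \right\rceil + 3 \right) \le 4 W_{GRS} \cdot \left\lceil \frac{\ln L}{\ln(1/r)} \right\rceil,$$

where the factor of 3 reflects the fact that the standard Golden Ratio search starts with 4 points in the first iteration and adds one additional search point in subsequent iterations. The maximum expected regret incurred from these samples is at most $4 w^* \cdot W_{GRS} \cdot \left\lceil \frac{\ln L}{\ln(1/r)} \right\rceil$. Moreover, if we correctly identify the optimal assortment at the end of the search, the regret will be zero thereafter. Thus, the maximum expected regret after we conclude the search algorithm is at most

$$w^* T \cdot P\left\{ G_{\lceil \ln L / \ln(1/r) \rceil} \right\} \le 2 w^* T \left\lceil \frac{\ln L}{\ln(1/r)} \right\rceil \exp\left\{ - \frac{\beta^2 W_{GRS}}{4 \max_\ell w_\ell} \right\},$$

where the right-hand side follows from the upper bound on the error probability. Using the definition of $W_{GRS}$, we have the following upper bound on the expected regret after $T$ periods:

$$
\begin{aligned}
& 4 w^* \cdot W_{GRS} \cdot \left\lceil \frac{\ln L}{\ln(1/r)} \right\rceil + 2 w^* T \left\lceil \frac{\ln L}{\ln(1/r)} \right\rceil \exp\left\{ - \frac{\beta^2 W_{GRS}}{4 \max_\ell w_\ell} \right\} \\
&\quad \le 4 w^* \left( \frac{4 w^* \ln T}{\beta^2} + 1 \right) \left( \frac{\ln L}{\ln(1/r)} + 1 \right) + 2 w^* \left( \frac{\ln L}{\ln(1/r)} + 1 \right) \\
&\quad = 4 w^* \left( \frac{4 w^* \ln T}{\beta^2} + 1 \right) \frac{\ln(L/r)}{\ln(1/r)} + 2 w^* \frac{\ln(L/r)}{\ln(1/r)} \\
&\quad = \frac{16 (w^*)^2}{\beta^2} \cdot \frac{\ln(L/r)}{\ln(1/r)} \ln T + 6 w^* \frac{\ln(L/r)}{\ln(1/r)},
\end{aligned}
$$

and the desired result follows from the fact that $1/r \le 2$ and $1/\ln(1/r) \le 2.079$.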
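The constants in this bound are easy to verify numerically. The sketch below computes $r$, the iteration count $\lceil \ln L / \ln(1/r) \rceil$, and the final regret expression, and checks that the last two lines of the display agree. The function names and the parameter values ($T$, $L$, $w^*$, $\beta$) are hypothetical, and the expression $4 w^* \ln T / \beta^2 + 1$ is used only as the upper bound on $W_{GRS}$ that appears in the derivation above.

```python
import math

r = (math.sqrt(5) - 1) / 2          # golden ratio conjugate, about 0.618

def iterations(L):
    """Upper bound on the number of Golden Ratio search iterations over L candidates."""
    return math.ceil(math.log(L) / math.log(1 / r))

def regret_bound(T, L, w_star, beta):
    """Final line: 16 (w*)^2 / beta^2 * x * ln T + 6 w* x, with x = ln(L/r) / ln(1/r)."""
    x = math.log(L / r) / math.log(1 / r)
    return 16 * w_star**2 / beta**2 * x * math.log(T) + 6 * w_star * x

def regret_bound_unexpanded(T, L, w_star, beta):
    """Second-to-last line of the display, before collecting terms."""
    x = math.log(L / r) / math.log(1 / r)
    w_grs_upper = 4 * w_star * math.log(T) / beta**2 + 1
    return 4 * w_star * w_grs_upper * x + 2 * w_star * x

# Hypothetical parameter values.
T, L, w_star, beta = 10_000, 50, 5.0, 0.2
print(iterations(L), regret_bound(T, L, w_star, beta))
```

The bound grows logarithmically in $T$, so the average regret per period vanishes as $T$ increases.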