Consumers’ Preference Modeling to Price Bundle Offers in Telecommunication industry Study for a Single Provider H. Le Cadrea , M. Bouhtoub and B. Tuffinc Orange Labs rue du G´en´eral Leclerc 92794 Issy-les-Moulineaux Cedex 9 France {helene.lecadre,mustapha.bouhtou}@orange-ftgroup.com a,b
Irisa/Inria Campus de Beaulieu 35042 Rennes Cedex 3 France
[email protected] a,c
Abstract Network operators are merging their different types of service, such as fixed or wireless telephony, Internet or television, into single offers, called bundles. In the fierce competitive current market especially, it is essential to understand consumers’ preferences, i.e., their utility functions in our case, to define the most profitable offers, with their associated prices. We introduce a random utility model linear in quality attributes such as rate, length of commitment or trust in the operator’s brand, and with coefficients computed using ratings from a panel of customers. The utility and error coefficients are estimated via simulation of a Bayesian network. As a consequence, we are able to determine the optimal number of market segments, as well as the reservation price and coefficient densities on each segment, so that the best marketing strategy can be deduced. Numerical studies are carried out.
Keywords: Simulation, Discrete choice, Bayesian networks, Bundles, Marketing strategies
1 Introduction Mobile and fix telephony markets are becoming saturated in Western Europe. In the meantime, high speed Internet, including or not television, has become more and more popular. Competition forces telecommunication operators to merge these different services into single offers, called bundles. As an illustration, triple play offers, combining telephony, television over Internet and high speed Internet access have flooded the market, but the interest of other combinations of services has to be studied, from a marketing point of view, as well as the associated prices. Practically, it raises the problem of the service convergence. Indeed, the operators have to be present on every service market (i.e. Fix, Mobile, Internet), or require alliances to lease the service from competitors. Furthermore, the goals of the operators might be quite different, depending on their sizes and cost structures. For example, small operators might prefer to target specific market segments or expand their market shares as fast as possible, while big companies who already have wide market shares, would prefer to price discriminate between the segments. We do not investigate those complicated relations in this paper (it will be partially the focus of the illustration part), but it highlights the necessity to accurately understand customers’ preferences, in order to launch properly chosen offers on the market, in terms of content and price, to reach the operator’s goal: maximize his revenue. Modeling users’ preferences has received a lot of attention during the last decades, in the economic and marketing scientific research literature [8], [15], [19], [20]. Probably the most considered models are discrete choice models to represent preferences among several alternatives; they will also be used in this paper. It has already been extensively used with success in the airline industry, and for revenue management in general. 1
Tallury and Van Ryzin [26] apply discrete choice theory to the modeling of consumer behavior and the selection of optimal fares for a single leg management problem in the airline industry. McGill and Van Ryzin [22] give an overview of Revenue Management techniques and evoke the problem of predicting passenger behavior using discrete choice models and multinomial logit estimation. A key variable is the so-called reservation price, representing the price at which a user is is indifferent between buying the considered offer or choosing any other alternative in the choice set. Modeling preferences in case of bundling, on the other hand, has been less investigated. In this case, most models suppose that the customers’ demand is known by the seller and, usually, customers’ preferences are time invariant, i.e. do not evolve with price or product characteristic variations (cf Jeididi et al. [14] and, Hitt and Chen [12]). In reality, sellers have only partial knowledge of the parameters characterizing the customer preference distributions. By answers from a questionaire on a representative sample of customers, sellers can determine a set of prices maximizing their revenue. Besides, the coupled observation of revenue evolution and price allocation enables the seller to increase his information about the customers’ preferences. According to Kephart et al. [16] for example, the bundle strategy used is to exploit the prices known by experience to generate the highest revenue, with an exploration of new prices, because of a fluctuating and unpredictable demand function. The option of directly asking customers to declare their reservation prices is considered by Jeididi et al. [14], but this technique is not reliable since it yields incentives to cheat. Indeed, Jeididi and Jagpal show empirically that direct elicitation methods yield biased estimates of the appropriate demand curves and lead to sub-optimal product-line pricing strategies, because customers voluntarily decrease their declared prices in order to decrease the product price, or they can make judgement errors due to the offer novelty. It seems therefore fundamental to deal with customers’ errors as well as their heterogeneity by, for instance, using random utility functions with parameters that have to be estimated. This kind of approach is studied by Chung and Rao [9]. They consider a model based on attributes, i.e., selected important characteristics for preferences, and generalize the approaches based on attribute comparison (the Balance Model of Farquahr and Rao [10], the Attribute Satiation Model of McAlister [21] and Harlam and Lodish’s Multiple-Item Assortement Model [11]) to the case where the bundles are heterogeneous, i.e. the individual offers composing the bundle are comparable on sets of different attributes. It is shown for an application to the bundling of PC systems made of a computer, a monitor and a printer that the mean absolute percentage error between the estimated reservation prices and the ones obtained by direct elicitation is around 17%. Nevertheless, the authors consider a pure bundling framework, i.e., the proposed bundles all have the same size (for example, only triple-play offers). Our work is based on the model described in [9]. But we rather focus specifically here on a single telecommunication operator, who plans to commercialize a fixed number of offers. With respect to [9], an offer can be simple (ex: a simple mobile offer), double or triple (ex: TV-internet-fix phone). Similarly, every traditional compound offer can be splitted in single offers (ex: the triple-play offer is made of simple offers of TV, internet and fix phone), but we need to make sure that a resulting offer does not cannibalize the others, that is, it does not have any negative impact on the sales performance of other products. Thus the goal of an operator is not only to determine optimal prices, but also to select the set of bundles yielding the highest revenue, which is also our goal. How does the operator infer the (unknown) reservation prices of customers? Various approaches might be envisaged to infer consumer preferences from data. First, the maximum likelihood coupled with an Expectation-Maximization (EM) algorithm can be used to infer density parameters, but it requires to introduce an a priori distribution on the data. Lau et al. [19] allow the consumers’ deterministic utility to be a switching function of the explanatory variable. The procedure is formulated as a linear integer programming model whose resulting estimates are maximum likelihood estimates. Second, neural networks can be trained to learn subjective preferences and determine the best offer combination. In the training part, several elementary offer combinations should be used. For each combination, the consumers are asked to give global ratings for the presence of the various attributes in every offer. The algorithm learns the correspondance between the offer combinations and the consumer valuations. This method prevents us from the linearity assumption on the preference utility function, but requires to build and test numerous offer combinations which can be quite expensive, especially if competition is introduced. 2
Kaefer et al. [15] combine a multinomial logit model with a neural network approach to target prospective consumers new to their categories. They utilize the multinomial logit model to determine which variables are the most relevant in the neural network model. Finally, Bayesian networks are a powerfull knowledge representation tool under conditions of uncertainty. Besides, methods capturing available data to construct belief networks promise to greatly improve the efficiency and accuracy of the models. Hence, we choose to model the problem via a Bayesian network. Following [9], we rather propose to select a panel of customers, and, for a fixed number of segments, ask them to give ratings (or grades) for the presence of various attributes (such as brand image, QoS, rates...) in each single offer. From these ratings, a linear utility function is deduced, its coefficients being estimated. A random variable is added to the model to represent the error on choices made by the customers, this variable being drawn according to a discrete choice model. An algorithm based on Bayesian networks and Monte Carlo simulation is used to determine the parameters of the utility. Indeed, we have no information on them and want to increase our knowledge about the customers’ preferences; the framework of Bayesian networks is a relevant tool, making use of a priori densities, which influence is negligible on the values the algorithm converges to (cf Poirier [24]), and helps in getting more accurate coefficient estimations. Similarly again to [9], we take into account customers’ preference heterogenity among and within each market segment. Indeed, in the Marketing literature, it has been shown that the heterogeneity within each segment is too significant to be totally ignored (cf Allenby and Ginter [1]). Usually, it is assumed that consumers share the same type of behavior and attitudes and in any case a single model can be used in order to derive the purchasing probabilities for the examined products. However, the study of the utility functions, estimated by the help of multicriteria analysis methods, reveals that this assumption may lead to an erroneous abstraction of the reality [20]. Furthermore, taking into account consumers’ heterogeneity might improves the efficiency of the resulting pricing. From the obtained utility functions, we can deduce the customers’ estimated reservation prices for each offer, and the parameters’ density estimates for each segment. It then provides the operator useful interpretation on the market segment willingness to pay. For example, the computation of the Gini’s indexes (cf Tassi [27]) for the series of reservation prices associated with the sampled customers gives an insight about the subjective price valuation of his offers. If there is low disparity (small indexes), it means that everyone evaluates the offer the same way. But, if the disparity is strong (index around 1), it means that only some people value the product. Hence, it should be better for the operator to target a specific market segment. Besides, the operator wants to properly segment the market to better control demand. But the operator does not know to which market segment an individual customer belongs, as well as the optimal number of segments to use. The above algorithm is thus performed for various numbers of segments, and the number maximizing the log marginal likelihood of the parameters (seen as a performance criterion), is selected. We now emphasize our contribution (especially with respect to [9]): again, it is a specific application of bundling to telecommunication networks. It also includes rival offers in the global choice set, which enables us to get more accurate estimates of the reservation prices. Indeed, the reservation prices rely on the offers commercialized by all the operators. We also extend [9] in order to deal with choice sets made of various size offers (i.e. simple, double and triple-play products) which could influence each other. To perform this point, we modify the utility to incorporate the appropriate number of attribute categories. Our discrete choice model also differs from the one in [9] since we consider two non-overlapping choice sets: the first one contains all the commercialized offers, while the second one is the empty choice set, requiring an adaptation of the reservation price estimation and of the a posteriori densities updates. Indeed Chung and Rao [9] consider multiple choice sets containing the no-buy option and multiple bundles. Additionally, contrary to [9], we do not neglect the errors in the implementation (which is usually done due to the difficulty to perform the simulation because of their correlations), and use an acceptance-rejection procedure as a bypass. We also additionally implement a segmentation process, which requires the computation of the harmonic mean of the marginal likelihood values obtained at each iteration of the algorithm. The algorithm is run separately for each consumer and each iteration increase the operator’s information about the consumers’ preference parameters. The marginal likelihood is used as a criterion to determine the optimal number of market segments. We prove (cf Chung and Rao [9], Chib and Jeliazkov [7] and, Newton and Raftery [23]) that the harmonic mean of these likelihood 3
values is an estimate of the marginal likelihood. The algorithm uses as input customers’ data and gives us parameter and reservation price estimates for every customer, on each market segment, as outputs. We infer the associated densities using a non parametric approach, based on kernel estimators. This approach enables us to measure the efficency of the estimator. Indeed, the resulting quadratic risk is proved to be close to zero when the number of sampled customers is sufficiently large. Numerical experiments are performed using data from a survey realized by Orange in 2006 about behaviors of 1014 families towards telecommunication, multimedia and leisure activities, on a choice set made of 6 distinct offers. Four market segments are identified and the mean absolute percentage error between the estimated reservation prices and the true one for the triple-play offer (TV-fix-internet) is of around 15% with correlated errors and 33% with independent ones. The remaining of this paper is organized as follows. In Section 2, we introduce the random linear utility model, define the attribute groups and the utility coefficients using the data from the panel of customers. In Section 3, we introduce a priori densities on the parameter distribution and estimate the model parameters using Monte-Carlo methods. How the a posteriori densities are updated is explained in the appendix. The segmentation process, based on the estimation of the log marginal likelihood is detailed in Section 4. Numerical illustrations are also provided in Section 5 mainly. Finally we conclude and briefly introduce future extensions, related to competition between providers, in Section 6.
Notations For any set S, Let |S| be its cardinal. Define also Mat(r, r) as the set of r × r matrices. The next table reviews the specific notations of this paper. K set of operator’s available families services (ex: availble services about television, internet, or wireless or wired telephony) B operator’s choice set containing the offers to commercialize N panel of tested customers N cardinal of the panel set N a an attribute 1 A class of type 1 attributes, fully comparable for the available services A2 class of type 2 attributes, partially comparable for the available services A3 class of type 3 attributes, non comparable for the available services A total set of attributes wi (k) non negative coefficient characterizing the importance of service k for customer i b an offer b(k) the k th component of the offer, belonging to service family k Xi (a, b(k)) ∈ {0, 1, 2, ..., 10} rating of customer i for the importance of attribute a in offer b Si (a, b) weighted sum, in terms of the wi (k), of attribute a in offer b for customer i Di (a, b) weighted dispersion of the ratings of services in bundle b on attribute a RP (i, b) reservation price of the customer i ∈ N for the offer b F set of market segments F Maximum number of segments Zi Random variable giving the segment customer i belongs to Pb Price of bundle b Uib|Zi =f customer i’s utility for the offer b, provided i belongs to market segment f BV (ib|Zi = f ) customer i’s valuation for the offer b, provided i belongs to market segment f , excluding price βa (f, i) utility parameter representing the importance of attribute a for customer i on market segment f γa (f, i) utility parameter measuring the substituability (< 0)/complementarity (≥ 0) of attribute a, for the client i in f
4
αP (f, i) ψ(f, i) Φi Nk (m; V ) ρi1 ∈ [0; 1] ρi0
utility parameter of price sensitivity for client i on market segment f probability that customer i belongs to market segment f vector of unknown coefficients for customer i k-dimensional Gaussian law with mean m and covariance matrix V degree of independence in unobserved utility among the alternatives in the choice set for customer i client i’s utility to not buy anything
2 The Model This section presents the utility model describing users’ preferences. The utility is defined as a random variable, linear in price and the customers’ quality valuation, with unknown parameters/coefficents. Coefficients are different for each customer. Error terms will be estimated thanks to the Bayesian network framework in Appendix 7 Subsection 7.3, in order to derive the reservation prices for each customer and each offer on the market. Let F, be the segment set and F the maximum number of segments. The basic elements of the model are the following. Definition 1. • A service family holds all the elementary offers for a specific communication support. For example, the Mobile family contains all the Mobile phone offers on the market. Let K, be the set of families and |K| ∈ N the number of families. • An offer b is a combination of elements in a subset of K, that is, a choice of elementary offers for a number of communication supports.
• A choice set contains the set of offers of various size that an operator commercializes on the market. It will be denoted B.
In the illustrating part of the article, we will cope with a telecommunication system, made of 4 distinct families: wired telephony, wireless telephony, Internet and Television. Hence, |K| = 4. Each offer b is a combination of elementary offers b(k), for a number of communication supports k in K. If no elementary offer in b belongs to the k-th family, we use the convention b(k) = 0. b(k) = 0 means that the consumer’s attribute valuation in the component b(k) of the offer b will always be set to zero, while otherwise the consumer gives rating for the presence of the attribute in the k-th component of the offer b. Example 2. The decomposition is described in Figure 1, where we suppose that the operator commercializes a simple offer of Wireless telephony (b1 ), a double offer of Wired telephony-Internet (b2 ), and a triple-play offer made of Mobile-Internet-TV (b3 ). The offers are then decomposed over the service families. For the simple offer b1 in the Table, it gives us a single component b1 (1) in the Mobile family, while the others are set to 0: b1 (k) = 0, k = 2, 3, 4. b2 is a double-play offer of Fix and Internet. Hence, it gives two positive components: b2 (2) in Fix telephony family and b2 (3) in Internet family, while b2 (1) = 0, b2 (4) = 0. Finally, the third offer is a triple-play offer of Mobile, Internet and TV. Consequentely, there are 3 positive components b3 (1), b3 (3), b3 (4).
2.1
Utility model
Following [9], we use the random utility model introduced by Train [28], the most used one in discrete choice models. Provided customer i belongs to the market segment f , the random utility model associated with customer i, towards offer b, at price Pb , is expressed as Uib|Zi =f (Pb ) = Vib|Zi =f (Pb ) + ε(i, b), ∀i ∈ N , ∀b ∈ B, ∀f ∈ F := {1, 2, ..., F }, where 5
(1)
Figure 1: Decomposition of the operator’s choice set offers over the service families.
• the error coefficient ε(i, b) is a random variable taking into account the h i uncertainty associated with customers’ valuations. The vector ε(i) = ε(i, 1) ε(i, 2) ... ε(i, |B|) , contains all the error coefficients for the client i, towards any offer in the total choice set B. • Vib|Zi =f is the deterministic part for each customer. It is also decomposed (linearly) into Vib|Zi =f (Pb ) = BV (ib|Zi = f ) + αP (f, i)Pb to separate between the valuation due to the price Pb , with αP (f, i) ≤ 0 coefficient expressing the customer i sensitivity towards price, and the valuation BV (ib|Zi = f ) due to the bundle itself, excluding price. Why do we introduce random variables to represent errors? The operator has actually an incomplete information on the customers’ utilities, due to an uncertainty which is associated with unobserved alternative components/attributes (see below for a definition), unobserved taste variations of the customers, as well as judgement and measurement errors. Finally, using the definition of conditional probabilies, customer i’s utility for the offer b takes the form Vib =
F X
ψ(f, i) Vib|Zi =f ,
(2)
f =1
where ψ(f, i), i ∈ N , f ∈ F, is the unknown probability that customer i belongs to the market segment f .
2.2
Attributes and utility coefficients
The value of a bundle given that customer i is in segment f , BV (ib|Z i = f ) is assumed to depend on so-called attributes. After a definition of those attributes, we express BV (ib|Z i = f ) still as a (convenient) linear function. We first need the following definition. Definition 3. An attribute is an entity that defines customers’ valuations properties and distinguishes between the offers. For example, a classical attribute in telecommunication is the Quality of Service (QoS), but several QoS metrics can also be considered. As explained in [9], each attribute belongs to one of three categories, depending on whether it can be compared on the service families. For example, the attribute of QoS can be evaluated on every offer belonging 6
Rate
Internet
non comparable
Security
Internet, Fix phone
partially comparable
Length of commitment
Internet, Fix, TV, Mobile
fully comparable
Illimited communication towards Fix phones
Fix, Mobile
partially comparable
Illimited communication towards Mobile phones
Fix, Mobile
partially comparable
Hotline 24h/24, 7d/7
Internet, Fix, TV, Mobile
fully comparable
Number of free TV channels
Mobile, TV
non comparable
Mobile card
Mobile
non comparable
Trust in the brand
Fix, Internet, TV, Mobile
fully comparable
Quality of Service / Reliability
Fix, Internet, TV, Mobile
fully comparable
Table 1: Description of the main model attributes and their categories in telecommunications. to Wireless telephony, Wired telephony, Internet and TV families, i.e. it is fully comparable, while the attribute of Illimited communication towards wired phones can only be compared in offers belonging to wireless and wired telephony families. It is then only partially comparable. Some are not comparable at all. We therefore have the three following classes: • The class A1 contains the type 1 attributes, which are fully comparable, since they appear in every service family. • The class A2 contains the type 2 attributes, which are partially comparable, since they appear in at least two service families. • Finally, the class A3 contains the type 3 attributes, which are non comparable, i.e. they appear in only one system family. Define also A = ∪3i=1 Ai the total set of attributes.
Example 4. In a telecommunication system, we have selected 10 of the most representative attributes, listed in Table 1, and partitionned them into the 3 categories. The attribute category depends on how many system families it can be compared on. From this classification of attributes, the valuation is assumed to take the form i X h BV (ib|Zi = f ) = α0 (f, i) + βa1 (f, i)Si (a1 , b) + γa1 (f, i)Di (a1 , b) +
X h
a2 ∈A2
a1 ∈A1
i X αa3 (f, i)Ci (a3 , b). βa2 (f, i)Si (a2 , b) + γa2 (f, i)Di (a2 , b) +
(3)
a3 ∈A3
Let us now explain the different variables involved in this expression. The utility parameters Si (a, b), Di (a, b), i ∈ N , b ∈ B, a ∈ A1 ∪ A2 and Ci (a3 , b) are known as functions of the panel of customers’ ratings. The others are coefficients giving the importance of the above parameters. More precisely, each customer i in the panel N is supposed to have provided a rating Xi (a, b(k)) ∈ {0, 1, 2, ..., 10} to the presence of the attribute a in the simple offer b(k). This interval is arbitrarily chosen from standards (but the simulation algorithm which estimates the utility parameter will anyway renormalize the coefficients). Similarly, each customer i ∈ N is asked to associate importance weights with every family. These coefficients are non negative and normalized, i.e.: w i (k) ≥ 0, k = 1, 2, ..., |K|, and P|K| k=1 wi (k) = 1, ∀i ∈ N . They express the importance of each component family in the offer. The weighted sum Si (a, b) measures the ratings of customer i’s relevant components (i.e., the individual offers b(k)) for bundle b on attribute a: Si (a, b) =
|K| X k=1
wi (k) 1(a,b(k)) Xi (a, b(k)) , a ∈ A1 ∪ A2 , b ∈ B, 7
(4)
where 1(a,b(k)) = 1 if the attribute a presence can be rated on the simple offer b(k), and 0 otherwise. We also use as a utility parameter the weighted dispersion Di (a, b) of relevant components for bundle b and the average rating of bundle b, on attribute a: Di (a, b) =
|K| X k=1
¯ i (a, b) = with, X
P|K|
¯ i (a, b) , a ∈ A1 ∪ A2 , b ∈ B, wi (k) 1(a,b(k)) Xi (a, b(k)) − X
1(a,b(k)) Xi (a,b(k)) P|K| k=1 1(a,b(k))
k=1
(5)
the average rating that customer i gives to the presence of attribute a
in every single offers b(k) composing the bundle b. Remark that if the offer is made of a single element, the dispersion vanishes. P|K| Finally, our last set utility parameters, the Ci (a3 , b), given by Ci (a3 , b) = k=1 wk (i)Xi (a, b(k)), represents customer i’s (weighted) perceived value for non-comparable attributes. How do we balance the relative importance of the above utility parameters? It is done thanks to coefficients α0 (f, i), βa (f, i), γa (f, i) and αP (f, i). Those coefficients additionally have some marketing interpretations: • α0 (f, i) is the intercept coefficent. • βa (f, i) ≥ 0, means that the presence of the attribute a in an offer is desirable, according to the customer i on the segment f , while βa (f, i) < 0, means that the attribute effect has a negative effect in the offer. • In the same spirit, γa (f, i) ≥ 0, means that the customer i on the segment f thinks that the attribute a is complementary in the offer, while γa (f, i) < 0, expresses the substituability of the attribute, which might be replaced. With respect to [9], we do not restrict our paper to bundles of the same size. If we consider a simple offer s, the function BV (is|Zi = f ) can be simplified since there is a single attribute class remaining. It gives us X BV (is|Zi = f ) = α0 (f, i) + αa3 (f, i)Ci (a3 , s). a3 ∈A3
In the case of a double-play offer d, the only possible classes for attributes are those non comparable, and partially comparable. The function BV (id|Zi = f ) becomes in this case X X BV (id|Zi = f ) = α0 (f, i) + [βa2 (f, i)Si (a2 , d) + γa2 (f, i)Di (a2 , d)] + αa3 (f, i)Ci (a3 , d). a2 ∈A2
2.3
a3 ∈A3
Discrete choice model
We now have to deal with correlations among the error terms of bundle utilities, thanks to a discrete choice model. Here we do not follow [9], and use a different discrete choice. Indeed, Chung and Rao in [9] suppose that there choice sets are made of various bundle assortments and of the no-buy option. In our article, there are two non-overlapping choice sets: the whole set of offers and the no-buy option. The customers compare all the offers on the market, and judgement errors and unobserved similarities should be evaluated on the basis of the whole set: indeed, it is impossible to consider sub-sets because the customer will make his choice knowing all the proposed offers. We need to use a parameter ρi1 ∈ [0; 1] measuring the degree of independence in unobserved utility among the alternatives in the choice set B. As a first approximation, we consider that (1 − ρi1 ) is a measure of the correlation (cf Train [28]). ρi0 := Ui0 , is the client i’s utility to not buy anything. The error terms in the utility function are correlated within the choice set B, while they are independent of the error associated with the empty set. The discrete choice theory (cf Train [28]), by the nested logit model, makes use of a cumulated distribution function of form !ρi1 X ε(i, b) . (6) )) F (ε(i)) = exp (− exp(−ε(i, 0))) exp(− exp(− ρi1 b∈B
8
Since the cumulated distribution function associated with the empty set error is F (ε(i, 0)) = exp(− exp(−ε(i, 0))), the empty set error is drawn according to a Logit density ε(i, 0) ∼ exp(−ε(i, 0) − exp(−ε(i, 0))). Due to the error correlations, we have no explicit expressions of the densities associated with the correlated coefficients. Usually, the error coefficients are ignored since difficult to simulate due to correlations. Indeed, in the numerical illustrations in [9], Chung and Rao restrict their choice sets to the best bundle selected by the consumers and the no-buy option. Consequentely, they do not have to deal with correlations between error coefficients and, the distribution function associated with their choice set is the product of two logit densities which is quite easy to implement. However, in our telecommunication system, choice sets are large and contain all the possible offers. Consequentely, it is impossible to neglect the error correlation effects on the consumer reservation prices. Hence, we introduce in this section, a way to simulate them. Indeed, conditionally to the realization of the other error coefficients and by derivation of the cumulated distribution function, the density is given by X ε(i, b) exp(− (t+1) ) fε (ε(i, b)|ε(i, 0), (ε(i, k))k∈B, k6=b ) = exp (− exp(−ε(i, 0))) exp −ρi1 ρi1 k6=b ( !) ε(i, b) (t+1) exp −ρi1 exp − (t+1) . ρi1 To simulate the errors, we use a method of acceptance-rejection described in Section 7.3 of the appendix (see for instance [25] for the general framework of acceptance-rejection). A nested logit model is appropriate when the set of alternatives faced by a decision-maker can be partitioned into sub-sets, called nests, in such a way that the property of Independence from Irrelevant Alternatives (IIA) holds (cf Train [28]). We define our nested logit model and check that it satisifes the IIA property. From the errors, classical discrete choice theory (cf Train [28]) derives the expressions of the choice probabilities. Provided customer i belongs to the segment f , the probability to not buy anything at all is: Pi0 (ρi ) =
1 + exp −Vi0
P
1 j∈B exp(
where, by assumption, Vi0 := Ui0 − ε(i, 0) := ρi0 − ε(i, 0). Also, the probability to buy an offer b ∈ B is: Pib (ρi ) =
ρi1 , Vij (Pj ) ) ρi1
ρi1 −1 Vib0 (Pb0 ) b0 ∈B exp( ρ(i,1) ) P . Vib0 (Pb0 ) ρi1 + exp (V ) ) exp( 0 i0 b ∈B ρi1
ib (Pb ) ) exp( Vρ(i,1)
P
(7)
(8)
ij Since rjk = Pik = exp (Vij − Vik ) , ∀j, k ∈ B, the ratio of probabilities for any two alternatives in the same nest is independent of the existence of all other alternatives (cf Train [28]). Furthermore, the ratio of the probabilities to choose the alternatice j in the choice set B and the probability to not choose anything is
P
V
rj0 =
ij exp( ρi1 )
P
V
ij k∈B exp( ρi1
exp (Vi0 )
9
ρi1 −1
, ∀j ∈ B.
2.4
Reservation price
The reservation price is a key value. In the case of a simple offer, it is the maximum level for which the customer would buy the product. It has been generalized to the case of an offer b belonging to a global choice set B by Kohli and Mahajan first in [18] and later on in Chung and Rao [9].
Definition 5. RP (i, b), is the bundle b price at which customer i is indifferent between buying offer b and choosing another alternative in the choice set. Formally, provided that customer i belongs to the market segment f , the reservation price for the offer b can be defined as follows 1 Uib|Zi =f RP (i, b) = max 0 max Uib0 |Zi =f (Pb0 ); Ui0|i∈f . 0 0 b ∈B,b 6=b,b 6=0
From the definition of the utility, it gives RP (i, b) = −
maxb0 ∈B−{b} Uib0 |Zi =f (Pb0 ) ε(i, b) BV (ib|Zi = f ) − + . αP (f, i) αP (f, i) αP (f, i)
(9)
The error coefficients, the price sensitivity parameters and the random utilities are random variables. Then, the expected reservation price is E [RP (i, b)]
h α (F, i) X βa (F, i) X βa (F, i) γa (F, i) 0 1 2 + Si (a1 , b) + 1 Di (a1 , b) + Si (a2 , b) αP (F, i) α (F, i) α (F, i) α P P P (F, i) 1 2 a1 ∈A a2 ∈A i X αa (F, i) γa2 (F, i) 3 Di (a2 , b) + Ci (a3 , b) + αP (F, i) α P (F, i) 3 a3 ∈A maxb0 ∈B−{b} Uib0 (Pb0 ) ε(i, b) + E −E . αP (F, i) αP (F, i)
=
−E
To estimate the reservation prices, we therefore need to determine the utility functions. From the ratings of the panel of customers, the utility parameters Si (a, b), Di (a, b), ∀a ∈ A1 ∪ A2 , ∀i ∈ N , Ci (a, b), ∀a ∈ A3 , ∀i ∈ N , ∀b ∈ B can be computed. Now, the coefficients α0 (f, i), βa (f, i), γa (f, i) and αP (f, i) have still to be determined.
3 Estimation of the model coefficients Define Φi as the vector of coefficients for customer i, i.e., Φi := {βa (f, i)}a,f , {γa (f, i)}a,f , {αa (f, i)}f , {α0 (f, i)}f , {αP (f, i)}f .
The dimension of Φi is given by R = F (2|A1 | + 2|A2 | + |A3 | + 2).
3.1
Estimation of the random utility coefficients
Various approaches can be envisaged to infer consumer preferences from data. First, we can use the maximum likelihood. It requires to define an a priori density on the data, which depends on unknown parameter θ: p(x|θ). Then, the parameter θ should be defined so as to maximize the likelihood Q function: θ ∗ = arg maxθ L(θ|x) = arg maxθ N p(x i |θ). If the likelihood maximization can not be i=1 performed analytically, the Expectation-Maximization algorithm (EM, cf Robert [25]) can be used provided the likelihood function can be simplied to handle additionnal missing or hidden parameters. However, the difficulty remains to define a proper a priori density on the data. Second, a neural network approach can be 1
Including in B the empty set, leading to a null utility.
10
used. The algorithm is trained to learn the correspondance between offer combinations and consumers’ valuations. On the one hand, this method might prevent us from the linearity assumption on the preference utility function, but on the other hand many offer combinations should be tested which could be quite expensive if competition is introduced. Finally, Bayesian networks are growing in popularity as the model of choice for problems involving reasoning under uncertainty. They have been implemented in various application areas such as medical diagnostics, classification systems, multisensor fusion... Until recently the standard approach to constructing belief networks was a labor-intensive process of eliciting knowledge from experts. Methods for capturing available data to construct Bayesian networks or to refine an expert-provided network promise to greatly improve both the efficiency and the accuracy of the model. A challenge for approaches frequently encountered in real systems arises when the environment being modelled by the Bayesian network changes, either slowly or abruptly, in time or in characteristics. The learning rate shoud then be adapted. However, this last point is out of the scope of this article, but can be considered in extensions. Following [9], we use the hierarchical Bayesian framework (cf Poirier [24]) to estimate the utility coefficients, due to the lack of information of the operator about customers. Indeed, those parameters are difficult to evaluate directly, and Bayesian inference provides data augmentation and seems a relevant way to perform the estimation. The basic principle is to assume a priori laws on distributions and look by simulation to which values the simulation converges to. It is known that the result is robust to the choice of a priori laws (cf Poirier [24]). The approach therefore enables to learn iteratively these preference coefficients. It consists in sampling vector Φi given the segment Zi = f customer i belongs to. The unknown variables are supposed to have the a priori distributions: • The parameters for the discrete choice model ρi = (ρi0 , ρi1 )T ∼ N1 (µρ , Tρ ) × N1 m, σ 2 • the coefficients for the random utility
|[0;1]
2,
(10)
(11)
(Φi |Zi = f ) ∼ N R (µf , Tf ), F
with µρ 2-dimensional vector, Tρ 2 × 2 matrix, and µf R-dimensional vector, Tf R × R matrix. Those variables have hyper a priori distributions (to express the lack of a priori information of the operator about the prior distributions), given by • µρ ∼ N1 (vρ , Ωρ ), vρ ∈ R, Ωρ ∈ Mat(2, 2), R
• µf ∼ N R (vf , Ωf ), vf ∈ R F , Ωf ∈ Mat( FR , FR ) , ∀f ∈ F, F
•
Tρ−1
∼ Wishart(νρ , sρ ), νρ ∈ R, sρ ∈ R+ a variance parameter, R
• Tf−1 ∼ Wishart(νf , Sf ), νf ∈ R F , Sf ∈ Mat( FR , FR ) , is a variance-covariance matrix , ∀f ∈ F,
• (ψ1 , ..., ψF ) ∼ d(ω1 , ω2 , ..., ωF ), is defined for each client i, with d the normalizing function defined P F on the space R . d(ωf ) ≥ 0 ∀f ∈ F, l∈F d(ωl ) = 1.
We choose vρ = νρ = 1, vf = νf = 1 R , sρ = 1, Ωρ = Id2,2 , Sf = Ωf = Id R , R , and ωf = F1 ∀f ∈ F. The F F F choice of the normal and Wishart densities is arbitrary, as well as the values of involved parameters v ρ , Ωρ , vf , Ωf , νρ , sρ , νf and Sf , but it enables to simplify the way we update the choices in the simulation algorithm below. Again, Bayesian inference is very robust to thoses choices. Recall that a matrix M ∈ Mat(p, p) 2
To get an analytic expression of the normal density restricted to the close interval [0; 1], we let Φ = With Φ = 1 and m = 21 , we get σ = 0.93.
11
R1
0 σ
√1 2π
2
du. exp( −(u−m) 2σ 2
follows a Wishart distribution, denoted Wishartp (n, Σ), if M can be written M = X T X, where X is a random n × p matrix such that the n rows of X are independent random Gaussian vectors, drawn according to the density Np (0; Σ) (cf Tassi [27]). Unfortunately, we can hardly generate Φi because we do not have a closed form expression of the cumulative distribution function (the a posteriori distribution (11) is a density whose parameters are drawn according to hyper a priori densities). A traditional technique to cope with this problem is to use Markov Chain Monte (t) (t) Carlo (MCMC) techniques. The basic principle is here to simulate two Markov chains (ρ i )t≥0 , (Φi )t≥0 (t) (t) where Φi depends on the value of ρi , with steady-state distributions the target laws, and therefore to simulate the chains during a time T large enough to approach the steady-state behavior. We can then use the (T ) (T +1) (t) values of Φi , Φi , . . . (even if the Φi are not independent). In order to build the Markov chain and implement the Monte-Carlo estimation, we introduce a Hastings-Metropolis algorithm (cf Robert [25]) as described now; as a consequence, from the resulting data augmentation the estimations are updated at each time step of the algorithm. Assume that the chain is generated up to time t and that we want to generate a new state at time t + 1. For all variables, we use superscript (t) to represent the estimation at time t. The different steps are as follows (and are mainly taken from [9]). 1. Data augmentation. We draw a random variable at the time instant t + 1, Zi , which gives us the index of the segment to which the client i belongs, given the parameter values at time t. The probability of getting segment f for customer i is given by (t+1)
(t+1)
ξf i
(t)
=
=
(t)
P[Φi = φi ∩ {Zi = f }]
(t)
(t)
(t)
P[Φi = φi |Zi
= f ]ψ (t) (f, i)
, (t) (t) (t) (t) (t) 0 (t) 0 P[Φi = φi ∩ {Zi = f 0 }] f 0 ∈F P[Φi = φi |Zi = f ]ψ (f , i) 1 (t) (t) (t) (t) (t) (t) det(Tf )− 2 exp − 21 (φi − µf )T (Tf )−1 (φi − µf ) ψ (t) (f, i) . (12) P (t) (t) T (t) −1 (t) (t) (t) − 1 1 (t) (f 0 , i) 2 exp − (φ − µ ) (T ) (φ − µ ) ψ 0 0 0 f 0 ∈F det(Tf 0 ) i i f f f 2 P
f 0 ∈F
=P
In the next steps of the algorithm, we estimate the coefficients for customer i on segment Z i
(t+1)
only.
2. MCMC simulation of the parameters according to updated densities. We assume that customers choose the offer generating the maximum random utilities in Equation (13). The likelihood choices for the whole panel (by conditionning with respect to the segment for each customer) is given by |B| N X F Y Y Di0 L = (13) [Pib (1 − Pi0 )]Dib ψ(f, i) , Pi0 i=1 f =1
b=1
where Dib ∈ {0; 1}, b ∈ B ∪ B0 , is a binary choice variable which takes the value 1 if customer i P|B| chooses bundle b of the choice set, and 0 otherwise, with Di0 + b=1 Dib = 1 because only one choice is possible. We determine the choice combinations for each customer i, (D i0 , Dib , b ∈ B), maximizing the likelihood (13). Indeed, we need the choice variable values to update the random utility parameters in (14) and (16) as described below. We determine which offer maximizes each consumer’s random utility, by applying the likelihood function L defined in (13) to each basis vector of R |B|+1 for each customer and by taking the product of the resulting individual likelihood values. The choice variables as well as the discrete choice probabilities are time dependent. But for the sake of simplicity, we omit the time superscript for these variables.
Using the a priori information, the parameters are simulated via Metropolis-Hastings algorithm (cf [25], [9]) for each customer i ∈ N . Indeed, we do not have closed form expressions for the cumulative distribution functions of Φi and ρi . Hence, we simulate them using Metropolis-Hastings algorithm with random walk since we have no prior information about the conditional density guiding the Monte-Carlo 12
Markov chain evolution. According [25], if the Markov chain conditional generating density is positive in a neighbourhood of 0 then the associated Markov chain is ergodic and converges to the target law. In our numerical applications (cf Section 5), we use a normal density centered in the last generated parameter with postive variance as generating density for our random walk. To guarantee the convergence of our algorithm towards the target density, the simulation running time should then be (t+1) (t+1) kept large enough. In the numerical applications (cf Section 5), we draw the ρ i and Φi for 1000 simulation steps. The outputs are the steady state distributions of Φ i and ρi which approximate the ˆ (T ) can be used to give insights about marketing target laws. Then, the resulting parameter Φ i interpretations of the utility coefficients: βa (f, i), γa (f, i) on each market segment. For example, it provides us information about the complementarity or the substituability of the attribute a, which can be ˆ (T ) , ρˆ(T ) and ψˆ(T ) (f, i) in the quite useful to build an optimal offer. Furthermore, by substitution of Φ i i equation (9), we get estimates of the consumer i’s reservation price for the offer b, on the market segment f ∈ F. More formally, the parameters are simulated according to the densities defined below, for each consumer i ∈ N . (a) The discrete choice parameter according to (t+1)
ρi
∼
(t+1) i Yh (ρ − µ ρ )2 1 (t+1) Di0 (t+1) Dib p exp − i0 Pi0 (ρi ) 1 − Pi0 (ρi ) 2Tρ 2πTρ b∈B ! (t+1) (ρ − m)2 1 √ exp − i1 . 2(σ)2 σ 2π
! (14)
This expression results from our a priori assumption about the discret choice parameter (10). (t) Remark again that we use the information acquired at the time instant t, Φi . To simulate it, we use a Metropolis-Hastings’ algorithm with random walk (cf [25]). The idea is to use the previously (t) generated value to draw the next one. The basic principle is to draw a new variable r i = ρi + e(t) , (t) with e(t) random variable modeling a perturbation of density g and independent of ρ i . We set r , with probability min 1; fρi (ri ) , i (t+1) (t) fρi (ρi ) ρi = (t) ρi , otherwise
where the generating density is
fρi (ri )
=
(t+1) i Yh − µ ρ )2 (ρ 1 (t+1) Dib (t+1) Di0 p ) ) 1 − Pi0 (ρi Pi0 (ρi exp − i0 2Tρ 2πTρ b∈B ! (t+1) (ρi1 − m)2 1 √ exp − . 2(σ)2 σ 2π
(b) The coefficients according to (denoting by Φi f)
(t+1)
(t+1) Φi (f )
∼
Yh
b∈B
i (t+1) Dib Pib (Φi )
(f ) the sub-vector of Φi
(t+1)
(t)
!
limited to segment
1
det(Tf )− 2 R
(2π) 2F
1 (t+1) (t) T (t) −1 (t+1) (t) exp − (Φi (f ) − µf ) (Tf ) (Φi (f ) − µf ) ψ (t) (f, i) (15) 2 13
using (Φi
(t+1)
(t+1)
|Zi
= f ) ∼ N R (µf , Tf ) (11), and Bayes formula. The simulation of Φi (t)
F
(t)
(t+1)
takes into account ρi in Pib (Φi ), Zi , as well as the past information Tf , µf and (t) ψ (f, i). Since, we have no closed form expression for the cumulative distribution function of Φ i , we use once more Metropolis-Hastings algorithm with random walk. Then, we set φ , with probability min 1; fΦi (φi ) , i (t+1) (t) fΦi (Φi ) Φi = (t) Φi , otherwise (t+1)
(t)
(t+1)
(t)
(t)
where the generating density is
fΦi (φi )
=
Y
(t)
Dib
[Pib (φi )]
1
det(Tf )− 2 R
(2π) 2F
b∈B (t)
1 (t) (t) (t) exp − (φi − µf )T (Tf )−1 (φi − µf ) 2
(16)
ψ (f, i).
3. Updating the values of parameters. Using the simulation output, the parameters are updated as follows. (a) Computing (Tf )−1 . Recall that we have supposed that the covariance matrix of the density characterizing the random utility coefficient follows a law Tf−1 ∼ Wishart(νf , Sf ). After some (t+1)
computations described in Appendix 7.1, we find that the updating rule for the matrix (T f at time step t + 1 takes the form
(t+1) −1 )
(t+1) −1
(Tf (b)
)
∼ Wishart νf + Nf , (Sf−1 −
X
(t+1)
(Φi
i∈N
(t)
(t+1)
(f ) − µf )(Φi
(t)
(f ) − µf )T 1{Z (t+1) =f } )−1 i
(17)
(t+1) Updating µf . If the average value of coefficients estimate on segment f is P (t+1) Φ 1 (t+1) (t) (t+1) −1 1 (t) (t+1) − 1 i∈N i ξ1 ξ2 det((Ωf )−1 −(Tf ) ) 2 det(Ωf Tf ) 2 {Z =f } i ¯f = P , and if we let σ = Φ R 1 i∈N {Z (t+1) =f } (2π) F
,
i
with the parameters defined as follows
(t+1)
(t+1)
vf
=
and (t+1)
Ωf the updating rule for µf
(t+1)
(t+1)
µf
=
Ωf
(t+1) −1 ¯ (t+1) ) Φf
(Tf
σ−1 σ 2 − 1 (t+1) 0 Tf , Ωf = Ω f σ2
takes the form (see the Appendix 7.2 for details) (t+1) (t+1) −1 ¯ (t+1) ∼ N R vf + Ωf (Tf ) Φf ; Ωf + T f . F
(18)
(c) Computing ωf and ψ (t+1) (f, i). Using the hyper a priori densities, the parameter associated to the probability that the client i belongs to the market segment f is updated using the marginal discrete density d:, (t+1) (t+1) ωf ∼ Nf , (t+1)
14
!
.
P (t+1) where, Nf = i∈N 1{Z (t+1) =f } . The probability that the client i belongs to the segment f , i results from the normalization of the previous values, (t+1)
ψ (t+1) (f, i) = P
ωf
(t+1)
f 0 ∈F
ωf 0
.
Figure 2 illustrates the different steps of the process. The algorithm stops when the reservation prices become stable, i.e. |RP (t+k) (i, b) − RP (t) (i, b)| ≤ ε ∀k = 1, 2, 3, .., T < +∞, ∀i ∈ N , ∀b ∈ B, for T determined by the network operator and ε ≥ 0.
Figure 2: The preference algorithm: learning the utility coefficients.
4 Market segmentation The above analysis was perfomed in the case where the segmentation was fixed, but how to define this segmentation optimally? The goal of this section is to introduce such an optimal market segmentation (see [9], Chib and Jeliazkov [7] and, Newton and Raftery [23]), i.e. the optimal number of segments. 15
The algorithm described in Figure 2 is performed for any value k of number of segmets, 1 ≤ k ≤ F . In each case, the vector Φ is estimated. Define the corresponding (conditional) density as f (Φ|k). If we observe the estimated parameter vector Φ, the a posterori density can be expressed as f (Φ|k) π(k) , π(k|Φ) = PF f (Φ|l)π(l) l=1
(19)
where, π(l) is the a priori probability to choose l segments. The marginal likelihood for the vector Φ of parameters is F X f (Φ|l0 )π(l0 ). f (Φ) = l0 =1
From (19), we get
f (Φ) =
f (Φ|k)π(k) . π(k|Φ)
In order to choose the number of segments, we choose the one that maximizes the log marginal likelihood log (f (Φ)) = log (f (Φ|k)) + log (π(k)) − log (π(k|Φ)) . But, as described by Newton and Raftery [23], the harmonic mean of the likelihood values can be seen as an estimate of the marginal likelihood: fˆ(Φ, s) =
(
s
1X f (Φ|k)−1 s k=1
)−1
, s = 1, 2, ..., F.
Practically, we run our algorithm for all values of the number of segments and compute the parameter density (16). The value of s maximizing the log marginal likelihood is the optimal number of segments.
5 Numerical experiments Let us now illustrate the impact of the algorithm.
5.1
Marketing interpretations and accuracy of the preference algorithm
We consider two rival operators. Operator 1 commercializes a simple offer of Mobile, a double-play offer of wired-telephony-Internet and a triple-play offer of Mobile-Internet-Television. The other operator sells simple offers of wired-telephony and Internet, and a bundle of Fix telephony-Internet-Television. The choice set B is now made of the offers commercialized by both operators. We test our algorithm on data issued from a survey realized by Orange in 2006. The consumer panel was made of 1014 families, i.e. 2058 individual consumers. It consists of a set of questions aiming at understanding individual habits and behaviors towards telecommunication, multimedia and leisure activites. The optimal number of segments using the analysis of previous section is four. The preference model enables us to measure the importance and complementarity of the attributes, which is fundamental to build an optimal offer. Besides, it describes accurately all the market segments. As output of our algorithm, we get estimates of the random utility coefficients for each customer, on every market segment. Furthermore, we infer the consumers’ reservation prices on every market segment and for each offer. Using a non parametric approach, described in Appendix 7 Subsection 7.4, we estimate the resulting density functions. This last part enables us to get more accurate marketing interpretations of our algorithm outputs. For example, in Figures 3, 4, 5, 6, the first market segment represents customers who give importance to the brand reputation and do not care too much about the product prices (for example, teenagers), 16
Trust in the brand distribution on segment 1
2
Trust in the brand distribution on segment 3
1.4
1.8 1.2 1.6 1
1.4
distribution
distribution
1.2 1 0.8 0.6
0.8
0.6
0.4
0.4 0.2 0.2 0
0
0.2
0.4
0.6 0.8 1 1.2 1.4 trust in the brand importance [beta−trust(1)]
1.6
1.8
0
2
Figure 3: Distribution of Trust in the Brand impor-
0.2
0.4
0.6 0.8 1 1.2 1.4 trust in the brand importance [beta−trust(3)]
1.6
1.8
2
Figure 4: Distribution of Trust in the Brand impor-
tance on segment 1 (βTrust (1)).
tance on segment 3 (βTrust (3)).
Price sensitivity distribution on segment 1
1.6
0
Price sensitivity distribution on segment 3
1.5
1.4
1.2 1
distribution
distribution
1
0.8
0.6
0.5
0.4
0.2
0 −2
−1.8
−1.6
−1.4
−1.2 −1 −0.8 price sensitivity [alpha−P(1)]
−0.6
−0.4
−0.2
0 −2
0
Figure 5: Price sensitivity on segment 1 (αP (1)).
−1.8
−1.6
−1.4
−1.2
−1 −0.8 price sensitivity
−0.6
−0.4
−0.2
Figure 6: Price sensitivity on segment 3 (αP (3)).
17
0
Complementary CDFs without error, with error correlations (*), without correlation (o) for the 3−play offer 1
complementary cumulative distribution function values
0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0
0
5
10
15 20 25 30 35 maximum price for the TV−Internet−Fix offer
40
45
50
Figure 7: Complementary cumulative distribution functions for direct elicitation, with error correlations (*), without correlation (o).
whereas on the market segment 3, customers do not care too much about the brand reputation but don’t want to pay too much (for exemple, senior customers). The x-axis represent the parameter values. In Figure 7, we have drawn the complementary cumulative distribution function and its estimates taking into account error correlations (in stars) or without correlation (in circle), of the maximal price that cutomers associate with the triple-play offer of wired-telephony-Internet-Television. Our aim is to determine, as accurately as possible, the maximal price each customer would pay per month to get this offer. The x-axis represents the maximum price value. If correlations are considered, for a maximal price between 0 and 30, customers give reservation prices by direct elicitation which are lower than what they really think. They cheat to make the offer price decrease. But, if the maximal price is higher than 30, they make judgement errors and give prices which are higher than what they really think. The mean absolute percentage error (MAPE) 3 between the estimated reservation prices with error correlations and the true one obtained by direct elicitation via the Orange’s survey, is of around 15%. In the case where the error correlations are not considered, i.e. each error is drawn according to a Logit density, the mean absolute percentage error is of around 33%. The lowest value of MAPE in the correlation case indicates that taking into account error correlations is useful for estimating reservation prices.
5.2
Determination of the best marketing strategy using Gini’s indexes
Gini’s indexes might be indicators to identify the willingness to pay of each market segment, for every offer. Hence, the operator can determine which segment(s) to target. The Gini’s index [27] for the statistical serie P P P n
n
n
|xi −xj |
i , where m = i=1 (x1 , x2 , ..., xn ) is of the form: G := i=1 nj=1 . G ∈ [0; 1] traditionally measures 2 m n the inequality between the revenue distribution provided x is associated with revenue measures. G = 0 means that everyone has the same revenue, while G = 1 means that there is a perfect inequality. In Tables 2 and 3, for every offer and on each market segment (there are six offers and four optimal market segments), we have computed the Gini’s indexes for the series of reservation prices obtained as outputs of the algorithm described in this article. The Gini’s indexes for a specific offer gives the operator information about the willingness to pay of the customers. If there is low disparity (small indexes), it means that everyone valuates the offer the same way. But, if the disparity is strong (index around 1), it means that only some people valuate the product. We assume that the operators have three possible marketing strategy: 3
x
The mean absolute percentage error between the data At and the estimated sequence Ft is
18
1 n
Pn
t=1
t |. | AtA−F t
• either they just target a market segment. This situation is typical of Mobile Virtual Network Operators (MVNOs) in telecommunication, which are carriers providing users with mobile services without their own license for bandwidth, but renting that bandwidth to other providers; • eiter they can expand their market shares to become larger operators and therefore get a higher revenue on a longer term. Market share expansion strategy is usually used by new market entrants. To fetch numerous market shares, the operator has first to lower his offer price to seduce as many consumers as possible; • or they choose to discriminate prices among segments for increasing their benefits (typical revenue management situation of large operators). Such a strategy is profitable only for operators having a sufficiently large market share. Judging by the Gini’s index values, we give the most appropriate strategy to use in the last column. If there is a small Gini’s index on a unique segment and the associated reservation prices are rather high, then the operator should target this segment. If two or more segments seem to be quite interested in the offer (have small Gini’s indexes and high reservation prices), a price discrimination strategy should be appropriate. But, if there is no low enough Gini’s index (i.e. the disparity between consumers’ valuation is high), a market share expansion strategy to start seducing some customers, seems better. offer offer 1 (Op 1) offer 2 (Op 1) offer 3 (Op 1)
segt 1 index G(1) = 0.8 G(1) = 0.3 G(1) = 0.9
segt 2 index G(2) = 0.2 G(2) = 0.6 G(2) = 0.85
segt 3 index G(3) = 0.3 G(3) = 0.7 G(3) = 0.05
segt 4 index G(4) = 0.9 G(4) = 0.1 G(4) = 0.2
strategy Target or Discriminate Target or Discriminate Target
Table 2: Gini’s indexes for Operator 1’s three offers on every market segment. offer offer 1 (Op 2) offer 2 (Op 2) offer 3 (Op 2)
segt 1 index G(1) = 0.6 G(1) = 0.6 G(1) = 0.4
segt 2 index G(2) = 0.9 G(2) = 0.85 G(2) = 0.6
segt 3 index G(3) = 0.8 G(3) = 0.3 G(3) = 0.9
segt 4 index G(4) = 0.75 G(4) = 0.9 G(4) = 0.4
strategy Expansion Target Expansion
Table 3: Gini’s indexes for Operator 2’s three offers on every market segment.
6 Conclusions We have presented a way to model customers’ preferences using discrete choice theory. Using this customers’ information, we have developped an approach to help an operator to price optimally his offers so as to maximize his utility under uncertainty on the customer true reservation prices and on his rival profit margins. Besides, the model enables us to determine which market strategy is the most profitable for an operator. A possible research direction is to introduce a second level of game to deal with competition among operators. To extend this article and define the optimal offer, the operator has to learn the customers’ preferences. A trade-off has then to be found between an exploration strategy where all the possible offer combinations are tested (to learn the consumer preferences and determine the best suited offer), which can be costly, and an exploitation policy (to learn the consumers’ offer valuation and determine the optimal price), using acquired information. Indeed, in the exploitation process, the operator does not change his product line but tries to determine the optimal prices of his offers. He learns the consumers preferences for a specific offer combination. But he might also loose some money since the selected offer combination might not be totally suited to the targeted consumers. The exploration process which consists in testing numerous offer combinations should have improved his knowledge about the consumers’ preferences, but on the other side, 19
since the exploitation process cannot be completed if the line evolves continuously, prices are not optimal. So the operator might loose some money. Bisi and Dada [4] adress the problem of determining optimal ordering and pricing policies in a finite horizon newsvendor model with unobservable lost sales (i.e. a combination of ordered exploration and exploitation) and, Chen and Chen [6] propose a dynamic version of the joint pricing and lot size-scheduling problem taking into account some capacitated constraints impacting telecommunication networks. Cesa-Bianchi and Lugosi [5] deals with one seller having only partial information on the customer tastes, but increasing his knowledge by observing the effects of price variations on the customers decisions (i.e. an exploitation strategy).
7 Appendix 7.1
Computations for updating (Tf
(t+1) −1
)
For the sake of simplicity, let Φi = Φi and µf = µf . We have supposed that the covariance matrix of −1 the density characterizing the random utility parameters Tf ∼ Wishart(νf , Sf ). Since Φi (f ) ∼ N R (µf , Tf ), we know according to the definition of a Wishart density (cf Tassi [27]), that there F exists a matrix X Mf = (Φi (f ) − µf )(Φi (f ) − µf )T 1{Zi =f } (t+1)
(t+1)
i∈N
distributed according to a Wishart density Wishart( FR , Tf ), such that: Sf−1 = Mf + Bf , Bf ∈ Mat(
R R , ). F F
Computing the expectation of Tf−1 , we get
E[Tf−1 ] =
Z
exp − 21 Trace(Sf−1 Tf−1 ) Tf−1 dTf−1 . νf Q R R R R F Γ 21 (νf + i − 1) 2νf 2F π 4F ( F −1) det(Sf ) 2 i=1 det(Tf−1 )
νf − R −1 F 2
(20)
But, we know from the a priori density definition and (16), that
(t+1)
(Φ(t+1) |Zi
= f)
∼
Yh
(t+1)
Pib (Φi
− 1 2 iDib det (Tf(t) )−1 (2π)
b∈B
(Φi (f )(t+1) −
(t) µf ) ,
R 2F
1 (t) (t) exp − (Φi (f )(t+1) − µf )T (Tf )−1 2
using the basic properties of the determinant operator det(Tf ) = det((Tf )−1 ). Besides, (t)
(t)
Trace(Sf−1 Tf−1 ) = Trace(Mf Tf−1 + Bf Tf−1 ) = Trace(Mf Tf−1 ) + Trace(Bf Tf−1 ) , by linearity of the Trace operator, ! X (Φi (f ) − µf )Tf−1 (Φi (f ) − µf )T 1{Zi =f } + Trace(Bf Tf−1 ) = Trace i∈N
= Trace
X
i∈N
(Φi (f ) −
µf )T Tf−1 (Φi (f )
20
− µf )1{Zi =f }
!
+ Trace(Bf Tf−1 ),
using the fact that Trace(AB) = Trace(BA) and that the covariance matrix (T f By substitution in (20), we get
(t+1) −1 )
E[Tf−1 ]
Z
=
Tf−1
X
det(Tf−1 )
νf − R −1 F 2
i∈N
is symmetric4 .
1X exp − (Φi (f ) − µf )T Tf−1 (Φi (f ) − µf )1{Zi =f } 2 i∈N
!
R h F i−1 νf Y R R R 1 1 −1 νf 2F ( −1) dTf−1 . π 4F F det(Sf ) 2 exp − Trace(Bf Tf ) 2 Γ( (νf + i − 1)) 2 2
i=1
Then, by identification, we can update the distribution (t+1) −1 (Tf )
7.2
∼ Wishart νf +
Nf , (Sf−1
−
X
(t+1) (Φi (f )
i∈N
−
(t) (t+1) µf )(Φi (f )
−
(t) µf )T 1{Z (t+1) =f } )−1 i
!
.
(21)
Computations for updating µf
(t+1)
Recall that we made the assumption on the hyper a priori densities µf ∼ N R (vf , Ωf ). Besides, we know F from the Φi density definition (16) that
¯ (t+1) ) fΦi (f ) (Φ f
=
Y
1 ¯ (t+1) (t+1) −1 ¯ (t+1) (f ))T (Tf ) Φf ) exp(− (Φ f 2 F (2π) b∈B h 1 ¯ (t+1) T (t+1) −1 1 T (t+1) −1 ¯ (t+1) i exp( (Φ ) (T ) µ ) exp( µ (T ) Φf ) N0;T (t+1) (µf ), (22) f f 2 f 2 f f f
where N0;T (t+1) (µf ) is the (t)
f
(t+1) −1 − 1 det((Tf ) ) 2 (t+1) D ¯ [Pib (Φ )] ib R f
R F
dimension-normal density with mean 0 and covariance matrix Tf
(t+1)
applied to
the vector µf . If we realize an integration of µf with respect to a normal density defined below, we get Z 1 µf R 1 (2π) 2F det(Ωf ) 2 1 (t+1) −1 ¯ (t+1) T −1 (t+1) −1 ¯ (t+1) ) Φf Ωf (µf − vf ) − Ωf (Tf ) Φf )dµf exp(− (µf − vf ) − Ωf (Tf 2 Z 1 1 = µf − (µf − vf )T Ω−1 R 1 exp f (µf − vf ) 2 (2π) 2F det(Ωf ) 2 1 1 (t+1) −1 ¯ (t+1) ¯ (t+1) )T (T (t+1) )−1 (µf − vf ) exp (µf − vf )T (Tf ) Φf exp (Φ f 2 2 f 1 ¯ t+1 )T ((T (t+1) )−1 )T Ωf (T (t+1) )−1 Φ ¯ (t+1) dµf . exp − (Φ f f f f 2 ¯ But, we have already drawn Φ f
(t+1)
and Tf
(t+1)
and know Ωf , hence,
1 ¯ t+1 )T ((T (t+1) )−1 )T Ωf (T (t+1) )−1 Φ ¯ (t+1) ξ2 := exp − (Φ f f f 2 f
can be computed numerically. 4
If A is symmetric then AT = A. Let A−1 be the inverse of A. By definition, we have: A A−1 = A−1 A = I. By transposition of this expression and using the symmetry of A, we get: (A−1 )T A = A (A−1 )T = I. By substraction of this expression in the previous one, we get: A (A−1 − (A−1 )T ) = (A−1 − (A−1 )T ) A = 0. Since A 6= 0, it means that (A−1 )T = A−1 , i.e. A−1 is symmetric.
21
¯ Using (22) evaluated in Φ f
(t+1)
, we get that:
1 1 ¯ (t+1) T (t+1) −1 (t+1) −1 ¯ (t+1) exp( (µf − vf )T (Tf ) Φf ) exp( (Φ ) (Tf ) (µf − vf )) 2 2 f R ¯ (t+1) ) fΦ(t+1) (Φ f 1 ¯ (t+1) T (t+1) −1 ¯ (t+1) 1 (2π) F i exp( (Φ ) (Tf ) Φf ) = Q 1 (t+1) (t+1) 2 f ¯ )−1 )− 2 N0;Tf(t+1) (µf ) )]Dib det((Tf b∈B [Pib (Φf
= ξ1
1
Nv
(t+1) f ;Tf
(µf )
where, by identification ξ1 =
,
¯ (t+1) ) (t+1) (Φf i Q ¯ (t+1) )]Dib b∈B [Pib (Φf f
Φ
R
(2π) F
(t+1) −1 − 1 ) ) 2
det((Tf
(t+1) −1 ¯ (t+1) 1 1 ¯ (t+1) T ) (Tf ) Φf ). exp ( 2 (Φf
Finally, we get that: Z 1 1 (t+1) −1 ¯ (t+1) T −1 µf Ωf (µ − v ) − Ω (T ) Φ exp(− f f f R 1 f f 2 (2π) 2F det(Ωf ) 2 (t+1) −1 ¯ (t+1) (µf − vf ) − Ωf (Tf ) Φf )dµf Z h det(Ω−1 − T −1 )det(Tf ) i 1 1 R 1 2 f f = µf ξ1 ξ2 (2π) 2F exp − (µf − vf )T (Ω−1 )(µ − v ) dµf . f f f det(Ωf ) N0;T (t+1) (µf ) 2 f
Consequentely, the rule for updating µf
(t+1)
(t+1)
µf
7.3
takes the form
(t+1) −1 ¯ (t+1) (t+1) ) Φf ; Ωf + T f . ∼ N R vf + Ωf (Tf F
(23)
Simulation of the error coefficients via a method of acceptance-rejection
The principle of the acceptance-rejection algorithm, in order to generate a random variable according to density fε is to select a density g easy to sample and a constant M such that fε (x)/(M g(x)) ≤ 1 ∀x. The algorithm proceeds as follows: 1. Draw x according to density g and u according to a uniform law between 0 and 1. ( (x) Take ε(i, b)(t+1) = x, if u ≤ Mfεg(x) , 2. go back to 1, otherwise. We can take the type I extrem generalized density which is easy to generate, and whose expression is " # ε(i, b) ε(i, b) 1 g(ε(i, b)) = (t+1) exp − (t+1) − exp(− (t+1) ) . ρi1 ρi1 ρi1 From simple computations, we get a constant M of the form X ε(i, b) (t+1) (t+1) (t+1) exp (− exp(−ε(i, 0))) exp(−ρi1 M = ρi1 exp(− (t+1) )) exp 1 − ρi1 . ρ k6=b
i1
How to implement it in our algorithm? At the step t + 1 of the algorithm, we start by simulating ε(i, 1) (t+1) knowing ε(i, 0)(t+1) . Then, we simulate ε(i, b)(t+1) ∀b ≥ 2, conditionally to ε(i, 1)(t+1) , ..., ε(i, b − 1)(t+1) and ε(i, 0)(t+1) . 22
7.4
Density estimation
In the above analysis, we look for the densities of coefficients, but how do we infer it from the simulation output? Let p be an unknown density continuous function with respect to the Lebesgue measure on R, and let (βa (f, 1), βa (f, 2), ..., βa (f, N )) be a size-N sample of variables generated according to p, the same density. We want to estimate p and get some properties onthe estimator for huge samples of customers. An estimate of p is a function x 7→ pN (x) = pN x; βa (f, 1), ..., βa (f, N ) , measurable with respect to the observation βa (f, 1), βa (f, 2), ..., βa (f, N ) . The only information we have is that p ∈ P, where the class P is not in bijection with any finite-dimensional sub-set5 . Then we use a non-parametric framework. We introduce Parzen-Rosenblatt’s kernel estimator with window h > 0, since our outputs seem to be drawn according to skewed, non unimodal densities and that this estimator enables us to easily find asymptotic properties (cf Tsybakov [29]) N βa (f, i) − x 1 X K pˆN (x) = Nh h i=1 R where K : R → R, is an integrable function, such that K(u)du = 1. In the asymptotic framework, we suppose that the window h depends on N . We check the properties of the 2 well-known kernels (cf Tsybakov [29]):
• the rectangular kernel, K1 (u) = 21 1{|u|≤1} . In this case, the Parzen-Rozenblatt’s estimator can be seen as a derivative of the estimated cumulative distribution function converging almost surely towards the non parametric density, provided hN → 0 when N → +∞ . • The Epanechnikov’s kernel, K2 (u) = 34 (1 − u2 )1{|u|≤1} , which is a parabolic estimator of the density. In the numerical experiments (cf Section 5), we use the Epanechnikov’s kernel. 7.4.1
Study of the kernel estimator variances
To study the asymptotic properties of the kernel estimators, we introduce the quadratic risk of pˆN , at the point x0 : h i M SE(x0 ) := Ep (ˆ pN (x0 ) − p(x0 ))2 ,
where the expectation is computed with respect to the distribution of (β a (f, 1), βa (f, 2), ..., βa (f, N )). If we define the bias of pˆN in x0 as: b(x0 ) = Ep [ˆ p (x )] − p(x0 ) and the variance of pˆN as: h 2 i N 0 pN (x0 )) , we check the following relation (cf [29]): σ 2 (x0 ) = Ep pˆN (x0 ) − Ep (ˆ M SE(x0 ) = b2 (x0 ) + σ 2 (x0 ).
In [29], it is proved that if the density p satisfies: p(x) ≤ pmax < +∞, ∀x ∈ R, and that Ki is a kernel such that: Z Z Ki (u)du = 1, Ki2 (u)du < ∞ , i = 1, 2. Then, for all x0 ∈ R, h > 0 and n ≥ 1, 5
σ 2 (x0 ) ≤
C1 , Nh
Examples: the continuous densities on R, the Lipschitz densities on R.
23
R with C1 = pmax Ki2 (u)du , i = 1, 2. In our case, the density is bounded, maxi∈N {βa (f, i)} < +∞. R since on the market segment f : 0 ≤ pRmax = 2 (u)du = 1 < +∞ and By computation, we check that: K (u)du = 1, i = 1, 2. Then, we get: K i 1 R 2 1 1 + 40 ) < +∞. Consequentely, we have the following constants for our kernels: K2 (u)du = 9( 18 − 12 C1 = pmax et C1 = 9pmax ( 18 − N → +∞, then: σ 2 (x0 ) → 0. (2)
(1)
7.4.2
1 12
+
1 40 ).
Hence, we infer that, if h = hN is such that N hN → +∞ when
Study of the biases of the kernel estimators
Let κ ∈ R, be a real number. We note κ+ , the largest integer which is strictly inferior to the real κ and κ+ , the smallest integer which is strictly higher than the real κ. Let T , be an interval of R, κ > 0, L > 0. We define the following class of functions: n o Σ(κ, L) = f : T → R | f (l) exists , l = κ+ and |f (l) (x) − f (l) (x0 )| ≤ L |x − x0 |κ−l , ∀x, x0 ∈ T .
Definition 6. Let l ≥ 1, anRinteger. K : R →RR, is a kernel of order l, if the functions u j K(u), j = 0, 1, ..., l are integrable and satisfy: K(u)du = 1 et uj K(u)du = 0, j = 1, ..., l. By integration, we show that the kernels K1 et K2 are of order 1. We suppose that p belongs to the class of densities P = P(κ, L) defined by: Z P(κ, L) := p | p ≥ 0, p = 1 and p ∈ Σ(κ, L) on R .
The following result is proved by Tsybakov [29]. Let p ∈ P(κ, L) and let K, be a kernel of order l = κ + such R that: |u|κ K(u)du < ∞, then, for all x0 ∈ R, h > 0 and N ≥ 1, with C2 =
and that:
L l!
R
|b(x0 )| ≤ C2 hκ ,
|u|κ |K(u)|du. We check that: Z |u|κ K1 (u)du = Z
1 < ∞, κ+1
1 + (−1)κ+1 1 + (−1)κ+3 − κ+1 κ+3 n o R 2 2 But, in our case, κ ∈ [1; 2[, hence: |u|κ K2 (u)du = 43 κ+1 . − κ+3 |u|κ K2 (u)du =
3 4
< ∞.
For the rectangular kernel, L1 > 0 can be arbitrary chosen, since p0N (x) = 0, ∀x ∈ R. For the parabolic kernel, if x, y ∈ [−1, +1], |p0N (x) − p0N (y)| ≤ h33 |x0 − x|κ−1 , κ ∈ [1; 2[. Hence, L2 = We then infer that: C2
(1)
7.4.3
=
L1 κ+1
and C2
(2)
=
3L2 2 4 ( κ+1
−
2 κ+3 )
3 . h3
, κ ∈ [1; 2[.
Upper-bounding of the MSE
First, we determine L > 0. Let x, x0 ∈ R , then p(x) − p(y) = pN (x) − pN (y) + b(x) − b(y) . But,
using the bias inequalities, we have: b(x) − b(y) ≤ C2 hκ , i = 1, 2. Consequentely, for N large enough, hκ → 0 and then p(x) − p(y) ≈ pN (x) − pN (y) . (i)
24
Kernel
Variance upper bound
Bias upper bound
Uniform
pmax N hN
L1 κ κ+1 hN
Epanechnikov
1 1 + 40 ) 9pmax ( 18 − 12 N hN
2 9 4 ( κ+1
−
(1)
κ−3 2 κ+3 )hN
Optimal window 1 2
1
pmax N − 2κ+1 (hN )∗ = (κ+1) 2κL21 h 2 2κ[ 3 ( 2 − 2 )]2 i 2κ+1 1 4(κ−1) (2) − 2κ+1 κ+3 (hN )∗ = 3 2κ+1 ( 9p 4 κ+1 ) (1− 1 + 1 ) 2κ+1
max 8
N
12
40
2κ+1 − (2κ+1)(4(κ−1))
Table 4: Kernel property comparisons, L1 > 0, κ ∈ [1; 2[. In Tsybakov’s book [29], to minimize the MSE in hN , we just have to let h = h∗N , where: 1 2κ+1 C1 . In our case, we get h∗N = 2κC 2 2
(1) (hN )∗
and
(2) (hN )∗ =
=
−
1 12
+
3L2 2 4 ( κ+1
−
2 κ+3 )
9pmax ( 81 2κ
h
(κ + 1)2 pmax 2κL21
1 40 )
1 2κ+1
i2
1
N − 2κ+1 ,
1 2κ+1 1
N − 2κ+1 , κ ∈ [1; 2[.
Using Tsybakov [29], we get a mean square error of the form: 2κ M SE(x0 ) = O N − 2κ+1 , N → ∞ , κ ∈ [1; 2[, ∀x0 ∈ R. Kernel property comparison (cf Table 4) shows that the Epanechnikov’s kernel gives us variance upper-bound smaller than the uniform one, for our application. Hence, it should be more reliable to use. Besides, in both cases, the resulting quadratic risk is proved to be close to zero when the number of sampled consumers is sufficiently large. It means that our density estimate is very close to the true one when enough consumers are considered.
References [1] Allenby G., Ginter J. On the Heterogeneity of Demand, Journal of Marketing Research, 1998;35: 384-389. [2] Aydin G., Ryan J.K. Product Line Selection and Pricing Under the Multinomial Logit Choice Model, Proceedings of the 2000 MSOM Conference, University of Michigan, Ann. Arbor, MI; 2000. [3] Berry S., Pakes A. The Pure Characteristics Demand Model, International Economic Review, 2005;163: 530-544. [4] Bisi A., Dada M. Dynamic Learning, Pricing, and Ordering by a Censored Newsvendor, Naval Research Logisitcs, 2007;54: 448-461. [5] Cesa-Bianchi N., Lugosi G. Prediction, Learning, And Games, Cambridge University Press; 2006. [6] Chen J.-M., Chen L.-T. Pricing and production lot-size/scheduling with finite capacity for a deteriorating item over a finite horizon, Computers and Operations Research, 2005;32: 2801-2819. [7] Chib S., Jeliazkov I. Marginal Likelihood From the Metropolis-Hastings Output, Journal of the American Statistical Association, 2001; 96: 138-149. 25
[8] Choo E., Wedley W. A common framework for deriving preference values from pairwise comparison matrices, Computers and Operations Research, 2004;31: 898-908. [9] Chung J., Rao R. A General Choice Model for Bundles with Multiple-Category Product: Application to Market Segmentation and Optimal Pricing for Bundles, Journal of Marketing Research, 2003;11: 115-130. [10] Farquahr P., Rao V. A Balance Model for Evaluating Subsets of Multiattributed Items, Management Science, 1976;22: 528-539. [11] Harlam B., Lodish L. Modeling Consumers’ Choices of Multiple Items, Journal of Marketing Research, 1995;32: 404-418. [12] Hitt L., Chen P.-Y. Bundling with Consumers Self-Selection: A Simple Approach to Bundling Low-Marginal-Cost Goods, Management Science, 2005;51: 1481-1493. [13] Holenstein R. Using Sampling to Compute Bayes-Nash Equilibrium in Auction Games, Course Project, Departement of Computer Science, University of British Columbia; 2005. [14] Jeididi K., Jagpal S., Manchanda P. Measuring Heterogeneous Reservation Prices for Product Bundles, Marketing Science,2003;22: 107-130. [15] Kaefer F., Heilman C., Ramenofsky S. A neural network application to consumer classification to improve the timing of direct marketing activities, Computers and Operations Research, 2005;32: 2595-2615. [16] Kephart J.O., Brooks C.H., Das R., Mackie-Mason J.K., Gazzale R., Durfee E.H. Pricing Information Bundles in a Dynamic Environment, ACM Conference on Electronic Commerce; 2001. [17] Khouja M., Robbins S. Optimal pricing and quantity of products with two offerings, European Journal of Operational Research,2005;163: 530-544. [18] Kohli R., Mahajan V. A Reservation-Price Model for Optimal Pricing of Multiattribute Products in Conjoint Analysis, Journal of Marketing Research, 1991;28: 347-354. [19] Lau K.-N., Yang C.-H., Post G. Stochastic preference modeling within a switching regression framework, Computers and Operations Research, 1996;32: 1163-1169. [20] Matsatsinis N., Samaras A. Brand choice model selection based on consumers’ multicriteria preferences and experts’ knowledge, Computers and Operations Research, 2000;27; 689-707. [21] McAlister L. A Dynamic Attribute Satiation Model of Variety Seeking Behavior, Journal of Consumer Research, 1979;9: 213-224. [22] McGill J., Van Ryzin G. Revenue Management: Research and Prospects, Transportation Science, 1999;33: 233-256. [23] Newton M., Raftery A. Approximate Bayesian Inference with the Weighted Likelihood Bootstrap, Journal of the Royal Statistical Society, 1996;56: 3-48. [24] Poirier D. A Bayesian Analysis of Nested Logit Models, Journal of Econometrics, 1996;75: 163-181. [25] Robert C. Markov Chain Monte Carlo Methods, Economica, Statistique Math´ematique et Probabilit´es; 1996. [26] Tallury K., Van Ryzin G. Revenue Management Under General Discrete Choice Model of Consumer Behavior, Management Science, 2004;50: 15-33. 26
[27] Tassi P. Statistical Methods, Economica, 3-rd Edition; 2004. [28] Train K. Discrete Choice Methods with Simulation, Cambridge University Press; 2002. [29] Tsybakov A. Introduction to Non Parametric Estimation, Springer Verlag; 2003.
27