Random Choice as Behavioral Optimization†

Faruk Gul, Paulo Natenzon and Wolfgang Pesendorfer, Princeton University

August 2010

Abstract

We study random choice rules to capture violations of the weak axiom of revealed preference. We show that the Luce rule is the unique random choice rule that admits a well-defined ranking of option sets. We consider two extensions of the Luce rule. The first addresses the duplicates problem. The second, our most general model, accommodates three commonly observed regularities: the attraction effect, the compromise effect and the matchup effect.



1. Introduction

Many formal models of individual choice incorporate or explicitly model behavior that is ruled out by standard utility maximization analysis. We can distinguish two categories of such models. New choice object models introduce a richer set of choice objects and permit novel behavior by enabling the utility functions that represent preferences to depend on new arguments. Behavioral optimization models consider familiar classes of choice objects but weaken the standard requirements on preferences to permit novel behavior patterns. Perhaps the first example of a new choice object model is Kreps and Porteus' (1978) model of preference for timing of resolution of uncertainty. Other examples of new choice object models include the Kreps (1979), Dekel, Lipman and Rustichini (2001) and Ergin and Sarver (2010) models of preference for flexibility and the Gul and Pesendorfer (2001), Dekel, Lipman and Rustichini (2009), Stovall (2010) and Noor (2010) models of temptation and self-control. Behavioral optimization models include models that abandon the von Neumann-Morgenstern independence axiom or probabilistic sophistication in the context of choice under uncertainty. More generally, they may give up standard requirements on choice functions, such as the weak axiom of revealed preference or Houthakker's axiom, and focus extensively on psychological procedures or environmental factors that influence choice outcomes. Strotz' (1955) model of consistent planning and Simon's (1978) model of satisficing behavior are two well-known behavioral optimization models. More recent behavioral optimization models include the Pollak (1968) and Peleg and Yaari (1973) models of consistent planning, Kőszegi and Rabin's (2006) model of status quo dependent choice and Manzini and Mariotti's (2007) model of sequentially rationalizable choice.
New choice object models aim to incorporate richer individual objectives into economic analysis while behavioral optimization models incorporate cognitive limitations in individuals' ability to attain those objectives.¹

¹ Note, however, that Strotz' work and the subsequent literature can be interpreted either as a model of a cognitive limitation (i.e., dynamic inconsistency) or of a new type of individual objective (preference for commitment). For a formal re-interpretation of Strotz' behavioral optimization model as a new choice object model, see Gul and Pesendorfer (2001).

One plausible way to deal with cognitive limitations is to model behavior as random; that is, to interpret the gap between the decision maker's objectives and his limitations at meeting them as random mistakes. There are at least two reasons why modeling behavior as random is an attractive approach. First, many deterministic violations of the weak axiom of revealed preference are unlikely to find much empirical support: while it is reasonable to expect that a particular decision maker may choose s from the set {s, t} on one occasion and choose t from {s, t, t′} on another occasion, it is unlikely that this decision maker would choose s from {s, t} and t from {s, t, t′} on every occasion. To put it differently, a deterministic theory may be too crude for measuring the intensity of an individual's tendency to choose s over t. A second attractive feature of random choice models is that they enable different consumption outcomes without forcing the modeler to assume preference diversity. That is, random choice (or logit) models can be used as flexible representative agent models and serve as convenient aggregation devices. Thus, random choice models facilitate measurement and aggregation; that is, they allow the econometrician to quantify intensity and to use the behavior of distinct individuals as evidence for a single model.

Luce (1959) introduces the idea of random choice as behavioral optimization. In particular, he proposes the "random choice hypothesis," which asserts that the ratio of the probability with which option s is chosen from a set of options to the probability with which t is chosen from the same set is constant across all sets that contain both s and t. The random choice hypothesis leads immediately to the Luce model: let ρ(a, b) be the probability that some outcome s ∈ a is chosen from the set b.² Then, the random choice rule ρ is a Luce rule if there exists a strictly positive real number vs for every option s so that

ρ(s, b) = vs / ∑t∈b vt

whenever s ∈ b.

² For singleton sets we write ρ(s, b) rather than ρ({s}, b).

Standard deterministic theory implies that the decision maker has a coherent ranking of option sets. For the deterministic choice function r we can define the following ranking of option sets: set a is better than set b (a ≽r b) if the choice from b ∪ c is in b only when the choice from a ∪ c is in a. If r maximizes a complete and transitive preference then ≽r is also complete and transitive and hence can be represented by an indirect utility function. The Luce model retains this property. For the Luce rule ρ, define ≽ρ as follows: a ≽ρ b if and only if ρ(a, a ∪ c) ≥ ρ(b, b ∪ c) for all c such that c ∩ (a ∪ b) = ∅. Hence, once again, the decision maker reveals a stochastic preference for a over b if he is as willing (i.e., as likely) to choose from a when faced with a ∪ c as he is to choose from b when confronting b ∪ c. It is easy to see that the Luce model implies that ≽ρ is complete and transitive since a ≽ρ b if and only if

∑s∈a vs ≥ ∑s∈b vs.

Hence, V(a) = ∑s∈a vs defines a stochastic indirect utility function for the Luce model.
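The Luce rule and the induced set ranking can be checked on a small example. The following sketch is our own illustration (the values vs are hypothetical, not from the paper); it computes choice probabilities from Luce values and confirms that comparing ρ(a, a ∪ c) with ρ(b, b ∪ c) reduces to comparing V(a) = ∑s∈a vs with V(b).

```python
# Our illustration of the Luce rule (hypothetical values, not the paper's code).

def luce_prob(a, b, v):
    """Probability that the choice from set b lands in the subset a."""
    assert a <= b
    return sum(v[s] for s in a) / sum(v[t] for t in b)

def set_value(a, v):
    """Stochastic indirect utility V(a) = sum of vs over a."""
    return sum(v[s] for s in a)

v = {"s": 2.0, "t": 1.0, "u": 3.0}  # hypothetical Luce values

# rho(s, {s, t}) = vs / (vs + vt) = 2/3
print(luce_prob({"s"}, {"s", "t"}, v))

# a is revealed preferred to b iff rho(a, ac) >= rho(b, bc) for disjoint c,
# which holds here iff V(a) >= V(b).
a, b, c = {"s"}, {"t"}, {"u"}
lhs = luce_prob(a, a | c, v)   # 2/5
rhs = luce_prob(b, b | c, v)   # 1/4
print(lhs >= rhs, set_value(a, v) >= set_value(b, v))  # True True
```

The design point is that the denominator-free comparison V(a) ≥ V(b) decides the ranking for every disjoint c at once.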

Our first theorem (Theorem 1) provides a simple characterization of the Luce model. If the set of options is sufficiently rich, then the Luce model is equivalent to the following independence assumption:

ρ(a, a ∪ c) ≥ ρ(b, b ∪ c) implies ρ(a, a ∪ d) ≥ ρ(b, b ∪ d)    (I)

whenever (a ∪ b) ∩ (c ∪ d) = ∅. Clearly, (I) is equivalent to the completeness of ≽ρ. The proof of Theorem 1 establishes that ≽ρ is also transitive whenever the richness assumption and (I) are satisfied. Hence, our characterization reveals that, given richness, Luce rules are the only rules consistent with a complete and transitive ranking of option sets and hence with a well-defined indirect utility function. To put it differently, given richness, identifying a consistent ranking of option sets from the frequency distribution of choices is feasible if and only if the choices are induced by a Luce rule. While Theorem 1 formalizes the sense in which the Luce model is the natural extension of the rationality hypothesis to random choice, it offers no response to some of the well-known regularities that are inconsistent with the Luce model. The best known such regularity was first formalized by Debreu, who noted that if two items s1 and s2 are very similar (a yellow and a red bus) and t is a third, dissimilar option (a train), then it may be that each

item is chosen with probability 1/2 from every two-element subset of {s1, s2, t} but t may be chosen from {s1, s2, t} more frequently than each of the other two options. The problem that Debreu's example identifies is more generally referred to as the "duplicates problem" in the discrete estimation literature. To address this shortcoming of the Luce rule, we introduce a modification of it, the weighted attribute rule (WAR). To illustrate the WAR, consider Debreu's example. Options s1 and s2 share a common attribute ("bus") while option t has a unique attribute ("train"). The WAR assigns each attribute a value, e.g., w(b) for the bus and w(t) for the train. The value of an option is a weighted sum of its attribute values, where the weight is the inverse of the number of options in the set that have that attribute. If the set a contains n options with the single attribute "bus" then each bus s is assigned the value vs^a = w(b)/n. The values vs^a determine the choice probabilities as in the Luce rule:

ρ(s, a) = vs^a / ∑s′∈a vs′^a
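In Debreu's bus/train example the WAR computation is short. The sketch below is our illustration (the function name and unit weights are our assumptions, taking w(bus) = w(train) = 1); it reproduces the binary probabilities of 1/2 together with the probabilities 1/4, 1/4 and 1/2 from the three-element set.

```python
# Our illustration: WAR when every option has exactly one attribute.

def war_prob_single_attr(s, a, attr, w):
    """attr maps each option to its single attribute; w maps attributes to weights."""
    def value(x):
        # weight = 1 / (number of options in a sharing x's attribute)
        n = sum(1 for y in a if attr[y] == attr[x])
        return w[attr[x]] / n
    return value(s) / sum(value(y) for y in a)

attr = {"s1": "bus", "s2": "bus", "t": "train"}
w = {"bus": 1.0, "train": 1.0}

print(war_prob_single_attr("s1", {"s1", "t"}, attr, w))        # 0.5
print(war_prob_single_attr("s1", {"s1", "s2", "t"}, attr, w))  # 0.25
print(war_prob_single_attr("t", {"s1", "s2", "t"}, attr, w))   # 0.5
```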

In Debreu's example, each option has a single attribute but, in general, options may have multiple attributes. In our model, attributes are identified with the set of options that have the attribute. Thus, in Debreu's example, b = {s1, s2} is the attribute "bus" and {t} is the attribute "train." Theorem 2 shows that the WAR is the unique rule that emerges if we assume that (1) exact duplicates are treated as a single option and (2) the Luce model is maintained once duplicates have been eliminated. For the Debreu example, this would mean that s1 and s2 are interpreted as a single option s no different than either s1 or s2. That is, s1 and s2 are chosen with the same probability if they are both available, and the probability of choosing t from {s1, s2, t} is the same as the probability of choosing t from {s1, t}.

One class of extensively studied random choice rules is the class of random utility maximizers. Most econometric models of discrete choice, such as logit, probit, nested logit, etc., are special examples of random utility maximizers. A random utility is a probability distribution over utility functions, and a random choice rule maximizes a random utility if the probability of choosing s from a is equal to the probability of drawing a utility function that attains its maximum in a at s. Block and Marschak (1960) show that Luce rules are random utility maximizers. In

Theorem 3, below, we show that this result extends to the weighted attribute rules: every WAR is a random utility maximizer. However, not every random utility maximizer is a WAR. In particular, WARs satisfy a property known as stochastic transitivity: ρ(s, {s, t}) ≥ 1/2 and ρ(t, {t, t′}) ≥ 1/2 implies ρ(s, {s, t′}) ≥ 1/2. Random utility maximizers may not satisfy stochastic transitivity due to a matchup effect, illustrated in the following example. The options are four cars: a red and a yellow Porsche and a red and a yellow Mercedes. With equal probability, the random utility picks a utility function that assigns Mercedes a higher utility than Porsche or a utility function that does the reverse. Both utility functions rank red cars above yellow cars, but the make of a car is the more important characteristic, so that the yellow Mercedes has a higher utility than the red Porsche for one utility function while the yellow Porsche has a higher utility than the red Mercedes for the other. If ρ maximizes this random utility then it chooses each option with probability 1/2 from the choice sets {red Porsche, yellow Mercedes} and {yellow Porsche, red Mercedes} but chooses only the red option from the sets {yellow Porsche, red Porsche} and {yellow Mercedes, red Mercedes}, a violation of stochastic transitivity.

A further well-known regularity inconsistent with random utility maximization, and therefore inconsistent with the Luce rule and WAR, is the attraction effect (Huber et al. (1982)) and the related compromise effect (Simonson (1989)). The attraction effect refers to situations in which a new inferior option increases the market share of an existing similar but superior option. More formally, suppose s1, s2 are similar options as above but s2 is inferior, so that ρ(s2, {s1, s2, t}) is small. The attraction effect arises if ρ(s1, {s1, s2, t}) > ρ(s1, {s1, t}).
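The four-car example can be verified directly. The sketch below is our illustration; the numerical utilities are hypothetical, chosen only to satisfy the rankings described in the text (make dominates color).

```python
# Our illustration: a two-point random utility generating the matchup effect.

u1 = {"redP": 1, "yellowP": 0, "redM": 3, "yellowM": 2}  # Mercedes > Porsche
u2 = {"redP": 3, "yellowP": 2, "redM": 1, "yellowM": 0}  # Porsche > Mercedes

def rho(s, a):
    """Probability that s maximizes the random utility on the set a."""
    return sum(1 for u in (u1, u2) if max(a, key=lambda x: u[x]) == s) / 2

# Matchups across makes are 50-50 ...
print(rho("redP", {"redP", "yellowM"}))     # 0.5
print(rho("yellowP", {"yellowP", "redM"}))  # 0.5
# ... but within a color, the favored make always wins, so stochastic
# transitivity fails: rho(yellowM, {yellowM, redP}) >= 1/2 and
# rho(redP, {redP, redM}) >= 1/2, yet rho(yellowM, {yellowM, redM}) = 0.
print(rho("yellowM", {"yellowM", "redM"}))  # 0.0
```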
Notice that random utility maximization implies monotone random choice rules (Luce and Suppes (1965)), that is, ρ(s1, a ∪ b) ≤ ρ(s1, a) for all sets a, b, and therefore is inconsistent with the attraction effect. The compromise effect refers to a similar nonmonotonicity where an option is chosen with greater frequency after a more extreme option is added. We extend WARs to allow for negative attribute values and refer to the resulting model as the generalized attribute rule (GAR). The generalized attribute rule is identical

to the weighted attribute rule but allows for negative attributes, that is, attributes that reduce the probability of choosing the option. Theorem 5 establishes a connection between GARs and the Shapley value of an appropriately defined cooperative game. GARs are not a subset of random utility maximizers and therefore can potentially address the attraction and compromise effects. We provide examples that show how GARs can generate attraction and compromise effects. We also provide an example that shows that GARs can generate matchup effects as described in the example above.

1.1 Related Literature

Discrete choice models are used in economics, statistics and psychology. The simplest

example of such a model is the model of binary comparisons. Let A = {1, . . . , n} be a finite set of outcomes and let B = {ij | i, j ∈ A, i < j} be the collection of two-element subsets of A. Let Ω be any set of background variables. Then, for ω ∈ Ω, let ϕω(ij) be the probability of outcome i given that the experiment ij is conducted under conditions ω. When A = {1, 2} we omit the ij and when Ω is a singleton we omit the ω. Hence, in these situations we write ϕω and ϕ(ij) respectively. For example, let ω ∈ Ω be the dosage of a drug that is to be administered and let A = {1, 2}, where 1 and 2 denote the patient's survival and death. That is, ϕω is the probability that a patient who receives the dose ω survives. For an economic example, consider the case in which ω is the price of the good and let 1 be the outcome that the consumer purchases it and 2 the outcome that he does not. Then, ϕω is the probability of a sale given price ω. The function ϕ is a vehicle for making assumptions on how the various ϕω(ij) relate to each other given a fixed ω and how a particular ϕω(ij) changes as ω is varied. Since our goal is to study random choice as a model of behavioral optimization, we will ignore the background variables and focus on the relationship among various choice probabilities. In practice, identifying convenient formulations that incorporate various background variables such as price, quality, dosage, etc. is the central part of the analysis. Call ϕ a pseudo random utility model if there exist a continuous cumulative distribution F with support IR satisfying F(t) + F(−t) = 1 and numbers ηi for each i ∈ A such that

ϕ(ij) = F(ηi − ηj)

The (thought) experiment is as follows: a random variable X with distribution F is drawn; if X > ηi − ηj, then the outcome is j; otherwise it is i. If F is the cumulative distribution of a standard normal variable, ϕ is called the probit model. The probit model was introduced by Thurstone (1927). The term probit is due to Bliss (1934). Zermelo (1929) observes that a closed form expression for the probit ϕ is not available and formulates a computationally simpler alternative: if F(x) = e^x/(1 + e^x), then ϕ is the logit model. It is easy to verify that for the logit model

ϕ(ij) = e^ηi / (e^ηi + e^ηj)

Berkson (1944) coins the term logit for this ϕ. Call ϕ a simple random utility maximizer over binary choices if there are ηi for each i ∈ A such that ϕ(ij) = Prob{X + ηi ≥ Y + ηj} for some X, Y with joint distribution G with zero-mean marginals. The (thought) experiment is as follows: i.i.d. random variables Xi are drawn and the decision maker chooses the option that has the highest random utility ui = Xi + ηi. Thurstone (1927) observes that if G is the joint distribution of two independent normal random variables with σ = √2/2 and ϕ maximizes the simple random utility G, {ηi}i∈A, then ϕ is the probit model associated with the same ηi's. A more general observation in McFadden (1974) implies that if G is the product of two double exponential distributions, that is, if X, Y have distribution function

G(x, y) = e^{−e^{−x} − e^{−y}}

then the random utility model G, {ηi}i∈A is the logit model associated with the same ηi's. Random utility maximization over binary choices can immediately be generalized to random utility maximization over all nonempty subsets of A: if ρ(i, a) = Prob{Xi + ηi ≥ Xj + ηj for all j ∈ a}, where the Xi's are mean-zero random variables, then we say that ρ is a random utility maximizer. If the Xi's are also i.i.d. and have support IR, then we say

that ρ is a simple random utility maximizer. When each Xi has a normal distribution with σ = √2/2, we get Thurstone's multinomial probit model, and if each Xi has a double exponential distribution H(x) = e^{−e^{−x}}, then we have McFadden's multinomial logit model. McFadden shows that the choice probabilities associated with the multinomial logit are precisely the Luce rule probabilities associated with the Luce values vi = e^ηi. Prior to McFadden, Block and Marschak (1960) had shown that every Luce rule maximizes some random utility. They also identified a set of necessary conditions, the Block-Marschak inequalities, for a random choice rule to be a random utility maximizer. Falmagne (1978) proves that these conditions are in fact sufficient. Our Theorem 3 shows that WARs are also random utility maximizers. Moreover, the set of random utility maximizers is the closed convex hull of Luce rules.
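McFadden's equivalence is easy to confirm by simulation: with i.i.d. double exponential (Gumbel) noise added to the ηi, the frequency with which each option maximizes the random utility approaches its Luce/logit probability, with Luce values proportional to e^ηi. The sketch below is our illustration (the η values are hypothetical):

```python
# Our illustration: Gumbel random utility frequencies vs. the logit formula.
import math
import random

random.seed(0)
eta = {"i": 0.0, "j": 1.0, "k": 0.5}  # hypothetical eta values
options = list(eta)

def gumbel():
    # Inverse-CDF draw from H(x) = exp(-exp(-x)).
    return -math.log(-math.log(random.random()))

n = 200_000
wins = {s: 0 for s in options}
for _ in range(n):
    u = {s: gumbel() + eta[s] for s in options}
    wins[max(options, key=u.get)] += 1

denom = sum(math.exp(eta[s]) for s in options)
for s in options:
    # simulated frequency vs. Luce probability proportional to exp(eta_i)
    print(s, wins[s] / n, math.exp(eta[s]) / denom)
```

With 200,000 draws the simulated frequencies agree with the logit probabilities to roughly two decimal places.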

2. Luce Rules

Let A be a nonempty set of choice objects. A set A of subsets of A is a proper collection if (i) {s} ∈ A for all s ∈ A, (ii) a ⊂ b ∈ A implies a ∈ A and (iii) a, b ∈ A implies a ∪ b ∈ A. One example of a proper collection is Af, the set of all finite subsets of A. Unless stated otherwise, all sets denoted with lower case letters are elements of A. To simplify the statements below, we use the following notational convention:

ab := a ∪ b    (C)

and identify s ∈ A with the singleton {s} ∈ A. Given any proper collection A, let A+ = A\{∅}. A function ρ : A × A+ → [0, 1] is a random choice rule (RCR) if for all b ∈ A+

ρ(b, b) = 1 and ρ(a, b) = ∑s∈a ρ(s, b)    (1)

The first part of (1) is the feasibility constraint; ρ must choose among options available in b; the second part of (1) is the requirement that ρ(·, b) is a countably additive probability. When we wish to be explicit about the domain, we refer to the function ρ together with the proper collection A of subsets of A as the random choice rule. When the underlying proper collection is clear, we suppress it.

Luce (1959) introduces the idea of random choice as behavioral optimization. In particular, he proposes the "random choice hypothesis," which asserts that the ratio of the probability with which option s is chosen from a set of options to the probability with which t is chosen from the same set is constant across all sets that contain both s and t. The random choice hypothesis leads immediately to the Luce model: let ρ(a, b) be the probability that some outcome s ∈ a is chosen from the set b. We write ρ(s, b) rather than ρ({s}, b). A function v : A → IR++ is a Luce value if for all b ≠ ∅,

∑s∈b vs < ∞.

Call an RCR ρ a Luce rule if there exists a Luce value v such that

ρ(s, b) = vs / ∑t∈b vt    (2)

whenever s ∈ b ∈ A. We say that the Luce value v induces ρ if equation (2) holds for all such s, b. Clearly, every Luce value induces a unique RCR. The Luce model is a generalization of the standard deterministic theory that allows random errors and hence is the natural benchmark for random choice as behavioral optimization. To see this connection, let τ be a choice function; that is, τ(a) ⊂ a are the choices from a. Then, we can define the following ranking of option sets: a ≽τ b if and only if τ(bc) ∩ b ≠ ∅ implies τ(ac) ∩ a ≠ ∅ for all c disjoint³ from ab. Thus, a ≽τ b if a DM is willing to choose from a when faced with ac whenever he is willing to choose from b when faced with bc. If the choice function τ maximizes a complete and transitive preference, then the ranking of option sets, ≽τ, is also complete and transitive. Hence, standard choice theory implies a complete and transitive ranking of option sets; that is, a well-defined indirect utility function.

³ The requirement that c ∩ (ab) = ∅ is not necessary in the deterministic case, but it cannot be abandoned in its stochastic analog.

The Luce model provides an analogous stochastic revealed preference theory. Define ≽ρ as follows: a ≽ρ b if and only if ρ(a, ac) ≥ ρ(b, bc) for all c such that c ∩ (ab) = ∅. Hence, once again, the decision maker reveals a preference for a over b if he is as willing (i.e., as likely) to choose from a when faced with ac as he is to choose from b when confronting bc. It is easy to see that the Luce model implies that ≽ρ is complete and transitive since a ≽ρ b if and only if

∑s∈a vs ≥ ∑s∈b vs.

Hence, V(a) = ∑s∈a vs defines a stochastic indirect utility function for the Luce model.

Next, we show that Luce rules are the only RCRs that admit a stochastic indirect utility function. A necessary condition for a well-defined stochastic indirect utility function is that the relation ≽ρ is complete; that is, a ≽ρ b or b ≽ρ a for all a, b. Independence, below, states this requirement.

Independence (I): ρ(a, ac) ≥ ρ(b, bc) implies ρ(a, ad) ≥ ρ(b, bd) if c ≠ ∅ ≠ d and ab ∩ cd = ∅.

Theorem 1, below, shows that in a setting with a rich set of options, the Luce rule is the only RCR that satisfies independence and, therefore, the only rule that admits a consistent ranking of option sets. Next, we state the richness requirement.

Richness (R): For a ≠ ∅, c and δ ∈ (0, 1), there is b such that b ∩ c = ∅ and ρ(a, ab) = δ.

The following example illustrates a setting that satisfies richness.

Example 1: Let A be the set of all strictly positive real numbers, let A = Af, the set of all finite subsets of A, let vs = s for all s ∈ A, and let ρ be the RCR induced by this Luce value. To verify that ρ is rich, fix δ, a and c. Let δ1 = ∑s∈a s and δ2 = (1 − δ)δ1/δ. Choose s, t ∈ A such that s, t ∉ ac and s + t = δ2. Since ac is finite, such s, t must exist. Let b = {s, t} and note that b ∈ A, b ∩ c = ∅ and ρ(a, ab) = δ1/(δ1 + δ2) = δ as desired.

Theorem 1: A rich RCR satisfies independence if and only if it is a Luce rule.

Theorem 1 shows that Luce rules are identified by independence when richness is satisfied. Of course, richness is an idealization analogous to divisibility in consumer theory or small event continuity in Savage theory. Particular applied settings may not satisfy richness. Theorem 1 implies that if the Luce rule in a particular (non-rich) setting satisfies additional properties, then these other properties are artefacts of the sparseness of the setting. Conversely, if an RCR satisfies independence (in a particular non-rich environment) but is not a Luce rule, then this rule must violate independence if we extend the environment to a rich setting. The following example illustrates richness and independence in a setting with a countable number of options.

Example 2: Let A be the set of all strictly positive rational numbers, let

A = {a ⊂ A | a ≠ ∅, ∑s∈a s < ∞} ∪ {∅}

and let vs = s. It is easy to verify that A is a proper collection and that v is a Luce value. Then, let ρ be the RCR induced by this Luce value. To conclude, we verify that ρ is rich. Let A = {s1, s2, . . .} be an enumeration of A and consider any a, c ∈ A+ and δ ∈ (0, 1). Let δ1 = ∑s∈a s, δ2 = (1 − δ)δ1/δ and set b0 = ∅. Define bj for j = 1, 2, . . . as follows: bj+1 = bj ∪ {sj+1} if sj+1 ∉ ac and ∑s∈bj s + sj+1 ≤ δ2; otherwise bj+1 = bj. Let b = ∪j≥1 bj and note that ∑s∈b s = δ2. Hence, b ∈ A, b ∩ c = ∅ and ρ(a, ab) = δ1/(δ1 + δ2) = δ as desired.
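The greedy construction in Example 2 can be imitated numerically. The sketch below is our adaptation (not the paper's code): it uses a finite grid of rationals in place of an enumeration of all of Q++ and processes candidates in decreasing order, which happens to reach the target sum δ2 exactly on this grid.

```python
# Our adaptation of Example 2's greedy construction, on a finite grid.
from fractions import Fraction

def build_b(a, c, delta, universe):
    """Greedily collect elements outside a and c until their sum reaches
    delta2 = (1 - delta) * sum(a) / delta (never overshooting)."""
    delta1 = sum(a)
    delta2 = (1 - delta) * delta1 / delta
    b, total = set(), Fraction(0)
    for s in sorted(universe, reverse=True):
        if s not in a and s not in c and total + s <= delta2:
            b.add(s)
            total += s
    return b, total, delta2

universe = [Fraction(k, 100) for k in range(1, 2000)]  # stand-in for Q++
a = {Fraction(1), Fraction(2)}
c = {Fraction(3)}
delta = Fraction(3, 4)

b, total, delta2 = build_b(a, c, delta, universe)
rho = sum(a) / (sum(a) + total)          # = delta1 / (delta1 + delta2)
print(float(delta2), float(total), float(rho))  # 1.0 1.0 0.75
```

Exact rational arithmetic via `Fraction` makes the hit on δ2 verifiable with equality rather than a floating-point tolerance.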

3. The Duplicates Problem and Weighted Attributes

Luce (1977) credits Debreu (1960) with the following example and concedes that it

indicates a potentially important shortcoming of the random choice hypothesis.

Example: Let A = {s1, s2, t} and let a = {s1, s2}, b = {s1, t} and c = {s2, t}. Assume that s1, s2 are transparently similar options; for example, s1 is a red bus and s2 is a yellow bus, while t represents a clearly distinct option, for example, a train. Then, we might have

ρ(s, d) = 1/2    (3)

for all d ∈ {a, b, c} and s ∈ d but

ρ(s, A) = 1/4 for s ∈ {s1, s2} and ρ(t, A) = 1/2    (4)

This RCR is not a Luce rule; if it were, equation (3) would imply vt = vs1 = vs2 > 0 while equation (4) would yield vt = vs1 + vs2. The last observation enables us to make a stronger statement: the RCR above cannot be approximated by a Luce rule, since for any Luce rule, equation (3) determines ρ(s, A) for all s ∈ A. The duplicates problem points to the following potential shortcoming of the Luce model: a single real number, vs, does not provide an adequate summary of the attractiveness of a particular alternative within each choice set. Both s1 and s2 are equally good alternatives to t but they are duplicates; that is, transparently similar alternatives that the decision maker does not view as distinct options. Hence, when all three options are available, the decision maker eliminates duplication by considering s1, s2 as a single option. Next, we introduce a new random choice model and refer to it as the weighted attribute rule (WAR). It addresses the duplicates problem but otherwise retains the properties of Luce rules. Henceforth, A refers to the (proper) collection of all finite subsets of A. For any a ∈ A, let |a| denote the cardinality of a. Each of the alternatives in Debreu's example has one of two possible attributes: it is either a bus or a train. The set a = {s1, s2} has the attribute "bus" and the set b = {t} has the attribute "train." In this example, there are just two attributes but, in general, each object may have many different attributes. For example, it could be that the train t and the yellow bus s2 are comfortable while the red bus s1 is not. Then, c = {s2, t} has the attribute "comfortable." In this way, we can identify each attribute with the set of choice objects that have the attribute. Thus, the set of attributes is a nonempty collection of subsets of A. Let C denote the set of attributes and let C^a = {c ∈ C | c ∩ a ≠ ∅} be the attributes of the set a.
We require that each object has at least one attribute; that is, C^s is non-empty for all s ∈ A. The function w : C → IR++ assigns each attribute a strictly positive number. We call the pair (w, C) an attribute value on A if ∑c∈C^a w(c) < ∞ for all a ∈ A. A random choice rule (ρ, A) is a WAR if there exists an attribute value (w, C) such that

ρ(s, b) = [∑c∈C^s w(c)/|b ∩ c|] / [∑d∈C^b w(d)]    (5)

We say that the attribute value (w, C) induces ρ if equation (5) holds for all s ∈ b ∈ A. Clearly, every attribute value (w, C) on A induces a unique RCR (ρ, A). To relate WARs to Luce rules, define

vs^b = ∑c∈C^s w(c)/|c ∩ b|

to be a weighted sum of the attribute values of alternative s. The weight is 1 if s is the only choice object in b that has the attribute c. In general, the weight is 1 over the number of elements of b that have the attribute c. Then, (5) can be restated as follows:

ρ(s, b) = vs^b / ∑s′∈b vs′^b

Thus, we can interpret the random choice rule as a modification of the Luce rule. Specifically, the Luce value of each choice is a weighted sum of attribute values. The weight is equal to the inverse of the number of objects in the choice set that have the attribute. In the special case where each object has a single exclusive attribute, the model reduces to the Luce model with vs = w({s}). More generally, if choice objects have multiple attributes but all attributes are exclusive to some choice object, then

vs^b = ∑c∈C^s w(c)

is independent of b and therefore the model is an example of a Luce rule. Conversely, we can interpret every Luce value v as a special case of an attribute value with C = {{s} | s ∈ A} and w({s}) = vs. Therefore, WARs are a generalization of Luce rules. In Debreu's example above, there are two attributes, C = {{s1, s2}, {t}}. Let the attribute value be w({s1, s2}) = 1 = w({t}). Then, it is easy to check that for a = {s1, t} or a = {s2, t},

ρ(s, a) = 1/2

for all s ∈ a and ρ(s1, {s1, s2, t}) = ρ(s2, {s1, s2, t}) = 1/4 as desired.

In Debreu's example, the red and yellow buses are duplicates because we can replace one bus with the other in any option set without affecting the choice probabilities of alternatives other than the two buses. The next definition extends this notion of duplicates to option sets. The option sets a and b are duplicates if replacing a with b has no effect on the probabilities of choosing elements that are not in a or b.

Definition: a, b are duplicates if ab ∩ c = ∅ and s ∈ c implies ρ(s, ac) = ρ(s, bc).
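The general formula (5) can be sketched directly, including overlapping attributes. This is our illustration (the "comfortable" attribute and its weight 0.5 are hypothetical additions, following the text's suggestion that s2 and t might share a second attribute):

```python
# Our illustration of the general WAR formula (5).

def war_prob(s, b, attributes, w):
    """attributes: attribute sets (each attribute = the set of options having it);
    w: attribute -> positive weight."""
    num = sum(w[c] / len(c & b) for c in attributes if s in c)
    den = sum(w[c] for c in attributes if c & b)
    return num / den

bus = frozenset({"s1", "s2"})
train = frozenset({"t"})
comfortable = frozenset({"s2", "t"})  # hypothetical overlapping attribute
attributes = [bus, train, comfortable]
w = {bus: 1.0, train: 1.0, comfortable: 0.5}

b = {"s1", "s2", "t"}
probs = {s: war_prob(s, b, attributes, w) for s in b}
print(probs)                # s1 -> 0.2, s2 -> 0.3, t -> 0.5
print(sum(probs.values()))  # sums to 1 (up to rounding)
```

The denominator ∑d∈C^b w(d) guarantees the probabilities sum to one, since each attribute's weight is split evenly among the options in b that carry it.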

We write a ∼ b if a is a duplicate of b. The relation ∼ is symmetric and reflexive. Next, we define the notion of overlap of option sets a and b. When a and b have elements in common, they overlap. Even if a and b have no elements in common, they overlap if there are duplicates of a and b that have elements in common. For example, the option set consisting of the red bus and the train overlaps with the option set consisting of the yellow bus in Debreu's example because the red bus is a duplicate of the yellow bus.

Definition: a, b ∈ A are nonoverlapping if a ∼ a′, b ∼ b′ implies a′ ∩ b′ = ∅.

We write a ⊥ b if a and b are nonoverlapping. Next, we show that WARs can be identified as the unique rules that treat duplicates as if they were a single option but retain Independence (I) when choice sets are nonoverlapping. Thus, we show that the WAR is the extension of Luce rules that addresses the duplicates problem but otherwise retains their properties. To prove this result, we must strengthen the richness assumption.

Definition: b is fine if c ⊂ a ∼ b implies there is d ⊂ b such that c ∼ d and a\c ∼ b\d.

To illustrate the definition of fine sets, let ρ be a WAR. Each alternative s is described by a pair (k, j) where j is any natural number and k ∈ {1, 2, 3}. Alternatives s = (1, j) all have attribute one; alternatives (2, j) have attribute two; and alternatives (3, j) have both attributes one and two. Thus, (k, j) and (k, j′) are duplicates for all k = 1, 2, 3. Moreover, {(3, j)} is a duplicate of {(1, j′), (2, j′′)}. It is easy to check that {(1, j′), (2, j′′)} is fine for all j′, j′′ but {(3, j)} is not. Let M = {a ∈ A | a is fine} be the fine option sets. We strengthen the richness assumption of the previous section in two ways. First, we require an analogous richness of nonoverlapping option sets. Second, we require a rich collection of duplicates. Specifically, we require that each set a has a fine duplicate that is disjoint from a.

Strong Richness (R∗): For a ≠ ∅, c and δ ∈ (0, 1), there is (i) b such that b ⊥ c and ρ(a, ab) = δ and (ii) d ∈ M such that d ∩ c = ∅ and d ∼ a.

Our first substantive assumption says that duplicates are treated like a single option. Specifically, if a and b′ are duplicates then adding alternatives from b′ to a choice set that contains a does not alter the odds of choosing options that have no overlap with a.

Elimination of Duplication (E): a ∼ b′ ⊂ b ⊥ c and s ∈ c implies ρ(s, bc) = ρ(s, abc).

If the rule satisfies (E) and duplicates are treated as a single option then Independence (I), as defined above, may fail. For example, let s1 , s2 , s3 be different colored buses and let t be a train. Let a = {s1 }, b = {t}, c = {s2 } and d = {s2 , s3 }. Then, if the decision maker is equally likely to choose a train and a bus and randomizes equally among buses, we have 1/2 = ρ(a, ac) = ρ(b, bc) but 1/3 = ρ(a, ad) < ρ(b, bd) = 1/2 and therefore independence fails. Notice that in this example, c and d overlap with a and it is this overlap that creates the violation of independence. Weak independence requires independence to hold only if a and b have no overlap with c and d. Weak Independence (I∗ ):

ρ(a, ac) ≥ ρ(b, bc) implies ρ(a, ad) ≥ ρ(b, bd) if c ≠ ∅ ≠ d and ab ⊥ cd.

Theorem 2 shows that WARs are the only strongly rich choice rules that satisfy (E) and (I∗). Thus, WARs generalize Luce rules to account for duplicates but otherwise retain the independence assumption that characterizes Luce rules.

Theorem 2:

A strongly rich ρ satisfies I∗ and E if and only if it is a WAR.

The following example provides a setting where the assumptions of Theorem 2 are satisfied. Let IN = {1, 2, . . .} be the set of all natural numbers and let A = IN × IR++. Thus, each choice object s is described by a natural number i and a real number r. The natural number is analogous to the color in Debreu's example and, therefore, the objects s = (i, r) and s′ = (j, r) are duplicates. In this example, each alternative s = (i, r) has a single attribute given by the set cs := {(j, r) | j ∈ IN}. Therefore, the set of attributes is C = {cs | s ∈ A}. If s = (i, r) then w(cs) = r. It is easy to verify that (C, w) is an attribute value and that the ρ it induces satisfies strong richness.

A somewhat stronger assertion about Theorem 2 can be made: let ρ be a WAR on any A; then there is an extension of ρ that satisfies richness. Specifically, there exists a set A∗ that contains A and a rich ρ∗ on A∗ such that ρ∗(s, a) = ρ(s, a) whenever s ∈ a ⊂ A.
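This construction can be traced numerically. The sketch below is our own illustration (not part of the paper): it implements the weighted attribute rule ρ(s, a) = ∑_{c∋s} w(c) / (|a ∩ c| ∑_{d: d∩a≠∅} w(d)) for a few options of the form (i, r), where duplicates (1, r) and (2, r) share the single attribute c_r with weight r, and shows that adding a duplicate leaves the probability of the non-duplicated option unchanged.

```python
from fractions import Fraction

def war_prob(s, menu, attributes):
    """Weighted attribute rule: pick an attribute with probability
    proportional to its weight among attributes present in the menu,
    then choose uniformly among menu items carrying that attribute.
    Assumes s is a member of menu."""
    total = sum(w for attr, w in attributes if attr & menu)
    return sum(Fraction(w, 1) / (len(attr & menu) * total)
               for attr, w in attributes if s in attr)

# Options (i, r): "color" i, real characteristic r.  (1, r) and (2, r)
# are duplicates sharing the attribute c_r = {(j, r) : j} with weight r.
c1 = frozenset({(1, 1), (2, 1)})   # attribute of the r = 1 options
c3 = frozenset({(1, 3), (2, 3)})   # attribute of the r = 3 options
attrs = [(c1, 1), (c3, 3)]

menu_small = frozenset({(1, 1), (1, 3)})
menu_big = frozenset({(1, 1), (2, 1), (1, 3)})  # add a duplicate of (1, 1)

print(war_prob((1, 3), menu_small, attrs))  # 3/4
print(war_prob((1, 3), menu_big, attrs))    # 3/4: the duplicate changes nothing
print(war_prob((1, 1), menu_big, attrs))    # 1/8: duplicates split their share
```

Under the Luce rule with these weights, adding the duplicate (2, 1) would instead dilute the probability of (1, 3) from 3/4 to 3/5, which is exactly the duplicates problem the WAR avoids.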

4. WAR and Random Utility Maximization

In this section, we show that each WAR is a random utility model, thus proving an analogue of Block and Marschak's (1960) result for Luce rules. The rich choice rules of the previous two sections require a setting with infinitely many options. Richness facilitates the identification of qualitative properties of random choice rules but it makes it more difficult to relate WARs to random utility models. Therefore, we introduce a simpler setting with finitely many options to relate WARs to random utility models. The finite setting also proves useful for extensions of WARs that are the subject of the next two sections.

We say that a RCR is simple if A = {1, . . . , n} for some n ≥ 1. It follows that any simple RCR can be identified with some q ∈ IR+^{n(2^n − 1)} where q_{ia} = ρ(i, a) for all a ∈ A. Such a q satisfies

q_{ia} ≤ 1,  q_{ia} > 0 implies i ∈ a,  ∑_{i∈N} q_{ia} = 1    (6)

Let Q be the set of all q ∈ IR+^{n(2^n − 1)} that satisfy equation (6). Let Ql be the set of all simple Luce rules and Qw be the set of all simple WARs. If q ∈ Qw there is an attribute value (C, w) that generates q. When A is finite, it is convenient to represent the (C, w) by a normalized function γ : A+ → IR+ such that

γ(b) = w(b) / ∑_{c∈C} w(c)  if b ∈ C, and γ(b) = 0 otherwise.

Clearly, each attribute value can be represented by a γ and each γ corresponds to an attribute value as defined in the previous section. We call such a γ : A+ → IR+ a simple attribute value. It satisfies:

∑_{c∈A^b} γ(c) > 0  and  ∑_{c∈A+} γ(c) = 1,

where A^b = {c ⊂ A | c ∩ b ≠ ∅}. Let Γ1 denote the set of simple attribute values. The attribute value γ induces q ∈ Q if

q_{ib} = ∑_{c∈A^i} γ(c) / (|b ∩ c| ∑_{d∈A^b} γ(d))

for all b ∈ A+ and i ∈ b. A simple RCR q is a WAR if some γ ∈ Γ1 induces it.

One class of extensively studied RCRs are random utility maximizers. Most econometric models of discrete choice such as logit, probit, nested logit, etc., are special examples of random utility maximizers. Below we define Qr, the set of RCRs that are random utility maximizers. Let U be the set of all bijections from A to A. For any i ∈ a ⊂ A, let [ia] = {u ∈ U | u_i ≥ u_j ∀j ∈ a}. A function π : U → [0, 1] is a random utility if ∑_{u∈U} π(u) = 1. We identify each such function with an element of IR+^{|U|}. Let Π = {π ∈ IR+^{|U|} | ∑_{u∈U} π_u = 1} be the set of all random utilities. Hence, Π is the (|U| − 1) = (n! − 1)-dimensional unit simplex.

The simple RCR q maximizes the random utility π if q_{ia} = ∑_{u∈[ia]} π_u for all i, a. A simple RCR q is a random utility maximizer if there exists some π that it maximizes. Let Qr denote the set of random utility maximizers. Theorem 3 shows that every WAR is a random utility maximizer, i.e., Qw ⊂ Qr. That Luce rules are random utility maximizers was shown by Block and Marschak (1960). For any subset X ⊂ IR^k, let cl X denote the closure of X and let conv X denote its convex hull.

Theorem 3: Ql ⊂ Qw ⊂ Qr = cl conv Ql.

Let q ∈ Qw and let γ generate q. If i is chosen with probability greater than 1/2 from the choice set {i, j} then it must be that the sum of i's attribute values (equally weighted) is greater than the sum of j's attribute values. That is, ∑_{c∈A^i} γ(c) ≥ ∑_{c∈A^j} γ(c). But this implies that WARs satisfy stochastic transitivity; that is, if a = {i, j}, b = {j, k} and c = {i, k} then q_{ia} ≥ 1/2 and q_{jb} ≥ 1/2 implies q_{ic} ≥ 1/2 for q ∈ Qw. The following example shows that there are random utility maximizers that cannot be approximated by a WAR, that is, cl Qw ≠ Qr. The example builds on the fact that random utility models may violate stochastic transitivity.

4.1 Violations of Stochastic Transitivity: the Match-up effect

The match-up effect refers to violations of stochastic transitivity, as illustrated in the

following example. Let A = {mr, my, pr, py} be the set of options where mr, my refer to a red and a yellow Mercedes while pr and py refer to a red and a yellow Porsche. Color is not an important characteristic but red is always chosen over yellow. Hence,

q_{my a} = q_{py b} = 0    (7)

whenever mr ∈ a and pr ∈ b. Moreover, when both red cars are available, each is chosen with probability 1/2:

q_{mr a} = q_{pr a} = 1/2    (8)

whenever mr ∈ a and pr ∈ a. The make of the car is a much more important issue than color. Since both makes are chosen with equal probability, we therefore have

q_{my a} = q_{py b} = 1/2    (9)

whenever my ∈ a, py ∈ b, mr ∉ a and pr ∉ b. Equations (7)-(9) uniquely define the RCR q and this q fails stochastic transitivity. To see this, let a = {my, pr}, b = {mr, pr} and c = {mr, my}. Note that q_{my a} = 1/2 and q_{pr b} = 1/2 but q_{my c} = 0 < 1/2.

Since WARs satisfy stochastic transitivity there can be no WAR q′ approximating the choice rule q defined in (7)-(9). However, the match-up effect is consistent with random utility maximization. To see that the RCR defined in (7)-(9) is an element of Qr let π_{u1} = π_{u2} = 1/2 where

u1(mr) = 4 = u2(pr)
u1(my) = 3 = u2(py)
u1(pr) = 2 = u2(mr)
u1(py) = 1 = u2(my)

Verifying that the RCR q maximizes this random utility is straightforward.
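The verification is easy to automate. The following sketch (ours, not the authors') draws each of the two utility functions with probability 1/2 and computes q_{ia} = ∑_{u∈[ia]} π_u by direct maximization:

```python
# The two equally likely utility functions from the text.
u1 = {'mr': 4, 'my': 3, 'pr': 2, 'py': 1}
u2 = {'pr': 4, 'py': 3, 'mr': 2, 'my': 1}

def q(i, menu):
    """Probability that i is chosen from menu when u1 or u2 is drawn
    with probability 1/2 each and then maximized (no ties occur here)."""
    return sum(0.5 for u in (u1, u2) if max(menu, key=lambda j: u[j]) == i)

# (7): a yellow car is never chosen when the same-make red car is present.
assert q('my', {'mr', 'my'}) == 0
# (8): both red cars available -> each is chosen with probability 1/2.
assert q('mr', {'mr', 'pr'}) == 0.5
# (9): a yellow car against the other make's red car -> probability 1/2.
assert q('my', {'my', 'pr'}) == 0.5
print("the random utility replicates (7)-(9)")
```

The intransitive cycle my ∼ pr, pr ∼ mr, mr ≻ my emerges even though each realized utility is a strict ranking, which is exactly why random utility maximization can violate stochastic transitivity.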

5. A Shapley Value Representation of WARs

In this section, we provide an alternative representation of WARs as the Shapley value of a cooperative game. As in the previous section, we consider a simple RCR q represented by simple attribute value γ ∈ Γ1. For each set a ∈ A+ we can define the attribute value of a as

κ(a) = ∑_{b∈A^a} γ(b)    (10)

Hence, κ(a) is the value of the attributes of the set a. Note that κ satisfies the following properties: (i) κ(∅) = 0, (ii) κ(a) ≤ κ(b) whenever a ⊂ b; (iii) κ(a) > 0 for all a ≠ ∅ and (iv) κ(A) = 1. A function that satisfies properties (i) and (ii) is called a characteristic function. A function that satisfies properties (i)-(iv) is called a capacity. We write K for the set of capacities. Note that the usual definition of a capacity does not include requirement (iii). We add this requirement to the definition of a capacity to avoid having to introduce a new term. For any κ ∈ K define the characteristic function κa on A as follows:

κa(b) = κ(a ∩ b)/κ(a)    (11)

The characteristic function κa satisfies (i), (ii), (iv) but fails (iii) unless a = A. Now consider the following cooperative game. The players are the options and the characteristic function κa assigns a value to each coalition of options. Only options in the set a generate value and the grand coalition of all elements in a generates value 1. Cooperative solution concepts take as input a characteristic function and assign each player (each option) a value that is typically interpreted as the player's payoff resulting from some unspecified bargaining process. Here the players are the alternatives and their payoff is the probability of being chosen. The marginal contribution of the option i to the value of the set b is κa(ib) − κa(b). The Shapley value (Shapley (1953)) of an alternative is its average marginal contribution where the average is taken over all sets b. This yields the following formula for the Shapley value, Lκa(i), of i ∈ A in the (cooperative) game (A, κa):

Lκa(i) = ∑_{b⊂a: i∉b} [|b|! (|a| − |b| − 1)! / |a|!] [κa(ib) − κa(b)]    (12)

If i ∈ a then Lκa(i) > 0 since κa(i) − κa(∅) > 0 for all i ∈ a. Conversely, if i ∉ a then Lκa(i) = 0 since κa(ib) − κa(b) = 0. The Shapley value is additive and satisfies Lκa(a) = κa(a). Since κa(a) = 1 this, in turn, implies that the function σ : A × A+ → [0, 1] defined as σ(a, b) = Lκb(a) is a random choice rule. For γ ∈ Γ1 let κγ be defined as the capacity κ such that equation (10) holds and let K1 be the set of capacities for which (10) holds for some γ ∈ Γ1. Lemma 1 shows that the probability that the simple WAR q with value γ chooses i from a is equal to the Shapley value of i in the cooperative game with characteristic function κγa.
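Formula (12) can be implemented by averaging marginal contributions over orderings of the menu; since options outside a contribute nothing in the game (A, κa), it suffices to permute the elements of a. A minimal sketch (function names are ours), illustrated on a symmetric two-option capacity:

```python
from itertools import permutations
from math import factorial

def shapley(i, a, kappa):
    """Shapley value L_{kappa_a}(i): the average marginal contribution of i
    over all orderings of the menu a, with kappa_a(b) = kappa(a & b)/kappa(a)."""
    if i not in a:
        return 0.0  # options outside the menu have zero marginal contribution
    a = frozenset(a)
    k_a = lambda b: kappa(frozenset(b) & a) / kappa(a)
    total = 0.0
    for order in permutations(sorted(a)):
        before = frozenset(order[:order.index(i)])  # coalition joined before i
        total += k_a(before | {i}) - k_a(before)
    return total / factorial(len(a))

# Two-option illustration: kappa(1) = kappa(2) = 1/3, kappa({1, 2}) = 1.
table = {frozenset(): 0.0, frozenset({1}): 1 / 3,
         frozenset({2}): 1 / 3, frozenset({1, 2}): 1.0}
kappa = lambda b: table[b]

print(shapley(1, {1, 2}, kappa))  # 0.5 by symmetry
```

The permutation average coincides with the factorial-weighted sum in (12); for menus of realistic size the explicit sum over coalitions would of course be cheaper than enumerating all |a|! orderings.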

Lemma 1: If κ = κγ for γ ∈ Γ1 then

Lκa(i) = ∑_{c∈A^i} γ(c) / (|a ∩ c| ∑_{d∈A^a} γ(d))

for all a ∈ A+ and i ∈ a. Lemma 1 then implies the following alternative characterization of WARs in terms of the Shapley values of appropriately defined cooperative games. Theorem 4:

A simple RCR q is a WAR if and only if there exists κ ∈ K1 such that q_{ia} = Lκa(i) for all a ∈ A+ and i ∈ A.

The capacity κ is an element of K1 if (10) holds for some simple attribute value. Capacities that can be generated by simple attribute values are a proper subset of all capacities. The key restriction is that γ(a) is non-negative for all a ∈ A+. This suggests a generalization of WARs that allows attributes with negative values. The next section introduces this generalization and uses it to address the match-up effect (Example 3), the attraction effect and the compromise effect.

6. The Generalized Attribute Rules

In the previous section, we defined a set of capacities that can be generated by simple attribute values γ ∈ Γ1. As we show next, the set of all capacities can be obtained if we allow the attribute value to be negative. A function γ : A+ → IR is a generalized attribute value if

∑_{c∈A^b} γ(c) > 0,  ∑_{c∈A^b\A^a} γ(c) ≥ 0,  ∑_{c∈A+} γ(c) = 1

for all a, b ∈ A such that b ≠ a ⊂ b. Let Γ denote the set of all generalized attribute values. As above, we can associate each γ ∈ Γ with a function κγ such that

κγ(a) = ∑_{b∈A^a} γ(b)

Our next result, Lemma 2, shows that if γ is a generalized attribute value then κγ is a capacity. Lemma 2:

For all κ ∈ K, there exists a unique γ ∈ Γ such that κ = κγ .

As an illustration of Lemma 2, suppose A = {1, 2} and κ(1) = κ(2) = 1/3, κ({1, 2}) = 1. This capacity corresponds to the generalized attribute value γ such that γ(1) = γ(2) = 2/3 and γ({1, 2}) = −1/3. Thus, each option has a unique attribute with value 2/3 and both options share a negative attribute with value −1/3.

We can use the generalized attribute value γ ∈ Γ to calculate the weighted average attribute value for each element i of the option set a as

v_{ia} = ∑_{c∈A^i} γ(c) / |a ∩ c|

and define the simple random choice rule q as

q_{ia} = v_{ia} / ∑_{j∈a} v_{ja}    (13)

It is easy to check that (13) can be rewritten as

q_{ia} = ∑_{c∈A^i} γ(c) / (|a ∩ c| ∑_{d∈A^a} γ(d))    (14)
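For the two-option illustration with A = {1, 2}, both (13) and (14) can be evaluated exactly with rational arithmetic; this sketch (ours) confirms that the two formulas agree:

```python
from fractions import Fraction

# Generalized attribute value for A = {1, 2}: gamma(1) = gamma(2) = 2/3,
# gamma({1, 2}) = -1/3, which corresponds to kappa(1) = kappa(2) = 1/3.
gamma = {frozenset({1}): Fraction(2, 3),
         frozenset({2}): Fraction(2, 3),
         frozenset({1, 2}): Fraction(-1, 3)}

def v(i, a):
    """Weighted average attribute value v_{ia} = sum_{c: i in c} gamma(c)/|a & c|."""
    return sum(g / len(c & a) for c, g in gamma.items() if i in c)

def q13(i, a):
    """Formula (13): normalize v_{ia} over the menu."""
    return v(i, a) / sum(v(j, a) for j in a)

def q14(i, a):
    """Formula (14): attribute-by-attribute version of the same rule."""
    denom = sum(g for c, g in gamma.items() if c & a)
    return sum(g / (len(c & a) * denom) for c, g in gamma.items() if i in c)

a = frozenset({1, 2})
print(q13(1, a), q14(1, a))  # 1/2 1/2
```

Here the shared negative attribute cancels symmetrically, so each option is chosen with probability 1/2, exactly as the underlying capacity's symmetry dictates.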

We refer to random choice rules that can be represented as in (14) as generalized attribute rules (GAR). Next, we show that the Shapley value characterization of WARs extends to GARs. Lemma 3, below, shows that Lemma 1 above extends to all capacities.

Lemma 3: For γ ∈ Γ and κ = κγ,

Lκa(i) = ∑_{c∈A^i} γ(c) / (|a ∩ c| ∑_{d∈A^a} γ(d))

Theorem 5 shows that all GARs have a Shapley value characterization. Hence, Theorem 5 is the analogue of Theorem 4 above.

Theorem 5: A RCR q is a GAR if and only if there exists κ ∈ K such that q_{ia} = Lκa(i) for all a ∈ A+ and i ∈ A.

Theorem 5 characterizes GARs in terms of the Shapley value of the associated capacity that measures the attribute value of each set of options.

6.1 Violations of Stochastic Transitivity

Next, we illustrate how GARs allow violations of stochastic transitivity as described in Example 3. For simplicity, we describe the capacity rather than the attribute values.⁴ To match the behavior of the RCR q, defined by (7)-(9), consider any capacity κ on A = {mr, my, pr, py} that satisfies

κ({my, pr}) = κ({mr, py}) = ϵ
κ({my, py}) = ϵ^2
κ({mr}) = κ({pr}) = κ({mr, my}) = κ({pr, py}) = ϵ^3
κ({my}) = κ({py}) = ϵ^4
κ(a) = 1 whenever {mr, pr} ⊂ a

Then, let q_{ia} = Lκa(i) for all a, i. For example, if a = {mr, my} then

q_{mr a} = (ϵ^3 − ϵ^4)/(2ϵ^3) + ϵ^3/(2ϵ^3) → 1

as ϵ → 0. If b = {my, pr}, then

q_{my b} = (ϵ − ϵ^3)/(2ϵ) + ϵ^4/(2ϵ) → 1/2

as ϵ → 0. Similar calculations show that, as ϵ → 0, this capacity replicates the choice probabilities (7)-(9) above. This shows that GARs can accommodate violations of stochastic transitivity.

⁴ By Lemma 3, any capacity can be alternatively described by a γ ∈ Γ.

6.2 The Attraction Effect

Two well known regularities inconsistent with the Luce rule and the WAR are the attraction effect (Huber et al. (1982)) and the related compromise effect (Simonson (1989)). The attraction effect refers to situations in which a new inferior option increases the market share of an existing similar but superior option. A slight modification of the Porsche/Mercedes example above illustrates the attraction effect: let py, pr denote yellow and red Porsches and my, mr denote yellow and red Mercedes. Recall that a yellow car is inferior to the red car of the same make so that both ρ(py, {py, pr, mr}) and ρ(my, {my, mr, pr}) are small. The attraction effect arises if the inferior yellow cars serve as decoys that increase the market shares of red cars of the same make; that is,

ρ(pr, {py, pr, mr}) > ρ(pr, {pr, mr}) and ρ(mr, {my, mr, pr}) > ρ(mr, {mr, pr}).    (15)

Note that random utility maximization implies monotone random choice rules (Luce and Suppes (1965)); that is, ρ(s, ab) ≤ ρ(s, a) whenever s ∈ a, and, therefore, is inconsistent with the attraction effect. The compromise effect refers to a similar non-monotonicity where an option is chosen with greater frequency after a more extreme option is added. As Huber and Puto (1983) explain, a typical experimental study of the attraction effect compares choice frequencies of options differentiated along two dimensions, for example, quality and price (or color and make). Two comparably attractive core items such as a high-price, high-quality option and a low-price, low-quality option are designed (these correspond to the red cars in our example). Next, two decoy options are defined (the yellow cars). The decoys are inferior in the sense that they are rarely chosen when the corresponding core option is available. The typical experiment finds choice frequencies as in equation (15) above. Thus, each decoy attracts consumers to the corresponding core item.

The following example illustrates how GARs can generate attraction effects. Let w(py) = w(my) = ϵ, w(pr) = w(mr) = 3, w({my, pr}) = w({py, mr}) = 2, w({my, mr}) = w({py, pr}) = −1. The attribute value is zero for all sets other than the sets listed above. With this GAR, (i) core items have higher singleton attribute values than decoys, (ii) each decoy shares a common negative attribute with its own core item and (iii) shares a positive attribute with the other core item. Applying the GAR formula, we obtain the following choice probabilities in the limit as ϵ → 0:

ρ(pr, {pr, mr}) = 1/2 = ρ(mr, {pr, mr})
ρ(pr, {py, pr, mr}) = 9/16 = ρ(mr, {my, pr, mr})
ρ(mr, {py, pr, mr}) = 3/8 = ρ(pr, {my, pr, mr})

Property (ii) is necessary for a decoy to increase the market share of the corresponding core item and generate the attraction effect; without negative attributes the random choice rule is a WAR and therefore, by Theorem 3, satisfies monotonicity.
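These limits can be checked by evaluating the GAR formula (14) at a small ϵ; since (14) is invariant to rescaling of γ, the unnormalized weights w can be used directly (the code and its names are ours):

```python
eps = 1e-9

# Unnormalized attribute values from the text; unspecified sets have value 0.
w = {frozenset({'py'}): eps, frozenset({'my'}): eps,
     frozenset({'pr'}): 3.0, frozenset({'mr'}): 3.0,
     frozenset({'my', 'pr'}): 2.0, frozenset({'py', 'mr'}): 2.0,
     frozenset({'my', 'mr'}): -1.0, frozenset({'py', 'pr'}): -1.0}

def gar(i, a):
    """GAR choice probability, formula (14) with unnormalized weights."""
    denom = sum(x for c, x in w.items() if c & a)
    return sum(x / (len(c & a) * denom) for c, x in w.items() if i in c)

core = frozenset({'pr', 'mr'})
with_decoy = frozenset({'py', 'pr', 'mr'})  # add the yellow-Porsche decoy

print(round(gar('pr', core), 6))        # 0.5
print(round(gar('pr', with_decoy), 6))  # 0.5625 = 9/16 > 1/2: attraction effect
print(round(gar('mr', with_decoy), 6))  # 0.375 = 3/8
```

The mechanism is visible in the arithmetic: adding py activates the positive shared attribute {py, mr} (which py splits with mr) and the negative attribute {py, pr} (which py absorbs half of), so pr's share rises above 1/2 while mr's falls.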

7. Conclusion

In this paper we have analyzed three nested models: the Luce rule, the WAR and the GAR. We have shown that the Luce rule is the unique random choice rule that admits a well-defined ranking of option sets. However, the Luce rule has well-known shortcomings, perhaps the most prominent among them being the duplicates problem, first pointed out by Debreu. The weighted attributes rule modifies the Luce rule and treats duplicates as if they were a single option. Once duplicates are reduced to a single option, WARs retain the identifying property of the Luce model. While WARs deal with the duplicates problem, they are inconsistent with other well documented phenomena, such as the attraction and the compromise effects. We introduce a generalization of the WAR that accommodates some forms of the attraction and compromise effects.


8. Proof of Theorem 1

Verifying that every Luce rule satisfies independence is straightforward. Hence, we will only prove that a rich RCR that satisfies independence is a Luce rule. Define a binary relation ≽ρ on A+ as follows: a ≽ρ b if and only if ρ(a, ac) ≥ ρ(b, bc) for all c ∈ A+ such that ab ∩ c = ∅. Let ∼ρ be the symmetric and ≻ρ the strict part of ≽ρ. Throughout the following lemmas, we assume that ρ satisfies (R) and (I). Lemma A1:

≽ρ is complete and transitive.

Proof: Clearly, ρ satisfies (I) only if ≽ρ is complete. Next, assume that a ≽ρ b and b ≽ρ c. By richness, there exists a d ∈ A such that d ∩ abc = ∅ and ρ(c, cd) < 1. Hence, d ̸= ∅. Note that ρ(a, ad) ≥ ρ(b, bd) ≥ ρ(c, cd); thus independence implies a ≽ρ c as desired. Definition:

The sequence a1 , . . . , an ∈ A is a test sequence if the elements are pairwise

disjoint and ρ(ai , ai ai+1 ) = 1/2 for all i = 1, . . . , n − 1. Lemma A2:

For any test sequence a1 , . . . , an ∈ A+ , ρ(ai , ai aj ) = 1/2 for all i ̸= j.

Proof: Note that if the result is true for n = 3, then it is true for all n. So assume n = 3 and suppose ρ(a1, a1 a3) > 1/2. Independence implies that a1 ≻ρ a2. Since ρ(a1, a1 a2) = 1/2 = ρ(a3, a3 a2), independence also implies a1 ∼ρ a3. Then, by Lemma A1 we have a3 ≻ρ a2. But ρ(a3, a1 a3) < 1/2 = ρ(a2, a1 a2) contradicting a3 ≻ρ a2. A similar argument reveals the impossibility of ρ(a1, a1 a3) < 1/2. Hence, ρ(a1, a1 a3) = 1/2 as desired. Lemma A3:

If a1 , . . . , an is a test sequence and a ∈ A+ with a ∩ a1 a2 · · · an = ∅ then

ρ(a, aai ) = ρ(a, aa1 ) for all i = 1, . . . , n. Proof: If necessary use richness to extend the test sequence so that n ≥ 3. Then, Lemma A2 implies ai ∼ρ aj for all i, j and hence ρ(a, aai ) = ρ(a, aa1 ) for all i. Lemma A4:

For all a, b ∈ A+ with a ∩ b = ∅, a ≽ρ b if and only if ρ(a, ab) ≥ 1/2 .

Proof: By (R), we can choose d ∈ A+ such that d ∩ ab = ∅ and ρ(b, bd) = 1/2. Let b1 = b and b2 = d and note that b1, b2 is a test sequence. Then, by Lemma A3, ρ(a, ab) = ρ(a, ad)

and therefore ρ(a, ab) ≥ 1/2 if and only if ρ(a, ad) ≥ ρ(b, bd); that is, ρ(a, ab) ≥ 1/2 if and only if a ≽ρ b. Lemma A5:

If c1 , c2 , c3 , c4 is a test sequence, then ρ(ci , c1 c2 c3 c4 ) = 1/4 for all i =

1, 2, 3, 4.

Proof: Let c = c1 c2 c3 c4 and without loss of generality assume ρ(ci, c) ≥ ρ(cj, c) whenever i ≤ j. Hence, by Lemma A4,

c1 c2 ≽ρ c3 c4 and c1 c3 ≽ρ c2 c4.    (A1)

By (R), there exists c5 such that c1, c2, c3, c4, c5 is a test sequence. Since c2 ∼ρ c3 ∼ρ c5 and by (A1), we have

ρ(c1 c2, c1 c2 c3) = ρ(c1 c2, c1 c2 c5) ≥ ρ(c3 c4, c3 c4 c5) = ρ(c3 c4, c2 c3 c4).    (A2)

And by the same argument,

ρ(c1 c3, c1 c2 c3) ≥ ρ(c2 c4, c2 c3 c4).    (A3)

But we also have

2 = 2[ρ(c1, c1 c2 c3) + ρ(c2, c1 c2 c3) + ρ(c3, c1 c2 c3)]
  = ρ(c1 c2, c1 c2 c3) + ρ(c1 c3, c1 c2 c3) + ρ(c2 c3, c1 c2 c3)
  ≥ ρ(c3 c4, c2 c3 c4) + ρ(c2 c4, c2 c3 c4) + ρ(c2 c3, c2 c3 c4)    (A4)
  = 2[ρ(c2, c2 c3 c4) + ρ(c3, c2 c3 c4) + ρ(c4, c2 c3 c4)] = 2.

Equation (A4) implies that the inequalities in (A2) and (A3) must in fact be equalities. Hence ρ(c1 c2, c1 c2 c5) = ρ(c1 c2, c1 c2 c3) = ρ(c3 c4, c2 c3 c4) = ρ(c3 c4, c3 c4 c5) and (I) imply c1 c2 ∼ρ c3 c4. By the same argument, we obtain c1 c3 ∼ρ c2 c4. By Lemma A4 we have ρ(c1 c2, c) = ρ(c3 c4, c) = 1/2 and ρ(c1 c3, c) = ρ(c2 c4, c) = 1/2. Finally since ρ(ci, c) ≥ ρ(cj, c) for i ≤ j we must have ρ(ci, c) = 1/4 for i = 1, 2, 3, 4.

Lemma A6:

If a1 , . . . , an is a test sequence then ai aj ∼ρ ak aℓ for all i ̸= j and k ̸= ℓ.

Proof: If i, j, k, ℓ are all distinct then Lemma A5 implies ρ(ai aj, ai aj ak aℓ) = 1/2 and Lemma A4 implies ai aj ∼ρ ak aℓ. If {i, j, k, ℓ} has three distinct elements assume, without loss of generality, that j = ℓ. Let b1 = ai, b2 = aj, b3 = ak and note that b1, b2, b3 is a test sequence. By (R) we can choose b4, b5 such that b1, b2, b3, b4, b5 is a test sequence. By Lemmas A4 and A5, b1 b2 ∼ρ b4 b5 and b2 b3 ∼ρ b4 b5. Then Lemma A1 implies b1 b2 ∼ρ b2 b3, that is, ai aj ∼ρ ak aℓ. Finally, if {i, j, k, ℓ} has two distinct elements then ai aj = ak aℓ and by Lemma A1 we have ai aj ∼ρ ak aℓ.

Lemma A7: If a_{i_1}, . . . , a_{i_{2^n}} are 2^n distinct elements of the test sequence a1, . . . , am and a_{j_1}, . . . , a_{j_{2^n}} are also 2^n distinct elements of the test sequence a1, . . . , am, then ∪_{k=1}^{2^n} a_{i_k} ∼ρ ∪_{k=1}^{2^n} a_{j_k}.

Proof: By induction on n. When n = 1, the statement is true by Lemma A6. Next, assume it is true for n and let a_{i_1}, . . . , a_{i_{2^{n+1}}} be 2^{n+1} distinct elements of a test sequence. Also let a_{j_1}, . . . , a_{j_{2^{n+1}}} be another 2^{n+1} distinct elements of the same test sequence. Let b_k = a_{i_{2k−1}} a_{i_{2k}} for k = 1, . . . , 2^n. Also let c_k = a_{j_{2k−1}} a_{j_{2k}} for k = 1, . . . , 2^n. Lemma A6 implies that b_k ∼ρ b_ℓ for all k, ℓ. Hence b_1, . . . , b_{2^n} is a test sequence. Lemma A6 also implies that c_k ∼ρ c_ℓ for all k, ℓ. Hence c_1, . . . , c_{2^n} is also a test sequence. Moreover, Lemma A6 implies that b_k ∼ρ c_ℓ for all k, ℓ. So we can relabel both b_1, . . . , b_{2^n} and c_1, . . . , c_{2^n} to be each a set of 2^n distinct elements of the same test sequence. By the induction hypothesis, ∪_{k=1}^{2^{n+1}} a_{i_k} = ∪_{k=1}^{2^n} b_k ∼ρ ∪_{k=1}^{2^n} c_k = ∪_{k=1}^{2^{n+1}} a_{j_k}.

Lemma A8: If a_1, . . . , a_{2^n+1} is a test sequence then ρ(a_j, a_1 a_2 · · · a_{2^n+1}) = 1/(2^n + 1) for j = 1, 2, . . . , 2^n + 1.

Proof: By (R) we can find a_{2^n+2} such that a_1, . . . , a_{2^n+1}, a_{2^n+2} is a test sequence. Then for any j > 1 we have

ρ(a_1, a_1 · · · a_{2^n+1}) = ρ(a_{2^n+2}, a_2 · · · a_{2^n+2}) = ρ(a_{2^n+2}, a_1 · · · a_{j−1} a_{j+1} · · · a_{2^n+2}) = ρ(a_j, a_1 · · · a_{2^n+1})

where the second equality is implied by Lemma A7. Then, the feasibility constraint and the additivity of ρ yield the desired result.

Lemma A9: If a_{i_1}, . . . , a_{i_n} are n distinct elements of the test sequence a_1, . . . , a_m and a_{j_1}, . . . , a_{j_n} are also n distinct elements of the test sequence a_1, . . . , a_m, then ∪_{k=1}^{n} a_{i_k} ∼ρ ∪_{k=1}^{n} a_{j_k}.

Proof: Note that m ≥ n and choose an integer k such that 2^k > m. By (R) we can find 2^k + 1 − n distinct elements a_{m+1}, . . . , a_{2^k+1+m−n} such that a_1, . . . , a_{2^k+1+m−n} is a test sequence. Let b = a_{m+1} · · · a_{2^k+1+m−n}. Then Lemma A8 implies that ρ(a_{i_1} · · · a_{i_n}, a_{i_1} · · · a_{i_n} b) = n/(2^k + 1) = ρ(a_{j_1} · · · a_{j_n}, a_{j_1} · · · a_{j_n} b) and (I) then yields the desired result.

Lemma A10: If a_1, . . . , a_n is a test sequence then ρ(a_j, a_1 a_2 · · · a_n) = 1/n for all j.

Proof: By (R) we can find a_{n+1} such that a_1, . . . , a_{n+1} is a test sequence. Then for any j > 1 we have

ρ(a_1, a_1 · · · a_n) = ρ(a_{n+1}, a_2 · · · a_{n+1}) = ρ(a_{n+1}, a_1 · · · a_{j−1} a_{j+1} · · · a_{n+1}) = ρ(a_j, a_1 · · · a_n)

where the second equality is implied by Lemma A9. Then, the feasibility constraint and the additivity of ρ yield the desired result.

Lemma A11: If a_{i_1}, . . . , a_{i_k} are k distinct elements of the test sequence a_1, . . . , a_n and a = ∪_{j=1}^{k} a_{i_j}, b = ∪_{j=1}^{n} a_j, then ρ(a, b) = k/n.

Proof: By Lemma A10, we have ρ(a_i, b) = ρ(a_j, b) = 1/n for all i, j. Then, the additivity of ρ yields the desired result. Lemma A12:

If a, b ∈ A+ and a ∩ b = ∅, then 0 < ρ(a, ab) < 1.

Proof: Suppose ρ(a, ab) = 1, then (R) implies that there exists c such that c ∩ ab = ∅ and ρ(c, ac) > 1/2. Hence, by Lemma A4, c ≻ρ a and therefore ρ(c, cb) > ρ(a, ab) = 1, a contradiction. By symmetry, we cannot have ρ(a, ab) = 0 either.

Assume ρ satisfies (R) and (I). Then, choose any ao ∈ A+ and define v̄(ao) = 1. Then, set v̄(∅) = 0 and for all b ∈ A+ such that b ∩ ao = ∅, let

v̄(b) = ρ(b, b ao) / (1 − ρ(b, b ao))

Finally, for any b ∈ A+ such that ao ∩ b ≠ ∅, find a ∈ A such that a ∩ b ao = ∅ and ρ(a, ab) = 1/2 and let v̄(b) = v̄(a).

Lemma A13: The function v̄ is well-defined and satisfies the following

(i) v̄ : A → IR+ and v̄(a) = 0 if and only if a = ∅.
(ii) v̄(a) ≥ v̄(b) if and only if a ≽ρ b.

Proof: To prove that v̄ is well-defined, we first note that by Lemma A12, v̄(a) < ∞ for all a disjoint from ao. Next, suppose a1, a2 are such that ao b ∩ a1 = ao b ∩ a2 = ∅ and ρ(a1, a1 b) = ρ(a2, a2 b). Then a1 ∼ρ a2 and hence ρ(a1, a1 ao) = ρ(a2, a2 ao) and therefore v̄(a1) = v̄(a2), proving that v̄ is well-defined. By Lemma A12, v̄ satisfies (i). To prove (ii), choose c such that c ∩ ab ao = ∅ and ρ(c, c ao) = 1/2. Then, by Lemma A4, ao ∼ρ c. For any d ∈ A+ with d ∩ c = ∅, if d ∩ ao = ∅, then ao ∼ρ c implies ρ(d, dc) = ρ(d, d ao) = v̄(d)/(1 + v̄(d)). If d ∩ ao ≠ ∅, then since v̄ is well-defined, we have ρ(d, dc) = v̄(d)/(1 + v̄(d)). Hence, a ≽ρ b if and only if ρ(a, ac) ≥ ρ(b, bc) if and only if v̄(a) ≥ v̄(b).

Let nc denote the union of some n-element test sequence ci such that each element ci ∼ρ c for some c. Then, by Lemma A11, ρ(nc, a nc) is the same for all such sequences provided nc ∩ a = ∅. Hence, from now on, we will let nc denote the union of any n-element test sequence with each element satisfying ci ∼ρ c.

Lemma A14: If nc ∼ρ ao, then mc ∼ρ b if and only if v̄(b) = m/n.

Proof: Assume nc ∼ρ ao and hence ρ(b, b nc) = ρ(b, b ao). By Lemma A11, ρ(mc, (n + m)c) = m/(n + m). By definition, ρ(mc, (n + m)c) = ρ(b, b nc) if and only if b ∼ρ mc. Hence v̄(b) = m/n if and only if mc ∼ρ b.

Lemma A15: ρ(a, ab) = v̄(a)/(v̄(a) + v̄(b)) for all a, b ∈ A+ such that a ∩ b = ∅.

Proof: First assume that v̄(a), v̄(b) are rational numbers. Then, there exist positive integers k, m, n such that v̄(a) = k/n and v̄(b) = m/n. Choose c such that nc ∼ρ ao; that is, c such that c ∩ ao = ∅ and ρ(c, c ao) = 1/(n + 1). Note that by Lemma A14, kc ∼ρ a and mc ∼ρ b and hence ρ(kc, (k + m)c) = ρ(a, a mc) = ρ(a, ab). But Lemma A11 implies ρ(kc, (k + m)c) = k/(k + m), which yields the desired result.

If either v̄(a) or v̄(b) is not a rational number, then for any ϵ > 0, choose rational numbers r1, r2 such that r1 < v̄(a), r2 > v̄(b) and r1/(r1 + r2) > v̄(a)/(v̄(a) + v̄(b)) − ϵ. Then, choose c, d such that a, b, c, d are all pairwise disjoint and v̄(c) = r1 and v̄(d) = r2. Then, by the preceding argument ρ(c, cd) = r1/(r1 + r2) and by Lemma A13(ii), ρ(a, ab) ≥ ρ(a, ad) ≥ ρ(c, cd). Hence, ρ(a, ab) ≥ v̄(a)/(v̄(a) + v̄(b)) − ϵ for every ϵ > 0; that is, ρ(a, ab) ≥ v̄(a)/(v̄(a) + v̄(b)). A symmetric argument ensures that ρ(a, ab) ≤ v̄(a)/(v̄(a) + v̄(b)) and hence the desired conclusion.

To complete the proof of the theorem, let vs = v̄({s}).
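The construction of v̄ in this proof can be traced on a concrete Luce rule: starting from binary choice probabilities generated by known weights, v̄(b) = ρ(b, b ao)/(1 − ρ(b, b ao)) recovers the weights up to the normalization v̄(ao) = 1, and the conclusion of Lemma A15 then holds exactly. A sketch with weights of our own choosing:

```python
from fractions import Fraction

# True (unobserved) Luce weights of some pairwise disjoint option sets.
v = {'ao': Fraction(2), 'a': Fraction(3), 'b': Fraction(5)}

def rho(x, y):
    """Binary Luce choice probability of x from the union of x and y."""
    return v[x] / (v[x] + v[y])

# Recover v-bar from binary choice data against the reference set ao.
vbar = {x: rho(x, 'ao') / (1 - rho(x, 'ao')) for x in ('a', 'b')}

# vbar is v rescaled so that vbar(ao) = 1 ...
print(vbar['a'], vbar['b'])  # 3/2 5/2
# ... and Lemma A15 holds: rho(a, ab) = vbar(a)/(vbar(a) + vbar(b)).
print(rho('a', 'b') == vbar['a'] / (vbar['a'] + vbar['b']))  # True
```

This also makes visible why v̄ is only identified up to a positive scalar: any common rescaling of the weights leaves every choice probability, and hence the data, unchanged.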

9. Proof of Theorem 2

The proof of the if part is straightforward and omitted. We assume R∗, I∗ and E throughout this section. Lemma A16:

If a ∩ c = b ∩ d = ∅, a ∼ b and c ∼ d, then ac ∼ bd.

Proof: Assume a ∩ c = b ∩ d = ∅ = abcd ∩ e and c ∼ d. Let s ∈ e and a ∼ b. Then, choose b∗ ∼ b such that b∗ ∩ abcde = ∅ and c∗ ∼ c such that c∗ ∩ abb∗ cde = ∅. By R∗ , this can be done. Then, ρ(s, ace) = ρ(s, ac∗ e) = ρ(s, bc∗ e) = ρ(s, b∗ c∗ e) = ρ(s, b∗ ce) = ρ(s, b∗ de) = ρ(s, bde) as desired. Lemma A17:

∼ is an equivalence relation.

Proof: Clearly, ∼ is reflexive and symmetric. To prove it is transitive, assume a ∼ b ∼ c and let s ∈ d for d such that ac ∩ d = ∅. Case 1: d ∩ b = ∅. Then, ρ(s, ad) = ρ(s, bd) = ρ(s, cd) as desired. Case 2: d ∩ b ≠ ∅ and s ∉ b. Then, let e = d\b. By R∗ we can choose d∗ ∼ d ∩ b such that d∗ ∩ abcd = ∅. Then, case 1 and d∗ ∼ d ∩ b implies ρ(s, ad) = ρ(s, aed∗) = ρ(s, ced∗) = ρ(s, cd) as desired.

Case 3: s ∈ b. Then, let e = d\{s}. By R∗ we can choose d∗ ∼ {s} such that d∗ ∩ abcd = ∅. Then, case 2 and d∗ ∼ {s} implies ρ(s, ad) = ρ(s, aed∗ ) = ρ(s, ced∗ ) = ρ(s, cd) as desired.

Lemma A18:

a ∼ ∅ implies a = ∅.

Proof: If a ̸= ∅, then by R∗ , there is b such that b ∩ a = ∅ and ρ(a, ab) > 0. Hence, ρ(b, ab) < 1 = ρ(b, b) which means that ρ(s, b) ̸= ρ(s, ab) for some s ∈ b, proving that a is not a duplicate of ∅. Lemma A19:

If a ∼ b ∈ M and a ̸= ∅, then there exists an onto mapping f : b → a

such that s ∼ f −1 (s) for all s ∈ a. Proof: The proof is by induction on the cardinality of |a|. If |a| = 1, then by Lemma A18, b ̸= ∅ and hence the constant function is the desired f . Suppose the assertion is true whenever |a′ | = n and let |a| = n + 1 for some n ≥ 1. By Lemma A18, b ̸= ∅. Choose t ∈ a and let c = {t}, c′ = a\{t}. Since b ∼ a and b ∈ M, there is d ⊂ b such that d ∼ c, d′ := b\d ∼ c′ and d ∩ d′ ∼ c ∩ c′ . Since c and c′ are nonempty, by Lemma A18, neither are d, d′ . Hence, by the inductive hypothesis, there exists a function g from d′ to c′ such that g −1 (s) ∼ s for all s ∈ c′ . Then, let f (s) = t if s ∈ d and f (s) = g(s) if s ∈ d′ . Clearly, f is the desired function. Lemma A20:

a ∈ M if and only if |a| ≥ |b| for all b ∼ a.

Proof: Note that if |a| = 0, then the result follows immediately from Lemma A18. So, assume a ≠ ∅. The only if part follows immediately from Lemma A19. To prove the if part, assume |a| ≥ |b| for all b ∼ a and choose b ∈ M such that b ∼ a. By Lemma A19, there is an onto function f : b → a such that f−1(s) ∼ s for all s ∈ a. By hypothesis, |a| ≥ |b| and hence f must be a bijection. Consider any â ∼ a and c ⊂ â. By Lemma A17, â ∼ b and therefore, there exists d′ ⊂ b such that c ∼ d′ and â\c ∼ b\d′. Let d = {f(s) | s ∈ d′}. By Lemma A16, d′ ∼ d and b\d′ ∼ a\d. Again, by Lemma A17, c ∼ d and â\c ∼ a\d, proving that a ∈ M. Lemma A21:

If a ⊂ b ∈ M, then a ∈ M.

Proof: Suppose a ∉ M and let d = b\a. Then, by Lemma A20, there exists c ∼ a such that |c| > |a|. By strong richness, we can assume c ∩ b = ∅. Then, by Lemma A16, cd ∼ ad = b and |cd| > |ad| = |b|. Hence, by Lemma A20, b ∉ M.

Let B0 = {s ∈ A | {s} ∈ M} and let B0 be the set of all finite subsets of B0. Let C0 denote the partition of B0 induced by the equivalence relation ∼ and let θ : B0 → B0 be a selection from the equivalence classes of (∼, B0); that is, θ is any function such that (i) θ(s) ∼ s for all s ∈ B0 and (ii) s ∼ t implies θ(s) = θ(t). Finally, let B = {θ(s) | s ∈ B0} and let B be the set of all finite subsets of B. Lemma A22:

B0 = M.

Proof: That M ⊂ B0 follows from Lemma A21, while B0 ⊂ M follows from Lemma A16.

Lemma A23:

a ⊂ b ⊥ c implies a ⊥ c.

Proof: Suppose there is a′ ∼ a, c′ ∼ c such that c′ ∩ a′ ≠ ∅. Choose d ∼ b\a such that d ∩ a′c′ = ∅ and note that d ∪ a′ ∼ (b\a) ∪ a = b. Hence, b and c are not non-overlapping. Lemma A24:

(i) If s, t ∈ B and s ≠ t, then {s} ⊥ {t}. (ii) For a, b ∈ B0, a ⊥ b or there is s ∈ a, t ∈ b such that s ∼ t. (iii) For a, b ∈ B, a ∩ b = ∅ if and only if a ⊥ b.

Proof: (i) Suppose s, t ∈ B, s ∼ a ∈ A and t ∼ b ∈ A such that a ∩ b ≠ ∅. By Lemma A22, {s}, {t} ∈ M. Hence, by Lemma A19, |a| = |b| = 1. Hence, a = b and therefore s ∼ t by Lemma A17; that is, s ∼ t and by the definition of θ, s = θ(s) = θ(t) = t.

(ii) Assume a, b ∈ B0 and there is a′ ∼ a and b′ ∼ b and s∗ ∈ a′ ∩ b′. By Lemma A19, there are functions f, g mapping a, b onto a′, b′ such that f−1(s) ∼ s and g−1(t) ∼ t for all s ∈ a′ and t ∈ b′. It follows from Lemma A17 that f−1(s∗) ∼ g−1(s∗). By Lemma A21, f−1(s∗), g−1(s∗) ∈ M and hence applying Lemma A19 again yields an onto function h : f−1(s∗) → g−1(s∗) such that h−1(s) ∼ s for all s ∈ g−1(s∗). By Lemma A20, h must be a bijection. Hence, there are s ∈ a and t ∈ b such that t ∼ s.

(iii) Assume a, b ∈ B. That a ⊥ b implies a ∩ b = ∅ is obvious. To prove the converse, assume that a ⊥ b does not hold. Then, by part (ii) of this lemma there are s ∈ a and t ∈ b such that t ∼ s. Then, t = s by part (i) of this lemma and hence a ∩ b ≠ ∅.

Define the RCR ρB on the proper collection B as follows: ρB(s, b) = ρ(s, b) for all b ∈ B such that b ≠ ∅.

Lemma A25: ρ(s, ab) = ρ(s, ab0) if s ∈ a, b0 ∈ B0, s ⊥ b0 and b = {θ(t) | t ∈ b0}.

Proof: Let b0 = b1b2···bk, where s ∼ t if and only if i = j for all s ∈ bi and t ∈ bj; that is, b1, …, bk is the partition of b0 induced by the equivalence relation ∼. First, we note that i ≠ j implies bi ⊥ bj: if bi and bj are not nonoverlapping, then by Lemma A24(ii) there are t ∈ bi and t′ ∈ bj such that t ∼ t′, and hence by Lemma A17, bi = bj and therefore i = j.

Let n(b0) = ∑_{i=1}^k |bi| − k. The proof is by induction on n(b0). If n(b0) = 0, then each bi is a singleton and hence b ∼ b0 by Lemma A16, and the result follows. Suppose the result holds whenever n(b̂0) = n and let n(b0) = n + 1 ≥ 1. Then there is some i such that |bi| > 1. Choose t, t′ ∈ bi such that t ≠ t′ and let b̂0 = b0\{t}. By Lemma A23, bi ⊥ s, and therefore E implies ρ(s, ab0) = ρ(s, ab̂0). By the inductive hypothesis, ρ(s, ab̂0) = ρ(s, ab), and hence ρ(s, ab0) = ρ(s, ab), as desired.

Lemma A26: ρB satisfies R and I and is a Luce rule.

Proof: Assume a, c ∈ B, a ≠ ∅ and δ ∈ (0, 1). By R∗, there is b̂ ∈ A such that ρ(a, ab̂) = δ and b̂ ⊥ ac. Again by R∗, we can choose b0 ∈ M such that b0 ∼ b̂ and b0 ∩ ac = ∅. Hence, δ = ρ(a, ab̂) = ρ(a, ab0). By Lemma A22, b0 ∈ B0. It follows from Lemma A17 that b0 ∼ b̂ ⊥ ac implies b0 ⊥ ac. Then, Lemma A23 implies b0 ⊥ c and b0 ⊥ a. Let b = {θ(s) | s ∈ b0}. By Lemma A25, we have δ = ρ(a, ab0) = ρ(a, ab). Since b ⊥ c, we conclude b ∩ c = ∅. Hence, ρB satisfies R. It follows from Lemma A24(iii) and I∗ that ρB satisfies I. Then, Theorem 1 establishes that it is a Luce rule.

Define cs ⊂ A for all s ∈ B as follows: t ∈ cs if and only if there exists b ∈ B0 with s ∈ b such that t ∼ b. Let C = {cs | s ∈ B}. Note that for all t ∈ A, there exists b ∈ M = B0 such that t ∼ b (by R∗ and Lemma A22). Pick any t′ ∈ b and let s = θ(t′) ∈ B. If s ∈ b, then t ∈ cs. Otherwise, let b′ = {s} ∪ (b\{t′}). It follows from Lemmas A16 and A17 that b′ ∼ b ∼ t and hence t ∼ b′ and again, t ∈ cs. It follows that C is an attribute set for A.
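The Luce rule delivered by Lemma A26 is easy to illustrate numerically. The sketch below is only an illustration (the values and function names are hypothetical, not from the paper); the final assertion checks Luce's characteristic property that probability ratios are menu-independent.

```python
# Minimal sketch of a Luce rule (hypothetical values; names are mine).
def luce(v, b):
    """Choice probability of each alternative s in menu b under Luce value v."""
    total = sum(v[s] for s in b)
    return {s: v[s] / total for s in b}

v = {"s": 2.0, "t": 1.0, "u": 1.0}   # a hypothetical Luce value

p_full = luce(v, ["s", "t", "u"])    # {'s': 0.5, 't': 0.25, 'u': 0.25}
p_pair = luce(v, ["s", "t"])         # {'s': 2/3, 't': 1/3}

# Axiom I (Luce's IIA): the ratio of the choice probabilities of s and t
# is the same in every menu containing both.
assert abs(p_full["s"] / p_full["t"] - p_pair["s"] / p_pair["t"]) < 1e-12
```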

Lemma A27: cs = ct implies s = t.

Proof: Since t ∈ ct, ct = cs implies t ∈ cs and hence t ∼ b for some b ∈ B0 such that s ∈ b. By Lemma A22, {t} ∈ M, and then by Lemma A20, |b| = 1; hence {t} ∼ {s} and therefore t = θ(t) = θ(s) = s, as desired.

Let v be the Luce value that induces ρB and let w(cs) = vs for all s ∈ B. By Lemma A27, w : C → IR++ is well defined and hence (w, C) is an attribute value on A. Let ρw be the RCR that (w, C) induces. We will prove that if ρ satisfies I∗ and E, then it is a WAR by showing that ρ = ρw.

Lemma A28:

(i) s ∼ t, s, t ∈ b0 ∈ B0 implies ρ(s, b0) = ρ(t, b0). (ii) ρ(s∗, b0) = ρw(s∗, b0) whenever s∗ ∈ b0 ∈ B0.

Proof: (i) Let a = {s′ ∈ b0 | s′ ∼ s}. Choose c ∼ a\{s} such that c ∩ ab0 = ∅ and let d = b0\a. Then note that b0\{s} ∼ dc ∼ b0\{t} and therefore ρ(s, b0) = ρ(s, dcs) = ρ(t, dct) = ρ(t, b0).

(ii) As in Lemma A25, let b0 = b1b2···bk, where s ∼ t if and only if i = j for all s ∈ bi and t ∈ bj; that is, b1, …, bk is the partition of b0 induced by the equivalence relation ∼. Recall that i ≠ j implies bi ⊥ bj. Assume without loss of generality that s∗ ∈ b1.

If k = 1, then ρ(s, b0) = ρ(t, b0) for all s, t ∈ b0 by part (i) and hence ρ(s∗, b0) = 1/|b0| = ρw(s∗, b0), as desired.

If k > 1, let b = {θ(s) | s ∈ b0} and let ŝ = θ(s∗). By definition,

    ρ(s, b) = vs / ∑_{t∈b} vt = w(s) / ∑_{t∈b} w(t) = ρw(s, b)

for all s ∈ b. Let b̂ = ŝb2···bk and b∗ = s∗b2···bk. Since s∗ ∼ ŝ and by Lemma A25,

    ρ(s∗, b∗) = ρ(ŝ, b̂) = ρ(ŝ, b) = w(ŝ) / ∑_{t∈b} w(t) = ρw(ŝ, b) = ρw(ŝ, b̂) = ρw(s∗, b∗).

Applying Lemma A25 again yields ρ(s, b∗) = ρ(s, b0) for all s ≠ s∗; that is,

    ∑_{s∈b1} ρ(s, b0) = ρw(s∗, b∗).

But then part (i) implies

    ρ(s∗, b0) = (1/|b1|) ∑_{s∈b1} ρ(s, b0) = (1/|b1|) ρw(s∗, b∗) = ρw(s∗, b0),

as desired.

To complete the proof of the theorem, consider an arbitrary a ∈ A. Let a = {s1, …, sn}. Define ci, ai for i = 0, …, n inductively as follows: c0 = ∅, a0 = a. For i > 0, choose ci ∈ B0 such that ci ∼ si and ci ∩ aai−1 = ∅. Finally, let ai = ci(ai−1\{si}). Then, since an ∈ B0, Lemma A28 implies

    ρ(s1, a) = ρ(c1, a1) = ··· = ρ(c1, an) = ρw(c1, an) = ρw(s1, a),

as desired.
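The weighted attribute rule (WAR) that this proof characterizes can be sketched numerically. The formula below follows the display in the proof of Lemma A29, qia = ∑_{c: i∈c, c∩a≠∅} w(c) / (|a∩c| · ∑_{d: d∩a≠∅} w(d)); the function names and weights are mine, not the paper's. With singleton attributes the rule collapses to the Luce rule, which is the first step of the proof of Theorem 3 below.

```python
def war(w, a):
    """Weighted attribute rule: choice probabilities over menu a,
    given a weight w(c) > 0 for each attribute (a set of alternatives) c."""
    relevant = [c for c in w if set(c) & set(a)]   # attributes meeting the menu
    total = sum(w[c] for c in relevant)
    p = {i: 0.0 for i in a}
    for c in relevant:
        hits = set(c) & set(a)
        for i in hits:                              # split c's share evenly
            p[i] += w[c] / (total * len(hits))
    return p

# With singleton attributes the rule reduces to the Luce rule (cf. Theorem 3):
w = {frozenset({i}): v for i, v in {1: 2.0, 2: 1.0, 3: 1.0}.items()}
p = war(w, [1, 2, 3])
assert abs(p[1] - 0.5) < 1e-12 and abs(p[2] - 0.25) < 1e-12
```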

10. Proof of Theorem 3

Take any q ∈ Ql and let v be the corresponding Luce value. Define w as follows: w(a) = 0 whenever a is not a singleton and w({i}) = vi for all i. Clearly, this attribute value induces q and therefore q ∈ Qw, proving Ql ⊂ Qw.

Next, we will prove that Qr = cl conv Ql. The first fact below requires no proof.

Fact 1: The sets Q and Π are compact and convex.

The next fact follows immediately from Fact 1 and the definition of "q maximizes π."

Fact 2: If qi maximizes πi for i = 1, 2 and α ∈ [0, 1], then αq1 + (1 − α)q2 maximizes απ1 + (1 − α)π2.

Fact 3: The set Qr is compact and convex.

Proof: That Qr is convex follows from Facts 1 and 2 above. Next, we will prove that Qr is compact. For any a, b ⊂ A, let b\a = {i ∈ b | i ∉ a} and ac = A\a. Let [a, c] = {b ⊂ A | a ⊂ b ⊂ c}. Falmagne (1978) showed that q ∈ Q if and only if

    0 ≤ ∑_{b: a⊂b} (−1)^{|b\a|} qib    (A5)

for all i ∈ A and a ∈ A+. Let C be the subset of IR+^{n(2^n−1)} that satisfies the inequalities above. Clearly, C is closed and by Falmagne's theorem Qr = Q ∩ C. Since Qr is the intersection of a closed set and (by Fact 1) a compact set, it too is compact.

Footnote 5: Block and Marschak (1960) introduced the inequalities (A5) and identified them as necessary conditions for q ∈ Q to be an element of Qr.
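The inequalities (A5) can be checked mechanically. The sketch below (my illustration; values are hypothetical) computes the Block–Marschak polynomials for a Luce rule on three alternatives and verifies that they are all nonnegative, as (A5) requires for random-utility-maximizing rules.

```python
from itertools import combinations

A = [1, 2, 3]
v = {1: 3.0, 2: 2.0, 3: 1.0}   # hypothetical Luce value

def q(i, b):
    """Luce choice probability of i from menu b."""
    return v[i] / sum(v[j] for j in b) if i in b else 0.0

def supersets(a):
    """All menus b with a ⊂ b ⊂ A."""
    rest = [j for j in A if j not in a]
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            yield sorted(a + list(extra))

# Block-Marschak polynomial: K(i, a) = sum over b ⊃ a of (-1)^{|b\a|} q(i, b).
for i in A:
    for k in range(1, len(A) + 1):
        for a in combinations(A, k):
            if i not in a:
                continue
            K = sum((-1) ** (len(b) - len(a)) * q(i, b)
                    for b in supersets(list(a)))
            assert K >= -1e-12      # (A5) holds for a Luce rule
```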

The following well-known result was first proved by Block and Marschak (1960):

Fact 4: Ql ⊂ Qr.

Facts 3 and 4 imply cl conv Ql ⊂ Qr. The fact below establishes the reverse inclusion and yields cl conv Ql = Qr.

Fact 5: For every ϵ′ > 0 and q∗ ∈ Qr, there exists q ∈ conv Ql such that |q − q∗| < ϵ′.

Proof: Assume 0 < ϵ < 1 and for any u ∈ U, define the Luce value v^{ϵu} as follows:

    v_i^{ϵu} = ϵ^{1/u_i}.

Let δ^u be the degenerate random utility that assigns probability 1 to u, let q^{δu} be the RCR that maximizes δ^u and let q^{vϵu} be the RCR induced by v^{ϵu}. It is easy to see that as ϵ becomes arbitrarily small, q^{vϵu} gets arbitrarily close to q^{δu}. It follows that q := ∑_{u∈U} πu q^{vϵu} ∈ conv Ql gets arbitrarily close to q∗ := ∑_{u∈U} πu q^{δu}. Note that π = ∑_{u∈U} πu δ^u and hence, by Fact 2 (and a simple inductive argument), q∗ = ∑_{u∈U} πu q^{δu} maximizes π. Hence, for every q∗ ∈ Qr we can find q ∈ conv Ql arbitrarily close to q∗.

Lemma A29:

Proof: Fix q ∈ Qw and let w be the attribute value that induces q. Let Ā = {a | w(a) > 0} and let Ā be the set of subsets of Ā. Let v̄a = w(a) for all a ∈ Ā. Interpret the function v̄ as a Luce value on Ā and let ρ̄^v̄ be the RCR induced by v̄ on Ā; that is, for ā ∈ Ā+ and b ∈ Ā,

    ρ̄^v̄(b, ā) = v̄b / ∑_{a∈ā} v̄a   if b ∈ ā,   and   ρ̄^v̄(b, ā) = 0   if b ∉ ā.

Let U∗ be the set of all bijections from Ā to {1, …, |Ā|} and let Π∗ be the set of all probability distributions on U∗. Define, for any b ∈ ā,

    [bā] = {u∗ ∈ U∗ | u∗(b) ≥ u∗(c) for all c ∈ ā}

and call the RCR ρ̄ the maximizer of η ∈ Π∗ if

    ρ̄(b, ā) = ∑_{u∗∈[bā]} η(u∗)

for all b ∈ ā ∈ Ā+. Applying Fact 4 to this new setting yields η∗ ∈ Π∗ such that

    ρ̄^v̄(b, ā) = ∑_{u∗∈[bā]} η∗(u∗)    (A6)

for all b ∈ ā ∈ Ā+.

Let Ū be the set of all bijections from A to {1, …, 2^n − 1} and let Π̄ be the set of all probability distributions on Ū. We extend each u∗ ∈ U∗ to ū ∈ Ū by choosing ū such that (i) ū(a) ≥ ū(b) if and only if u∗(a) ≥ u∗(b) for all a, b ∈ Ā and (ii) ū(a) > ū(b) if a ∈ Ā, b ∉ Ā. Let η̄∗ ∈ Π̄ be the random utility corresponding to an extension of η∗.

For any a ∈ A such that a ≠ ∅ and ū ∈ Ū, let

    b_a^ū = arg max_{b: b∩a≠∅} ū(b).

Define

    ρ^ū(i, a) = 1/|a ∩ b_a^ū|   if i ∈ b_a^ū,   and   ρ^ū(i, a) = 0   otherwise.    (A7)

For any η ∈ Π̄ and i ∈ a ∈ A+, let

    ρ^η(i, a) = ∑_{ū∈Ū} η(ū) ρ^ū(i, a).    (A8)

We will prove that q ∈ Qr by showing (1) ρ^ū ∈ Qr for all ū ∈ Ū and (2) q = ρ^η̄∗. Note that (1) and (2) together establish that q is a convex combination of RCRs that are in Qr, which together with Fact 3 above yields q ∈ Qr. Define

    Ū1 = {ū ∈ Ū | |b| > |a| = 1 implies ū(a) > ū(b)}

and

    Π̄1 = {η ∈ Π̄ | η(ū) > 0 implies ū ∈ Ū1}.

Claim: For every ū ∈ Ū there is μ^ū ∈ Π̄1 such that ρ^ū = ρ^{μū}.

Proof of Claim: If ū ∈ Ū1, then μ^ū = δ^ū, where δ^ū yields ū with probability 1. Otherwise, let a be the ū-maximal element of A+ that is not a singleton. Let d = {i ∈ a : ū({i}) > ū(a)} and define the function T0 : Ū → Π̄ as follows:

(i) If d = a, let T0(ū) = δ^{ū′} for the unique ū′ ∈ Ū that ranks a at the bottom and ranks all other pairs the same way as ū.

(ii) If ∅ ≠ d ≠ a, let T0(ū) = δ^{ū′} for the unique ū′ ∈ Ū that ranks a at the bottom and ranks a\d the same way as ū ranks a; ū′ ranks all other pairs the same way as ū.

(iii) If d is empty, define c̄ = {{i} : ū({i}) > ū(a)} and let ā = {{i} : i ∈ a}. Let U′ū be the set of all utility functions ū′ ∈ Ū such that (i) ū′ ranks elements of c̄ above elements not in c̄ and ranks pairs in c̄ the same way as ū; (ii) ū′ ranks a at the bottom; (iii) ū′ ranks elements in ā ∪ c̄ above elements not in ā ∪ c̄; (iv) the ranking of all pairs b, b′ ∉ {a} ∪ c̄ ∪ ā is unchanged. (Notice that there are |a| rankings in U′ū.) In this case, let T0(ū) assign probability 1/|a| to each ū′ ∈ U′ū.

It is straightforward to verify that ρ^{T0(ū)} = ρ^ū. Next, let T1(η) = ∑_{ū∈Ū} η(ū)T0(ū) and, for k > 1, let Tk(η) = T1(Tk−1(η)). Once again, it is easy to verify that ρ^{T1(η)} = ρ^η. Clearly, T0(ū) = δ^ū whenever ū ∈ Ū1. Hence, for k sufficiently large, Tk(δ^ū) ∈ Π̄1. Then, a simple inductive argument ensures that ρ^ū = ρ^{Tk(δū)}, proving the claim.

For any ū ∈ Ū, let l(ū) ∈ U be the unique u ∈ U such that ui ≥ uj if and only if ū({i}) ≥ ū({j}). Let L(η) be the π ∈ Π such that

    π(u) = ∑_{ū: l(ū)=u} η(ū).

Note that for all ū ∈ Ū1, ρ^{δl(ū)} = ρ^ū and, therefore, ρ^η = ρ^{L(η)} for η ∈ Π̄1. Together with the claim above, this proves (1): ρ^ū ∈ Qr.

To prove (2), let Ūa(c) = {ū | b_a^ū = c} and Ū_a^i = {ū | i ∈ b_a^ū}. Then, (A6)–(A8) imply

    ρ^η̄∗(i, a) = ∑_{ū∈Ū} η̄∗(ū) ρ^ū(i, a) = ∑_{ū∈Ū_a^i} η̄∗(ū) · 1/|a ∩ b_a^ū|
                = ∑_{c∈C_i^a} ∑_{ū∈Ūa(c)} η̄∗(ū) · 1/|a ∩ c| = ∑_{c∈C_i^a} (1/|a ∩ c|) ∑_{ū∈[cC^a]} η̄∗(ū)
                = ∑_{c∈C_i^a} (1/|a ∩ c|) ρ̄^v̄(c, C^a) = ∑_{c∈C_i^a} (1/|a ∩ c|) · w(c)/∑_{d∈C^a} w(d) = qia

(where C^a = {c ∈ C | c ∩ a ≠ ∅} and C_i^a = {c ∈ C^a | i ∈ c}).
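The chain of equalities above can be verified by brute force in a small example. The sketch below is my own device, not the paper's construction: I generate random attribute rankings Plackett–Luce-style from the weights w (so that the best-ranked attribute among those meeting the menu has the Luce probabilities ρ̄^v̄), split ties inside the winning attribute uniformly as in (A7), and check that the resulting mixture equals the attribute-rule probabilities qia. All names and numbers are hypothetical.

```python
from itertools import permutations

attributes = [frozenset({1}), frozenset({2, 3}), frozenset({3})]
w = {attributes[0]: 2.0, attributes[1]: 1.5, attributes[2]: 0.5}

def pl_prob(order):
    """Plackett-Luce probability of the attribute ranking `order` (best first)."""
    p, remaining = 1.0, list(order)
    while remaining:
        top = remaining.pop(0)
        p *= w[top] / (w[top] + sum(w[c] for c in remaining))
    return p

def rho_u(order, i, a):
    """Tie-splitting rule as in (A7): take the best-ranked attribute meeting
    the menu a, then choose uniformly within the overlap."""
    best = next(c for c in order if c & set(a))
    return 1.0 / len(best & set(a)) if i in best else 0.0

def war(i, a):
    """Weighted attribute rule probability of i from menu a."""
    rel = [c for c in attributes if c & set(a)]
    tot = sum(w[c] for c in rel)
    return sum(w[c] / (tot * len(c & set(a))) for c in rel if i in c)

a = [2, 3]
for i in a:
    mix = sum(pl_prob(o) * rho_u(o, i, a) for o in permutations(attributes))
    assert abs(mix - war(i, a)) < 1e-12   # mixture over rankings = WAR
```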

11. Proof of Theorems 4 and 5

First, we will prove Lemma 1. Take any q ∈ Q and assume that there exists some dual totally monotone κ such that qia = Lκa(i) for all a ∈ A+ and i ∈ A. Since κ is dual totally monotone, there exists a probability γ on A+ such that

    κ(a) = ∑_{b∈A^a} γ(b).

For c ∈ A+, let γc denote the element of Γ1 that assigns probability 1 to c. Clearly,

    κ^{γc}(b) = 1 if b ∩ c ≠ ∅, and 0 otherwise;
    κ^{γc}_a(b) = 1 if a ∩ b ∩ c ≠ ∅, and 0 otherwise,

for all a ∈ A^c, where, for any capacity κ, κa denotes the capacity κa(b) = κ(a ∩ b) (so that κA = κ). Then, the definition of Lκa yields

    L^{κγc}_a(i) = 1/|a ∩ c| if i ∈ a ∩ c, and 0 otherwise.

It is easy to verify that

    κ^γ_a = ∑_{c∈A+} γ(c) κ^{γc}_a

and since the Shapley value is linear, it follows that

    Lκa(i) = L^{κγ}_a(i) = (1/∑_{d∈A^a} γ(d)) ∑_{c: i∈a∩c} γ(c)/|a ∩ c|.

This proves Lemma 1. To prove Theorem 4, note that the last display equation reveals that q is the RCR induced by the attribute value γ and hence a WAR. For the converse, suppose w induces the WAR ρ and let

    γ(a) = w(a) / ∑_{c∈A+} w(c).

Note that the attribute value γ ∈ Γ1 also induces ρ. Then, the display equation above implies qia = L^{κγ}_a(i) for all a ∈ A+ and i ∈ A.
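The appeal to linearity of the Shapley value in Lemma 1 can be checked concretely. The sketch below (my construction; weights are hypothetical) computes the Shapley value of the game S ↦ ∑_{b: b∩S≠∅} γ(b) on the players in a by brute-force averaging of marginal contributions and verifies that, normalized by the value of the grand coalition, it reproduces the attribute-rule probabilities in the display equation.

```python
from itertools import permutations

gamma = {frozenset({1}): 0.4, frozenset({2, 3}): 0.35, frozenset({3}): 0.25}

def game(S):
    """Value of coalition S under the capacity kappa^gamma."""
    return sum(g for b, g in gamma.items() if b & S)

def shapley(players):
    """Brute-force Shapley value: average marginal contributions over orders."""
    phi = {i: 0.0 for i in players}
    orders = list(permutations(players))
    for order in orders:
        S = set()
        for i in order:
            phi[i] += game(S | {i}) - game(S)
            S.add(i)
    return {i: phi[i] / len(orders) for i in players}

a = {1, 2, 3}
phi = shapley(a)
kappa_a = game(a)                      # value of the grand coalition

def q(i):
    """Attribute-rule probability from the display equation in Lemma 1."""
    return sum(g / len(b & a) for b, g in gamma.items() if i in b) / kappa_a

for i in a:
    assert abs(phi[i] / kappa_a - q(i)) < 1e-9
```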

11.1 Proof of Theorem 5

Again, we will first prove Lemma 2. To prove (i), assume κ ∈ K and define κ′, the dual of κ, as follows: κ′(a) = 1 − κ(ac). Clearly, κ′ is also a capacity. Then, define γ∗ : A → IR by Möbius inversion; that is, let γ∗ be the unique function on A such that

    κ′(a) = ∑_{b⊂a} γ∗(b)

for all a ∈ A. It follows that

    κ(a) = 1 − κ′(ac) = ∑_{b∈A^a} γ∗(b).

Since κ(A) = 1, γ∗(∅) = κ′(∅) = 0. Then, define γ as the restriction of γ∗ to A+ and note that

    κ(a) = ∑_{b∈A^a} γ∗(b) = ∑_{b∈A^a} γ(b);

hence κ = κγ. To prove (ii), replace Γ1 with ΓK in the corresponding part of the proof of Lemma 1. Similarly, the derivation of Theorem 5 from part (ii) of Lemma 2 is identical to the derivation of Theorem 4 from Lemma 1.

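The duality and Möbius inversion used in this proof can be made concrete. The sketch below (my illustration; the probability γ is hypothetical) builds κ = κγ, forms the dual κ′(a) = 1 − κ(A\a), Möbius-inverts κ′, and recovers γ exactly.

```python
from itertools import combinations

A = frozenset({1, 2, 3})
gamma = {frozenset({1}): 0.4, frozenset({2, 3}): 0.35, frozenset({1, 3}): 0.25}

def subsets(s):
    s = list(s)
    return [frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)]

def kappa(a):
    """kappa^gamma(a): total gamma-weight of sets intersecting a."""
    return sum(g for b, g in gamma.items() if b & a)

def kappa_dual(a):
    """Dual capacity: kappa'(a) = 1 - kappa(A \\ a)."""
    return 1.0 - kappa(A - a)

def mobius(a):
    """Moebius inverse of kappa': sum over b ⊂ a of (-1)^{|a\\b|} kappa'(b)."""
    return sum((-1) ** (len(a) - len(b)) * kappa_dual(b) for b in subsets(a))

# Inversion recovers gamma (and gamma*(empty) = 0, since kappa(A) = 1).
for a in subsets(A):
    assert abs(mobius(a) - gamma.get(a, 0.0)) < 1e-12
```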

References

Berkson, J. (1944), "Application of the Logistic Function to Bio-assay," Journal of the American Statistical Association, 39(227): 357-365.

Bliss, C. (1934), "The Method of Probits," Science, 79: 38-39.

Block, H. D. and J. Marschak (1960), "Random Orderings and Stochastic Theories of Responses," in I. Olkin, S. Ghurye, W. Hoeffding, W. Madow and H. Mann (eds.), Contributions to Probability and Statistics, Stanford: Stanford University Press, 97-132.

Debreu, G. (1960), "Review of Individual Choice Behavior by R. D. Luce," American Economic Review, 50: 186-188.

Dekel, E., B. Lipman and A. Rustichini (2001), "Representing Preferences with a Unique Subjective State Space," Econometrica, 69(4): 891-934.

Dekel, E., B. Lipman and A. Rustichini (2009), "Temptation-Driven Preferences," Review of Economic Studies, 76(3): 937-971.

Ergin, H. and T. Sarver (2010), "A Unique Costly Contemplation Representation," Econometrica, forthcoming.

Falmagne, J. C. (1978), "A Representation Theorem for Finite Random Scale Systems," Journal of Mathematical Psychology, 18(1): 52-72.

Gul, F. and W. Pesendorfer (2001), "Temptation and Self-Control," Econometrica, 69(6): 1403-1435.

Huber, J., J. W. Payne and C. Puto (1982), "Adding Asymmetrically Dominated Alternatives: Violations of Regularity and the Similarity Hypothesis," Journal of Consumer Research, 9: 90-98.

Huber, J. and C. Puto (1983), "Market Boundaries and Product Choice: Illustrating Attraction and Substitution Effects," Journal of Consumer Research, 10(1): 31-44.

Kőszegi, B. and M. Rabin (2006), "A Model of Reference-Dependent Preferences," Quarterly Journal of Economics, 121(4): 1133-1165.

Kreps, D. (1979), "A Representation Theorem for 'Preference for Flexibility'," Econometrica, 47(3): 565-577.

Kreps, D. and E. Porteus (1978), "Temporal Resolution of Uncertainty and Dynamic Choice Theory," Econometrica, 46(1): 185-200.

Luce, R. D. (1959), Individual Choice Behavior: A Theoretical Analysis, New York: Wiley.

Luce, R. D. (1977), "The Choice Axiom after Twenty Years," Journal of Mathematical Psychology, 15: 215-233.

Luce, R. D. and P. Suppes (1965), "Preference, Utility, and Subjective Probability," in R. D. Luce, R. R. Bush and E. Galanter (eds.), Handbook of Mathematical Psychology, Vol. 3, New York: Wiley, 249-410.

Manzini, P. and M. Mariotti (2007), "Sequentially Rationalizable Choice," American Economic Review, 97(5): 1824-1839.

McFadden, D. (1978), "Modelling the Choice of Residential Location," in A. Karlqvist, L. Lundqvist, F. Snickars and J. Weibull (eds.), Spatial Interaction Theory and Planning Models, Amsterdam: North-Holland.

Noor, J. (2010), "Temptation and Revealed Preference," Econometrica, forthcoming.

Peleg, B. and M. E. Yaari (1973), "On the Existence of a Consistent Course of Action when Tastes are Changing," Review of Economic Studies, 40(3): 391-401.

Pollak, R. A. (1968), "Consistent Planning," Review of Economic Studies, 35(2): 201-208.

Shapley, L. S. (1953), "A Value for n-Person Games," in H. W. Kuhn and A. W. Tucker (eds.), Contributions to the Theory of Games II, Princeton: Princeton University Press, 307-317.

Simon, H. (1978), "Rationality as Process and as Product of Thought," American Economic Review, 68(2): 1-16.

Simonson, I. (1989), "Choice Based on Reasons: The Case of Attraction and Compromise Effects," Journal of Consumer Research, 16: 158-174.

Stovall, J. E. (2010), "Multiple Temptations," Econometrica, 78(1): 349-376.

Strotz, R. H. (1955), "Myopia and Inconsistency in Dynamic Utility Maximization," Review of Economic Studies, 23(3): 165-180.

Thurstone, L. L. (1927), "A Law of Comparative Judgment," Psychological Review, 34: 273-286.

Zermelo, E. (1929), "Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung," Mathematische Zeitschrift, 29: 436-460.
