soundness conditions for prescriptive decision

SOUNDNESS CONDITIONS FOR PRESCRIPTIVE DECISION ANALYSIS

Richard J. Wallace and Ilya Ashikhmin
Cork Constraint Computation Centre and Department of Computer Science
University College Cork, Cork, Ireland
email: [email protected], [email protected]

ABSTRACT

Prescriptive decision analysis involves both formal methods and judgments by the decision maker. Consequently, prescriptive methods must meet two kinds of conditions for soundness, one related to formal adequacy, the other to the correspondence of models and methods with actual human capacities. This work proposes guidelines for evaluating these methods based on whether the features of human judgments needed to produce the requisite preference ordering support the assumptions of the models. This is distinct from a model’s descriptive adequacy.

INTRODUCTION

More than 40 years ago, Peter Fishburn wrote an article on how utility is measured in practice [10]. The article is oddly unsatisfactory in certain respects because, despite the care with which the author catalogued and characterised these methods, no attempt was made to assess their adequacy as measures of “utility”. In this paper, we address this problem and make some tentative proposals regarding the criteria by which such adequacy can be judged.

Our concern is with prescriptive decision analysis (hereafter referred to as PDA). Since the late 1980s, this term has been used to designate the application of formal or semi-formal procedures, based in part on assumptions about rationality, in order to aid the decision maker (DM). PDA can be distinguished from an older concept, now called normative decision analysis, which is concerned with characterising rational decision making according to first principles. Both, in turn, can be distinguished from descriptive decision analysis, which is concerned with decision making as an empirical phenomenon.

A concept pivotal in all three contexts is utility, usually conceived as a “subjective scale of values”. In the descriptive context, this term designates an order-preserving function that characterises a preference relation. In the prescriptive context, it designates a scale, derived from interaction between decision analyst and decision maker, that is intended to serve as the basis for making decisions, i.e. choosing from among a set of alternatives. The benefits of using such a scale are several: (i) one can express an indefinite number of alternatives succinctly, (ii) one can handle tradeoffs among alternatives, (iii) one can predict selections based on prior judgments.

Here, we are especially concerned with multi-criteria decision analysis (MCDA). This usually involves aggregated forms of utility, in which preferences over many attributes are combined, e.g. an additive utility function [18]. Because of the complex assumptions that go into the construction of aggregate scales and the different conditions under which scales are constructed (e.g. for attributes with continuous versus finite domains), MCDA now includes a wide range of models and procedures [9].


A pivotal fact in all of this is that “preferences” in prescriptive contexts are based on human judgments. This means that the interpretation and validity of preference representations derived using models in MCDA depend in part on the correspondence between operations posited by the models and the capacities and activities of human beings. Moreover, representations derived from human judgments must have some correspondence to the resulting preferences reflected in actual choices for PDA to be meaningful.

To elucidate the issues involved, we use the terms soundness-in-conception and soundness-in-use. The former refers to the formal adequacy of a method of (prescriptive) decision analysis, the latter to the degree to which the assumptions of the method are realised in practice. Thus, although utility scales derived in the prescriptive context are in a sense artificial, they are not arbitrary, provided they are derived using procedures that fulfil the requisite soundness conditions.

This implies that PDA must be viewed from the perspective of human judgments, in addition to a perspective based on formal representations of preference orderings. This entails considering relations between prescriptive and descriptive analysis in some depth. In fact, the argument can be made that this constitutes a proposal for a distinct subarea that could be called “descriptive prescriptive decision analysis”.

TERMINOLOGY AND NOTATION

In this paper, we consider preferences and multi-attribute alternatives in the usual way. Given a set $A$ of outcomes or alternatives, preferences are represented as a binary relation $\succ$. The typical interpretation of this relation is that, for $a$ and $b$ in $A$, $a \succ b$ iff, given a choice between $a$ and $b$, $a$ will be chosen. The binary relation $\sim$ indicates the case where there is no consistent choice in favour of either alternative or where DM expresses indifference between them. Also, if the preference relation has the properties of transitivity and completeness, then one can define an order-preserving function, $u$, called a utility function.
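The last point can be illustrated for a finite set of alternatives. The following is a minimal sketch in Python, not a procedure taken from the literature cited here: the names `utility_from_relation` and `weakly_prefers` are hypothetical, and the relation is assumed to be supplied in its weak form ("at least as preferred as"). Counting the alternatives that each alternative is weakly preferred to yields one order-preserving function among many.

```python
from itertools import product


def is_complete(items, weakly_prefers):
    """Every pair (including a with itself) is comparable in some direction."""
    return all(weakly_prefers(a, b) or weakly_prefers(b, a)
               for a, b in product(items, repeat=2))


def is_transitive(items, weakly_prefers):
    """a >= b and b >= c together imply a >= c."""
    return all(weakly_prefers(a, c)
               for a, b, c in product(items, repeat=3)
               if weakly_prefers(a, b) and weakly_prefers(b, c))


def utility_from_relation(items, weakly_prefers):
    """Return a dict u such that u[a] >= u[b] exactly when a is weakly preferred to b.

    Assumption: items is finite. u[a] counts the alternatives that a is
    weakly preferred to; this is order-preserving only for a weak order,
    so the function refuses any other relation.
    """
    if not (is_complete(items, weakly_prefers)
            and is_transitive(items, weakly_prefers)):
        raise ValueError("relation is not complete and transitive")
    return {a: sum(1 for b in items if weakly_prefers(a, b)) for a in items}


# Hypothetical example: "x" and "y" are indifferent, both preferred to "z".
rank = {"x": 2, "y": 2, "z": 1}
u = utility_from_relation(list(rank), lambda a, b: rank[a] >= rank[b])
```

Any strictly increasing transformation of `u` represents the same ordering, which is why a function obtained in this way carries ordinal information only.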

In the present paper we distinguish between different expressions of preference, and in particular between preference judgments for outcomes and physical selections (choices). In some cases we distinguish preference relations based on different types of judgment. This is expressed by subscripts on the basic notation for preference, e.g. $a \succ_{C} b$ refers to a preference based on overt choices of $a$ over $b$, $a \succ_{P} b$ refers to one based on preference judgments in favour of $a$, $\succ_{R}$ applies when judgments are expressed as direct assignments of numbers to alternatives (ratings), $\succ_{Q}$ when judgments express ratios of value, etc.

In addition to additive utility under riskless choice, we discuss expected utility. To distinguish these two forms of multi-attribute utility we use the acronyms MAVT and MAUT. These are based on the terminology of [18], who use “value” and “utility” for the concepts we refer to as utility and expected utility.

THE UTILITY CONCEPT

Vicissitudes

Because of the tenuous nature of the concept, the history of utility has been characterised not only by exegesis and debate but by striking shifts and reversals. Its modern use dates from the late 19th century, when Jevons and others tried to explain economic exchange using the notion of satiety with increased consumption (leading to diminishing “final” or “marginal” utility) [14]. A notable example of reversal in perspective occurred when Pareto ‘deduced’ a notion of utility (called “ophelimity”) from indifference curves [22]. The latter had originally been introduced to clarify certain features of the hypothesized utilities, but now they were made to serve as the foundation for the concept. After another qualitative shift, utilities (more precisely, utility functions) were derived from binary relations on sets of alternatives (preference relations). In addition to developing a more adequate formulation of the concept in the form of an order-preserving function (enhanced conceptual soundness), this also served to anchor the concept empirically in observed sets of preferences.

Another shift occurred at the inception of MCDA, when the idea of marginal rates of substitution expressed by indifference curves was transferred from its original context of economic exchange and applied to individual decision making [18]. This, together with ideas of independence, enabled decision theorists to establish conditions that allowed utilities to be aggregated using addition. It also provided a means for determining constants of proportionality. Thus were established conditions for soundness-in-conception. At the same time, it is not obvious that this new application of the concept of marginal rates of substitution meets conditions of soundness-in-use. In its original form it could be justified by reference to averages over large numbers of economic agents; but in the individual case it assumes certain forms of comparative judgment, and the validity of this requires independent justification.

A related concept that has been central to the field of decision analysis since the mid-1940s is expected utility. In its original form, ‘expected utility’ was actually of the riskless type, but it was combined with probabilities in the form of mathematical expectations [4]. Starting with von Neumann and Morgenstern, a more radical version evolved in which lotteries themselves were ordered. With further development, which clarified the interpretation and led to a simplified derivation of this form of utility scale, this has come to be the dominant interpretation [11].

Expected utility offers another case study of the problem of soundness-in-use. Although the von Neumann-Morgenstern axioms are sometimes characterised as obvious or commonsensical, it is not obvious that they really serve, even abstractly, as an acceptable characterisation of human judgment. An example is the axiom of continuity, which involves infinite divisibility of the utility scale (and which is critical to the proof of an interval scale of expected utility). This problem inspired some authors to devise an approach to expected utility based on finite sets of alternatives [7].

Interpretations

If we are to generate utility scales or preference orderings in the prescriptive context, we need some interpretation of the basis for scaling or ordering. That is, since we know that people can make judgments (e.g. comparisons or rankings) in this arena with some degree of consistency, we must have some conception of the basis for these judgments. Unfortunately, it is not clear what people are actually doing in these circumstances. Unlike cases where judgments are assessments of stimulus properties (“Which of these two lights is brighter?”) or are based on external cues (e.g. “Given these scores on these tests, should the individual be considered maladjusted?”), preference or utility judgments do not have clear external correlates.

In fact, different authors have given different accounts of the basis for such judgments. Most commonly, utility has been associated with happiness or pleasure; this was the interpretation of Jevons, and of Bentham before him, and has continued to the present day (e.g. [15]). The major problem with this interpretation is that it appears to confuse the enjoyment of a commodity with the factors responsible for its selection (prior to any possible enjoyment), or with the basis of evaluation of alternatives. It is also difficult to account for the use of utilities in the prescriptive context when the agent considered is a business or administrative entity. In such cases, DMs evaluate alternatives in terms of benefit to a collective concern, rather than to themselves. (For an example, see [17], Chap. 12.) The latter consideration also poses problems for an alternative view of utility, as want or desire.

Another interpretation of utility is in terms of assessment of future benefit. An early expression of this can be found in Bentham’s definition, also quoted by Jevons (which is oddly at variance with their subsequent discussions): “By utility is meant that property in any object whereby it tends to produce benefit, pleasure, good or happiness or ... to prevent the happening of mischief, pain, evil or unhappiness to the party whose interest is considered” [3], p. 2. This interpretation can be extended to cases where the entity benefited is an organisation.

Interpretation is especially problematic in the case of expected utility. Here, one can either combine one of the above interpretations with a notion of psychological expectations (à la Bernoulli), or assume that people are scaling expectations (à la von Neumann and Morgenstern).

None of this brings into question the interpretation of utility functions as representations of certain kinds of preference orderings, since this follows directly from the representation theorems, cf. [21]. This is simply to say that, given a set of alternatives and a preference relation with the requisite properties, there exists an associated order-preserving function. However, under this circumscribed interpretation utility has no explanatory force. And, in particular, it does not give adequate empirical support for the construction of utility scales in the context of PDA.

Of course, these scales are utility functions in the formal sense. The important question concerns the empirical status of the preference relations implied by these functions, given that the latter are derived from certain kinds of human judgment. In the first place, it must be shown that these in situ judgments are consistent with the formal properties that underlie the theoretical constructs. It must also be shown that the scales have some sort of predictive power in regard to actual choices. Both requirements involve conditions that constitute soundness-in-use.

To clarify these points, consider an extreme example. We order a set of alternatives according to our fancy so that the ordering has the requisite formal properties. Then we put a gun to the DM’s head and ask him to accept this ordering. If he does, then we have obtained an ordering and a utility scale. Obviously, this is not a satisfactory method of generation in practice. (But it would be if deriving a utility function were a sufficient criterion for acceptance.) Nor does the utility scale obtained in this manner require an interpretation in terms of processes intrinsic to the DM, other than fear.

UTILITY SCALES AS CONSTRUCTIONS

Aggregated utility

In prescriptive decision making, utilities are derived for alternatives with several or many attributes. In order to handle such entities, strategies have been devised to decompose a complex alternative into a set of attributes, each with a set of values. Utilities (or simply orderings) are then established for each set of attribute values, and these subutilities are then combined in some fashion to obtain a scale or ordering across alternatives.

In some ways, the most extreme aggregation strategy is to derive additive utilities for alternatives that are based on linear combinations of subutility scale values, as in the MAVT/MAUT framework. Within this framework, additivity is justified by independence assumptions along with empirical or formal methods for determining scaling constants (determining marginal rates of substitution in MAVT, and solving simultaneous equations in MAUT [18]). If these assumptions can be met and the procedures can be carried out correctly, then one is justified in constructing a utility scale that is additive in character.

However, care is required to justify this manoeuvre. Addition does not have a direct psychological basis, since there is no evidence that psychological effects sum in this manner (cf. [29]). Instead, additive utility is essentially an artificial concept, which seems to represent an assessment of overall benefit rather than reflecting immediate pleasure or desire. Additive utility can, therefore, be said to epitomize the artificiality of derived preference representations.

The above conclusions are not refuted by the existence of linear additive models of attitude and judgment. Such models have gained wide currency as representations of diagnostic and predictive judgments based on multiple cues [16]. However, linear additive models are not inconsistent with averaging effects, which can be represented by the appropriate coefficients. In addition, linear models of judgment do not assume uncorrelated independent variables. At the same time, they do support the idea that addition is a natural form of combination for human judges, especially since there is little evidence that interaction terms improve the representations.
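The additive form discussed in this subsection can be written out as a short sketch. This is illustrative only: the attributes, subutility values and scaling constants below are invented for the example, and the function name `additive_utility` is hypothetical. In practice the constants would come from the elicitation procedures of [18], not from direct assignment.

```python
def additive_utility(alternative, subutilities, scaling_constants):
    """Weighted sum of single-attribute subutilities.

    alternative:       dict mapping attribute -> attribute value
    subutilities:      dict mapping attribute -> {value: subutility in [0, 1]}
    scaling_constants: dict mapping attribute -> non-negative constant
    Assumption: the independence conditions that justify addition hold.
    """
    return sum(scaling_constants[attr] * subutilities[attr][value]
               for attr, value in alternative.items())


# Hypothetical two-attribute problem.
subutilities = {
    "price": {"low": 1.0, "high": 0.0},
    "comfort": {"basic": 0.0, "plush": 1.0},
}
scaling_constants = {"price": 0.75, "comfort": 0.25}

cheap_basic = additive_utility({"price": "low", "comfort": "basic"},
                               subutilities, scaling_constants)
dear_plush = additive_utility({"price": "high", "comfort": "plush"},
                              subutilities, scaling_constants)
```

With these invented numbers the cheap, basic alternative scores higher, which simply reflects the larger constant placed on price; the code says nothing about whether a DM's judgments actually combine in this way.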

The nature of the artificiality

Clearly, the ‘artificiality’ of prescriptive utilities lies in the methods of combination. This reflects the fact that people cannot always handle combinations of attribute values in a coherent and reliable manner, whether because of limitations in working memory or because they have no ready means of balancing tradeoffs intuitively. Introducing even a second attribute in a decision task can produce effects such as intransitivity and the Allais paradox. This is also shown by the frequency of inconsistent choices, even with a limited number of attributes and attribute values, e.g. [19]. At the same time, it is reasonable to assume that there are a priori preferences holding among attributes and among the values associated with a single attribute. These are the necessary building blocks for the artificial construction.

CONDITIONS FOR SOUNDNESS-IN-USE

In this paper we are trying to distinguish a certain class of conditions that we call collectively “soundness-in-use”. This term is meant to reflect the fact that these conditions, or criteria, pertain to the application of a prescriptive model, above and beyond its strictly formal features. To date we have been able to delineate the following criteria:

1. Judgments or choices made in practice must be based on psychological processes whose features exhibit relations consistent with those specified by the preference model.
2. The methods must be consistent with certain principles of rationality.
3. The methods must support reliable, i.e. repeatable, judgments.

Although these criteria, and in particular criterion (1), can be characterised formally, whether or not they actually hold in practice is always an empirical question. And as we show, each of these criteria can support evaluation of specific methods of preference elicitation.

In addition, we mention another, less specific criterion: that the prescriptive method be consistent with a viable interpretation of utility. We have already discussed additive utility models, which are consistent with utility interpreted as future benefit, or as an assessment of fitness stemming from a decision. As a potentially problematic example we mention multiplicative utility models, since it is not clear whether this interpretation is as compelling in this case (while strictly psychological interpretations are even less compelling).

Criterion (1) in essence assumes mappings $\phi$ and $\psi$ between the entities and utilities, respectively, as defined by a preference model and the entities and utilities derived from human judgments or choices. $\phi$ is meant to express equivalence of relations:

$$a \; R_{M} \; b \quad \text{iff} \quad \phi(a) \; R_{W} \; \phi(b)$$

where $R_{M}$ is a relation between entities $a$ and $b$ in the model and $R_{W}$ is a relation between real-world entities $a'$ and $b'$ such that $\phi(a) = a'$ and $\phi(b) = b'$. $\psi$ is meant to express equivalence in operations, either between utilities in a model or between judgments of preference by DM. Thus, if there is an operation $\circ_{M}$ such that $u(a) = u(b) \circ_{M} u(c)$, then for the corresponding operation $\circ_{W}$ the following must hold:

$$\psi(u(a)) = \psi(u(b)) \circ_{W} \psi(u(c))$$

An alternative way of stating the last requirement is that the mapping $\psi$ must be a homomorphism.
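The homomorphism requirement can be checked numerically on a sample of utility values. The sketch below is an illustration under stated assumptions, not part of the original formulation: `is_homomorphism`, the sample values and the two candidate judged operations are all hypothetical. The model operation is taken to be addition, and the mapping rescales model utilities to a 0-10 judgment scale.

```python
import math
from itertools import product


def is_homomorphism(mapping, model_op, judged_op, samples, tol=1e-9):
    """Check mapping(x model_op y) == mapping(x) judged_op mapping(y) on all sample pairs."""
    return all(
        math.isclose(mapping(model_op(x, y)),
                     judged_op(mapping(x), mapping(y)),
                     abs_tol=tol)
        for x, y in product(samples, repeat=2)
    )


samples = [0.0, 0.25, 0.5, 1.0]          # hypothetical model utilities


def to_judgment_scale(u):                # assumed linear rescaling to 0-10
    return 10.0 * u


def add(x, y):
    return x + y


def average(x, y):
    return (x + y) / 2.0


additive_judge_ok = is_homomorphism(to_judgment_scale, add, add, samples)
averaging_judge_ok = is_homomorphism(to_judgment_scale, add, average, samples)
```

If the judged operation is addition, the condition holds on every sample pair; if the judge combines values by averaging, it fails, which is the kind of discrepancy between model and judgment that criterion (1) is meant to expose.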

Criterion (1) is closely related to the procedure invariance principle, cf. [30]. However, because we are dealing with the prescriptive context, we are most concerned with discrepancies between judgments and models as opposed to discrepancies among judgments.

FORMAL CONSTRUCTS VS. PSYCHOLOGICAL PROCESSES

In this section, we consider several forms of judgment that have been described in the literature on decision analysis, to evaluate their conformance with criterion (1) above. In doing this, we draw on evidence from “psychophysics”, a field concerned with judgments of quantitative features of perceptions and cognitions, such as perceived brightness and loudness, stimulus duration, and strength of attitudes [12]. Practical utility assessment, therefore, also falls within the purview of psychophysics.

Violations of criterion (1) can take two distinct forms. The most serious is when the model in question requires judgments whose characteristics are not commensurate with human capacities. In the less serious case, certain types of judgment have features that make them inappropriate in practice, but since the model does not require judgments in this form, others can be substituted.

Riskless choice methods

Ratio judgments. A number of utility models involve assessments of ratios between quantities. In some cases, DMs are explicitly asked to make ratio judgments. The most prominent examples are SMART (a simplified MAVT-based model [8]) and AHP (a model in which both weights on attribute values and subutility scales are represented by eigenvectors [25]). In SMART, DMs make ratio judgments meant to represent relative importance of attributes. In AHP, values on a discrete 9-point scale represent ratios of either importance or value [25]. In MAVT itself, although the relation between marginal rates of substitution is ratio in conception, the judgment required is one of indifference between two attribute values.

Now, it has been shown in an entirely convincing manner that subjective judgments of ratios in sensory/perceptual dimensions involve the same mental operations as judgments of difference, and that this mental operation is best described as a subtractive process, i.e. a judgment of difference [5] [13]. This means that direct ratio judgments do not conform to the original models, which can only be realised in practice by genuine ratio judgments. In other words, they violate the equivalence conditions under the mapping $\phi$ of criterion (1), which relates relations in the model to relations among judgments.

As noted above, this criticism does not apply to the basic MAVT procedure. In other words, here $\phi$ supports the necessary equivalence. In this respect, therefore, MAVT meets conditions for soundness-in-use as well as those for soundness-in-conception.
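To make concrete what a ratio-based model asks of the DM, the following sketch derives AHP-style priorities from a matrix of pairwise ratio judgments by power iteration towards the principal eigenvector. It is a simplified illustration, not the full procedure of [25]: the function name `ahp_priorities` and the judgment matrix are hypothetical, and the matrix is assumed to be positive and reciprocal.

```python
def ahp_priorities(matrix, iterations=100):
    """Approximate the normalised principal eigenvector of a positive matrix.

    matrix[i][j] is the judged ratio of item i to item j (assumed > 0).
    Returns weights that sum to 1.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n))
                   for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    return weights


# Hypothetical, perfectly consistent judgments: A is twice B, B is twice C.
judgments = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
priorities = ahp_priorities(judgments)
```

For this consistent matrix the priorities are proportional to 4 : 2 : 1. The computation treats each entry as a genuine ratio; if the DM's responses are in fact produced by a subtractive process, as the evidence cited above suggests, the resulting weights do not carry the meaning the model assigns to them.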

Ratings. It is not uncommon for simple ratings of attributes or alternatives to be used to generate utility scales. Especially in the context of additive utility, this is a highly suspect procedure. In addition to the well-known range effects, ratings may also be affected by the human propensity to combine stimulus dimensions by averaging. As a result, it is possible for people to rate a composite of positively rated values for two or more attributes lower than a single attribute value alone [1] [29]. This is because people rate single attributes and collections of attributes on the same scale. But this flatly contradicts the assumption of additivity. In other words, the preference relation based on ratings violates the homomorphic properties of $\psi$, the mapping in criterion (1) that relates operations on utilities in the model to operations on judgments.
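The conflict between averaging and additivity can be seen with two numbers. The sketch below uses invented ratings on an arbitrary scale and hypothetical function names; it is meant only to show the direction of the discrepancy, not to model any particular data set from [1] or [29].

```python
def additive_composite(ratings):
    """Composite value predicted by an additive model."""
    return sum(ratings)


def averaged_composite(ratings):
    """Composite value if the judge averages the component ratings."""
    return sum(ratings) / len(ratings)


strong_alone = [8]        # hypothetical rating of one attribute value alone
strong_plus_mild = [8, 4] # same value combined with a mildly positive one

additive_gain = additive_composite(strong_plus_mild) - additive_composite(strong_alone)
averaging_gain = averaged_composite(strong_plus_mild) - averaged_composite(strong_alone)
```

Under addition, adding a positively rated value raises the composite (a gain of 4 here); under averaging, the same addition lowers it (a change of -2). A rating procedure that behaves in the second way cannot supply the inputs an additive utility function requires.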

In this case, the questionable soundness of this method does not imply that models used in association with this method are fundamentally unsound. The implication is only that direct ratings should not be used to scale utilities or to form additive utility functions. The problematic character of this method of judgment may account for some of the discrepancies found in comparative studies of MCDA techniques, e.g. [20].

Risky choice methods

When we consider utility models that combine probability and value, issues related to the interpretation of utility and to soundness-in-use are even more significant. In this context, it is worth noting that it is not clear that models which involve expectations are descriptively valid. This is brought out very clearly by a study of preference judgments based on duplex gambles involving both gains and losses [26]. In this situation, a linear regression model, in which probabilities and payoffs are treated as separate elements and each given an appropriate weight, gave a much better account of preferences for gambles (in the form of bids or ratings) than did models involving expectations.

At the same time, the above considerations do not in any way rule out the use of expected utility models in a prescriptive context. In such cases, all that is necessary is that expected utility models can be mapped to human judgments in a way that supports soundness-in-use. (This example, incidentally, shows how vital it is to distinguish soundness-in-use from descriptive validity.)

Derivation of utility scales within an expected utility framework such as MAUT requires probability judgments, usually in the form of “certainty equivalents”, where the DM compares a 50:50 gamble involving the best and worst alternatives with a single alternative, assumed to occur with probability 1. (Probabilities can take on other values, but in practice equal values for $p$ and $1 - p$ are normally used; among other things, this avoids discrepancies between subjective and objective probabilities, assuming that probabilities sum to 1.)
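The arithmetic behind the certainty-equivalent judgment can be sketched as follows. This is an illustration under the usual convention, assumed here, that the worst and best alternatives are anchored at utilities 0 and 1; the function name `expected_utility` and the outcome labels are hypothetical.

```python
def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, outcome) pairs."""
    total_probability = sum(p for p, _ in lottery)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * utility[outcome] for p, outcome in lottery)


# Assumed anchoring of the scale at the extreme alternatives.
utility = {"worst": 0.0, "best": 1.0}

# The 50:50 reference gamble described in the text.
gamble = [(0.5, "best"), (0.5, "worst")]

# If the DM judges some sure alternative to be indifferent to the gamble,
# that alternative is assigned the gamble's expected utility.
utility["certainty_equivalent"] = expected_utility(gamble, utility)
```

The alternative judged indifferent to the gamble receives utility 0.5, and repeating the step with gambles between already scaled alternatives fills in further points. The calculation is trivial; the soundness-in-use question is whether the DM's indifference judgment actually has the properties this assignment presupposes.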