Note: This is the submitted version of a paper that appears in the Journal of Multi-Criteria Decision Analysis: http://onlinelibrary.wiley.com/doi/10.1002/mcda.1568/full

Deflecting Arrow by Aggregating Rankings of Multiple Correlated Criteria According to Borda

Neal D. Hulkower*
208 SW Eckman St, McMinnville, OR 97128
Phone: (503) 857 0628
[email protected]

and

John Neatrour
PO Box 248, Hume, VA 22639
Phone: (540) 364 2163
[email protected]

* Corresponding author
Abstract

This paper has two objectives. The first is to review and address concerns raised by Hazelrigg that Arrow's Impossibility Theorem prevents the selection of rational aggregation methods for use in engineering trade studies. In addressing these concerns, the work of Saari is cited to establish the fact that the Borda Count is the only "non-dictatorial" positional voting method that satisfies the criteria for a rational decision procedure while using complete information. Hence the resulting rank ordering of the alternatives is the most reliable outcome. Several previous studies that use other aggregation methods are critiqued and Borda is applied to examples to illustrate the differences in the outcomes. The second objective is to extend the applicability of Borda to include attributes such as cost, schedule duration, and certain technical and performance measures that are generally more reasonably described as correlated random variables. Exact Probabilities by Simulation with Borda, a method introduced by Hulkower that improves a technique by Book for determining which candidate in a trade study is the probable lowest-cost alternative, is generalized to include multiple correlated criteria each of which is expressed as a random variable and thus incorporates probabilistic uncertainty.

Keywords: Arrow's Impossibility Theorem, Borda Count, Multicriteria Decision Making, Trade Studies
1 Introduction

Regardless of what it is otherwise called, a trade study comprises five elements: a set of alternatives, assessment criteria, the assessments against each criterion for each alternative, a method for aggregating the assessments, and a decision-making authority. While recognizing the importance and challenges associated with selecting the first two elements, determining the third, and the need to identify who will ultimately make the final selection and under what constraints, this paper focuses on the method of aggregation. We emphasize that Saari's work on the characteristics of the Borda Count (BC) (Borda, 1781) has refuted concerns raised by Arrow's Impossibility Theorem. We then extend a method called Exact Probabilities by Simulation with Borda (EPbSwB), introduced in (Hulkower, 2010), to correlated criteria expressed as random variables, thereby broadening the applicability of Borda to multiple criteria of all possible types, including those that incorporate probabilistic uncertainty, for evaluating alternatives. This extension is illustrated with a concrete hypothetical example.

Hazelrigg (1996) first suggested that Arrow's Impossibility Theorem (Arrow, 1963) prevented the selection of a rational aggregation method. In particular, he cautioned that "any methodology that demands the construction of a group utility function in any aspect of its construction is logically inconsistent and doomed to failure" (Hazelrigg, 1996, p. 162). Four years later, Saari (2000a, 2000b) identified an inconsistency in Arrow's properties. This inconsistency suggested a modification of Arrow's properties which enabled him to prove that the BC was the sole aggregation method that satisfied a consistent set of properties. Nevertheless, subsequent work has not taken Saari's result into account.
Despite the extensive work of Saari (2000a, 2000b, 2001, 2008, 2010, 2015, 2016) and Saari and Sieberg (2004), much of the engineering design community has not yet recognized the unique advantages of the BC in overcoming the obstacles identified by Hazelrigg (see, for example, (Li and Yang, 2004; See and Lewis, 2004; Franssen, 2005; Bernroider and Mitlöhner, 2007; Jacobs et al., 2014)). Notable exceptions include (Dym et al., 2002; Griffin, 2010).

1.1 Borda as Dictator Deposer and Minimizer of Inconsistencies

Arrow's Impossibility Theorem asserts that it is impossible to have a voting scheme that satisfies the following five properties: complete, transitive outcomes (transitivity means that if A is preferred to B, notated as A ≻ B, and B ≻ C, then A ≻ C; complete means that each voter will rank each pair), unrestricted domain (no possible ordering is excluded), the Pareto condition (if all voters rank A ≻ B, then the social outcome ranks A ≻ B), Independence of Irrelevant Alternatives (IIA) (the ranking for each pair depends strictly on how each voter ranks that pair), and no dictator (the social outcome is not determined by a single individual's ranking of preferences without regard to the rankings of others). In other words, the only voting method to satisfy the first four criteria is a dictatorship. Given n candidates ranked by each voter according to his or her preference with ties permitted, the BC assigns n-1 points to the top-rated alternative, n-2 to the next, and so forth down to 0 points for the lowest-ranked alternative. Alternatives that are tied each receive the average of the total points associated with the positions the group occupies. The points each candidate receives are summed to obtain the overall ranking. With one important modification to IIA, Saari (2000a) demonstrated that the only non-dictatorial positional voting scheme that also
satisfies transitivity, unrestricted domain and the Pareto condition is the BC. Noting that insistence on IIA is inconsistent with a transitive outcome (Saari, 2000a, Section 8), he offered an alternative called the Intensity form of Independence of Irrelevant Alternatives (IIIA) that considers not only the relative position of two candidates but also some measure of the strength of preference. In other words, to depose Arrow's dictator one needs a voting scheme that, when determining the (A, B) outcome, differentiates between A ≻ B ≻ C ≻ D and A ≻ C ≻ D ≻ B. The BC does so using the number of candidates separating the two being considered as the measure of intensity. Saari (2000a) also proved that the BC is the only positional voting scheme whose sole source of inconsistencies over subsets of candidates is "Condorcet n-tuples," or cycles of rankings that result in a tie. For example, a Condorcet triple is A ≻ B ≻ C, B ≻ C ≻ A, and C ≻ A ≻ B. Any positional voting scheme aggregating this set of rankings will yield a tie. If one of the candidates is removed or another is added, however, the tie is broken. If a collection of rankings, known as a profile, contains Condorcet n-tuples, it can result in inconsistencies of this type.

1.2 Organization of the Paper

The next section reviews some of the extensive literature on multicriteria decision-making, focusing on those papers that address aggregation methods, and illustrates the different outcomes when the BC is used instead. The third section describes our method that extends EPbSwB to multiple criteria that are expressed as correlated random variables. Examples applying this method are found in the fourth section. The fifth section contains a discussion and the final section, a summary.
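Before turning to the literature, the BC scoring rule described in section 1.1 (n-1 points for first place down to 0, with tied groups splitting the points of the positions they occupy) can be sketched in a few lines of Python. The profile below is the Condorcet triple from section 1.1, which any positional rule must score as a tie; the data structures and names are ours, purely for illustration.

```python
def borda_scores(profile, candidates):
    """Aggregate a profile of rankings with the Borda Count.

    Each ranking is a list of sets, best first; candidates in the same
    set are tied.  With n candidates, first place earns n-1 points, last
    earns 0, and a tied group splits the points of its positions evenly.
    """
    n = len(candidates)
    totals = {c: 0.0 for c in candidates}
    for ranking in profile:
        position = 0
        for tied_group in ranking:
            k = len(tied_group)
            # average of the points for the k positions this group occupies
            pts = sum(n - 1 - p for p in range(position, position + k)) / k
            for c in tied_group:
                totals[c] += pts
            position += k
    return totals

# The Condorcet triple A>B>C, B>C>A, C>A>B: every candidate scores the same.
profile = [[{"A"}, {"B"}, {"C"}],
           [{"B"}, {"C"}, {"A"}],
           [{"C"}, {"A"}, {"B"}]]
print(borda_scores(profile, ["A", "B", "C"]))  # {'A': 3.0, 'B': 3.0, 'C': 3.0}
```

As noted later in the paper (Saari, 2001), weighting a criterion or an expert can be achieved simply by appending that ranking to the profile more than once.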
2 Review of Selected Literature
In this section we offer a critique of selections from the literature since 1996 that discuss the applicability of social choice theory, specifically Arrow's Impossibility Theorem and the BC, for the purpose of putting our work in context. In several instances, we redo examples using the BC to show how different outcomes can result.

In a paper that predates Saari's work, Lansdowne and Woodward (1996) apply the BC to determine the top maintenance driver for the Airborne Warning and Control System (AWACS) based on actual data for seven criteria. They cite the ease of use and Borda's ability to handle data only sufficiently precise to allow an ordinal ranking.

Scott and Antonsson (1999) contest the relevance of Arrow's Impossibility Theorem to engineering design by asserting that the latter is not in the realm of social choice. "Decision with multiple criteria and a single, idealized decision maker is one problem in decision theory; social choice, in which each idealized individual's sovereignty must be guaranteed, is another" (Scott and Antonsson, 1999, p. 227). First, it is important to note that neither Arrow's nor Saari's theorems require any such guarantee of sovereignty; they are strictly mathematical results. Nowhere in either's work is a sovereignty property required. If the alternative designs are ranked by each criterion rather than by the individuals participating in the design study, the problem is unavoidably recast in the realm of social choice since the mathematics is completely agnostic as
to who or what is determining the rankings. Moreover, if one is to claim in this case that the mathematics of social choice, in particular, Saari’s demonstration of Borda’s unique applicability, is irrelevant, one must show that the assumptions do not hold. In fact, just the opposite is true. The authors incorrectly contend that “A numerical scale cannot work in the social choice problem. Because interpersonal comparison is not allowed, the scale can only be assigned arbitrarily. While a particular scale may appear to address an inconsistency for a particular problem, it is merely a matter of arithmetic to recast the problem so that it is directly subject to Arrow’s Theorem” (Scott and Antonsson, 1999, pp. 222-223). As noted above, the BC assigns a numerical scale to the rankings which can serve as a strength of preference without giving any single voter a disproportional impact on the outcome. Furthermore, arbitrary reassignment of scale is not relevant to the problem and is artificial. Only the effects of order preserving scale reassignments need be considered in optimization. In the examples in section 4, we will illustrate that ordinal samples can approximate a cardinal scale. Dym et al. (2002) provide a means of performing pairwise comparisons of alternatives and aggregating results in a manner that is equivalent to the BC. The evaluations are made by individuals and collected in a pairwise comparison chart (PCC) which keeps track not only of the number of times each of the pair wins but also the number of times it loses thereby preserving all information. The authors question the relevance of a social choice framework since “[t]rue collaboration takes place when team members must reach a consensus on each comparison” (Dym et al., 2002, p. 242). In particular, they acknowledge that although BC violates IIA, it may not matter in an engineering design environment. 
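The equivalence that Dym et al. note between the PCC and the BC can be checked directly: for strict rankings, a candidate's Borda score equals its total number of pairwise wins across all voters and all opponents, which is exactly what a win/loss tally of pairwise comparisons preserves. A small Python sketch (the profile is illustrative, not drawn from any of the cited studies):

```python
def pairwise_wins(profile):
    """Tally pairwise wins from a profile of strict rankings (best first)."""
    wins = {}
    for ranking in profile:
        for i, a in enumerate(ranking):
            for b in ranking[i + 1:]:
                wins[(a, b)] = wins.get((a, b), 0) + 1  # a beats b on this ballot
    return wins

def borda_from_pcc(profile, candidates):
    """Borda score of each candidate = its total pairwise wins over all voters."""
    wins = pairwise_wins(profile)
    return {c: sum(wins.get((c, d), 0) for d in candidates if d != c)
            for c in candidates}

profile = [["A", "B", "C", "D"], ["B", "A", "D", "C"], ["A", "C", "B", "D"]]
print(borda_from_pcc(profile, ["A", "B", "C", "D"]))
# {'A': 8, 'B': 6, 'C': 3, 'D': 1} -- identical to direct Borda points (3,2,1,0)
```

This is why dropping the loss column of a pairwise tally, or comparing only winners, discards information the BC would have used.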
Saari and Sieberg (2004) drive home the point that critical information is lost when comparing alternatives in pairs or in larger parts and incorrect decisions can result. The authors also demonstrate the correct way to conduct pairwise comparisons that is identical to using the PCC and hence is equivalent to the BC. In their paper, the voters are criteria and not individuals, implicitly assuming the relevance of a method from social choice theory.

Addressing the issue of direct conflict of preferences of decision-makers, See and Lewis (2004) introduce the Group Hypothetical Equivalents and Inequivalents Method (G-HEIM). The BC is considered as a means of aggregating individual preferences but rejected because it violates IIA and does not allow individual strength of preference to be considered. As noted above, insistence on IIA can negate the transitivity of the social outcome, a concern that the authors recognize when rejecting pairwise comparisons but seem unaware of with regard to this property. Allowing individual decision-makers to incorporate strength of preference introduces distortions that can result in a violation of "one voter, one vote." Instead, "G-HEIM aggregates preferences from each individual without requiring precise attribute weights from each group member" (See and Lewis, 2004, p. 4). The attribute weights are a byproduct of the computationally intensive method. G-HEIM is applied to the problem of selecting from among eight compact car designs based on five attributes: engine size, horsepower, mileage, price and acceleration from 0 to 60 miles per hour (See and Lewis, 2004, Table 3). The seven-step method incorporating nonlinear strength of preference of individuals yields a ranking of Car #1 ≻ Car #3 ≻ Car #7 ≻ Car #5 ≻ Car #4 ≻ Car #8 ≻ Car #6 ≻ Car #2. We redid the analysis treating the five attributes instead of the individual decision-makers as the voters with biggest engine size, greatest horsepower, highest
mileage, lowest price and shortest acceleration time most preferred. The BC yields an overall ranking of Car #1 ≻ Car #3 ≻ Car #7 ≻ Car #8 ≻ Car #4 ≻ Car #5 ≻ Car #6 ≻ Car #2. The positions of Car #5 and Car #8 are reversed; otherwise the outcomes are the same.

Li and Yang (2004) address the problem of imprecision in the assessments of attributes by invoking fuzzy numbers, specifically triangular fuzzy numbers. Interestingly, triangular distributions are frequently used in cost risk and schedule risk analysis as reasonable approximations to lognormal distributions (Scherer et al., 2003) which are believed to better represent a range of possible cost and schedule duration. The linear programming technique for multidimensional analysis of preference (LINMAP) method is extended "for solving multiattribute group decision making problems in a fuzzy environment" (Li and Yang, 2004, p. 264). LINMAP relies on pairwise comparisons, an approach known to ignore critical information (Saari and Sieberg, 2004; Dym et al., 2002). The extended method produces a group weight vector and a group fuzzy ideal solution. For each decision-maker, the rank ordering of alternatives can be derived using as a metric the square of the weighted Euclidean distance between the group fuzzy ideal solution and the triangular fuzzy number vector representing the decision-maker's ratings of each attribute for each alternative. The smaller the distance is, the higher the ranking. The rankings of all the decision-makers are then aggregated using a social choice function such as the BC. The LINMAP method is applied to the selection of an extended air-fighter from among four choices, Ai, i = 1 to 4, based on six attributes, four of which are quantified (maximum speed, cruise radius, maximum loading, and price) and two of which are not (reliability and maintenance). The former are expressed as normalized triangular variables with each value in the triple the same.
The latter are assigned linguistic variables (very good, good, fair, poor, very poor) by each of three decision-makers, which are translated into triangular fuzzy numbers and normalized. Using the distance metrics to obtain rankings of the four aircraft for each of the decision-makers and aggregating with the BC, the overall ranking is A1 ≻ A3 ≻ A4 ≻ A2. We redid the analysis treating the six attributes as the voters, with fastest maximum speed, largest maximum load and lowest price most preferred. We were also able to rank the alternatives for each of the two unquantified attributes, reliability and maintenance, by using the ratings given by each of the three decision-makers and aggregating them using the BC. The resulting overall ranking of the four alternatives based on the six attributes is A3 ≻ A1 ≈ A4 ≻ A2 where ≈ indicates a tie. In this case, the top ranked alternative differs.

Franssen (2005) continued to argue that Arrow's Impossibility Theorem represents a serious problem for multicriteria decision analysis. As in (Dym et al., 2002), the author discusses BC in the context of its violation of IIA. He contends that "Arrow restricted the information that individuals bring to the collective decision process to ordinal preferences. He judged the information on intensity of their preferences that might be elicited from people as too biased and untrustworthy to be allowed to enter the collective decision-making procedure" (Franssen, 2005, p. 47).

Although Bernroider and Mitlöhner (2007) cite (Saari, 2001), they seem not to appreciate the BC's unique characteristics and reopen consideration of a dozen methods for rank aggregation including many that ignore some of the information available. Instead of a rigorous approach,
the authors use a nearest neighbor tree to compare the outcomes. They conclude that their “findings are consistent with other results in identifying Borda and Copeland as a core set of voting rules, with the remaining rules being placed apart in the neighbor tree.” Unfortunately, they “suggest to chose [sic] rules which show large distances in order to provide for a wide range of the possibly different rankings” (Bernroider and Mitlöhner, 2007, p. 173). Katsikopoulos (2009) casts the discussion of how best to approach engineering design as a dichotomy between coherence and correspondence. “A difference between coherence and correspondence is that in coherence the criterion is internal (logical consistency) while in correspondence the criterion is external (success in the real world). While criteria of logic are essentially domain-independent, criteria of correspondence depend on the decision problem” (Katsikopoulos, 2009, p. 149). The author goes on to provide two counterexamples to the claim that achieving coherence always implies achieving correspondence. A coherent approach implies objective measures of goodness such as adherence to a set of rational criteria whereas “correspondence is difficult to assess in design decision-making” (Katsikopoulos, 2009, p. 151). As does Hazelrigg (2010), we argue that while a flawed method may yield a successful design by some measure, there is no guarantee that it always will. “When a mathematician offers a method for general use, the method is proven over its domain” (Hazelrigg, 2010, p. 144). Reliance on ad hoc methods and heuristics to achieve practical useful outcomes is certainly understandable in the absence of rigorous theory. 
In the world of cost analysis, for example, we regularly rely on cost estimating relationships (CERs) and schedule estimating relationships (SERs) whose forms are not based on any rigorous theory but whose outputs provide results with some claim to objectivity and basis in reality; they summarize past experience. In the case of the BC, we have a rigorously established social choice method that uniquely overcomes the constraints of Arrow's Impossibility Theorem and is applicable across a broad range of decision-making including that done as part of engineering design.

Frey et al. (2009) advocate the Pugh Controlled Convergence (PuCC) method and attack concerns raised in the literature about decision-making in engineering including those in (Saari and Sieberg, 2004). This paper prompted a rebuttal (Hazelrigg, 2010), a rebuttal to the rebuttal (Frey et al., 2010), and an editorial by the editor of Research in Engineering Design (Reich, 2010). Despite the assertion that PuCC does not involve voting, it generally does include summary scores for each alternative based on +, -, or S assigned to each criterion. We agree with Hazelrigg (2010) that this is tantamount to voting and hence is subject to the relevant theory. Given the insistence that these summary scores are not used to select a single best alternative (Frey et al., 2009) or to inform a decision about the retention of a candidate (Frey et al., 2010), one must wonder what useful purpose they serve. We also share Hazelrigg's concern that PuCC lacks "a rigorous mathematical derivation proven to work in all cases," a point not addressed in (Frey et al., 2010). We also note that many criteria are quantifiable as either fixed numbers or distributions yet PuCC prefers the subjective judgment of the design team. Only when consensus cannot be achieved are additional data collected to better inform the holdouts.
We certainly agree that "[i]f an engineer is faced with a solid body of evidence showing the superiority of one alternative over another,…they [sic] must either conform to the evidence or else their view is irrelevant to rational decision making" (Frey et al., 2009, p. 57). But instead of going through the consensus process at the core of PuCC, we advocate using objective, quantifiable criteria when available and appropriate, and only relying on expert opinion
expressed as rank ordered preferences for unquantifiable criteria. As is illustrated in (Hulkower, 2010) and in the next section, the BC is particularly appropriate for combining the rankings based on each type of criterion. Additionally, by including the ranking of a particular criterion or a particular expert more than once, weighting can be achieved without losing the benefits of BC (Saari, 2001). Saari (2010) offers a cautionary tale in the form of a “generalized inconsistency theorem” about the potential pitfalls associated with aggregation techniques in wide use in the engineering design community as well as other areas. The theorem, which is motivated by Arrow’s result, “carries the disturbing message that, even should all participants strictly adhere to compatibility conditions, the divide-and-conquer process of determining a system outcome by seeking ‘excellence’ (or at least seeking a reasonable entry for each component) can cause inefficiencies and/or incompatibility outcomes” (Saari, 2010, p. 081006-5). While he does not offer a remedy analogous to the BC which overcomes Arrow’s Impossibility Theorem, he adopts the analogy of the Rubik’s Cube and asserts that “[r]ather than solving the problem as a collection of parts, the approach must coordinate interactions among the different faces” (Saari, 2010, p. 081006-7). He cites the method for aggregating pairwise comparisons in (Saari and Sieberg, 2004) as an example of coordinated interaction. Griffin highlighted the benefits of the BC as an engineering decision tool noting its advantages over pairwise comparisons (Griffin, 2010). Ignoring the BC and therefore still troubled by the potential impact of Arrow’s Impossibility Theorem, Jacobs et al. (2014) recast the latter “into an Arrowian type of impossibility theorem for performance aggregation in engineering design.” (Jacobs et al., 2014, p. 
6) Among the five conditions adapted is independence of irrelevant concepts, the troublesome property that can prevent the social outcome from being transitive. In a simple example comparing three concepts against four criteria, the disparity between the outcome using a weighted sum method with a five-point performance scale and a ranking scale aggregated with the BC, though not acknowledged as such, is attributed to violation of IIA. While it is true that the BC violates IIA, it ensures transitivity of the social outcome and, with the substitution of IIIA, avoids the difficulties of Arrow. The paper goes on to highlight other issues, including design uncertainty, that Arrow's theorem can get in the way of resolving. The method described in the next section offers an example of a mathematically sound technique for handling not only criteria that are quantified by fixed numbers or those that are unquantifiable and are ranked simply by preference but also those that are represented by distributions and correlated with each other. In doing so, it extends the applicability of the BC to the engineering design process in which the decision making is based on criteria of any type.
3 Exact Probabilities for Multiple Correlated Criteria Expressed as Random Variables by Simulation with Borda
Book (2010) introduced a method called Exact Probabilities by Simulation (EPbS) to address the problem of comparing cost distributions. He describes it as follows: "...we draw 10,000 (or any number of) independent random (Monte Carlo or Latin Hypercube) samples from each of these distributions to simulate the costs of each of the candidates. Each random number drawn
represents a possible value of the respective candidate's cost, determined in accordance with that candidate's cost probability distribution, mean, and standard deviation" (Book, 2010, p. 55). At each iteration, only the lowest cost alternative is recorded. The alternative that is the lowest cost the most times is declared the probable lowest cost candidate. Hulkower (2010) noted that while this method does compare all of the cost distributions at each step, it fails to use all of the information generated. In particular, what is ignored is the ranking of the remaining alternatives. EPbSwB was introduced not only to identify the lowest cost alternative but also to capture the position of all alternatives at each iteration then to aggregate the rankings using the BC to yield an overall ranking of candidates from probable lowest to probable highest cost. Especially during concept exploration and early design phases, many attributes of systems, not just cost, are more reasonably characterized as random variables. These include schedule duration, physical attributes such as weight, and performance characteristics such as probability of kill. EPbSwB extends naturally to any number of criteria expressed as random variables. Since there is more than one criterion, however, correlation between them must be considered and accommodated. For example, it is clear that cost and schedule duration are correlated. Similarly, cost, especially when generated by a CER, is correlated with the cost driver(s) used as independent variables in the formula, many of which are also random variables. The same applies to schedules based on SERs. Prior to implementing this method, the interdependencies must be quantified. This is not a trivial task. If it is reasonable to expect linear dependence, correlation coefficients can be used. Guidance for determining these can be found in (Covert, 2006).
Once the correlation coefficients are determined, one proceeds as Book describes with Monte Carlo or Latin Hypercube sampling of each of the distributions representing a single criterion. At each iteration, rank order the alternatives by each criterion from the most to the least desirable. For example, an alternative with the shortest schedule duration would be ranked the highest. The one with the highest probability of kill would be ranked in first place. As in EPbSwB, use the BC to aggregate the rankings at each iteration and determine a single ranking of alternatives for each criterion. Use Borda again on these rankings and any others based on additional criteria to arrive at an overall ranking of the alternatives. Figure 1 illustrates this process.

In cases where several variables are related by a nonlinear relationship, the problem is more complex. For example, compressing a software development schedule can result in increased cost of development. For nonlinear relationships a single parameter (such as a linear correlation coefficient) cannot describe the interrelationship among variables. A nonlinear relationship requires at least two degrees of modeling freedom to capture the behavior. The example of compressed software development schedules can be shown to require at least three parameters. In order to model nonlinear dependencies and criteria one must resort to copula methods (Nelson, 2006; Trivedi and Zimmer, 2005). For our purposes both linear correlations and nonlinear relations can be generalized to be handled by copulas. Linear correlation models have corresponding single parameter copulas. The appropriate values for single parameter copulas are set the same way: regardless of whether the copula is Gaussian, Clayton, Gumbel, etc., the parameter is set to mimic the linear correlation behavior. Nonlinear relations require more general copulas (Neatrour, 2010) and can give rise to more interestingly shaped clouds of
points representing possible outcomes. Such copulas are commonly called empirical since they can be specified via look-up tables.
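One standard way to realize the single-parameter (Gaussian) copula coupling described above is to draw correlated standard normals, map them to uniforms through the normal CDF, and push the uniforms through each criterion's marginal inverse CDF. The sketch below does this for two criteria of a single alternative; the marginal distributions, their parameters, and the 0.6 correlation are illustrative assumptions of ours, not values from the paper's examples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumed coupling between the two criteria (Gaussian copula parameter
# chosen to mimic a linear correlation of 0.6).
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
L = np.linalg.cholesky(corr)

z = rng.standard_normal((10_000, 2)) @ L.T   # correlated standard normals
u = stats.norm.cdf(z)                        # copula: uniforms on (0, 1)

# Hypothetical marginals: cost ~ lognormal, schedule ~ triangular.
cost = stats.lognorm.ppf(u[:, 0], s=0.3, scale=100.0)          # FY10M$
schedule = stats.triang.ppf(u[:, 1], c=0.5, loc=24, scale=24)  # months

print(np.corrcoef(cost, schedule)[0, 1])  # Pearson correlation, roughly 0.6
```

Because the uniforms pass through monotone inverse CDFs, the rank (Spearman) dependence of the draws is preserved exactly, while the Pearson correlation is only approximately reproduced after the nonlinear marginal transforms.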
Figure 1. Process flow for method of incorporating criteria characterized as random variables in decision making

As in EPbS and EPbSwB, an adequate number of random samples, at least 10,000 to ensure no more than 1% error, can be drawn from the copula to simulate the values of the variables. At each iteration, a composite value can be determined, for example, cost per percent kill, which can be used to rank order the alternatives. Proceeding as in EPbSwB, the BC is used to aggregate
the rankings to arrive at an overall outcome for those variables. This composite ranking can be readily combined with those of other criteria to arrive at an overall ranking of alternatives reflecting the contributions of all.
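The per-iteration ranking and two-stage Borda aggregation described in this section might be sketched as follows. The alternatives, marginal distributions, and parameters are hypothetical; sampling is shown uncorrelated for brevity (the copula draw above would slot in unchanged), and ties are ignored since they occur with probability zero for continuous draws.

```python
import numpy as np

rng = np.random.default_rng(0)
alts = ["A", "B", "C", "D"]
n, iters = len(alts), 10_000

def borda_totals(rank_matrix):
    """rank_matrix[i, j] = rank (0 = best) of alternative j at iteration i.
    Returns Borda points summed over iterations: n-1 for best ... 0 for worst."""
    return ((n - 1) - rank_matrix).sum(axis=0)

# Illustrative marginals: lower cost is better, higher performance is better.
cost = rng.lognormal(mean=np.log([100, 90, 80, 70]),
                     sigma=[0.2, 0.3, 0.45, 0.6], size=(iters, n))
perf = rng.normal(loc=[0.80, 0.85, 0.75, 0.90], scale=0.05, size=(iters, n))

# Rank at each iteration: argsort twice yields ranks; negate where bigger is better.
cost_ranks = cost.argsort(axis=1).argsort(axis=1)      # 0 = lowest cost
perf_ranks = (-perf).argsort(axis=1).argsort(axis=1)   # 0 = highest performance

per_criterion = [borda_totals(cost_ranks), borda_totals(perf_ranks)]

# Second Borda pass: turn each criterion's totals into a ranking, then aggregate.
overall = np.zeros(n)
for totals in per_criterion:
    ranks = (-totals).argsort().argsort()              # 0 = best under this criterion
    overall += (n - 1) - ranks
order = [alts[j] for j in (-overall).argsort()]
print("overall ranking:", order)
```

A composite criterion such as cost per percent kill would simply be computed from the sampled arrays at each iteration and ranked the same way before the second Borda pass.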
4 Examples
To illustrate the approach recommended in this paper, this section presents two examples, each built on the one first presented in (Hulkower, 2010) by appending a single additional criterion. In the first case, the burn rate, i.e., expenditures on software development in millions of fiscal year 2010 dollars (FY10M$) per month, is added. This may be a reasonable choice for a criterion when there are strict budget or cash flow constraints. In the second case, the total cost of software development is considered as a criterion.

4.1 Burn Rate as the Additional Criterion

Table 1 presents the characteristics of four hypothetical geostationary mobile communications satellites being considered in a trade study that includes all of the criteria in the example in (Hulkower, 2010) plus the burn rate. Note that the distribution of cost without software development is assumed to be lognormal. The burn rate is represented as a beta distribution defined by its mean and standard deviation, which are byproducts of the sampling described below and are shown to only two significant digits, since there is some jitter when the simulations are redone.
                                                                Alternative
Criterion                                                A          B          C          D
Mean Cost/Standard Deviation (FY10M$)
  Without Software Development                           100/20     90/30      80/40      70/50
Mean Burn Rate/Standard Deviation for
  Software Development (FY10M$/month)                    0.35/0.10  0.34/0.11  0.35/0.11  0.35/0.10
Number of Simultaneous Voice Circuits                    10000      9000       11000      12000
Design Life (years)                                      12         15         10         12
Number of Similar Systems Built by Contractor            3          1          2          0

Table 1. Characteristics of four hypothetical geostationary mobile communications satellites including burn rate for software development

Table 2 contains the rankings of the alternatives by criterion.
Criterion                                                Ranking
Probable Cost Without Software Development
  (from (Hulkower, 2010))                                B ≻ D ≻ C ≻ A
Probable Burn Rate for Software Development              C ≻ B ≻ A ≻ D
Number of Simultaneous Voice Circuits                    D ≻ C ≻ A ≻ B
Design Life                                              B ≻ A ≈ D ≻ C
Number of Similar Systems Built by Contractor            A ≻ C ≻ B ≻ D
Table 2. Rankings of four hypothetical geostationary mobile communications satellites for each criterion including burn rate for software development

The ranking of cost without software development was derived using EPbSwB (Hulkower, 2010). The ranking of cost/schedule for the software development was derived using the Constructive Cost Model II (COCOMO II) (Boehm et al., 2000) as follows. The COCOMO II post architecture model provides a cost schedule relationship depending on the degree of schedule compression (the SCHED variable) required and many other variables. For our purposes, we only use the additional variable describing the experience of the organization in developing similar software (the 'Precedence' variable in COCOMO II). The output is the mode (most likely value) of the cost distribution as a function of schedule. The distribution of cost values around the mode is described by a multiplicative coefficient of variation, which in the post architecture model is given by 0.8 for the low value (L) and 1.25 for the high value (H). We simulate results for four organizations regarded as candidates for the same body of work consisting of 100,000 lines of source code. The four candidate organizations for development are labeled A, B, C and D, thus aligning them with the four alternatives in (Hulkower, 2010). Organization A (the A team) is highly experienced and its Precedence variable is set to XHI for extra high. Organization B has average experience and its Precedence is set to NOM, representing nominal. Organization C is inexperienced and its Precedence variable is set to LO for low. Organization D is the least experienced, so its Precedence variable is set to VLO for very low. The cost per staff month of a software developer is assumed to be $20,000, except for A, whose cost is $22,000 per staff month.
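The nominal-schedule effort and duration implied by the four Precedence settings can be sketched with the COCOMO II post-architecture equations. The calibration constants and the precedentedness scale-factor values below are those we associate with the published COCOMO II.2000 calibration and should be verified against Boehm et al. (2000); all other scale factors and cost drivers are held at their nominal values, so the outputs are illustrative only.

```python
# COCOMO II.2000 post-architecture sketch (constants assumed, see lead-in).
A, B, C, D = 2.94, 0.91, 3.67, 0.28
OTHER_SF_NOMINAL = 3.04 + 4.24 + 3.29 + 4.68   # FLEX + RESL + TEAM + PMAT at nominal
PREC = {"XHI": 0.00, "NOM": 3.72, "LO": 4.96, "VLO": 6.20}

def effort_and_schedule(ksloc, prec_rating):
    """Return (person-months, months) for a nominal (uncompressed) schedule."""
    E = B + 0.01 * (PREC[prec_rating] + OTHER_SF_NOMINAL)
    pm = A * ksloc ** E                      # nominal effort, person-months
    tdev = C * pm ** (D + 0.2 * (E - B))     # nominal schedule, months
    return pm, tdev

for org, rating in [("A", "XHI"), ("B", "NOM"), ("C", "LO"), ("D", "VLO")]:
    pm, tdev = effort_and_schedule(100, rating)
    print(f"Org {org}: {pm:6.0f} staff-months, {tdev:5.1f} months nominal")
```

Multiplying the effort by each organization's cost per staff month gives the nominal point on the cost-schedule curves discussed next; the SCHED settings then move along those curves.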
For each team the schedule variable (SCHED) is set to VLO for a highly compressed schedule, LO for a somewhat compressed schedule, NOM for a nominal full-time schedule, HI for a relaxed schedule, and VHI for a very relaxed schedule. COCOMO II calculates degrees of compression for the compressed-schedule cases and additional costs rooted in diseconomies of scale from having teams that are suboptimally sized to perform the work at hand; shortening the schedule requires too many developers for the task, which implies higher work costs. However, COCOMO II does not calculate new schedule points for relaxed schedules; the model implicitly assumes that costs are flat for schedules longer than nominal. Since COCOMO II does not calculate the revised schedule, we are free to assign it, and we choose SCHED = HI to be twice the nominal schedule and SCHED = VHI to be three times the nominal schedule. Figure 2 illustrates the relationship between the mode of the cost distribution and schedule for each of the four Precedence cases. The increase in costs at compressed schedules is
monotonic. It is attributed to the need to add personnel to produce the required volume of code and to the diseconomies of scale from the additional burdens on group communication and coordination in a software development project. The half-bathtub shape can be smoothed by noting that the first three points in order of schedule can be used to form an interpolating curve when the schedule is compressed. Three points are sufficient to determine a curve with three degrees of freedom. A polynomial with quartic, cubic, and constant terms was used.
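Reading the three-term polynomial as having quartic, cubic, and constant terms, the interpolation through three compressed-schedule points can be sketched as follows; the data points are hypothetical and the function name is ours:

```python
def fit_compressed_schedule_curve(points):
    """Fit cost(s) = a*s**4 + b*s**3 + c through three (schedule, cost)
    points by Gauss-Jordan elimination with partial pivoting."""
    # Augmented rows [s^4, s^3, 1 | cost]
    m = [[s ** 4, s ** 3, 1.0, cost] for s, cost in points]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))  # pivot row
        m[i], m[p] = m[p], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [x - f * y for x, y in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

# Hypothetical compressed-schedule points (months, FY10M$)
a, b, c = fit_compressed_schedule_curve([(10.0, 14.0), (12.0, 10.0), (15.0, 8.0)])
```

The fitted curve passes exactly through the three points, smoothing the compressed portion of the half-bathtub shape.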
Figure 2. Relationship between the cost distribution mode and the schedule

The cost probability distribution is conditioned on the schedule random variable using one of the above relationships for the mode. Although the relationship of the equation for the mode and the coefficient of variation suggests using a log-normal distribution, that choice is inappropriate: it overestimates the probability of costs approaching both zero and infinity. For this reason we choose a scaled beta distribution (a four-parameter beta distribution). The beta distribution on x with support x ∈ (0, 1) has probability density function

f(x; α, β) = x^(α-1) (1 - x)^(β-1) / B(α, β),

where B(α, β) is the beta function. The mode of this distribution is given by

mode = (α - 1) / (α + β - 2).

The scaled beta distribution has support y ∈ (L, H). The scaling is given by y = L + (H - L)x, with probability density function

f(y; α, β, L, H) = (y - L)^(α-1) (H - y)^(β-1) / [B(α, β) (H - L)^(α+β-1)].

The mode for this distribution is given by

mode = L + (H - L)(α - 1) / (α + β - 2).

We can use the fact that for our COCOMO II data we have the mode at 5L/4 and H = 25L/16 to find a constraint on the shape parameters using the expression for the mode. Substituting in the expression for the mode and solving, we find

β = (5α - 1) / 4.
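The scaled beta density, its mode, and the shape-parameter constraint can be checked numerically; a minimal sketch in Python (function names are ours):

```python
import math

def scaled_beta_pdf(y, a, b, L, H):
    """Four-parameter (scaled) beta density on (L, H) with shapes a, b."""
    if not L < y < H:
        return 0.0
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # beta function B(a, b)
    return (y - L) ** (a - 1) * (H - y) ** (b - 1) / (B * (H - L) ** (a + b - 1))

def scaled_beta_mode(a, b, L, H):
    """Mode of the scaled beta distribution (requires a > 1, b > 1)."""
    return L + (H - L) * (a - 1) / (a + b - 2)

def shape_constraint(a):
    """b = (5a - 1)/4 places the mode at 5L/4 when H = 25L/16."""
    return (5 * a - 1) / 4
```

With a = 4 the constraint gives b = 4.75, and the resulting mode sits at 5L/4, matching the choice used in the example below.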
We want the cost distribution conditioned on schedule to have positive skew and to be unimodal. This requires that α > 1, β > 1, and α < β. So for our illustrative example the choices α = 4 and β = 4.75 properly position the mode in the cost interval for a given schedule and are consistent, though not unique. Thus we produce a sample of the joint schedule-cost distribution by using a uniform random variate as the value of the cumulative distribution function in schedule, which is then inverted using a four-parameter scaled beta function. We choose the schedule distribution to have beta shape parameters such that the sampling is uniform (α = β = 1). Another random variate is then used as the cumulative probability in cost, which is inverted using the four-parameter scaled distribution function with shape parameters (α, β) = (4, 4.75). This is done 10,000 times. At each iteration, the burn rate measured in FY10M$ per month is calculated for each alternative and a ranking from lowest to highest is produced. These rankings are aggregated using the BC to arrive at the overall ranking shown in Table 2. In addition, the mean and standard deviation of each alternative's distribution of burn rate shown in Table 1 are computed. By sampling the burn rate 10,000 times, the cardinal scale measuring the burn rate in FY10M$ per month is well approximated by the corresponding ordinal rankings. For the remaining criteria, a higher ranking is given for a larger number of simultaneous voice circuits, a longer design life, and a larger number of similar systems built by the contractor. The rankings in Table 2 are converted to Borda Scores as described in section 1.1 and then summed to give the consensus ranking: B ≻ C ≻ A ∼ D. Table 3 summarizes the results.
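The conversion from the rankings of Table 2 to Borda Scores, including the tie handling used for the Design Life criterion (tied alternatives share the average of the points for the positions they occupy), can be sketched as follows; the tier-list encoding and function name are ours:

```python
def borda_scores(tiers):
    """Borda Scores for one criterion. `tiers` lists the alternatives from
    best to worst; alternatives in the same inner list are tied and share
    the average of the points for the positions they span."""
    n = sum(len(t) for t in tiers)
    scores, pos = {}, 0
    for tier in tiers:
        points = [n - 1 - (pos + i) for i in range(len(tier))]
        for alt in tier:
            scores[alt] = sum(points) / len(points)
        pos += len(tier)
    return scores

# Rankings from Table 2; ["A", "D"] encodes the Design Life tie
criteria = [
    [["B"], ["D"], ["C"], ["A"]],  # probable cost without software development
    [["C"], ["B"], ["A"], ["D"]],  # probable burn rate for software development
    [["D"], ["C"], ["A"], ["B"]],  # number of simultaneous voice circuits
    [["B"], ["A", "D"], ["C"]],    # design life
    [["A"], ["C"], ["B"], ["D"]],  # number of similar systems built
]

totals = {alt: 0.0 for alt in "ABCD"}
for ranking in criteria:
    for alt, score in borda_scores(ranking).items():
        totals[alt] += score
# totals == {"A": 6.5, "B": 9.0, "C": 8.0, "D": 6.5}, the Totals row of Table 3
```

Sorting alternatives by their totals reproduces the consensus ranking B ≻ C ≻ A ∼ D.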
Criterion                                          A     B     C     D
Probable Cost Without Software Development         0     3     1     2
Probable Burn Rate for Software Development        1     2     3     0
Number of Simultaneous Voice Circuits              1     0     2     3
Design Life                                        1.5   3     0     1.5
Number of Similar Systems Built by Contractor      3     1     2     0
Totals                                             6.5   9     8     6.5
Table 3. Borda Scores by criterion of four hypothetical geostationary mobile communications satellites including burn rate for software development

4.2 Cost of Software Development as the Additional Criterion

Table 4 presents the characteristics of four hypothetical geostationary mobile communications satellites being considered in a trade study that includes all of the criteria in the example in (Hulkower, 2010) plus the cost of software development, which is represented as a beta distribution defined by its mean and standard deviation. As in section 4.1, these statistics are byproducts of the sampling and are shown to only two significant digits since there is some jitter when the simulations are rerun. Table 5 gives the rankings and Table 6 the
corresponding Borda Scores. The consensus ranking is B ≻ A ≻ D ≻ C. Note that this differs from the consensus ranking when burn rate is used as a criterion instead of cost of software development, thus illustrating the importance of carefully selecting the most pertinent criteria.
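Table 4 specifies the software-development cost as a beta distribution given only its mean and standard deviation. One common way to recover shape parameters from those two statistics is the method of moments; the sketch below is ours and not necessarily the procedure used in the paper:

```python
def beta_shapes_from_moments(mean, sd, L=0.0, H=1.0):
    """Method-of-moments fit of beta shape parameters (alpha, beta) for a
    scaled beta distribution on (L, H) with the given mean and standard
    deviation."""
    m = (mean - L) / (H - L)       # mean mapped to (0, 1)
    v = (sd / (H - L)) ** 2        # variance mapped to (0, 1)
    t = m * (1 - m) / v - 1        # equals alpha + beta
    return m * t, (1 - m) * t
```

As a round-trip check, feeding in the mean and standard deviation of a standard beta distribution with shapes (4, 4.75) recovers those shapes.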
Criterion                                                            A           B           C           D
Mean Cost/Standard Deviation (FY10M$) Without Software Development   100/20      90/30       80/40       70/50
Mean Cost/Standard Deviation for Software Development (FY10M$)       9.41/0.90   9.72/1.14   10.29/1.20  10.69/1.03
Number of Simultaneous Voice Circuits                                10000       9000        11000       12000
Design Life (years)                                                  12          15          10          12
Number of Similar Systems Built by Contractor                        3           1           2           0

Table 4. Characteristics of four hypothetical geostationary mobile communications satellites including cost of software development

Criterion                                                           Ranking
Probable Cost Without Software Development (from Hulkower, 2010)    B ≻ D ≻ C ≻ A
Probable Cost of Software Development                               A ≻ B ≻ C ≻ D
Number of Simultaneous Voice Circuits                               D ≻ C ≻ A ≻ B
Design Life                                                         B ≻ A ∼ D ≻ C
Number of Similar Systems Built by Contractor                       A ≻ C ≻ B ≻ D

Table 5. Rankings of four hypothetical geostationary mobile communications satellites for each criterion including cost of software development
Criterion                                          A     B     C     D
Probable Cost Without Software Development         0     3     1     2
Probable Cost of Software Development              3     2     1     0
Number of Simultaneous Voice Circuits              1     0     2     3
Design Life                                        1.5   3     0     1.5
Number of Similar Systems Built by Contractor      3     1     2     0
Totals                                             8.5   9     6     6.5
Table 6. Borda Scores by criterion of four hypothetical geostationary mobile communication satellites including cost of software development
5 Discussion
The result of the extension of EPbSwB to multiple criteria is an ordinal ranking of alternatives that identifies at a top level those that best satisfy the particular set of criteria. It can be followed by a more refined optimization before final selection. An analogy is the two-step process of identifying optimal two-impulse trajectories described in (Hulkower et al., 1984) and applied in (Lau and Hulkower, 1987) to produce a measure of the accessibility of near-Earth asteroids. To locate the minimum total change in velocity required to go from one body to another, a coarse search is performed over the entire range of possible launch and arrival orbital positions. When the region containing the global minimum is identified, a more sophisticated optimizer that can include additional impulses and gravitational assistance is recommended. Similarly, and especially when the consensus ranking places more than one alternative near the top, more detailed analysis should be performed. As illustrated in Section 4, and not surprisingly, the ranking of alternatives is extremely sensitive to the set of criteria, suggesting that sensitivity analyses be performed before making a final choice. Though we have focused on engineering design applications because of the extensive literature spawned by Hazelrigg's 1996 paper, we emphasize that the applicability of BC and the extension of EPbSwB to multiple correlated criteria should be viewed more broadly. Because this method is not restricted to any specific field, it should be considered whenever rankings of alternatives based on aggregated preferences are desired.

6 Summary
A goal of this paper is to emphasize that Arrow's Impossibility Theorem should no longer be used as an excuse to avoid considering the applicability of a method of aggregating preferences. Saari has demonstrated that this theorem relies on two potentially conflicting properties; insistence on IIA can yield non-transitive, or irrational, outcomes. He goes on to establish that by substituting IIIA for IIA, thereby incorporating strength of preference measured by the number of candidates between the two being compared, transitivity of the outcome is preserved. Most significantly, he proved that only the BC satisfies the four properties of Arrow's Impossibility Theorem with IIIA replacing IIA. In other words, BC vanquishes the dictator and opens a clear path to applying this preferential voting scheme to a wide range of problems, including selection from among design alternatives. A critical survey of the engineering design literature is included to address lingering concerns over Arrow's theorem and to assess alternative aggregation methods. We also illustrate differences between the outcomes of some of these methods and the BC, recommending the latter because of its use of complete information and its unique conformity to four rational properties. Building on Saari's result, this paper presents a logical, novel, and nontrivial extension of the method of EPbSwB to multiple correlated criteria represented by random variables. This extension requires selecting the most appropriate method for determining the correlation
and consider two different criteria that result in different outcomes. When implemented as part of a trade study, the method produces an outcome that readily combines with rankings by other criteria to determine via the BC the overall ranking of alternatives that best reflects the preferences conveyed by each criterion.
References

Arrow KJ. 1963. Social choice and individual values, 2nd edn. Yale University Press, New Haven.

Bernroider EWN, Mitlöhner J. 2007. Using social choice rule sets in multiple attribute decision making for information system selection. Annals of Information Systems 2: 165-175.

Boehm BW, Abts C, Brown AW, Chulani S, Clark BK, Horowitz E, Madachy R, Reifer D, Steece B. 2000. Software cost estimation with COCOMO II. Prentice Hall PTR, Upper Saddle River, New Jersey.

Book SA. 2010. Cost risk as a discriminator in trade studies. Journal of Cost Analysis and Parametrics 3: 45-59.

Borda JC. 1781. Mémoire sur les élections au scrutin. Histoire de l'Académie Royale des Sciences, Paris. In Classics of Social Choice, McLean I, Urken AB (eds). 1995. The University of Michigan Press, Ann Arbor, 83-89.

Covert RP. 2006. Correlation in cost risk analysis. Paper presented at the Annual Society of Cost Estimating and Analysis Conference, Tysons Corner, VA, June 13-16.

Dym CL, Wood WH, Scott MJ. 2002. Rank ordering engineering designs: pairwise comparison charts and Borda counts. Research in Engineering Design 13: 236-242.

Franssen M. 2005. Arrow's theorem, multi-criteria decision problems and multi-attribute preferences in engineering design. Research in Engineering Design 16: 42-56.

Frey DD, Herder PM, Wijnia Y, Subrahmanian E, Katsikopoulos K, Clausing DP. 2009. The Pugh Controlled Convergence method: model-based evaluation and implications for design theory. Research in Engineering Design 20: 41-58.

Frey DD, Herder PM, Wijnia Y, Subrahmanian E, Katsikopoulos K, de Neufville R, Oye K, Clausing DP. 2010. Research in engineering design: the role of mathematical theory and empirical evidence. Research in Engineering Design 21: 145-151.

Griffin MD. 2010. How do we fix system engineering? IAC-10.D1.5.4. Paper presented at the 61st International Astronautical Congress, Prague, Czech Republic, 27 September - 1 October.

Hazelrigg GA. 1996. The implication of Arrow's impossibility theorem on approaches to optimal engineering design. Journal of Mechanical Design 118: 161-164.

Hazelrigg GA. 2010. The Pugh controlled convergence method: model-based evaluation and implications for design theory. Research in Engineering Design 21: 143-144.
Hulkower ND. 2010. The probable lowest-cost alternative according to Borda. Journal of Cost Analysis and Parametrics 3: 29-36.

Hulkower ND, Lau CO, Bender DF. 1984. Optimal two-impulse transfers for preliminary interplanetary trajectory design. Journal of Guidance, Control and Dynamics 7: 458-461.

Jacobs JF, van de Poel I, Osseweijer P. 2014. Clarifying the debate on selection methods for engineering: Arrow's impossibility theorem, design performances, and information basis. Research in Engineering Design 25: 3-10.

Katsikopoulos KV. 2009. Coherence and correspondence in engineering design: informing the conversation and connecting with judgment and decision-making research. Judgment and Decision Making 4: 147-153.

Lansdowne ZF, Woodward BS. 1996. Applying the Borda ranking method. Air Force Journal of Logistics 20: 27-29.

Lau CO, Hulkower ND. 1987. Accessibility of near-Earth asteroids. Journal of Guidance, Control and Dynamics 10: 225-232.

Li D-F, Yang J-B. 2004. Fuzzy linear programming technique for multiattribute group decision making in fuzzy environments. Information Sciences 158: 263-275.

Neatrour J. 2010. Methods for parametric joint cost and schedule risk analysis. Paper presented to the St. Louis Chapter of the Society of Cost Estimating and Analysis, February 22.

Nelson RB. 2006. An introduction to copulas, 2nd edn. Springer, New York.

Reich Y. 2010. My method is better! Research in Engineering Design 21: 137-142.

Saari DG. 2000a. Mathematical structures of voting paradoxes: I. Pairwise vote. Economic Theory 15: 1-53.

Saari DG. 2000b. Mathematical structures of voting paradoxes: II. Positional voting. Economic Theory 15: 55-101.

Saari DG. 2001. Decisions and elections: explaining the unexplained. Cambridge University Press, Cambridge, UK.

Saari DG. 2008. Disposing dictators, demystifying voting paradoxes: social choice analysis. Cambridge University Press, New York.

Saari DG. 2010. Aggregation and multilevel design for systems: finding guidelines. Journal of Mechanical Design 132: 081006-1 - 081006-9.
Saari DG. 2015. Social science puzzles: a systems analysis challenge. Evolutionary and Institutional Economics Review 12: 123-139.

Saari DG. 2016. From Arrow's theorem to 'dark matter.' British Journal of Political Science 46: 1-9.

Saari DG, Sieberg KK. 2004. Are partwise comparisons reliable? Research in Engineering Design 15: 62-71.

Scherer WT, Pomroy TA, Fuller DN. 2003. The triangular density to approximate the normal density: decision rules-of-thumb. Reliability Engineering & System Safety 82: 331-341.

Scott MJ, Antonsson EK. 1999. Arrow's theorem and engineering design decision making. Research in Engineering Design 11: 218-228.

See T-K, Lewis K. 2004. A formal approach to handling conflicts in multiattribute group decision making. Paper presented at the ASME Design Engineering Technical Conferences Design Automation Conference, Salt Lake City, September 28 - October 2. DETC2004-57342.

Trivedi PK, Zimmer DM. 2005. Copula modeling: an introduction for practitioners. Foundations and Trends in Econometrics 1: 1-111.