Computers and Operations Research 98 (2018) 198–210
Efficient algorithms using subiterative convergence for Kemeny ranking problem

Prakash S Badal a,∗, Ashish Das b

a Department of Civil Engineering, Indian Institute of Technology Bombay, Mumbai, India
b Department of Mathematics, Indian Institute of Technology Bombay, Mumbai, India

Article history: Received 3 August 2017; Revised 29 May 2018; Accepted 5 June 2018; Available online 6 June 2018

Keywords: Consensus ranking; Heuristics; Kemeny distance; Kemeny–Snell distance; Median ranking; Rank aggregation problem
Abstract

The rank aggregation problem is useful to practitioners in political science, computer science, social science, medical science, and allied fields. The objective is to identify a consensus ranking of n objects that best fits independent rankings given by k different judges. Under the Kemeny framework, a distance metric called the Kemeny distance is minimized to obtain the consensus ranking. For large n, with present computing powers, it is not feasible to identify a consensus ranking under the Kemeny framework. To address the problem, researchers have proposed several algorithms. These algorithms are able to handle datasets with n up to 200 in a reasonable amount of time. However, run-time increases very quickly as n increases. In the present paper, we propose two basic algorithms, Subiterative Convergence and the Greedy Algorithm. Using these basic algorithms, two advanced algorithms, FUR and SIgFUR, have been developed. We show that our results are generally superior to existing algorithms in terms of both performance (Kemeny distance) and run-time. Even for a large number of objects, the proposed algorithms run in a few minutes.

© 2018 Elsevier Ltd. All rights reserved.
∗ Corresponding author. E-mail addresses: [email protected] (P. S Badal), [email protected] (A. Das).
https://doi.org/10.1016/j.cor.2018.06.007
0305-0548/© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

In several fields such as political science, computer science, social science, medical science, and policy making, we often encounter situations where performances (or scores) of various objects are quantitatively available over a number of attributes (or from a number of judges). Based on the performances over different attributes, a comprehensive preference order of the objects is required. The problem is important from a practical standpoint. For example, consider a group of ten investors who have hosted a competition for startup ideas among students of an institute. Proposals for independent ideas are invited from different teams of students. Subsequently, each investor ranks every student-team in decreasing order of preference. Based on the rankings from the ten investors, the goal is to come up with a single combined ranking of the student-teams. The highest ranked student-team in the combined ranking gets the highest proportion of investor funds. In our framework, we have a set of n items (or objects) to be ranked by k different judges. The rankings of the n items by the k judges can also be interpreted as n objects being ranked on k different attributes. The preferences of different objects can
be depicted either through orderings or rankings. When objects are stacked in the order of preference, i.e., the best object at the top and the worst at the bottom (or vice versa), the resultant list of objects is called an ordering. On the other hand, when ranks are associated with objects stacked in a fixed order, the resulting list of ranks is called a ranking. Consider four objects that need to be ranked, and suppose that these objects are P, Q, R, and S, listed in that order. Then, the ordering < S P R Q > corresponds to the ranking (2 4 3 1). Here, the number 2 implies that object P has rank 2. Similarly, Q, R, and S have ranks 4, 3, and 1, respectively. When judges assign distinct ranks to each of the n objects, the resulting ranking is called a full ranking. A judge can give the same rank to multiple objects, and in that case, the ranking is referred to as a tied ranking. When one or more judges leave the ranks of some objects unassigned, the resulting ranking is called an incomplete ranking. Under the above setup, the objective is to find a combined ranking that best fits the k different rankings corresponding to the k attributes. These k rankings are termed input rankings. The combined best-fit ranking of the n objects is termed the consensus ranking. The problem of combining the k input rankings is called the rank aggregation problem. The rank aggregation problem has been studied in various fields under different names such as the median rank problem, social choice problem, Kemeny problem, etc. (Arrow, 1951; Dwork et al., 2001; Kemeny, 1959; Kemeny and Snell, 1962). Determination of consensus ranking is an essential component of various models involving rank data, for instance, distance-based models (Mallows, 1957) and rank clustering (Heiser and D'Ambrosio, 2013).

Numerous approaches for the rank aggregation problem have been proposed in the literature. One of the earliest attempts to formally conceive the idea of aggregating ranks traces to Nicholas of Cusa in 1435 (Sigmund, 1963). For the election of the Holy Roman emperor, Cusanus suggested that all voters assign points to each candidate and the candidate having the highest sum of the points be elected. On similar lines, de Borda (1781) offered a rather scientific analysis for deciding election winners. Around the same period, de Condorcet (1785) proposed a method for pairwise comparison of alternatives and noted the intransitivity of the method, also known as Condorcet's paradox. Kendall (1938) proposed a method to aggregate ranks based on the rank aggregation coefficient. On similar lines, Kemeny (1959) defined a distance measure to find the best fit of the rankings. Further, Kemeny and Snell (1962) proposed a set of three axioms to define a general distance metric. Rank aggregation under the Kemeny framework, which is also the focus of the present paper, satisfies these axioms. Finding a consensus ranking under the Kemeny framework, just like the traveling salesman problem, is an NP-hard problem (Bartholdi et al., 1989). With the work of Cohen and Mallows (1980) and Thompson (1993), statistical analysis of rankings was revived in the eighties and nineties (Heiser, 2004). Thompson (1993) and Heiser (2004), under a geometric configuration, established that the vertices of the permutation polytope form the sample space of rankings. Other distance-based models under axiomatic frameworks have also been developed (Cook, 2006; Cook et al., 2007; 1986; Cook and Saipe, 1976). However, due to differences in distance measure and axioms, similar to D'Ambrosio et al.
(2017), the present paper is not comparable to those models. Emond and Mason (2002) proposed a Branch-and-Bound algorithm to find a consensus ranking. Subsequently, Branch-and-Bound was employed by D'Ambrosio and Heiser (2009) and D'Ambrosio and Heiser (2016) for recursive partitioning algorithms for preference ranking. The drawback of the algorithm is that its run-time becomes unmanageable when the number of objects is more than 25 (D'Ambrosio et al., 2015). As has been noted in the literature, the complexity of the consensus ranking problem is primarily governed by the number of objects n and not by k. Ali and Meilă (2012) surveyed various approaches and performed empirical comparisons to determine algorithms that work well in practice. They concluded that though no algorithm reached both the smallest distance and the lowest run-time simultaneously, the local search and Branch-and-Bound approaches seemed like better choices. To overcome drawbacks in the Branch-and-Bound algorithm, D'Ambrosio et al. (2015) and Amodio et al. (2016) developed an algorithm, called QUICK, which, though similar to the Branch-and-Bound algorithm, produces a remarkable saving in computational time. QUICK achieves a local minimum solution for the consensus ranking. However, the run-time of QUICK increases drastically for n ≥ 200 (taking days). D'Ambrosio et al. (2015) also proposed another algorithm, called FAST, which takes much longer than QUICK but tries to achieve an improved minimum in terms of Kemeny distance. Recently, D'Ambrosio et al. (2017) proposed a differential evolution algorithm, called DECoR, which is able to arrive at solutions in reasonable computing time even for large n (up to 200). They compared their results with other heuristic algorithms and showed that both QUICK and DECoR produce reasonably good solutions, beating all other solutions available in the literature.
Inherently, the run-time for algorithms proposed in the literature increases rapidly when the number of objects reaches a certain limit.
This increment in the run-time is of non-deterministic polynomial order. In the present paper, we propose algorithms for finding consensus ranking under Kemeny's axiomatic framework. Two basic algorithms are proposed. The first, called Subiterative Convergence, combines the results of rank aggregation over smaller subsets of objects; it does not directly search the full space of rankings. The second, called the Greedy Algorithm, repositions one object at a time to a new position that results in a better ranking. Thus, both basic algorithms reduce the search space efficiently. Furthermore, based on the two basic algorithms, we propose two advanced algorithms. Run-time is polynomial-bounded for the proposed algorithms, which overcomes the non-deterministic polynomial order increment in run-time. Various examples show that the proposed algorithms outperform state-of-the-art algorithms in terms of Kemeny distance as well as run-time. In Section 2 we introduce two criteria for filtering optimal Kemeny solutions. In Section 3 two basic algorithms are proposed to find consensus ranking under the Kemeny framework. In Section 4 we propose two advanced algorithms. Comparison of the proposed algorithms with existing algorithms, based on various datasets, is presented in Section 5. Finally, Section 6 provides concluding remarks.

2. Consensus ranking under Kemeny's framework

In what follows, without loss of generality, we consider the k attributes as benefit attributes. In other words, the higher the value of the attribute, the better the rank of that particular object. If a particular attribute is not a benefit attribute, it is made so by multiplying the values of the attribute by −1. Following Kemeny and Snell (1962), for a given ranking α of n objects, the diagonal elements of its corresponding n × n score matrix (a_ij) are 0.
The off-diagonal elements a_ij, i ≠ j, are 1, −1, or 0, depending on whether i is ranked above j, i is ranked below j, or i and j are tied, respectively. Then, the Kemeny–Snell metric d(α, β) between any two rankings α and β is defined as follows:
$d(\alpha, \beta) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij} - b_{ij}|$,   (2.1)
where (a_ij) and (b_ij) correspond to the score matrices of rankings α and β, respectively. Smaller values of d(α, β) are associated with fewer mismatches between the two rankings α and β. We refer to the Kemeny–Snell metric d as the Kemeny distance. In the present paper, we restrict ourselves to full rankings. Let Z_n be the universe of all full rankings of n objects. Then, the cardinality of Z_n is |Z_n| = n!. Let the k input rankings be denoted by α_u, u = 1, …, k. For ρ ∈ Z_n, we define the total Kemeny distance of ρ as
$totK(\rho) = \sum_{u=1}^{k} d(\alpha_u, \rho)$.   (2.2)
Kemeny (1959) introduced the ranking problem where the consensus ranking σ (also called Kemeny-optimal solution) is defined as
$\sigma = \arg\min_{\rho \in Z_n} totK(\rho) = \arg\min_{\rho \in Z_n} \sum_{u=1}^{k} d(\alpha_u, \rho)$.   (2.3)
Emond and Mason (2002), based on Kendall's correlation coefficient, defined an extended correlation coefficient τ_X between any two rankings α and β, which is related to d as follows:
$\tau_X(\alpha, \beta) = 1 - \frac{2\, d(\alpha, \beta)}{n(n-1)}$.   (2.4)
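The quantities in (2.1)–(2.4) translate directly into code. The following is a minimal sketch (the function names are ours, not from the paper) that builds the Kemeny–Snell score matrix of a rank vector and computes d, totK, and τ_X:

```python
import numpy as np

def score_matrix(ranking):
    """Kemeny-Snell score matrix: entry (i, j) is 1 if object i is ranked
    above object j (smaller rank), -1 if below, 0 on ties and the diagonal."""
    r = np.asarray(ranking, dtype=float)
    return np.sign(r[np.newaxis, :] - r[:, np.newaxis])

def kemeny_distance(alpha, beta):
    """Kemeny distance d(alpha, beta) of Eq. (2.1)."""
    return 0.5 * np.abs(score_matrix(alpha) - score_matrix(beta)).sum()

def tot_kemeny(rho, input_rankings):
    """Total Kemeny distance totK(rho) of Eq. (2.2)."""
    return sum(kemeny_distance(alpha, rho) for alpha in input_rankings)

def tau_x(alpha, beta):
    """Extended correlation coefficient of Eq. (2.4)."""
    n = len(alpha)
    return 1.0 - 2.0 * kemeny_distance(alpha, beta) / (n * (n - 1))
```

For instance, with the six input rankings of Example 2.1 below, `tot_kemeny((1, 2, 3, 4), ...)` evaluates to 34, the optimal value reported there.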
Table 1
Kemeny distances for n = 4, k = 6 with mintotK = 34.

i    ρ_i         d(α1,ρi)  d(α2,ρi)  d(α3,ρi)  d(α4,ρi)  d(α5,ρi)  d(α6,ρi)  maxKi  varKi
1    (1 2 3 4)      4         6         6         4         8         6        8     2.27
2    (1 2 4 3)      6         4         4         6         6         8        8     2.27
3    (1 3 2 4)      6         8         8         2         6         4        8     5.47
4    (1 3 4 2)      8         6         2         8         4         6        8     5.47
5    (1 4 2 3)      8        10         6         4         4         2       10     8.67
6    (1 4 3 2)     10         8         4         6         2         4       10     8.67
7    (3 4 2 1)     10         8         4         6         2         4       10     8.67
8    (3 4 1 2)      8        10         6         4         4         2       10     8.67
9    (2 3 4 1)     10         4         0        10         2         8       10    18.27
10   (2 3 1 4)      4        10        10         0         8         2       10    18.27
11   (2 4 3 1)     12         6         2         8         0         6       12    18.27
12   (2 4 1 3)      6        12         8         2         6         0       12    18.27
Therefore, following (2.2), for ρ ∈ Zn , the average extended correlation coefficient can be expressed as
$\text{average } \tau_X = \frac{1}{k} \sum_{u=1}^{k} \tau_X(\alpha_u, \rho) = 1 - \frac{2\, totK(\rho)}{n(n-1)k}$.   (2.5)
Emond and Mason (2002) showed that minimizing the total Kemeny distance is equivalent to maximizing average τ_X. It may be noted from (2.5) that the total Kemeny distance has a one-to-one relation with average τ_X. In the literature, average τ_X has been used predominantly. However, in order to avoid approximating average τ_X to a few decimal places, in the present paper we use the total Kemeny distance as a measure of the relative performance of a ranking ρ ∈ Z_n.

2.1. Criteria for filtering Kemeny solutions

Kemeny-optimal solutions need not be unique. We now address the situation where two or more rankings have the same total Kemeny distance. Our intention is to differentiate such rankings and choose fewer of them (preferably, one) as better rankings. We offer two criteria to demarcate better ranking(s) among the rankings with the same value of total Kemeny distance. In the literature, it has been suggested that two or more rankings that achieve the same value of $mintotK = \min_{\rho \in Z_n} totK(\rho)$ are
equivalent (Kemeny and Snell, 1962). Such equivalent Kemeny-optimal rankings tend to have minimum disagreements, where a disagreement is said to occur when an object P is ranked above object Q under one ranking but below it under another. However, for such cases of multiple Kemeny-optimal solutions, we end up with too many rankings to choose from. This may not be desirable in many practical situations, for example, the competition for startup ideas discussed in Section 1. It may be argued that a more agreeable consensus ranking is one that has nearly equal Kemeny distances from the individual input rankings. Therefore, in this section, we suggest two criteria for selecting a better ranking among the rankings with the same value of mintotK.

Example 2.1. Let us assume that there are n = 4 objects, ranked over k = 6 attributes. The rankings corresponding to the six attributes are α_1 = (3 1 2 4), α_2 = (3 1 4 2), α_3 = (2 3 4 1), α_4 = (2 3 1 4), α_5 = (2 4 3 1) and α_6 = (2 4 1 3). Since n = 4, there are a total of |Z_4| = 4! = 24 possible rankings. As an optimal solution to the Kemeny ranking problem, one gets twelve σ rankings, where σ is as defined in (2.3). The Kemeny-optimal rankings ρ_i, i = 1, …, 12, have totK(ρ_i) = mintotK = 34. These rankings are given in the second column of Table 1.

We now introduce two criteria to further filter the Kemeny-optimal rankings. Let m be the number of rankings having total
Kemeny distance of mintotK. Also, let ρ_i, i = 1, …, m, be the rankings with mintotK.

Criterion 1: We first find the maximum of the Kemeny distances d(α_u, ρ_i), u = 1, …, k, for each ρ_i. We denote the maximum Kemeny distance of ρ_i as maxK_i. Thus, the number of disagreements between α_u and ρ_i is less than or equal to maxK_i, for all u. Our first criterion is to minimize maxK_i over i. Use of this criterion attempts to ensure homogeneity in the Kemeny distances. In other words, rankings having uniformity in the Kemeny distance across all attributes are given priority.

Criterion 2: For each ρ_i, we find the variance of d(α_u, ρ_i), u = 1, …, k, and denote it by varK_i. Our second criterion is to minimize varK_i over i. Doing so attempts to homogenize the Kemeny distances by eliminating the rankings with extremities in their Kemeny distances.

The last two columns of Table 1 give the values of maxK_i and varK_i corresponding to each of the twelve rankings with mintotK. Based on our first and second criteria applied in that order, we select the top two rankings from Table 1, which have minimum varK_i out of the four cases with minimum maxK_i. These criteria suggest that the top two rankings represent the consensus ranking better than the other two rankings having the same mintotK and the same minimum maxK_i. The process of applying the Kemeny method so as to obtain the ρ_i's having minimum varK_i over all i for which the total Kemeny distance is mintotK and maxK_i is least will be called the modified Kemeny method. Thus, the modified Kemeny method applies the Kemeny method and further filters the solutions using criteria one and two. It is possible that even after applying the maxK_i and varK_i criteria, we end up with more than one consensus ranking. In such cases, we consider each of those rankings to be equivalent. By applying the above two criteria, we have reduced the subspace of rankings that were equivalent in terms of minimum total Kemeny distance.

3. Two basic algorithms for rank aggregation

Despite being a closed problem, rank aggregation soon becomes unmanageably computation-intensive. For instance, if there were 10 objects for rank aggregation, the process of comparing the universe of full rankings, as done in Example 2.1, would require $10! \times \binom{10}{2} \approx 163 \times 10^{6}$ calculations for each attribute. Due to the sheer number of required calculations, the usual practice is to employ algorithms that try to find approximate consensus rankings. We now propose two basic algorithms, Subiterative Convergence and the Greedy Algorithm, for the rank aggregation problem. Though the consensus ranking has been restricted to full rankings in the present work, input rankings can contain ties. An input ranking containing ties can be handled by assigning an average rank to each of the tied objects. For example, the ordering < P Q ∼ R S > corresponds to the ranking (1 2.5 2.5 4). Later in Section 4, we develop two advanced algorithms based on these two basic algorithms. In what follows, we refer to the total Kemeny distance, in short, as the Kemeny distance.
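For instances as small as Example 2.1, full enumeration is still feasible, and the two filtering criteria of Section 2.1 can be reproduced directly. The brute-force sketch below (helper names are ours; the variance is the sample variance, as in Table 1) recovers the numbers reported there:

```python
from itertools import permutations, combinations
from statistics import variance

# Input rankings of Example 2.1 (rank vectors over four objects)
alphas = [(3, 1, 2, 4), (3, 1, 4, 2), (2, 3, 4, 1),
          (2, 3, 1, 4), (2, 4, 3, 1), (2, 4, 1, 3)]

def d(a, b):
    """Kemeny distance between two full rankings: 2 per discordant pair."""
    return sum(2 for i, j in combinations(range(len(a)), 2)
               if (a[i] - a[j]) * (b[i] - b[j]) < 0)

def tot_k(rho):
    return sum(d(alpha, rho) for alpha in alphas)

universe = list(permutations(range(1, 5)))   # Z_4: all 24 full rankings
mintotK = min(map(tot_k, universe))
optimal = [r for r in universe if tot_k(r) == mintotK]

# Criterion 1: keep rankings with the smallest maximum distance maxK_i
maxK = {r: max(d(a, r) for a in alphas) for r in optimal}
stage1 = [r for r in optimal if maxK[r] == min(maxK.values())]

# Criterion 2: among those, keep the smallest sample variance varK_i
varK = {r: variance([d(a, r) for a in alphas]) for r in stage1}
final = [r for r in stage1 if varK[r] == min(varK.values())]
```

Here `mintotK` comes out as 34 with twelve Kemeny-optimal rankings; criterion 1 retains four of them and criterion 2 retains the two top rows of Table 1, (1 2 3 4) and (1 2 4 3).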
3.1. Subiterative Convergence

The modified Kemeny method, discussed in Section 2.1, compares the relative order of objects from the universe of full rankings to select the most representative ranking. The Subiterative Convergence algorithm breaks the problem of comparing all the n objects in a single shot into that of aggregating multiple comparisons of a smaller number of objects using the modified Kemeny method. Doing so allows for an indirect, but not absolute, comparison of a large number of objects with an approximately linear increase in the number of calculations. The Subiterative Convergence algorithm begins with a given ranking ρ_1 and considers the first few objects η (< n) for comparison. This internal comparison ignores all the other objects. Since the cardinality of the universe of full rankings for this comparison is small (η!), the modified Kemeny method is applied to determine the consensus ranking among these η objects. The top-ranked object from this consensus ranking is moved out. Next, the remaining (η − 1) objects along with the next object from ρ_1 are considered for application of the modified Kemeny method. Again, the top-ranked object from this comparison is moved out, and the process continues for (n − η + 1) times until we have considered the last object in ρ_1. Top-ranked objects from each comparison are stacked in the same order they were moved out, followed by the consensus ranking of the last η objects. Now we have a revised ranking ρ_2. The same process is carried out over ρ_2 to obtain a new ranking ρ_3. The process is repeated until convergence is encountered. The converged ranking(s) is termed the output ranking(s) of the Subiterative Convergence algorithm.

For a given ranking, let O_iter be an operator that compares all n objects in the above sub-iterative manner and returns an output ranking. A single application of O_iter over a given ranking is called an iteration. Rank aggregation of η objects at a time is called a subiteration. The parameter η (< n) is predefined and we term it the subiteration length. One iteration requires (n − η + 1) subiterations. Thus, we have

$O_{iter}(\rho_p, \eta) = \rho_{p+1}$,   (3.1)

where p = 1, 2, 3, … denotes the iteration index. The Subiterative Convergence algorithm (denoted by the operator O_sc) initiates with a given ranking ρ_1 and a subiteration length η. Starting with ρ_1, rearrangement of the objects is carried out using O_iter in successive iterations until convergence is obtained. The ranking is said to have converged when the following condition is satisfied:

$O_{iter}(\rho_q, \eta) = \rho_{q+1} = \rho_{q-\ell+1}$,   (3.2)

where q ≥ 1 and ℓ ≥ 1 are integers. Here, ℓ denotes the length of the convergence cycle. Immediate convergence corresponds to ℓ = 1, when O_iter operated over ρ_q acts as an identity operator. In this case, ρ_q is termed the converged ranking. A cyclic convergence is said to occur when ℓ ≥ 2. In this case, the rankings ρ_{q−ℓ+1}, ρ_{q−ℓ+2}, …, ρ_q form a cycle of converged rankings. Mathematically, for Subiterative Convergence,

$O_{sc}(\rho_1, \eta) = \rho_{q-\ell+r}$;   r = 1, 2, …, ℓ.   (3.3)

Also, O_sc can be expressed as a composite operation defined as follows:

$O_{sc} \equiv O_{iter} \circ O_{iter} \circ \cdots (q \text{ times}) \cdots \circ O_{iter}(\rho_1, \eta)$.   (3.4)

Algorithm 1: Subiterative Convergence O_sc.
  inputs: Input rankings, given ranking ρ_1, subiteration length η
  q = 1;
  repeat
      γ = ρ_q;
      for i = 1 → (n − η + 1) do                 // iteration
          ψ = ModifiedFullKemeny(γ[1 : η]);      // subiteration
          ρ_{q+1}[i] = ψ[1];
          remove ψ[1] from γ;
      end
      ρ_{q+1}[n − η + 1 : n] = ψ;
      q = q + 1;
  until ρ_q ∈ {ρ_1, ρ_2, …, ρ_{q−1}};
  suppose ρ_q = ρ_{q−ℓ}, where ℓ ∈ {1, 2, …, (q − 1)};
  output: all ranking(s) between ρ_{q−ℓ+1} and ρ_q
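Under simplifying assumptions, O_sc can be sketched in code as follows. This illustration (function names are ours) replaces the ModifiedFullKemeny subroutine with plain exhaustive minimization of the total Kemeny distance over the η-object window, i.e., it omits the maxK/varK filtering of Section 2.1, and it handles rankings as orderings (lists of objects, best first):

```python
from itertools import permutations, combinations

def window_cost(window, input_rankings):
    """Total Kemeny distance restricted to the objects in `window`
    (objects listed best first); inputs are full rank vectors."""
    cost = 0
    for r in input_rankings:
        for x, y in combinations(window, 2):   # x is placed above y
            if r[x] > r[y]:
                cost += 2                      # discordant pair
            elif r[x] == r[y]:
                cost += 1                      # tie in the input ranking
    return cost

def o_iter(order, input_rankings, eta):
    """One iteration: n - eta + 1 subiterations over a sliding window."""
    buf, rest = list(order[:eta]), list(order[eta:])
    out = []
    while True:
        best = min(permutations(buf),
                   key=lambda p: window_cost(p, input_rankings))
        buf = list(best)
        if not rest:
            return out + buf
        out.append(buf.pop(0))     # move the top-ranked object out
        buf.append(rest.pop(0))    # bring in the next object

def o_sc(order, input_rankings, eta):
    """Iterate o_iter until immediate or cyclic convergence."""
    seen = {tuple(order)}
    while True:
        order = o_iter(order, input_rankings, eta)
        if tuple(order) in seen:
            return order           # one ranking of the convergence cycle
        seen.add(tuple(order))
```

A mean seed ranking (Section 3.1.1) is then simply `sorted(range(n), key=lambda x: sum(r[x] for r in input_rankings))`.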
A pseudo-code for O_sc is provided in Algorithm 1. It may be noted that though the operations in Subiterative Convergence further filter the Kemeny-optimal solutions using the modified Kemeny method, the modifications remain internal to O_sc. This does not change in any way the original problem of finding a consensus ranking with minimum Kemeny distance that is addressed here. Next, we discuss the importance and selection of ρ_1.

3.1.1. Mean seed ranking

The Subiterative Convergence algorithm described above obtains the converged ranking(s) from a given ranking ρ_1 and the input rankings α_u, u = 1, …, k. The ranking ρ_1 used for initiating the algorithm is called the seed ranking. It has been observed that the quality of the converged ranking fundamentally depends on the seed ranking. To overcome the issue, researchers have used multiple repetitions of algorithms with random starting points to pick the best solution (Hartigan and Wong, 1979; Hastie et al., 2009). We propose a good initial seed ranking by preliminary aggregation of the input rankings α_u. This can be obtained by ranking each object based on the sum of ranks of the corresponding object under all the attributes. For example, the object with the smallest sum of ranks is assigned rank 1, whereas the object with the highest sum of ranks is assigned rank n. The seed ranking, thus defined, is called the mean seed ranking. Though the mean seed ranking is a good way to start the algorithm O_sc, the algorithm can also be carried out with any arbitrary seed ranking.

3.1.2. Seed-based iteration

We now present an iterative seed generation technique that starts with the application of O_sc on a good initial seed ranking and sequentially perturbs seed rankings in order to obtain improved solutions of the rank aggregation problem. One of the easiest ways to generate distinct seed rankings is to find random seed rankings.
The advantage of random seed generation is that the calculations are independent of one another and, therefore, can be performed in parallel on a multi-core processor machine. However, since the cardinality of all possible rankings is staggeringly high (n!), random seed rankings have, in general, not proven to perform well with the increase in the number of objects n. To overcome this shortcoming, the following methodology is proposed for updating the seed ranking:

Step 1. The process begins with η and a certain number of repetitions ω.
Step 2. Apply O_sc with the mean seed ranking γ_1, resulting in the output ranking ψ_1. Mathematically, ψ_1 = O_sc(γ_1, η).
Step 3. Initiate a list of input seed rankings and another list of converged output rankings with γ_1 and ψ_1, respectively.
Step 4. Divide the n objects of ψ_1 into two parts: the first part comprises the first n/2 objects, and the remaining objects constitute the second part. Flip these two halves of ψ_1 individually and stack them together to generate γ_2. Further, ψ_2 = O_sc(γ_2, η). Update the two lists with γ_2 and ψ_2, respectively.
Step 5. For r = 3, 4, 5, …, ω, obtain γ_r as the mean ranking of ψ_1 and ψ_{r−1}, where the mean ranking is defined in Section 3.1.1. Check whether γ_r already appears in the list of input seed rankings.
  a. If not, use this mean ranking as γ_r.
  b. If yes, flip the two halves of ψ_{r−1} and stack them to obtain γ_r.
  Obtain ψ_r = O_sc(γ_r, η) and update the two lists with γ_r and ψ_r, respectively.
Step 6. Compare the Kemeny distances of all rankings in the list of converged output rankings and select the ranking(s) with the smallest Kemeny distance.

The above steps constitute the seed-based iteration, denoted by O_sc(γ_1, η, ω). For the default value ω = 1, the seed-based iteration reduces to (3.3).

3.2. Greedy Algorithm

The Greedy Algorithm (denoted by the operator O_ga) is a heuristic approach that attempts to improve an available local minimum by repositioning one object at a time in the given ranking ρ_1 to a new location that results in the smallest Kemeny distance. The Greedy Algorithm takes locally optimal decisions in the hope of inching towards the consensus ranking. Each object from ρ_1, one at a time, is placed at new positions in its vicinity up to a distance of s on both sides. Here, the parameter s is termed the search radius. Note that for the objects towards the two edges of ρ_1, spilling over to the other side across the ends is permitted. Kemeny distances of all such rankings are calculated. Subsequently, the ranking that has the smallest Kemeny distance is reported as the output ranking ρ_2. Thus, we have:
$O_{ga}(\rho_1, s) = \rho_2$.   (3.5)
Note that it is possible that no better ranking exists for any movement of the considered object. In that case, the output ranking remains unaltered. The search radius s can take a maximum value of n/2. When s = n/2, repositioning of each object is individually attempted at all possible positions. A pseudo-code for O_ga is provided in Algorithm 2.

Algorithm 2: Greedy Algorithm O_ga.
  inputs: Input rankings, given ranking ρ_1, search radius s
  initialize ρ_2 = ρ_1;
  for i = 1 → n do
      for j = −s → s do
          ψ_j = ranking after moving the object ρ_1[i] by j positions in ρ_2;
          d_j = Kemeny distance of ψ_j from the input rankings;
      end
      update ρ_2 by the ranking corresponding to min_j(d_j);
  end
  output: ρ_2
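A minimal sketch of the Greedy Algorithm on orderings follows (helper names are ours; for brevity it omits the spill-over across the ends that O_ga permits, and it accepts an improvement as soon as one is found):

```python
from itertools import combinations

def tot_cost(order, input_rankings):
    """Total Kemeny distance of an ordering (objects listed best first)
    against full input rank vectors: 2 per discordant pair."""
    return sum(2 for r in input_rankings
               for x, y in combinations(order, 2) if r[x] > r[y])

def o_ga(order, input_rankings, s):
    """Reposition one object at a time within a search radius s, keeping
    any placement that lowers the total Kemeny distance."""
    best = list(order)
    best_cost = tot_cost(best, input_rankings)
    for obj in list(order):
        i = best.index(obj)
        without = best[:i] + best[i + 1:]        # current ordering minus obj
        for j in range(max(0, i - s), min(len(best), i + s + 1)):
            cand = without[:j] + [obj] + without[j:]   # reinsert at slot j
            c = tot_cost(cand, input_rankings)
            if c < best_cost:
                best, best_cost = cand, c
    return best
```

Starting from the ordering [1, 0, 3, 2] on the Example 2.1 inputs, a single pass with s = 1 already reaches an ordering with total Kemeny distance 34, the optimum of that example.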
4. Two advanced algorithms for rank aggregation

In the previous sections, we introduced two basic algorithms for rank aggregation: Subiterative Convergence and the Greedy Algorithm. Both algorithms initiate with a given ranking and attempt
to arrive at an output ranking that is closer to the consensus ranking than the given ranking. It is evident from the very nature of both algorithms that, apart from the given ranking, selection of the parameters (i.e., η and s) plays an important role in determining the nature of the output ranking. As noted in Section 3.1, the Subiterative Convergence algorithm offers only an indirect comparison of all n objects. As a consequence, it often gets stuck in a ranking that is a local minimum in terms of the Kemeny distance. Similarly, the Greedy Algorithm, once performed on a good initial seed ranking, does not offer any significant further improvements. Hence, there is a need for extending these algorithms in order to move closer to the global consensus ranking. In this section, we develop two advanced algorithms for rank aggregation using the O_sc and O_ga algorithms. The underlying strategies behind both these advanced algorithms are (1) to start from a reasonably good seed ranking and perturb the output to arrive at a better solution of rank aggregation and (2) to interconnect the basic algorithms and allow for adjustments in their parameters. Once the Subiterative Convergence algorithm terminates, there is no way to kick-start it without picking another seed ranking. Giving the output of O_sc as its next input does not help because we end up getting the same ranking in the very next iteration. This prompts us either to use another algorithm or to perturb the output ranking for the next application of O_sc. We use the former approach in the first algorithm, called FUR, and the latter in the second, called SIgFUR (homonym of 'cipher').

4.1. FUR

The algorithm FUR starts with the application of O_sc over the mean seed ranking (Section 3.1.1). In the case of multiple output rankings (cyclic convergence), a single output ranking of O_sc is selected based on the criteria set out in Section 2.1. This output ranking is then used as input to O_ga, whose output is, in turn, again used as input to O_sc. The process is repeated until the input and output of O_ga are identical. Each application of O_sc in FUR is carried out using a set of η values, denoted by N. Further, O_sc in FUR has three branches, namely Fixed, Update, and Range:
The process is repeated until the input and output of Oga are identical. Each application of Osc in FUR is carried out using a set of η values, denoted by N . Further, Osc in FUR has three branches namely— Fixed, Update, and Range— a. Fixed branch uses a single value of η for each application of Osc ; b. Update branch of the algorithm starts Osc with first element of N as η and then its output is given as input to Osc with next element of N as η and so on; c. Range branch of the algorithm involves separate application of Osc using all values of η from N . Subsequently, the best ranking out of these is considered as the output of Osc from this branch. Rankings obtained from individual branches of FUR are stored in a separate list. Once all three branches are exhausted, FUR terminates by reporting the best obtained ranking. The selection of parameters is based on prior information about the nature of algorithm and the limitation over run-time. Larger values of η in N significantly add to the run-time. In addition, a large value of η does not necessarily result in a better solution. It is recommended that the maximum value of η, in N , is less than or equal to 8. Furthermore, since for small values of η, rank aggregation takes relatively less run-time, it is recommended to include smaller values of η in N (even as small as 2). The parameter s of Oga has an upper bound of n/2. In FUR, Oga is always employed after the application of Osc . Therefore, input to Oga is already a good solution of consensus ranking, and a high value of s is usually not expected to improve the optimum ranking vis-à-vis the increase in run-time. In any case, based on the experience of authors, it is recommended to use a value of s ≤ 30.
When N is a singleton set, the Update and Range branches of FUR are not activated, and FUR reduces to a loop of Osc and Oga. Individually, Subiterative Convergence and Greedy Algorithm form two extreme cases of FUR, i.e., (a) when s = 0, FUR corresponds to performing the three branches of Subiterative Convergence Osc only once, without Oga, and (b) when N is a null set, FUR corresponds to the Greedy Algorithm Oga in a loop, without Osc. A pseudo-code for FUR is provided in Algorithm 3. We also provide a flowchart for the implementation of FUR in Appendix A.

Algorithm 3: FUR.
inputs : Input rankings, N (set of η), and s
    KemList = empty; I = length(N); BranchId = 1; i0 = 1;
    for BranchId = 1 → 3 do
        γ1 = mean seed ranking; ψ1 = empty;
        for i = i0 → I do
            while ψ1 ≠ γ1 do
                if BranchId == 1 then              // Fixed branch
                    ψ1 = Osc(γ1, N[i]);
                else if BranchId == 2 then         // Update branch
                    ψ1 = Osc(γ1, N[1]);
                    for j = 2 → i do
                        ψ1 = Osc(ψ1, N[j]);
                    end
                else if BranchId == 3 then         // Range branch
                    ψ1 = best of Osc(γ1, N[1 : i]);
                end
                γ1 = Oga(ψ1, s);
            end
            update KemList with Kemeny distance of γ1;
        end
        set i0 = 2;
    end
output: the ranking with min(KemList)

4.2. SIgFUR

Desired properties of an algorithm for rank aggregation are (1) to offer a good consensus ranking quickly (say, within a few minutes) and (2) to provide a strategy for finding better solutions when there is no time constraint. The first property helps in quickly locating the ranks of objects with coarse precision, and the second targets the global consensus ranking. The algorithm FUR, as will be shown later through examples, performs well on both desired properties when its parameters are varied. For instance, increasing the value of s can improve the output ranking. However, as pointed out earlier, even smaller values of s often capture the ranking that the highest value of s (= n/2) would have achieved. Furthermore, expanding N to include higher values of η has two limitations: first, the run-time increases sharply, and second, the algorithm has no parameter that can be tuned to search for an improved solution when the user can afford extra time.

In view of the above limitations of FUR, we propose another algorithm that uses the same basic algorithms Osc and Oga. The algorithm, called SIgFUR, starts with the seed-based iteration of Osc with ω repetitions, as presented in Section 3.1.2. This is followed by the Greedy Algorithm applied to the output of the seed-based iteration. Thereafter, FUR is carried out on the resulting ranking. To summarize, SIgFUR is the application of Seed-based Iteration followed by the greedy algorithm and FUR. The seed-based iteration in SIgFUR is carried out for a set of η values, N1. Similarly, FUR is executed with parameters N2 and s. Thus, the parameters of SIgFUR are N1, ω, s, and N2. The selection of parameters N1 and N2 is based on prior information about the nature of the algorithm and the run-time required for different subiteration lengths. The parameter ω is usually selected based on the run-time limitation. A pseudo-code for SIgFUR is provided in Algorithm 4. We also provide a flowchart for the implementation of SIgFUR in Appendix B.

Algorithm 4: SIgFUR.
inputs : Input rankings, N1 (set of η for seed-based iteration), N2 (set of η for FUR), ω, and s
    KemList = empty; I = length(N1); i = 1;
    for i = 1 → I do
        γ1 = mean seed ranking;
        ψ1 = Osc(γ1, N1[i], ω);    // Seed-based Iteration
        γ1 = Oga(ψ1, s);           // greedy algorithm
        while ψ1 ≠ γ1 do
            ψ1 = FUR with γ1, N2, and s;    // FUR
        end
        update KemList with Kemeny distance of γ1;
    end
output: the ranking with min(KemList)
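The composition of SIgFUR can be sketched as follows. This is a hedged Python illustration with stand-in callables (the paper's implementation is in MATLAB); read literally, the pseudo-code's inner loop never updates γ1, so the sketch realizes what appears to be the intent: repeat FUR until its output stops changing.

```python
def sigfur(seed_iteration, oga, fur, mean_seed, N1, N2, omega, s, kemeny):
    """Sketch of SIgFUR: seed-based iteration of Osc (omega repetitions),
    then the Greedy Algorithm, then FUR repeated to a fixed point, for
    each eta in N1; all callables stand in for routines in the paper."""
    candidates = []
    for eta in N1:
        gamma = mean_seed
        psi = seed_iteration(gamma, eta, omega)  # seed-based iteration
        gamma = oga(psi, s)                      # greedy algorithm
        while True:                              # FUR until no change
            psi = fur(gamma, N2, s)
            if psi == gamma:
                break
            gamma = psi
        candidates.append(gamma)
    return min(candidates, key=kemeny)
```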
5. Performance comparison in terms of Kemeny distance and run-time

By and large, algorithms proposed in the literature search for a consensus ranking directly (D'Ambrosio et al., 2015; 2017; Emond and Mason, 2002). Consequently, the number of calculations in these algorithms increases drastically with the number of objects n. This is not noticeable for small values of n because of the very short run-time; however, the rapid growth in run-time is easily observed as n increases. We compare the performance of our two proposed algorithms, FUR and SIgFUR, against algorithms from the literature. For the comparison, we specifically consider three algorithms: QUICK, FAST, and DECoR (D'Ambrosio et al., 2015; 2017). The algorithm DECoR, proposed by D'Ambrosio et al. (2017), is a heuristic that can deal with n up to 200 in reasonable computing time. For up to 50 objects, QUICK has been advised to be preferred; beyond this threshold of 50, DECoR is much faster than QUICK (D'Ambrosio et al., 2017). In what follows, we present several examples to investigate the performance and run-time of the FUR and SIgFUR algorithms. In the literature, average τX has been used predominantly for comparing performance. However, as explained in Section 2, in order to avoid approximating average τX to a few decimal places, we use the Kemeny distance as the measure of the relative performance of rankings. The Kemeny distance has a one-to-one relation with average τX; in other words, minimizing the Kemeny distance is equivalent to maximizing average τX. Comparisons of the algorithms are carried out for two cases, i.e., Scenario 1, "limited time": an output ranking is desired in a short time (say, a few minutes); and Scenario 2, "unlimited time": the best ranking is desired with no constraint on time.
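The one-to-one relation between the Kemeny distance and average τX can be made concrete with a short sketch (Python, for illustration only; the paper's code is in MATLAB). For full rankings, each pair of objects ordered oppositely in the candidate and in one input ranking contributes 2 to the distance, so d = (1 − average τX) · k n(n − 1)/2, which is consistent with the even distances reported in the tables and the reverse calculation in the footnote to Table 4.

```python
from itertools import combinations

def kemeny_distance(candidate, rankings):
    """Total Kemeny distance of a candidate full ranking from the k input
    rankings; candidate[i] is the rank given to object i.  Each pair of
    objects ordered oppositely in the candidate and in an input ranking
    contributes 2 (the Kemeny-Snell pair distance for strict orders)."""
    n = len(candidate)
    return sum(2 for r in rankings
               for i, j in combinations(range(n), 2)
               if (candidate[i] - candidate[j]) * (r[i] - r[j]) < 0)

def average_tau_x(candidate, rankings):
    """For full rankings, average tau_X relates linearly to the Kemeny
    distance d: tau_X = 1 - 2d / (k n (n - 1)); hence minimizing d is
    the same as maximizing average tau_X."""
    n, k = len(candidate), len(rankings)
    return 1 - 2 * kemeny_distance(candidate, rankings) / (k * n * (n - 1))
```

For instance, a candidate agreeing perfectly with one judge and disagreeing completely with a second gives average τX = 0.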
All simulations that we present (including those of QUICK and FAST) are performed using MATLAB® in single-thread mode on a system with an i5-3337U processor @ 1.8 GHz and 8 GB RAM.
Table 2. Results of rank aggregation for the 100 × 15 dataset.

Algorithm | Parameters                          | Kemeny distance | Run-time (s)
Scenario 1— limited time
FUR       | N = 3–5; s = 20                     | 61,088          | 255
SIgFUR    | N1 = 6; ω = 20; s = 20; N2 = 3–5    | 61,100          | 291
QUICK     | —                                   | 61,132          | 396
Scenario 2— unlimited time
FUR       | N = 3–6; s = 30                     | 61,066          | 621
SIgFUR    | N1 = 2–6; ω = 20; s = 30; N2 = 3–6  | 61,066          | 4679
FAST      | —                                   | 61,126          | 18,707

Table 4. Results of rank aggregation for the 240 × 4 dataset.

Algorithm | Parameters                          | Kemeny distance | Run-time (s)
Scenario 1— limited time
FUR       | N = 4; s = 6                        | 29,142          | 60
SIgFUR    | N1 = 4; ω = 1; s = 6; N2 = 4        | 29,142          | 65
DECoR     | —                                   | 29,582#         | 67∗
Scenario 2— unlimited time
FUR       | N = 3–5; s = 30                     | 28,926          | 1575
SIgFUR    | N1 = 5; ω = 20; s = 40; N2 = 3–6    | 28,922          | 586
QUICK     | —                                   | 28,930          | 7361
Table 3. Comparison of algorithms for 50 datasets of size 50 × 15.

Algo1 vs. Algo2  | Parameters for Algo1                | Kemeny distance | Run-time
Scenario 1— limited time
FUR vs. QUICK    | N = 3–5; s = 17                     | 44-3-3          | 44-0
SIgFUR vs. QUICK | N1 = 6; ω = 10; s = 25; N2 = 3–5    | 44-5-1          | 44-1
Scenario 2— unlimited time
FUR vs. FAST     | N = 3–8; s = 25                     | 37-7-6          | 37-0
SIgFUR vs. FAST  | N1 = 2–6; ω = 100; s = 25; N2 = 2–7 | 42-3-5          | 42-0
Example 5.1. Performance of FUR and SIgFUR against QUICK and FAST for a 100 × 15 dataset: To test the robustness of their algorithms QUICK and FAST, D'Ambrosio et al. (2015) generated random data of 100 objects with 15 attributes. The dataset can be accessed at http://www.math.iitb.ac.in/~ashish/hundredrankdata. Results of rank aggregation for this dataset using the various algorithms are presented in Table 2, along with the parameters used for the proposed algorithms. Comparisons are made on performance and run-time, where performance is measured in terms of Kemeny distance; recall that the smaller the Kemeny distance, the better the associated ranking. For the "limited time" scenario, QUICK is used for the comparison, whereas FAST is used for the "unlimited time" scenario. From Table 2, it is noted that for Scenario 1, the FUR and SIgFUR algorithms achieve rankings with smaller Kemeny distances and still take less run-time than QUICK. For Scenario 2, the FAST algorithm takes more than 5 h, whereas FUR and SIgFUR fetch a better consensus ranking in about 10 and 80 min, respectively.

Example 5.2. Performance of FUR and SIgFUR against QUICK and FAST for 50 datasets of size 50 × 15: For an extensive comparison of the FUR and SIgFUR algorithms against QUICK and FAST, we randomly generated 50 datasets, each with 50 objects and all 15 attributes, from the 100 × 15 dataset described in Example 5.1. We then ran the rank aggregation algorithms on each of the 50 datasets for the two scenarios. Table 3 presents the summary of this comparison; the full table of Kemeny distances and run-times for all 50 datasets is given in Appendix C. In Table 3, the column "Algo1 vs. Algo2" shows the two algorithms being compared.
The 3-tuples under the column “Kemeny distance” indicate (i) the number of datasets for which Algo1 performs better than Algo2, (ii) the number of datasets for which Algo2 performs better than Algo1, and (iii) the number of datasets for which both Algo1 and Algo2 perform equally well, respectively. The tuples given under the column “Run-time” indicate (i) the number of datasets for which Algo1 performs better than Algo2 and also takes shorter run-time, and (ii) the number of datasets for which Algo2 performs better than Algo1 and also takes shorter run-time, respectively.
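The 3-tuple tally can be sketched as a small helper (an illustrative Python fragment; the function name is ours):

```python
def head_to_head(kemeny1, kemeny2):
    """Given the Kemeny distances achieved by Algo1 and Algo2 on each
    dataset, return the 3-tuple of Table 3: (datasets where Algo1 is
    better, datasets where Algo2 is better, ties)."""
    wins1 = sum(a < b for a, b in zip(kemeny1, kemeny2))
    wins2 = sum(b < a for a, b in zip(kemeny1, kemeny2))
    return wins1, wins2, len(kemeny1) - wins1 - wins2
```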
∗ For DECoR, the run-time is as obtained in D'Ambrosio et al. (2017), which was performed on a more powerful processor, i7-2630QM @ 2.0 GHz / 8 GB RAM (as against our run-times based on i5-3337U @ 1.8 GHz / 8 GB RAM). # The Kemeny distance for DECoR is reverse-calculated from the average τX = 0.74215 reported in D'Ambrosio et al. (2017).
In the "limited time" scenario, both FUR and SIgFUR achieve better rankings than QUICK in 44 out of 50 datasets and take less run-time as well. For the "unlimited time" scenario, FUR and SIgFUR attain better rankings in more than 75% of the datasets and yet take considerably less time.

Example 5.3. Performance of FUR and SIgFUR against QUICK and DECoR for large n (240 × 4 dataset): The PrefLib repository (Mattei and Walsh, 2013) is a comprehensive resource for preference datasets. In this example, we consider the ED-00015-001.soc dataset from the repository. It contains a strict ordering (i.e., no ties) of 240 cities across the globe by four judges (n = 240, k = 4). Algorithm QUICK takes a long time to come up with a solution and is, therefore, pitted against our algorithms in the "unlimited time" scenario; DECoR, on the other hand, qualifies as the competing algorithm for the "limited time" scenario. Results are summarized in Table 4. It is observed from Table 4 that for the "limited time" scenario, the FUR and SIgFUR algorithms achieve rankings with much smaller Kemeny distances and yet take less run-time than DECoR. For the "unlimited time" scenario, the QUICK algorithm takes more than 2 h, whereas FUR and SIgFUR fetch a better consensus ranking within merely 30 and 10 min, respectively. The lists of top-25 and worst-25 cities for the "limited time" scenario are provided in Appendix D.

Example 5.4. Performance of FUR and SIgFUR against QUICK for very large n (400 × 15 dataset): This example demonstrates the robustness and speed of the FUR and SIgFUR algorithms. The 400 × 15 dataset is generated from the 100 × 15 dataset of Example 5.1: under each attribute, the ranks of objects numbered 101–200 are derived by adding 100 to the ranks of objects numbered 1–100, and the ranks of objects numbered 201–300 and 301–400 are generated similarly. Thus, we have 400 objects and 15 attributes.
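The block construction of the 400 × 15 data can be sketched as follows (a hedged Python illustration; `base` denotes the 100 × 15 rank matrix with one row per object and one column per attribute):

```python
def replicate_dataset(base, copies=4):
    """Stack `copies` shifted copies of a rank matrix: under every
    attribute, block c (c = 0, 1, ...) repeats the base ranks with
    n*c added, where n is the number of objects in the base data."""
    n = len(base)
    return [[r + n * c for r in row] for c in range(copies) for row in base]
```

With `copies = 4` and a 100 × 15 base, this yields the 400 × 15 dataset described above.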
Algorithm QUICK takes a very long time (84,780 s ≈ 24 h) to come up with a ranking, which has a Kemeny distance of 244,528. In contrast, the algorithm FUR (with N = 5; s = 20) takes only 1421 s (≈ 25 min) to achieve a solution with Kemeny distance 244,400, and the algorithm SIgFUR (with N1 = 3; ω = 50; s = 15; N2 = 3) takes only 2095 s (≈ 35 min) to achieve a solution with Kemeny distance 244,344. Thus, the FUR and SIgFUR algorithms not only outperform QUICK in terms of Kemeny distance but also reduce the run-time to well under an hour (as compared to almost a day required by QUICK). For situations where n is not large (say, ≤ 35), the run-time is usually very small (only a few seconds). To check the applicability of our algorithms for smaller datasets, we consider 100 random
samples of size 30 × 15 generated from the 100 × 15 dataset of Example 5.1. FUR (with N = 6; s = 15) outperforms QUICK in 66 out of 100 cases, while both perform equally well in 16 cases; in terms of run-time, both algorithms take approximately 5 s. In other words, FUR yields a consensus ranking at least as good as QUICK's in 82 out of 100 cases. From the above examples, it is evident that our proposed algorithms outperform existing state-of-the-art algorithms in terms of performance as well as run-time, and that they can efficiently handle much larger values of n. Furthermore, the robustness of our algorithms is demonstrated in Example 5.2, where for each of the 50 random datasets, FUR and SIgFUR give good results without any adjustment of the parameters. We considered many more examples and found similar performances of FUR and SIgFUR.

6. Concluding remarks

We have proposed two algorithms, FUR and SIgFUR, for the rank aggregation problem under the Kemeny framework. Both of these algorithms rest on the cornerstone of two elegant algorithms developed in the paper, referred to as Subiterative Convergence and Greedy Algorithm. Subiterative Convergence Osc attempts to find the consensus ranking by breaking the problem of comparing a large number of objects in one go into aggregating several comparisons of a much smaller number of objects taken at a time. Greedy Algorithm Oga tries to find an improved local minimum by repositioning each object individually within a specified distance of its current position. The required number of calculations in the original Kemeny optimization problem is n! C(n, 2) k, where C(n, 2) = n(n − 1)/2 is the number of object pairs, whereas Osc requires only η! C(η, 2) (p + 1)(n + η − 1) k calculations; thus, for fixed η, Osc requires O(n) calculations. Similarly, Oga requires 2sn C(n, 2) k calculations, which is O(n³). Thus, the time taken by each of the basic algorithms is polynomial-bounded, and even for very large n, the algorithms work within reasonable time.
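The repositioning idea behind Oga can be illustrated with a short sketch. This is a hedged Python illustration in the spirit of the Greedy Algorithm, not the paper's exact routine: rankings are represented as orders of object ids (best first), each object is tried in every slot within s positions of its current one, and improving moves are kept.

```python
from itertools import combinations

def kemeny(order, rankings):
    """Kemeny distance of a candidate order (object ids, best first) from
    the input orders: 2 for every pair of objects ranked oppositely."""
    pos = {obj: p for p, obj in enumerate(order)}
    d = 0
    for r in rankings:
        rpos = {obj: p for p, obj in enumerate(r)}
        for a, b in combinations(order, 2):
            if (pos[a] - pos[b]) * (rpos[a] - rpos[b]) < 0:
                d += 2
    return d

def greedy_pass(order, rankings, s):
    """One greedy pass: for each object in turn, try every reinsertion
    within s positions of its current slot and keep any move that lowers
    the Kemeny distance."""
    order = list(order)
    best_d = kemeny(order, rankings)
    for i in range(len(order)):
        for j in range(max(0, i - s), min(len(order), i + s + 1)):
            trial = list(order)
            trial.insert(j, trial.pop(i))
            d = kemeny(trial, rankings)
            if d < best_d:
                order, best_d = trial, d
    return order
```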
Parameter selection plays an important role in determining the nature of the output ranking from Osc and Oga. Besides, Osc often gets stuck in a ranking that is a local minimum in terms of the Kemeny distance. Similarly, Oga, once performed on a good initial seed ranking, does not offer any further significant improvement. To overcome these issues, FUR and SIgFUR have been developed in this paper. Starting with the mean seed ranking, both FUR and SIgFUR overcome the problem of initial seed selection. By combining Osc and Oga in a loop, FUR and SIgFUR kick-start rankings stuck at local minima, and the different branches of FUR overcome the problem of parameter selection. In the case of SIgFUR, the seed-based iteration inherently perturbs the rankings to obtain a better consensus ranking. Because the number of calculations for Osc and Oga is reduced to polynomial order, FUR and SIgFUR continue to perform well without taking disproportionately long run-times for very high values of n. It is noted that FUR and SIgFUR outperform existing state-of-the-art algorithms in terms of Kemeny distance and run-time. If one were to choose between FUR and SIgFUR, we recommend FUR for the "limited time" scenario and SIgFUR for the "unlimited time" scenario.

There are situations where the rank aggregation problem associates preassigned weights with the attributes. Even though we have not specifically discussed the weighted rank aggregation problem, it is easy to incorporate weights in our proposed algorithms by using the weighted total Kemeny distance. Currently, although input rankings can contain ties, the output rankings of our algorithms are restricted to full rankings. Also, rank aggregation of incomplete rankings is not dealt with in the present study. Some progress has been made on these fronts and will be reported in a future paper.

Acknowledgment

The authors thank Rakhi Singh for some fruitful and interesting discussions. The authors are also very thankful to the four anonymous reviewers for their insightful comments, which greatly improved the presentation of the paper.
Appendix A. Flowchart for FUR
Appendix B. Flowchart for SIgFUR
Appendix C. Complete results for 50 datasets of size 50 × 15

[Multi-page table: for each of the 50 datasets, the Kemeny distance and run-time of both algorithms in each of the four comparisons of Table 3, i.e., FUR vs. QUICK (N = 3–5; s = 17) and SIgFUR vs. QUICK (N1 = 6; ω = 10; s = 25; N2 = 3–5) for the "limited time" scenario, and FUR vs. FAST (N = 3–8; s = 25) and SIgFUR vs. FAST (N1 = 2–6; ω = 100; s = 25; N2 = 2–7) for the "unlimited time" scenario. For instance, on dataset 1, FUR attains Kemeny distance 15,192 in 36.9 s against QUICK's 15,224 in 51.8 s, and 15,192 in 1003 s against FAST's 15,204 in 2517 s. All run-times are in seconds.]
Appendix D. The lists of top-25 and worst-25 cities for the PrefLib ED-00015-001.soc data for the "limited time" scenario (average τX = 0.7460)

FUR, Top-25 (ranks 1–25): Paris, London, Madrid, Washington DC, Singapore, Berlin, Mexico City, The Valley, Victoria, Road Town, San José, Monaco, West Island, Tokyo, Amsterdam, Santiago, Rome, Douglas, San Marino, George Town, Buenos Aires, San Salvador, Beijing, Stanley, San Juan.

FUR, Worst-25 (ranks 216–240): N'Djamena, Yamoussoukro, Honiara, Marigot, Mbabane, Ashgabat, South Tarawa, Belmopan, Funafuti, Gustavia, Mamoudzou, Palikir, Hargeisa, Cockburn Town, Mata-Utu, Naypyidaw, Alofi, Avarua, Hagatna, Grytviken, Sri Jayawardenepura, El Alamein, Episkopi Cantonment, Abuja, Ngerulmud.

SIgFUR, Top-25 (ranks 1–25): identical to FUR except that Beijing is ranked 22 and San Salvador 23 (the reverse of FUR).

SIgFUR, Worst-25 (ranks 216–240): identical to FUR.
References

Ali, A., Meilă, M., 2012. Experiments with Kemeny ranking: what works when? Math. Soc. Sci. 64 (1), 28–40.
Amodio, S., D'Ambrosio, A., Siciliano, R., 2016. Accurate algorithms for identifying the median ranking when dealing with weak and partial rankings under the Kemeny axiomatic approach. Eur. J. Oper. Res. 249 (2), 667–676.
Arrow, K.J., 1951. Social Choice and Individual Values. Wiley, Oxford, England.
Bartholdi, J., Tovey, C.A., Trick, M.A., 1989. Voting schemes for which it can be difficult to tell who won the election. Soc. Choice Welfare 6 (2), 157–165.
de Borda, J.C., 1781. Mémoire sur les élections au scrutin. Histoire de l'Académie Royale des Sciences.
Cohen, A., Mallows, C.L., 1980. Analysis of ranking data. Technical Report, Bell Laboratories Memorandum, Murray Hill, New Jersey.
de Condorcet, M., 1785. Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. Cambridge University Press.
Cook, W.D., 2006. Distance-based and ad hoc consensus models in ordinal preference ranking. Eur. J. Oper. Res. 172 (2), 369–385.
Cook, W.D., Golany, B., Penn, M., Raviv, T., 2007. Creating a consensus ranking of proposals from reviewers' partial ordinal rankings. Comput. Oper. Res. 34 (4), 954–965.
Cook, W.D., Kress, M., Seiford, L.M., 1986. An axiomatic approach to distance on partial orderings. RAIRO-Oper. Res. 20 (2), 115–122.
Cook, W.D., Saipe, A.L., 1976. Committee approach to priority planning: the median ranking method. Cahiers du Centre d'Études de Recherche Opérationnelle 18 (3), 337–351.
D'Ambrosio, A., Amodio, S., Iorio, C., 2015. Two algorithms for finding optimal solutions of the Kemeny rank aggregation problem for full rankings. Electron. J. Appl. Stat. Anal. 8 (2), 198–213.
D'Ambrosio, A., Heiser, W.J., 2009. Decision trees for preference rankings. In: Proceedings of the 7th Meeting of the Classification and Data Analysis Group, Catania, Italy, pp. 133–136.
D'Ambrosio, A., Heiser, W.J., 2016. A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika 81 (3), 774–794.
D'Ambrosio, A., Mazzeo, G., Iorio, C., Siciliano, R., 2017. A differential evolution algorithm for finding the median ranking under the Kemeny axiomatic approach. Comput. Oper. Res. 82, 126–138.
Dwork, C., Kumar, R., Naor, M., Sivakumar, D., 2001. Rank aggregation methods for the web. In: Proceedings of the 10th International Conference on World Wide Web. ACM, pp. 613–622.
Emond, E.J., Mason, D.W., 2002. A new rank correlation coefficient with application to the consensus ranking problem. J. Multi-Criteria Decis. Anal. 11 (1), 17–28.
Hartigan, J.A., Wong, M.A., 1979. Algorithm AS 136: a k-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 28 (1), 100–108.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning. Springer Series in Statistics, New York.
Heiser, W.J., 2004. Geometric representation of association between categories. Psychometrika 69 (4), 513–545.
Heiser, W.J., D'Ambrosio, A., 2013. Clustering and prediction of rankings within a Kemeny distance framework. In: Algorithms from and for Nature and Life. Springer, pp. 19–31.
Kemeny, J.G., 1959. Mathematics without numbers. Daedalus 88 (4), 577–591.
Kemeny, J.G., Snell, J.L., 1962. Mathematical Models in the Social Sciences. Blaisdell Publishing Company.
Kendall, M.G., 1938. A new measure of rank correlation. Biometrika 30 (1/2), 81–93.
Mallows, C.L., 1957. Non-null ranking models. I. Biometrika 44 (1/2), 114–130.
Mattei, N., Walsh, T., 2013. PrefLib: a library of preference data (http://preflib.org). In: Proceedings of the 3rd International Conference on Algorithmic Decision Theory (ADT 2013), pp. 1–7.
Sigmund, P.E., 1963. Nicholas of Cusa and Medieval Political Thought. Harvard University Press, Cambridge.
Thompson, G., 1993. Generalized permutation polytopes and exploratory graphical methods for ranked data. Ann. Stat. 1401–1430.