Enhancement of Fuzzy Rank Aggregation Technique

Mohd Zeeshan Ansari, M.M. Sufyan Beg and Manoj Kumar

Abstract When objects are ranked according to different criteria, the problem arises of assigning each object a position that is nearest to all of its rankings. Generating a single ranking list from such previously ranked lists is called rank aggregation. The aggregated ranking is analyzed by computing the Spearman footrule distance. Choosing the ranking list that minimizes the Spearman footrule distance is an NP-hard problem for partial lists even when the number of lists is greater than four. In the context of the web, rank aggregation has been applied in meta-searching. However, the use of prevailing search engines and meta-search engines, even though some of them are regarded as successful, reveals that none of them has been effective in producing reliable and high-quality results, for many reasons. To improve rank aggregation, we propose an enhancement of the existing Modified Shimura technique through the introduction of a new OWA operator. The enhancement not only achieves better performance but also outperforms other similar techniques.



Keywords Web · Meta-searching · Rank aggregation · Spearman footrule distance · Fuzzy logic

M.Z. Ansari (corresponding author)
Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India

M.M. Sufyan Beg
Department of Computer Engineering, Aligarh Muslim University, Aligarh, India

M. Kumar
Department of Computer Engineering, Delhi Technological University, New Delhi, India

© Springer India 2016. S.C. Satapathy et al. (eds.), Proceedings of the Second International Conference on Computer and Communication Technologies, Advances in Intelligent Systems and Computing 381, DOI 10.1007/978-81-322-2526-3_14


1 Introduction

Several search engines are needed on the World Wide Web because a single search engine is not sufficient for retrieving effective and reliable results. Given the huge amount of information available on the web, no single ranking algorithm is widely accepted, and no single search engine can cover the entire web [1, 2]. The existing search engines suffer from a few drawbacks. Firstly, indexing web data consumes considerable time and space. Secondly, because web data changes rapidly, a search engine must always manage a trade-off between coverage and update frequency [3]. Thirdly, advertisers may pay for ranking, which leads to a loss of fairness and accuracy in the rankings [1, 4]. Lastly, the overall coverage does not include proprietary information such as online digital libraries [3]. To overcome these drawbacks, rank aggregation is applied to web searching, and a search engine built on this technique is called a meta-search engine. Meta-searching exploits the techniques and algorithms of the participating search engines by combining all their results. Moreover, the participating search engines differ in coverage depending upon their underlying algorithms, so their combination extends the coverage across the web. It also enables a consistency check on the results, which reduces spam [2, 5]. In addition, it makes it easier for the user to formulate a query [1, 4]. Therefore, meta-search engines have gained importance over single search engines. A meta-search engine is designed by applying rank aggregation to the results of the participating search engines, so its efficiency and performance depend on the algorithm used for rank aggregation. The applications of rank aggregation also include sports and competitions, collaborative filtering, database middleware, and consumer ratings [6].
Rank aggregation is defined as follows: given n lists of k elements each, generate a new list of elements that is closest to all n lists. Closeness is computed by means of distance measures. The rank aggregation techniques proposed in the literature [1–5, 7–11] are (i) score-based methods; (ii) positional methods; (iii) probabilistic methods; (iv) learning techniques; (v) Markov chains; and (vi) soft computing techniques. The performance of a rank aggregation technique is measured by two distance measures, namely, the Kendall tau distance and the Spearman footrule distance. The distance between the final aggregated list and each input list is calculated, and a normalized aggregated distance is then obtained. A low value of this distance signifies that the technique is effective. The minimization of this normalized aggregated distance is called the Optimal Aggregation Problem, and it is NP-hard for partial lists even when the number of lists is greater than four [1, 4, 7, 8]. On the other hand, a technique that satisfies the Condorcet criteria for rank aggregation offers considerable spam resistance [1, 4, 12]. Markov chains and soft computing techniques have been found to satisfy much of the Condorcet criteria.


We investigated the weakness of the Modified Shimura technique and propose an effective enhancement of it. We designed a new fuzzy membership function for the relative quantifier, which is used by the proposed OWA operator involved in the technique. Moreover, the enhancement preserves the Condorcet criteria. We analyzed the fuzzy rank aggregation algorithms, viz., membership function ordering, Shimura’s technique, and the Modified Shimura technique. We also analyzed Borda’s method and used it as a benchmark technique. These are discussed in Sect. 3. The proposed technique is presented and discussed in Sect. 4. Finally, in Sect. 5, we present a comparative performance analysis based on Spearman footrule distance and running time. It is found that the proposed enhancement of the Modified Shimura technique exhibits better performance for rank aggregation.

2 Background

The definitions used in the succeeding sections are as follows.

Definition 1 Given a set of lists {L1, L2, …, Lk}, rank aggregation is defined as the problem of generating a list L such that L is closest to {L1, L2, …, Lk}.

Definition 2 Given a set of lists {L1, L2, …, Lk}, optimal rank aggregation is defined as the problem of generating a list L such that the Spearman footrule distance F(L, {L1, L2, …, Lk}) is minimized, as in [1, 4].

Definition 3 Given a set of partial lists {L1, L2, …, Lk}, partial footrule optimal aggregation is defined as the problem of generating a list L such that F(L, {L1, L2, …, Lk}) is minimized, as in [4]. This is a special case of optimal rank aggregation where the lists are partial top d lists obtained from search engine results.
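The definitions above refer to the Spearman footrule distance without writing it out. The following is a minimal Python sketch of one common formulation, assuming full lists over the same elements and the normalization by the maximum value ⌊n²/2⌋ used in [1]. The function names are hypothetical, and partial lists would need an additional projection step that is not shown here.

```python
from typing import Dict, Hashable, Sequence


def positions(ranking: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Map each element to its 1-based position in the ranking."""
    return {item: pos for pos, item in enumerate(ranking, start=1)}


def footrule(list_a: Sequence[Hashable], list_b: Sequence[Hashable]) -> float:
    """Normalized Spearman footrule distance between two full lists.

    Assumption: both lists rank exactly the same elements.
    """
    pos_a, pos_b = positions(list_a), positions(list_b)
    if set(pos_a) != set(pos_b):
        raise ValueError("both lists must rank the same elements")
    n = len(pos_a)
    if n < 2:
        return 0.0
    total = sum(abs(pos_a[x] - pos_b[x]) for x in pos_a)
    # The largest possible sum of displacements is floor(n^2 / 2).
    return total / (n * n // 2)


def aggregate_footrule(
    aggregated: Sequence[Hashable], lists: Sequence[Sequence[Hashable]]
) -> float:
    """Average normalized footrule distance F(L, {L1, ..., Lk})."""
    return sum(footrule(aggregated, ranking) for ranking in lists) / len(lists)
```

A lower value of `aggregate_footrule` indicates an aggregated list that sits closer to the input lists.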

3 Rank Aggregation Techniques

3.1 Borda’s Method

Given a set of k lists {L1, L2, …, Lk}, Borda’s method assigns to each element $c_j$ in $L_i$ a score equal to the number of elements ranked below it,

\[ B_i(c_j) = \left|\{c_p : L_i(c_p) > L_i(c_j)\}\right|. \]

The total score of each element is given by

\[ B(c_j) = \sum_{i=1}^{k} B_i(c_j). \]

Sorting the elements in descending order of their Borda scores gives the aggregated list [4]. Borda’s method has been used extensively as a benchmark technique for rank aggregation.
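As an illustration of the scoring just described, here is a small Python sketch of Borda aggregation. It assumes full lists over the same elements, with position 1 being the best rank. The function names and the alphabetical tie-break are illustrative choices, not part of the method as stated in [4].

```python
from typing import Dict, List, Sequence


def borda_scores(lists: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Total Borda score B(c) of every element over all input lists."""
    scores: Dict[str, int] = {}
    for ranking in lists:
        n = len(ranking)
        for index, element in enumerate(ranking):
            # B_i(c): number of elements ranked below c in this list.
            scores[element] = scores.get(element, 0) + (n - 1 - index)
    return scores


def borda_aggregate(lists: Sequence[Sequence[str]]) -> List[str]:
    """Aggregated list: elements sorted by descending total Borda score."""
    scores = borda_scores(lists)
    # Assumption: ties are broken alphabetically to keep the output deterministic.
    return sorted(scores, key=lambda element: (-scores[element], element))


example = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(borda_scores(example))
print(borda_aggregate(example))
```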

3.2 Membership Function Ordering

Using the mean and variance of a document's positions, the mean-by-variance technique yields useful results in [4]. Applying the same two attributes in fuzzy logic, the Gaussian membership function can be obtained as

\[ \mu_{d_i}(x) = \frac{1}{\sqrt{2\pi\sigma_{d_i}^2}} \exp\left[-\frac{1}{2}\,\frac{(x - \bar{x}_{d_i})^2}{\sigma_{d_i}^2}\right], \]

where $\bar{x}_{d_i}$ is the mean position of document $d_i$, $\sigma_{d_i}^2$ is the variance of its positions, and $\mu_{d_i}(x)$ is the ranking membership value of document $d_i$ at position x. The document having the greatest membership value at a given position is assigned that position.
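The ordering step can be sketched in Python as follows. Several details are assumptions made only for this illustration, since the text does not fix them: the lists are full lists over the same documents, the population variance is used, a small floor `eps` guards against zero variance, positions are filled from first to last, and a document that has already been placed is not considered again.

```python
import math
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple


def gaussian_membership(x: float, mu: float, var: float) -> float:
    """Gaussian membership value of a document at position x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def mfo_aggregate(lists: Sequence[Sequence[str]], eps: float = 1e-6) -> List[str]:
    """Membership function ordering over full lists (illustrative sketch)."""
    docs = sorted(set(lists[0]))
    stats: Dict[str, Tuple[float, float]] = {}
    for doc in docs:
        pos = [ranking.index(doc) + 1 for ranking in lists]  # 1-based positions
        # Assumption: floor the variance so the Gaussian stays defined.
        stats[doc] = (mean(pos), max(pvariance(pos), eps))

    result: List[str] = []
    remaining = set(docs)
    for x in range(1, len(docs) + 1):
        # Pick the unplaced document with the greatest membership at position x;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda d: gaussian_membership(x, *stats[d]))
        result.append(best)
        remaining.remove(best)
    return result


print(mfo_aggregate([["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]))
```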

3.3 Shimura’s Technique

Shimura proposed a fuzzy logic technique for rank ordering of objects [13]. It defines a pairwise function $f_y(x)$ as the membership of object x over object y. For a given set of objects, it calculates a relativity function $f(x \mid y)$, which is the fuzzy measure of preferring one object over another. In this way, the membership value of each object with respect to every other object is obtained. Sorting the objects in descending order of membership value yields the aggregated list.

3.4 Modified Shimura Technique

The membership value of the rank of each document is determined using Shimura’s algorithm, which applies the minimum function for choosing a document among n documents. Because of this minimum function, the descending sort on membership values results in many ties. Therefore, the minimum function was replaced by an OWA operator. To calculate the weights of the OWA operator, the “at least half” linguistic quantifier is applied as follows.


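For a relative quantifier with m criteria, the weights can be calculated as

\[ w_i = Q\left(\frac{i}{m}\right) - Q\left(\frac{i-1}{m}\right), \quad i = 1, 2, 3, \ldots, m, \quad \text{with } Q(0) = 0, \]

where Q(r) is the membership function of the relative quantifier, defined as

\[ Q(r) = \begin{cases} 0 & \text{if } r < a, \\ \dfrac{r - a}{b - a} & \text{if } a \le r \le b, \\ 1 & \text{if } r > b. \end{cases} \]

For the “at least half” quantifier, a = 0 and b = 0.5.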
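The weight computation can be illustrated with a short Python sketch. The function names are hypothetical. The `owa` helper follows the usual OWA convention of applying the ith weight to the ith largest value.

```python
from typing import Callable, List, Sequence


def linear_quantifier(r: float, a: float, b: float) -> float:
    """Piecewise-linear relative quantifier Q(r) with parameters a < b."""
    if r < a:
        return 0.0
    if r > b:
        return 1.0
    return (r - a) / (b - a)


def at_least_half(r: float) -> float:
    """The 'at least half' quantifier: a = 0, b = 0.5."""
    return linear_quantifier(r, 0.0, 0.5)


def owa_weights(m: int, quantifier: Callable[[float], float]) -> List[float]:
    """w_i = Q(i/m) - Q((i-1)/m) for i = 1, ..., m."""
    return [quantifier(i / m) - quantifier((i - 1) / m) for i in range(1, m + 1)]


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to values sorted in descending order."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


weights = owa_weights(4, at_least_half)
print(weights)
print(owa([0.2, 1.0, 0.6, 0.4], weights))
```

With m = 4 the “at least half” quantifier puts all of the weight on the two largest values, so the OWA result lies between the minimum and the maximum of its arguments.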

4 Proposed Rank Aggregation Technique

Score-based methods such as Borda’s method have been found ineffective in rank aggregation. The application of positional methods poses a problem of NP-hardness in the case of partial lists. Therefore, researchers have been using soft computing techniques to optimize the Spearman footrule distance. The fuzzy logic technique of Shimura exhibits poor performance and was hence improved to Modified Shimura. Modified Shimura is in turn enhanced to the proposed technique by the introduction of a new OWA operator. We designed a non-linear membership function for the calculation of the weights of the proposed OWA operator.

4.1 Proposed Enhancement in Modified Shimura Technique

Let U be the universal set of objects. The pairwise function $f_y(x)$ is a membership function of object x with respect to y, with $f_x(x) = 1$ and $f_y(x) = 0$ for $x \notin U$. The relativity function $f(x \mid y)$, which is the fuzzy measurement of choosing x over y, is defined as

\[ f(x \mid y) = \frac{f_y(x)}{\max\left[f_y(x), f_x(y)\right]}. \]

Now, for a given set $X = \{x_1, x_2, x_3, \ldots, x_n\}$ and an object $x \in X$, the relativity function for choosing x among X is defined as

\[ f(x \mid X) = \min\left[f(x \mid x_1), \ldots, f(x \mid x_n)\right], \qquad f(x \mid x) = 1. \]


In meta-searching, we have the results of N search engines, giving N lists, L1, …, LN, such that each list is a subset of U. For k = 1, …, N and i, j = 1, …, n, the pairwise function can be defined as

\[ f_{x_j}(x_i) = \frac{\left|\{k : L_k(x_i) < L_k(x_j)\}\right|}{N}, \]

where $L_k(x)$ is the rank of document x in list $L_k$, so the numerator counts the lists that rank $x_i$ ahead of $x_j$. For n documents in each of the N lists, the relativity function of each element with respect to each other element can be defined as

\[ C_i = \sum_{j=1}^{n} f(x_i \mid x_j)\, w_j. \]
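To make the pairwise function, the relativity function, and the score $C_i$ concrete, the following Python sketch computes them for a toy input. It assumes full lists over the same documents, and all function names are hypothetical. The weights are passed in as an argument, so any OWA weight vector of length n can be used.

```python
from typing import Dict, List, Sequence


def pairwise_matrix(lists: Sequence[Sequence[str]], docs: Sequence[str]) -> List[List[float]]:
    """f[i][j] = f_{x_j}(x_i): fraction of lists ranking docs[i] ahead of docs[j]."""
    ranks: List[Dict[str, int]] = [
        {doc: pos for pos, doc in enumerate(ranking, start=1)} for ranking in lists
    ]
    n = len(docs)
    f = [[1.0] * n for _ in range(n)]  # diagonal: f_x(x) = 1
    for i, xi in enumerate(docs):
        for j, xj in enumerate(docs):
            if i != j:
                ahead = sum(1 for r in ranks if r[xi] < r[xj])
                f[i][j] = ahead / len(lists)
    return f


def relativity_matrix(f: List[List[float]]) -> List[List[float]]:
    """rel[i][j] = f(x_i | x_j) = f_{x_j}(x_i) / max(f_{x_j}(x_i), f_{x_i}(x_j))."""
    n = len(f)
    rel = [[1.0] * n for _ in range(n)]  # f(x | x) = 1
    for i in range(n):
        for j in range(n):
            if i != j:
                # For full lists f[i][j] + f[j][i] = 1, so the maximum is at least 0.5.
                rel[i][j] = f[i][j] / max(f[i][j], f[j][i])
    return rel


def shimura_scores(rel: List[List[float]]) -> List[float]:
    """f(x_i | X): minimum of each row, as in Shimura's technique."""
    return [min(row) for row in rel]


def owa_scores(rel: List[List[float]], weights: Sequence[float]) -> List[float]:
    """C_i: the jth weight is applied to the jth highest value of row i."""
    return [
        sum(w * v for w, v in zip(weights, sorted(row, reverse=True)))
        for row in rel
    ]


def rank_by_score(docs: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Documents in descending order of score; ties keep the input order."""
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return [docs[i] for i in order]


lists = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
docs = ["a", "b", "c"]
rel = relativity_matrix(pairwise_matrix(lists, docs))
print(rank_by_score(docs, shimura_scores(rel)))
```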

Here, $C_i$ is the membership value of the rank of element $x_i$, where $f(x_i \mid x_j)$ is the jth highest membership value and $w_j$ are the weights of the proposed OWA operator. The weights of the OWA operator are calculated using the membership function shown in Fig. 1. We chose to introduce a non-linearity in the membership function while preserving all the characteristics of Modified Shimura. It gives the values of the weights in a decreasing fashion when moving over membership values from 0 to 0.5, as desired. It yields better performance and also takes less time. The membership function of the proposed quantifier is defined as

\[ Q(r) = \begin{cases} 4r^2 & \text{if } 0 \le r \le \tfrac{1}{2}, \\ 1 & \text{if } \tfrac{1}{2} \le r \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

The membership function of (1) is represented in Fig. 1. When applied to partial lists, the proposed operator achieves better results with the non-linear membership function than with linguistic quantifiers.

Fig. 1 Membership function of proposed technique (plot of Q(r); axis ticks at 0.5 and 1)
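A brief Python sketch of the proposed quantifier and the OWA weights derived from it is given below. It reuses the weight formula $w_i = Q(i/m) - Q((i-1)/m)$ from Sect. 3.4. The function names and the small relativity matrix are illustrative assumptions, not data from the experiments.

```python
from typing import Callable, List, Sequence


def proposed_quantifier(r: float) -> float:
    """Non-linear membership function Q(r) of Eq. (1)."""
    if 0.0 <= r <= 0.5:
        return 4.0 * r * r
    if 0.5 < r <= 1.0:
        return 1.0
    return 0.0


def owa_weights(m: int, quantifier: Callable[[float], float]) -> List[float]:
    """w_i = Q(i/m) - Q((i-1)/m) for i = 1, ..., m."""
    return [quantifier(i / m) - quantifier((i - 1) / m) for i in range(1, m + 1)]


def proposed_scores(rel: Sequence[Sequence[float]]) -> List[float]:
    """C_i for each row of a relativity matrix, using the proposed weights."""
    weights = owa_weights(len(rel), proposed_quantifier)
    return [
        sum(w * v for w, v in zip(weights, sorted(row, reverse=True)))
        for row in rel
    ]


# Hypothetical relativity matrix for three documents (rows: f(x_i | x_j)).
rel = [
    [1.0, 1.0, 1.0],
    [0.5, 1.0, 1.0],
    [0.0, 0.5, 1.0],
]
print(owa_weights(4, proposed_quantifier))
print(proposed_scores(rel))
```

Because Q(0) = 0 and Q(1) = 1, the weights always sum to 1, so each score stays within the range of the relativity values in its row.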


5 Experiments and Results

5.1 Experimental Setup

We collected the search engine results of Bing, Gigablast, Google, MySearch, and Wow. The queries executed on these search engines, also used in [1, 4], are as follows: affirmative action, alcoholism, amusement parks, architecture, bicycling, blues, cheese, citrus groves, classical guitar, computer vision, cruises, Death Valley, gardening, graphic design, gulf war, HIV, lyme disease, mutual funds, parallel architecture, sushi and zener. For each query, the top 100 results returned by each of the five search engines are collected. This collection is used as the benchmark dataset for the implementation of the algorithms discussed in the previous sections. Each algorithm produces an aggregated list as its output. The average normalized Spearman footrule distance of each aggregated list is calculated, and the time consumed by each algorithm is also recorded.

5.2 Results of Proposed Technique

An effective rank aggregation algorithm should have a low value of Spearman footrule distance. The average normalized footrule distance of each technique is shown in Fig. 2, which gives their comparative performance. The proposed technique has the lowest value. Modified Shimura takes second place, followed by Membership Function Ordering. The running time of the rank aggregation techniques is shown in Table 1, where the time taken by each technique is expressed relative to Borda’s method. Among the fuzzy techniques, the proposed technique takes the minimum time.

Fig. 2 Comparison of average normalized footrule distance of rank aggregation techniques (bar chart; one axis lists the rank aggregation techniques Proposed Technique, Modified Shimura, Shimura, MFO, and Borda; the other axis shows the aggregate footrule distance from 0.25 to 0.4)

Table 1 Running time

Aggregation technique    Time relative to Borda
Borda                    1.0
MFO                      537.0248
Shimura                  13.15276
Modified Shimura         4.206506
Proposed technique       2.956860

6 Conclusion

A new rank aggregation technique is proposed, which finds application in meta-searching. The proposed technique is a further enhancement of the Modified Shimura technique. We have proposed a new OWA operator by designing a non-linear membership function for its relative quantifier. The analysis shows that the proposed enhancement is more effective and more efficient than the Modified Shimura technique. We have also implemented and analyzed the other existing fuzzy rank aggregation techniques, namely membership function ordering and Shimura’s technique for fuzzy ordering, and put forward a comparative analysis of the existing and proposed techniques. The proposed technique is found to be effective in terms of Spearman footrule distance and time-efficient against the other fuzzy techniques.

References

1. Dwork, C., Kumar, R., Naor, M., Sivakumar, D.: Rank aggregation methods for the web. In: Proceedings of the Tenth ACM International Conference on World Wide Web, pp. 613–622 (2001)
2. Akritidis, L., Katsaros, D., Bozanis, P.: Effective rank aggregation for meta searching. J. Syst. Softw. 84, 130–143 (2010)
3. Renda, M.E., Straccia, U.: Web meta search: rank vs. score based rank aggregation methods. In: Proceedings of the ACM Symposium on Applied Computing, March 09–12 (2003)
4. Beg, M.M.S., Ahmad, N.: Soft computing techniques for rank aggregation on the world wide web. World Wide Web J.: Internet Inf. Syst. 6, 5–22 (2003)
5. Aslam, J.A., Montague, M.: Models of meta search. In: Proceedings of 24th SIGIR 2001, pp. 276–284
6. Dwork, C., Kumar, R., Naor, M., Sivakumar, D.: Rank aggregation revisited. Manuscript (2001)
7. Beg, M.M.S., Ahmad, N.: Fuzzy logic and rank aggregation for the world wide web. Stud. Fuzziness Soft Comput. J. 137, 24–46 (2004)
8. Yasutake, S., Hatano, K., Takimoto, E., Takeda, M.: Online rank aggregation. In: Proceedings of 24th International Conference ALT 2013, pp. 68–82 (2013)
9. Qin, T., Geng, X., Liu, T.Y.: A new probabilistic model for rank aggregation. Proc. Adv. Neural Inf. Proc. Syst. 23, 681–689 (2010)
10. Liu, Y.T., Liu, T.Y., Qin, T., Ma, Z.M., Li, H.: Supervised rank aggregation. In: Proceedings of the ACM International Conference on World Wide Web, pp. 481–489 (2007)
11. Ailon, N.: Aggregation of partial rankings, p-ratings and top-m lists. Algorithmica 57(2), 284–300 (2008)
12. Ross, T.J.: Fuzzy Logic with Engineering Applications. McGraw-Hill, New York (1997)
13. Shimura, M.: Fuzzy sets concepts in rank ordering objects. J. Math. Anal. Appl. 43, 717–733 (1973)
14. Borda, J.C.: Mémoire sur les élections au scrutin. Histoire de l’Académie Royale des Sciences (1781)
