
A Review: Ranking documents using Ranking Algorithms & Techniques

Rana Azhar-ul-Haq (12031719-020)
University Of Gujrat, Punjab Pakistan
[email protected]

Abstract

Data mining extracts important and useful hidden information from large databases. It includes many techniques for finding hidden patterns, such as Text Mining, Information Extraction and Information Retrieval. Information Retrieval (IR) models are used to rank relevant documents, which helps users with the problem of document selection. Many algorithms, such as InDegree, HITS, PageRank and SALSA, have been developed to implement these methods in practice. These methods are largely query based. This review paper discusses the ranking algorithms listed above along with their performance against a given query string. It also compares the techniques in terms of average distance, average intersection and weighted intersection.

Keywords: Information Retrieval, Ranking, Query String, PageRank, In-Degree Ranking, HITS, SALSA

1. Introduction

Data mining is used to locate hidden patterns in large datasets. Many areas of data mining, such as Social Network Mining, Natural Language Processing (NLP), Text Mining, Information Extraction (IE), Information Retrieval (IR) and Learning to Rank [15][42][46][25][16][2], have been developed to deal with huge datasets. An Information Retrieval system [1] uses different algorithms to retrieve and rank documents from different sources; these include InDegree, HITS, PageRank and SALSA. Typically, a web search engine retrieves information [1] by matching the terms of the user query against the terms of the documents in its databases. Web search engines are the most powerful Information Retrieval (IR) applications [2].
In an information retrieval (IR) system, documents are ranked against the user's query in order to find the relevant documents in a large database or dataset. The ranking algorithm scores the documents so that relevant documents receive a higher score [3] than non-relevant documents. The retrieval accuracy of an IR system [4] is determined by its scoring function. One way to assign a value to a keyword is its frequency [5] in the dataset or database; e.g. the number of times a word appears in a document can serve as the value of that word in the term vector [6] of the document.

Fig: 1 Information Retrieval Ranking System (blocks: Dataset; Information system; Information retrieval; Information retrieval Query Processing; Retrieval Models and Ranking; Evaluation of ranking retrieval results)
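As a rough illustration of the term-frequency scoring mentioned in the introduction, the following Python sketch counts how often each query term occurs in a document and ranks documents by the sum of these counts. The function names, the whitespace tokenization and the sample documents are assumptions made for illustration only; they are not taken from any of the cited systems.

```python
from collections import Counter

def term_frequencies(text):
    """Return a term -> count mapping for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())

def score(query, document):
    """Sum the document frequencies of the query terms (hypothetical scoring)."""
    tf = term_frequencies(document)
    return sum(tf[term] for term in query.lower().split())

def rank(query, documents):
    """Order document ids by decreasing score; ties keep their input order."""
    return sorted(documents, key=lambda doc_id: -score(query, documents[doc_id]))

# Hypothetical toy collection.
docs = {
    "d1": "table tennis rules and table tennis rackets",
    "d2": "weather report for the tennis final",
    "d3": "vintage cars",
}
ranking = rank("table tennis", docs)
```

Real IR systems normally combine term frequency with other signals (for example document length or inverse document frequency); this sketch keeps only the raw count described in the text.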

2. Related Work

Web-based search has been studied as a typical information retrieval (IR) problem. Link analysis ranking (LAR) techniques such as HITS, PageRank [11, 12] and SALSA measure the popularity of a web page from its inter-link pattern, and this popularity is used to rank the search results for better search performance. In the PageRank algorithm, the score of a page depends on the pages that link to it. In the HITS algorithm, each page is assigned a hub score and an authority score. Pages with a high PageRank [Founder of PR Larry Page, 2009], hub score or authority score are likely to be high-quality pages [13] [14]. In particular, Zhu and Gauch [14] studied six metrics for assessing the quality of web pages, including prevalence, availability of the page, authority, fame and cohesiveness; they evaluated the quality of these metrics and used them to improve ranking quality. PaperRank is a ranking method related to PageRank that is based on reading values [15]. The In-Degree algorithm ranks the pages of the web graph by their in-degree.

The structure of the web graph [48] captures the relation among the strongly connected component (SCC), the up-stream nodes (IN), the down-stream nodes (OUT) and the disconnected components. This structure is given below.

Fig 2: The web graph structure

3. Techniques

This review paper discusses the work of many researchers [41] [42] [16] [22] [44] [45]. In these papers, each researcher explored specific properties of a ranking algorithm.

Types

There are four algorithms:
1) In-Degree
2) HITS
3) PageRank
4) SALSA

3.1 In-Degree Ranking

In-Degree ranking is a simple heuristic that can be viewed as a predecessor of link analysis ranking (LAR): it ranks pages according to their popularity [16]. The popularity of a page is measured by the number of pages that link to it, that is, by the in-degree [9] of the page in the web graph. This simple heuristic [47] was applied in many search engines [17].
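The In-Degree heuristic can be sketched in a few lines of Python. The edge-list representation, the function name `in_degree_rank`, the decision to ignore self-links and duplicate links, and the alphabetical tie-break are assumptions of this sketch rather than details given in the reviewed papers.

```python
from collections import Counter

def in_degree_rank(edges):
    """Rank pages by the number of distinct pages linking to them."""
    # Assumption: duplicate links and self-links do not add popularity.
    distinct_links = {(src, dst) for src, dst in edges if src != dst}
    in_degree = Counter(dst for _, dst in distinct_links)
    pages = {page for edge in edges for page in edge}
    # Highest in-degree first; ties are broken alphabetically.
    return sorted(pages, key=lambda page: (-in_degree[page], page))

# Hypothetical toy graph: (source page, target page).
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "c")]
ranking = in_degree_rank(edges)
```

Here page `c` has three in-links and page `d` has two, so they are ranked ahead of `a` and `b`, which have none.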

Drawbacks: Kleinberg [18] argues convincingly that the In-Degree algorithm is too simple to capture the authoritativeness of a node, even when the web is restricted to a query-dependent subset. If a web search engine applied such a simple ranking, it would also be very easy for a webmaster to manipulate authority.

3.2 Page Rank (PR)

The PageRank algorithm is described in the work of Brin & Page [19], "The PageRank Citation Ranking: Bringing Order to the Web". Brin & Page [19] extended the In-Degree idea by observing that not all links carry equal weight. For example, a web page may have only one link, from the Hotmail home page, but this one link is very significant. Such a page will rank higher than pages with more links from obscure places. PageRank [19] is a query-independent algorithm that produces a PageRank value for every web page. In the PageRank random walk some nodes are chosen according to a distribution D, whose values lie between 0 and 1. Basically, PageRank is a probability distribution [20] that represents the likelihood that a person randomly clicking on links will arrive at any particular page. PageRank can be calculated for collections of documents of any size. The computation requires several passes, called "iterations", through the collection to adjust the approximate PageRank values so that they more closely reflect [22] the theoretical true values. A probability is expressed as a numeric value between 0 and 1; a probability of 0.5 is commonly expressed as a "50% chance" of something happening. Hence, a PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank.
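The iterative computation described above can be sketched as a power iteration. The sketch below is a minimal illustration under stated assumptions, not the implementation used by any search engine: the jump distribution D is taken to be uniform, the damping value 0.85 and the fixed number of iterations are conventional choices that the reviewed text does not specify, and pages without out-links spread their rank uniformly.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration; `links` maps each page to the pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random-jump term: uniform distribution D over all pages (assumption).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank

# Hypothetical toy graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

Because every step redistributes all of the probability mass, the values in `ranks` sum to 1, which matches the reading of PageRank as a probability distribution over pages.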

Fig 3: PageRank links

Drawbacks: PageRank has some limitations. The main one is that it favors older web pages [23]: a new page, even a very good one, will not have many links unless it is part of an existing site.

3.3 HITS Ranking (Hyperlink-Induced Topic Search)

The HITS algorithm comes from the work of Kleinberg [24], "Authoritative Sources in a Hyperlinked Environment". Kleinberg proposed a two-level propagation scheme in which endorsement is conferred on authorities through hubs, rather than directly between authorities. HITS is a query-dependent algorithm. In HITS, every web page has two identities. The first, the hub identity, captures the quality of the page as a pointer to useful resources; the second, the authority identity, captures the quality of the page as a resource itself. Independently of Brin & Page, Kleinberg [24] proposed a more refined notion of the importance of web pages. PageRank computes its ranking on the entire web graph, whereas HITS tries to distinguish between hubs and authorities within the subgraph of relevant pages.

Given a set of web pages and a particular query string σ, we could restrict attention to the set Qσ of all pages containing the query string. This has two drawbacks: first, the set may contain millions of web pages (with the associated computational cost); second, some or most of the best authorities may not belong to this set, because they do not contain the query string. An implementation of HITS therefore starts from a root set [25] Rσ of pages obtained using a text-based web search. This set is then expanded to a set Sσ by adding any page pointed to by a page in the root set, together with pages that point to a page in the root set. Here a parameter [26] d is introduced: a single page in Rσ is allowed to bring at most d pages pointing to it into Sσ.
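The root-set expansion with the parameter d can be sketched as follows. The function name `expand_root_set`, the dictionary-based link tables, the default value of `d` and the choice of simply taking the first `d` in-linking pages are assumptions of this sketch.

```python
def expand_root_set(root, out_links, in_links, d=50):
    """Grow a root set into a base set.

    `out_links[p]` lists the pages p points to and `in_links[p]` lists the
    pages pointing to p. Each root page brings in every page it points to,
    but at most `d` of the pages pointing to it.
    """
    base = set(root)
    for page in root:
        base.update(out_links.get(page, []))
        # Assumption: keep the first d in-linking pages in the given order.
        base.update(in_links.get(page, [])[:d])
    return base

# Hypothetical link tables for a two-page root set.
out_links = {"r1": ["x", "y"], "r2": ["y"]}
in_links = {"r1": ["p1", "p2", "p3"], "r2": ["p4"]}
base = expand_root_set(["r1", "r2"], out_links, in_links, d=2)
```

With `d=2`, page `p3` is left out because `r1` has already brought in two in-linking pages; this cap keeps a very popular root page from flooding the base set.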

Figure 4: Expansion of the root set R
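The hub and authority identities can be computed with the usual mutually reinforcing updates: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. The Python sketch below is a minimal illustration; the edge-list input, the fixed number of iterations and the L2 normalization are assumptions, not details taken from the reviewed text.

```python
import math

def hits(edges, iterations=50):
    """Iterate hub/authority updates with L2 normalization."""
    pages = {page for edge in edges for page in edge}
    hub = dict.fromkeys(pages, 1.0)
    auth = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = dict.fromkeys(pages, 0.0)
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages linked to.
        new_hub = dict.fromkeys(pages, 0.0)
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = new_hub
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm
    return hub, auth

# Hypothetical base-set graph: three hubs, two authorities.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h3", "a1")]
hub, auth = hits(edges)
```

In this toy graph `a1` is pointed to by all three hubs and becomes the top authority, while `h1`, which points to both authorities, becomes the top hub.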

Drawbacks: The main difficulty with HITS is its query dependence [27]. By adding links to and from their own web pages, users can also slightly influence the authority and hub scores of their pages. Another drawback of HITS is the problem of topic drift [28].

3.4 SALSA

The SALSA algorithm was developed by Lempel and Moran [29] in their paper "SALSA: The Stochastic Approach for Link-Structure Analysis"; it combines ideas from both PageRank and HITS. SALSA performs a random walk on the bipartite hubs-and-authorities graph, alternating between the hub side and the authority side. The walk starts from an authority node selected uniformly at random and then proceeds by alternating backward and forward steps. When the walk is at a node on the authority side of the bipartite graph [51], the algorithm selects one of the incoming links at random and moves to the hub node on the hub side. When the walk [30] is at a node on the hub side, it selects one of the outgoing links uniformly at random and moves forward to an authority. The authority weights are defined to be the stationary distribution of this random walk.

Fig 5: SALSA Equation

Recall that Ga = (A, Ea) denotes the authority graph [31], in which an undirected edge connects two authorities whenever they share a hub. The random walk corresponds to a Markov chain on the authority graph that moves from authority i to authority j with probability Pa(i, j). When Ga has two or more components, SALSA selects a starting node at random and the walk stays within the connected component that contains it; for example, if j is the component that contains node i, the whole walk from i takes place inside j. When Ga has only one component, it is called an authority connected graph; the Markov chain is then irreducible and SALSA reduces to the In-Degree algorithm. SALSA can also be seen as a variation of HITS: in HITS, hubs pass their weights on to the authorities. Thus, in addition [33] to highly authoritative pages, we expect to find hub pages, that is, pages that have links to multiple related authority pages. Hub pages play a vital role in identifying and discarding unrelated pages that merely have a large in-degree. Hubs and authorities exhibit a mutually reinforcing relationship [34]: a good hub is a page that points to many authorities, and a good authority is a page that many hubs link to. Figure 6 shows an example of hubs and authorities, together with unrelated pages of large in-degree.

Figure 6: Example of Hubs & Authorities
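The SALSA random walk on the authority side can be simulated by pushing probability mass one backward step (authority to hub) and one forward step (hub to authority) at a time. The sketch below is an illustration under assumptions: the edge list has no duplicate links, the walk starts from the uniform distribution over authorities, and a fixed number of iterations stands in for a convergence test.

```python
from collections import defaultdict

def salsa_authority(edges, iterations=100):
    """Approximate the stationary distribution of the SALSA authority walk."""
    out_links = defaultdict(list)  # hub -> authorities it points to
    in_links = defaultdict(list)   # authority -> hubs pointing to it
    for src, dst in edges:
        out_links[src].append(dst)
        in_links[dst].append(src)
    authorities = sorted(in_links)
    score = {a: 1.0 / len(authorities) for a in authorities}
    for _ in range(iterations):
        # Backward step: each authority splits its mass over its in-links.
        hub_mass = defaultdict(float)
        for a, mass in score.items():
            share = mass / len(in_links[a])
            for h in in_links[a]:
                hub_mass[h] += share
        # Forward step: each hub splits its mass over its out-links.
        new_score = dict.fromkeys(authorities, 0.0)
        for h, mass in hub_mass.items():
            share = mass / len(out_links[h])
            for a in out_links[h]:
                new_score[a] += share
        score = new_score
    return score

# Hypothetical authority-connected graph with in-degrees a=1, b=3, c=1.
edges = [("h1", "a"), ("h1", "b"), ("h2", "b"), ("h3", "b"), ("h3", "c")]
scores = salsa_authority(edges)
```

On this authority-connected example the scores approach 0.2, 0.6 and 0.2, that is, each authority's in-degree divided by the total number of links, which illustrates the reduction of SALSA to In-Degree mentioned in the text.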

Drawbacks: The algorithm [35] may fail to establish that a formula is an invariant, because the check can return a state that is unreachable. It does not work in all cases because of the associated problem of incompleteness. Compared with traditional model checking, the main drawback of the Salsa approach is this incompleteness: the failure of a proof does not imply that the property does not hold.

3.5 Further Extensions

The works of Kleinberg [24] and of S. Brin and L. Page [19] were followed by a number of extensions and variations of these algorithms. Bharat and Henzinger improved the HITS algorithm for topic distillation [37]; their work uses textual information to set the weights of nodes and links. Rafiei and Mendelzon present another modification of the HITS algorithm that uses random jumps, similar to SALSA. Tomlin generalizes PageRank to compute flow values for the edges of the web graph, from which a TrafficRank value is computed for each web page. Probabilistic and statistical techniques are also used for computing rankings. The pHITS algorithm assumes a probabilistic model in which a link is caused by latent factors [52] or topics, and uses the Expectation Maximization (EM) algorithm to compute the authority weights. Another line of research aims to combine PageRank with temporal information [38]. The temporal information concerns outgoing links, which are under the control of the source page and are therefore susceptible to bias. There are also many extensions of PageRank that work on graphs with different levels of granularity, e.g. HostRank (PageRank computed on hosts rather than on web pages).

4 Comparison

This section compares the results of all these ranking algorithms. The results are taken from recent papers [39] [48]. In those studies, the well-known ranking algorithms introduced in the sections above are compared with some newer extensions.

4.1 Queries

The ranking algorithms were tested on a set of base queries. The following are examples of base queries:

geometry, gun control, Shakespeare, table tennis, weather, vintage cars, affirmative action, complexity, automobile industries, architecture, net censorship, search engines, computational complexity, computational geometry, computational theory.

4.2 Base Set construction

The base set was constructed following the method already explained by Kleinberg. The initial root set was obtained by submitting the query to Google and downloading the first 200 pages returned by the search engine.

4.3 Measures

The measurements check the accuracy and quality of the ranking algorithms over the top-10 results. Accuracy indicates whether the documents placed at good positions in the ranking are relevant to the user's query. This approach is similar to the one used at the TREC conferences for evaluating search algorithms. We also examined how the algorithms relate to each other. The rankings produced by the different algorithms are compared using the geometric distance measure d1, which is calculated with the distance equations below.

4.3.1 Geometric Distance Measure

In the geometric distance measure, the LAR (Link Analysis Ranking) vectors are viewed as points in an n-dimensional space, which makes it feasible to use ordinary geometric measures of distance. The distance used is the Manhattan distance [41], that is, the L1 distance. For two LAR vectors a1 and a2, the distance between a1 and a2 is measured as

Fig 7: Equation of GDM
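The Manhattan (L1) distance between two weight vectors is the sum of the absolute differences of their components. The Python sketch below shows only this plain form; any normalization or scaling of the LAR vectors that the original equation may apply is not reproduced, and the example vectors are hypothetical.

```python
def manhattan_distance(a1, a2):
    """Sum of absolute component-wise differences of two equal-length vectors."""
    if len(a1) != len(a2):
        raise ValueError("vectors must have the same number of components")
    return sum(abs(x - y) for x, y in zip(a1, a2))

# Hypothetical weight vectors over the same three pages.
a1 = [0.5, 0.3, 0.2]
a2 = [0.2, 0.3, 0.5]
distance = manhattan_distance(a1, a2)
```

Two algorithms that assign identical weights to every page have distance 0; the distance grows as the weights diverge.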

4.3.3 Strict Rank Distance

Another comparison measure is the strict rank distance [40]. It is based on Kendall's tau $K^{(1)}$, the number of swaps that are necessary to convert one permutation into another. To compare the rankings produced by the algorithms we use $d_r^{(0)} = K^{(0)}$, which we refer to as the weak rank distance, and $d_r^{(1)} = K^{(1)}$, which we refer to as the strict rank distance. The maximum value of Kendall's tau $K$ is $n(n-1)/2$. The strict rank distance $d_r^{(1)}$ is defined as follows

Fig 8: Equation of SRK
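For two rankings of the same items without ties, Kendall's tau distance equals the number of item pairs that the two rankings order differently, which is also the number of adjacent swaps needed to turn one ranking into the other. The sketch below covers only this tie-free case; the different treatment of ties that separates the weak and the strict variants is not modelled, and the function name is an assumption.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Count item pairs ordered differently by two tie-free rankings."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is discordant when the two rankings disagree on its order.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant

# Reversing a ranking of n = 3 items gives the maximum n(n - 1)/2 = 3.
reversed_distance = kendall_tau_distance(["a", "b", "c"], ["c", "b", "a"])
one_swap_distance = kendall_tau_distance(["a", "b", "c"], ["a", "c", "b"])
```

Dividing the count by n(n - 1)/2 would give a value between 0 and 1, which makes distances comparable across result lists of different lengths.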

4.5 Results

Table 4.5 shows the average distance (d1) between the final rankings. A value close to 0 means that the two algorithms construct [53] similar rankings, while a high value indicates strong disagreement between the rankings. The SALSA and In-Degree algorithms show nearly identical results [51], while PageRank and HITS produce very different rankings [55].

Table 4.5 Average distance of d1

            HITS   PageRank   InDegree   SALSA
HITS        -      2.62       2.20       2.23
PageRank    2.62   -          1.92       1.91
InDegree    2.20   1.92       -          1.10
SALSA       2.23   1.91       1.10       -

From another point of view, the average d1 distance does not provide a good approximation of the difference between these algorithms [49], because the ranking values of the returned pages are not important to the user. The average dr distance compares the actual order in which the results are returned.

Table 4.5(a) Average distance dr

            HITS   PageRank   InDegree   SALSA
HITS        -      1.51       1.40       1.42
PageRank    1.51   -          1.32       1.29
InDegree    1.40   1.23       -          1.07
SALSA       1.42   1.30       1.07       -

As seen before, the In-Degree and SALSA algorithms show very similar results, whereas the PageRank and HITS algorithms [51] differ. Two further comparison measures, the average intersection and the weighted intersection, give an accurate idea of the overlap between two algorithms [50] in the typical first page of results.
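The intersection measure can be sketched as the number of documents that two algorithms have in common among their top-k results, averaged over the test queries. The function names and the toy rankings below are hypothetical. The weighted intersection is not shown, because its exact definition is not given in the text.

```python
def top_k_intersection(rank_a, rank_b, k=10):
    """Number of documents shared by the top-k results of two rankings."""
    return len(set(rank_a[:k]) & set(rank_b[:k]))

def average_intersection(ranking_pairs, k=10):
    """Mean top-k intersection over a list of (rank_a, rank_b) pairs."""
    total = sum(top_k_intersection(a, b, k) for a, b in ranking_pairs)
    return total / len(ranking_pairs)

# Hypothetical rankings of two algorithms for two queries.
pairs = [
    (["a", "b", "c"], ["b", "c", "d"]),
    (["x", "y"], ["y", "z"]),
]
average = average_intersection(pairs, k=3)
```

For k = 10 the measure ranges from 0 (no shared result on the first page) to 10 (the same ten results, possibly in a different order).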

Table 4.5(b) Average I (10) measures

            HITS   PageRank   InDegree   SALSA
HITS        -      1.0        1.41       4.0
PageRank    1.0    -          1.30       1.31
InDegree    1.41   1.31       -          1.97
SALSA       4.0    1.31       1.97       -

The results for the weighted intersection are compared below:

Table 4.5(c) Average WI (10) measures

            HITS   PageRank   InDegree   SALSA
HITS        -      1.53       1.42       1.44
PageRank    1.53   -          1.32       1.31
InDegree    1.42   1.23       -          1.07
SALSA       1.41   1.31       1.07       -

4. Conclusion

This study shows that the In-Degree and SALSA algorithms provide nearly the same results, whereas HITS and PageRank share, on average, only one result in the top 10. PageRank and HITS are different link analysis ranking (LAR) algorithms that employ different models to calculate web page rank. Among these algorithms, PageRank is used as the basis of the very popular Google search engine. This popularity is due to features such as efficiency, feasibility, low query-time cost and low susceptibility to localized links, which are absent in the HITS, SALSA and In-Degree algorithms. However, although the HITS, SALSA and In-Degree algorithms themselves have not been very popular, different extensions of them have been employed in a number of different web pages.

5 References

1. http://delab.csd.auth.gr/~dimitris/courses/ir_spring06/page_rank_computing/01cc993 33c00501ddab030.pdf (Last accessed date: 16-06-2013)
2. Extended Boolean Retrieval for Systematic Biomedical Reviews, Stefan Pohl, Justin Zobel, Alistair Moffat, NICTA Victoria Research Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia, {spohl,jz,alistair}@csse.unimelb.edu.au, 2010, pp. 1-9.
3. Statistical Language Models for Information Retrieval: A Critical Review, ChengXiang Zhai, University of Illinois at Urbana-Champaign, 201 N. Goodwin, Urbana, I 61801, USA, [email protected]
4. Gurpreet Kaur et al., International Journal of Computer Science & Communication Networks, Vol 3(2), 105-110.
5. Kudyba S. (ed.), “Managing Data Mining, Advice from Experts”, IT Solutions Series, Idea Group, USA, 2004, pp. VII-VIII.
6. Buckley, C., Voorhees, E.M. (2000). Evaluating evaluation measure stability, in: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 33-40.
7. Copyright c 2010, Australian Computer Society, Inc. This paper appeared at the Thirty-Third Australasian Computer Science Conference (ACSC2010), Brisbane, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 102, B. Mans and M. Reynolds, Ed. Reproduction for academic, not-for profit purposes permitted provided this text is included.
8. Witten I.H., Frank E., “Data mining: practical machine learning tools and techniques”, 2nd ed., Elsevier, Morgan Kaufmann, USA, 2005, p. 5.
9. C. B. Giles and J. D. Wren, “Large-scale directional relationship extraction and resolution,” BMC Bioinformatics, vol. 9 (Suppl 9), Aug. 2008, pp. S11, doi: 10.1186/14712105-9-S9-S11.
10. J. Clark, I. Koprinska, and J. Poon, “A neural network based approach to automated email classification,” in Proceedings of IEEE/WIC International Conference on Web Intelligence, Halifax, Canada, Oct. 2003, pp. 702-705.
11. L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: Bringing order to the web,” Stanford Digital Library Technologies Project, Tech. Rep., 1998.
12. J. M. Kleinberg, “Authoritative sources in a hyperlinked environment,” Journal of the ACM, vol. 46, no. 5, pp. 604–632, 1999.
13. B. Amento, L. Terveen, and W. Hill, “Does “authority” mean quality? predicting expert quality ratings of web documents,” in SIGIR ’00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, 2006, pp. 296–303.
14. X. Zhu and S. Gauch, “Incorporating quality metrics in centralized/distributed information retrieval on the world wide web,” in SIGIR ’00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA: ACM, 2007, pp. 288–295.
15. Taher H. Haveliwala, Sepandar D. Kamvar. The Second Eigenvalue of the Google Matrix. [2009-01-03] http://www.stanford.edu/~sdkamvar/papers/secondeigenvalue.pdf.
16. M. Marchiori. The quest for correct information on Web: Hyper search engines. In Proceedings of the 6th International World Wide Web Conference, 1997.
17. A. Y. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems (NIPS), 2002.
18. J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of ACM (JASM), 46, 1999.
19. S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia, 1998.
20. Cf. especially Lawrence Page, U.S. patents 6,799,176 (2004) "Method for scoring documents in a linked database", 7,058,628 (2006) "Method for node ranking in a linked database", and 7,269,587 (2007) "Scoring documents in a linked database", 2011.
21. G. Ivan and V. Grolmusz (2011). "When the Web meets the cell: using personalized PageRank for analyzing protein interaction networks". Bioinformatics (Vol. 27, No. 3, pp. 405-407).
22. "PageRank sculpting". Mattcutts.com. 2009-06-15. Retrieved 2011-05-27.
23. Gianna M. Del Corso, Antonio Gullí, Francesco Romani (2005). "Fast PageRank Computation via a Sparse Linear System". Internet Mathematics. Lecture Notes in Computer Science 2 (3): 118. doi:10.1007/978-3-540-30216-2_10. ISBN 978-3-54023427-2.
24. J. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of ACM (JASM), 1999.
25. "Introduction to Information Retrieval" (HTML). Cambridge University Press. 2008. Retrieved 2008-11-09.
26. von Ahn, Luis (2008-10-19). "Hubs and Authorities" (PDF). 15-396: Science of the Web Course Notes. Carnegie Mellon University. Retrieved 2008-11-09.
27. Krishna Bharat and Monika R. Henzinger. Improved algorithms for topic distillation in hyperlinked environments. In 21st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 104–111, 1998.
28. Sepandar D. Kamvar, Taher H. Haveliwala, and Gene H. Golub. Adaptive methods for the computation of pagerank. Technical report, Stanford University, 2003.
29. R. Lempel and S. Moran. The stochastic approach for link-structure analysis (SALSA) and the TKC effect. In Proceedings of the 9th International World Wide Web Conference, May 2000.
30. G. O. Roberts and J. S. Rosenthal. Downweighting tightly knit communities in World Wide Web rankings. Submitted for publication, 2003.
31. J. A. Tomlin. A new paradigm for ranking pages on the World Wide Web. In Proceedings of the 12th International World Wide Web Conference (WWW2003), Budapest, 2003.
32. G.-R. Xue, H.-J. Zeng, Z. Chen, W.-Y. Ma, H.-J. Zhang, and C.-J. Lu, User access pattern enhanced small web search, in Proceedings of the Twelfth International Conference on the World Wide Web, 2003.
33. H. Chang, D. Cohn, and A. McCallum, Creating customized authority lists, in Proceedings of the Seventeenth International Conference of Machine Learning, 2000.
34. H. Zaragoza, N. Craswell, M. Taylor, S. Saria, and S. Robertson. Microsoft Cambridge at TREC–13: Web and HARD tracks. In Proc. of the 13th Text Retrieval Conference, 2004.
35. Y. Lakhnech, S. Bensalem and S. Owre. InVeSt: A tool for the verification of invariants. In Proc. Computer-Aided Verification, 10th Annual Conf. (CAV'98), Vancouver, Canada, June 1998.
36. Li, Y., and Shawe-Taylor, J. 2007. Advanced learning algorithms for cross-language patent retrieval and classification. Information Processing and Management 43(5): 1183–1199.
37. Chakrabarti, B. E. Dom, S. Ravi Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. Kleinberg. Mining the Web's link structure. IEEE Computer, 32(8):60–67, Aug. 1999.
38. K. Berberich, M. Vazirgiannis, G. Weikum, T-rank: Time-aware authority ranking, Proceedings of 3rd International Workshop on Algorithms and Models for the Web-Graph, 2004.
39. A. Borodin, G. O. Roberts, J. S. Rosenthal, P. Tsaparas, Link analysis ranking: algorithms, theory, and experiments, ACM Transactions on Internet Technology, 2005.
40. Multi-covering radius with rank metric, Proc. of the 2002 IEEE information theory workshop held at Bangalore, India on 20-25 Oct. 2002, 215, 2002.
41. Xavier Décoret, Manhattan distance of a point and a line, December 26, 2006.
42. Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, Dan Steinberg, Top 10 algorithms in data mining, Knowl Inf Syst (2008) 14:1–37.
43. Nidhi Grover, MCA Scholar, Comparative Analysis Of Pagerank And HITS Algorithms, International Journal of Engineering Research & Technology (IJERT) Vol. 1 Issue 8, October 2012.
44. Marc Najork, Comparing the Effectiveness of HITS and SALSA, CIKM’07, November 6–8, 2007, Lisboa, Portugal. 2007 ACM 978-1-59593-803-9/07/0011.
45. R. Lempel and S. Moran. SALSA: The stochastic approach for link-structure analysis. ACM Transactions on Information Systems, 19(2):131–160, 2001.
46. A Comparative Analysis of Web Page Ranking Algorithms, Dilip Kumar Sharma et al., (IJCSE) International Journal on Computer Science and Engineering Vol. 02, No. 08, 2010, 2670-2676.
47. Zbigniew Michalewicz, Heuristic Methods for Evolutionary Computation Techniques.
48. Panayiotis Tsaparas, Link Analysis Ranking, Doctor of Philosophy, Department of Computer Science, University of Toronto, 2004.
49. G. O. Roberts and J. S. Rosenthal. Downweighting tightly knit communities in World Wide Web rankings. Advances and Applications in Statistics (ADAS), 3:199–216, 2005.
50. A. Y. Ng, A. X. Zheng, and M. I. Jordan. Stable algorithms for link analysis. In Proceedings of the 24th International Conference on Research and Development in Information Retrieval (SIGIR 2001), New York, 2004.
51. Sakai, N. Kando, C. J. Lin, T. Mitamura, D. H. Ji, K. H. Chen, E. Nyberg: Overview of the NTCIR-7 ACLIA IR4QA task, Proceedings of NTCIR-7, to appear, 2008.
52. F. Radlinski, T. Joachims. Active Exploration for Learning Rankings from Clickthrough Data, KDD 2007.
53. P. Li, C. Burges, et al. McRank: Learning to Rank Using Classification and Gradient Boosting, NIPS 2007.
54. F. Xia, T.-Y. Liu, et al. Listwise Approach to Learning to Rank: Theory and Algorithm, ICML 2008.
55. F. Xia, T.-Y. Liu, et al. Listwise Approach to Learning to Rank: Theory and Algorithm, ICML 2008.