Int. J. Indian Culture and Business Management, Vol. 12, No. 2, 2016

Evaluation of information retrieval: precision and recall

Monika Arora*
Department of IT, Apeejay School of Management, Dwarka Institutional Area, New Delhi, India
Email: [email protected]
*Corresponding author

Uma Kanjilal
Department of Library and Information Science, Indira Gandhi National Open University, Maidan Garhi, New Delhi, India
Email: [email protected]

Dinesh Varshney
School of Physics, Devi Ahilya University, Khandwa Road Campus, Indore, M.P., India
and
Multimedia Regional Centre, Madhya Pradesh Bhoj Open University, Khandwa Road Campus, Indore, India
Email: [email protected]

Abstract: The evaluation of information retrieval systems revolves around the notion of relevant and non-relevant documents. Performance indicators such as precision and recall are used to determine how far the system satisfies the user requirements. The effectiveness of information retrieval systems is essentially measured by comparing performance, functionality and systematic approach on a common set of queries and documents. Significance tests are used for functional, performance (precision and recall), collection and interface evaluation. We must focus on user satisfaction, which is the key parameter of performance evaluation. It identifies the collection of relevant documents within the retrieved set in a specific time interval. The recall and precision techniques are used to evaluate the efficacy of information retrieval systems. The response time and the relevancy of the results are the significant factors in user satisfaction. The search engines Yahoo and Google are compared on the basis of the precision and recall technique.

Keywords: performance evaluation; information system retrieval; precision; recall.

Copyright © 2016 Inderscience Enterprises Ltd.

Reference to this paper should be made as follows: Arora, M., Kanjilal, U. and Varshney, D. (2016) 'Evaluation of information retrieval: precision and recall', Int. J. Indian Culture and Business Management, Vol. 12, No. 2, pp.224–236.

Biographical notes: Monika Arora is an Associate Professor and Head of the Department of IT and Operations. She has over 18 years of experience in industry, teaching and research. She is the author of over 60 papers in national and international conferences and journals. She has served as a reviewer for the International Conference of Data Management and the International Journal of Interscience. She has also been co-chair and chair of various conferences in the Delhi region. Her research areas include intelligent and efficient data retrieval in the semantic web, search engines, social network analysis, database systems and knowledge management systems.

Uma Kanjilal is a Professor of Library and Information Science and currently the Director I/C of the Advanced Centre of Informatics and Innovative Learning (ACIIL) at Indira Gandhi National Open University, New Delhi. She has a PhD in Library and Information Science from Jiwaji University, Gwalior, and a PG Diploma in Distance Education from IGNOU. She has more than 24 years of experience in the open and distance learning system. She has been actively involved in planning and developing library and information science programmes at IGNOU. Among her specialisation areas are ICT applications in libraries, digital libraries, e-learning and multimedia courseware development. She was a Fulbright scholar at the University of Illinois, Urbana-Champaign, USA from July 1999 to February 2000. She has more than 30 research papers, articles in national and international journals and other publications to her credit. She has authored one book and co-edited two books.

Dinesh Varshney received his Doctorate degree from Barkatullah University for his research work on the pairing mechanism of high-temperature superconductors. He pursued higher studies at the Scuola Normale Superiore, Pisa (Italy) and the University of Loughborough (UK). He has published over 370 research papers in international journals and conference proceedings, including two exhaustive critical reviews in contemporary areas and three edited books. During his teaching and research endeavour of 24 years, he has supervised 20 PhD scholars. His research spans a wide spectrum of condensed matter physics and materials science, which includes pairing mechanisms, non-adiabatic effects, electrical conduction, thermal transport, anisotropic magnetic, optical, dielectric and thermoelectric effects, metal-insulator transitions, structural phase transitions, phonon dynamics, etc. His innovative ideas are implemented through a well-established Advanced Materials Research Laboratory at Indore and also through several collaborative programmes.

This paper is a revised and expanded version of a paper entitled 'Performance evaluation in information retrieval system' presented at the 3rd International Conference on Data Management, Institute of Management Technology, Ghaziabad, in collaboration with the University of Saskatchewan, Canada and Nanyang Technological University, Singapore, 11–12 March 2010.

1 Introduction

Information retrieval (IR) is one of the oldest disciplines in information science. Efficiency techniques are used throughout the retrieval process, and documents have to be organised and structured for later retrieval and usage (Mooers, 1950; Savino and Sebastiani, 1998). IR is defined as the process or method by which a prospective user of information is able to convert his/her need for information into an actual list of citations to documents in storage containing information useful to him/her. IR has traditionally been concerned with the representation, storage, searching and locating of information that is relevant to a user (Ingwersen, 1992). Traditionally, IR systems searched large flat-text collections; today the setting is heterogeneous, with data coming in various forms such as hypertext, semi-structured data, highly noisy data like blogs, and multimedia data like video and images. Evaluation therefore becomes more challenging in the present scenario (Radlinski and Craswell, 2013). In user-centric organisations, suppose there are two retrieval systems for serving a dish to the customer (Mandl, 2008): how can one identify the better of the two? IR plays an important role in establishing the relation between the user and the content. As retrieval systems become more complex, learning-to-rank approaches are being developed to automatically tune their parameters (Schuth et al., 2014). Users appear to prefer simple support tools and techniques, such as efficient IR, classification, browsing and presenting, while leaving more space to human initiative and creativity. The rapid developments in information technologies (IT) have continually provided such systems with new possibilities.

Evaluation identifies the collection of relevant and retrieved documents over different time intervals, and the recall and precision techniques are used to evaluate the efficacy of IR systems. The main objective of efficient retrieval is to facilitate easier access and to promote the communication between the system, the user and the data. Originally, efficient retrieval followed the laboratory framework, which is a systems-oriented framework built on documents, search requests, their representation (queries), the database and the matching with keywords (Järvelin, 2007). The value of displaying the relevant records as a result of a response to a query has been assumed by IR research strategies or paradigms (Amin et al., 2011). The key measure of utility is user happiness, and the response time and the relevancy of the results are the significant factors in user satisfaction. User satisfaction also depends on the user interface, which includes design, clarity, precision, responsiveness and relevance. The present investigation discusses properly cited and non-cited documents. It also uses a formal evaluation methodology developed for evaluating retrieval results. Sincere efforts were made to determine and evaluate sources of system failure and also how these failures may best be remedied.

2 Literature review

In the evaluation of an IR system, the system produces a precise answer to a well-formulated query from a structured database (Zoghi et al., 2014a). The system also assists the user in satisfying information needs (INs) by interpreting his INs and providing the information items that are relevant to him. There is a great diversity in user needs, ranging from answers to precise questions requiring specific information, to a broad navigation of information items. The topics of IR are predefined and the evaluation is carried out by using the standard IR metrics of precision and recall. The documents are regarded as information items that consist of text in any natural language. They have a known length and their content serves as the source of indexing features (Järvelin, 2007). The search requests (or topics) and the documents are unstructured; the natural language text itself is used to represent INs (Radlinski and Craswell, 2013). Documents are associated with independent indexing features that are derived from their content through natural language processing (NLP). These indexing features are usually words whose semantics help to describe the document's main themes and summarise its content.

Precision is the relationship between the number of retrieved relevant documents R, with respect to a query statement Q, and the number of documents D that have been retrieved for it, i.e., R/D (Belew, 2001). However, if we want to evaluate the performance of a system in terms of retrieving every potentially relevant document in a collection, we need to examine recall. Recall is defined as the number of relevant documents retrieved R, relative to the total number of relevant documents in the collection C, i.e., R/C. For example, if a search identifies 25 documents, of which 20 are relevant and five are on irrelevant topics, and the collection contains 50 relevant documents in total, then precision = 20/25 = 0.8 (80% of the hits were relevant) and recall = 20/50 = 0.4 (40% of the relevant documents were found). Precision is easily measured because a knowledgeable person examines each retrieved document and decides whether it is relevant or not; only the retrieved documents have to be examined. An inverse relationship between precision and recall also exists.

The laboratory approach to IR has been predominantly system-oriented. It does not involve the participation of human users and therefore allows better control over the experimental variables, as well as consistency and comparability among the research findings. However, although these traditional goals have been very important, it has become evident that IR is inherently an interactive process. This suggests that taking into account the interaction between the user and the IR components and processes is crucial for an effective system design and evaluation (Radlinski and Craswell, 2013). Most laboratory IR studies are constrained by the system's definition of needs and range of responses, which do not necessarily match those of users (Kuhlthau, 1991). This rigid view of information seeking reduces a real-life process to a laboratory simulation that does not account for the inherent complexity and dynamics of physical actions, experienced affective feelings and cognitive thoughts concerning processes and content. In addition, INs are treated as static, suggesting that the information a user is looking for does not change during a search. It is argued that INs should instead be regarded as transient, dynamic, mental constructs that develop over time as a result of exposure to new information. Empirical evidence indicates that a user's cognitive state can change during IR interaction, suggesting that INs and relevance assessments are dynamic. Finally, even though the techniques employed for the representation and handling of information items remain useful, they can be selected and controlled by human users instead of the system (Marchionini, 2004).
Relevance judgments should also be applied by users, according to their needs and capabilities (Hofmann et al., 2011). Overall, the burden should be placed on the user, thus shifting from a system-optimising paradigm to a user-centred one, where the retrieval tasks are embedded in real-life activities and the system acts as the mediator that facilitates the interaction with information sources, within models that address a wider, holistic view of information access (Kuhlthau, 1991).
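The precision and recall definitions given above translate directly into code. The following is a minimal Python sketch of the worked example (25 documents retrieved, 20 of them relevant, 50 relevant documents in the whole collection); the function names are illustrative and not taken from any cited system.

```python
def precision(rel_retrieved, retrieved):
    """Fraction of retrieved documents that are relevant (R/D)."""
    return rel_retrieved / retrieved if retrieved else 0.0

def recall(rel_retrieved, rel_in_collection):
    """Fraction of all relevant documents that were retrieved (R/C)."""
    return rel_retrieved / rel_in_collection if rel_in_collection else 0.0

# Worked example from the text: 25 documents retrieved, 20 of them relevant,
# 50 relevant documents in the whole collection.
print(precision(20, 25))  # 0.8 -> 80% of hits were relevant
print(recall(20, 50))     # 0.4 -> 40% of relevant documents were found
```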


Thus, the evaluation identifies that the presence of the user is important in this scenario. The evaluation is based on the relevance of the documents and on the retrieval of documents (Zoghi et al., 2014a). The first criterion of evaluation is relevance, defined as the extent to which the system provides the most relevant results to the user (considering the user satisfaction level). Relevance is a subjective quantity that can be measured only to some extent: it depends on the user's expectations and on the personality of the individual. The returned result indicates how appropriate the results are in satisfying the user's IN. The second criterion of evaluation is the retrieval of documents as a measure of the evaluation (Griffiths, 2003). IR performance evaluation is concerned with quantifying the relevance of the IR system results in terms of effectiveness measures. Some of the key issues in the evaluation of IR systems are:

a   to quantify relevance for user INs

b   to measure the effectiveness of IR systems and compare them, it is necessary to build a test collection that suitably reflects the deployment environment of the system

c   relevance judgments on huge data sizes could be difficult and it is necessary to ensure that the relevance judgments are complete.

The relevance evaluation measure may be binary, represented by only two values: relevant (1) and non-relevant (0). Both retrieval and relevancy are important in current IR, and how many relevant documents are retrieved is important for efficient data retrieval. The evaluation measures focus primarily on the system, and then on the data and the user. The system should retrieve the relevant documents only, and then filter them further on the basis of user need. The evaluation of relevant and non-relevant documents is proposed to be carried out by the techniques called recall and precision. This evaluation is an ongoing process after retrieval, and it continues to improve until the user is satisfied, as shown in Figure 1. The IN expresses what the user wants.

Figure 1  Evaluation model for efficient data retrieval workflow


The user gives the information through the user interface screen. Every query that is searched produces an IR result, but the main purpose of evaluation is to keep improving the query until the result is assembled and the user is satisfied (Manning et al., 2008; Amin et al., 2011). The following objectives are laid down for the study:

1   identification of information in the search engines Yahoo and Google for retrieval

2   assessment of recall and precision of the selected search engines

3   understanding the effect of the nature and types of queries on the precision and recall of the selected search engines.

3 Methodology used

The process was carried out in three stages. In the first stage, related material available in print and electronic format was collected for the study. In the second stage, the search engines were selected and the search terms were drawn up. In the third stage, the search engines Google and Yahoo were accessed for the selected terms. Finally, the data were analysed for results for the search engines Google and Yahoo. Single terms were submitted in natural form, compound terms as suggested by the respective search engines, and complex terms with the Boolean operators 'AND' and 'OR' between the terms to perform special searches. Five separate queries were constructed for each term in accordance with the syntax of the selected search engine. The study used the advanced mode of search throughout in order to make use of the available features for refining and producing a precise number of results. Each query was submitted to the selected engines, which retrieved a large number of results, but only the first ten results were evaluated to limit the study, in view of the fact that most users usually look only at the first ten hits of a query. Each query was run on both search engines on the same day in order to avoid variation that may be caused by system updating. The first ten hits retrieved for each query were classified as scholarly documents and other categories.

The IR system sends the data to the evaluation. The evaluation checks the recall and precision values of the data. The evaluation results are then sent to the assembler for the convergence of the relevant records and the divergence of the others (Hofmann et al., 2011). The data assembler sends the divergent records to the filter and join step to check the satisfaction of the user. The feedback mechanism plays an important role, which depends on the user's interpretation of the values, and the data, if required, are again sent for pre-processing. If the IN is satisfied, the data pass to the filter; if not satisfied, they are again sent to the query for the evaluation process. The relevance feedback (Salton and Buckley, 1990) is then quantified and only the relevant documents are extracted. The filter processes the relevant data and incorporates the information into the ranking system to maximise utilisation without adversely affecting the rankings, before going to the improve-query option (Zoghi et al., 2014a; Schuth et al., 2013).
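The evaluation-and-feedback cycle described above can be summarised in a short sketch. This is only an illustration of the workflow, not an implementation from the study: `search`, `judge_relevance` and `refine_query` are hypothetical placeholders for the search engine call, the user's relevance judgment and the improve-query step, and user satisfaction is approximated here by a precision threshold.

```python
def evaluate_with_feedback(query, search, judge_relevance, refine_query,
                           rel_in_collection, target_precision=0.8,
                           max_rounds=5, k=10):
    """Retrieve the first k hits, evaluate precision and recall, and keep
    refining the query until the user is satisfied or rounds run out."""
    for _ in range(max_rounds):
        results = search(query)[:k]                # IR step: first ten hits
        judgments = [judge_relevance(doc) for doc in results]
        rel_retrieved = sum(judgments)
        precision = rel_retrieved / len(results) if results else 0.0
        recall = rel_retrieved / rel_in_collection
        # Satisfaction proxy (an assumption): enough of the hits are relevant.
        if precision >= target_precision:
            # Filter step: pass only the relevant records onward.
            relevant_docs = [doc for doc, rel in zip(results, judgments) if rel]
            return relevant_docs, precision, recall
        # Relevance feedback: divergent records go back to improve the query.
        query = refine_query(query, results, judgments)
    return [], precision, recall
```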

3.1 Methodology used

Precision = RelRetrieved / Retrieved

Precision and recall are relative and mutually dependent terms for the evaluation. Precision is based on the documents retrieved, whereas recall is based on the relevant documents in the collection (Sanderson and Zobel, 2005). High precision means that the retrieved records are almost all relevant, whereas high recall means that almost all of the relevant records in the collection have been retrieved (Zoghi et al., 2014b). Recall is the percentage of relevant documents returned compared to everything that is available:

Recall = RelRetrieved / Rel in Collection

The relations of precision and recall under four different variations are discussed in Figure 2. When the retrieved set is small (say 10%) and the relevant set is large (90%), this is a case of very high precision and low recall. In the opposite case, where the relevant set is small and the retrieved set is large, there is high recall but low precision. When both the retrieved and the relevant proportions are the same at 90%, that is the best case of retrieval. Finally, if many documents are retrieved but none of them is relevant, i.e., the two sets are disjoint, this is the case of low precision and very low recall.

Figure 2  Contingency table of relevant and retrieved (see online version for colours)

It is clear that the retrieved and relevant documents are directly used to define precision and recall respectively, so the two measures are defined as:

Precision P = RetRel / Retrieved, with range P ∈ [0, 1]

Recall R = RetRel / Relevant, with range R ∈ [0, 1].

Here, out of the total documents, each document can be grouped into the categories of relevant or non-relevant and retrieved or non-retrieved. Whether a document is relevant to the user or not is very difficult to establish; it can only be judged against the fixed set of documents available. The second category is the retrieved documents, i.e., the list of documents that appears after the query is applied. One can keep track of the retrieved documents, i.e., the total number of documents resulting from the search, but not of the whole collection of documents, which is vast, difficult to track and still growing. To keep it simple, a contingency table is defined that considers all the cases of retrieved, non-retrieved, relevant and non-relevant documents, as shown in Figure 3. The figure defines the documents of all the different categories.

Figure 3  Retrieved vs. relevant documents (see online version for colours)

Recall and precision are the evaluation measures of IR and the basic measures used in evaluating search strategies. There is a set of records in the database which is relevant to the search topic; records are assumed to be either relevant or irrelevant (these measures do not allow for degrees of relevancy). The actual retrieval set may not perfectly match the set of relevant records. Recall is the ratio of the number of relevant records retrieved to the total number of relevant records in the database. It is usually expressed as a percentage. Precision is the ratio of the number of relevant records retrieved to the total number of irrelevant and relevant records retrieved. It is also usually expressed as a percentage.
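The contingency view in Figure 3 maps naturally onto set operations over document identifiers. Below is a minimal sketch under the assumption that both the retrieved list and the relevant pool are known; the document IDs are purely illustrative.

```python
retrieved = {"d1", "d2", "d3", "d4", "d5"}         # documents returned by the search
relevant = {"d2", "d4", "d5", "d8", "d9", "d10"}   # relevance pool for the topic

rel_retrieved = retrieved & relevant               # retrieved and relevant
nonrel_retrieved = retrieved - relevant            # retrieved but not relevant
rel_not_retrieved = relevant - retrieved           # relevant but missed

precision = len(rel_retrieved) / len(retrieved)    # 3/5 = 0.60
recall = len(rel_retrieved) / len(relevant)        # 3/6 = 0.50
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```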

3.2 Data analysis and interpretation

The records must be considered either relevant or irrelevant when calculating precision and recall. Obviously, records can exist which are marginally relevant or somewhat irrelevant; others may be very relevant and others completely irrelevant. The problem is complicated by individual perception: what is relevant to one person may not be relevant to another. Measuring recall is difficult because it is often hard to know how many relevant records exist in a database. Recall is therefore often estimated by identifying a pool of relevant records and then determining what proportion of the pool the search retrieved. There are several ways of creating a pool of relevant records; one method is to use all the relevant records found from different searches. The survey was carried out across three different domains of participants for the evaluation of retrieval using two different search engines, Google and Yahoo. The survey is used to find out the occurrence of relevant records in the first attempt. The participants had to use the keywords and judge the relevancy of the documents (see Annexure).


Sixty users from different categories (a student, a technical software engineer, a person in the teaching profession) were asked to fill in the questionnaire, and the evaluation is based on the following details: the recall/precision plot for the three categories using Google and Yahoo as the search engines. For the interpretation of the data collected, the evaluation graph is plotted for each search engine for the three different categories, viz. a student, a technical software engineer and one engaged in the teaching profession. Users from these categories participated in the survey for each search engine. For the evaluation, the order of the relevant documents is considered, where a relevant document has 100% probability. The recall-precision (%) values are generated for the evaluation of the search results. Considering the same example of the first ten documents, the precision and recall values for the sample data are given in Table 1. Based on these data, the calculated average values of precision and recall for all three categories are high in the case of Google and low in the case of Yahoo. The recall and precision values from Table 1 are plotted separately in Figures 4 and 5. The plots show the variation of recall and precision after every document retrieval. The evaluation results using the Google and Yahoo search engines are interpreted for the categories of software engineers, students and teachers/faculty members.

Table 1  Precision and recall dataset

Document    Google search engine          Yahoo search engine
            Recall       Precision        Recall       Precision
1           0.11         1                0.25         1
2           0.22         1                0.5          1
3           0.33         1                0.5          0.66
4           0.44         1                0.5          0.5
5           0.55         1                0.5          0.4
6           0.66         1                0.5          0.33
7           0.77         1                0.75         0.429
8           0.88         1                1            0.5
9           0.88         0.88             1            0.44
10          1            0.9              1            0.4

Figure 4  Precision vs recall plot for Google search engine (see online version for colours)

Figure 5  Precision vs recall graph for Yahoo search engine (see online version for colours)
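The running recall and precision values of Table 1, and hence curves of the kind shown in Figures 4 and 5, can be recomputed from per-rank relevance judgments of the first ten hits. The sketch below assumes matplotlib is available; the judgment sequences are inferred from Table 1 (relevant pools of nine documents for Google and four for Yahoo) and are used only to illustrate the calculation.

```python
import matplotlib.pyplot as plt

def running_precision_recall(judgments, total_relevant):
    """Recall and precision after each of the first ten retrieved documents."""
    recall, precision, hits = [], [], 0
    for rank, is_relevant in enumerate(judgments, start=1):
        hits += is_relevant
        recall.append(hits / total_relevant)
        precision.append(hits / rank)
    return recall, precision

# Judgment sequences consistent with Table 1 (1 = relevant, 0 = non-relevant);
# the relevant pools of 9 and 4 documents are inferred from the recall steps.
google = running_precision_recall([1, 1, 1, 1, 1, 1, 1, 1, 0, 1], total_relevant=9)
yahoo = running_precision_recall([1, 1, 0, 0, 0, 0, 1, 1, 0, 0], total_relevant=4)

plt.plot(*google, marker="o", label="Google")
plt.plot(*yahoo, marker="s", label="Yahoo")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision vs. recall over the first ten hits")
plt.legend()
plt.show()
```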

The precision vs. recall graph indicates the performance of the search engine. When the plotted line is in the upper-right portion of the graph, the selected category is performing well; when the plotted line is in the lower-left portion, the category's performance is poor. The curves show that the Google search engine performs well, since its curve lies in the upper-right portion of the graph: the higher the line, the better the performance, while a lower line indicates poorer performance. This approach follows the experimental paradigm that has previously been used for assessing the performance of rankers (Schuth et al., 2014). The plots show that the performance of Google is much better than that of Yahoo. Based on the data, the calculated average values of precision and recall indicate that all the categories are equally satisfied with the data retrieval mechanism provided to them. The results suggest that Google is a popular search engine that satisfied the user needs of all the basic categories considered. They also indicate that there is a huge amount of data available in the repository; the outcome basically depends on the search query that is given. For example, the keyword 'Java' is searched differently by different users: a software engineer gives the keyword as 'Java database connection', a student as 'Java tutorials' and a faculty member as 'lecture notes on Java'. It is just a matter of defining the keyword; the search engine does the rest. This indicates that Google satisfied the user with 90% of documents both relevant and retrieved and is the most popular search engine in the world, whereas Yahoo satisfied the user only to about 30% to 40% on average. Google exhibits the contingency of high precision and high recall, whereas Yahoo exhibits high or low recall but always a very low precision.

4 Conclusions and future work

The results depict better performance in retrieving scholarly documents. Google is the best alternative for getting web-based scholarly documents. Google acquired the highest recall and precision due to the inclusion of its journal citations along with web resources; otherwise Google would rank first. Yahoo offers a good combination of recall and precision but has a larger overlap with other search engines, which enhances its relative recall over the Google search engine. Further, the results reveal that structured queries (i.e., phrased and Boolean) contribute to achieving better precision and recall. The findings also establish that precision is inversely proportional to recall, i.e., if precision increases, recall decreases and vice versa.


For example, if an IR system concentrates on the recall value, it will increase recall by retrieving more and more documents, and the cost of the increase in the number of non-relevant documents retrieved will be a decrease in precision. Similarly, a classification system basically decides what to show and what to hide: suppose you search for 'orange' (categorised as both a fruit and a colour); orange as 'fruit' may attain a high precision value but orange as 'colour' may have a low recall value. The IR evaluation (recall/precision) is not easy for a large collection of web documents. Precision and recall are related terms, so they give related results at the time of evaluation. Interactive IR is important. The judgment of retrieval and relevance differs from individual to individual, and the evaluation is highly dependent on the order in which relevant documents are retrieved; ordering matters a great deal in this evaluation. As a further development from recall and precision, there is the F-measure, also called the F1 measure. It is an effectiveness measure that can be based on more than one category, or on a query that belongs to more than one category; each category is then treated as a separate binary classification task. Recall and precision are invaluable to any experienced searcher. Knowing the goal of the search – to find everything on a topic, just a few relevant documents, or something in between – determines what strategies the searcher will use. A variety of search techniques may be discussed in future work, which may be used to affect the level of recall and precision. A good searcher must adopt suitable searching techniques and use them.
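The F-measure mentioned above combines precision and recall into a single effectiveness score; for F1 it is their harmonic mean. A minimal sketch, using the example values from Section 2 (precision 0.8, recall 0.4):

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; beta = 1 gives the harmonic mean of precision and recall (F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    beta2 = beta * beta
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

# Example values from Section 2: precision 0.8, recall 0.4.
print(round(f_measure(0.8, 0.4), 3))  # 0.533
```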

References

Amin, K., Kearns, M. and Syed, U. (2011) 'Bandits, query learning, and the haystack dimension', COLT 2011 – The 24th Annual Conference on Learning Theory, 9–11 June, Budapest, Hungary, pp.87–106.

Belew, R.K. (2001) Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW, pp.253–257, Cambridge University Press, New York, NY, USA.

Griffiths, J.R. (2003) 'Evaluation of the JISC information environment: student perceptions of services', Information Research, Vol. 8, No. 4 [online] http://informationr.net/ir/8-4/paper160.html (accessed 7 April 2015).

Hofmann, K., Whiteson, S. and de Rijke, M. (2011) 'Balancing exploration and exploitation in learning to rank online', ECIR 2011: Proceedings of the Thirty-Third European Conference on Information Retrieval, pp.251–263, April.

Ingwersen, P. (1992) Information Retrieval Interaction, 246pp, Taylor Graham, London.

Järvelin, K. (2007) 'An analysis of two approaches in information retrieval: from frameworks to study designs', Journal of the American Society for Information Science and Technology, Vol. 58, No. 7, pp.971–986.

Kuhlthau, C.C. (1991) 'Inside the search process: information seeking from the user's perspective', Journal of the American Society for Information Science, Vol. 42, No. 5, pp.361–371.

Mandl, T. (2008) 'Recent developments in the evaluation of information retrieval systems: moving toward diversity and practical applications', Informatica – An International Journal of Computing and Informatics, Vol. 32, pp.27–38.

Manning, C.D., Raghavan, P. and Schütze, H. (2008) Introduction to Information Retrieval, online version available, Chapter 8, Evaluation, 482pp, Cambridge University Press, Cambridge.

Marchionini, G. (2004) 'From information retrieval to information interaction', Advances in Information Retrieval: 26th European Conference on IR Research, ECIR 2004, Sunderland, UK, 5–7 April 2004, pp.1–11, Lecture Notes in Computer Science, Springer, Berlin.

Mooers, C.N. (1950) 'Information retrieval viewed as temporal signaling', Proceedings of the International Congress of Mathematicians, Vol. 1, pp.572–573.

Radlinski, F. and Craswell, N. (2013) 'Optimized interleaving for online retrieval evaluation', WSDM '13.

Salton, G. and Buckley, C. (1997) Improving Retrieval Performance by Relevance Feedback, pp.355–364, Morgan Kaufmann, San Francisco.

Sanderson, M. and Zobel, J. (2005) 'Information retrieval system evaluation: effort, sensitivity, and reliability', Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 15–19 August, Salvador, Brazil.

Savino, P. and Sebastiani, F. (1998) 'Essential bibliography on multimedia information retrieval, categorization and filtering', Slides of the 2nd European Digital Libraries Conference Tutorial on Multimedia Information Retrieval.

Schuth, A., Hofmann, K., Whiteson, S. and de Rijke, M. (2013) 'Lerot: an online learning to rank framework', Living Labs '13.

Schuth, A., Sietsma, F., Whiteson, S. and de Rijke, M. (2014) 'Optimizing base rankers using clicks: a case study using BM25', ECIR 2014: Proceedings of the Thirty-Sixth European Conference on Information Retrieval, pp.75–87, April.

Zoghi, M., Whiteson, S., de Rijke, M. and Munos, R. (2014a) 'Using confidence bounds for efficient on-line ranker evaluation', WSDM 2014: Proceedings of the Seventh ACM International Conference on Web Search and Data Mining, pp.73–82, February.

Zoghi, M., Whiteson, S., de Rijke, M. and Munos, R. (2014b) 'Relative confidence sampling for efficient on-line ranker evaluation', WSDM '14.


Annexure

Document relevance survey

This survey is aimed at outlining the performance evaluation of search engines in terms of the retrieval and relevance behaviour of internet users. The participants effectively explore the search engines to download the referenced documents as per their requirements. The requirement is how satisfied they are with the available information and with the relevance of the search engine and the browser. The questionnaire below is prepared to explore how many reformulations are required and to identify the search for the relevant documents.

No   Parameter                                                      Description
1    Users age group
2    Users education
3    Users occupation/status                                        Student/professional/retired/faculty
4    Which search engine is rapidly used?
5    Which browser is preferably used?
6    The keywords used for search
7    The total number of retrieved records or documents
8    The total number of records received at the first page
9    The relevant (technical) number of records at the first page
10   The order in relevancy of records (like 2, 4, 5, 6, 7)
11   Is it required to explore the consecutive pages? (Y/N)
12   If yes, do you diverge from the domain that you are looking for?
13   If no, do you retrieve sufficient information?

Questions 1–5 are based on general information about the usage habits of the individual, and questions 6–13 are based on the performance evaluation of the search engine. In the primary research, the survey was conducted to evaluate the performance of the search engines in terms of the retrieval and relevance behaviour of internet users. Approximately 100 participants effectively explored the search engines to download the referenced documents as per their requirements. The requirement is how satisfied they are with the available information and with the relevance of the search engine and the browser. This survey is aimed at evaluating the relevancy of documents found on the World Wide Web by using internet browsers.
