Journal of Information Science OnlineFirst, published on September 26, 2007 as doi:10.1177/0165551507080413

Use of collaborative recommendations for web search: an exploratory user study

Xiangmin Zhang and Yuelin Li Rutgers University, USA

Abstract. This study investigated use of collaborative recommendations in web searching. An experimental system was designed. In the experimental system, recommendations were generated in a group report format, including items judged relevant by previous users, search queries and the URLs of documents. The study explored how users used these items, the effects of their use, and what factors contributed to this use. The results demonstrate that users preferred using queries and document sources (URLs), rather than relevance judgment (document ratings). The findings also show that using recommended items had a significant effect on the number of documents viewed, but not on precision or number of queries. Task difficulty and search skills had significant impact on the use. Possible reasons for the results are analyzed. Implications and future directions are discussed.

Keywords: collaborative recommendations; web search; information retrieval; user evaluation

1. Introduction

Collaborative (or social) filtering technology generates item recommendations for the current user based on other users whose rating behavior is similar to the current user's [1–4]. This technology has mainly been used in e-commerce, in domains such as movies, music, video, and books. The technology is attractive to information retrieval (IR) systems because, in reality, people tend to seek relevant information or papers from colleagues or friends [5, 6]. The recommendations are intuitively helpful in overcoming the information overload problem. Google's link-based page ranking algorithm [7] embodies the idea of collaborative filtering, and it has made a marked difference in search performance compared with conventional search engines.

Correspondence to: Xiangmin Zhang, SCILS, 4 Huntington Street, New Brunswick, NJ 08901, USA. Email: [email protected].


Copyright 2007 by Chartered Institute of Library and Information Professionals.


However, research on collaborative recommendations has mainly focused on the algorithms that generate recommendations. Few studies have been conducted from the user's perspective, in terms of user acceptance and the impact of collaborative recommendations on user performance [8]. How users use the recommendations remains an open question. The results from some initial investigations do not seem very optimistic. For example, Torres et al. [9] found that only 50% of users were satisfied with the recommendations for computer science literature. Another study [10] found that only 25–30% of users considered the recommended documents in the computer science field relevant. It is therefore desirable to evaluate collaborative recommendations to understand why users use them and why they accept or reject the items recommended. Results from such research can inform the design of collaborative IR systems and help increase the use of collaborative recommendation features.

The aim of the current study was to understand the issues surrounding the use of collaborative recommendations in IR systems: the information items the user would like to have recommended, the factors that affect the use of recommendations, and the relationships between the use of recommendations and the user's performance. Specifically, this study addressed the following research questions:

1. In general, would users be willing to use collaborative recommendations (useful search results from previous users)? And if so, how and what information items would they use? In this study, three information items were considered as potential recommendations to users: the relevance judgments on retrieved documents made by previous users; the source of the document(s), i.e. URLs; and the queries that were used. The relationships of these items and the kinds of search-related knowledge they represent have been analyzed in Zhang and Li [11]. Part or all of these three items are recommended by some systems. For example, I-SPY recommends relevant documents and search queries; ASK.com and Google recommend queries. So far no research has examined which of these items users would prefer as recommendations.

2. Could the use of collaborative recommendations improve users' search performance and satisfaction? Theoretically, the recommended items are supposed to be relevant ones that the user might otherwise have missed, so presenting recommendations should improve the user's search performance and, accordingly, increase user satisfaction with the search results. This study intended to test these hypotheses.

3. What factors affect the use of collaborative recommendations? Research in IR has identified many factors that may affect the user's performance, such as the user's search skill, knowledge about the task, and so on. As part of the information seeking process, use of collaborative recommendations may also be affected by such user characteristics.

The remainder of the paper is organized as follows: related research is reviewed in the next section; the research method for the study is described next; results of the study are then reported, followed by a discussion. The paper concludes with a summary and future research directions.

2. Related research

The idea of supporting collaborations in IR systems during the search process has been proposed for over a decade. Twidale and Nichols [12] introduced the ARIADNE system as an example of computerized support for collaborative browsing in a library catalogue system. The system supports only collaborative browsing, not search activities, and it supports collaborative activities only for small groups of people. When constructing search agents for users, Newell [13] found that instead of simply conducting a search using traditional methods, a user may obtain better search results by using an existing agent created by another user with a similar background to do a similar search.


The search results generated by the existing agent would be of interest to this user. Romano et al. [14] described a prototype system, CIRE, which combines the features of IR and group support systems and attempts to solve the problem of individuals searching independently in IR. One limitation of the system is that it was designed for small working teams rather than the general public. AntWorld [15] is a collaborative web IR system that captures human intelligence: people’s relevance judgments on the retrieved web documents. The system enables people to share relevant searches by using the collaborative filtering recommendation method. However, no explicit sharing of search queries is supported. Smyth et al. [16] implemented the collaborative filtering method in the search engine I-SPY as a personalized web search tool. The system re-ranks search results for the current user based on previous searches. A user evaluation found that the users who were presented with the re-ranked results were able to answer more questions and answered them more correctly than the users whose search results were not re-ranked. They also had shorter queries and selected the search results closer to the top of the list, taking advantage of the re-ranked results based on previous successful searches. Hust et al. [17] proposed a new query expansion method based on the collaborative IR concept: to use the terms in globally available similar search queries to expand the current query. Their experiment shows that the method has some advantages compared to conventional query expansion methods. Lee [18] developed a collaborative web searching environment, named VisSearch, for sharing web search results among people who have similar interests. VisSearch recommends search queries and URLs of useful web sites. In an experiment, 15 participants were assigned to a control group and 10 participants to an experimental group. The control group used a conventional web searching environment, and the experimental group used the VisSearch environment. The results indicated that the recommended search queries and web sites were useful for improving the search: overall significantly longer search queries were issued by the experimental group in comparison to the control group and the longer queries seemed to help the participants locate more useful web sources. However, since the participants in the experimental group were required to use only the VisSearch environment, the study did not examine if users preferred to accept the recommendations, or why. The major issue with the aforementioned systems is the lack of user evaluations of the collaborative recommendations. Efforts have been made in recent years. Jung et al. [19] analyzed the use of SERF, a search recommendation system for library web sites, over three months based on the system’s log data. The results of the analysis show that recommendations with human evaluation could increase the efficiency and effectiveness of the search process. Those users who received recommendations needed to examine fewer results, and recommended documents were rated much higher than documents returned by a traditional search engine. The results, however, were completely based on the log data. Insight from the user’s perspective is missing. To test different algorithms to generate recommendations, Torres et al. [9] conducted an online experiment with users to see how they would react to the recommendations of computer science research papers. 
Their study found that only 46% of the participants were satisfied with individual recommendations and 62% were satisfied with the overall set of recommendations. Satisfaction with the recommendations generated by a single pure collaborative filtering algorithm was even lower, below 40%. McNee et al. [10] tested six algorithms for recommending research papers in computer science. Part of their study was a user evaluation of the recommendations, which considered two factors: the quality of individual recommendations and the novelty of individual recommendations. Among the six algorithms, two were true collaborative filtering algorithms: user–item and item–item algorithms. Although as many as about 60% of participants felt these two algorithms tended to produce more novel recommendations, the quality of the recommended documents in terms of relevance turned out to be poor: only 25–30% of users considered the recommended documents relevant, and as many as 60% of the users considered them irrelevant. The results strongly suggest that further studies are needed to find out the reasons for this poor performance.

Previous studies show that collaborative recommendation technology is promising and has potential, but document recommendations are not well accepted.


Fig. 1. The group report: information about one particular webpage/document (fields shown: document ID, document source (URL), usefulness judgment, annotation, user ID, search query, document ranking position, and time stamp).

Improvement of algorithms might do a better job, but the fundamental question for the use of collaborative recommendations is: what would be the user's attitude towards document recommendations? User studies can provide input so that collaborative recommendations can be made based on user need.

3. Methodology

3.1. The experimental system

An experimental system, CIRR (Collaborative Information Retrieval Research, see http://xzhang.rutgers.edu:8080/isatc/), was designed as a research tool for collaborative IR experiments. The system can capture relevance judgments and record search sessions, including search queries, the retrieved document URLs, and annotations. These captured information items can then be accessed by the current user as collaborative recommendations. Google is configured as the underlying search engine.

3.1.1. Collaborative recommendations in the experimental system

Instead of using typical collaborative filtering algorithms to generate recommendations, the experimental system uses a 'group report' function to present recommendations. The group report includes records of the activities of all users from the chosen group for a search topic. For each document, the report provides the URL, the judgment(s) made by previous users, the ID of the user who made each judgment, the time stamp, the ranking position of the document in Google's search result list, and the search query that was used to find the document. Documents/webpages for a search topic are arranged by the highest relevance judgment ever given to them by group members, i.e. the participants of an experiment. The user can view this report for a particular document to see other people's judgments, or to see which search queries other people used to retrieve the document. Figure 1 depicts one page of the group report. We chose this group report format instead of automatic collaborative recommendations because we wanted to give the user more information: not just a document, but also the associated queries, relevance judgments, annotations, and so on, so that we could see which information item(s) the user would like to use.
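As a rough illustration of the kind of record the group report aggregates, the following sketch defines a group-report entry with the fields listed in Figure 1. The field names and example values are illustrative assumptions; the actual CIRR database schema is not described in the paper.

from dataclasses import dataclass

@dataclass
class GroupReportEntry:
    """One shared search result in the group report (fields as listed in Fig. 1)."""
    document_id: str          # identifier of the judged document/webpage
    url: str                  # document source (URL)
    usefulness_judgment: int  # relevance rating given by a previous user
    user_id: str              # ID of the user who made the judgment
    query: str                # search query that retrieved the document
    rank: int                 # ranking position in Google's result list
    timestamp: str            # when the judgment was recorded
    annotation: str = ""      # optional note left by the user

# Entries for a topic are grouped by document and ordered by the highest
# usefulness judgment any group member has given that document.
entry = GroupReportEntry(
    document_id="doc-42",                         # hypothetical values
    url="http://example.org/kava-dosage",
    usefulness_judgment=4,
    user_id="participant-07",
    query="kava kava stress recommended dosage",
    rank=3,
    timestamp="2005-03-14 10:22",
    annotation="explains dosage guidelines",
)
print(entry.query, entry.usefulness_judgment)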

4

Xiangmin Zhang and Yuelin Li

Fig. 2. The CIRR task console with the document window in the background.

Fig. 3. The task console window.

This group report format gives the current user more flexibility over whether or not to use the recommendations and, specifically, which information item(s) to use.

3.1.2. Functions of the experimental system

When the user works with the experimental system, the system opens two browser windows: the document window, for conducting searches using Google, and the system's task console. The user switches between these two windows to work with the system. A screen shot of the two windows is pictured in Figure 2. The document window is a regular browser window. The task console is a small window that serves as a user interface to the system's database server, saving and retrieving data to and from the database. As a system intended to facilitate collaborative searches, the task console provides the major functions for fulfilling this goal: it records the user's relevance judgments and annotations, and displays the group report, which makes previous searchers' results on the same topic available to the current user. An enlarged screen shot of the task console is depicted in Figure 3.


Fig. 4. The experimental system: components and functions. (The user works in two windows: the document window, for searching and viewing results via Google, and the task console window, for recording relevance judgments and annotations and requesting the group report; the task console saves and retrieves data through the database.)

The system's components and functions are diagrammed in Figure 4. A more detailed description of the system can be found in Zhang et al. [20].

3.2. Research design

3.2.1. Participants

Among the 22 graduate students participating in the study, nine were master of library and information science (LIS) students and the remaining 13 were graduate and undergraduate students from the sciences and humanities. The LIS participants had completed a searching course offered by the MLIS program at the university; this one-semester course trains students in knowledge of and skills in searching various databases and the internet. These LIS participants were therefore considered 'trained searchers with search expertise'. Accordingly, the others were treated as 'non-trained searchers without search expertise'.

3.2.2. User tasks and procedures

Six search topics on health and genetically modified food, which were pre-loaded into the system, were used in the study as search tasks. These search topics were designed at two levels of difficulty: the first three topics were easy ones while the last three were difficult ones. The difficult topics involved two or more relatively independent areas, while the easy ones dealt with mainly one subject area. The difficulty level was also determined by the ease of finding relevant documents. Before the search topics were finalized, test searches using Google were conducted by the project's researchers to determine the difficulty of each topic, in terms of the number of search queries to be submitted and the number of relevant documents to be retrieved. The results of the test searches showed that the easy topics generated a sufficient number of relevant documents in just one or two query sessions, while the difficult ones needed more query sessions to find enough relevant documents. The topics are listed in the Appendix.

An online pre-search questionnaire was used to collect the participants' background data. A post-search questionnaire was used for each search topic to collect the participants' self-assessed task difficulty, the helpfulness of previous knowledge about the search topic, the difficulty of getting started on the search topic, and their satisfaction with the search results. The participants' opinions regarding the usefulness of the group report feature were collected through an exit questionnaire, after they had completed all search tasks.


The study was conducted in a usability laboratory. Participants were scheduled to come to the lab session one at a time. Before the formal session started, each participant was given a quick demonstration on how to use the system. The group report feature was introduced and the participants were encouraged to use it. The participants conducted searches on the topics in the order of their own preference. For each search topic, they were required to identify at least eight relevant documents or webpages. The relevance judgment was completely up to the participants, based on their understanding of the search topics. Participants were given a two-hour limit to complete the whole experimental session. They were asked to think aloud and their search sessions were logged and video-taped. In order to have some results in the group report for the first participant to view, the results of the test searches were recorded and served as the initial entries in the group report. Since the entries in the group report would mainly come from the participants, the more participants had done their tasks, the more entries would appear in the group report. The later participants would have more results to see than the initial users.

3.3. Variables and measures

3.3.1. Use of the collaborative recommendation feature (group report)

For a specific search topic, use of the feature was indicated, either 'yes' or 'no', by the participant's answer to a question in the exit questionnaire. Further, after the participants rated each statement in Section 3.3.3 below (on a five-point scale), they were asked for the reasons why they gave the specific rating. Comments on how they used the group report were also solicited in several open-ended questions in the questionnaire.

3.3.2. Users' satisfaction with the search results

Users' satisfaction with the search results was measured by the participants' ratings on a five-point scale statement in the post-search questionnaire, from 'Not at all' (1) to 'Extremely' (5).

3.3.3. Usefulness of collaborative recommendations

This was evaluated by the participants' ratings on the following five statements, from 'Disagree' (1) to 'Strongly agree' (5), in the exit questionnaire:

a. The group report is easy to understand.
b. I like working with the group report.
c. I usually agree with other group members' judgment.
d. Other people's search results have positively influenced my usefulness judgment.
e. Overall, the group report feature helps me get more documents/pages for the given tasks.

3.3.4. Search performance

Search performance was measured by the number of documents viewed, the number of queries used for a search topic, and the Mean Average Precision (MAP) of the search results [21]. The number of documents viewed during the search is a meaningful measure of search performance: because the experimental time was controlled and a fixed number of relevant documents was required from the user, a user who viewed relatively fewer documents but still completed the task was more efficient in the search. Such a user may have used more effective queries and been more capable of pinpointing relevant documents. MAP is the mean of the average precision scores across the (possibly multiple) search queries for a single topic. 'Average Precision' is the mean of the precision obtained after each relevant document is retrieved by a search query:

$$ \mathrm{Average\ Precision} = \frac{1}{|R_a|} \sum_{i=1}^{|R_a|} P_i $$

where Pi is the precision obtained at the rank position of the ith relevant document in the result list, and |Ra| is the number of relevant documents retrieved by the query. This measure takes into account both the number of relevant documents retrieved and the ranking of these relevant documents in the result list.


Fig. 5. Use of group report by search topics (number of participants who used the report for each topic).

For a single query session, MAP is the mean of the precision obtained after each relevant document is retrieved. For multiple topics/queries, MAP is the mean of the average precision scores of each of the topics/queries in the experiment. For an effective search, it is important that the participant not only finds relevant documents but also uses a query that ranks the relevant documents high in the result list.

Search time could also be a measure of performance. We did not use this measure because, on the one hand, the number of queries and the number of documents viewed reflect the time consumed by a participant: the more queries for a topic and the more documents viewed, the more time the user would spend on the search. On the other hand, because the participants were given a time limit (two hours), the time measure itself may not truly reflect performance. In fact, the amount of time for the tasks that used the group report and those that did not was compared, and there was no significant difference between the two types of tasks.

Data analyses were performed at two levels: at the search topic level for each of the six topics and at the individual task level for each specific search conducted. At the search topic level, data were analyzed based on the six search topics; each topic was considered a case to analyze. At the task level, on the other hand, data were examined for each specific search conducted by each individual participant. For each search topic, 22 searches were conducted, and each search was taken into account for data analysis.
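The following is a minimal sketch of how average precision and MAP can be computed from binary relevance judgments. It follows the definitions above but is not the authors' implementation; the example relevance lists are made up.

from typing import List

def average_precision(relevance: List[int]) -> float:
    """Average precision for one ranked result list (1 = relevant, 0 = not relevant)."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at the position of each relevant document
    # Divide by |Ra|, the number of relevant documents retrieved.
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(ranked_lists: List[List[int]]) -> float:
    """MAP: mean of the average precision scores over the queries of one topic."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

# Two query sessions for one topic: relevant documents at ranks 1 and 3, then at ranks 2 and 3.
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 0]]))  # about 0.71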

4. Results

Results of the study are based on both qualitative and quantitative data. While the quantitative data allowed certain statistical tests, the qualitative data, including verbal protocols and/or written comments in the exit questionnaire, provided insight into the participants' behavior while using collaborative recommendations.


Fig. 6. Mean ratings of task difficulty on search topics.

Fig. 7. Distribution of the use of group report vs the number of search topics.

4.1. General use of the collaborative recommendation feature (group report) at the search topic (ST) level

The use of the group report for each of the six search topics is summarized in Figure 5. In general, for any single search topic, the number of participants who used the report is less than or equal to half of the total number of participants. Task difficulty emerged as a factor affecting the use of the collaborative recommendations. As Figure 5 demonstrates, more participants used the function for the last three search topics, ST4, ST5, and ST6, than for the topics ST1, ST2, and ST3, although the differences were not statistically significant. To confirm whether the participants themselves would also consider these three topics difficult, the participants were asked to assess the difficulty level of the six topics after they had finished searching. The assessment was elicited by a five-point rating statement, 'Was it easy to do the search on this topic?', in the post-search questionnaire.


on this topic?’ in the post-search questionnaire. The participants’ mean ratings for each search topic (see Figure 6) confirmed that the first three search topics were easier than the last three. Among 22 participants, 13 (59%) used the group report during their searching and 9 (41%) did not use it. For those who used the group report, the number of search topics for which the group report was used ranges from one to six. The distribution of the use of the function in terms of the number of search topics is presented in Figure 7. Figure 7 shows that nine participants did not use the group report at all. Five participants used the function for only half or less of six search topics. Only eight participants used the function for more than three search topics. The participants who used the group report were asked to rate the feature’s usefulness against the five statements in Table 1. Overall, only two ratings were over 3 (agree), which means in general the participants agreed that the group report was easy to understand and they liked to work with this feature. The lowest rating was for the third statement, that is ‘I usually agree with other group members’ judgment’ (M = 2.37), which indicates that the participants tended not to agree with other searchers’ usefulness judgments. In terms of the participants’ background, eight trained participants and five non-trained participants used the feature. However, there were no significant differences between the two types of participants’ ratings. It could be the case that the two types of users tended to have the similar experience with the group report. It could also be due to the small sample size in the study. The use of the collaborative recommendations is further analyzed at the individual task level in Section 4.2. Table 1 Usefulness ratings of the group report Statements

Mean ratings

The group report is easy to understand I like working with the group report I usually agree with other group members’ judgment Other people’s search results have positively influenced my usefulness judgment Overall, the group report feature helps me get more documents/pages for the given tasks

3.41 3.07 2.37

4.2.

2.50 2.80

Use of the collaborative recommendation feature at the task level

The use of the group report was investigated in more detail at the task level. Given 22 participants and six search topics, each participant conducted six search tasks, so a total of 132 tasks were completed. Among the 132 completed tasks, 41% (54) were finished with the help of the group report while 59% (78) were not. These two types of tasks, with and without use of the group report, were compared to test the effects on search performance and on user satisfaction with the search results.

4.2.1. Use of the collaborative recommendation feature and users' search performance

Did the use of collaborative recommendations improve the user's search performance? All 132 search tasks (22 participants × 6 topics) were divided into two categories, Y (used the group report) and N (did not use it). Table 2 lists the mean MAP (Tp), the mean number of queries issued (Tq), and the mean number of documents viewed (Td) for each category and for each search topic. t-Tests did not find significant differences between the two categories (Y/N) in task MAP scores or in the mean number of queries issued. Overall the tasks that used the feature had more queries than the tasks that did not, although the difference was not statistically significant. There was, however, a significant difference between the two categories in terms of the number of documents viewed: the participants viewed significantly fewer documents for the tasks using the group report than for the tasks that did not use the feature (t(130) = −3.55, p < 0.01). Apparently the collaborative recommendations helped reduce the number of documents users needed to view.
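As an illustration only, the comparison described above amounts to an independent-samples t-test on per-task counts of documents viewed. The sketch below runs such a test on made-up numbers, since the study's raw per-task data are not published.

from scipy import stats

# Hypothetical per-task counts of documents viewed (not the study's data).
docs_viewed_with_report = [7, 6, 8, 5, 7, 6, 8, 7]         # tasks that used the group report (Y)
docs_viewed_without_report = [9, 12, 8, 10, 11, 9, 13, 8]  # tasks that did not (N)

t_stat, p_value = stats.ttest_ind(docs_viewed_with_report, docs_viewed_without_report)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")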


Table 2. Use of the group report and search performance

        ST1           ST2           ST3           ST4           ST5           ST6
        Y      N      Y      N      Y      N      Y      N      Y      N      Y      N
Tp    0.33   0.46   0.39   0.48   0.60   0.54   0.43   0.43   0.44   0.45   0.38   0.28
Tq    2.67   2.00   2.57   2.27   1.57   1.93   2.90   3.00   2.36   2.00   2.40   2.00
Td    7.33   8.85   6.71   8.47   6.43  12.60   6.00   9.67   8.00   8.09   8.30   8.58

Notes: ST1–ST6 = search topics 1–6; Y = tasks that used the group report, N = tasks that did not. Tp = mean MAP; Tq = mean number of queries issued; Td = mean number of documents viewed.

Table 3. Use of the group report and users' satisfaction with individual tasks

Usage    ST1    ST2    ST3    ST4    ST5    ST6
Y        4.33   4.29   5.00   4.00   3.73   3.60
N        4.31   4.07   4.47   3.83   3.73   3.50

Notes: ST1–ST6 = search topics 1–6. Values are mean ratings of the users' satisfaction with the tasks that used (Y) and did not use (N) the group report under each search topic.

4.2.2. Use of the collaborative recommendation feature and the users' satisfaction with the search results

Participants' mean satisfaction ratings on the search results for each search topic in the two categories (Y/N) were calculated (see Table 3). Overall, the participants' satisfaction with the search results of the tasks that used the collaborative feature was slightly higher than that of the tasks that did not use the feature. However, a t-test found no significant difference between the two types of tasks. Among the three performance measures, a significant correlation (Pearson r) was detected between MAP and satisfaction (r(130) = 0.20, p < 0.05): the participants felt more satisfied with their search results if they retrieved more highly ranked relevant results. However, there was no significant correlation between the number of documents viewed and satisfaction, or between the number of queries issued and satisfaction.
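The MAP–satisfaction relationship reported above is a simple Pearson correlation over the 132 tasks. A sketch with made-up per-task values (the raw data are not published) is shown below.

from scipy.stats import pearsonr

# Hypothetical per-task values (not the study's data): MAP scores and 1-5 satisfaction ratings.
map_scores   = [0.33, 0.46, 0.60, 0.43, 0.44, 0.38, 0.48, 0.54, 0.28, 0.39]
satisfaction = [4,    4,    5,    4,    4,    3,    5,    4,    3,    4]

r, p = pearsonr(map_scores, satisfaction)
print(f"r = {r:.2f}, p = {p:.3f}")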

4.3. How did the participants use the collaborative recommendations?

When the participants used the group report, what specifically did they use? What factors affected the use of the group report? Why did some of them not use it? Answers to these questions can provide insight into how to design collaborative recommendation systems.

4.3.1. What information items were used from the group report?

While contributing to the shared document ratings was required by the experiment (each participant had to rate at least eight relevant documents for each search topic), use of the group report was voluntary, although biased (favored) by the research design. Most participants felt that the advantage of the experimental system was sharing other people's queries, which helped them select appropriate search terms. For example, some participants wrote:

[I]t gives ideas for more search terms.

[T]his system allows us to see as to what search query, other guys used for gathering the results. This is particularly useful for difficult to find topics.


Moreover, some participants used the group report to check whether they had missed important terms in their query. A participant said:

I got a search term from reviewing the group report. It is good to be able to see queries grouped on certain topic, so I can find the answer.

and

I borrowed some terms from other people, which improved my search.

The group report seemed to help the participants formulate more effective queries. The participants also pointed out another advantage of visiting the group report, that is, the group report helped them locate useful document sources (URLs). Participants stated:

[I]t helps me improve my query and select types of sources that would contain good information.

and

[I]t was useful to consult the group report and see how others had particular sites, what queries they had used, etc.

These participants believed that other people usually had useful findings. In addition, for some participants, sharing sources helped them narrow down their search by referring to similar sites and 'skipping "useless" information'. This seemed to help them view fewer documents in order to finish the search task.

In contrast to their interest in sharing search queries and sources, the participants were not very interested in sharing usefulness or relevance judgments. People usually preferred their own judgments. In commenting on the statement 'I usually agree with other group members' judgment' in the exit questionnaire, some stated:

[I] did not pay much attention to the judgment.

or

I really didn't pay much attention to the 'usefulness' judgments of the other participants – just the sites themselves.

and

I guess I trust my own judgment a lot.

Others did not pay attention to the judgments but merely looked at queries. They commented that the usefulness judgment on a document differed from person to person; sometimes they made the exact opposite judgment on documents/webpages that other searchers had already judged.

4.3.2. Factors that affected the use of the collaborative recommendations

Search expertise. Search expertise refers to a participant's search knowledge level. In this study it was indicated by a participant's academic background: LIS student or non-LIS student. LIS students have generally been trained professionally in searching; they were the expert searchers (users) in the study, and the other, non-LIS, participants were the non-expert searchers. Although no significant impact of search expertise was found at the search topic level, the data analyses at the task level found that search expertise was an important factor associated with the use of the collaborative recommendations. Dividing all 132 individual tasks into two categories by participant background, i.e. trained searchers and non-trained searchers, a chi-square test found a significant association between the use of the group report and the participants' search expertise (χ2 (1, N = 132) = 13.89, p < 0.01). Table 4 displays the cross-tabulation of the two types of users and the use of the feature. A majority (33 out of 54, 61%) of the tasks conducted by trained searchers used the group report, while a majority (57 out of 78, 73%) of the tasks conducted by non-trained searchers did not.

Task difficulty. The participants used the group report when they encountered difficult tasks. Data at the task level further confirmed the impact of task difficulty on the use of the group report.


Table 4. Search expertise and use of the group report

Use of the feature   Trained searcher   Non-trained searcher   Total
Y                    33 (61%)           21 (27%)               54 (41%)
N                    21 (39%)           57 (73%)               78 (59%)
Total                54 (100%)          78 (100%)              132 (100%)

Table 5. Use of the group report (Y/N) and task difficulty

Use of the feature   H (highly difficult)   L (less difficult)   Total
Y                    22 (55%)               32 (35%)             54 (41%)
N                    18 (45%)               60 (65%)             78 (59%)
Total                40 (100%)              92 (100%)            132 (100%)
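The following sketch shows how the kind of chi-square test applied to these counts (and reported in the next paragraph) can be computed from the Table 5 cells. With no continuity correction, the statistic comes out at roughly the value reported in the text; small differences can arise from rounding or from applying a correction.

from scipy.stats import chi2_contingency

# Contingency table from Table 5: rows = used the group report (Y, N),
# columns = task difficulty (H = highly difficult, L = less difficult).
observed = [[22, 32],
            [18, 60]]

# correction=False gives the uncorrected Pearson chi-square statistic.
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2({dof}, N=132) = {chi2:.2f}, p = {p:.3f}")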

According to participants' ratings of task difficulty, all 132 tasks were divided into two categories, highly difficult tasks (H) and less difficult tasks (L), based on the mean rating (mean = 3.83; H < 3.83, L > 3.83). A chi-square test found that the percentage of the highly difficult tasks that used the group report (22 out of 40, 55%) was significantly higher than that of the less difficult tasks (32 out of 92, 35%) (χ2 (1, N = 132) = 4.71, p < 0.05). Table 5 displays the cross-tabulation of task difficulty and the use of the feature.

4.3.3. Reasons for not using the collaborative feature

Nine out of 22 participants (41%) did not use the feature. Confidence in one's own search skills was observed as the major reason not to use the collaborative recommendations. Those who did not use the group report trusted themselves to do the search, and considered it unnecessary to visit other people's reports. One participant commented on the group report:

For some people who don't know the keywords, [they can] go to others' report, and find useful keywords.

She herself did not visit the group report for anything. She explained:

Since I search so much.

On the other hand, another participant explicitly said that she had no patience to read the report, and that she could do better than the other participants. Another participant even stated:

[I]t [the group report] was mostly a waste of time. I could check out the sites on my own as quickly.

Lack of novelty was observed as another major reason for not using the group report. Some participants opened the group report but did not use the information there, because they did not find anything new in it:

[…] nothing different from what I entered [in the search box].

Another participant stopped opening it in later searches after she visited the group report and found she could not get useful information from it. Another reason was that the group report button on the interface was not very salient, and some participants might not have noticed it:

If the link to the group report is more aggressively visualized, I might use it.

5. Discussion and implications

In general, the major reason for the participants to use the group report seemed to be the difficulty they had during the search. Users would use the collaborative feature when they felt the search task was difficult, which makes sense because normally only when a user encounters a problem will they seek other people's help.


Task difficulty has been found to affect users' information searching behavior [22]. Most of the participants who used the group report did not seek help from it at the beginning of the search, but only when they encountered difficulties. The purpose of using this feature was to overcome the difficulty and to move the search forward.

Participants did not find other people's relevance judgments very helpful. In general they did not agree with other people's relevance judgments. When they used the group report, they preferred additional search terms and URLs to a relevance judgment on a document. The reason might be that relevance judgment is subjective. This finding may help explain the findings from previous studies that only low percentages of users considered the recommended documents helpful [9, 10].

The trained users tended to use the potential recommendations in their searches, but the majority of non-trained users did not. This finding is not encouraging, but nor should it be surprising, because non-trained users are those who tend to use only the minimal functions provided by search systems; they tend to live with just the basic search function [23].

Although no significant effect of using the collaborative recommendation feature was found on retrieving relevant documents, the number of documents the user needed to view when using the feature was much lower than when the feature was not used. This finding is consistent with the finding from [19].

In addition to search expertise and task difficulty, which emerged as the major factors influencing the use of the collaborative recommendations, other factors that had an impact included the user's confidence and the novelty of the recommended items. These factors have been identified in other collaborative filtering system studies and in studies of users' collaborative information seeking behavior [24].

The usability of the experimental system might also have had an impact on some participants' use of the collaborative feature. Some participants did not use the feature because the 'Group report' button in the task console was not very salient, and the function could have gone unnoticed. One participant admitted that she completely forgot to use this feature even though the investigator had introduced it at the beginning. The trained searchers were more critical of the usability issues of the group report, such as the clarity of the report and the visibility of the group function, than the non-trained searchers. For example, some of them made statements like:

[T]he machine format of the queries made it slightly more difficult to understand than otherwise.

On the other hand, the non-trained searchers tended to give more positive assessments. For example, some non-trained participants noted that the group report’s format and color coordination make for easy reference

and: it [the group report] is well organized.

The usability problems certainly needed attention, but they did not seem to be the deciding factor underlying the users' decisions about using the group report. As the data in Table 1 show, it was the attitude towards other people's relevance judgments that received the most negative rating. The results are interesting but should be considered preliminary because of the small number of participants. Nevertheless, the findings are worth considering when designing collaborative information retrieval systems. It seems that query recommendation should be the first priority to present to the user. Recommending relevant documents should be done cautiously: the recommended items need to be ones the user would otherwise not be able to find, and the relevance judgments should come from qualified sources. Instead of presenting recommendations with every search, the recommendations should be targeted at the user's difficult tasks, though how to automatically detect difficult search tasks is a challenging issue yet to be resolved.


6. Conclusion

This study explored users' use of collaborative recommendations in web search. In the experimental system, the collaborative recommendations were presented as a list of relevant documents/webpages in a group report, rather than as the typical recommendations generated automatically by collaborative filtering algorithms. The paper discusses the results of the study: how recommendations were used and which items were used, how the use of recommendations influenced search performance and user satisfaction, and the factors that may affect such use.

The results demonstrate that the use of collaborative recommendations in information retrieval is complicated. In general, users were not enthusiastic about using collaborative recommendations; ratings of satisfaction with the recommendations were at about the medium level or below. The participants preferred using queries and document sources to relevance judgments from other users, and they used the recommendations only when they encountered difficulties in accomplishing their search tasks. The trained searchers were more likely to use the recommendations than the non-trained searchers were. Use of collaborative recommendations did not seem to significantly affect the precision of the search results or the time spent on searching. However, the number of documents viewed for the tasks that used the group report was much lower than that for the tasks that did not. The findings reveal the complexities of using collaborative recommendations in IR systems.

More rigorous studies should be conducted to validate the results of the current study. The small sample size was one limitation of the study; caution should be taken when generalizing the findings, and the results need to be confirmed in future studies with larger samples. Another limitation was that the recommendations were not pushed to the user as a typical collaborative filtering recommendation system would do; rather, the user needed to make an effort to access the information items. This extra effort might have hindered users' willingness to use the group report. A third limitation was the use of assigned search topics rather than real-life search topics based on the participants' own information needs. The experimental system itself also needs improvement in both its functionality and its user interface. These issues will be addressed in future studies.

Acknowledgements

This study was supported by a Rutgers University Information Science & Technology Council grant and a Rutgers University Research Council grant.

References

[1] D. Goldberg, D. Nichols, B.M. Oki and D. Terry, Using collaborative filtering to weave an information tapestry, Communications of the ACM 35(12) (1992) 61–70.
[2] W. Hill, L. Stead, M. Rosenstein and G. Furnas, Recommending and evaluating choices in a virtual community of use. In: I.R. Katz et al. (eds), CHI 1995: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Denver, Colorado (ACM, New York, 1995).
[3] P. Resnick and H. Varian, Recommender systems, Communications of the ACM 40(3) (1997) 56–8.
[4] U. Shardanand and P. Maes, Social information filtering: algorithms for automating 'word of mouth'. In: I.R. Katz et al. (eds), CHI 1995: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (ACM, New York, 1995).
[5] M. Karamuftuoglu, Collaborative IR: toward a social informatics view of IR interaction, Journal of the American Society for Information Science 49(12) (1998) 1070–80.
[6] X. Zhang, Collaborative relevance judgment: a group consensus method for evaluating user search performance, Journal of the American Society for Information Science and Technology 53(3) (2002) 220–35.


[7] S. Brin and L. Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine (1998). Available at: www-db.stanford.edu/~backrub/google.html (accessed 12 July 2006).
[8] J.L. Herlocker, J.A. Konstan, L.G. Terveen and J.T. Riedl, Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems 22(1) (2004) 5–53.
[9] R. Torres, S.M. McNee, M. Abel, J.A. Konstan and J. Riedl, Enhancing digital libraries with TechLens. In: H. Chen et al. (eds), JCDL 2004: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, Tucson, Arizona (ACM, 2004).
[10] S.M. McNee, I. Albert, D. Cosley, P. Gopalkrishnan, S.K. Lam, A.M. Rashid, J.A. Konstan and J. Riedl, On the recommending of citations for research papers. In: E.F. Churchill et al. (eds), ACM Conference on Computer Supported Cooperative Work: Proceedings of the 2002 ACM Conference on Computer Supported Cooperative Work, New Orleans (ACM, New York, 2002).
[11] X. Zhang and Y. Li, An exploratory study on knowledge sharing in information retrieval. In: HICSS 2005: Proceedings of the 38th Hawaii International Conference on Systems Science, Big Island, Hawaii, 3–6 January, 2005 (IEEE Computer Society Press, 2005).
[12] M.B. Twidale and D.M. Nichols, Designing interfaces to support collaboration in information retrieval, Interacting with Computers 10(2) (1998) 177–93.
[13] S.C. Newell, User models and filtering agents for improved Internet IR, User Modeling and User-Adapted Interaction 7(4) (1997) 223–37.
[14] N.C. Romano, D. Roussinov, J.F. Nunamaker and H. Chen, Collaborative information retrieval environment: integration of information retrieval with group support systems. In: HICSS 1999: Proceedings of the Thirty-second Annual Hawaii International Conference on System Sciences, Maui, Hawaii (IEEE Computer Society Press, 1999).
[15] P.B. Kantor, E. Boros, B. Melamed, V. Menkov, B. Shapira and D.J. Neu, Capturing human intelligence in the net, Communications of the ACM 43(8) (2000) 112–15.
[16] B. Smyth, E. Balfe, J. Freyne, P. Briggs, M. Coyle and O. Boydell, Exploiting query repetition and regularity in an adaptive community-based web search engine, User Modeling and User-Adapted Interaction 14(5) (2004) 383–423.
[17] A. Hust, S. Klink, M. Junker and A. Dengel, Towards collaborative information retrieval: three approaches. In: J. Franke et al. (eds), Text Mining: Theoretical Aspects and Applications (Physica-Verlag, Heidelberg, 2003).
[18] Y.-L. Lee, VisSearch: a collaborative web searching environment, Computers and Education 44(4) (2005) 423–39.
[19] S. Jung, K. Harris, J. Webster and J. Herlocker, SERF: integrating human recommendations with search. In: D.A. Evans et al. (eds), CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, Washington, D.C. (ACM, 2004).
[20] X. Zhang, Y. Li and S. Jewell, Design and evaluation of a prototype user interface supporting sharing of search knowledge in information retrieval. In: A. Grove (ed.), ASIST 2005: Proceedings of the 68th Annual Meeting of the American Society for Information Science and Technology, Charlotte, NC (ASIS&T, 2005).
[21] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval (ACM Press, New York, 1999).
[22] A. Spink, D. Wolfram, M.B.J. Jansen and T. Saracevic, Searching the web: the public and their queries, Journal of the American Society for Information Science and Technology 52(3) (2001) 226–34.
[23] N.J. Belkin, D. Kelly, H.-J. Lee, Y.-L. Li, G. Muresan, M.-C. Tang, X.J. Yuan and X.-M. Zhang, Rutgers' HARD and web interactive track experiments at TREC 2003. In: E.M. Voorhees and L.P. Buckland (eds), TREC 2003: Proceedings of TREC 2003, Gaithersburg, Maryland (NIST, 2003).
[24] H. Bruce, R. Fidel, A.M. Pejtersen, S. Dumais, J. Grudin and S. Poltrock, A comparison of the collaborative information retrieval behaviors of two design teams, The New Review of Information Behaviour Research 4(1) (2003) 139–53.

Appendix: search topics

Topic numbers are as they appeared in the experimental system.

19. Kava Kava for Stress
You have heard that Kava Kava is a good herb to take to relieve stress. Search for sites that explain what Kava Kava is, how you would take it, and whether there is a recommended dosage.


20. Delayed Food Allergies
You frequently have been feeling ill, and your friend suggests that you may be suffering from a type of food allergy called a delayed food allergy. You would like to find out what the difference is between delayed food allergies and immediate food allergies and what types of tests are available to see if you are allergic.

22. Risk Factors for Osteoporosis
A relative of yours has been diagnosed with osteoporosis. You are concerned about whether or not you are at risk of the disease. What are some of the significant risk factors for osteoporosis?

23. Genetically Modified Foods and Transgenic Plants
A friend of yours is a scientist and argues that 'genetically modified foods' is not the correct term to use. You would like to learn about what the term 'transgenic plant' means and if it differs from the term 'genetically modified food'.

24. GMOs and Human Health
Imagine that you have heard that over 70% of processed foods (such as baked goods) have ingredients that came from genetically modified plants. You want to know if eating these foods can be harmful to your health. Find websites or pages that contain the information you need.

25. Agricultural Biotechnology and the Environment
You have heard that growing genetically modified plants such as corn harms the environment. You are a farmer who would like to decide whether to grow these crops. Find web sites or pages that support or refute this contention.
