Information Processing and Management 37 (2001) 507–520

www.elsevier.com/locate/infoproman

The impact of text browsing on text retrieval performance

Richard C. Bodner a,*, Mark H. Chignell a, Nipon Charoenkitkarn b, Gene Golovchinsky c, Richard W. Kopak d

a Department of Mechanical and Industrial Engineering, Knowledge Media Design Institute, University of Toronto, 5 King's College Road, Toronto, Ont., Canada M5S 3G8
b School of Information Technology, King Mongkut's University of Technology, Thonburi, 91 Suksawasd 48 Bangmod, Ratuburana, Bangkok 10140, Thailand
c FX Palo Alto Laboratory, Inc., 3400 Hillview Avenue, Bldg. 4, Palo Alto, CA 94304, USA
d Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ont., Canada M5S 3G6

Accepted 19 September 2000

Abstract

The results from a series of three experiments that used Text Retrieval Conference (TREC) data and TREC search topics are compared. The experiments involved three novel user interfaces (one per experiment). User interfaces that made it easier for users to view text were found to improve recall in all three experiments. A distinction was found between a cluster of subjects (a majority of whom were search experts) who tended to read fewer documents more carefully (readers, or exclusives) and subjects who skimmed through more documents without reading them as carefully (skimmers, or inclusives). Skimmers were found to have significantly better recall overall. A major outcome from our experiments at TREC and with the TREC data is that hypertext interfaces to information retrieval (IR) tasks tend to increase recall. Our interpretation of this pattern of results across the three experiments is that increased interaction with the text (more pages viewed) generally improves recall. Findings from one of the experiments indicated that viewing a greater diversity of text on a single screen (i.e., not just more text per se, but more articles available at once) may also improve recall. In an experiment where a traditional (type-in) query interface was contrasted with a condition where queries were marked up on the text, the improvement in recall due to viewing more text was more pronounced for search novices. Our results demonstrate that markup and hypertext interfaces to text retrieval systems can benefit recall and can also benefit novices. The challenge now will be to find modified versions of hypertext interfaces that can improve precision as well as recall, and that can work with users who prefer to use different types of search strategy or have different types of training and experience. © 2001 Elsevier Science Ltd. All rights reserved.

* Corresponding author. Tel.: +1-416-978-7581. E-mail addresses: [email protected] (R.C. Bodner), [email protected] (M.H. Chignell), nipon@it.kmutt.ac.th (N. Charoenkitkarn), [email protected] (G. Golovchinsky), kopak@fis.utoronto.ca (R.W. Kopak).
0306-4573/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0306-4573(00)00059-5


Keywords: Hypertext; Information retrieval; User interfaces

1. Introduction

Until the 1980s, search intermediaries typically carried out the task of text retrieval. The intermediaries worked in an environment where the expense of connection time to a database encouraged high precision and careful query formulation. Since that time, the end user has done an increasing amount of text retrieval and since 1995 such retrieval has been carried out with search engines on the World Wide Web (WWW). Other search environments where textual documents may also be browsed by end users include online newspapers, electronic books and digital libraries. In this paper we examine the effects that browsing a larger amount of text has on performance while carrying out text retrieval. We will begin by reviewing the literature on information exploration as an activity that includes both browsing and text retrieval (search). We will then report some empirical results from our involvement with the Text Retrieval Conference (TREC), sponsored by the National Institute of Standards and Technology (NIST), which demonstrate that browsing and interacting with text during a search task can affect performance, particularly in terms of recall.

2. Information exploration contexts

Prior to reviewing some of the experimental results we have obtained with respect to these relationships, we will review some of the research literature on information exploration constructs and contexts. We use the term "exploration" deliberately (in contrast to "retrieval"), as it implies a human-centred rather than system-centred approach. Similarly, the term "information seeking" also implies a human-centred approach to the acquisition of information. Within exploration, we have focussed on the distinction between searching, where one is looking for something in particular (typically by describing it), and browsing, where one is finding things of interest by going through bits of information.
The intuition that there is a meaningful distinction to be made between browsing and searching has been a central assumption in the literature on information exploration. However, an alternative approach is to view browsing and searching not as completely distinct categories of information exploration, but rather as different aspects or configurations of a unitary process, where performance and behaviour are influenced by the system, task and the causal relationships between these factors. Search strategy may also have an impact on information seeking performance. It may also change relatively quickly as people adopt different strategies in information exploration (Bates, 1990). Hancock-Beaulieu (1990), for example, showed how users in a text retrieval task switched from one search strategy to another. She described users of libraries who began with a search for a particular item only to discover that the desired item was not available in the library. They then changed their strategy from search to scan (or browse). Belkin, Chang, Downs, Saracevic, and Zhao (1990) also observed user behaviour in libraries and similarly noted a number of changes from one kind of search strategy to another.


In addition to dynamic switching between strategies, there are also individual differences in searching ability (e.g., Borgman, 1989; Fenichel, 1981; Saracevic, 1991). Iivonen (1995) found that searchers working in different types of search environments (e.g., public versus non-public organizations) had different types of work experience, and those differences in searchers' experience caused them to prefer different search methods, different terminological styles and different search strategies. Borgman (1989) observed that people brought different skills and talents to the task of information retrieval. She found high variability in searching behavior, even when the same system and the same database were used. Fenichel (1981) found that even in the same experience group, the size of the search process measures and outcome measures for the same search topics sometimes varied by a factor of 10 or more. Fidel and Crandall (1997) found that there were strong individual differences in judging the relevance of documents (for engineers and managers within the Boeing Company). For example, "some participants wanted to read reports because they were non-technical, about a specific vendor, or basic and general, while others decided to delete reports for the same reasons. Unlike other criteria, they have no absolute relevance-related value because for some participants they indicated relevance and for others non relevance". Aside from the need to characterize or account for intra- and inter-individual differences, there is also a need to distinguish between groups of users where the grouping is based on externally defined types (e.g., expertise) versus search style (an intrinsic factor, e.g., related in some way to cognitive style). Mick (1980) pointed out that even in the same experience group (e.g., intermediaries), people might have different procedural knowledge ("knowing-in-action") and might adopt different working methods.
These differences may be related to personal traits, or they may be related to the experience accumulated in different environments. Mick indicated that experience with different information environments was likely to lead to different information seeking strategies. Information exploration frequently involves a combination of browsing and searching. Design opportunities exist for new systems and user interfaces that incorporate browsing and search as points located in different positions along a shared set of dimensions (e.g., Waterworth & Chignell, 1991). However, search performance in a particular context is also influenced by a number of other factors. Marchionini (1995), for instance, posits six factors that influence the scope and character of information exploration:
· Characteristics of the information seeker (including individual differences and the nature of the search task or topic).
· The task as the manifestation of the information problem.
· The nature of the search system.
· The domain of information explored.
· The setting or environment.
· The outcomes.
These six factors, and their interaction with each other, can be used to describe the dominant structural effects conditioning the behaviour and strategies of individual information seekers. For example, Marchionini identifies five basic dimensions of browsing that are based on the interaction of these six factors: the external representation of the information object sought; the mental representation of the information object sought; the organization of information objects in the


environment; the degree of interactivity; and the cognitive effort required. From this perspective, several different browsing strategies are identified, based on the mix of values attributed to each dimension. For example, a scanning strategy of information exploration would involve a high degree of external representation of the object sought, a high degree of organization in the information environment and low values on the interactivity and cognitive effort dimensions. Alternatively, compared to these dimensions, one might view the adoption of a search strategy as conditioned by the specificity of the task, the procedural knowledge of the information seeker and the richness of representation and organization within the information environment. Chang (1995) and Chang and Rice (1993) identified four dimensions underlying the browsing process: the behavioural dimension, the motivation dimension, the cognitive dimension and the resource dimension. Each dimension is further divided into two sub-dimensions. Chang includes in her cognitive dimension not only the mental representation of the concept, but also a mental representation of where the object sits in the information space. Waterworth and Chignell (1991), in their model of information exploration, refer to this indirectly through their "target orientation" dimension. They express this dimension in terms of how specific the information need is in the mind of the user and how the degree of specificity predisposes a user toward either a querying or browsing strategy. A further important dimension of information exploration appears to be the degree of interactivity, indicated by the number and rate of choices made and by the actions taken at the point of the interface. We might infer that since browsing depends on close contact with the information (being largely data-driven), higher interactivity or contact with the information is preferred.
On the other hand, in classical approaches to text retrieval, a quick and precise search, and consequently a low level of interactivity, might be preferred. We demonstrate, however, that the user's choice of interactivity can vary considerably even within the text retrieval task and show that interactivity may in fact be an important indicator of the navigational style of the user. This interactivity dimension is reflected in what Waterworth and Chignell referred to as the 'interaction method' and what Belkin et al. (1990) called the 'mode of retrieval'. In either case, rapid, direct interaction with the information (or with some useful surrogate) would be enhanced through reference (i.e., identification), as opposed to description (i.e., specification). One important construct that we do not consider in this paper is motivation. Belkin (1995) showed how differences in motivation (i.e., goals) affected the use of electronic textual resources by humanities scholars and listed identifying, learning, judging, evaluating and contextualizing as examples of different motivational factors. In the studies reported below, motivation was assumed to be relatively fixed, due to the specific characteristics of the TREC search tasks that were used.

3. Empirical findings

In this section we will highlight our research findings concerning the relationship between text browsing and search performance. The research was carried out as part of our participation in the TREC-3, -4 and -7 ad hoc tasks and the TREC-7 interactive track, and in experiments using the TREC data. The research followed a style of mixed exploratory and confirmatory analyses that is characterized and further discussed in the paper by Golovchinsky, Chignell, and Charoenkitkarn (1997).


3.1. Effects of marking up queries on text

We wanted to develop innovative user interfaces for information retrieval and to understand the impact of those interfaces on performance. A key element of those interfaces was that there was no distinction between a querying mode and a text-viewing mode, so that queries could be expressed while reading text, without the user having to switch into a different mode. We began (Golovchinsky & Chignell, 1993) by developing a method of marking Boolean queries on text. Essentially, the query mark-up system allowed users to select terms to be used in the query and to draw graphically the Boolean relationship between these terms. This brought querying closer to hypertext by eliminating query syntax errors and by making queries more interactive: each click generated a corresponding set of matching documents. In the markup interface, subjects add terms to a query by selecting (clicking on) words in the text that they are viewing. By default, terms are combined with the implicit OR operator. To change this OR relationship to an AND, the user draws a line between the two selected terms (using a click and drag operation). After each interaction with the text, the system issues a new query. Thus, a sequence where the user selects the word "information", followed by the word "retrieval", and then drags a line between the two words would be treated as three queries:
· (information),
· (information) OR (retrieval),
· (information) AND (retrieval).
There are a number of theoretical reasons for believing that markup-based querying will be a useful alternative to type-in approaches. The reasons for expecting benefits from markup include the following:
· Immediate feedback of query results (e.g., user knows if AND'ing some terms reduces the number of matching documents to zero).
· Less cognitive effort (recognition rather than recall).
· Less mode switching (querying, reading and browsing can all occur while in the same mode of viewing text documents).
· Boolean syntax is hidden from users, allowing them to focus on the search task rather than syntactic issues.
Subsequently, we assessed the performance of our query mark-up systems in TREC-3 and TREC-4 (Charoenkitkarn, Chignell, & Golovchinsky, 1995, 1996). Following these experiments, the effectiveness of marking up queries on text was further assessed by Charoenkitkarn (1996) in a series of experiments. Results from Charoenkitkarn's second experiment are reported below. In that experiment, 36 participants each carried out searches using eight topics from the TREC-3 conference and a 300-megabyte database containing more than 90,000 documents that originally appeared in the San Jose Mercury News between 1990 and 1992. Subjects in the experiment were classified as either search experts or search novices based on whether or not they had formal training in information retrieval. Eighteen of the thirty-six subjects were identified as experts. Nine of these "experts" worked as librarians or on-line searchers and the remaining nine were either graduates of a university information science programme, or currently enrolled in such a programme. All participants used a single common interface in the experiment.


The different interface conditions were induced by variations in the instructions provided to the participants in each condition. Thus the subjects in the hybrid condition were instructed on both the query mark-up and type-in functionality of the system, whereas the subjects in the mark-up and type-in conditions were instructed only in the corresponding functionality of the system. This experimental strategy was used so that extraneous differences in software used between the three conditions would not provide an alternative explanation for any differences observed. In almost all cases, participants used only the features described in the instructions and were not aware of the other functionalities actually available in the software. The interaction between expertise and interface condition was significant at the 0.05 level for both recall (F[2, 171] = 4.08, P < 0.05) and precision (F[2, 171] = 3.06, P < 0.05). Fig. 1 shows how type of interface affected the search precision achieved by the novice and expert subjects. The experts' performance was generally not affected by different interface conditions, for either precision or recall. In contrast, the precision obtained by novices differed between the conditions, with the greatest discrepancy occurring between the type-in (precision of 0.36) and mark-up conditions (precision of 0.52), as shown in Fig. 1. Recall for the novices tended to be higher in the markup and hybrid conditions than in the type-in condition. Overall, there was a tendency for the mark-up interface to lead to higher precision (for novice searchers, but not experts) and for the mark-up and hybrid conditions to lead to higher recall (but again, for novice searchers, rather than experts). The seven experts who had at least five years of working search experience ("professional") were contrasted with the other members of the expert group ("other experts").
Analysis of variance was carried out with the three levels of expertise (professional searcher, other expert, novice) as the independent variable. The dependent variable was the proportion of times that type-in querying commands were used in the hybrid interface. The proportion of type-in commands varied significantly across the three expertise groups (F[2, 91] = 18.3, P < 0.001).

Fig. 1. Precision across three interface conditions for search experts versus novices (Charoenkitkarn, 1996, Experiment 2).


Professional searchers used type-in queries 96% of the time; other experts used them 72% of the time; novices used type-in queries only 43% of the time. Overall, these results showed that novices preferred to use the markup interface and their performance tended to improve as a result. In contrast, experts tended not to use markup where they had a choice and their performance was not significantly affected by the presence or absence of the query mark-up interface. Experts tended to judge fewer documents than novices (an average of 65 each for experts versus 91 each for novices). Experts also used fewer queries than novices (an average of 31 queries for experts versus 36 for novices). Five experts and six novices behaved differently than typical members of their groups. Subjects were therefore regrouped to create a new factor called "search pattern". The first group was labelled "exclusive" because as a group these subjects tended to select and view fewer documents. This group consisted of thirteen experts and six novices. Although they showed little difference in average precision (0.49 versus 0.43, F[1, 171] = 1.33, P < 0.25), they tended to have lower average recall (0.11 versus 0.16, F[1, 171] = 24.31, P < 0.001) compared with the second group. The second group (labelled "inclusive") consisted of twelve novices and five experts. Members of the inclusive group tended to judge a high number of documents. Fig. 2 shows a scatter graph of average (for each participant) recall versus precision for exclusive and inclusive participants. The upper left region of Fig. 2 (high recall and low precision) contains inclusive participants (i.e., people who selected and viewed more documents) while the region below the main diagonal (low recall and high precision) tends to contain exclusive participants. The distinction between people with inclusive versus exclusive search strategies appeared to be a better predictor of performance than search expertise. There was a larger differentiation

Fig. 2. Scatter graph of average precision versus recall for exclusive and inclusive subjects (Charoenkitkarn, 1996, Experiment 2).


between inclusive and exclusive subjects on recall and precision results than there was for the corresponding measure between expert and novice subjects. The difference in time taken to find the first relevant document was also greater between exclusive and inclusive subjects than it was between experts and novices. This shows how, in some cases at least, differences in search strategy may be more important predictors of performance than factors such as training and expertise.

3.2. Hypertext interfaces for information retrieval

We then developed a method of automatically adding hypertext anchors to text; each anchor generated a full-text query (Golovchinsky, 1997a; Bodner, Chignell, & Tam, 1997). This created a hybrid system where an underlying search process was mediated by a hypertext interface in addition to allowing more traditional typed queries. The same system could thus function as a hypertext or as a search engine, depending on the point of view of the user, as revealed by the chosen interaction. In one case, the user clicked on certain words or phrases as if they were hypertext links. The document (or documents) returned were then viewed as endpoint(s) of the selected link. In the other case, text passage selections were viewed as conceptual feedback for an underlying query formulation process in which each selection or typed phrase led to a refinement of, or a shift in, the concept of interest. One of the major outcomes of our research on information exploration has been that hypertext (pointing and clicking on link anchors) can provide a good interface for text retrieval without manual link creation (Golovchinsky, 1997b). In his Ph.D. dissertation, Golovchinsky developed a system, VOIR, for large-scale text retrieval with a hypertext user interface that employed a newspaper metaphor (see also Golovchinsky & Chignell, 1997).
VOIR differed from the BrowsIR interface used by Charoenkitkarn, which compared two different styles for formulating Boolean queries (graphical or text-based). VOIR allowed users to form queries in three ways: typed queries, text selection and passage-based query reformulation. Query reformulation was provided through the selection of hypertext anchors dynamically inserted into the text (Golovchinsky, 1997a). However, a key similarity between the BrowsIR and VOIR systems was that the mark-up and hypertext interfaces allowed queries to be marked up or selected on text, without a need to switch modes. As a result, the user was able to view more of the available text during a search. If viewing more text can improve recall performance, is it the volume of text that is important, or the diversity? Golovchinsky examined this question by creating a newspaper interface where the number of articles (columns) in the newspaper varied. In other words, roughly the same number of words was presented in the different conditions, but users saw two, four, or seven columns (articles) in the different conditions, with diversity of textual information tending to increase with the number of columns used. Golovchinsky found that the number of articles displayed at once had a borderline effect on retrieved recall (F[12, 42] = 2.55, P < 0.10), which tended to be better when more articles (columns) were visible. Post-hoc comparison indicated that this difference was likely due to poorer recall in the two-column condition (0.194) versus the other two conditions (with a recall of 0.255 for the four-column condition and 0.259 for the seven-column condition). There was no corresponding difference in retrieved precision (F < 1).


Subjects in the two-article condition viewed on average 28% fewer articles than in the four-article condition and 47% fewer articles than in the seven-article condition. Subjects achieved 34% to 60% higher judged recall scores when viewing more than two articles at a time. This demonstrates that under appropriate circumstances, viewing more articles can help recall without hurting precision. Thus it may be that it is not just viewing more text that improves recall, but also seeing a greater diversity of text (i.e., a greater number of articles). These results may also be interpreted using the notion of cognitive overhead (Wright, 1991). Wright showed that subjects were less likely to use information relevant to the task if that information was not directly visible on the screen, but instead accessible via a mouse click. Similarly, subjects in Golovchinsky's study were less likely to view documents that required additional mouse-clicks. It is reasonable to conclude from these data that interfaces that expose subjects to many retrieved documents simultaneously should produce greater awareness of the contents of the database than interfaces that force subjects to page through the retrieved results one at a time. Even if an interface permits a user to see an equivalent number of documents by scrolling, selecting and querying, providing more, and more varied, text may be needed to reduce cognitive overhead. On the face of it, and using the principle "less is more", putting more text on screen might seem like poor interface design. However, people are very adept at scanning through large newspaper formats with a great deal of text and, as screen technology improves, providing the opportunity to view large amounts of heterogeneous text seems like a good strategy for improving recall. Golovchinsky carried out cluster analysis to differentiate between different search styles.
Like Charoenkitkarn, he obtained two main clusters: those spending much time reading and making only a few queries and a few relevance judgments (the "readers") versus those making a large number of queries and relevance judgments (the "skimmers"). Using VOIR, skimmers achieved higher recall (0.18 versus 0.11 for readers) without a statistically significant decrease in precision (Golovchinsky, 1997a). Allowing for differences in the interfaces used across the two studies, the inclusives who selected and viewed more documents in Charoenkitkarn's study are roughly equivalent to the skimmers who made more queries and relevance judgments in Golovchinsky's study. Similarly, the readers seem to be roughly equivalent to the exclusives identified in Charoenkitkarn's study. Thus the common finding for both studies was that viewing more text generally produced better recall.

3.3. Web-based dynamic hypertext

Based on the experiences gained from Golovchinsky's experiments, we felt that our hypertext interface to information retrieval had a potential advantage over the typical hypertext interfaces used in many search engines available on the WWW, where the tasks of searching and browsing are separated. Thus the focus in our participation in TREC-7 was to compare our query-mediated link interface with one that mimicked the separation between the tasks of querying and browsing for an information retrieval search task. ClickIR, our subsequent experimental system (Bodner & Chignell, 1999), applied the algorithms used in VOIR for creating query-mediated links to the creation of web pages that had link anchors automatically inserted. The system also allowed users to type in queries. Unlike VOIR,


due to the constraints of HTML at the time, passage selection queries were not implemented. Similar to the methods employed in VOIR, anchors were selected based on statistical knowledge of the document collection and a running model of the user's information need. The system tracked the link anchors that were selected by the user. Queries were inferred by using the text surrounding a selected link anchor and combining the link text with past queries. By tracking the links selected by the user, a simple model of the user's information need was developed. ClickIR used both single terms and phrases to help users distinguish between link anchors. From informal testing of previous versions of ClickIR, we found that subjects had difficulty distinguishing among links that used the same word as an anchor within a document. This misunderstanding was primarily due to users' prior experiences with other hypertext systems where links using the same term as an anchor generally point to the same end node. This was not the case in our system since links were context sensitive. Phrases were used as link anchors in order to help the subjects distinguish between these related, but different links. We took part in the ad hoc task and interactive track in TREC-7. For the ad hoc task, we were required to use our experimental system to retrieve the top one thousand documents for fifty queries. Although this is primarily a non-interactive task, we took part in order to evaluate the effects of relevance feedback on retrieval performance. For the ad hoc task, subjects were given query topics provided by NIST. They were allowed to explore the document collection by clicking on links and were allowed to mark documents as being relevant to the query. The "uoftimgu" run was created by submitting only the resultant models of users' information need (i.e., the model that resulted after the allotted search time) to the search engine.
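The idea of a running user model built from traversed links can be sketched in Python as follows. This is only an illustration under stated assumptions, not the ClickIR implementation: the class name UserModel, the stopword list, the tokenizer and the relative weights given to anchor terms and surrounding-sentence terms are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list used only for this sketch.
STOPWORDS = {"a", "an", "and", "by", "can", "for", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


class UserModel:
    """Running model of an information need, built from followed links."""

    def __init__(self, anchor_weight=2.0, context_weight=1.0):
        # Assumption: anchor terms count more than surrounding-sentence terms.
        self.anchor_weight = anchor_weight
        self.context_weight = context_weight
        self.weights = Counter()

    def follow_link(self, anchor_text, sentence):
        """Update the model when the user traverses a link."""
        anchor_terms = tokenize(anchor_text)
        for term in anchor_terms:
            self.weights[term] += self.anchor_weight
        for term in tokenize(sentence):
            if term not in anchor_terms:
                self.weights[term] += self.context_weight

    def query(self, size=5):
        """Return the highest-weighted terms as the inferred query."""
        ranked = sorted(self.weights.items(), key=lambda kv: (-kv[1], kv[0]))
        return [term for term, _ in ranked[:size]]


model = UserModel()
model.follow_link("text retrieval",
                  "Hypertext interfaces can support text retrieval by novices.")
model.follow_link("retrieval performance",
                  "Relevance feedback improves retrieval performance.")
print(model.query(3))
```

Because each traversed link adds to the accumulated weights, terms that recur across selections (here "retrieval") rise to the top of the inferred query, which is one simple way to combine the current link text with past queries.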
The user model consisted of the terms from the links (and from the sentence surrounding the link) that the user traversed during the search session. The ``uoftimgr'' run was created using the resultant user model plus relevance feedback based on the documents marked relevant during the search process. Fig. 3 shows the recall-precision curves for both runs. When relevance feedback was used (uoftimgr), the average precision was above the median TREC-7 performance for 62% of the topics. However, when relevance feedback was not used

Fig. 3. Recall-precision curves for uoftimgu and uoftimgr runs.

(uoftimgu), our experimental system's average precision was above the median for only 44% of the topics. This illustrates the power of relevance feedback in improving retrieval performance, an effect that has been noted many times in the TREC experiments. It should also be noted that differences in performance in the TREC task tend to be comparatively small. For instance, while the advantage of the uoftimgr system over the uoftimgu system looks very small in Fig. 3, it actually represented the difference between a relatively good set of results in TREC-7 and rather mediocre results.

We also participated in the TREC-7 interactive track, where our experimental system was compared with a control system that closely mimicked the separation between the querying and browsing tasks found in the hypertext interfaces of most web search engines (see Bodner & Chignell, 1999, for details about the control system). In this study we again found improved recall when viewing more text in the hypertext interface, but at the expense of lower precision. Users of the ClickIR system had both significantly higher recall scores (F[1, 7] = 42.74, P < 0.001; mean of 0.41 versus a mean of 0.37) and lower precision scores (F[1, 7] = 35.66, P < 0.001; mean of 0.65 for instance precision versus 0.70 for the control system).

4. General discussion

A major outcome from our experiments at TREC and with the TREC data is that hypertext interfaces to IR tasks (and markup query interfaces that encourage the user to view more text) tend to increase recall. In an experiment with the VOIR system, recall increased without a significant drop in precision. With the ClickIR system, recall was also improved, although at the expense of precision. Recall and precision both improved with the BrowsIR system, but only for search novices. This pattern of results is summarized in Table 1.
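Because the comparisons above rest on recall, precision, and average precision, a small worked sketch of the standard document-level definitions may help. The function names and the toy document identifiers are illustrative assumptions; the interactive-track figures in the text are instance-based, which this sketch does not model.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one topic.

    retrieved: iterable of document ids returned by the system
    relevant:  set of document ids judged relevant
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Non-interpolated average precision over a ranked list.

    Precision is taken at the rank of each relevant document retrieved,
    and the sum is divided by the total number of relevant documents,
    so relevant documents that are never retrieved contribute zero.
    """
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4"]   # toy ranked output
    relevant = {"d1", "d3", "d5"}       # toy relevance judgments
    print(precision_recall(ranked, relevant))
    print(average_precision(ranked, relevant))
```

In this toy case, retrieving more documents could raise recall (by reaching d5) while lowering precision if the extra documents are mostly non-relevant, which is the kind of trade-off reported for ClickIR.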
Our interpretation of this pattern of results across the three experiments is that increased interaction with the text (more pages viewed) generally improves recall. The improvement in recall is more pronounced with search novices who appear to benefit more from this increased interaction. The effects on precision are more dependent on the type of user interface and the way in which the increased interaction with text is implemented. Each experiment yielded a different result in this regard (likely due to the different interfaces used) with precision increasing with BrowsIR, not changing significantly with VOIR and decreasing with ClickIR.

Table 1
The effects of increased exposure to text across the three studies cited

Study                       Recall                        Precision
BrowsIR (Charoenkitkarn)    Increased (novices)           Increased (novices)
VOIR (Golovchinsky)         Increased (skimmers better)   Unchanged
ClickIR (Bodner)            Increased                     Decreased

We also noted strong individual differences in our TREC research. We logged a variety of variables that characterized the search behaviour of subjects, including the traditional measures of recall and precision. We found that subjects could be placed into groups (clusters) based on the
different search styles that they adopted. While several groupings were observed, one distinction was strongly apparent across different experimental settings. This was the distinction between people who tended to scan a lot of articles, versus those who tended to look at fewer articles, but then read those articles more closely. This difference in search styles was also related to differences in outcomes. Search experts tended to have an exclusive or reader style of search. Significant differences were found between the exclusives and inclusives (readers and skimmers) in terms of the levels of recall and precision obtained. Overall, the results indicate that:
• Search pattern or navigation strategy has a major effect on retrieval performance.
• Experts tend to be more precision oriented, viewing and selecting fewer articles.
• Hypertext interfaces to IR encourage more recall-oriented search styles by letting users view more text and a larger variety of text.

We believe that the third point above is particularly important and that interfaces that reduce the cognitive cost of interaction should be used in those tasks where the user requires or could benefit from higher recall. Our results demonstrate that hypertext interfaces to text retrieval systems can benefit recall and can also benefit novices. It seems likely that this benefit may come from the ability to view a few more documents without a significant increase in the cognitive effort required. One of the enduring problems in information retrieval is how to improve recall without sacrificing precision and vice versa. The challenge now will be to find modified versions of hypertext interfaces that can improve precision, as well as recall. Another challenge will be to find variants of hypertext interfaces that can benefit search experts, as well as novices. It remains to be seen whether a single user interface with these desirable properties can be designed.
If not, then it seems that different user interfaces will have to be designed explicitly for different types of users.

5. Conclusions

From our experience at TREC and experiments with the TREC data, the major finding is that hypertext information retrieval interfaces increase recall because the user reads and interacts with more documents. Interfaces that only display a list of titles from relevant documents do not encourage the level of interaction with the document collection needed to effectively increase recall scores. All three sets of experiments showed varying levels of increased recall performance. In Charoenkitkarn's experiments with BrowsIR, novices benefited from the interaction with the text of the documents and recall increased. With VOIR, Golovchinsky found that recall improved as the number of documents presented at once to the user increased. Finally, Bodner's experiment in the TREC-7 interactive track also found that recall improved using ClickIR, the hypertext information retrieval interface, versus the control system that mimicked a search engine interface. Displaying more documents on the screen assists the user in interacting with more of the document collection. This increased interaction allows the user to gain a better understanding of the possible content of the collection and tends to lead to higher recall scores.

As new interfaces are developed for text retrieval, we need to understand how they affect search strategy and performance. Different types of users, with different experience (and using different
search strategies), will tend to react differently to a new interface or functionality. Some may benefit from the new features, while others may not. The series of experiments that we have briefly reviewed above also indicates significant search strategy differences between subjects, differences that led to significant differences in performance. Experts tended to be more exclusive (precision-oriented readers), while novices were more inclusive, carrying out more queries and judging more documents. Given the strong differences found in search strategies and the relations of these individual differences both to search expertise and to retrieval performance, it seems likely that distinctly different types of user interface may need to be designed for different classes of user.

Acknowledgements

This research was supported by grants from the National Science and Engineering Research Council of Canada (NSERC), by the Information Technology Research Centre of Excellence of Ontario (ITRC) to the second author and by the Communications and Information Technology of Ontario (CITO).

References

Bates, M. J. (1990). Where should the person stop and information search interface start? Information Processing and Management, 26(5), 575–591.
Belkin, N. J. (1995). Design principles for electronic textual resources: investigating users and uses of scholarly information. In A. Zampolli, N. Calzolari, & M. Palmer, Current issues in computational linguistics in honor of Don Walker (pp. 479–488). Pisa: Giardini Editori.
Belkin, N. J., Chang, S., Downs, T., Saracevic, T., & Zhao, S. (1990). Taking into account of user tasks, goals and behavior for the design of online public access catalogs. In D. Hendersen, Proceedings of ASIS'90, the 53rd annual meeting of the American Society for Information Science (pp. 69–79).
Bodner, R. C., & Chignell, M. H. (1999). ClickIR: text retrieval using a dynamic hypertext interface. In E. M. Voorhees, & D. K. Harman, Proceedings of TREC-7, the seventh text retrieval conference (pp. 573–582). Gaithersburg, Maryland: National Institute of Standards and Technology (NIST).
Bodner, R., Chignell, M., & Tam, J. (1997). Website authoring using dynamic hypertext. Proceedings of Webnet'97 (pp. 59–64).
Borgman, C. L. (1989). All users of information systems are not created equal: an exploration into individual differences. Information Processing and Management, 25(3), 237–252.
Chang, S. L. (1995). Toward a multidimensional framework for understanding browsing. Unpublished doctoral dissertation, Rutgers, The State University of New Jersey, New Brunswick, NJ.
Chang, S., & Rice, R. E. (1993). Browsing: a multidimensional framework. In M. E. Williams, Annual review of information science and technology (Vol. 28, pp. 231–276). Medford, NJ: Learned Information Inc.
Charoenkitkarn, N. (1996). The effect of markup-querying on search pattern and performance in large-scale text retrieval. Unpublished doctoral dissertation, Department of Industrial Engineering, University of Toronto, Toronto, Ont., Canada.
Charoenkitkarn, N., Chignell, M. H., & Golovchinsky, G. (1995). Interactive exploration as a formal text retrieval method: how well can interactivity compensate for unsophisticated retrieval algorithms? In D. K. Harman, Proceedings of TREC-3, the third text retrieval conference (pp. 179–199). Gaithersburg, MD: National Institute of Standards and Technology (NIST).
Charoenkitkarn, N., Chignell, M. H., & Golovchinsky, G. (1996). Is recall relevant? An analysis of how user interface conditions affect strategies and performance in large scale text retrieval. In D. K. Harman, Proceedings of TREC-4, the fourth text retrieval conference (pp. 211–232). Gaithersburg, MD: National Institute of Standards and Technology (NIST).
Fenichel, C. H. (1981). Online searching: measures that discriminate among users with different types of experience. Journal of the American Society for Information Science, 32(1), 23–32.
Fidel, R., & Crandall, M. (1997). Users' perception of the performance of a filtering system. Proceedings of SIGIR'97, the 22nd annual international ACM-SIGIR conference on research and development in information retrieval (pp. 198–205).
Golovchinsky, G. (1997a). Queries? Links? Is there a difference? Proceedings of ACM CHI'97, the conference on human factors in computing systems (pp. 407–414).
Golovchinsky, G. (1997b). What the query told the link: The integration of hypertext and information retrieval. Proceedings of hypertext'97, the seventh ACM conference on hypertext (pp. 67–74).
Golovchinsky, G., & Chignell, M. H. (1993). Queries-R-Links: graphical markup for text navigation. Proceedings of INTERCHI'93 (pp. 454–460).
Golovchinsky, G., & Chignell, M. H. (1997). The newspaper as an information exploration metaphor. Information Processing and Management, 33(5), 663–683.
Golovchinsky, G., Chignell, M. H., & Charoenkitkarn, N. (1997). Formal experiments in causal attire: case studies in information exploration. New Review of Hypermedia and Multimedia, 3, 123–158.
Hancock-Beaulieu, M. (1990). Evaluating the impact of an online library catalogue on subject searching behaviour at the catalogue and at the shelves. Journal of Documentation, 46, 318–338.
Iivonen, M. (1995). Searchers and searchers: differences between the most and least consistent searchers. Proceedings of SIGIR'95, the 18th annual international ACM-SIGIR conference on research and development in information retrieval (pp. 149–157).
Marchionini, G. (1995). Information seeking in electronic environments. Cambridge: Cambridge University Press.
Mick, C. K. (1980). Human factors in information work. In A. R. Benefeld, & E. J. Kazlauskas, Proceedings of ASIS'80, the 43rd annual meeting of the American Society for Information Science (pp. 21–23).
Saracevic, T. (1991). Individual differences in organizing, searching and retrieving information. In J. Griffiths, Proceedings of ASIS'91, the 54th annual meeting of the American Society for Information Science (pp. 82–86).
Waterworth, J. A., & Chignell, M. H. (1991). A model for information exploration. Hypermedia, 3(1), 35–58.
Wright, P. (1991). Cognitive overheads and prostheses. Proceedings of hypertext '91, the third ACM conference on hypertext (pp. 1–12).