Users, Structured Documents and Overlap: Interactive Searching of Elements and the Influence of Context on Search Behaviour

Barbara Hammer-Aebi, Kirstine Wilfred Christensen, Haakon Lund and Birger Larsen

Department of Information Studies, Royal School of Library and Information Science, Copenhagen, Denmark
{[email protected], [email protected], [email protected], [email protected]}

Abstract. This paper analyses user behaviour when interacting with the result list of an information retrieval (IR) system that retrieves elements from structured documents. The data set was obtained from the INEX 2005 Interactive Track, where a group of users searched for information on travel destinations marked up in XML. The aim of this study is to determine user preferences for element granularity and to examine how users deal with overlapping elements. In addition, we analyse the difference in user behaviour when the results are viewed in isolation or in the context of the surrounding elements. The results suggest that users prefer elements of depth 2-4 to whole documents. Users view fewer overlapping elements than expected, and their behaviour suggests that some overlapping elements are viewed deliberately. There is almost no difference in behaviour whether users view an element in the context of its document or in isolation.

Symposium themes: Context-sensitive Information Seeking & Retrieval; Document Structure in Contextual IIR.

To appear as: Hammer-Aebi, B., Christensen, K. W., Lund, H. and Larsen, B. (2006): Users, structured documents and overlap: interactive searching of elements and the influence of context on search behaviour. In: Proceedings of the first symposium on Information Interaction in Context (IIiX), 18-20 October, 2006, Copenhagen, Denmark. (Accepted conference paper, 15 p.)

1 Introduction

Documents have traditionally been considered atomic units and have been indexed and retrieved as such in Information Retrieval (IR) systems (see, e.g., [1]). However, with the advent and more widespread use of mark-up languages such as the eXtensible Markup Language (XML; http://www.w3.org/XML/), it has become possible to derive smaller indexing and retrieval units from the document structure provided by the mark-up. Actively using document structure in IR has the potential to improve browsing, querying and ranking [2]. Considering smaller units, it is evident that while the IR techniques that work on whole documents might also be applied to indexing and retrieving the smaller units, new approaches may be needed to make the most of the opportunities offered by the mark-up. Rather than regarding these smaller units, or document elements, as novel atomic units to be handled, document elements may be seen as being highly contextual to each other within the same document [3]. For instance, the chart (C) and paragraph (P) shown in Fig. 1 below are each set in the context of each other and of the containing document. Although these document elements can be extracted as atomic units, they may need the context of each other to be meaningful to a person looking for information, at least to a certain extent.

How to find elements that are relevant to users' requests and at the appropriate level of granularity has been studied since 2002 in the Initiative for the Evaluation of XML retrieval (INEX). The main focus in INEX has been on extending the traditional IR ad hoc laboratory model to deal with structured documents (see, e.g., [4]). An interactive track was, however, added in 2004 with the purpose of investigating user interaction with element retrieval systems, and of aiding the development of approaches for element retrieval that are effective in user-based environments [5]. In this paper, we report on a study of the influence of intra-document contexts on searching behaviour using data collected as part of the INEX 2005 Interactive Track (see [6] and Section 2.1 below for details).

Fig. 1. Overlapping elements in the context of each other: a chart (C) in a paragraph (P) in an article (A).

Studying users' interaction with element retrieval systems is important. Much progress has been made and many challenges identified in the ad hoc laboratory track in INEX, demonstrating that relevant elements can be retrieved in a variety of ways [7]. We have little knowledge, however, about how best to transfer these results into features that are useful for users. Studies of how users interact with prototype element retrieval systems might provide such knowledge and guide decisions on how to proceed with challenges in INEX, as well as result in experiences that can inform practical implementations of element retrieval systems. Much of the needed knowledge is related to contextual issues: Is the context provided by the whole document so important that retrieving elements is of little use? How to deal with intra-document contextual relations, i.e., overlaps between elements? How to display the retrieved elements to users – as isolated, atomic units or in the context of their surrounding elements?

In this paper we investigate three research questions by examining the search behaviour of users interacting with an element retrieval system. First, we attempt to get indications of whether users prefer elements over whole documents and, if so, what the preferred granularity is, that is, how much context users prefer to see. Second, we study the extent to which overlapping contexts are a problem for users. Overlap between elements is one of the major challenges in the evaluation of element retrieval systems [8; 9]. Because the logical structure of XML documents is represented as a tree structure, element retrieval systems can retrieve information at different hierarchical levels, such as full documents, sections and subsections, as well as paragraphs and figures. Treating the elements as atomic units, i.e., out of context, causes problems because the result list can contain elements from several hierarchical levels of the same document, with the risk that the same content will appear in two or more elements in the result list. In Fig. 1, the entire document A, a paragraph P and a chart C are three examples of elements. If a result list contains both P and C, the same content is listed twice. This makes defining performance metrics for ad hoc element retrieval a difficult task [8], and may also cause problems for users [10]. Third, we examine the influence of presenting the content of the retrieved elements either in the context of the surrounding elements or in isolation.

The remainder of the paper is structured as follows: Section 2 presents the experiment which forms the basis of our analysis, the variables studied and the methods used in analysing the data. Section 3 presents the results, which are analysed and discussed in Section 4. Section 5 draws conclusions and gives suggestions for future work.

2 Methods

2.1 Data set

The study is based on data collected in the INEX 2005 Interactive Track, which had three different tasks in 2005 [6]. In tasks A and B, a collection consisting of 16,819 scientific documents was made available for experiments, either in a system provided by the track organisers (task A) or in the participants' own systems (task B). We have chosen to use data from task C, which used a collection of 462 documents with information on travel destinations from the Lonely Planet publishers. The documents, which were marked up in XML, were fairly large: on average they contained 21 sections, each between a few lines and a few pages long. The sections were divided further into seven hierarchical levels. Compared to the scientific documents used in tasks A and B, the Lonely Planet documents had much more meaningful tags (see Fig. 2 for an example). This is because the tags in the Lonely Planet collection are used to structure the documents and therefore carry more semantic information, such as 'weather', 'activities' or 'attractions'.

The documents were indexed in an element retrieval system provided by Utrecht University, the B3-SDR system, which was used in two versions in task C: one presented the results highlighted in the context of the full text (Fig. 3); the other showed the results in isolation (Fig. 4). Fig. 2 shows the query box and result list after a user has submitted a query to the system; this was the same for both system versions. The returned hits consisted of both elements and whole documents, and were grouped by destination and ranked inside each destination. The result list showed part of the structure of the document: the XML tags presented in the result list gave the user the opportunity to guess the content of an element, as well as its location in the tree structure of the document (see Fig. 2 and Table 1).

Fig. 2. Screenshot of the system used showing the result list. The results are represented with the semantic XML mark-up and are grouped by destination.
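The B3-SDR grouping code itself is not available to us; as a rough illustration of the behaviour described above (hits grouped by destination, ranked inside each destination), the following Python sketch groups a ranked hit list while preserving the order in which destinations first appear. The field names 'rank', 'destination' and 'path' are our own assumptions, not the system's actual data model.

from collections import OrderedDict

def group_by_destination(hits):
    """Group a ranked list of hits by destination, preserving rank order.

    `hits` is assumed to be sorted by rank (1 = best); destinations appear
    in the order of their best-ranked hit, and hits stay ranked within each
    destination. This mimics the result list behaviour described in the
    paper, not the actual B3-SDR implementation.
    """
    groups = OrderedDict()
    for hit in hits:
        groups.setdefault(hit["destination"], []).append(hit)
    return groups

# Illustrative use with made-up hits:
hits = [
    {"rank": 1, "destination": "Hawaii", "path": "Hawaii/destination/attractions/"},
    {"rank": 2, "destination": "Fiji", "path": "Fiji/destination/"},
    {"rank": 3, "destination": "Hawaii", "path": "Hawaii/destination/"},
]
for destination, elements in group_by_destination(hits).items():
    print(destination, [e["path"] for e in elements])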

All interactions with the system were recorded to a log. The following information was recorded with time stamps: the system version (context/isolation), user ID, task number, query terms entered, and the rank, document ID and path of viewed elements, as well as any relevance assessments of these elements. The 114 logs are the main data analysed in this paper.
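The concrete log format is not specified in the paper. Assuming one record per event with the fields listed above in a tab-separated layout (an assumption on our part), a minimal parsing sketch could look as follows.

from dataclasses import dataclass
from typing import Optional

@dataclass
class LogEvent:
    """One logged interaction event. The field set follows the description
    above; the tab-separated layout and the field order are assumed."""
    timestamp: str
    system: str             # 'context' or 'isolation'
    user_id: str
    task: str
    query: str              # query terms entered (empty for view/assess events)
    rank: Optional[int]     # rank of the viewed element in the result list
    document_id: str
    element_path: str       # e.g. 'Hawaii/destination/attractions/'
    assessment: str         # e.g. 'exact', 'partial', 'broad'; empty if none given

def parse_log_line(line: str) -> LogEvent:
    fields = line.rstrip("\n").split("\t")
    rank = int(fields[5]) if fields[5] else None
    return LogEvent(fields[0], fields[1], fields[2], fields[3], fields[4],
                    rank, fields[6], fields[7], fields[8])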

Fig. 3. Example of an element viewed in context.

A total of 29 test persons from four participating research groups each searched four simulated work tasks [11]. The simulated work tasks were chosen by the test persons from two groups of tasks with a total of eight different tasks, and the test persons were given a maximum of 10 minutes to solve each task. Each test person used both the system showing the results in context and the one showing them in isolation. The task groups and systems were permuted to eliminate learning effects. When viewing the content of an element or a whole document, the test persons were prompted by the system to provide a relevance assessment of the content using the box shown in the lower right corner of Fig. 3 and Fig. 4. The assessments could be given along two dimensions: the vertical axis determined the extent to which an element contained relevant information (exact; partial; not), and the horizontal axis whether an element needed the context of the surrounding elements to make full sense (broad; exact; narrow). This relevance scale is based on work by Pehcevski, Thom and Vercoustre [12], because the 10-point scale used in the INEX 2004 Interactive Track was too complex and difficult for the users to understand [13]. When a test person viewed the content of an element, the assessment box hovered in the lower right corner of the screen. Once the user had assessed the element, they were automatically taken back to the result list, making assessment a natural part of the work flow.

Fig. 4. Example of an element viewed in isolation.

The test persons completed a number of questionnaires during the experiment. The gender distribution of the test persons was 12 female and 17 male. The age range was 20-47 years, with an average of 30.5 years. All participants came from an academic setting, either as students or as faculty. Other data from the questionnaires are not analysed in this paper.

2.2 Studied variables

Apart from the relevance assessments described above, we have studied the following variables: element overlap, element depth and task differences. Several types of overlap have been defined by Pehcevski and Thom to describe overlaps in lists of retrieved elements [14, p. 18]:

• Ascendants overlap (A-overlap), which for a set of retrieved elements measures the percentage of elements that contain at least one other element in the set;

• Descendants overlap (D-overlap), which for a set of retrieved elements measures the percentage of elements that are contained by at least one other element in the set;

• Overall overlap (O-overlap), which for a set of retrieved elements measures the percentage of elements that either contain or are contained by at least one other element in the set (a combination of A- and D-overlap).

These overlap measures are used for improving the IR system, e.g., for adjusting the ranking algorithms for element retrieval. The focus is thus on the system, and all overlaps are counted from a given set of returned elements. Table 1 presents a fictive result list. From the system point of view the A-overlaps are Hawaii/destination, Hawaii/destination/attractions and Fiji/destination; the D-overlaps would be all elements except Hawaii/destination and Fiji/destination.

Our focus is on the behaviour of the user, and we have therefore adapted the overlap types to the situation of users handling overlapping elements in the result list. The terms will thus be used in a slightly different way: only the viewed elements are considered, since users rarely look at all hits in the result list. This means that we count fewer overlaps. The mere fact that an overlap exists is not of interest in this study; only the fact that an overlap is viewed is interesting from a user perspective. How we count an A- or D-overlap also depends on the order in which the elements were viewed. In Table 1 the elements Hawaii/destination/attractions/attraction1 and Hawaii/destination/attractions/attraction2 are contained by Hawaii/destination/attractions/, and because they are viewed subsequently, they constitute D-overlaps. As Hawaii/destination is viewed last and contains all the other viewed elements, this element constitutes an A-overlap. The same applies to Fiji/destination. Hawaii/destination/weather and Fiji/destination/attractions/attraction1/ are not taken into account because they are not viewed by the user.

Table 1. Document elements are represented by tags that make sense semantically. The order of the viewed elements shows that the user viewed both types of overlapping content.

Result list                                                              Element depth   Viewed element order
Hawaii/destination/                                                            1                 4
Hawaii/destination/weather/                                                    2                 -
Hawaii/destination/attractions/                                                2                 1
Hawaii/destination/attractions/attraction1/                                    3                 2
Hawaii/destination/attractions/attraction2/                                    3                 3
Fiji/destination/                                                              1                 2
Fiji/destination/attractions/attraction1/                                      3                 -
Fiji/destination/transport/getting_there_and_away/topic/description/          5                 1
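To make the user-centric counting concrete, the sketch below (our own illustration in Python, not the code used in the study) walks through the viewed elements in viewing order: an element is counted as a D-overlap if a previously viewed element contains it, and as an A-overlap if it contains a previously viewed element. How to classify an element that would satisfy both conditions is not spelled out in the paper; here the D-overlap check takes precedence.

def is_ancestor(a, b):
    """True if element path `a` properly contains element path `b`."""
    a, b = a.rstrip("/"), b.rstrip("/")
    return b.startswith(a + "/")

def count_overlaps(viewed_paths):
    """Return (A-overlaps, D-overlaps, O-overlaps) among viewed elements.

    `viewed_paths` must be given in viewing order, as in Table 1.
    """
    a_overlaps, d_overlaps = [], []
    for i, path in enumerate(viewed_paths):
        earlier = viewed_paths[:i]
        if any(is_ancestor(e, path) for e in earlier):
            d_overlaps.append(path)        # contained by something already viewed
        elif any(is_ancestor(path, e) for e in earlier):
            a_overlaps.append(path)        # contains something already viewed
    return a_overlaps, d_overlaps, a_overlaps + d_overlaps

# The Hawaii part of Table 1, in viewing order:
viewed = [
    "Hawaii/destination/attractions/",
    "Hawaii/destination/attractions/attraction1/",
    "Hawaii/destination/attractions/attraction2/",
    "Hawaii/destination/",
]
a, d, o = count_overlaps(viewed)
print("A-overlaps:", a)   # ['Hawaii/destination/']
print("D-overlaps:", d)   # the two attraction elements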

Element depth measures how deep in the tree structure of the document a viewed element is placed. Note that elements of the same depth do not necessarily have the same size, e.g., measured in number of words. In the result list the tree structure is denoted by slashes separating the semantic parts that name the elements (see Table 1 and Fig. 2). A whole document is represented by element depth 1. Differences between tasks were determined by cross-tabulating the eight different tasks with the depth variable.

2.3 Methods used in the analysis

We used the element depth as an indication of the granularity of an element. To find the preferred granularity, we extracted information from the logs to analyse the element depth of viewed and assessed elements. The weakness of this approach is that it only considers viewed and assessed elements. The logs do not contain data on which elements were available in the result list, and we do not know which elements the test persons had to choose from, only which ones they chose to view and assess. Also, measuring preference according to element depth instead of element size might not give an accurate picture, since the elements at one level can vary in size and comprehensiveness.

For the investigation of the extent to which overlapping contexts caused problems for users, we identified A- and D-overlaps (as defined above) among the viewed elements and compared these to all viewed elements. We did the same for assessed elements to gain knowledge of the usefulness of the overlapping elements. Finally, we compared results from tasks performed in the context system and the isolation system to detect any differences in behaviour between the two. We compared the number of overlaps viewed and the depth of elements viewed. In the system which showed the content in isolation, the test persons could get more context by choosing elements located at a higher hierarchical level in the tree structure. This was not necessary in the context view, where the users could browse the context of each element directly. The difference in the preferred granularity when using the two systems thus shows how the test persons coped with and without the context of an element.
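With the path representation used in Table 1, element depth can be derived directly from the number of path components, and the depth-by-task cross-tabulation can be built with a simple counter. The sketch below is our own illustration and assumes each log record provides a task label and an element path; the example pairs are made up.

from collections import Counter

def element_depth(path):
    """Depth of an element in its document tree.

    With paths such as 'Hawaii/destination/attractions/attraction1/', the
    first component is the destination name, so the whole document
    ('Hawaii/destination/') has depth 1 and each further component adds one.
    """
    parts = [p for p in path.strip("/").split("/") if p]
    return len(parts) - 1

# Illustrative (task, path) pairs; task labels LP1-LP8 as in Fig. 5.
viewed = [
    ("LP1", "Hawaii/destination/"),
    ("LP1", "Hawaii/destination/attractions/attraction1/"),
    ("LP3", "Fiji/destination/transport/getting_there_and_away/topic/description/"),
]
crosstab = Counter((task, element_depth(path)) for task, path in viewed)
print(crosstab)   # Counter({('LP1', 1): 1, ('LP1', 3): 1, ('LP3', 5): 1})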

3 Results

3.1 Element depth

A total of 1506 elements were viewed in the experiment. As can be seen from Table 2, 23% were whole documents and the remaining 77% were elements smaller than whole documents. Thus whole documents were the single granularity viewed most often, but elements smaller than whole documents constituted the vast majority of viewed elements. A fair proportion of the viewed elements were placed deep in the tree; almost half were at depth 4 or deeper.

Fig. 5 shows the distribution of element depth over the eight tasks. As the number of test persons choosing each task was not the same, the distribution in Fig. 5 is relative to the number of elements viewed for each task. It can be seen that a similar pattern emerges for most tasks. The exceptions are tasks 3 and 6, where far fewer whole documents (depth 1) were viewed. Both these tasks had far fewer viewed elements compared to the rest, which may, however, have influenced the results. A Chi-square test shows that the distribution of element depth is not independent of the task. Even though there are differences in the depth of viewed elements between tasks, there is no clear connection between task and users' choice of depth. The chosen granularity might be a result of users' personal preferences. A thorough investigation of the factors influencing the choice of granularity has not been done.

Table 2. Overall distribution of viewed elements over element depth.

Depth                     Viewed elements
1 (whole document)         342  (23%)
2 (sections)               213  (14%)
3 (sub-sections)           245  (16%)
4 (sub-sub-sections)       247  (16%)
5                          182  (12%)
6                          153  (10%)
7                          123  (8%)
8                            1  (0%)
Total                     1506  (100%)


Fig. 5. Relative distribution of viewed elements over element depth and individual tasks.
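The independence test mentioned in Section 3.1 can be carried out with a standard chi-square test on the depth-by-task contingency table. The sketch below shows the mechanics with scipy; the counts are made up for illustration, since the full per-task table is only reported graphically in Fig. 5.

from scipy.stats import chi2_contingency

# Rows: tasks, columns: element depths. The counts below are illustrative
# placeholders, not the counts underlying Fig. 5.
contingency = [
    [30, 12, 8, 5],
    [10, 25, 20, 15],
    [5, 8, 30, 22],
]
chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2={chi2:.1f}, dof={dof}, p={p_value:.4f}")
# A small p-value indicates that the distribution of element depth is not
# independent of the task, which is the result reported above.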

3.2 Assessments

Table 3 shows the distribution of assessed elements over element depth. Comparing Tables 2 and 3, it can be seen that 91% of the viewed elements were assessed. This indicates that the assessment procedure was less obtrusive than in the 2004 Interactive Track at INEX, where only 62% were assessed [5]. Overall, a fairly low proportion of the viewed elements had useful and relevant content in the present experiment: only 11% were assessed as Exact and an additional 16% as Partial. The rest were either Not relevant (44%), too Broad (16%) or too Narrow (13%). At all depths the percentage of elements assessed as Not relevant is high (min. 38%), and not much can be learned about which granularity is (not) preferred from analysing the Not relevant assessments.

Table 3. Distribution of assessments over element depth, with percentages within each depth.

Depth   Broad        Narrow       Exact        Partial      Not          Total
1       136 (43%)      3 (1%)      11 (3%)      33 (10%)    137 (43%)    320 (100%)
2        46 (23%)     17 (8%)      20 (10%)     29 (14%)     92 (45%)    204 (100%)
3        12 (5%)      30 (14%)     55 (25%)     41 (18%)     84 (38%)    222 (100%)
4        13 (8%)      48 (22%)     31 (14%)     32 (15%)     92 (43%)    216 (100%)
5         9 (5%)      36 (21%)     12 (7%)      35 (21%)     76 (52%)    168 (100%)
6         4 (3%)      19 (14%)     13 (10%)     29 (21%)     71 (52%)    136 (100%)
7         1 (1%)      21 (20%)      5 (5%)      23 (21%)     57 (53%)    107 (100%)
8         -            1 (100%)     -            -            -            1 (100%)
Total   221 (16%)    175 (13%)   147 (11%)    222 (16%)    609 (44%)   1374

Examining the relationship between relevance and element depth, Table 3 shows, as expected, that whole documents (depth 1) account for a large part of the Broad assessments, and that only a minor part of the whole documents and sections (depths 1 and 2) were assessed as Narrow. Whole documents (depth 1) were assessed as too Broad or Not relevant in 86% of all cases. Only 3% of the retrieved whole documents were assessed as Exactly relevant. Table 4 shows the relative distribution of the elements assessed as Exactly relevant. Elements at depth 2-4 seem to be the preferred granularity, accounting for 72% of the Exactly relevant elements, with 37% at depth 3 alone.

Table 4. The distribution of elements with the assessment Exact, according to depth.

Depth    Exact    Percentage
1          11        7%
2          20       14%
3          55       37%
4          31       21%
5          12        8%
6          13        9%
7           5        3%
8           0        0%
Total     147      100%

3.3 Overlapping elements

A total of 422 elements (28% of the viewed elements) were O-overlaps, as shown in Table 5. Of these, 216 (14%) were A-overlaps and 206 (14%) were D-overlaps. Interestingly, there is almost no difference in the distribution of overlaps between the two views: comparing context to isolation, the overlaps agree closely with the general distribution of viewed elements, as can be seen in Table 5.

Table 5. Overlaps in viewed elements, by type of system.

System      Viewed          D-overlaps     A-overlaps     O-overlaps
Context       704 (47%)       93 (45%)       98 (45%)      191 (45%)
Isolated      802 (53%)      113 (55%)      118 (55%)      231 (55%)
Total        1506 (100%)     206 (100%)     216 (100%)     422 (100%)

An overview of the assessments of the overlapping elements is shown in Table 6, together with the distribution of all assessments. Of the elements that test persons had already seen in the context of a larger element (D-overlaps), a larger proportion (18%) was considered Exactly relevant than among all assessed elements (11%). 28% of all A-overlaps were assessed as Broad, while only 16% of all assessed elements were assessed as Broad; A-overlaps would often be larger than an element viewed previously. Looking at the elements considered too Narrow, 19% of the A-overlaps received this assessment, compared to only 13% overall. 95% of the D-overlaps and 96% of the A-overlaps were assessed.

Table 6. Overlaps distributed over assessments.

Assessment    D-overlaps     A-overlaps     All assessed elements
Broad           20 (10%)       59 (28%)        221 (16%)
Narrow          26 (13%)       39 (19%)        175 (13%)
Exact           35 (18%)       13 (6%)         147 (11%)
Partial         41 (21%)       33 (16%)        222 (16%)
Not             74 (38%)       64 (31%)        609 (44%)
Total          196 (100%)     208 (100%)      1374 (100%)

3.4 Context vs. isolated systems

As Table 5 shows, 802 (53%) of the viewed elements were viewed in isolation and 704 (47%) in context. It is noteworthy that the difference is so small, because the users had the chance to browse the whole document in the context view, which could have led to far fewer viewed elements in that view. Fig. 6 shows a breakdown of the proportion of viewed elements in the two systems distributed over element depth. As can be seen, there are no large differences between the two systems. A slightly higher proportion of elements was viewed at depths 2-4 in the context system (the same granularities where most of the Exactly relevant elements were found). More whole documents were viewed in isolation, and so were elements at depths 5-7 (and 8). Note that only one element of depth 8 was viewed.


Fig. 6. Depth of elements viewed in context and in isolation.

4 Discussion

On the whole, the results provide support for continued research into element retrieval. The majority of the viewed and relevant content was found in elements rather than whole documents: 77% of all viewed content was elements, and almost all Exactly relevant content was elements. Even bearing in mind that the system used in the experiment is an element retrieval system, the results are noticeable. If the test persons had disliked element retrieval and preferred to view whole documents, they had the opportunity to browse the whole document in the context system variant. However, there were almost no differences from the isolated system, indicating that only limited browsing was done in the context system. The fact that elements were preferred in the experiment may be influenced by the characteristics of the collection: the documents were fairly long and the tags meaningful; both may have instigated interaction with elements. Compared to earlier studies, where the documents were scientific articles, the content in the Lonely Planet collection is probably better suited for element retrieval. A travel guide is rarely read from cover to cover, but rather browsed for relevant bits of information. In scientific articles more context is often needed to understand the content of an element.

Regarding the preferred granularity, the majority of the relevant information was found at depths 2-4 in the Lonely Planet collection: sections, sub-sections and sub-sub-sections accounted for 72% of the Exactly relevant content. This did not vary much across tasks or system type. Note that these results may be biased in favour of element retrieval: had the actual size of the elements, e.g., in number of words, been available, large elements and whole documents might have been found to contain a larger proportion of relevant content. Regardless of this, the element retrieval system was able to point to a large number of relevant elements, and many of these were viewed and assessed by the test persons.

Overlapping contexts did not seem to be a major problem in the current experiment. In the Interactive Track at INEX 2004, the system displayed the elements out of context, that is, the elements were not grouped by document, and elements from the same document were scattered all over the result list according to the ranking of the element. With the 2004 system the test persons expressed annoyance at not being able to see all elements retrieved from a document at one glance [10]. Almost no overlaps were assessed, and Pehcevski, Thom and Vercoustre [12] concluded that the users did not want to view overlapping elements. This problem seems to be solved with the B3-SDR system used in the current study, mainly because the elements from the same document were grouped in the result list. In combination with meaningful tags, this feature may have made it easier to find a suitable level of granularity and to detect overlapping elements, and perhaps even to deliberately choose overlapping elements as a means to view more context. Thus it seems possible to solve the earlier observed problems with overlap at the interface level when designing element retrieval systems.

The very small differences between the in-context and in-isolation system versions are interesting. We expected the test persons to take advantage of the possibility for browsing and to have fewer A-overlaps in the in-context system, because they would not need to access an element higher in the tree structure to see the context. The lack of significant differences might be interpreted as an indication that the context of surrounding elements did not matter that much to the test persons. It may, however, be that the result list itself provided enough context for the test persons to identify the relevant content. In addition, the experimental setting itself, with first-time users and fairly short time to solve each task (max. 10 minutes), may have affected the results. With users more experienced in using the system and with no time restriction, differences between the context and isolation systems might occur.

5 Conclusion

The test persons in our study found the possibility of viewing elements, that is, parts of documents, a useful feature: most of the viewed content and the vast majority of relevant content were elements. At the same time, the structure of the whole documents seems well suited for organising the retrieved elements and putting them in context. Element retrieval then becomes not so much a question of either-or, but rather of both-and in relation to elements versus whole documents. In addition, it seems that the problem of overlapping elements can be solved in end-user systems at the interface level by replacing an atomic view of element retrieval with a contextual view (i.e., grouping results by document). If this is done, it appears that presenting the actual content in either isolation or in context does not lead to large changes in behaviour, at least with a setup similar to the one used in this study.

The initial analysis presented here has focussed on a small number of variables and the relations between them. More data are available in the questionnaires and logs, and these could be studied in future research to obtain more detailed results, e.g., by analysing comments made by the test persons on the two systems. In addition to the data set analysed in this paper, we have collected eye-tracking recordings of 6 test persons doing 24 tasks. We plan to study these recordings in detail to observe any browsing behaviour and to analyse what the test persons actually looked at in the result list and in the two ways of viewing the content.

Acknowledgments. We would like to thank Lonely Planet Publications Pty Ltd for access to their data, and Roelof van Zwol from Utrecht University for generous access to the B3-SDR system while writing this paper. INEX is an activity of the DELOS Network of Excellence on Digital Libraries.

References

1. Baeza-Yates, R. and Ribeiro-Neto, B. (1999): Modern information retrieval. Harlow: Addison Wesley. 513 p.
2. Chiaramella, Y. (2001): Information retrieval and structured documents. In: Agosti, M., Crestani, F. and Pasi, G. eds. Lectures on information retrieval: third European summer school, ESSIR 2000. Berlin: Springer, p. 286-309. (LNCS; 1980)
3. Ingwersen, P. and Järvelin, K. (2004): Information retrieval in contexts. In: Ingwersen, P., van Rijsbergen, K. and Belkin, N. eds. ACM SIGIR 2004 Workshop on "Information Retrieval in Context". Sheffield: [University of Sheffield], p. 6-9. (http://ir.dcs.gla.ac.uk/context/)
4. Lalmas, M. and Kazai, G. (2006): Report on the ad-hoc track of the INEX 2005 workshop. ACM SIGIR Forum, 40(1), p. 49-57.
5. Tombros, A., Larsen, B. and Malik, S. (2005): The Interactive Track at INEX 2004. In: Fuhr, N., Lalmas, M., Malik, S. and Szlávik, Z. eds. Proceedings of INEX 2004. Berlin: Springer, p. 410-423. (LNCS; 3493)
6. Larsen, B., Malik, S. and Tombros, A. (2006): The interactive track at INEX 2005. In: Fuhr, N., Lalmas, M., Malik, S. and Kazai, G. eds. Proceedings of INEX 2005. Berlin: Springer, p. 398-410. (LNCS; 3977) (Preprint at http://inex.is.informatik.uni-duisburg.de/2005/workshop.html)
7. Fuhr, N., Lalmas, M., Malik, S. and Kazai, G. eds. (2006): Proceedings of INEX 2005. Berlin: Springer. (To appear as LNCS; 3977)
8. Kazai, G., Lalmas, M. and de Vries, A. P. (2004): The overlap problem in content-oriented XML retrieval evaluation. In: Järvelin, K., Allan, J., Bruza, P. and Sanderson, M. eds. Proceedings of SIGIR 2004. New York: ACM Press, p. 72-79.
9. Clarke, C. L. A. (2005): Controlling overlap in content-oriented XML retrieval. In: Marchionini, G., Moffat, A. and Tait, J. eds. Proceedings of SIGIR 2005. New York: ACM Press, p. 314-321.
10. Tombros, A., Malik, S. and Larsen, B. (2005): Report on the INEX 2004 interactive track. SIGIR Forum, 39(1), p. 43-49. (http://www.sigir.org/forum/2005J-TOC.html)
11. Borlund, P. (2003): The IIR evaluation model: a framework for evaluation of interactive information retrieval systems. Information Research, 8(3), paper no. 152. (http://informationr.net/ir/8-3/paper152.html)
12. Pehcevski, J., Thom, J. A. and Vercoustre, A.-M. (2005): Users and assessors in the context of INEX: Are relevance dimensions relevant? In: Trotman, A., Lalmas, M. and Fuhr, N. eds. Proceedings of the INEX 2005 Workshop on Element Retrieval Methodology, p. 47-62. (http://www.cs.otago.ac.nz/inexmw/Proceedings.pdf)
13. Pharo, N. and Nordlie, R. (2005): Context matters: an analysis of assessments of XML documents. In: Crestani, F. and Ruthven, I. eds. Proceedings of CoLIS5. Berlin: Springer, p. 238-248. (LNCS; 3507)
14. Pehcevski, J. and Thom, J. A. (2005): HiXEval: highlighting XML retrieval evaluation. In: Fuhr, N., Lalmas, M., Malik, S. and Kazai, G. eds. INEX 2005 Workshop Pre-proceedings, p. 11-24. (http://inex.is.informatik.uni-duisburg.de/2005/workshop.html)