) and phrase constraints to our new Indri queries. For instance, for INEX query (id=2009021) we have: //article[about(., "wonder girls")]
The formulated Indri query is: #combine[article](#1(wonder girls))
For CAS queries that ask for any XML element type (noted as *), we retrieve either the article element only or, additionally, elements such as bdy, link, p, sec, section, st, and title. An example of such a CAS query is //*[about(., Dwyane Wade)] (id=2009018).
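To make the mapping concrete, the following is a minimal sketch of how such Indri queries could be generated from an already parsed NEXI topic. The helper names, the ELEMENT_TYPES list, and the expand_wildcard switch are illustrative assumptions rather than our actual implementation; they only reproduce the #combine and #1 patterns shown above and the two wildcard interpretations (article only versus all considered element types).

ELEMENT_TYPES = ["bdy", "link", "p", "sec", "section", "st", "title"]

def term_expr(terms, phrase=False):
    # Render the query terms, wrapping quoted phrases in Indri's #1 (exact phrase) operator.
    return "#1({})".format(" ".join(terms)) if phrase else " ".join(terms)

def indri_query(terms, element=None, phrase=False, expand_wildcard=False):
    # element=None      -> CO query:           #combine(terms)
    # element="article" -> CAS query:          #combine[article](terms)
    # element="*"       -> wildcard CAS query: article only, or article plus the
    #                      other considered element types when expand_wildcard is True.
    body = term_expr(terms, phrase)
    if element is None:
        return ["#combine({})".format(body)]
    if element == "*":
        fields = ["article"] + (ELEMENT_TYPES if expand_wildcard else [])
        return ["#combine[{}]({})".format(f, body) for f in fields]
    return ["#combine[{}]({})".format(element, body)]

# Example: the CAS topic //article[about(., "wonder girls")] (id=2009021)
print(indri_query(["wonder", "girls"], element="article", phrase=True))
# -> ['#combine[article](#1(wonder girls))']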
4 Retrieval Model and Strategies
We use the run of full article retrieval as the baseline for both CO and CAS queries. The retrieval model for the baseline runs is based on cross-entropy scores for the query model and the document model, which is smoothed using Dirichlet priors. It is defined as follows:

score(D|Q) = \sum_{i=1}^{l} P_{ml}(t_i \mid \theta_Q) \cdot \log \frac{tf(t_i, D) + \mu P_{ml}(t_i \mid \theta_C)}{|D| + \mu} \qquad (1)
where l is the length of the query, Pml(ti|θQ) and Pml(ti|θC) are the Maximum Likelihood (ML) estimates of the query model θQ and the collection model θC, respectively, tf(ti, D) is the frequency of query term ti in document D, |D| is the document length, and µ is the smoothing parameter. For XML element retrieval, we compute the relevance score score(E|Q) of the queried XML field (E) with respect to the given CAS query (Q). The smoothed document model (inside the log function) is adapted to compute the ML estimate of the XML element model Pml(ti|θE). We set up our language model and model parameters based on the experimental results of similar tasks at INEX 2008; here µ is set to 500.
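For illustration, a minimal Python sketch of Eq. (1) follows, assuming whitespace-tokenized text and precomputed collection term frequencies. The function and variable names are ours and only mirror the notation above; for element retrieval, the document terms would be replaced by the element's terms.

import math
from collections import Counter

def score(query_terms, doc_terms, collection_tf, collection_len, mu=500):
    # Cross-entropy score of a document for a query, with Dirichlet-smoothed P(t|D).
    # The sum runs over the distinct query terms, each weighted by its maximum
    # likelihood probability under the query model.
    q_counts = Counter(query_terms)
    d_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    s = 0.0
    for term, q_tf in q_counts.items():
        p_q = q_tf / len(query_terms)                              # P_ml(t | theta_Q)
        p_c = collection_tf.get(term, 0) / collection_len          # P_ml(t | theta_C)
        p_d = (d_counts.get(term, 0) + mu * p_c) / (doc_len + mu)  # smoothed document model
        if p_d > 0:                                                # skip terms unseen everywhere
            s += p_q * math.log(p_d)
    return s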
4.1 Baselines
Baseline runs retrieve full articles for CO or CAS queries. Only the #combine operator is used. We submitted the results of the CAS queries for the Thorough and Focused tasks and the results of the CO queries for the Relevant in Context and Best in Context tasks. This baseline indicates how the Indri search engine performs in the setting of XML element retrieval.
4.2 Strategies for Overlapping Removal
Within the ad-hoc XML retrieval track, there are four sub-tasks:
• The Thorough task requires the system to estimate the relevance of elements in the collection. It returns elements or passages in order of relevance (where specificity is rewarded). Overlap is permitted.
• The Focused task requires the system to return a ranked list of elements or passages. Overlap is removed. When results are equally relevant, users prefer shorter results over longer ones.
• The Relevant in Context task requires the system to return relevant elements or passages clustered per article. For each article, it returns an unranked set of results covering the relevant material in the article. Overlap is not permitted.
• The Best in Context task asks the system to return articles with one best entry point each. Overlap is not allowed.
Because of the hierarchical structure of an XML document, a parent element is sometimes also considered relevant when its child element is highly relevant to a given topic. As a result, we obtain a number of overlapping elements. To fulfill the overlap-free requirement of the Focused, Relevant in Context, and Best in Context tasks, we adopt the following strategies to remove overlapping element paths from the result of the Thorough task:
• Relevance Score: The result of the Thorough task is scanned from most to least relevant. When an overlapping element path is found within a document, the element path with the lower relevance score is removed (see Figure 1). In case overlapping elements in the same document have the same relevance score, we keep the element with the higher rank.
Fig. 1. Example result of the Focused task (qid=2009005)
Next, the overlap-free result is grouped by article. For each query, the articles are ranked by their highest relevance score. Within each article, the retrieved element paths keep their rank order of relevance (see Figure 2). For the Best in Context task, we choose the most relevant XML element path of each article as our result.
Fig. 2. Example result of the Relevant in Context task (qid=2009005)
• Relevance Score and Full Article Run: In addition to the Relevance Score strategy, we combine our overlap-free result with the result of the full article run (the baseline run for CO queries). We remove XML element paths whose article does not appear in the result of the full article run (see Figure 3). The filtered result follows the rank order of our baseline run. We adopt the same strategy for the Reference task as well. A minimal sketch of both strategies is given after Figure 3.
Fig. 3. Example result of the Reference task (qid=2009005)
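The following is a minimal sketch of the strategies described above, assuming the Thorough result is available as (document id, element path, score) triples sorted by descending score; all function names are hypothetical and only illustrate the filtering steps, not our actual implementation.

from collections import defaultdict

def overlaps(a, b):
    # Two element paths overlap if they are equal or one is an ancestor of the other.
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")

def focused_run(thorough_run):
    # Relevance Score strategy: scan from most to least relevant and drop every
    # element path that overlaps a higher-ranked path within the same document.
    kept = defaultdict(list)
    result = []
    for doc_id, path, score in thorough_run:
        if any(overlaps(path, p) for p in kept[doc_id]):
            continue
        kept[doc_id].append(path)
        result.append((doc_id, path, score))
    return result

def relevant_in_context_run(focused):
    # Group the overlap-free result per article; articles are ranked by their highest
    # element score, elements keep their relevance order within the article.
    by_doc = defaultdict(list)
    for doc_id, path, score in focused:
        by_doc[doc_id].append((path, score))
    order = sorted(by_doc, key=lambda d: max(s for _, s in by_doc[d]), reverse=True)
    return [(d, by_doc[d]) for d in order]

def best_in_context_run(focused):
    # Keep only the highest-scoring element path per article as its best entry point.
    best = {}
    for doc_id, path, score in focused:
        if doc_id not in best or score > best[doc_id][1]:
            best[doc_id] = (path, score)
    return best

def filter_by_article_run(focused, article_run):
    # Relevance Score and Full Article Run strategy: keep only element paths whose
    # article also appears in the full-article run, in the article run's order.
    rank = {doc_id: i for i, doc_id in enumerate(article_run)}
    kept = [(d, p, s) for d, p, s in focused if d in rank]
    return sorted(kept, key=lambda t: (rank[t[0]], -t[2]))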
5 Results
For each of the four sub-tasks, we submitted two XML element results and one extra result for the Reference task. In total, we had 12 submissions to the ad-hoc track, of which 9 runs were qualified. Additionally, we report further results in this paper.

5.1 Full Document Retrieval
One of our goals for the ad-hoc track is to compare the performance of the Indri search engine in full document retrieval and in XML element retrieval. This observation will be used to analyze our element language models and to improve our overlapping removal strategies. For the official runs, we submitted full document retrieval using both CO and CAS queries for the four sub-tasks. Except for the Thorough task, one run for each of the remaining tasks was disqualified because of overlapping results. The qualified run for the Thorough task is an automatic run for the CAS query (see Table 1). In the same table, we include an additional run for the CO query.
For full document retrieval, the result of the CAS query is slightly worse than that of the CO query. This performance gap reflects the difference between the two formulations of the same topic of interest.

Table 1. Results of full document retrieval tasks

task      run                       iP[.00]  iP[.01]  iP[.05]  iP[.10]  MAiP
thorough  article.CO (additional)   0.5525   0.5274   0.4927   0.4510   0.2432
thorough  article.CAS (official)    0.5461   0.5343   0.4929   0.4415   0.2350
5.2 XML Element Retrieval
For the official submission, we presented our results using the Relevance Score and Full Article Run strategy. All qualified runs use CAS queries. The results of the Thorough task are in Table 2. The additional runs are the original results without the help of the full article runs. The run element1.CAS returns the article element type only for any-element requests (noted as *), while the run element2.CAS returns all considered element types.

Table 2. Results of XML element retrieval tasks

task      run                              iP[.00]  iP[.01]  iP[.05]  iP[.10]  MAiP
thorough  element.CAS.ref (official)       0.4834   0.4525   0.4150   0.3550   0.1982
thorough  element1.CAS (additional)        0.4419   0.4182   0.3692   0.3090   0.1623
thorough  element.CAS.baseline (official)  0.4364   0.4127   0.3574   0.2972   0.1599
thorough  element2.CAS (additional)        0.4214   0.3978   0.3468   0.2876   0.1519
Full document retrieval outperforms element retrieval in locating all relevant information. The same trend appears in the performance difference between element1.CAS and element2.CAS. Our results again agree with observations from previous INEX results. System-wise, the result filtered by the given reference run (element.CAS.ref) performs better than the result filtered by our own Indri baseline run (element.CAS.baseline).
Focused Task The official and additional results of the Focused task are given in Table 3. Our run element.CAS.baseline successfully preserves the retrieval result of the Thorough task and brings a moderate improvement. The full document runs still rank highest in the Focused task.
Table 3. Results of XML element retrieval tasks

task     run                              iP[.00]  iP[.01]  iP[.05]  iP[.10]  MAiP
focused  article.CO (additional)          0.5525   0.5274   0.4927   0.4510   0.2432
focused  article.CAS (additional)         0.5461   0.5343   0.4929   0.4415   0.2350
focused  element.CAS.ref (official)       0.4801   0.4508   0.4139   0.3547   0.1981
focused  element.CAS.baseline (official)  0.4451   0.4239   0.3824   0.3278   0.1695
focused  element1.CAS (additional)        0.4408   0.4179   0.3687   0.3092   0.1622
focused  element2.CAS (additional)        0.4228   0.3999   0.3495   0.2909   0.1527
Relevant in Context Task As explained earlier, we rank documents by their highest element score and rank element paths by their relevance score within the document. Overlapping elements are removed as required. The retrieval results are in Table 4. The full document runs still dominate the performance, and the reference runs continue to boost retrieval results.

Table 4. Results of XML element retrieval tasks

task                 run                              gP[5]   gP[10]  gP[25]  gP[50]  MAgP
relevant-in-context  article.CO (additional)          0.2934  0.2588  0.2098  0.1633  0.1596
relevant-in-context  article.CAS (additional)         0.2853  0.2497  0.2132  0.1621  0.1520
relevant-in-context  element.CAS.ref (official)       0.2216  0.1904  0.1457  0.1095  0.1188
relevant-in-context  element.CAS.baseline (official)  0.1966  0.1695  0.1391  0.1054  0.1064
relevant-in-context  element1.CAS (additional)        0.1954  0.1632  0.2150  0.1057  0.0980
relevant-in-context  element2.CAS (additional)        0.1735  0.1453  0.1257  0.1003  0.0875
Best in Context Task This task is to identify the best entry point for accessing the relevant information in a document. Our strategy is to return the element path with the highest relevance score in a document. The retrieval results are in Table 5. Using the relevance score as the only criterion has led to promising results when we compare the original runs (element1.CAS and element2.CAS) with the run boosted by the baseline (element.CAS.baseline). However, the run containing more article returns (element1.CAS) is still better than the run with other element returns (element2.CAS).
Table 5. Results of XML element retrieval tasks

task             run                              gP[5]   gP[10]  gP[25]  gP[50]  MAgP
best-in-context  article.CO (additional)          0.2663  0.2480  0.1944  0.1533  0.1464
best-in-context  article.CAS (additional)         0.2507  0.2305  0.1959  0.1499  0.1372
best-in-context  element.CAS.ref (official)       0.1993  0.1737  0.1248  0.0941  0.1056
best-in-context  element1.CAS (additional)        0.2257  0.1867  0.1426  0.1125  0.1015
best-in-context  element2.CAS (additional)        0.2089  0.1713  0.1343  0.1084  0.0924
best-in-context  element.CAS.baseline (official)  0.1795  0.1449  0.1143  0.0875  0.0852

6 Conclusion

In our official runs, we present our baseline results and results filtered by our full document retrieval run and by the given reference run. In this paper, we also provide additional results of the original document and XML element retrieval runs. The Indri search engine provides reasonable results for XML element retrieval compared
to the results of full article retrieval and the results of other participating groups. We can also use the relevance score as the main criterion to deal with the overlapping problem effectively. However, the full document runs are still superior to the XML element runs. When the results of the reference and baseline runs are used for filtering, the result of the Thorough task is improved. This may imply that the search engine is able to locate relevant elements within documents effectively. Besides the accuracy of relevance estimation, retrieval performance also depends on the effective formulation of the Indri structured query. For example, the two interpretations of the wildcard in element1.CAS and element2.CAS lead to different results in our experiments. The step of overlapping removal is another key factor that may harm retrieval performance. In our case, our run element1.CAS ranks high in the Thorough task but low in the Focused and Relevant in Context tasks. Besides using the relevance score for removing overlapping element paths, we may try other criteria such as the location of the element within a document. This is especially important for the Best in Context task, as users tend to read a document from top to bottom.

Acknowledgments This work is sponsored by the Netherlands Organization for Scientific Research (NWO), under project number 612-066-513.
References

1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 993–1022 (2003)
2. Zhai, C., Lafferty, J.: A Study of Smoothing Methods for Language Models Applied to Information Retrieval. ACM Transactions on Information Systems 22(2), 179–214 (2004)
3. Schenkel, R., Suchanek, F.M., Kasneci, G.: YAWN: A Semantically Annotated Wikipedia XML Corpus. In: 12. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), Aachen, Germany (March 2007)
4. Strohman, T., Metzler, D., Turtle, H., Croft, W.B.: Indri: A Language-model Based Search Engine for Complex Queries. In: Proceedings of ICIA (2005)