An Empirical Study on Retrieval Models for Different Document Genres: Patents and Newspaper Articles

Makoto Iwayama
Hitachi, Ltd., 1-280 Higashi-Kougakubo, Kokubunji, 185-8601, Japan
[email protected]

Atsushi Fujii∗
Institute of Library and Information Science, University of Tsukuba, 1-2 Kasuga, Tsukuba 305-8550, Japan
[email protected]

Noriko Kando, Yuzo Marukawa
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, 101-8430, Japan
[email protected], [email protected]

ABSTRACT

Reflecting the rapid growth in the utilization of large test collections for information retrieval since the 1990s, extensive comparative experiments have been performed to explore the effectiveness of various retrieval models. However, most collections were intended for retrieving newspaper articles and technical abstracts. In this paper, we describe the process of producing a test collection for patent retrieval, the NTCIR-3 Patent Retrieval Collection, which includes two years of Japanese patent applications and 31 topics produced by professional patent searchers. We also report experimental results obtained by using this collection to reexamine the effectiveness of existing retrieval models in the context of patent retrieval. The relative superiority among existing retrieval models did not significantly differ depending on the document genre, that is, patents and newspaper articles. Issues related to patent retrieval are also discussed.

1. INTRODUCTION

Reflecting the rapid growth in the utilization of large test collections for information retrieval (IR) since the 1990s, extensive comparative experiments have been performed to explore the effectiveness of various retrieval models. Existing test collections include those produced in TREC¹, CLEF², and NTCIR³, where most of the target documents are newspaper articles and abstracts from technical publications. In the Third NTCIR Workshop (NTCIR-3), held over a period of one and a half years (from June 2001 to December 2002), the first serious effort was made to explore information retrieval targeting patent documents. During the NTCIR-3 Workshop, the authors of this paper organized the Patent Retrieval Task [3]⁴, where a test collection of patent documents was produced and used to evaluate a number of participating IR systems.

While a number of commercial patent retrieval systems and services have operated for a long time, patent retrieval has not received much attention in the IR community. A major reason is the lack of test collections targeting patent information. Although a TREC test collection includes patent documents, their proportion is quite small. Consequently, most systems do not seriously use techniques focusing on patent documents, because the effect may be overshadowed by the larger number of documents of other genres, such as newspaper articles. However, patent documents (applications) have a number of characteristics, such as their length and structure, that make them interesting for information retrieval, even from a scientific point of view. This background motivated us to promote research and development on patent information retrieval by providing a test collection consisting of patent documents.

The purpose of this paper is twofold. First, we describe the process of producing the above collection, namely the

Categories and Subject Descriptors H.3.3 [Information Search and Retrieval]: Retrieval models; H.3.4 [Systems and Software]: Performance evaluation; H.3.5 [Online Information Services]: Data sharing

General Terms Measurement, Performance, Experimentation

Keywords
test collections, patent retrieval, retrieval models

∗ The second author is also a member of CREST, Japan Science and Technology Corporation.

¹ http://trec.nist.gov/
² http://clef.iei.pi.cnr.it/
³ http://research.nii.ac.jp/ntcir/index-en.html
⁴ In brief, "tasks" are equivalent to "tracks" in TREC. In the NTCIR-3 Workshop, different types of fundamental text processing techniques, such as information retrieval and text summarization, were classified and organized into different tasks. Research groups participated in the workshop on a task-by-task basis.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR’03, July 28–August 1, 2003, Toronto, Canada. Copyright 2003 ACM 1-58113-646-3/03/0007 ...$5.00.

NTCIR-3 Patent Retrieval Collection (Section 2). Second, we report experimental results obtained by using this collection (Section 3). In brief, our focus is to re-examine the effectiveness of existing retrieval models in the context of patent information retrieval, which has not been explored much in past IR literature.

2. THE NTCIR-3 PATENT RETRIEVAL COLLECTION

2.1 Overview

Figure 1: An example scenario of technology survey.

2.2 Documents

The NTCIR-3 Patent Retrieval Collection includes unexamined Japanese patent applications published in 1998 and 1999. Fundamentally, those applications are used as a collection from which relevant documents are retrieved in response to search topics. The collection also contains edited Japanese/English abstracts for Japanese applications published in 1995–1999. The Japanese abstracts, the JAPIO Patent Abstracts, were edited on the basis of the abstracts and claims in the source applications; in other words, the JAPIO Patent Abstracts are not exactly the same as the abstracts written by the applicants. In addition, the length of the JAPIO Patent Abstracts is standardized, at approximately 400 characters on average. The English abstracts, the Patent Abstracts of Japan (PAJ), are human translations of the JAPIO Patent Abstracts. These additional documents were primarily intended for training cross-language IR systems. However, they can also be used as target documents, because relevant abstracts can be identified from the relevance assessments performed for the applications. Table 1 gives an overview of the patent documents in the NTCIR-3 Patent Retrieval Collection.

Table 1: Documents in the NTCIR-3 Patent Retrieval Collection.

Type                      Language   Years       # of documents   File size (bytes)
Applications (full text)  Japanese   1998–1999   697,262          18,139M
JAPIO abstract            Japanese   1995–1999   1,706,154        1,883M
PAJ abstract              English    1995–1999   1,701,339       2,711M

2.3 Search Topics

As described in Section 2.1, search topics consist of newspaper articles (clippings) and supplementary memorandums describing user information needs. To produce realistic topics, we asked 12 members of the Intellectual Property Information Search Committee in the Japan Intellectual Property Association (JIPA)⁶ to produce 31 topics. Each JIPA member belongs to the intellectual property division of his or her company, and all are experts in patent searching.

Figure 2 shows an example English search topic in SGML form. In this figure, the article and supplement fields correspond to a newspaper clipping and a supplementary memorandum, respectively. As with other existing test collections, search topics also contain fields for titles, descriptions, and narratives. For each topic, the description field generally includes terms (words) used in the title field. Intuitively, a combination of the article and supplement fields is as informative as the description field; however, newspaper articles often contain sentences irrelevant to the search topic. Example terms associated with a topic are given in a concept field, and one or more patent application IDs relevant to the newspaper article are given in a separate field. All 31 topics were initially written in Japanese and were manually translated into English, Korean, and traditional or simplified Chinese.

In the Patent Retrieval Task, for the purpose of cross-system evaluation in a standard framework, each participating group was obliged to submit at least one retrieval result obtained with a combination of the article and supplement fields. However, in principle, any fields can be used to formulate queries, depending on the purpose of evaluation.

⁶ http://www.jipa.or.jp/english/index.html

2.4 Relevance Assessment

Relevance assessment was performed as follows:

1. After/while producing topics, the JIPA members performed manual searches to collect as many relevant patents as possible. We shall call the patent set assessed during the manual search "PJ". The JIPA members were allowed to use any systems and resources, so that we obtained a patent document set retrieved under the circumstances of their daily patent searching.

2. Participating groups submitted retrieval results, from each of which we extracted the top 30 patent documents and produced a pool of patent documents for each of the 31 topics. We shall call this pool "PS".

3. The JIPA members assessed the relevance of the patent documents in "PS − PJ", that is, the patents they had not seen in their preliminary search.

Grades of relevance were "A (relevant)", "B (partially relevant)", "C (irrelevant, judged by reading the content of a document)", and "D (irrelevant, judged by looking at only the title of a document)". When judging A, B, and C, claims and other fields were considered equally important. The average numbers of A, B, C, and D documents were 45.2, 29.3, 141.4, and 411.4, respectively.

To produce the first six topics, the 12 members were divided into six groups of two, and each group produced one topic and performed relevance assessment. If the two members of a group disagreed, they negotiated with each other to complete the topic. However, for the remaining 25 topics, each of the 12 members independently produced one or more topics and performed relevance assessment, because negotiation (cross-checking) was time-consuming. In all cases, members created queries in the domain of their professional work.
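The pooling step (item 2 above) can be sketched as follows. This is an illustrative Python re-implementation, not the actual NTCIR tooling; the run names and document IDs are made up.

```python
def build_pool(runs, depth=30):
    """Union of the top-`depth` documents from each submitted run.

    runs: dict mapping run name -> ranked list of document IDs.
    Returns the assessment pool (PS) as a set of document IDs.
    """
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool

# Toy runs with a pool depth of 2 for brevity (the task used depth 30).
runs = {
    "run_a": ["p1", "p2", "p3"],
    "run_b": ["p2", "p4", "p1"],
}
print(sorted(build_pool(runs, depth=2)))  # ['p1', 'p2', 'p4']
```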
Unlike existing methods for relevance assessment used in previous NTCIR workshops, where manual searches were performed after pooling to increase the exhaustiveness of the relevant documents, we performed manual searches before pooling to enable the comparison of the search abilities of

Topic P004 (technology survey)
Title: Device to judge relative merits by comparing codes such as barcodes with each other

Article: JA-981031179 (JA Society, No. 189, 1998-10-31): "BANDAI lost a lawsuit for piracy filed by EPOCH at Tokyo District Court". In settlement of the lawsuit filed by EPOCH INC., the toy manufacturer, against BANDAI CO., LTD., claiming 264 million yen in damages for infringement of a card game patent, the Tokyo District Court ordered BANDAI to pay about 114 million yen on the 30th. The presiding judge, Mr. Yoshiyuki Mori, indicated that some functions, including key operation, of the "Super Barcode Wars" mini game machine manufactured and sold by BANDAI CO., LTD. from July 1992 to March 1993 fell under the "technical range of a patent licensed to EPOCH INC.".

Supplement: Determination of victory or defeat by comparing each other's values based on codes from barcode readings does not conflict with the patent.

Description: What kind of device determines leaders or victors by reading several codes such as barcodes and comparing the values corresponding to these codes?

Narrative: "Super Barcode Wars" is a type of mini game machine where recorded barcodes are read in from cards featuring characters and the game proceeds in semi-real time by operating offence and defense keys. Sample codes include barcodes and magnetic codes, but shall not be defined as limited only to these.

Concept: sign, barcode, code, superiority or inferiority, victory or defeat, comparison, judgment

Relevant patent: PATENT-KKH-G-H01-333373

Figure 2: An example search topic in the NTCIR-3 Patent Retrieval Collection (topic ID: P004).

human experts and participant systems. The average numbers of A documents in “PJ-PS”, “PS-PJ”, and “PJ∩PS” were 14.2, 11.0, and 20.0, respectively. In other words, the numbers of relevant documents obtained by participating systems were fairly comparable with those obtained by the JIPA members.
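The PJ/PS comparison above is plain set arithmetic; for a single topic it can be sketched as follows, with made-up document IDs.

```python
pj = {"p1", "p2", "p3"}          # relevant patents found by the manual search (toy IDs)
ps = {"p2", "p3", "p4", "p5"}    # pooled patents from submitted runs (toy IDs)

only_manual = pj - ps   # PJ-PS: found only by the human experts
only_pooled = ps - pj   # PS-PJ: found only by participating systems
both = pj & ps          # PJ∩PS: found by both
print(len(only_manual), len(only_pooled), len(both))  # 1 2 2
```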

3. EXPERIMENTATION

3.1 Overview

The purpose of our experiments was to re-examine existing retrieval models, from different perspectives, in the context of patent retrieval. For this purpose, we performed experiments on retrieving both patent documents and newspaper articles, and compared the results to find the scientific knowledge inherent in patents, if any. Thus, we used the NTCIR-3 Patent Retrieval Collection (described in Section 2) and the NTCIR-3 CLIR (cross-lingual information retrieval) test collection, which consists of two years' worth of Japanese "Mainichi" newspaper articles, to perform comparative experiments.

It may be argued that we could obtain such knowledge by analyzing the official results of the participating systems in the NTCIR-3 Workshop. However, because fundamental modules, such as morphological analyzers and term weighting methods, differed from system to system, it is difficult to conduct a glass-box evaluation. In view of this problem, we implemented different retrieval models (systems) ourselves and performed comparative experiments independently of the NTCIR-3 Workshop.

We used the ChaSen morphological analyzer (version 2.2.9) combined with the IPA dictionary (version 2.4.4)⁷ to extract nouns, verbs, adjectives, and out-of-dictionary words (i.e., words identified as "unknown" by ChaSen, which are often technical terms) as index terms from the target documents, and performed word-based indexing. We used the same method to extract terms from the search topics. We used GETA (Generic Engine for Transportable Association)⁸, which includes C/Perl libraries for typical retrieval modules, and implemented different retrieval models in a distributed environment consisting of two PCs (CPU: dual Xeon 1.7 GHz; RAM: 2 GB). Because all the software toolkits and test collections used are publicly available, our experiments can easily be reproduced.
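The word-based indexing step can be sketched as follows. This is an illustrative Python sketch, not the GETA/ChaSen pipeline used in the paper; in particular, `tokenize` is a trivial whitespace split standing in for ChaSen's morphological analysis.

```python
from collections import Counter, defaultdict

def tokenize(text):
    # Placeholder for morphological analysis (ChaSen in the paper);
    # here we simply lowercase and split on whitespace.
    return text.lower().split()

def build_index(docs):
    """Compute per-document term frequencies and document frequencies."""
    tf = {}                # doc_id -> Counter of term frequencies (f_{d,t})
    df = defaultdict(int)  # term -> number of documents containing it (n_t)
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        tf[doc_id] = counts
        for term in counts:
            df[term] += 1
    return tf, df

docs = {"d1": "barcode game machine", "d2": "barcode reader device"}
tf, df = build_index(docs)
print(df["barcode"])  # appears in both documents -> 2
```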
We used the following contents as target documents, independently:

• entire contents (full texts) of unexamined patent applications (Full),
• author abstracts in unexamined patent applications (Abs),
• claims in unexamined patent applications (Claim),
• a combination of Abs and Claim (Abs+Claim),

Here, unexamined patent applications are the target documents in the NTCIR-3 Patent Retrieval Collection (i.e., Japanese patent applications published in 1998 and 1999). Table 2 shows statistics of the document length in words for the different target document types. The average length of Full is approximately 24 times that of Newspaper, and the standard deviation of Full is approximately 20 times that of Newspaper; in other words, the length of patent applications varies considerably from document to document. Additionally, the maximum number of unique words (word types) contained in a single application is approximately 30,000, roughly 20 times that of newspaper articles. Figure 3 shows the distribution of the document length in words for the different target document types; for Abs and Jsh, the distribution is roughly normal.

Table 2: Statistics of document lengths in words.

                   Full     Abs    Claim    Abs+Claim   Jsh   Newspaper
Mean               3886     101      319          416   165         165
Std. deviation     3198      32      386          392    35         160
Max             251,695     283   58,166       58,187   930       4,004
Max (unique)     29,189     133    3,752        3,775   229       1,397

Figure 4 shows the retrieval models compared in our experiments; they form a sample of typical models [9]. In this figure, "log(tf).idf.dl" is equivalent to a metric used in the SMART system [7]. Unlike the other seven models, SMART and BM25 [5]⁹ use a document length factor, which has proved effective in the IR literature [8]. It should be noted that because the nine retrieval models were implemented on common software modules, differences among them in retrieval accuracy are due to the effectiveness of the models themselves.

To formulate queries in patent retrieval, we used the following (combinations of) topic fields independently: the description field (D), the description and narrative fields (DN), and the article and supplement fields (AS). For retrieving newspaper articles, we used the description field (D) and a combination of the description and narrative fields (DN), extracted from all 42 topics in the NTCIR-3 CLIR collection. In all cases, we used only Japanese topics.

3.2 Results

Tables 3 and 4 show the mean average precision (MAP) values targeting patent documents and newspaper articles, respectively. In both tables, only A documents were regarded as relevant ones (i.e., rigid relevance). Figure 5 fundamentally shows the same information as Tables 3 and 4, but the x/y-axes correspond to retrieval models and average precision values, respectively.
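Mean average precision, the figure reported in both tables, is computed per topic and then averaged over topics; a minimal sketch (document IDs are made up):

```python
def average_precision(ranked, relevant):
    """Average precision for one topic: mean of the precision values
    at the rank of each relevant document retrieved."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(topics):
    """topics: list of (ranked_list, relevant_set) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in topics) / len(topics)

ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d4"})
print(ap)  # (1/1 + 2/4) / 2 = 0.75
```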

• the JAPIO Patent Abstracts in 1998 and 1999 (Jsh),
• two years of Mainichi newspaper articles in 1998 and 1999 (Newspaper).

⁷ http://chasen.aist-nara.ac.jp/
⁸ http://geta.ex.nii.ac.jp/

⁹ Although log((N − n_t + 0.5)/(n_t + 0.5)) in BM25 may take an anomalous negative value when n_t is very large [6], we kept the original formula in our experiments.

Figure 3: Distribution of the document length in words.

Retrieval model   w_t
hits              b_{q,t} · b_{d,t}
baseline          f_{q,t} · b_{d,t}
tf                f_{q,t} · f_{d,t}/dlf_d
idf               f_{q,t} · idf_t
tf.idf            f_{q,t} · idf_t · f_{d,t}/dlf_d
log(tf)           (1 + log f_{q,t}) · (1 + log f_{d,t})/(1 + log avef_d)
log(tf).idf       (1 + log f_{q,t}) · idf_t · (1 + log f_{d,t})/(1 + log avef_d)
log(tf).idf.dl    (1 + log f_{q,t}) · idf_t · (1 + log f_{d,t})/(1 + log avef_d) · 1/(avedlb + S·(dlb_d − avedlb))
BM25              f_{q,t} · log((N − n_t + 0.5)/(n_t + 0.5)) · ((K + 1)·f_{d,t}) / (K·{(1 − b) + b·dlf_d/avedlf} + f_{d,t})

q: query; d: document; t: term
N: number of documents in the collection
n_t: number of documents in which term t exists
b_{x,t}: existence (1) or absence (0) of term t in x
f_{x,t}: frequency of term t in x
idf_t: inverse document frequency of term t (i.e., 1 + log(N/n_t))
dlb_x: number of unique terms in x
dlf_x: sum of term frequencies in x (i.e., Σ_{t∈x} f_{x,t})
avef_x: average of term frequencies in x (i.e., dlf_x/dlb_x)
avedlb: average of dlb_x over the collection
avedlf: average of dlf_x over the collection

The normalized scores for f_{d,t} provided better results than those without normalization. The values of the constants (S = 0.2, K = 2.0, b = 0.8) were determined through preliminary experiments.

Figure 4: Retrieval models (RSV_{q,d} = Σ_t w_t) compared in our experiments.
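As a concrete illustration of the weights in Figure 4, the BM25 term weight can be computed as follows; the constants match those reported above (K = 2.0, b = 0.8), while the collection statistics passed in are toy values, not figures from the collection.

```python
import math

K, B = 2.0, 0.8  # constants from the paper (S applies only to the SMART model)

def bm25_weight(fq_t, fd_t, N, n_t, dlf_d, avedlf):
    """BM25 term weight w_t as given in Figure 4."""
    idf = math.log((N - n_t + 0.5) / (n_t + 0.5))
    tf_part = ((K + 1) * fd_t) / (K * ((1 - B) + B * dlf_d / avedlf) + fd_t)
    return fq_t * idf * tf_part

# Toy statistics: a query term occurring twice in a document of average length.
w = bm25_weight(fq_t=1, fd_t=2, N=697262, n_t=1000, dlf_d=3886, avedlf=3886)
print(w > 0)  # rare term -> positive weight -> True
# Footnote 9: the idf component goes negative once n_t exceeds N/2.
print(bm25_weight(fq_t=1, fd_t=2, N=1000, n_t=900, dlf_d=3886, avedlf=3886) < 0)  # True
```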

Table 3: Mean average precision values over the 31 topics targeting patent information (rigid relevance).

                    Full                 Abs                  Claim                Abs+Claim            Jsh
Retrieval model     D     DN    AS       D     DN    AS       D     DN    AS       D     DN    AS       D     DN    AS
hits               .1050 .0534 .0166    .0727 .0429 .0120    .0516 .0128 .0025    .0840 .0242 .0045    .1171 .0547 .0373
baseline           .0931 .0725 .0292    .0732 .0813 .0304    .0566 .0572 .0168    .0854 .0741 .0274    .1066 .1138 .0538
tf                 .0156 .0227 .0046    .0132 .0158 .0036    .0136 .0172 .0047    .0151 .0183 .0042    .0113 .0166 .0025
idf                .1515 .1577 .0744    .1197 .1272 .0755    .0941 .0935 .0367    .1278 .1265 .0665    .1730 .1682 .1271
tf.idf             .0277 .0390 .0231    .0239 .0284 .0197    .0279 .0337 .0155    .0298 .0353 .0227    .0222 .0258 .0166
log(tf)            .1642 .1255 .0337    .0579 .0723 .0237    .0669 .0527 .0099    .0917 .0787 .0186    .0821 .0899 .0457
log(tf).idf        .2230 .2132 .1082    .0884 .1151 .0781    .0978 .1029 .0380    .1237 .1306 .0725    .1226 .1465 .1223
log(tf).idf.dl     .2272 .2660 .1790    .0887 .1169 .0844    .1028 .1182 .0752    .1215 .1419 .1062    .1184 .1501 .1271
BM25               .2280 .2503 .0875    .0838 .0997 .0707    .1039 .1129 .0557    .1302 .1426 .0786    .1356 .1474 .1015

D: description, DN: description + narrative, AS: article + supplement

Figure 5: Mean average precision values for different retrieval models.

Table 4: Mean average precision values over the 50 topics targeting Mainichi newspaper articles (rigid relevance).

Retrieval model     D      DN
hits               .1397  .1063
baseline           .1436  .1865
tf                 .0755  .1054
idf                .1914  .2443
tf.idf             .1041  .1279
log(tf)            .2266  .2124
log(tf).idf        .2940  .2853
log(tf).idf.dl     .2746  .3212
BM25               .2759  .3346

D: description, DN: description + narrative

3.3 Discussion

We discuss observations derived from Tables 3 and 4 and Figure 5, from several perspectives.

Term frequencies

In almost all cases (runs), the MAP value of the model relying solely on term frequencies (i.e., the tf model) was the smallest of all the models compared; the tf model was even worse than the hits and baseline models. Additionally, the MAP value of the idf model often decreased when combined with the tf model (i.e., the tf.idf model)¹⁰. We used the paired t-test for statistical testing, which investigates whether a difference in performance is meaningful or simply due to chance [1, 4]. Table 5 shows that for patent retrieval, the MAP values of the idf and tf.idf models were significantly different in all 15 runs at the 5% level, and in 12 of those runs at the 1% level. When retrieving newspaper articles, the MAP values of the idf and tf.idf models were significantly different in both runs at the 1% level. These results suggest that a simple (naive) use of term frequencies was not effective and even decreased the MAP value. However, the logarithmic formulation of term frequencies (i.e., the log(tf) models) was effective when combined with the idf model. That is, the MAP value of the log(tf).idf model was greater than that of the idf model, specifically where the target documents were full texts. In Table 5, the MAP values of the log(tf).idf and idf models were significantly different (at the 1% level) in three patent runs and in both newspaper runs. At the same time, when retrieving abstracts, we found no significant differences in MAP between the log(tf).idf and idf models. In other words, the log(tf) model was effective for retrieving longer documents.
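The paired t-test used here compares two models through their per-topic average precision values. A from-scratch sketch (the paper does not say which implementation was used, and the per-topic scores below are made up):

```python
import math
from statistics import mean, stdev

def paired_t(xs, ys):
    """Paired t statistic over matched per-topic scores (e.g., average precision)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Made-up per-topic average precision values for two hypothetical models.
model_a = [0.30, 0.25, 0.40, 0.35, 0.28]
model_b = [0.20, 0.22, 0.30, 0.31, 0.21]
t = paired_t(model_a, model_b)
print(round(t, 2))
```

The resulting t statistic is compared against the Student t critical value for n − 1 degrees of freedom at the chosen significance level (5% or 1% in the paper).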

Inverse document frequencies

Inverse document frequencies were generally effective. For Jsh and Abs, the MAP value of the idf model was greater than those of the SMART and BM25 models. For Jsh, the MAP value of the idf model was greater than those of all the other models, irrespective of the topic fields used; the MAP values of the idf model and the second-best model were significantly different at the 1% level (not shown in the tables). Additionally, for Abs, the MAP value of the idf model was greater than those of the other models except in the AS runs; here the idf model and the second-best model were significantly different at the 5% level (not shown in the tables). One possible explanation is that because the abstracts (Jsh and Abs) are standardized (normalized) in length, the effect of the document length factor in SMART and BM25 was diminished.

Comparison between SMART and BM25

The MAP values of SMART and BM25 were generally greater than those of the other models, and the difference between SMART and BM25 in MAP was marginal; in fact, in Table 5 the MAP values of SMART and BM25 were significantly different at the 5% level in only five cases. However, for cross-genre retrieval (i.e., runs where queries were formulated from combinations of the article and supplement fields), the MAP value of SMART was usually greater than that of BM25.

¹⁰ This problem was also suggested at the first ACM SIGIR 2000 Workshop on Patent Retrieval. http://research.nii.ac.jp/ntcir/sigir2000ws/

Cross-genre retrieval

By comparing the MAP values obtained with the description and description + narrative fields against those obtained with combinations of the article and supplement fields, one can see that the MAP value of cross-genre retrieval was generally smaller than that of conventional retrieval. A possible explanation is that the distribution of term frequencies differs depending on the document genre. In fact, in the NTCIR-3 Patent Retrieval Task, a retrieval model using different term frequencies depending on the genre [2] improved the MAP value over a standard model. At the same time, we do not wish to draw premature conclusions regarding the relative merits of different models in cross-genre retrieval; this issue remains an open question and should be explored further.

Comparison between full texts and abstracts

The MAP values targeting full patent texts were generally greater than those targeting abstracts. Specifically, for SMART and BM25, the MAP values for full texts and abstracts were significantly different at the 1% level (not shown in the tables). Comparing Jsh and Abs in Table 3, the MAP value of Jsh was greater than that of Abs, except for the tf model; in Table 6, the differences were statistically significant at the 1% level in most cases. This suggests that professional abstracts are generally of higher quality than author abstracts for retrieval purposes.

Comparison between patents and newspapers

Looking at Table 5, the relative superiority among different retrieval models did not significantly differ depending on the document genre (i.e., patents vs. newspaper articles). This may be counter-intuitive, because patent documents differ significantly from newspaper articles in a number of respects, such as document length and terminology. However, those differences did not affect the results of our experiments; in other words, existing standard models were relatively effective even for patent retrieval. One possible explanation is that our experiments simulated a technology survey, which is relatively similar to the conventional retrieval scenario compared with an invalidity search. To explore this issue further, in the NTCIR-4 Patent Retrieval Task we plan to perform an invalidity search task in which each participating group searches five years' worth of patent applications for those that could invalidate the demand in an existing claim.

4. CONCLUSION

Given the growing number of large test collections for information retrieval (IR) since the 1990s, extensive comparative experiments have been performed to explore the effectiveness of various retrieval models. Most collections consist of newspaper articles and abstracts in technical publications. While a number of commercial patent retrieval systems and services have operated for a long time, patent retrieval has not received much attention in the IR community. One

Table 5: t-test results of the differences between retrieval models.

                           Full           Abs            Claim          Abs+Claim      Jsh            Newspaper
                           D   DN  AS     D   DN  AS     D   DN  AS     D   DN  AS     D   DN  AS     D   DN
idf vs. tf.idf             ≫   ≫   >      ≫   ≫   ≫      ≫   ≫   >      ≫   ≫   >      ≫   ≫   ≫      ≫   ≫
idf vs. log(tf).idf        ≪   ≪   <      –   –   –      –   –   –      –   –   –      >   –   ≫      ≪   ≪
log(tf).idf.dl vs. BM25    –   –   ≫      >   –   ≫      –   –   –      –   –   –      <   –   >      –   –

"≪"/"≫": significantly different at the 1% level; "<"/">": at the 5% level; "–": not significantly different.

of the major reasons is the lack of test collections targeting patent information. This background motivated us to promote research and development on patent information retrieval by providing a test collection consisting of patent documents.

In this paper, we described the NTCIR-3 Patent Retrieval Collection and comparative experiments performed using it. First, we described the process of producing the collection, which includes two years of Japanese patent applications, along with five years of Japanese patent abstracts and their English translations. Since the collection was produced to evaluate IR systems for technology survey, search topics were produced on the basis of technical newspaper articles. Human experts in patent searching produced 31 topics and also performed the relevance assessment.

Second, we reported experimental results obtained using the collection. For these experiments, we used open software toolkits to implement nine existing retrieval models and re-examined their effectiveness in the context of patent retrieval. To investigate the scientific knowledge inherent in patent retrieval, we also used the NTCIR-3 CLIR test collection, consisting of two years of newspaper articles, and compared the results obtained with the different document genres. Through our experiments, we re-validated past experimental results (e.g., regarding the effectiveness of term frequencies, inverse document frequencies, and document length) in the context of patent retrieval. We also found that existing state-of-the-art retrieval models (i.e., SMART and BM25) were effective for patent retrieval.

Future work will include investigating the effectiveness of various indexing methods, such as character- and phrase-based methods, for patent retrieval.
To further explore patent retrieval from a scientific point of view, in the NTCIR-4 Patent Retrieval Task we plan to perform an invalidity search task in which users search five years' worth of patent applications for those that can invalidate the demand in an existing claim.

5. REFERENCES

[1] D. Hull. Using statistical testing in the evaluation of retrieval experiments. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 329–338, 1993.

[2] H. Itoh, H. Mano, and Y. Ogawa. Term distillation for cross-DB retrieval. In Proceedings of the Third NTCIR Workshop on Research in Information Retrieval, Automatic Text Summarization and Question Answering, 2003.


Table 6: t-test results of the difference between Jsh and Abs.

Retrieval model    D   DN  AS
hits               ≫   –   ≫
baseline           ≫   ≫   ≫
tf                 –   –   –
idf                ≫   ≫   ≫
tf.idf             –   –   –
log(tf)            >   –   ≫
log(tf).idf        ≫   ≫   ≫
log(tf).idf.dl     ≫   ≫   >
BM25               ≫   ≫   ≫

"≫": Jsh significantly greater at the 1% level; ">": at the 5% level; "–": not significantly different.

[3] M. Iwayama, A. Fujii, N. Kando, and A. Takano. Overview of patent retrieval task at NTCIR-3. In Proceedings of the Third NTCIR Workshop on Research in Information Retrieval, Automatic Text Summarization and Question Answering, 2003.

[4] E. M. Keen. Presenting results of experimental retrieval comparisons. Information Processing & Management, 28(4):491–502, 1992.

[5] S. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 232–241, 1994.

[6] S. Robertson and S. Walker. On relevance weights with little relevance information. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 16–24, 1997.

[7] G. Salton. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, 1971.

[8] A. Singhal, C. Buckley, and M. Mitra. Pivoted document length normalization. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 21–29, 1996.

[9] J. Zobel and A. Moffat. Exploring the similarity space. ACM SIGIR Forum, 32(1):18–34, 1998.