Multilingual Information Retrieval with Asian Languages

Jacques Savoy

Institut interfacultaire d'informatique, Université de Neuchâtel, Pierre-à-Mazel 7, 2000 Neuchâtel, Switzerland
[email protected]

Abstract

There has been increasing interest in the Chinese, Japanese and Korean languages on the Web, and the first objective of this paper is to compare the retrieval performance of nine vector-space and two probabilistic models when carrying out monolingual searches in these three Asian languages. Based on the latest NTCIR-3 test collection, our second goal is to analyze the relative merit of using various automated tools to translate English-language topics into Chinese, Japanese or Korean, and then submitting a search based on texts written in these languages. Moreover, we will show how to improve bilingual searches by using both a combined translation strategy and a data fusion approach. Finally, we will address the underlying problems of multilingual searches, in which an English topic is used to search documents written in the English, Chinese and Japanese languages.

Introduction

Given the increasing amount of information available on the Web written in various languages and coming from various sources (newspapers, news agencies, government sites, etc.), there is a need to provide effective access to these information sources. As the offer of services increases, so does the demand. In September 2003, the number of online users was estimated to be 680 million (Global Reach1). These Web users do not speak the same language: English native speakers represent only 35.6%, as compared to 12.2% for Chinese and 9.5% for Japanese (8% for Spanish, 7% for German, 4% for Korean, 3.7% for French and 3.3% for Italian). These numbers are however difficult to estimate, due to the fact that some users may use other languages as well (for example, an estimated 32 million Americans switch from English to another language, Spanish for the most part, when accessing the Web from their homes). Moreover, these percentages seem to be changing quite rapidly. In 1996 the English online population represented 80% of the total; in 2005 this proportion is expected to fall to 29%, with the Chinese online population at 20% and the Japanese at 9% (Spanish 7%, German 6% and Korean 4.3%). In forthcoming years, the use of Asian languages is expected to increase at a greater rate than that of other languages.

Moreover, most Asian languages are characterized by morphologies and syntactic structures very different from those of European languages (Sproat, 1992). A Chinese sentence, for example, is made up of a continuous string of characters (or, more precisely, ideographs2) without any delimiting spaces. As such, finding words within such a continuous string is a significant task, which in turn affects the resolution of various technical problems related to linguistic analysis, machine translation or information retrieval. In addition, the number of characters contained in our Latin alphabet is limited to 26 (33 for the Cyrillic and 28 for the Arabic alphabets). The Chinese language, on the other hand, does not have a fixed number of characters, and the number of characters included in any encoding standard is large (e.g., around 13,500 in the BIG5 encoding system or around 7,700 in the GB/T 12345-90 standard (Lunde, 1998)).

With the proportion of Asian languages on the Web continually on the rise, both in terms of users and of documents, and knowing that Asian languages represent a real challenge for the IR community, we decided to investigate problems related to effective monolingual, bilingual and multilingual document search techniques using these languages. The results of our research on this topic are presented as follows: Section 1 describes and evaluates various indexing and search strategies for monolingual Chinese, Japanese and Korean search systems. Given that English is an important language in the Far East and because English is included in the NTCIR-3 test collection, this first section also considers an English collection.

1 See the Web site http://global-reach.biz/globstats/
2 Also referred to as pictographs or logographs depending on their etymology.

Section 2 describes and evaluates bilingual searches that are based on an English topic and able to retrieve information written in other languages, in this case Chinese, Japanese and Korean. Finally, Section 3 discusses the underlying problems of multilingual information retrieval. Here, based on an English topic, the system must retrieve documents written in English, Chinese and Japanese and present the user with a single ranked list.

1. Monolingual information retrieval for Asian languages

In order to develop IR systems for Asian languages, many of the underlying assumptions we have made about European morphology must be revised, and new indexing and retrieval strategies must be developed. While only one byte is used to code one character in European languages, one to four bytes are needed for the Asian languages (Lunde, 1998). Moreover, Chinese documents may be written either in the traditional writing system (usually encoded in BIG5) or in simplified Chinese characters (encoded using a GB standard character set). The vocabularies used are not always the same either, due to the existence of various dialects (e.g., Mandarin, Wu, Hsiang, Min). In the Japanese language, documents may be written using Kanji ideograms (originating in China) together with the Hiragana and Katakana syllabic character sets, and may possibly include some ASCII characters (used to express, for example, numbers or company names such as Honda). Finally, in the Korean language, both the Hanja and Hangul writing systems are found, although currently Hangul characters are clearly the ones most often used.

This first section is organized as follows. Section 1.1 briefly describes the various corpora used in this paper and also during the NTCIR-3 evaluation campaign (Kando, 2003). Section 1.2 explains the main characteristics of both the nine vector-space schemes and the two probabilistic models used in our experiments. Section 1.3 compares our choices with other related studies. The last section provides an evaluation of various indexing and search strategies with respect to these four languages.

1.1. Overview of the NTCIR-3 test collection

The corpora used in our experiments were built during the third NTCIR evaluation campaign3. This test collection, created to promote information retrieval studies on Asian languages, includes various newspapers written in four different languages. The English collection is taken from the Mainichi Daily News (Japan), Taiwan News and China Times English News (Taiwan). The Chinese collection contains news extracted from the United Daily News, China Times, China Times Express, Commercial Times, China Daily News, and Central Daily News. These documents are written in Mandarin Chinese using the traditional Chinese character set. The Japanese collection is composed of articles taken from the Mainichi, while the Korean corpus is extracted from the Korean Economic Daily. Table 1 compares the sizes of these corpora, ranking the Chinese collection as the largest, the Japanese corpus as second, the Korean corpus a very distant third, and the English collection as the smallest. When considering the mean number of distinct bigrams per document, Table 1 indicates that this value is clearly larger for the Chinese collection (363.4) than for the Korean (187.5) or the Japanese corpus (107.5). For the English collection, the mean number of distinct terms per document is 124.4.
The relevance assessments for this test collection assign four possible categories: “highly relevant” (for the most important documents), “relevant”, “partially relevant” and “not relevant”. In this study, we only made rigid assessments: only “highly relevant” and “relevant” items were considered relevant, while “partially relevant” and “not relevant” items were categorized as irrelevant. We assumed that only highly relevant or relevant items would be useful for all topics; in certain circumstances, however, partially pertinent records may be of some value. Given this rigid judgment system, the retrieval effectiveness measures reported in this paper show lower performance levels than they would under more relaxed assessments. We believe however that the conclusions drawn would be similar whether using rigid or relaxed assessments. Table 1 also compares the number of relevant documents per topic, with the mean always being greater than the median (e.g., for the English collection, the average number of relevant documents per topic is 13.87 with the corresponding median being 6). These findings indicate that each collection contains numerous topics for which only a rather small number of relevant items can be found. For the English, Chinese and Japanese collections, the same set of 50 topics was created and translated manually into the other languages.3

3 See the Web site http://research.nii.ac.jp/ntcir/

For each topic and each test collection, however, relevant documents could not always be found. For this reason we had to remove a given number of topics from the evaluation (topics having fewer than three relevant items were also removed). In the end, the English collection consisted of 32 topics, while both the Chinese and Japanese corpora contained 42 topics. Appearing for the first time in an NTCIR evaluation campaign, the Korean corpus had only 30 topics. As shown in Table 1, the Korean collection contains news from a different year (1994), thus making its 30 topics different from those of the other three corpora.

                             English      Chinese      Japanese     Korean
Size (in MB)                 48 MB        490 MB       268 MB       68 MB
# of documents               22,927       381,681      220,078      66,146
Publication year             1998-1999    1998-1999    1998-1999    1994
Encoding                     ASCII        BIG5         EUC-JP       EUC-KR
Distinct indexing words or bigrams per document:
  Mean                       124.4        363.4        107.5        187.5
  Standard deviation         74.1         219.9        84.5         154.6
  Median                     109          326          84           134
  Maximum                    834          5,935        2,430        2,331
  Minimum                    16           1            4            3
# of topics                  32           42           42           30
Number of relevant items     444          1,928        1,654        2,081
  Mean rel. / topic          13.87        45.90        39.38        69.37
  Standard deviation         21.98        48.03        57.82        85.09
  Median                     6            26           18           35.5
  Maximum                    121 (Q18)    169 (Q46)    296 (Q23)    286 (Q4)
  Minimum                    3 (Q34)      3 (Q38)      3 (Q40)      3 (Q11)
Table 1: Test collection statistics (resulting from rigid evaluation)

Topic 002
Title: Joining WTO
Description: Find possible problems that industries will meet after Taiwan's joining WTO.
Narrative: It has taken Taiwan 10 years to get into WTO. The Council for Economic Planning and Development, the Chung-Hua Institution for Economic Research and the Taiwan Institution for Economic Research evaluated the beneficial results of joining WTO. Related contents are supposed to include the evaluation contents, the advantages and disadvantages, and the effects on agriculture, industry and business. If the documents only describe the opinions, comments and attitudes of America and other countries, or the political and diplomatic issues, they will be regarded as irrelevant.
Concepts: Taiwan, WTO, agriculture, industry, benefits, economy, World Trade Organization

Topic 006
Title: Nobel Prizes in Physics
Description: Retrieve reports relating to the 1998 Nobel Prizes in Physics.
Narrative: The Nobel Prize in Physics awards the honor laurel to savants who contribute significant achievements toward quantum mechanics. Related contents are supposed to include introductions of the laurelled savants, Cui Qi, Laughlin and Stormer, their research contents, contributions, acceptance speeches and other related reports. Take the Nobel Prizes in Physics as principal; ignore other Nobel Prizes.
Concepts: Nobel Prizes in Physics, Cui Qi, Laughlin, Stormer, research, physics

Table 2: Examples of two topics included in the NTCIR-3 test collection

Following the TREC model, each topic is structured in four logical sections: a brief title (“T”), a one-sentence description (“D”), a narrative (“N”) specifying the relevance assessment criteria, and a concept (“C”) section providing some related terms (examples are depicted in Table 2). The available topics, instead of being limited to a narrow subject range, reflect various information needs (such as “Pol Pot's war crimes”, “E-Commerce”, “China Airlines Crash” or “Satellite ST1”).

1.2. Indexing and searching strategies

When faced with new languages, particularly ideographic languages characterized by morphological constructions different from those used in most European languages (Peters et al., 2003), we thought it important to evaluate retrieval performance under various conditions, allowing us to draw some useful conclusions. We also thought that experiments based on a single indexing and search strategy, even when based on specific test collections, would make comparisons difficult (e.g., they would be based on different parameter settings, various query preprocessing techniques, different blind query expansion approaches, etc.). In order to obtain a broader view, we decided to evaluate various indexing and search models.

First, we considered a binary indexing scheme in which each document (or topic) is represented by a set of keywords without any weight (retrieval model denoted “doc=bnn, query=bnn” or “bnn-bnn”). To measure the similarity between the document representation and the query, that is, to define the document score, we computed the inner product. In order to weight the presence of each indexing term in a document surrogate (or in a query), we may account for the term occurrence frequency (retrieval model denoted “nnn-nnn”), or we may also account for the term's frequency within the collection (or, more precisely, the inverse document frequency, denoted idf). Moreover, cosine normalization can prove beneficial, with each indexing weight varying between 0 and 1 (retrieval model notation “ntc-ntc”). Other variants can also be created: for example, the tf component may be computed as 0.5 + 0.5 · [tf / max tf in a document] (retrieval model denoted “atn”). Moreover, we may apply different weighting schemes to the document representations and to the queries (e.g., “doc=atn, query=ntc” or “atn-ntc”). We might also consider that a term's presence in a shorter document provides stronger evidence of relevance than it would in a longer document, leading to more complex IR models, for example the models denoted “Lnu” (Buckley et al., 1996) or “dtu” (Singhal et al., 1999). Table 3 shows the exact weighting formulation for the various IR models used in this paper, where $n$ indicates the number of documents in the collection and $nt_i$ the number of distinct indexing terms included in the representation of $D_i$.

bnn     $w_{ij} = 1$

nnn     $w_{ij} = tf_{ij}$

ltn     $w_{ij} = (\ln(tf_{ij}) + 1) \cdot idf_j$

atn     $w_{ij} = idf_j \cdot [0.5 + 0.5 \cdot tf_{ij} / \max tf_{i.}]$

dtn     $w_{ij} = [\ln(\ln(tf_{ij}) + 1) + 1] \cdot idf_j$

npn     $w_{ij} = tf_{ij} \cdot \ln[(n - df_j) / df_j]$

Okapi   $w_{ij} = \dfrac{(k_1 + 1) \cdot tf_{ij}}{K + tf_{ij}}$

Lnu     $w_{ij} = \dfrac{(1 + \ln(tf_{ij})) / (\ln(\text{mean } tf) + 1)}{(1 - slope) \cdot pivot + slope \cdot nt_i}$

lnc     $w_{ij} = \dfrac{\ln(tf_{ij}) + 1}{\sqrt{\sum_{k=1}^{t} (\ln(tf_{ik}) + 1)^2}}$

ntc     $w_{ij} = \dfrac{tf_{ij} \cdot idf_j}{\sqrt{\sum_{k=1}^{t} (tf_{ik} \cdot idf_k)^2}}$

ltc     $w_{ij} = \dfrac{(\ln(tf_{ij}) + 1) \cdot idf_j}{\sqrt{\sum_{k=1}^{t} ((\ln(tf_{ik}) + 1) \cdot idf_k)^2}}$

dtu     $w_{ij} = \dfrac{(\ln(\ln(tf_{ij}) + 1) + 1) \cdot idf_j}{(1 - slope) \cdot pivot + slope \cdot nt_i}$
Table 3: Weighting schemes

In addition to the previous models based on the vector-space approach, we also considered probabilistic models. In this vein, we used the Okapi probabilistic model (Robertson et al., 2000), within which $K = k_1 \cdot [(1 - b) + b \cdot (l_i / avdl)]$ represents the ratio between the length of $D_i$, measured by $l_i$ (the sum of the $tf_{ij}$), and the collection mean, denoted $avdl$. As a second probabilistic approach, we implemented the Prosit (or DFR, Divergence From Randomness) approach (Amati & van Rijsbergen, 2002), based on the following indexing formula:

$w_{ij} = Inf^1_{ij} \cdot Inf^2_{ij} = (1 - Prob^1_{ij}) \cdot Inf^2_{ij}$

with $Prob^1_{ij} = tfn_{ij} / (tfn_{ij} + 1)$ and $tfn_{ij} = tf_{ij} \cdot \log_2[1 + (c \cdot \text{mean } dl) / l_i]$,

$Inf^2_{ij} = -\log_2[1 / (1 + \lambda_j)] - tfn_{ij} \cdot \log_2[\lambda_j / (1 + \lambda_j)]$ with $\lambda_j = tc_j / n$,

where $w_{ij}$ represents the indexing weight attached to term $t_j$ in document $D_i$, $tc_j$ indicates the number of occurrences of term $t_j$ in the collection, and $n$ the number of documents in the corpus.

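As an illustration, these two probabilistic weighting functions can be sketched as follows (a minimal Python rendering of the formulas above; the default constants correspond to the Chinese settings of Table 4, and all variable names are ours):

```python
from math import log

def okapi_weight(tf_ij, doc_len, k1=1.2, b=0.5, avdl=500.0):
    # Okapi term weight: w_ij = ((k1 + 1) * tf_ij) / (K + tf_ij),
    # with K = k1 * ((1 - b) + b * (l_i / avdl)).
    K = k1 * ((1.0 - b) + b * (doc_len / avdl))
    return ((k1 + 1.0) * tf_ij) / (K + tf_ij)

def prosit_weight(tf_ij, doc_len, tc_j, n_docs, c=1.5, mean_dl=480.0):
    # Prosit (DFR) weight: w_ij = (1 - Prob1) * Inf2, as defined above.
    lam = tc_j / n_docs                                   # lambda_j = tc_j / n
    tfn = tf_ij * log(1.0 + (c * mean_dl) / doc_len, 2)   # normalized tf
    prob1 = tfn / (tfn + 1.0)
    inf2 = -log(1.0 / (1.0 + lam), 2) - tfn * log(lam / (1.0 + lam), 2)
    return (1.0 - prob1) * inf2
```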
In our experiments, the constants $b$, $k_1$, $avdl$ (Okapi model), $c$ and mean $dl$ (Prosit model) were fixed according to the values listed in Table 4, while the constant $pivot$ was fixed at 0.1 and the $slope$ at 100 (both used in the “Lnu” and “dtu” models). These values were chosen because they usually result in better retrieval effectiveness.

For the English collection, the indexing process was based on words, obtained by removing high-frequency terms (in this study we used the SMART stopword list of 571 words) and by removing some suffixes in order to conflate word variants into the same stem or root. This last procedure, called stemming, is based on the SMART stemmer, which incorporates Lovins' algorithm (1968).

            Index      Okapi b    Okapi k1    Okapi avdl    Prosit c    Prosit mean dl
English     word       0.55       1.2         500           1.0         151
Chinese     bigram     0.5        1.2         500           1.5         480
Japanese    bigram     0.5        1.2         500           2.0         144
Korean      bigram     0.75       1.2         500           2.0         295
Table 4: Our parameter settings for the Okapi and Prosit models

For the Asian languages, we indexed the documents using the overlapping bigram approach. In this case, the sequence “ABCD EFGH” generates the bigrams {“AB”, “BC”, “CD”, “EF”, “FG”, “GH”}. In our work, we generated these overlapping bigrams4 only for Asian characters; spaces and other punctuation marks (collected for each language in their respective encodings) were used to stop the bigram generation. Moreover, when we encountered terms written with ASCII characters (usually numbers or acronyms such as “WTO” in Table 2), we did not split these words. For the Japanese language, such bigram generation may combine one Kanji and one Katakana character in a single bigram, which may pick up phrase effects but may also be open to discussion. This simple indexing approach has the advantage of not requiring extensive knowledge of the underlying language and can thus be viewed as a language-independent indexing strategy. We also tried using unigrams (single characters) and trigrams to index Asian documents; however, these indexing schemes clearly resulted in lower mean average precision than the bigram-based approach. Of course, the most frequent bigrams can be removed before indexing: for the Chinese language we defined and removed a list of the 215 most frequent bigrams, for Japanese 105 bigrams, and for Korean 80 bigrams.

From the Japanese documents, we removed all Hiragana characters before generating the bigrams. In this language, the 46 Katakana and 46 Hiragana characters are used to represent sounds, or more precisely syllables. The Hiragana are mainly used to write grammatical words (e.g., do, doing, and, of) and inflectional endings (e.g., possessive, subject or object markers) for verbs, adjectives and nouns. Thus removing Hiragana can be viewed as removing words without important meanings (usually included in a stopword list) or suffixes (e.g., “-ing”, “-ed”) of the kind usually removed by a stemming algorithm. The Katakana characters are mainly used to write words of foreign origin (e.g., encoding, computer), foreign names (e.g., Ford) or onomatopoeic words (e.g., buzz). In our Japanese corpus, Hiragana characters represent around 40.8% of the total, while 9% are Katakana and 50.2% Kanji (without counting spaces or linefeed characters). A sketch of the bigram generation is given below.

4 A non-overlapping generation would produce {“AB”, “CD”, “EF”, and “GH”}.
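The overlapping bigram generation described above can be sketched as follows (a minimal illustration; the single Unicode pattern used here to recognize Asian characters is our simplification of the per-encoding character and punctuation lists actually used):

```python
import re

# Runs of CJK ideographs, Japanese kana or Korean hangul (simplified ranges).
ASIAN_RUN = re.compile(r'[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]+')

def overlapping_bigrams(text):
    """Index "ABCD EFGH" (Asian characters) as the overlapping bigrams
    "AB", "BC", "CD", "EF", "FG", "GH"; ASCII words such as "WTO" or
    numbers are kept whole, and punctuation stops bigram generation."""
    tokens, pos = [], 0
    for match in ASIAN_RUN.finditer(text):
        # Keep intervening ASCII words (numbers, acronyms) unsplit.
        tokens += re.findall(r'[A-Za-z0-9]+', text[pos:match.start()])
        run = match.group()
        tokens += [run[i:i + 2] for i in range(len(run) - 1)]
        pos = match.end()
    tokens += re.findall(r'[A-Za-z0-9]+', text[pos:])
    return tokens
```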

1.3. Related work on monolingual Asian languages

In studying various approaches to monolingual Chinese retrieval, Kwok (2003) obtained slightly better mean average precision when indexing Chinese documents using overlapping bigrams instead of both bigram and short-word indexing (a good overview can be found in (Luk & Kwok, 2002)). These short words are composed of one to three characters and are generated using a segmentation procedure (e.g., based on the longest-matching principle (Nie & Ren, 1999; Foo & Li, 2004)). However, with another test collection (simplified Chinese), the combined use of bigrams, unigrams and short words resulted in better retrieval effectiveness than a bigram indexing scheme alone (Kwok, 1999). In this study, the author also indicated that the retrieval performance of short-word indexing does not really depend on an accurate word segmentation procedure, a finding confirmed by Nie & Ren (1999) and Foo & Li (2004). For Foo & Li (2004), the segmentation of Chinese sentences has an effect on retrieval performance, and the recognition of a higher number of 2-character words usually contributes to retrieval enhancement. However, these authors did not find a direct relationship between segmentation accuracy and retrieval effectiveness (moreover, manual segmentation does not always produce better performance than character-based segmentation). Also using the NTCIR-3 test collection, Chen & Gey (2003) suggest indexing Chinese documents by combining both overlapping bigrams and characters (with a stoplist of 718 Chinese terms). Moreover, these authors found that short-word indexing schemes achieve retrieval performance levels comparable to the combination of bigrams and characters, a result also confirmed by Nie & Ren (1999).

For the Japanese language, Chen & Gey (2003), following Fujii & Croft (1993), suggest removing all Hiragana characters and indexing the documents using Kanji, Katakana and ASCII characters. As in our approach, words written in ASCII were not split. Using the NTCIR-3 test collection and “D” topics, Chen & Gey (2003) obtained a mean average precision of 0.2802 when combining overlapping bigrams and characters vs. 0.2758 for a word-based indexing strategy (words were segmented with the ChaSen morphological analyzer (Matsumoto et al., 1999)). The performance difference is too small (1.6%) to be worth consideration.

For the Korean language, Chen & Gey (2003) proposed removing blank spaces and indexing documents written in this language with the same approach employed for Chinese (combining both bigrams and unigrams, with a stoplist of 97 bigrams and 15 unigrams). Previously, Lee & Ahn (1996) had also suggested using an n-gram representation for the Korean language. In fact, even though word boundaries are marked by spaces, the Korean language uses numerous suffixes and even prefixes. Moreover, words obtained by removing suffixes are often compound nouns, a morphological construction used frequently in this language, and such compound words can be separated into simple nouns using a morphological analyzer. Murata et al. (2003) obtained effective retrieval results using this type of tool. Lee et al. (1999) however showed that n-gram indexing can provide similar and sometimes better retrieval effectiveness than a word-based indexing approach applied in conjunction with a decompounding scheme.

1.4. Evaluation of various IR systems

To measure retrieval performance, we adopted the non-interpolated mean average precision (computed on the basis of 1,000 retrieved items per topic by TREC_EVAL). To determine whether or not a given search strategy is better than another, a decision rule is required. To achieve this, we might apply statistical inference methods such as Wilcoxon's signed rank test or the Sign test (Hull, 1993), or hypothesis testing based on the bootstrap methodology (Savoy, 1997).
In this paper, we based our statistical validation on the bootstrap approach because this methodology does not require that the underlying distribution of the observed data follow the normal distribution. Thus, in the tables found in this paper we have underlined statistically significant differences according to a two-sided non-parametric bootstrap test applied to the means, with the significance level fixed at 5% (a sketch of this test is given below). We evaluated the various IR schemes under two topic formulations. First, the queries were built using only the descriptive part of the topics; widely used during the NTCIR-3 evaluation campaign, this query construction is denoted by “D” in the following tables. Second, in order to obtain an idea of the best mean average precision that could be achieved, we built queries using all of the topics' logical sections (query construction denoted “TDNC”).

The mean average precision achieved by the eleven search models is depicted in Table 5, with the best performance under a given condition shown in bold (these values are used as the baseline for our statistical testing). Surprisingly, this data shows that the best retrieval scheme for short queries is not the same as that for long topics. For longer topic formulations, the best approach seems to be the Okapi probabilistic model, except for the Chinese collection, where the vector-space “Lnu-ltc” model shows slightly better performance (0.2919 vs. 0.2895, +0.8% relative effectiveness). In a related study on five European languages (Savoy, 2003), we also found that the Okapi model provides the most effective retrieval performance. However, based on the bootstrap test, the differences cannot always be viewed as significant (significance level of 5%). For example, in the Chinese collection, we cannot find a statistically significant difference between the best IR model (“Lnu-ltc” in this case) and the Okapi, Prosit, and “dtu-dtn” search schemes.
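A paired bootstrap test of this kind can be sketched as follows (our simplified illustration on per-topic average precision values; the exact resampling procedure of (Savoy, 1997) may differ):

```python
import random

def bootstrap_test(ap_a, ap_b, n_samples=10000, seed=42):
    """Two-sided paired bootstrap test comparing two runs, given their
    per-topic average precision over the same topics. Returns an
    approximate p-value for the observed difference in the means."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(ap_a, ap_b)]
    observed = sum(diffs) / len(diffs)
    # Center the differences so that the null hypothesis (no difference
    # between the two runs) holds in the resampled population.
    centered = [d - observed for d in diffs]
    extreme = 0
    for _ in range(n_samples):
        resample = [rng.choice(centered) for _ in diffs]
        if abs(sum(resample) / len(resample)) >= abs(observed):
            extreme += 1
    return extreme / n_samples  # significant at the 5% level if < 0.05
```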

For short queries (“D”), the vector-space model “Lnu-ltc” provided the most effective retrieval performance for all Asian languages, with the performance differences being more obvious for the Japanese corpus (0.3366 vs. 0.3125, +7.7% relative effectiveness) or the Korean language (0.2165 vs. 0.2073, +4.4%). Ranked second was the Okapi model, and tied for third place were the “ltn-ntc” and “dtu-dtn” search schemes. Moreover, for the Korean corpus, our statistical test could not find statistically significant differences between the five most effective search models.

Mean average precision
             English (32 queries)  Chinese (42 queries)  Japanese (42 queries)  Korean (30 queries)
Model        D        TDNC         D        TDNC         D        TDNC          D        TDNC
Prosit       0.3532   0.4633       0.1835   0.2830       0.2886   0.3530        0.1825   0.3285
Okapi-npn    0.3630   0.4681       0.2022   0.2895       0.3125   0.3738        0.2073   0.3530
Lnu-ltc      0.3788   0.4453       0.2089   0.2919       0.3366   0.3732        0.2165   0.3070
dtu-dtn      0.3926   0.4385       0.2017   0.2804       0.2879   0.3253        0.2072   0.2596
atn-ntc      0.3597   0.4249       0.1798   0.2707       0.2616   0.3387        0.2026   0.3032
ltn-ntc      0.3779   0.4218       0.2005   0.2574       0.2922   0.3354        0.2147   0.2848
ntc-ntc      0.2798   0.3440       0.1748   0.2139       0.2503   0.2907        0.1810   0.2592
ltc-ltc      0.2955   0.3589       0.1819   0.2599       0.1995   0.2496        0.1654   0.2458
lnc-ltc      0.3201   0.4008       0.1725   0.2633       0.2192   0.2638        0.1634   0.2606
bnn-bnn      0.1450   0.0985       0.1037   0.0793       0.1786   0.1886        0.0854   0.0487
nnn-nnn      0.1157   0.1584       0.0591   0.0440       0.1247   0.1457        0.0972   0.1344

Table 5: Mean average precision of various single search strategies (monolingual)

Except for the binary indexing scheme (“bnn-bnn”), longer queries (“TDNC”) result in better performance than the short topic formulation (“D”). Performance differences between these two topic formulations are usually around 20% for the English and Japanese collections, and higher (around 40%) for the Chinese and Korean corpora. Overall, the classical “tf · idf” (or “ntc-ntc”) model results in retrieval effectiveness around 20% lower than the best search model (e.g., using the Japanese corpus and “D” topics, the best mean average precision is 0.3366 compared to 0.2503 for the “ntc-ntc” model, a difference of -25.6% in relative effectiveness).

Moreover, we could also incorporate blind query expansion (or pseudo-relevance feedback) before presenting the result list to the user. It has been observed that this type of technique can be useful for enhancing retrieval effectiveness. In this study, we adopted Rocchio's approach (Buckley et al., 1996) with α = 0.75, β = 0.75, whereby the system was allowed to add m terms extracted from the k best-ranked documents of the original search, as sketched below. To evaluate this proposition, we used the Okapi and Prosit probabilistic models.
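A minimal sketch of this expansion step (queries and documents are represented here as simple term-to-weight dictionaries; this illustrates Rocchio's formula with the α and β values above, not the exact SMART implementation):

```python
from collections import defaultdict

def rocchio_expand(query, ranked_docs, k=10, m=100, alpha=0.75, beta=0.75):
    """Blind query expansion: add the m best terms taken from the k
    top-ranked document vectors of the original search."""
    centroid = defaultdict(float)
    for doc in ranked_docs[:k]:
        for term, weight in doc.items():
            centroid[term] += weight / k       # mean weight over the k docs
    expanded = defaultdict(float)
    for term, weight in query.items():
        expanded[term] += alpha * weight
    # Keep only the m highest-weighted expansion terms.
    for term, weight in sorted(centroid.items(), key=lambda t: t[1],
                               reverse=True)[:m]:
        expanded[term] += beta * weight
    return dict(expanded)
```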

Table 6 summarizes the results of our experiments. The lines labeled “Prosit” and “Okapi-npn” indicate the mean average precision before applying the blind query expansion procedure. The lines starting with “#doc / #term” indicate the number of top-ranked documents and the number of terms used to enlarge the original query. Finally, the lines labeled “& Q exp.” depict the mean average precision following blind query expansion (using the parameter setting specified in the previous line).

Mean average precision
              English (32 queries)  Chinese (42 queries)   Japanese (42 queries)  Korean (30 queries)
Model         D        TDNC         D         TDNC         D         TDNC         D        TDNC
Prosit        0.3532   0.4633       0.1835    0.2830       0.2886    0.3530       0.1825   0.3285
#doc / #term  5 / 15   3 / 50       10 / 100  10 / 60      10 / 75   10 / 200     5 / 20   3 / 30
& Q exp.      0.4381   0.5119       0.2802    0.3425       0.3636    0.3718       0.2271   0.3538
Okapi-npn     0.3630   0.4681       0.2022    0.2895       0.3125    0.3738       0.2073   0.3530
#doc / #term  5 / 10   3 / 10       10 / 100  10 / 200     5 / 50    10 / 150     5 / 20   10 / 40
& Q exp.      0.4026   0.4881       0.2566    0.3054       0.3531    0.3823       0.2509   0.3706

Table 6: Mean average precision with blind query expansion (monolingual)

From the data depicted in Table 6, we can infer that a blind query expansion technique may improve the mean average precision. When comparing the two probabilistic models, this strategy seems to provide greater improvement with the Prosit than with the Okapi model. In addition, the percentage of enhancement is greater for short topics than for longer ones. For example, in the Chinese collection using the Prosit model and “D” topics, blind query expansion improved mean performance from 0.1835 to 0.2802 (+52.7%), compared to an improvement from 0.2830 to 0.3425 (+21.0%) for “TDNC” topics. As depicted in Table 6, performance differences were usually statistically significant when compared to a search without pseudo-relevance feedback.

2. Bilingual information retrieval

Since we live in a multilingual world, a search model must be designed that is capable, given a topic written in English, of retrieving documents written in other languages, in this study Chinese, Japanese or Korean. To cross this language barrier, we based our approach on free and readily available translation resources that automatically translate topics into the desired target language. In this study, we chose to translate the topics by processing them with four different machine translation (MT) systems and two bilingual dictionaries (EvDict and Babylon), namely:

BABELFISH         babel.altavista.com/translate.dyn
FREETRANSLATION   www.freetranslation.com
INTERTRAN         www.tranexp.com:2000/InterTran
WORLDLINGO        www.worldlingo.com
EVDICT            www.samlight.com/ev/
BABYLON           www.babylon.com

When using the Babylon bilingual dictionary, we submitted search keywords to be translated word by word. In response, for each word submitted, the Babylon system provided not just one but several translations (in an unknown order). In our experiments, we decided to pick the first available translation (labeled “Babylon 1”), the first two terms (labeled “Babylon 2”) or the first three candidates (labeled “Babylon 3”), as sketched below.
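This word-by-word dictionary translation can be sketched as follows (bilingual_dict is a hypothetical lookup table standing in for the Babylon system; k = 1, 2, 3 corresponds to “Babylon 1/2/3”):

```python
def translate_word_by_word(words, bilingual_dict, k=1):
    """Translate a topic word by word, keeping the first k candidate
    translations per word; words without an entry (mainly proper nouns
    and acronyms) are left untranslated."""
    translated = []
    for word in words:
        candidates = bilingual_dict.get(word.lower(), [])
        translated += candidates[:k] if candidates else [word]
    return translated
```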
Other authors faced with this same translation problem during the NTCIR-3 evaluation campaign proposed other choices. For example, Kwok (2003) suggested using the HuaJian MT system and the LDC bilingual wordlist5 (composed of 126,092 entries). Chen & Gey (2003) made use of the Babelfish MT system and, when they encountered untranslated English words (mainly proper nouns), suggested translating them using a parallel corpus (e.g., Hong Kong News6). As an alternate strategy producing better results, each untranslated English term was submitted to Yahoo! Chinese, and the first 200 entries were downloaded and segmented into words. After this step, from each line containing the specific English word, they extracted the five Chinese words immediately to the left and to the right of the English word and included them in the translated topic (assigning a weight 1/n, with n = 1 to 5 representing the distance between the Chinese and the English words). A similar translation scheme was also applied for translation into Japanese, using Yahoo! Japan of course.

When translating English topics with our two bilingual dictionaries and four MT systems and searching with the Okapi model, we achieved the mean average precision depicted in Table 7. This table also contains the retrieval performance obtained from topics translated by humans (line labeled “Okapi, manual transl.”), which we used as a baseline. Since some translation devices were not able to provide a translation for each language, Table 7 indicates missing entries by “n/a”. Compared to our previous work with European languages (Savoy, 2003), machine-translated topics resulted in performance levels that were generally poor when compared to manually translated topics. With the Korean collection, for example, the performance difference between manually translated topics and the best translation device (WorldLingo in this case) was around 61.8% (“D” queries, 0.2073 vs. 0.0791) or 54.6% (“TDNC” queries, 0.3530 vs. 0.1601). Moreover, the WorldLingo MT system seemed to produce the best translated topics for all three Asian languages. The poor performance displayed by Babelfish when translating into Chinese seemed to be caused by a conversion problem (the Babelfish output is in simplified Chinese or GB encoding, and we needed the topics in traditional Chinese or BIG5 encoding). In addition, we had only a few freely available translation resources per language.

5 This MT system is available at http://www.hjtek.com/old/newnew/trae/main.htm, and the LDC bilingual wordlist at http://www.ldc.upenn.edu/Projects/Chinese/
6 See the Web site http://www.info.gov.hk

As shown in Table 7, the differences in mean average precision were always statistically significant, and they always favored the manual topic translations.

From English to ...     Chinese               Japanese              Korean
Model                   D        TDNC         D        TDNC         D        TDNC
Okapi, manual transl.   0.2022   0.2895       0.3125   0.3738       0.2073   0.3530
Babylon 1               0.0471   0.0788       0.1345   0.2059       0.0401   0.0834
Babylon 2               0.0474   0.0800       0.1302   0.1819       0.0348   0.0851
Babylon 3               0.0462   0.0833       0.1151   0.1779       0.0405   0.0802
EvDict                  0.0540   0.0993       n/a      n/a          n/a      n/a
WorldLingo              0.0935   0.1673       0.1676   0.2482       0.0791   0.1601
Babelfish               0.0441   0.0651       0.1667   0.2479       0.0774   0.1540
InterTran               n/a      n/a          0.0759   0.1015       n/a      n/a
FreeTranslation         0.0606   0.1119       n/a      n/a          n/a      n/a
Combined translation    Lingo / EvDict        Lingo / Babylon 1     Lingo / Babelfish
  with Okapi            0.0920   0.1672       0.1984   0.2553       0.0876   0.1677
  with Prosit           0.0810   0.1607       0.1798   0.2531       0.0803   0.1523

Table 7: Mean average precision of various translation approaches using the Okapi model

To improve the retrieval performance of translated topics, we developed and tested three possible strategies. First, we combined the translations provided by two translation tools. For the Japanese language, we concatenated the results supplied by WorldLingo with those of “Babylon 1”, and for Korean, we combined the translations provided by WorldLingo with those of Babelfish. As shown in the last two lines of Table 7, this combined translation strategy enhanced the retrieval effectiveness of both the Okapi and the Prosit search models. For Chinese, however, concatenating the WorldLingo and EvDict topic formulations did not improve the mean average precision.

Our second attempt to improve performance was to apply blind query expansion to the combined translated topics. As shown in Table 8, this technique clearly enhanced retrieval effectiveness for both the Okapi and the Prosit probabilistic models. As for monolingual IR (see Table 6), the results achieved by the Prosit system after pseudo-relevance feedback were usually better than those obtained by the Okapi search model.

From English to ...    Chinese               Japanese              Korean
Model                  D         TDNC        D         TDNC        D         TDNC
Okapi, combined        0.0920    0.1672      0.1984    0.2553      0.0876    0.1677
#doc / #term           5 / 75    5 / 75      10 / 60   10 / 30     10 / 200  10 / 300
Okapi & Q expansion    0.1329    0.1865      0.2363    0.2772      0.1018    0.1873
Prosit, combined       0.0810    0.1607      0.1798    0.2531      0.0803    0.1523
#doc / #term           10 / 125  3 / 30      10 / 100  10 / 15     3 / 20    10 / 200
Prosit & Q expansion   0.1710    0.2421      0.2586    0.3122      0.1084    0.1748

Table 8: Mean average precision of combined translation approaches with blind query expansion

As a third strategy to enhance retrieval effectiveness, we considered adopting a data fusion approach that combines two or more result lists provided by different search models. In this case, we view each IR model as a distinct and independent source of evidence regarding document relevance. Fox & Shaw (1994) suggested various data fusion operators, of which the simple linear combination, denoted SumRSV, usually seemed to provide the best performance. Given a set of result lists i = 1, 2, ..., r, this combined operator is defined as:

$Sum\,RSV = \sum_{i=1}^{r} \alpha_i \cdot RSV_i$   (1)

in which the value of $\alpha_i$ (fixed at 1 for all result lists in our experiments) may be used to reflect retrieval performance differences between the retrieval schemes (a sketch is given below). We also considered the round-robin approach, whereby we took one document in turn from each individual list and removed duplicates, keeping the most highly ranked instance. As described in Section 3, other combining operators are also possible.
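A minimal sketch of this linear combination (each result list maps document identifiers to their retrieval status values; alphas defaults to 1 for every list, as in our experiments):

```python
def sum_rsv_fusion(result_lists, alphas=None):
    # Eq. 1: SumRSV(d) = sum_i alpha_i * RSV_i(d); a document absent
    # from one list simply contributes nothing to the sum.
    alphas = alphas or [1.0] * len(result_lists)
    fused = {}
    for alpha, results in zip(alphas, result_lists):
        for doc, rsv in results.items():
            fused[doc] = fused.get(doc, 0.0) + alpha * rsv
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```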

                            Mean average precision (% change)
Single and combined runs    D (42 queries)      TDNC (42 queries)
Okapi & Q expansion         0.2363              0.2772
Prosit & Q expansion        0.2586              0.3122
Round-robin                 0.2531 (-2.1%)      0.3081 (-1.3%)
Sum RSV (Eq. 1)             0.2619 (+1.3%)      0.3140 (+0.6%)
Norm Max                    0.2564 (-0.9%)      0.3072 (-1.6%)
Norm RSV (Eq. 2)            0.2585 (-0.0%)      0.3072 (-1.6%)
NormZ (Eq. 4)               0.2608 (+0.9%)      0.3083 (-1.2%)

Table 9: Mean average precision of various data fusion strategies (Japanese corpus)

Table 9 shows the mean average precision for the Japanese collection, with both short (“D”) and long (“TDNC”) queries. From this data, we can see that combining two search models may sometimes improve retrieval effectiveness, for example when applying the linear combination (labeled “Sum RSV” in Table 9). However, using the performance achieved by the Prosit model as a baseline, we cannot detect a statistically significant difference between the performances depicted in Table 9. Kwok (1999) also proposed linearly combining two runs, where the first was based on the indexing of short words and characters, and the second on bigram indexing. As in our experiment, combining runs improved short queries more than longer topics. Considering that such a retrieval strategy implies the construction of two inverted files and two searches, its advantages are questionable.

3. Multilingual information retrieval with Asian languages

In the previous section, we obtained a better understanding of the retrieval effectiveness of various bilingual retrieval approaches. However, this represents only the first step in analyzing cross-language information retrieval systems. In this section, we investigate the situation where users write a topic in English in order to retrieve relevant documents in English, Chinese and Japanese. To deal with this multi-language barrier, we based our approach on the solutions described in the previous section. The different collections were indexed separately and, once they received the original or a translated request, each returned a ranked list of retrieved items. From these lists we need to produce a unique ranked result list, using one of the merging strategies described in this section.

As a first approach, we considered the round-robin method, whereby we take one document in turn from each individual list. Such a simple merging strategy is used as the baseline performance. However, we could also take account of the document score (or its degree of similarity with the query), a value denoted $RSV_k$ for document $D_k$. To account for this document score, we might formulate the hypothesis that each collection is searched by the same or a very similar search engine and that the similarity values are therefore directly comparable (Kwok et al., 1995). Such a strategy is called raw-score merging and produces a final list sorted by the document scores computed within each collection. Kwok (2003) uses such a raw-score merging strategy because his retrieval engine returns a log-odds value as document score, related to the probability that the corresponding document is relevant. The retrieval system described by Chen & Gey (2003) also produces a document score directly related to the probability of relevance, and thus these authors also propose using the raw-score merging approach.

Unfortunately, document scores cannot usually be compared directly. As a third merging strategy, we therefore normalized the document scores within each collection by dividing them by the maximum score (i.e., the document score of the record retrieved in the first position). Such a merging scheme is denoted “Norm Max”. As a variant of this normalized score merging scheme (denoted “Norm RSV”), we can normalize the document scores $RSV_k$ within the ith result list according to Equation 2. This merging strategy was used by Zhang et al. (2003) during the last NTCIR-3 evaluation campaign.

$Norm\,RSV_k = (RSV_k - Min\,RSV_i) / (Max\,RSV_i - Min\,RSV_i)$   (2)

As a fourth merging strategy, we might use logistic regression to predict the probability of a binary outcome variable according to a set of explanatory variables (Le Calvé & Savoy, 2000). In our case, we predicted the probability of relevance for document $D_k$ given both the logarithm of its rank (denoted $\ln(rank_k)$) and the original document score $RSV_k$, as indicated in Equation 3. Based on these estimated relevance probabilities (computed independently for each language using the S+ software), we sorted the records retrieved from the separate collections in order to obtain a single ranked list. However, this approach requires a training set in order to estimate the underlying parameters ($\alpha$, $\beta_1$ and $\beta_2$ in Eq. 3). In our evaluations, we wanted an unbiased estimate of the real performance. We therefore adopted the leave-one-out evaluation methodology, which uses all topics except the current topic as a training set. After obtaining the retrieval performance for the current topic, we repeated the same operation for all topics and computed the mean, thus obtaining the mean average precision.

$Prob[D_k \text{ is rel} \mid rank_k, rsv_k] = \dfrac{e^{\alpha + \beta_1 \cdot \ln(rank_k) + \beta_2 \cdot rsv_k}}{1 + e^{\alpha + \beta_1 \cdot \ln(rank_k) + \beta_2 \cdot rsv_k}}$   (3)

As a fifth merging strategy, we suggest merging the retrieved documents according to the Z-score computed from their document scores. Within this scheme, for the ith result list we need to compute the mean of the $RSV_k$ values (denoted $Mean\,RSV_i$) and their standard deviation (denoted $Stdev\,RSV_i$). Based on these values, we then normalize the retrieval status value of each document $D_k$ in the ith result list by computing the following formula:

$NormZ\,RSV_k = \alpha_i \cdot [((RSV_k - Mean\,RSV_i) / Stdev\,RSV_i) + \delta_i]$, with $\delta_i = (Mean\,RSV_i - Min\,RSV_i) / Stdev\,RSV_i$   (4)

within which the value of $\delta_i$ is used to generate only positive values, and $\alpha_i$ (usually fixed at 1) is used to reflect the retrieval performance of the underlying retrieval model. A similar merging strategy has also been proposed in the context of topic detection and tracking (Leek et al., 2002). Finally, as a sixth merging scheme, we suggest a biased round-robin approach which extracts not just one document per collection per round, but one document from the English corpus and two from each of the Chinese and Japanese collections. Such a merging strategy exploits the fact that the Chinese and Japanese corpora contain more articles than the English corpus (see Table 1), under the assumption that relevant documents are uniformly distributed across collections. In a similar vein, Kwok (2003) tried various heuristics to take account of this phenomenon and to reduce the number of extracted English documents, although without success. A sketch of these normalization and merging procedures is given below.
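The score normalizations of Equations 2 and 4, together with the biased round-robin scheme, can be sketched as follows (a minimal illustration; documents are assumed to carry collection-unique identifiers, and quotas = [1, 2, 2] mirrors the 1/2/2 setting above):

```python
from statistics import mean, stdev

def norm_rsv(scores):
    # Min-max normalization of one result list (Eq. 2).
    lo, hi = min(scores.values()), max(scores.values())
    return {doc: (rsv - lo) / (hi - lo) for doc, rsv in scores.items()}

def norm_z(scores, alpha=1.0):
    # Z-score normalization of one result list (Eq. 4); delta shifts
    # every normalized value into the positive range.
    mu, sigma = mean(scores.values()), stdev(scores.values())
    delta = (mu - min(scores.values())) / sigma
    return {doc: alpha * ((rsv - mu) / sigma + delta)
            for doc, rsv in scores.items()}

def merge_by_score(normalized_lists):
    # Merge the normalized per-collection lists into a single ranking.
    fused = {}
    for results in normalized_lists:
        fused.update(results)
    return sorted(fused, key=fused.get, reverse=True)

def biased_round_robin(ranked_lists, quotas):
    # Per round, take quotas[i] documents from list i, e.g., one English
    # document and two Chinese and two Japanese documents per round.
    iterators = [iter(lst) for lst in ranked_lists]
    merged, active = [], set(range(len(ranked_lists)))
    while active:
        for i in sorted(active):
            for _ in range(quotas[i]):
                try:
                    merged.append(next(iterators[i]))
                except StopIteration:
                    active.discard(i)
                    break
    return merged
```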

Table 10 depicts the retrieval effectiveness of the various merging strategies. The top part of this table shows the mean average precision obtained independently for each language (based on a smaller number of queries), using the Prosit search model with query expansion (see Table 6 for the English corpus, and Table 8 for the Chinese and Japanese collections). Only for the “T” queries on the Japanese corpus did we use the Okapi scheme with blind query expansion.

                                Mean average precision (% change)
                                T (50 queries)    D (50 queries)    TDNC (50 queries)
Single collection
English (on 32 queries)         0.4537            0.4381            0.5119
Chinese (on 42 queries)         0.1427            0.1710            0.2421
Japanese (on 42 queries)        0.2123            0.2586            0.3122
Merging strategy
Round-robin (baseline)          0.1084            0.1277            0.1646
Raw-score                       0.0921 (-15.0%)   0.1702 (+33.3%)   0.2421 (+47.1%)
Norm Max                        0.1245 (+14.9%)   0.1490 (+16.7%)   0.2041 (+24.0%)
Norm RSV (Eq. 2)                0.1213 (+11.9%)   0.1483 (+16.1%)   0.2172 (+32.0%)
Biased round-robin 1 / 2 / 2    0.1187 (+9.5%)    0.1423 (+11.4%)   0.1857 (+12.8%)
NormZ (Eq. 4)                   0.1140 (+5.2%)    0.1497 (+17.2%)   0.1894 (+15.1%)
NormZ (Eq. 4) 1 / 2 / 2         0.1261 (+16.3%)   0.1636 (+28.1%)   0.2045 (+24.2%)
Logistic regression             0.1339 (+23.5%)   0.1542 (+20.8%)   0.2170 (+31.8%)

Table 10: Mean average precision of various merging strategies over document collections written in English, Chinese and Japanese

As a first merging strategy, we evaluated the round-robin approach, which serves as the baseline. When compared to this baseline, the performance differences resulting from the other merging strategies based on “D” or “TDNC” queries can be considered statistically significant improvements. For the “T” topics, however, the differences in mean average precision can only be considered statistically significant for four of the merging strategies.

When different retrieval models are employed, the raw-score scheme does not provide acceptable results. In the current study, we employed the same IR model and near-identical parameters for query expansion on the Chinese and Japanese corpora when using the “D” or “TDNC” query formulations; in both cases, the raw-score merging strategy provides the best retrieval performance. For the “T” queries, the searches on the English and Chinese collections were done using the Prosit model while, for the Japanese corpus, we used the Okapi scheme. In this case, the raw-score approach did not result in interesting retrieval performance levels. Both normalized merging schemes (“Norm Max” and “Norm RSV”) provided reasonable performance levels when used with “T” queries. When we took advantage of the differences in collection size, the biased round-robin method provided statistically significant improvements over the baseline. The weighted normalized Z-score (with α = 1 for the English corpus and 2 for both the Chinese and Japanese languages) usually provided better performance levels than the simple Z-score. Finally, the logistic model, where both the rank and the document score are explanatory variables, seems attractive when merging different retrieval schemes (usually implying incomparable document scores), which is the case with the “T” queries. However, this merging strategy requires a training sample in order to estimate the underlying parameters. When such a sample is not available, the best solution seems to depend on topic length (e.g., the weighted Z-score for “T” queries, the raw-score approach for “D” and “TDNC” topics). It is worth noting, however, that across all topic formulations the weighted Z-score approach presents interesting performance levels (the second best for the “T” and “D” topic formulations, the fourth for the “TDNC” queries). From our point of view, this merging scheme, even though it may not provide the best retrieval effectiveness, is an attractive solution because its performance is rather stable across different topic formulations.

Table 11 displays an overview of the efficiency of the various search models, indicating the size of each collection in terms of storage space requirements and number of documents. The third line (labeled “# postings”) indicates the number of terms in the inverted file, followed by the size of the inverted file and the time (in seconds) needed to build it. To implement and evaluate these various search models, we used an Intel Pentium III/600 (memory: 1 GB, swap: 2 GB, disk: 6 x 35 GB). The average time (in seconds) required to perform a search for one query is given in the last line (“D” query, without blind query expansion).

                      English        Chinese        Japanese       Korean
Size (in MB)          48 MB          490 MB         268 MB         68 MB
# of documents        22,927         381,681        220,078        66,146
# postings            121,939        2,704,517      496,468        164,729
Inverted file size    158 MB         1,245 MB       324 MB         174 MB
Building time         126 sec.       921 sec.       267 sec.       135 sec.
Search time           0.00048 sec.   0.00104 sec.   0.00124 sec.   0.00067 sec.

Table 11: Inverted file construction and search performance

Conclusion

Based on our evaluations, we have shown that when indexing Asian languages based on bigrams, the IR models providing the best retrieval performance are the “Lnu-ltc” vector-space model for short topics (“D”) and the Okapi probabilistic model for long topics (“TDNC”) (see Table 5). This finding confirms previous work (Savoy, 2003) on European languages and word-based indexing schemes, in which the Okapi model also represented the best search approach. To improve retrieval effectiveness, blind query expansion is a worthwhile approach, especially when processing short queries (see Table 6).

When analyzing the performance of bilingual searches, and contrary to our findings for some European languages (Savoy, 2003), the number and quality of the freely available translation resources are questionable. When translating the user's information need from English into the Chinese, Japanese or Korean language, the overall retrieval effectiveness decreases by more than 40% (see Table 7). To improve this poor performance, we may concatenate two (or more) translations, employ a blind query expansion approach (see Table 8) and, if the topics are rather short, a data fusion approach (see Table 9). With this latter approach, however, the improvement is not important, and the fact that two inverted files are required raises some doubts about the technique.

When searching document collections written in various Asian languages, the simplest strategy consists of translating the submitted topic into the different languages, performing a search in each language, and then merging the result lists. When evaluating various merging strategies using different query sizes, it appears that the effectiveness of these merging schemes varies considerably: some approaches may work well under certain conditions and not so well in other circumstances. In this paper, we suggest using either raw-score merging when near-identical search engines are used, or the Z-score merging procedure, which is capable of producing interesting retrieval effectiveness.

Acknowledgments

The author would like to thank C. Buckley from SabIR for giving us the opportunity to use the SMART system, together with Pierre-Yves Berger and Samir Abdou for their help in translating the English topics. This research was supported by the Swiss National Science Foundation under Grant #21-66742.01.

References

Amati, G. & van Rijsbergen, C.J. (2002). Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems, 20(4), 357-389.
Buckley, C., Singhal, A., Mitra, M. & Salton, G. (1996). New retrieval approaches using SMART. In Proceedings of TREC-4 (pp. 25-48). Gaithersburg, MD: NIST Publication #500-236.
Chen, A. & Gey, F.C. (2003). Experiments on cross-language and patent retrieval at NTCIR-3 workshop. In Proceedings of NTCIR-3 (to appear).
Foo, S. & Li, H. (2004). Chinese word segmentation and its effect on information retrieval. Information Processing & Management, 40(1), 161-190.
Fox, E.A. & Shaw, J.A. (1994). Combination of multiple searches. In Proceedings of TREC-2 (pp. 243-249). Gaithersburg, MD: NIST Publication #500-215.
Fujii, H. & Croft, W.B. (1993). A comparison of indexing techniques for Japanese text retrieval. In Proceedings of the 16th International Conference on Research and Development in Information Retrieval (ACM SIGIR '93) (pp. 237-246). New York, NY: The ACM Press.
Gey, F. & Chen, A. (2001). TREC-9 cross-language information retrieval (English-Chinese) overview. In Proceedings of TREC-9 (pp. 15-24). Gaithersburg, MD: NIST Publication #500-249.
Hull, D. (1993). Using statistical testing in the evaluation of retrieval experiments. In Proceedings of the 16th International Conference of the ACM-SIGIR'93 (pp. 329-338). New York, NY: The ACM Press.
Kando, N. (2003). CLIR at NTCIR workshop 3: Cross-language and cross-genre retrieval. In C. Peters, M. Braschler, J. Gonzalo & M. Kluck (Eds.), Advances in Cross-Language Information Retrieval (pp. 485-504). Berlin: Springer-Verlag, LNCS #2785.
Kwok, K.L., Grunfeld, L. & Lewis, D.D. (1995). TREC-3 ad-hoc, routing retrieval and thresholding experiments using PIRCS. In Proceedings of TREC-3 (pp. 247-255). Gaithersburg, MD: NIST Publication #500-225.
Kwok, K.L. (1999). Employing multiple representations for Chinese information retrieval. Journal of the American Society for Information Science, 50(8), 709-723.
Kwok, K.L. (2003). NTCIR-3 Chinese, cross language retrieval experiments using PIRCS. In Proceedings of NTCIR-3 (to appear).
Le Calvé, A. & Savoy, J. (2000). Database merging strategy based on logistic regression. Information Processing & Management, 36(3), 341-359.
Lee, J.H. & Ahn, J.S. (1996). Using n-grams for Korean text retrieval. In Proceedings of the 19th International Conference on Research and Development in Information Retrieval (ACM SIGIR '96) (pp. 216-224). New York, NY: The ACM Press.
Lee, J.H., Cho, H.Y. & Park, H.R. (1999). N-gram-based indexing for Korean text retrieval. Information Processing & Management, 35(4), 427-441.
Leek, T., Schwartz, R. & Srinivasa, S. (2002). Probabilistic approaches to topic detection and tracking. In J. Allan (Ed.), Topic Detection and Tracking: Event-based Information Organization (pp. 67-83). Boston, MA: Kluwer.
Lovins, J.B. (1968). Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1), 22-31.
Luk, R.W.P. & Kwok, K.L. (2002). A comparison of Chinese document indexing strategies and retrieval models. ACM Transactions on Asian Language Information Processing, 1(3), 225-268.
Lunde, K. (1998). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing. New York, NY: O'Reilly.
Matsumoto, Y., Kitauchi, A., Yamashita, T., Hirano, Y., Matsuda, H. & Asahara, M. (1999). Japanese morphological analysis system ChaSen. Technical Report NAIST-IS-TR99009, NAIST. Available at http://chasen.aist-nara.ac.jp/
Murata, M., Ma, Q. & Isahara, H. (2003). Applying multiple characteristics and techniques to obtain high levels of performance in information retrieval. In Proceedings of NTCIR-3 (to appear).
Nie, J.Y. & Ren, F. (1999). Chinese information retrieval: Using characters or words? Information Processing & Management, 35(4), 443-462.
Peters, C., Braschler, M., Gonzalo, J. & Kluck, M. (Eds.) (2003). Advances in Cross-Language Information Retrieval (CLEF-2002). Lecture Notes in Computer Science #2785. Berlin: Springer-Verlag.
Robertson, S.E., Walker, S. & Beaulieu, M. (2000). Experimentation as a way of life: Okapi at TREC. Information Processing & Management, 36(1), 95-108.
Savoy, J. (1997). Statistical inference in retrieval effectiveness evaluation. Information Processing & Management, 33(4), 495-512.
Savoy, J. (2003). Report on CLEF-2002 experiments: Combining multiple sources of evidence. In C. Peters, M. Braschler, J. Gonzalo & M. Kluck (Eds.), Advances in Cross-Language Information Retrieval. Lecture Notes in Computer Science #2785 (pp. 66-90). Berlin: Springer-Verlag.
Singhal, A., Choi, J., Hindle, D., Lewis, D.D. & Pereira, F. (1999). AT&T at TREC-7. In Proceedings of TREC-7 (pp. 239-251). Gaithersburg, MD: NIST Publication #500-242.
Sproat, R. (1992). Morphology and Computation. Cambridge, MA: The MIT Press.
Zhang, J., Sun, L., Qu, W., Du, L., Sun, Y., Fan, Y. & Lin, Z. (2003). ISCAS at NTCIR-3: Monolingual, bilingual and multilingual IR tasks. In Proceedings of NTCIR-3 (to appear).
