A Novel Document Retrieval Method Using the Discrete Wavelet Transform

LAURENCE A. F. PARK, KOTAGIRI RAMAMOHANARAO, and MARIMUTHU PALANISWAMI
The University of Melbourne
Current information retrieval methods either ignore the term positions or deal with exact term positions; the former can be seen as coarse document resolution, the latter as fine document resolution. We propose a new spectral-based information retrieval method that is able to utilize many different levels of document resolution by examining the term patterns that occur in the documents. To do this, we take advantage of the multiresolution analysis properties of the wavelet transform. We show that we are able to achieve higher precision when compared to vector space and proximity retrieval methods, while producing fast query times and using a compact index.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms: Algorithms, Experimentation, Performance

Additional Key Words and Phrases: Daubechies, document retrieval, Haar, multiresolution analysis, proximity search, vector space methods, wavelet transform
This work was supported by the Australian Research Council.
Authors' addresses: L. A. F. Park, ARC Centre for Perceptive and Intelligent Machines in Complex Environments, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia 3010; email: [email protected]; K. Ramamohanarao, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia 3010; email: [email protected]; M. Palaniswami, Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria, Australia 3010; email: [email protected].
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or [email protected].
© 2005 ACM 1046-8188/05/0700-0267 $5.00
ACM Transactions on Information Systems, Vol. 23, No. 3, July 2005, Pages 267–298.

1. INTRODUCTION
Many current information retrieval systems are built around a similarity function. This function takes a query and a document as its arguments and generates a single score which represents the relevance of the document to the query. Current popular retrieval methods, namely, vector space methods [Zobel and Moffat 1998; Buckley and Walz 1999] and probabilistic methods [Robertson and Walker 1999], base their similarity function on the hypothesis that a document is more likely to be relevant to a query if it contains more occurrences of the query terms. This implies that the similarity functions need the count of
occurrences of each of the terms, and ignore any other information in the document. Vectors are used to represent documents and queries, where the vector space contains one dimension for each of the terms found in the document set. The contribution of each dimension in the document vector is found by counting the appearances of the associated term in the document. The similarity function simply applies weights to the query and document vectors and compares them to generate a score based on their likelihood of relevance. By converting the documents into vectors, we take into account only the number of times each term appears in the documents and disregard the positional information. These methods therefore operate at a single level of resolution (usually the document level, implying coarse document resolution), treating each document as a unit.

Proximity methods [Clarke and Cormack 2000; Hawking and Thistlewaite 1996] base their similarity functions on the hypothesis that a document is more likely to be relevant to a query if the query terms are found in closer proximity to each other. Therefore, they use similarity functions that try to utilize the positional information to achieve higher precision. This is done by comparing the positions of the query terms in the document; because many comparisons must be performed, the query time grows. By examining positions, proximity methods use each term position as their resolution, implying a fine document resolution.

Rather than observing a single resolution, we can observe multiple resolutions of a document by analyzing the term patterns throughout it. We hypothesize that a document is more likely to be relevant if the patterns of appearance of all query terms are similar. To examine the pattern of a term throughout a document, we could take note of each of the term positions, leading to query times similar to those of the proximity methods.
Alternatively, we could map our term positions into another domain in which we can easily analyze the positional patterns and achieve faster query times. Wavelet transforms allow us to do exactly this. The wavelet transform is able to break a given signal into wavelets (little waves) of different scale and position. This decomposition allows us to analyze the signal at different frequency resolutions and to identify the position of any spikes that may occur in the signal. Two-dimensional wavelet transforms have been used for image compression [Uhl 1994] and retrieval [Wang et al. 2001] for these reasons. Natural images (e.g., photographs) contain flowing colors that can be represented by low-frequency wavelets. Singularities in images (e.g., fast changes in color) can be represented by high-frequency, positioned wavelets. We see a stream of text in a similar way: if a term appears frequently and is scattered throughout the document, it can be represented by a low-frequency wavelet, while a term that appears once is represented by a high-frequency, positioned wavelet. Wavelet transforms have also been used in text visualization systems [Miller et al. 1998], which attempt to graphically display which regions of a document contain a desired topic. The wavelet transform can assist this visualization by allowing the analysis to be done at multiple document resolution levels.

We propose a new method of spectral document ranking using the discrete wavelet transform (DWT). We will show that using the wavelet transform allows
us to achieve high precision, provides fast query times, and uses a compact index. Our focus in this article is to introduce a new method of text document ranking using the discrete wavelet transform, so that we can analyze the document term patterns at various resolutions, and to compare this new method with existing methods of document ranking.

The article is organized as follows: Section 2 introduces the spectral-based retrieval model and describes the use of term signals. Section 3 introduces the wavelet transform and its self-similarity properties, while Section 4 discusses the desired properties of wavelets in information retrieval. Section 5 discusses time and space complexity issues. Section 6 examines the compliance of our document ranking method, and Section 7 gives details of the experiments performed, with results displayed in different forms. Section 8 concludes the article.

2. SPECTRAL-BASED DOCUMENT RETRIEVAL
Spectral-based document retrieval [Park et al. 2001, 2002a, 2002b, 2004, 2005] finds relevant documents by considering the query term occurrence patterns. Documents whose query terms all follow a similar positional pattern are considered more relevant than documents whose query terms do not follow similar patterns.

Vector space retrieval calculates the document score based upon the occurrence of the query terms in the document. Proximity methods calculate the document score based on the proximity of the query terms to each other: if a document contains query terms that are within a small proximity of each other, the document scores higher than another document whose query terms are not within the same proximity. Since the vector space method only observes the count of the terms, it would give the same score independent of the query term positions. Proximity searches use more of the document information to calculate the document score, but this takes time.
Each query term must be compared to each other query term in the scoring process; therefore, as the number of query terms grows, the number of comparisons grows combinatorially. Spectral-based retrieval is able to overcome this problem by comparing the query terms in their spectral domain rather than their spatial domain. To do this, we create a term signal for each query term in each document, convert the term signals into term spectra using a spectral transform, and combine the term spectra to obtain a document score. The benefits of performing our calculation in the spectral domain are the following:

— the components are orthogonal to each other; therefore we do not need to cross compare components;
— the spectral domain magnitude and phase values are related to the spatial term count and position, respectively.

2.1 Term Signals
A term signal is a sequence of values that shows the occurrence of a particular term in each section of a document. The term signal for term t in
Fig. 1. An example of how the term signals are obtained. The top two lines, labeled “travel” and “wales” show the positions of the terms travel and wales in a document (the position is signified by the vertical stroke through the line). The bottom half shows the generation of the eight term signal components from the term positions.
document d is represented by

$\tilde{f}_{d,t} = [\, f_{d,t,0} \;\; f_{d,t,1} \;\; \cdots \;\; f_{d,t,B-1} \,],$    (1)
where $f_{d,t,b}$ is the value of the $b$th signal component. If we have $B$ signal components and $D$ terms in the document, we calculate the value of the $b$th component by counting the occurrences of term $t$ between the $bD/B$th word and the $\{(b+1)D/B - 1\}$th word of the document. Therefore, if $B = 8$, $f_{d,t,0}$ would contain the number of times term $t$ occurred in the first eighth of document $d$. If $B = 1$, $f_{d,t,0}$ would contain the count of term $t$ throughout the whole document. Figure 1 shows an example of the term signal creation.

2.2 Term Signal Weights
If we examine the weighting schemes found in vector space and probabilistic methods, we can see that they are used to reduce the impact of certain document and term properties on the document score (e.g., document length should not affect the score) [Salton and Buckley 1988; Singhal et al. 1996]. These document and term properties exist in term signals as well, so we will use weighting to try to remove this dependence. Each component of a term signal represents a portion of the document it was taken from (a passage); therefore, we are able to use the existing document weighting schemes to weight each of the term signal components. The document weights [Zobel and Moffat 1998] used were

— BD-ACI-BCA: $w_{d,t,b} = \dfrac{1 + \log f_{d,t,b}}{(1-s) + s W_d/\bar{W}_d}$,
— AB-AFD-BAA (Okapi): $w_{d,t,b} = \dfrac{f_{d,t,b}}{f_{d,t,b} + \tau_d/\bar{\tau}_d}$,
— BI-ACI-BCA: $w_{d,t,b} = \dfrac{1 + \log f_{d,t,b}}{(1-s) + s W_d/\bar{W}_d}$,
— Lnu.ltu (SMART): $w_{d,t,b} = \dfrac{(1 + \log f_{d,t,b})/(1 + \log \bar{f}_{d,t})}{(1-s) + s \tau_d/\bar{\tau}_d}$,
where $f_{d,t,b}$ is the $b$th component of the $t$th term in the $d$th document, $\bar{f}_{d,t}$ is the average term count for document $d$, $W_d$ is the document vector $l_2$ norm, $\tau_d$ and $\bar{\tau}_d$ are the number of unique terms in document $d$ and the average number of unique terms, respectively, and $s$ is a slope parameter (set to 0.7 [Zobel and Moffat 1998]).

2.3 Term Spectra
If we were to compare our query term signals directly in order to obtain a document score, we could compare component $b$ of each term, or we could compare different components of different terms. The former method would reduce to passage retrieval, while the latter would be a form of proximity measure. As stated earlier, we do not want to compare the term signal positions; we want to compare their patterns. The most convenient way of doing this is to examine their wavelet spectrum, given by

$\tilde{\zeta}_{d,t} = [\, \zeta_{d,t,0} \;\; \zeta_{d,t,1} \;\; \cdots \;\; \zeta_{d,t,B-1} \,],$    (2)
where $\zeta_{d,t,b} = H_{d,t,b} \exp(i\theta_{d,t,b})$ is the $b$th spectral component, with magnitude $H_{d,t,b}$ and phase $\theta_{d,t,b}$. Previous analyses have examined the Fourier transform and the cosine transform of the term signals [Park et al. 2001, 2002a, 2002b, 2004, 2005]. These transforms decompose the signal into a set of infinite sinusoidal waves. This implies that they are able to extract the frequency information from the signal, but they treat the signal as a whole. The wavelet transform is able to focus on portions of the signal at different resolutions. This implies that frequency information is extracted from parts of the document, providing us with both frequency and position information. The resulting term spectrum contains orthogonal components, implying that there is no need to cross compare the spectral components. Therefore, document scores can be obtained by combining the term spectrum components across terms.

2.4 Spectral-Based Retrieval Model
The spectral-based retrieval model (shown in Figure 2) uses the magnitude and phase information found in the query term spectra of each document to calculate the document score. If we obtain a set of term spectra consisting of complex-valued components, we can treat the magnitude of a component as proportional to the occurrence of the term in the corresponding pattern, and we can treat the phase as the position of the pattern. A relevant document would have a high occurrence of query terms (implying high component magnitudes) and a similar position for each query term pattern (implying similar phases). Therefore, we split our process into two parts, so that we can deal with the magnitude and phase separately. We know that the components are orthogonal; therefore we need only compare the $b$th component of each term spectrum. The comparisons lead to a score for each spectral component, which can be combined to obtain the overall document score.
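As a concrete illustration of the term signal construction described in Section 2.1, the binning step can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation; the function name, the whitespace tokenization, and the floor-based bin mapping (equivalent to the $bD/B$ bounds above when $B$ divides $D$) are our assumptions.

```python
def term_signal(doc_terms, term, B):
    """Bin the positions of `term` in a tokenized document into B components.

    Component b counts the occurrences of the term between word bD/B and
    word (b+1)D/B - 1, where D is the document length in words.
    """
    D = len(doc_terms)
    signal = [0] * B
    for pos, word in enumerate(doc_terms):
        if word == term:
            # the word at position pos falls into bin floor(pos * B / D)
            signal[pos * B // D] += 1
    return signal

# Term 'a' appears at positions 0, 2, 4, and 5 of an 8-word document,
# so with B = 4 the four quarters hold counts [1, 1, 2, 0].
print(term_signal("a b a c a a d c".split(), "a", 4))  # → [1, 1, 2, 0]
```

With $B = 1$ the signal collapses to the plain term count, which matches the vector space view of a document discussed in Section 2.5.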
We have stated that term occurrence is related to the magnitude of the term spectrum, and query term occurrence is likely to be related to the relevance of
Fig. 2. The spectral-based retrieval model.
a document to the query. Therefore, we can assume that the query term spectrum magnitudes are likely to be related to the relevance of the document. To take the magnitude of each spectral component into account, we simply combine the magnitude values by adding them.

We stated that the spectral phase is related to the term position. Therefore, we want a similar phase for each component across all of the query terms. Phase is an angular value, so we cannot simply use the variance as a measure of phase similarity. Instead we use phase precision. The phase precision method assigns each phase to a unit vector; the vectors are added and the magnitude of the sum is averaged, and the resulting value is the phase precision. If all of the phases are the same, the unit vectors add constructively and the resulting value will be 1. If the phases are scattered, the unit vectors add destructively and the resulting value will be close to zero. The phase precision of component $b$ in document $d$ is

$\Phi_{d,b} = \frac{\left| \sum_{t \in Q} \exp(i\theta_{d,t,b}) \right|}{\#(Q)},$    (3)

where $Q$ is the set of query terms and $\#(Q)$ is the number of query terms. To take this method a step further, zero phase precision ($\bar{\Phi}_{d,b}$) ignores the phases of the components which have zero magnitude, because these phase values do not mean anything. This gives us

$\bar{\Phi}_{d,b} = \frac{\left| \sum_{t \in Q,\, H_{d,t,b} \neq 0} \exp(i\theta_{d,t,b}) \right|}{\#(Q)}.$    (4)

The zero phase precision can be used as a measure of how important the corresponding component (which represents a particular pattern of terms) is. We use this value as a weight on the magnitude values of the same component. We also apply weights to each query term using the selected weighting scheme. By doing this, we obtain a score for each spectral component:

$s_{d,b} = \bar{\Phi}_{d,b} \sum_{t \in Q} w_{q,t} H_{d,t,b}.$    (5)
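To make the phase precision computation concrete, Equations (3)–(5) can be rendered directly in Python. This is an illustrative sketch of our own, not the paper's code; the variable names and example values are assumptions.

```python
import cmath

def zero_phase_precision(components):
    """Eq. (4): average the unit vectors of the phases of the nonzero
    spectral components; `components` holds one value per query term."""
    unit_sum = sum(cmath.exp(1j * cmath.phase(z))
                   for z in components if abs(z) > 0)
    return abs(unit_sum) / len(components)

def component_score(components, weights):
    """Eq. (5): zero phase precision times the weighted sum of magnitudes."""
    zpp = zero_phase_precision(components)
    return zpp * sum(w * abs(z) for w, z in zip(weights, components))

# Two query terms whose spectral components share a phase reinforce each
# other (precision 1); opposite phases cancel (precision near 0).
aligned = [2 + 0j, 3 + 0j]
opposed = [2 + 0j, -3 + 0j]
print(round(zero_phase_precision(aligned), 6))  # → 1.0
print(round(zero_phase_precision(opposed), 6))  # → 0.0
print(component_score(aligned, [1.0, 1.0]))     # → 5.0
```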
Just as we selected the document weighting scheme, the query weighting scheme is also a matter of preference. The query weighting schemes we tried were

— BD-ACI-BCA: $w_{q,t} = (1 + \log f_{q,t}) \log(1 + f_m/f_t)$,
— AB-AFD-BAA: $w_{q,t} = \log(1 + N/f_t)$,
— BI-ACI-BCA: $w_{q,t} = (1 + \log f_{q,t})(1 - \log_N n_t)$,
— Lnu.ltu (SMART): $w_{q,t} = (1 + \log f_{q,t}) \log(N/f_t)$,

where $f_{q,t}$ and $w_{q,t}$ are the occurrence count and weight of term $t$ in query $q$, respectively, $f_t$ is the number of documents term $t$ appears in, $f_m$ is the largest $f_t$ over all $t$, $N$ is the number of documents, and $n_t$ is a noise measure of term $t$. Each of the weighting schemes was chosen due to its high precision for a particular query type [Zobel and Moffat 1998]. Experiments have shown that AB-AFD-BAA achieves high precision for short (1–10 term) queries, BI-ACI-BCA achieves high precision for long (about 80 term) queries, and BD-ACI-BCA achieves high precision for both long and short queries [Zobel and Moffat 1998]. The Lnu.ltu method from SMART [Buckley et al. 1995] was chosen because it is a well-known method which has produced excellent results at TREC [Buckley et al. 1995].

To obtain the spectral document score, we combine the score components ($s_{d,b}$) using the norm function

$S_d = \|\tilde{s}_d\|_p,$    (6)

where $\tilde{s}_d = [\, s_{d,0} \;\; s_{d,1} \;\; \cdots \;\; s_{d,B-1} \,]$, and $\|\tilde{s}_d\|_p$ is the $l_p$ norm, given by

$\|\tilde{s}_d\|_p = \left( \sum_{b=0}^{B-1} |s_{d,b}|^p \right)^{1/p}.$    (7)
We can see that, by increasing $p$, the dominant score components will have more effect on the score. In our experiments, we will examine $S_d$ for $p = 1$ and $p = 2$.

2.5 Generalization of Vector Space Model
If we examine our retrieval model when $B = 1$ (implying only one signal component is needed for each term), we can show that our model behaves in the same manner as the vector space retrieval methods. We must first note that the transform of a single-element signal obtained using any linear transform which generates orthogonal spectral values (such as the Fourier, cosine, and certain wavelet transforms) will be proportional to the original value. Therefore, if we call our transform $Tr$, we have

$Tr(w_{d,t,0}) = \alpha w_{d,t,0}.$    (8)

The transformed value is real, so the phase is zero. From this we can deduce that the zero phase precision will be 1 (since all query terms will have zero phase). The transformed value being real also implies that the magnitude of
the value is the value itself; therefore $H_{d,t,0} = w_{d,t,0}$. By substituting these values into our score component Equation (5), we obtain

$s_{d,0} = \sum_{t \in Q} w_{q,t} w_{d,t,0}.$    (9)

By choosing the $l_1$ norm to calculate the document score, we have

$S_d = \sum_{t \in Q} w_{q,t} w_{d,t,0},$    (10)

which is equivalent to the vector space retrieval model.

3. MULTIRESOLUTION ANALYSIS
The Fourier transform (FT) [Proakis and Manolakis 1996] and the sinusoidal family of unitary transforms [Jain 1979] enable us to analyze our information as a whole. When given a signal, the Fourier transform will provide information about every frequency component that exists in the signal. This type of analysis is sufficient for stationary signals, but does not allow us to examine properties of transient signals. For example, we might want to find the point in time where a signal contains certain frequency components, or, for a spatial signal, we might want to find a position in space where certain frequency components exist. Our previous work in the field of text retrieval [Park et al. 2001, 2002a, 2002b, 2004, 2005] gave us insight into how we can use the Fourier and cosine transforms to easily combine the term magnitude and phase information into the document score and obtain high-precision results. But, as stated above, the Fourier transform provides frequency information for the whole document; hence we were not able to focus on important portions of the document. In this section, we will explain the limitations of the Fourier transform and show how we can utilize both the frequency and positional information using the wavelet transform.

3.1 Fourier Decomposition
If we wish to represent a discrete signal $f[t]$ as a Fourier series, we must find the coefficients $F[k]$ which satisfy the following formula:

$f[t] = \sum_{k=0}^{T-1} F[k] \exp(2\pi i k t / T).$    (11)
By doing so, we are able to show our signal $f[t]$ as a linear combination of the sinusoidal waves $\exp(2\pi i k t / T)$ (patterns), where $k$ is the frequency of the wave, $i = \sqrt{-1}$, and $T$ is the length of our signal. The coefficient $F[k]$ is the calculated amplitude of the wave of frequency $k$. To calculate the values of the frequency coefficients (or frequency components), we can use the discrete Fourier transform

$F[k] = \sum_{t=0}^{T-1} f[t] \exp(-2\pi i k t / T).$    (12)
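The sign convention in Eq. (12) is the same one used by common FFT routines, so the spectrum of a term signal can be checked numerically. The sketch below uses numpy and is illustrative only; the signal is the eight-component example used later in Section 3.4.

```python
import numpy as np

# Term signal f[t]: counts of a term in T = 8 document segments.
f = np.array([2, 0, 0, 1, 1, 1, 0, 0], dtype=float)

# np.fft.fft computes exactly Eq. (12): F[k] = sum_t f[t] exp(-2πikt/T).
F = np.fft.fft(f)

# F[0] is the sum of the signal: the total term count, with no position info.
print(F[0].real)  # → 5.0
# The magnitude and phase of each component, as used by the retrieval model.
H, theta = np.abs(F), np.angle(F)
```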
This was used in our earlier experiments as our spectral transform. By using the Fourier transform, we scored documents higher if they had a cyclic positional pattern of query terms. The Fourier transform examines the signal as a whole; therefore, for a coefficient (the weight of a signal pattern, in this case the sinusoidal wave $\exp(2\pi i k t / T)$) to have a high weight, the pattern it represents must occur throughout the whole signal. The Fourier transform provides detailed frequency information, but we cannot pinpoint where in the original signal certain frequencies come from.

3.2 Short Time Fourier Transform
To focus on certain spans of a signal, the short time Fourier transform (STFT, also known as the windowed Fourier transform) [Mallat 2001] was developed. The STFT is similar to the FT in that it converts a signal to its frequency spectrum, but it windows the signal before doing so. By selecting many different intervals of the signal, the STFT allows the analyst to observe frequency components local to a certain time or position in the signal. The STFT has the form

$Sf[u,k] = \sum_{t=0}^{T-1} f[t]\, g[t-u] \exp(-2\pi i k t / T),$    (13)
where $g[t-u]$ is the window function centered at $u$. By having a set window size, the pattern used in our analysis is smaller than the signal. Smaller patterns imply that the pattern analysis becomes local to a specific point in the document.

A cyclic signal contains frequency components which exist for all times. If our signals were of this form, we would not care about time resolution, but we would want high frequency resolution. Therefore, an STFT with a large window size, or the Fourier transform itself, would suffice to obtain the frequency information. If we wanted to find where certain impulses existed in a transient signal, the Fourier transform would fail: all it could tell us is that the impulses were somewhere in the signal (global pattern analysis); it could not specify at which times they occurred. In this case, we would use an STFT with a small window size (local pattern analysis). The small window size implies high time resolution but low frequency resolution (due to the small span of time selected). By using the STFT, we could tell where in time the impulses occurred, but we would only be able to notice high-frequency changes. Any slow changes to the signal would not be noticed in the frequency analysis, due to the windowing.

3.3 Wavelet Transform
We have seen that the STFT can be used to find time and frequency information at a set resolution, and that the resolution must be set in order to extract certain time-frequency information from a signal. If we want to examine a signal across different time-frequency resolution scales, we must look into multiresolution analysis (MRA). MRA allows for high time resolution with poor frequency resolution at high frequencies, and poor time resolution with high frequency resolution at low frequencies.
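The window tradeoff described above can be made concrete by evaluating Eq. (13) directly. This is an illustrative sketch; the rectangular window and the impulse signal are our assumptions.

```python
import cmath

def stft(f, u, k, half_width=2):
    """Eq. (13) with a rectangular window g[t - u] of the given half-width."""
    T = len(f)
    return sum(f[t] * cmath.exp(-2j * cmath.pi * k * t / T)
               for t in range(T) if abs(t - u) < half_width)

# An impulse at t = 6: a full-length Fourier transform reports its presence
# but not its location; a narrow window localizes it in time.
f = [0, 0, 0, 0, 0, 0, 1, 0]
print(abs(stft(f, u=1, k=0)))  # → 0.0 (the window misses the impulse)
print(abs(stft(f, u=6, k=0)))  # → 1.0 (the window covers the impulse)
```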
To perform this task, a wavelet [Mallat 2001] is used instead of a windowing function. The name wavelet means a small wave, a wave which is not of infinite length (unlike a sinusoidal wave). A wavelet is described by a function $\psi \in L^2(\mathbb{R})$ (where $L^2(\mathbb{R})$ is the set of functions $f(t)$ which satisfy $\int |f(t)|^2\, dt < \infty$) with a zero average and a norm of 1. A wavelet can be scaled and translated by adjusting the parameters $s$ and $u$, respectively:

$\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t-u}{s}\right).$    (14)

The scaling factor $1/\sqrt{s}$ keeps the norm equal to one for all $s$ and $u$. The wavelet transform of $f \in L^2(\mathbb{R})$ at time $u$ and scale $s$ is

$W(u,s) = \langle f, \psi_{u,s} \rangle = \int_{-\infty}^{+\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi^*\!\left(\frac{t-u}{s}\right) dt,$    (15)

where $\psi^*$ is the complex conjugate of $\psi$.

The appeal of the wavelet transform is its ability to focus on regions of the signal. If we were to compare two or more signals for similarity, we could examine the signals top down; when the signals differ at a certain level, we know that we do not have to delve any deeper. This is an exciting property which can be used in information retrieval. If we consider a word signal, the wavelet transform of this signal will give us the location of the words at any desired resolution. For example, the first wavelet component will tell us whether the word exists in the document. The second will tell us where the main cluster of the word is in the document. The third and fourth will identify the general areas where the word appears in the first and second halves of the document, respectively, and so on, until we reach the exact location of the word.

To construct a wavelet function $\psi_{u,s}(t)$, we must first obtain a scaling function $\phi_{u,s}(t) \in V_n$. One of the properties that the scaling function must satisfy is that $\cdots \subset V_{n+1} \subset V_n \subset V_{n-1} \subset \cdots$, where the set of $\phi_{u,s}(t)$ for all $u$ is a basis of $V_n$ ($s = 2^n$ for dyadic scaling), and $\bigcup_{n \in \mathbb{Z}} V_n = L^2(\mathbb{R})$. This implies that we can show each set of scaling functions in terms of its subset of scaled scaling functions:

$V_{n-1} = V_n \cup W_{n-1}, \qquad V_n \perp W_{n-1},$    (16)

where $\perp$ implies orthogonality. An example of the relationship between $V_n$ and $W_n$ can be seen in Figure 3. If we observe the sets of functions $W_n$, we can see that they satisfy the following properties:

$\bigcup_{n \in \mathbb{Z}} W_n = L^2(\mathbb{R}), \qquad \bigcap_{n \in \mathbb{Z}} W_n = \emptyset.$    (17)

Therefore the set of functions $W_n$ for all $n$ is a basis for $L^2(\mathbb{R})$. This set $W_n$ is the set of shifted wavelet functions at resolution $n$.
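The defining properties of a wavelet (zero average and unit norm) and the normalization in Eq. (14) can be checked numerically for the Haar wavelet. This is an illustrative sketch; the grid step and the (u, s) pairs are arbitrary choices.

```python
import math

def psi(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def psi_us(t, u, s):
    """Eq. (14): the translated and dilated wavelet psi((t - u)/s) / sqrt(s)."""
    return psi((t - u) / s) / math.sqrt(s)

# Riemann-sum integration over a grid covering each wavelet's support.
dt = 0.001
grid = [i * dt for i in range(-1000, 5000)]  # t in [-1, 5)
for u, s in [(0.0, 1.0), (1.0, 2.0), (0.5, 4.0)]:
    avg = sum(psi_us(t, u, s) for t in grid) * dt
    norm = sum(psi_us(t, u, s) ** 2 for t in grid) * dt
    print(round(abs(avg), 6), round(norm, 6))  # → 0.0 1.0 for every (u, s)
```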
Fig. 3. An example of the enclosing scaling function (φu,s (t) ∈ Vn ) spaces as ovals and the wavelet function spaces (ψu,s (t) ∈ Wn ) as annuli.
Fig. 4. The high-pass filter (H) applies the wavelet transform ψu,s (t) to our data for a specific scale s; the low-pass filter (G) applies the scaling function φu,s (t) to extract all of the information not accounted for. Each result is decimated without loss of information.
3.4 Discrete Wavelet Transform
We have seen that the wavelet transform divides $L^2(\mathbb{R})$ into the sets $\cdots \subset V_{n+1} \subset V_n \subset V_{n-1} \subset \cdots$, where $V_n = W_n \cup V_{n+1}$ and $W_n \cap V_{n+1} = \emptyset$ (so that $W_n$ is the part of $V_n$ not in $V_{n+1}$). This is a recursive filtering process, where each resolution of scaling functions ($\phi_{u,s}(t) \in V_n$) is split into the next resolution of wavelet functions ($\psi_{u,s}(t) \in W_n$) and the next resolution of scaling functions ($\phi_{u,2s}(t) \in V_{n+1}$). In its discrete form, the dyadic wavelet transform can be expressed as a sequence of high-pass¹ and low-pass² filters, where the coefficients of the high-pass filter describe the wavelet function and the coefficients of the low-pass filter describe the scaling function (shown in Figure 4). The output of the high-pass filter (the wavelet components) forms part of the resulting transform coefficients, while the output of the low-pass filter (the coefficients of the scaling function decomposition) is fed back into another pair of high- and low-pass filters to be split again, as shown in Figure 5. Until the mid 1980s, no filter could provide perfect reconstruction of the decomposed signal; therefore there was no way this idea could be applied in practice. The conjugate mirror filter (CMF) [Vetterli 1986; Vetterli and Herley 1992] provided a means to build these wavelet filter banks, and provided the spark for the wavelet community.

¹A high-pass filter allows high-frequency data through (the upper Fourier transform components).
²A low-pass filter allows low-frequency data through (the lower Fourier transform components).

By performing this recursive filtering process,
Fig. 5. Recursive filtering process. H is a high-pass filter (using the wavelet coefficients) and G is a low-pass filter (using the scaling function coefficients). The low-pass data gets split and decimated repeatedly to obtain its wavelet transform. An example of this is shown in process (19).
we are able to complete the transform in linear time, which is faster than the Fourier transform. We will now provide a simple example of how we can use the transform to provide us with different levels of resolution of a signal.

The Haar wavelet is equivalent to one cycle of a square wave. To perform the wavelet transform, we take every possible scaled and shifted version of the wavelet and find how much of each wavelet is within our signal (by taking the dot product). For example, if our signal is $\tilde{f}_{d,t} = [\,2\;0\;0\;1\;1\;1\;0\;0\,]$ and we use the Haar wavelet transform

$W = \begin{bmatrix}
1/\sqrt{8} & 1/\sqrt{8} & 1/\sqrt{8} & 1/\sqrt{8} & 1/\sqrt{8} & 1/\sqrt{8} & 1/\sqrt{8} & 1/\sqrt{8} \\
1/\sqrt{8} & 1/\sqrt{8} & 1/\sqrt{8} & 1/\sqrt{8} & -1/\sqrt{8} & -1/\sqrt{8} & -1/\sqrt{8} & -1/\sqrt{8} \\
1/\sqrt{4} & 1/\sqrt{4} & -1/\sqrt{4} & -1/\sqrt{4} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/\sqrt{4} & 1/\sqrt{4} & -1/\sqrt{4} & -1/\sqrt{4} \\
1/\sqrt{2} & -1/\sqrt{2} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1/\sqrt{2} & -1/\sqrt{2}
\end{bmatrix},$

the wavelet components will be

$W \tilde{f}_{d,t}' = [\, 5/\sqrt{8} \;\; 1/\sqrt{8} \;\; 1/\sqrt{4} \;\; 2/\sqrt{4} \;\; 2/\sqrt{2} \;\; -1/\sqrt{2} \;\; 0 \;\; 0 \,]',$    (18)

where $\tilde{x}'$ is $\tilde{x}$ transposed for any vector $\tilde{x}$. This transformed signal shows us the positions of the terms at many resolutions. The first component ($5/\sqrt{8}$) shows that there are five occurrences of the term. The second component ($1/\sqrt{8}$) shows that there is one more occurrence of the term in the first half of the signal than in the second half. The third component shows that there is one more occurrence of the term in the first quarter than in the second quarter. The fourth component compares the third and fourth quarters. The next four components compare the eighths of the signal. Therefore, we can observe the signal at different levels of resolution by noting certain components of the transformed signal (as shown in Table I):

$[\, 5/\sqrt{8} \,][\, 1/\sqrt{8} \,][\, 1/\sqrt{4} \;\; 2/\sqrt{4} \,][\, 2/\sqrt{2} \;\; -1/\sqrt{2} \;\; 0 \;\; 0 \,].$

If each component of the original signal represented a portion of a document, we could use the wavelet transform to analyze the query term positions at
Table I. The Spectral Coefficients Produced from the Haar Wavelet Transform in (18). (The first component contains no signal position information (low resolution), while the last four components each compare two side-by-side components (high resolution).)

Transformed Value    Description
5/√8                 Sum of signal
1/√8                 First half − second half of the signal
1/√4                 First quarter − second quarter of the signal
2/√4                 Third quarter − fourth quarter of the signal
2/√2                 First eighth − second eighth of the signal
−1/√2                Third eighth − fourth eighth of the signal
0                    Fifth eighth − sixth eighth of the signal
0                    Seventh eighth − eighth eighth of the signal
multiple document resolutions. If we chose only the first wavelet component, we would treat the document without spatial information (as in the vector space methods). The matrix multiplication causes this transformation to be of order O(N 2 ) for signals of N elements. To speed up this process, we must use the wavelets scaling function as well as the wavelet function. Applying the scaling function to our signal allows us to extract all of the information orthogonal to the wavelet of the current resolution and also adjusts the signal to the next level of resolution. Therefore, we can obtain the wavelet transform by following a simple recursive process: —set input signal (x); ˜ — Initialize output elements ( y˜ = ∅); — Initialize counter (n = #(x)); ˜ —while n = 1: (1) apply wavelet function to signal, decimate by factor of 2, and store in wavelet signal ( y˜ n/2,n−1 = D2 (H(x))); ˜ (2) apply scaling function to signal, decimate by factor of 2, and use as new input signal (x˜ = D2 (G(x))); ˜ (3) half counter (n = n/2); — assign the remaining input element to zeroth element of wavelet signal y˜ 0,0 = x. ˜ In the above, #() provides the number of elements in the signal, D2 () is the decimating function, y˜ n/2,n−1 is elements n/2 to n − 1 of signal y˜ . and the Haar We will show an example of this using the same f˜d ,t as before√ √ wavelet which has wavelet coefficients (high-pass filter) of [1/ 2 − 1/ 2] √ √ and scaling function (low-pass filter) [1/ 2 1/ 2]. The first iteration of the scaling function application is a convolution between the scaling function and the signal; this can also be thought of as the dot product of the many shifted versions of the scaling function at the first resolution with the signal. Therefore, we have √ √ √ f d ,t · [ 1/ 2 1/ 2 0 0 0 0 0 0 ] = 2/ 2, √ √ √ f d ,t · [ 0 0 1/ 2 1/ 2 0 0 0 0 ] = 1/ 2, ACM Transactions on Information Systems, Vol. 23, No. 3, July 2005.
280 • L. A. F. Park et al.
    f̃_{d,t} · [ 0  0  0  0  1/√2  1/√2  0  0 ] = 2/√2,
    f̃_{d,t} · [ 0  0  0  0  0  0  1/√2  1/√2 ] = 0.

The convolution of the Haar scaling function and the signal f̃_{d,t} produces

    [ 2/√2  1/√2  2/√2  0 ].

By performing the same operation with the wavelet function, we receive

    f̃_{d,t} · [ 1/√2  -1/√2  0  0  0  0  0  0 ] = 2/√2,
    f̃_{d,t} · [ 0  0  1/√2  -1/√2  0  0  0  0 ] = -1/√2,
    f̃_{d,t} · [ 0  0  0  0  1/√2  -1/√2  0  0 ] = 0,
    f̃_{d,t} · [ 0  0  0  0  0  0  1/√2  -1/√2 ] = 0.

The convolution of the Haar wavelet function and the signal f̃_{d,t} produces

    [ 2/√2  -1/√2  0  0 ].

These results are concatenated to produce the first iteration of the wavelet transform (shown in process (19)). The scaling function result ([ 2/√2  1/√2  2/√2  0 ]) is passed on to the second iteration, and the wavelet result is kept as part of the answer. The second iteration involves convolving the scaling function result ([ 2/√2  1/√2  2/√2  0 ]) with the scaling function:

    [ 2/√2  1/√2  2/√2  0 ] · [ 1/√2  1/√2  0  0 ] = 3/√4,
    [ 2/√2  1/√2  2/√2  0 ] · [ 0  0  1/√2  1/√2 ] = 2/√4.

It also involves convolving the scaling function result with the wavelet function:

    [ 2/√2  1/√2  2/√2  0 ] · [ 1/√2  -1/√2  0  0 ] = 1/√4,
    [ 2/√2  1/√2  2/√2  0 ] · [ 0  0  1/√2  -1/√2 ] = 2/√4.

We keep the result from the wavelet convolution and pass the scaling function convolution to the next iteration. We can see that the portion obtained from the wavelet function convolution is kept as a piece of the transform result, but the portion obtained from the scaling function is fed back into the system to be used again. The complete wavelet transform process is

    { 2  0  0  1  1  1  0  0 }
      ⇒ { 2/√2  1/√2  2/√2  0 } ( 2/√2  -1/√2  0  0 )
      ⇒ { 3/√4  2/√4 } ( 1/√4  2/√4 ) ( 2/√2  -1/√2  0  0 )
      ⇒ { 5/√8 } ( 1/√8 ) ( 1/√4  2/√4 ) ( 2/√2  -1/√2  0  0 )
      = [ 5/√8  1/√8  1/√4  2/√4  2/√2  -1/√2  0  0 ].        (19)
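The recursive splitting in process (19) can be sketched as a short routine (a minimal illustration; the function name `haar_dwt` is ours). It keeps the high-pass (wavelet) output at each level, feeds the low-pass (scaling) output back in, and reproduces the components of Equation (18):

```python
import math

def haar_dwt(x):
    """Dyadic Haar transform via the recursive filter decomposition."""
    s = 1 / math.sqrt(2)
    out = [0.0] * len(x)
    n = len(x)
    while n > 1:
        # convolve with the filters and decimate by a factor of 2
        detail = [(x[2 * i] - x[2 * i + 1]) * s for i in range(n // 2)]  # wavelet (high-pass)
        approx = [(x[2 * i] + x[2 * i + 1]) * s for i in range(n // 2)]  # scaling (low-pass)
        out[n // 2:n] = detail  # keep the wavelet part
        x = approx              # feed the scaling part back in
        n //= 2
    out[0] = x[0]
    return out

print(haar_dwt([2, 0, 0, 1, 1, 1, 0, 0]))
# approximately [5/sqrt(8), 1/sqrt(8), 1/sqrt(4), 2/sqrt(4), 2/sqrt(2), -1/sqrt(2), 0, 0]
```

Because only the halved low-pass signal is processed again, the total work is B + B/2 + B/4 + ... < 2B operations, which is the O(N) behavior claimed in the text.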
We have shown the result from the scaling function convolution in braces {} and the result from the wavelet function convolution in parentheses (). The values produced by the shifted wavelet function (wavelet convolution) are kept during each iteration. The values produced by the scaling function are fed back into the recursive splitting process. We can see that this process is a divide-and-conquer application of the wavelet transform. We can collapse the recursive process into a single pass over the data to produce the transformed data and hence reduce the wavelet transform to order O(N). By using the wavelet and scaling functions together, we do not need to scale the wavelet function (only shifts are performed); the data is inversely scaled by the scaling function to allow for this. For a more rigorous proof, see Mallat [2001].

4. CHOOSING A WAVELET

Before applying a wavelet transform to a data set, we must first choose a wavelet from the many varieties that exist. Some well-known wavelets come under the titles of Daubechies [Daubechies 1988], Shannon, Battle-Lemarié, Meyer, Symmlets, Spline, and Biorthogonal [Mallat 2001]. Before we can choose one, it is necessary to understand the properties that define each wavelet.

4.1 Wavelet Properties

The two main factors which will influence our choice of wavelet are the number of vanishing moments and the size of support. Both of these must be considered to extract the most information in the least space from the data set which we are using.

4.1.1 Vanishing Moments. When storing or transmitting a signal, it is best if we can do so by using the least amount of storage or the least amount of bandwidth. In the document retrieval domain, we want to store our data in the smallest space possible (which is also simple to access and retrieve). When compressing a signal, whether it be for storage or transmission, we want to fit as much information as possible into the smallest possible space.
Mapping data into a transformed space is a simple method of compression because the transform algorithm can be used at the compression and decompression sides without any statistical knowledge of the data. Optimal compression algorithms encode the frequent data in the smallest number of symbols. Therefore, we want a set of orthogonal wavelet basis functions which is best suited to most of the signals we want to compress. Signal compressibility is measured in terms of the vanishing moments of the wavelet function. The kth moment of a function f(t) is defined as

    ν_k = ∫_{-∞}^{∞} t^k f(t) dt.        (20)

Therefore, for the kth moment to vanish, Equation (20) must equal zero. A wavelet is said to have n vanishing moments if Equation (20) is zero for 0 ≤ k < n. We will show that wavelets with a higher number of vanishing moments are able to represent smooth functions in a more compact manner.
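For instance, the Haar wavelet (ψ(t) = 1 on [0, 1/2), -1 on [1/2, 1)) has exactly one vanishing moment, which a numerical integration of Equation (20) confirms (a sketch; the helper names are ours):

```python
def haar_psi(t):
    # the Haar wavelet on its support [0, 1)
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def moment(k, n=20000):
    # midpoint-rule approximation of Equation (20) over the support [0, 1)
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** k * haar_psi((i + 0.5) * h) * h for i in range(n))

print(moment(0))  # ~0: the zeroth moment vanishes
print(moment(1))  # ~-0.25: the first moment does not, so n = 1 for Haar
```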
Fig. 6. Functions exhibit compact support if they have no infinite interval of non-zero values.
The number of vanishing moments of a wavelet is related to the differentiability of the wavelet. If we consider a signal f which is m times differentiable in the region [v - h, v + h], we can find the Taylor polynomial approximation to f at v:

    p_v(t) = Σ_{k=0}^{m-1} (f^{(k)}(v) / k!) (t - v)^k,        (21)

which has the error ε_v(t) = f(t) - p_v(t), where

    |ε_v(t)| ≤ (|t - v|^m / m!) sup_{u ∈ [v-h, v+h]} |f^{(m)}(u)|    ∀ t ∈ [v - h, v + h].        (22)

The wavelet transform of f is

    W(u, s) = ∫_{-∞}^{∞} f(t) (1/√s) ψ((t - u)/s) dt        (23)
            = (1/√s) ∫_{-∞}^{∞} f(t) ψ(t') dt        (24)
            = (1/√s) ∫_{-∞}^{∞} p_v(t) ψ(t') dt + (1/√s) ∫_{-∞}^{∞} ε_v(t) ψ(t') dt,        (25)

where t' = (t - u)/s and ψ(t) has n vanishing moments. If the polynomial p_v has degree of at most n - 1, we notice that the first term vanishes and we are left with the wavelet transform of the error term:

    W(u, s) = Wε_v(u, s).        (26)
Therefore, the more vanishing moments a wavelet has, the smaller the error term will be. This implies that if we use a wavelet with more vanishing moments, we will produce transformed data in which only a few components will be significant, requiring little storage space.

4.1.2 Size of Support. Another factor that affects the focusing of the wavelet transform is the support size of the wavelet. The support of a function f is the domain in which the function is nonzero [Weisstein 1999]. A function f has compact support if its support is bounded. For example, the square function (shown in Figure 6)

    f_sq(t) = { 1   0 ≤ t ≤ 1,
              { 0   otherwise,        (27)
has support t ∈ [0, 1], which is compact; therefore f_sq has compact support. If we observe the function f_exp(t) = e^t, we see that it has support t ∈ R, which is not compact. Therefore, f_exp does not have compact support. When choosing a wavelet to use for a specific data set, it is essential that we examine the wavelet's size of support. If we examine the wavelet

    ψ_{j,n}(t) = 2^{-j/2} ψ(2^{-j} t - n),        (28)

we notice that, if there exists a singularity of large magnitude at point t_0 in the function f, then ⟨f, ψ_{j,n}⟩ may have a large magnitude. If there are K wavelets whose support includes the point t_0 for each level of scale 2^j, then the wavelet function has support of size K. Therefore, the greater the size of support of the wavelet, the more wavelet components will include the singularity t_0, and therefore the more likely it is that many high-magnitude wavelet coefficients will exist. If we reduce the size of the support of the wavelet, we will have fewer high-magnitude components, and therefore fewer significant components to consider (for storage or transmission) when performing later calculations. Choosing a small size of support is essential in our application of the wavelet transform. If we examine our term signals, we will see that they consist of a few singularities (most term signals contain one singularity). Therefore, the larger the support of the wavelet, the more nonzero components will exist in the transformed data. Our index size will be more compact if we choose a wavelet with a small size of support.

4.2 Selected Wavelets

To analyze the positions of the terms and their relationships with other terms in the document, we must be able to analyze their relationships document-wise (as the low-resolution set of wavelets should do, by including every term signal component in their calculations) and we must also be able to analyze the term positions (as the high-resolution set of wavelets should do, by including only single components). Every wavelet has a lowest level of resolution which includes every element in the set to be transformed, but not every wavelet can focus tightly on the elements wanted. We mentioned before that, if we want to find singularities in a signal and obtain transformed data with the fewest nonzero coefficients, then we must choose a wavelet which has a small support.
The wavelets with the smallest support size are the Haar wavelet and the Daubechies-4 wavelet [Daubechies 1988]. This can be seen from the number of filter coefficients needed to describe each of them (two for Haar and four for Daubechies-4).

4.2.1 The Haar Wavelet. The Haar wavelet [Haar 1910] (shown with its scaling function in Figure 7) has compact support of size 1 but is not continuously differentiable. The compact support of the wavelet implies that the transformed signal will require less storage space than one produced by a wavelet without such compact support. For example, if we take the Fourier transform of an impulse, we get a significant value (not close to zero) for each of the frequency components (therefore, we have to store B values for B components). If we do the same with the Haar wavelet, we obtain B/2 significant values. This example
Fig. 7. The Haar scaling function (φ(t)) and wavelet (ψ(t)).
is a very common case for word signals; therefore, the size of the index should reduce by more than 50%. Due to the shape of the Haar wavelet, if we apply it to a signal, we can see that we are calculating the difference between the left and right components of the signal. If the resulting inner product of a positive real signal is positive, we can deduce that the signal is biased to the left. If the inner product is negative, then the signal is biased to the right. This also holds for the positions of the signal when examining the focused components.

It is useful to recall that, when using the Fourier [Park et al. 2004] and cosine transforms [Park et al. 2002a], we obtained the word signal and mapped it to the corresponding domain using the transform. By obtaining the spectral information, we were able to compare signals and find their positions relative to each other using the phase of the spectrum, and we were able to identify the frequencies of the terms by examining the magnitude of the signal. The Fourier transform was initially chosen because of its ability to map shifts in a signal to a phase change. The cosine transform was chosen for its partial ability to map shifts to phase changes and because it produces real values (unlike the Fourier transform, which produces complex values), which require less storage (and hence give a smaller, faster spectral index).

If we examine the Haar transform, we will notice that it deals with shifts in its own unique way. For example, if we compare the transform of a signal with a value of 1 at the first element and zeros elsewhere, and the transform of a signal with a value of 1 at the second element and zeros elsewhere (see Table II), we observe that the only difference between these two signals is the fifth wavelet component (which changes its sign). We notice from Table II that the same goes for impulses at components three and four, five and six, and seven and eight. If we examine the shift across a factor-of-2 boundary (e.g., compare positions 2 and 3, or 4 and 5), we see more
Table II. Haar Wavelet Decomposition of Signal with Impulse at Term Component Position

    Term Signal           Haar Wavelet Transform of Term Signal
    (1 0 0 0 0 0 0 0)     [ 0.35   0.35   0.50   0      0.71   0      0      0    ]
    (0 1 0 0 0 0 0 0)     [ 0.35   0.35   0.50   0     -0.71   0      0      0    ]
    (0 0 1 0 0 0 0 0)     [ 0.35   0.35  -0.50   0      0      0.71   0      0    ]
    (0 0 0 1 0 0 0 0)     [ 0.35   0.35  -0.50   0      0     -0.71   0      0    ]
    (0 0 0 0 1 0 0 0)     [ 0.35  -0.35   0      0.50   0      0      0.71   0    ]
    (0 0 0 0 0 1 0 0)     [ 0.35  -0.35   0      0.50   0      0     -0.71   0    ]
    (0 0 0 0 0 0 1 0)     [ 0.35  -0.35   0     -0.50   0      0      0      0.71 ]
    (0 0 0 0 0 0 0 1)     [ 0.35  -0.35   0     -0.50   0      0      0     -0.71 ]
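Table II can be regenerated by transforming each impulse signal (a sketch using the recursive Haar decomposition of Section 3.4; the function name is ours):

```python
import math

def haar_dwt(x):
    # recursive Haar filter decomposition: keep high-pass, recurse on low-pass
    s = 1 / math.sqrt(2)
    out = [0.0] * len(x)
    n = len(x)
    while n > 1:
        out[n // 2:n] = [(x[2 * i] - x[2 * i + 1]) * s for i in range(n // 2)]
        x = [(x[2 * i] + x[2 * i + 1]) * s for i in range(n // 2)]
        n //= 2
    out[0] = x[0]
    return out

rows = [haar_dwt([1 if i == pos else 0 for i in range(8)]) for pos in range(8)]
for row in rows:
    print([round(c, 2) for c in row])  # matches the rows of Table II
```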
than one coefficient change. This is due to the decomposition of the discrete dyadic wavelet transform and the support of the Haar wavelet. The last four wavelet coefficients show small changes in the signal position, the third and fourth wavelet coefficients display larger changes in the signal position, and the second component shows even greater changes in the signal position (the first coefficient is related to the sum of the signal and is not affected by the signal position). From this, we can see that, if we take only the first 1, 2, or 4 coefficients, we will be observing the function at a different resolution. Wavelets give us a new perspective by extracting the positions at different resolutions.

4.2.2 Daubechies Wavelets. A short time after the introduction of conjugate mirror filters, the conditions on the filters were found to be identical to those of orthogonal wavelets. Ingrid Daubechies [Daubechies 1988] took advantage of this relationship and found that, for a wavelet (filter) to have p vanishing moments, it must have a support of at least 2p. This theorem set a lower limit on the number of filter coefficients needed to perform a wavelet decomposition of a certain smoothness. From this knowledge, she was also able to derive a set of wavelets which have this minimum support for a given number of vanishing moments; these are called Daubechies wavelets. It is interesting to note that the wavelet derived with minimum support when given only one vanishing moment is in fact the Haar wavelet. The wavelet we will be examining is the Daubechies-4 wavelet (so called because it has four filter coefficients in each of its high-pass and low-pass filters, and two vanishing moments), shown in Figure 8.

5. COMPUTATIONAL COMPLEXITY AND STORAGE

To thoroughly examine a retrieval method, we must not only examine the precision of results, but also the computational and storage requirements.
5.1 Computational Complexity

If we note how the discrete wavelet transform can be performed using a recursive filter decomposition (shown in Section 3.4), we can easily see that the time taken to perform the discrete dyadic wavelet transform is linear with respect to the number of discrete elements chosen to transform [Mallat 2001].
Fig. 8. The Daubechies-4 scaling function (φ(t)) and wavelet (ψ(t)).
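The two vanishing moments of the Daubechies-4 wavelet in Figure 8 can be checked from its filters: each vanishing moment corresponds to a vanishing discrete moment of the high-pass filter. This is a sketch assuming the standard D4 coefficient values (not quoted in the text):

```python
import math

s2, s3 = math.sqrt(2), math.sqrt(3)
# Daubechies-4 low-pass (scaling) filter ...
h = [(1 + s3) / (4 * s2), (3 + s3) / (4 * s2),
     (3 - s3) / (4 * s2), (1 - s3) / (4 * s2)]
# ... and its conjugate mirror high-pass (wavelet) filter g[k] = (-1)^k h[3-k]
g = [(-1) ** k * h[3 - k] for k in range(4)]

moment0 = sum(g)                                 # ~0: zeroth moment vanishes
moment1 = sum(k * gk for k, gk in enumerate(g))  # ~0: first moment vanishes too
print(moment0, moment1)
```

Four filter coefficients (support 4 = 2p) thus buy p = 2 vanishing moments, the minimum support permitted by Daubechies' theorem.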
Therefore, if we choose B components for our term signal vectors, the wavelet transform will be of order O(B), while the FFT is of order O(B log B) [Proakis and Manolakis 1996]. This is performed for each query term. In the following lists, we will represent the number of documents in the document set as N, the number of query terms as τ, and the number of components as B.

Each vector space method of document retrieval at query time involves the following:

(1) Apply the specific weighting scheme to each query term in each document (O(Nτ)).
(2) Sum each weighted query term in each document to obtain the document scores for each document (O(Nτ)).

Therefore, the overall computational complexity of a vector space method is O(Nτ). The spectral methods require a bit more computation:

(1) Apply the specific weighting scheme to each component of each query term in each document (O(NτB)).
(2) Perform the selected transform on each weighted word signal in each document (O(NτB log B) for the FFT or O(NτB) for the DWT).
(3) Calculate the phase precision for each spectral component across each query term in each document (O(NτB)).
(4) Calculate the magnitude for each spectral component across each query term in each document (O(NτB)).
(5) Calculate the score components by multiplying the magnitude and phase precision of each component in each document (O(NB)).
(6) Find the document score by summing the score components for each document (O(NB)).
Therefore, the overall time complexity of the Fourier transform method is O(NτB log B) and the time complexity of the wavelet method is O(NτB) if the transform is performed at query time. We have taken a simplistic view of both the vector space and spectral methods by including calculations on all documents. In practice, we would precompute spectral values and use accumulation schemes to find the approximate top n documents, and hence only the values associated with the top n documents would need to be processed at query time. A typical query time for our experiments performed later in the article (Section 7.2) is 0.02 s for the vector space method and 0.1 s for the spectral retrieval method using eight components. Reducing the number of components used in the calculations reduces the query time in a linear fashion. At this point, we should remind the reader that the vector space method of information retrieval is a special case of the wavelet information retrieval method where B = 1. Therefore, by using the wavelet method, we have more freedom in how we perform our searches: by choosing alternative B values, we can trade off between speed and accuracy.

5.2 Storage Size

To examine the impact our wavelet method has on storage, we will look at the common case of a document containing one occurrence of a term. In this case, previous methods using the Fourier and cosine transforms would transform a term signal with one nonzero element into a term spectrum with no zero elements. This leaves us with B times as many elements to store when compared to the vector space methods. Using compression techniques such as quantization and cropping [Park et al. 2002b] led us to the conclusion that we were able to retain the high-precision results of the Fourier and cosine retrieval methods with an index of only four times the size of that of the vector space methods.
By using the wavelet transform, we obtain term spectra with B/2 nonzero elements when transforming a term signal with one nonzero element. Therefore, by using the same compression techniques found in the Fourier and cosine methods, we would expect to reduce the index size even further while still retaining the high precision of the wavelet retrieval method. Our experiments showed that, when we stored spatial values (the term signals, consisting of positive integers and many zeros) in the index, we were able to achieve an index size 20% larger than that of the vector space method (74 MB compared to 60 MB). When storing the spectral values (the term spectra, consisting of floating-point values and not as many zeros) in the index, using 6-bit floating-point quantization, the index was 160% larger than the vector space method index (160 MB). In this index, all eight spectral components were stored for each signal. To reduce the index size, we could choose to store only the first two or four spectral components for each term spectrum. By removing components, the index size is reduced linearly, but the precision of the document rankings becomes slightly poorer. Our experiments show this effect on precision.
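The B/2 figure can be illustrated for the B = 8 signals used here: a term signal with a single occurrence transforms to a Haar spectrum with exactly four nonzero components, wherever the occurrence falls (a sketch; the function name is ours):

```python
import math

def haar_dwt(x):
    # recursive Haar filter decomposition: keep high-pass, recurse on low-pass
    s = 1 / math.sqrt(2)
    out = [0.0] * len(x)
    n = len(x)
    while n > 1:
        out[n // 2:n] = [(x[2 * i] - x[2 * i + 1]) * s for i in range(n // 2)]
        x = [(x[2 * i] + x[2 * i + 1]) * s for i in range(n // 2)]
        n //= 2
    out[0] = x[0]
    return out

B = 8
for pos in range(B):
    signal = [1 if i == pos else 0 for i in range(B)]  # one occurrence of a term
    spectrum = haar_dwt(signal)
    nonzero = sum(1 for c in spectrum if abs(c) > 1e-12)
    print(pos, nonzero)  # 4 nonzero components out of 8, i.e. B/2
```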
6. ANALYSIS

Before we present the experimental results of our new method, we first demonstrate that our scoring function satisfies two important basic properties:

— The score must monotonically increase with an increase in the occurrences of a query term.
— The score must monotonically increase as the displacement between two query terms decreases.

We will examine these properties in the following sections with both the Haar and Daubechies wavelets.

6.1 Occurrence Analysis

A desired property of a document score calculation is an increase in score when there is an increase in the number of query terms found in the document. The spectral document ranking method takes into account the number of query terms in a document during the magnitude calculations. If we ignore the phase precision for the moment, we see that the document score is based on the l_p norm of the sum of the query term spectra associated with the document. The wavelet transform is a linear transformation; therefore, it satisfies the following properties:

(1) W(x̃_1 + x̃_2) = W(x̃_1) + W(x̃_2),
(2) W(αx̃) = αW(x̃),

where x̃ is a vector and α is a scalar. This implies that summing the query term signals and performing the wavelet transform on the combined signal will achieve the same result as summing the query term spectra. Therefore, we can examine the effect of one term signal and assume that it is the sum of all of the query term signals. We want to show that the document score for the combined query term signal w̃_{d,t} + x̃ is greater than the document score for the term signal w̃_{d,t}, where x̃ is a vector containing real values greater than or equal to zero.

THEOREM 1. If we increase our term signal w̃_{d,t} to w̃_{d,t} + x̃, where each element x_b ≥ 0, then

    ‖ζ̃_{d,t}‖_2 ≤ ‖ζ̃_{d,t} + ξ̃‖_2,        (29)

where ζ̃_{d,t} and ξ̃ are the wavelet transforms of w̃_{d,t} and x̃, respectively.

PROOF. If we take a single component w_{d,t,b} ∈ w̃_{d,t}, we can show that

    w_{d,t,b} ≤ w_{d,t,b} + x_b,        (30)
    w_{d,t,b}^2 ≤ (w_{d,t,b} + x_b)^2,        (31)
    ‖w̃_{d,t}‖_2 ≤ ‖w̃_{d,t} + x̃‖_2.        (32)

By using Plancherel's theorem [Mallat 2001], we observe that the signal energy is conserved after the wavelet transform has taken place:

    ‖W(w̃_{d,t})‖_2 ≤ ‖W(w̃_{d,t} + x̃)‖_2.        (33)
The linear property of the wavelet transform allows us to split the transform on the right-hand side to obtain

    ‖ζ̃_{d,t}‖_2 ≤ ‖ζ̃_{d,t} + ξ̃‖_2,        (34)

where ζ̃_{d,t} = W(w̃_{d,t}) and ξ̃ = W(x̃).

If we use the l_2 norm in our document score calculations, Theorem 1 shows us that the document score will increase if we increase the number of query terms in the document. To generalize the magnitude score calculation to any natural number p, we will treat the shifted and scaled wavelets as an orthonormal basis ψ̃_b. Therefore, each of the term spectrum coefficients can be written as the inner product of the term signal and one of the wavelet basis vectors (ζ_{d,t,b} = ⟨w̃_{d,t}, ψ̃_b⟩), giving

    ‖ζ̃_{d,t}‖_p = Σ_{b=0}^{B-1} |ζ_{d,t,b}|^p = Σ_{b=0}^{B-1} |⟨w̃_{d,t}, ψ̃_b⟩|^p,        (35)

assuming that the wavelets are in the real domain. If we increase w̃_{d,t} by αδ̃_k (where each element δ_{k,b} = 1 if k = b, and zero otherwise), we achieve

    Σ_{b=0}^{B-1} |⟨w̃_{d,t} + αδ̃_k, ψ̃_b⟩|^p = Σ_{b=0}^{B-1} |⟨w̃_{d,t}, ψ̃_b⟩ + αψ_{b,k}|^p.        (36)

Therefore, we want to show that

    Σ_{b=0}^{B-1} |⟨w̃_{d,t}, ψ̃_b⟩|^p ≤ Σ_{b=0}^{B-1} |⟨w̃_{d,t}, ψ̃_b⟩ + αψ_{b,k}|^p.        (37)

For the p = 1 case, the right-hand side becomes

    Σ_{b=0}^{B-1} |⟨w̃_{d,t}, ψ̃_b⟩ + αψ_{b,k}|.        (38)

This shows that, in the p = 1 case, there is no guarantee that Equation (37) will hold, since it depends on the wavelet set and the original signal. For the p = 2 case, the right-hand side of Equation (37) becomes

    Σ_{b=0}^{B-1} (⟨w̃_{d,t}, ψ̃_b⟩^2 + 2⟨w̃_{d,t}, ψ̃_b⟩αψ_{b,k} + α^2 ψ_{b,k}^2).        (39)

If we split up the summation:

    Σ_{b=0}^{B-1} ⟨w̃_{d,t}, ψ̃_b⟩^2 + 2α Σ_{b=0}^{B-1} ⟨w̃_{d,t}, ψ̃_b⟩ ψ_{b,k} + α^2 Σ_{b=0}^{B-1} ψ_{b,k}^2,        (40)

which simplifies to

    Σ_{b=0}^{B-1} ⟨w̃_{d,t}, ψ̃_b⟩^2 + 2α w_{d,t,k} + α^2        (41)
due to the orthonormal properties of the wavelet basis vectors. We know that w_{d,t,k} and α are positive; therefore, any increase in w_{d,t,k} results in an increase in the magnitude calculation of the document score for p = 2 (which is also shown by Theorem 1). This suggests that we would expect the l_2 norm results to be more precise than the l_1 norm results, a fact supported by our experimental results.

6.2 Proximity Analysis

The spectral document retrieval methods achieve high precision because they combine ideas from the vector space methods, which only analyze the term counts in the documents, and the proximity methods, which take into account the displacement of the query terms from each other. In this section, we will examine how each method reacts to the positions of two different terms. Ideally, each method would give a score which monotonically decreases as the displacement between the terms becomes larger, but we will see that this is not always the case. To conduct the experiment, we chose two query terms (q_m, q_n) which each appear once in a document, at a displacement of b components apart. Each query term has the same term weight and, since they appear in the same document, the same document normalization. Due to the nature of the Haar wavelet decomposition, each wavelet coefficient of q_m will have the same sign as the corresponding wavelet coefficient of q_n if the words appear in the same position. If we normalize our weighted term signals for q_m and q_n so that only ones and zeros occur, we are able to show the effect of each position on the Haar wavelet transform. Analyzing the proximity of terms is not as simple as in the Fourier transform case: we cannot use displacement as a parameter because the wavelets have compact support. To analyze the change of the score when the proximity of the terms changes, we experimented with the Haar and Daubechies-4 methods.
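A rough version of this experiment can be run in code. The scoring details below are our assumptions, reconstructed from the worked example of Section 7.1 (magnitudes summed per component, zero phase precision taken as the absolute sum of the term phase signs over the number of query terms, l_2 combination), so the curve is indicative rather than a reproduction of the paper's figures:

```python
import math

def haar_dwt(x):
    # recursive Haar filter decomposition: keep high-pass, recurse on low-pass
    s = 1 / math.sqrt(2)
    out = [0.0] * len(x)
    n = len(x)
    while n > 1:
        out[n // 2:n] = [(x[2 * i] - x[2 * i + 1]) * s for i in range(n // 2)]
        x = [(x[2 * i] + x[2 * i + 1]) * s for i in range(n // 2)]
        n //= 2
    out[0] = x[0]
    return out

def score(spectra):
    # assumed l2 spectral score: sum over components of (magnitude * precision)^2
    total = 0.0
    for b in range(len(spectra[0])):
        mag = sum(abs(s[b]) for s in spectra)
        signs = [math.copysign(1, s[b]) for s in spectra if abs(s[b]) > 1e-12]
        prec = abs(sum(signs)) / len(spectra)  # assumed zero phase precision
        total += (mag * prec) ** 2
    return total

scores = []
for d in range(1, 8):
    a = [0] * 8; a[0] = 1  # first query term at component 0
    b = [0] * 8; b[d] = 1  # second query term displaced by d
    scores.append(score([haar_dwt(a), haar_dwt(b)]))
print([round(s, 3) for s in scores])
# under these assumptions, the Haar score never increases as the displacement grows
assert all(s1 >= s2 - 1e-9 for s1, s2 in zip(scores, scores[1:]))
```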
Each experiment compared a single term appearing at component position zero with another term whose position was adjusted. The score was then plotted against the displacement of the two terms. We hoped that the score would decrease as the displacement between the two terms grew. The results are shown in Figures 9(a), 9(b), 10(a), and 10(b). The results show that both of the Haar methods have a nice monotonic decrease as the displacement increases, but both of the Daubechies-4 methods do not. We expected problems with the Daubechies-4 wavelet method because the wavelet shape (shown in Figure 8) is not a simple rise and fall: the wavelet decreases, but then increases above zero before returning to zero. This is common behavior among wavelets because of the property that they must integrate to zero. For this reason, we will assume that most other wavelets will behave in the same fashion as the Daubechies-4 wavelet.

7. EXPERIMENTS AND RESULTS

We have performed many extensive experiments using spectral retrieval to show empirically that the wavelet information retrieval method is superior
Fig. 9. Scores derived from the Haar wavelet transform using (a) the sum, and (b) the sum of squares, of the score vector for a document containing two query terms of equal weight. The first term is found in component zero; the second is found at the term bin displacement shown.

Fig. 10. Scores derived from the Daubechies-4 wavelet transform using (a) the sum, and (b) the sum of squares, of the score vector for a document containing two query terms of equal weight. The first term is found in component zero; the second is found at the term bin displacement shown.
to the vector space and proximity methods. But before we discuss our results in depth, we will go through a simple example of how this wavelet document retrieval method works.

7.1 Sample Document Set

We will use the sample data found in Table III and their wavelet transforms found in Table IV. The tables contain three selected terms from three different documents. Each term in each document has a corresponding term signal containing eight elements. Each element represents the occurrences of that term in that portion of the document. For example, the first element of the term signal for laugh in document 1 is 2. This implies that the term laugh occurs
Table III. A Sample Set of Terms with Their Term Signals Within Three Documents

    Terms    Document 1           Document 2           Document 3
    laugh    (2 0 0 1 1 1 0 0)    (0 0 0 1 0 1 0 0)    (0 1 0 0 1 0 0 1)
    diary    (0 0 0 1 0 1 0 0)    (0 1 0 0 1 0 0 0)    (0 1 0 1 0 0 0 0)
    smile    (0 0 0 0 0 0 1 0)    (0 0 0 0 1 0 0 0)    (0 0 0 0 0 1 0 0)
Table IV. The Haar Wavelet Transforms of the Term Signals Found in Table III

    Document 1
    laugh    [ 5/√8   1/√8   1/√4   2/√4   2/√2  -1/√2   0      0    ]
    diary    [ 2/√8   0     -1/√4   1/√4   0     -1/√2  -1/√2   0    ]
    smile    [ 1/√8  -1/√8   0     -1/√4   0      0      0      1/√2 ]

    Document 2
    laugh    [ 2/√8   0     -1/√4   1/√4   0     -1/√2  -1/√2   0    ]
    diary    [ 2/√8   0      1/√4   1/√4  -1/√2   0      1/√2   0    ]
    smile    [ 1/√8  -1/√8   0      1/√4   0      0      1/√2   0    ]

    Document 3
    laugh    [ 3/√8  -1/√8   1/√4   0     -1/√2   0      1/√2  -1/√2 ]
    diary    [ 2/√8   2/√8   0      0     -1/√2  -1/√2   0      0    ]
    smile    [ 1/√8  -1/√8   0      1/√4   0      0     -1/√2   0    ]
twice in the first eighth of document 1. The fourth element contains the value 1, which implies that the term laugh occurs once in the fourth eighth of document 1. Note: we will ignore the initial preweighting stage and process the data as if the weights were unitary. Preweighting is an important part of the retrieval process; it is left out of this example only so that we can focus on the wavelet retrieval process. If we query our database with the terms smile and diary, the system will extract the data for those terms only (we will therefore ignore the data for the term laugh from now on). The system will calculate the score for each document; we will examine the score calculation for the first document in detail. Our wavelet components are

    Terms    Document 1
    diary    [ 2/√8   0     -1/√4   1/√4   0     -1/√2  -1/√2   0    ]
    smile    [ 1/√8  -1/√8   0     -1/√4   0      0      0      1/√2 ]
Take the magnitudes and sum them: Terms diary smile Total
Document 1 Magnitude ( 2 0 1 1 0 1 ( (
√ 8 √1 8 √3 8
√
8 √1 8 √1 8
√
4 √0 4 √1 4
√
4 √1 4 √2 4
√
2 √0 2 √0 2
√
2 √0 2 √1 2
√1 2 √0 2 √1 2
√0 2 √1 2 √1 2
) ) )
Take the phase and find the zero phase precision: Terms diary smile Zero phase precision
Document 1 Phase [ 1 0 −1 1 0 −1 −1 [ 1 −1 0 −1 0 0 0 ( 1 1 1 12 0 0 12 2 2
ACM Transactions on Information Systems, Vol. 23, No. 3, July 2005.
0] 1] 1 2
)
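The calculations above can be reproduced mechanically. The following sketch is our own illustrative Python, not the authors' implementation: `haar_dwt` computes the orthonormal Haar transform ordered as in Table IV (the overall approximation first, then details from the coarsest to the finest level), the phase of a real-valued coefficient is taken to be its sign, and the zero phase precision of a component is the magnitude of the summed phases divided by the number of query terms.

```python
import math

def haar_dwt(signal):
    """Orthonormal Haar wavelet transform of a length-2^k term signal.

    Coefficients are ordered as in Table IV: the overall approximation
    first, then detail coefficients from the coarsest to the finest level.
    """
    approx = list(signal)
    detail_levels = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # (a - b)/sqrt(2) is the detail, (a + b)/sqrt(2) the approximation
        detail_levels.insert(0, [(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + [c for level in detail_levels for c in level]

def sign(x):
    """Phase of a real-valued wavelet coefficient."""
    return (x > 0) - (x < 0)

# Document 1 term signals for the two query terms (Table III)
diary = haar_dwt([0, 0, 0, 1, 0, 1, 0, 0])
smile = haar_dwt([0, 0, 0, 0, 0, 0, 1, 0])

# Summed magnitudes and zero phase precision, component by component
magnitude = [abs(d) + abs(s) for d, s in zip(diary, smile)]
precision = [abs(sign(d) + sign(s)) / 2 for d, s in zip(diary, smile)]
# precision matches the table above: (1, 1/2, 1/2, 0, 0, 1/2, 1/2, 1/2)
```

Running the transform on the diary signal of document 1 reproduces its row of Table IV, (2/√8, 0, −1/√4, 1/√4, 0, −1/√2, −1/√2, 0).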
Combine the magnitude and phase:

Document 1 magnitude              (3/√8, 1/√8, 1/√4, 2/√4, 0/√2, 1/√2, 1/√2, 1/√2)
Document 1 zero phase precision   (1, 1/2, 1/2, 0, 0, 1/2, 1/2, 1/2)
Document 1 score vector           (3/√8, 1/(2√8), 1/(2√4), 0, 0, 1/(2√2), 1/(2√2), 1/(2√2))
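The final combination step admits either norm listed later in Table V. A small illustrative helper (our own code, with names of our choosing) applied to the document 1 score vector above:

```python
import math

def combine(score_vector, norm="l2", n_components=8):
    """Collapse the first n components of a score vector into one score.

    norm='l1' sums the components; norm='l2' sums their squares
    (the squared sum used in this worked example).
    """
    v = score_vector[:n_components]
    return sum(v) if norm == "l1" else sum(x * x for x in v)

# Document 1 score vector from the table above
doc1 = [3 / math.sqrt(8), 1 / (2 * math.sqrt(8)), 1 / (2 * math.sqrt(4)),
        0, 0, 1 / (2 * math.sqrt(2)), 1 / (2 * math.sqrt(2)), 1 / (2 * math.sqrt(2))]

score = combine(doc1)  # squared sum: 51/32 = 1.59375, as derived below
```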
We can choose either the sum or the squared sum of the score vector as our document score. If we choose the squared sum, we obtain

    Document 1 score = 51/32 = 1.5938.

If we follow the same process for the other two documents, we obtain

Document   Score
1          1.5938
2          4.3437
3          1.5625
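All three scores can be reproduced from the Table IV coefficients. In this sketch (our own illustrative code) the diary and smile coefficient vectors are hardcoded from Table IV; the phase of a coefficient is its sign, and the document score is the squared sum over components of magnitude times zero phase precision:

```python
import math

r2, r4, r8 = math.sqrt(2), math.sqrt(4), math.sqrt(8)

# Haar wavelet components of the query terms, taken from Table IV
coeffs = {
    1: {"diary": [2/r8, 0, -1/r4, 1/r4, 0, -1/r2, -1/r2, 0],
        "smile": [1/r8, -1/r8, 0, -1/r4, 0, 0, 0, 1/r2]},
    2: {"diary": [2/r8, 0, 1/r4, 1/r4, -1/r2, 0, 1/r2, 0],
        "smile": [1/r8, -1/r8, 0, 1/r4, 0, 0, 1/r2, 0]},
    3: {"diary": [2/r8, 2/r8, 0, 0, -1/r2, -1/r2, 0, 0],
        "smile": [1/r8, -1/r8, 0, 1/r4, 0, 0, -1/r2, 0]},
}

def score(term_vectors):
    """Squared sum over components of (summed magnitude x zero phase precision)."""
    n = len(term_vectors)
    total = 0.0
    for comp in zip(*term_vectors):  # one wavelet component at a time
        magnitude = sum(abs(c) for c in comp)
        phase_precision = abs(sum((c > 0) - (c < 0) for c in comp)) / n
        total += (magnitude * phase_precision) ** 2
    return total

scores = {doc: score(list(terms.values())) for doc, terms in coeffs.items()}
# scores agree with the table above: 51/32, 139/32, and 50/32,
# i.e. 1.5938, 4.3437, and 1.5625 to four decimal places
```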
If we examine the positions of the query terms (in the term signals), we can see that the document scores reflect the proximity and occurrence of the query terms in the documents. Document 2 scored the highest and has the query terms closest together within the document; in document 2, both query terms appear in the same component. Document 1 ranked second, and its query terms appear in neighboring components. Finally, document 3 ranked third, with its query terms more distant than in the other two documents.

7.2 Application to TREC Document Set

To judge how accurate our wavelet document retrieval methods are, we will compare them to current methods which have been shown to be of high precision. The experiments we performed involved the variety of methods shown in Table V. These methods are labeled so that the reader can identify the specific algorithm used. For example, daub4-9-4-6.b8 indicates that the Daubechies-4 wavelet was used: 9 as the first digit implies Lnu.ltu weighting, 4 as the second digit means zero phase precision, 6 as the third digit shows that the l2 norm was used to sum the score vectors, and b8 implies that all eight spectral components were used in the score calculations. We compared these against the vector space methods AB-AFD-BAA, BD-ACI-BCA, BI-ACI-BCA, and Lnu.ltu weighting from SMART. We also compared them against a successful term proximity measure [Clarke and Cormack 2000] called shortest-substring retrieval (shown as SSS). Included in the results are the precision values of our fds-7-4-1 method [Park et al. 2004]. This is a method similar to our wavelet method which uses a Fourier transform in place of the wavelet transform. The fds-7-4-1 method uses AB-AFD-BAA preweighting. For each of the trials, we set the term signal length B = 8 (which is also the spectrum signal length). This value has been shown in previous experiments using the Fourier transform [Park et al. 2004] to obtain a high precision without using excessive storage.
Table V. Experimental Methods (Method names are of the form wavelet-x-y-z.bn, where the values of x, y, z, n correspond to the descriptions in this table; e.g., haar-5-4-6.b4 implies use of the Haar wavelet, with BD-ACI-BCA preweighting, using summed magnitudes with zero phase precision, combining components with the l2 norm, and using only the first four of eight components)

Label   Value          Description
x       5              BD-ACI-BCA preweighting
x       7              AB-AFD-BAA preweighting
x       8              BI-ACI-BCA preweighting
x       9              Lnu.ltu preweighting
y       1              Sum vectors with no phase precision
y       4              Sum magnitudes with zero phase precision
z       1              Combine using l1 norm
z       6              Combine using l2 norm
n       {1, 2, 4, 8}   Number of score components added
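The naming convention of Table V can be decoded mechanically. The parser below is our own illustrative sketch (it is not part of the paper's system, and it covers only the wavelet method labels, not fds-7-4-1 or the plain vector space names):

```python
import re

# Lookup tables transcribed from Table V
X = {"5": "BD-ACI-BCA preweighting", "7": "AB-AFD-BAA preweighting",
     "8": "BI-ACI-BCA preweighting", "9": "Lnu.ltu preweighting"}
Y = {"1": "sum vectors with no phase precision",
     "4": "sum magnitudes with zero phase precision"}
Z = {"1": "combine using the l1 norm", "6": "combine using the l2 norm"}

def describe(label):
    """Expand a method label of the form wavelet-x-y-z.bn using Table V."""
    m = re.fullmatch(r"(haar|daub4)-(\d)-(\d)-(\d)\.b(\d)", label)
    if m is None:
        raise ValueError(f"unrecognised label: {label}")
    wavelet, x, y, z, n = m.groups()
    name = "Haar" if wavelet == "haar" else "Daubechies-4"
    return (f"{name} wavelet, {X[x]}, {Y[y]}, {Z[z]}, "
            f"using the first {n} of 8 components")

print(describe("haar-5-4-6.b4"))
```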
Fig. 11. Examples of queries taken from TREC queries 51–200 titles.
To perform any experiment in information retrieval, we need a substantial database of documents and a well-defined set of queries and documents which are relevant to those queries. The TREC collection (http://trec.nist.gov/) is just this. We chose to experiment on the AP2WSJ2 (Associated Press disk 2 and Wall Street Journal disk 2) set containing 154,443 documents. We also selected the titles of queries 51 to 200 (from TREC 1, 2, and 3) as our query set. Examples of typical queries can be seen in Figure 11. We are interested in an algorithm that would be effective for text retrieval on the Web. We have observed that most Web search engine users will only examine the first 20 documents retrieved. If they do not obtain the results they want within this selection, the query is usually reformulated and the search is tried again. Therefore, we will only observe the results of precision after the first 5, 10, 15, and 20 documents (this represents the impatience of the typical Web user). The results sorted by precision after 5, 10, 15, and 20 documents are given in Tables VI(a), VI(b), VII(a), and VII(b), respectively. Each table shows the results for the top 20 and the results of any of the comparative methods below the bar if they did not appear in the top 20. We notice from the tables that the methods that appear at the top of each list use all eight wavelet components to calculate the score and that most use the
Table VI. Top 18 Methods Using Data Set AP2WSJ2 with Queries 51 to 200, Sorted by Precision After (a) 5 and (b) 10 Documents (The methods under the lower bar are the vector space and proximity methods which did not make the top 20.)

(a)
Method            Precision 5
haar-5-4-6.b8     0.5000
haar-7-4-6.b8     0.4960
daub4-5-4-6.b8    0.4960
daub4-7-4-6.b8    0.4947
fds-7-4-1         0.4947
daub4-9-4-6.b8    0.4907
haar-9-4-6.b8     0.4893
AB-AFD-BAA        0.4880
daub4-7-4-1.b1    0.4827
haar-7-4-6.b1     0.4827
haar-7-4-1.b1     0.4827
daub4-7-4-6.b1    0.4827
haar-5-4-6.b1     0.4813
haar-5-4-6.b2     0.4813
haar-7-4-6.b2     0.4813
daub4-7-4-6.b4    0.4813
daub4-5-4-6.b1    0.4813
daub4-9-4-1.b8    0.4813
---------------------------
SMART             0.4693
BD-ACI-BCA        0.4440
BI-ACI-BCA        0.4347
SSS               0.3718

(b)
Method            Precision 10
daub4-7-4-6.b8    0.4687
fds-7-4-1         0.4673
daub4-5-4-6.b8    0.4653
haar-5-4-6.b8     0.4633
daub4-9-4-6.b8    0.4620
haar-7-4-6.b8     0.4593
haar-5-4-6.b4     0.4593
haar-9-4-6.b8     0.4593
haar-7-4-6.b2     0.4573
haar-9-4-1.b8     0.4560
haar-7-4-6.b4     0.4560
daub4-5-4-6.b2    0.4553
haar-7-4-1.b2     0.4547
daub4-7-4-1.b8    0.4547
daub4-7-4-6.b4    0.4547
daub4-9-4-6.b4    0.4540
daub4-7-4-6.b2    0.4540
daub4-5-4-6.b4    0.4540
---------------------------
AB-AFD-BAA        0.4493
SMART             0.4493
BD-ACI-BCA        0.4247
BI-ACI-BCA        0.4100
SSS               0.3362
l2 method to combine the score. Each of the methods uses either the AB-AFD-BAA, BD-ACI-BCA, or Lnu.ltu weighting with zero phase precision. We can also see that the shortest-substring (SSS) method performs poorly relative to the other methods. The shortest-substring method makes use of logic operators to find the shortest substring of text containing the query. Since there are no operators within the queries used in our experiments, we assumed that all of the query terms were needed (equivalent to inserting an AND operator between each of the terms). Therefore, if any of the terms did not appear in a document, there would be no shortest substring and hence no document score. By using this similarity function, many relevant documents would have received a zero score, resulting in an overall poor precision for the shortest-substring method. It is interesting to see that, even though the Daubechies-4 proximity analysis was not monotonic, the results were just as good as those of the Haar method for the l2 combination case. A plot giving the precision after 5, 10, 15, and 20 documents (Figure 12) shows that the wavelet methods are preferred for this level of retrieval (which is the level required for Web searching). This plot shows the Daubechies-4 method above the Haar method for most of the recall levels shown. The Daubechies-4 wavelet produces higher-precision results, but we must also take into account the size of the index produced. Since the Daubechies-4 wavelet has larger
Table VII. Top 18 Methods Using Data Set AP2WSJ2 with Queries 51 to 200, Sorted by Precision After (a) 15 and (b) 20 Documents (The methods under the lower bar are the vector space and proximity methods which did not make the top 20.)

(a)
Method            Precision 15
fds-7-4-1         0.4493
haar-7-4-6.b8     0.4449
daub4-7-4-6.b8    0.4449
daub4-9-4-6.b8    0.4431
haar-9-4-6.b8     0.4413
haar-5-4-6.b8     0.4404
AB-AFD-BAA        0.4404
haar-9-4-1.b8     0.4396
daub4-5-4-6.b8    0.4391
haar-7-4-6.b4     0.4382
haar-7-4-1.b8     0.4382
daub4-7-4-6.b4    0.4373
haar-7-4-6.b2     0.4373
SMART             0.4356
daub4-9-4-1.b8    0.4351
daub4-7-4-6.b2    0.4347
haar-9-4-6.b4     0.4333
daub4-9-4-6.b4    0.4329
---------------------------
BD-ACI-BCA        0.4142
BI-ACI-BCA        0.3862
SSS               0.3078

(b)
Method            Precision 20
daub4-7-4-6.b8    0.4257
daub4-9-4-6.b8    0.4223
haar-7-4-6.b8     0.4223
fds-7-4-1         0.4220
haar-9-4-6.b8     0.4217
AB-AFD-BAA        0.4217
haar-5-4-6.b8     0.4213
haar-9-4-1.b8     0.4190
haar-7-4-1.b8     0.4183
SMART             0.4180
daub4-5-4-6.b8    0.4177
daub4-7-4-6.b4    0.4157
haar-7-4-6.b4     0.4143
haar-7-4-6.b2     0.4137
haar-5-4-1.b8     0.4123
daub4-5-4-6.b4    0.4117
haar-9-4-6.b4     0.4107
daub4-9-4-1.b8    0.4103
---------------------------
BD-ACI-BCA        0.3953
BI-ACI-BCA        0.3657
SSS               0.2856
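The precision figures in Tables VI and VII are precision after the first k retrieved documents. That measure can be sketched in a few lines (our own illustrative code, with hypothetical document identifiers and relevance judgements):

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

ranking = ["d7", "d2", "d9", "d4", "d1"]      # hypothetical ranked output
relevant = {"d2", "d4", "d5"}                 # hypothetical judgements
print(precision_at_k(ranking, relevant, 5))   # 2 of the top 5 are relevant -> 0.4
```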
Fig. 12. Precision-recall plot for recall of 5, 10, 15, and 20 documents for the Haar and Daubechies-4 wavelet methods, and the AB-AFD-BAA vector space method.
support than the Haar wavelet, we must expect that more nonzero values will be produced after the transform is applied. Our experiments have shown that the Daubechies-4 wavelet index is on average 1.4 times the size of the index produced from the Haar wavelet on the AP2WSJ2 document set. Therefore, the Daubechies-4 wavelet is preferable to the Haar wavelet for spectral document retrieval in terms of precision, but this comes at the cost of additional storage.

8. CONCLUSIONS

The wavelet transform is a tool which has been used in many areas of science and engineering to extract information about the self-similarity of signals. Using this information, we are able to encode signals in a more compact manner and easily extract desired content to perform other tasks (e.g., compare signals for similarity). We have proposed a new spectral-based information retrieval method using the wavelet transform, based on our hypothesis that a document is more likely to be relevant if the patterns of all query term appearances are similar. We have shown through occurrence and proximity analysis that our new retrieval method behaves in the desired manner. By adjusting the size of the document resolution used, we can reduce this scheme to the vector space method. The techniques developed using the Haar and Daubechies-4 wavelets, when using the l2 component combination on all components and zero phase precision, were able to consistently produce higher-precision results when compared to the vector space and proximity document retrieval methods. The spectral-based retrieval method produced on average a 4% increase in precision when compared to the corresponding vector space method using the TREC AP2 data set. This was performed in the same fast query time as found in the vector space method, using a larger index to store the extra information used.

ACKNOWLEDGMENTS
Thanks to Andrew Liu for discussions on wavelets and the ARC Special Research Centre for Ultra-Broadband Information Networks for its support and funding of this research.

REFERENCES

BUCKLEY, C., SINGHAL, A., MITRA, M., AND SALTON, G. 1995. New retrieval approaches using SMART: TREC 4. See Harman [1995], pp. 25–48.
BUCKLEY, C. AND WALZ, J. 1999. SMART in TREC 8. See Voorhees and Harman [1999], pp. 577–582.
CLARKE, C. L. A. AND CORMACK, G. V. 2000. Shortest-substring retrieval and ranking. ACM Trans. Inform. Syst. 18, 1 (Jan.), 44–78.
DAUBECHIES, I. 1988. Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math. 41, 909–996.
HAAR, A. 1910. Zur Theorie der orthogonalen Funktionensysteme. Mathemat. Annalen 69, 331–371.
HARMAN, D., Ed. 1995. The Fourth Text REtrieval Conference (TREC-4). NIST Spec. Pub. 500-236. National Institute of Standards and Technology, Gaithersburg, MD.
HAWKING, D. AND THISTLEWAITE, P. 1996. Relevance weighting using distance between term occurrences. Tech. rep. TR-CS-96-08. The Australian National University, Canberra, Australia.
JAIN, A. K. 1979. A sinusoidal family of unitary transforms. IEEE Trans. Patt. Analys. Mach. Intell. PAMI-1, 4 (Oct.), 356–365.
MALLAT, S. 2001. A Wavelet Tour of Signal Processing, 2nd ed. Academic Press, San Diego, CA.
MILLER, N. E., WONG, P. C., BREWSTER, M., AND FOOTE, H. 1998. TOPIC ISLANDS—a wavelet-based text visualization system. In VIS '98: Proceedings of the Conference on Visualization '98. IEEE Computer Society Press, Los Alamitos, CA, 189–196.
PARK, L. A. F., PALANISWAMI, M., AND KOTAGIRI, R. 2001. Internet document filtering using Fourier domain scoring. In Principles of Data Mining and Knowledge Discovery, L. de Raedt and A. Siebes, Eds. Lecture Notes in Artificial Intelligence, vol. 2168. Springer-Verlag, Berlin, Germany, 362–373.
PARK, L. A. F., PALANISWAMI, M., AND RAMAMOHANARAO, K. 2002a. A novel Web text mining method using the discrete cosine transform. In 6th European Conference on Principles of Data Mining and Knowledge Discovery, T. Elomaa, H. Mannila, and H. Toivonen, Eds. Lecture Notes in Artificial Intelligence, vol. 2431. Springer-Verlag, Berlin, Germany, 385–396.
PARK, L. A. F., PALANISWAMI, M., AND RAMAMOHANARAO, K. 2005. A novel document ranking method using the discrete cosine transform. IEEE Trans. Patt. Analys. Mach. Intell. 27, 1 (Jan.), 130–135.
PARK, L. A. F., RAMAMOHANARAO, K., AND PALANISWAMI, M. 2002b. A new implementation technique for fast spectral based document retrieval systems. In IEEE International Conference on Data Mining, V. Kumar and S. Tsumoto, Eds. IEEE Computer Society, Los Alamitos, CA, 346–353.
PARK, L. A. F., RAMAMOHANARAO, K., AND PALANISWAMI, M. 2004. Fourier domain scoring: A novel document ranking method. IEEE Trans. Knowl. Data Eng. 16, 5 (May), 529–539.
PROAKIS, J. G. AND MANOLAKIS, D. G. 1996. Digital Signal Processing: Principles, Algorithms and Applications, 3rd ed. Prentice-Hall, Englewood Cliffs, NJ.
ROBERTSON, S. E. AND WALKER, S. 1999. Okapi/Keenbow at TREC-8. See Voorhees and Harman [1999], pp. 151–162.
SALTON, G. AND BUCKLEY, C. 1988. Term-weighting approaches in automatic text retrieval. Inform. Process. Manage. 24, 5, 513–523.
SINGHAL, A., BUCKLEY, C., AND MITRA, M. 1996. Pivoted document length normalization. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '96), H.-P. Frei, D. Harman, P. Schäuble, and R. Wilkinson, Eds. ACM Press, New York, NY, 21–29.
UHL, A. 1994. Digital image compression based on non-stationary and inhomogeneous multiresolution analyses. In Proceedings of the IEEE International Conference on Image Processing (ICIP-94). vol. 3, 378–382.
VETTERLI, M. 1986. Filter banks allowing perfect reconstruction. Signal Process. 10, 3, 219–244.
VETTERLI, M. AND HERLEY, C. 1992. Wavelets and filter banks: Theory and design. IEEE Trans. Signal Process. 40, 9 (Sept.), 2207–2232.
VOORHEES, E. M. AND HARMAN, D. K., Eds. 1999. The Eighth Text REtrieval Conference (TREC-8). NIST Spec. Pub. 500-246. National Institute of Standards and Technology, Gaithersburg, MD.
WANG, J., LI, J., AND WIEDERHOLD, G. 2001. SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE Trans. Patt. Analys. Mach. Intell. 23, 9 (Sept.), 947–963.
WEISSTEIN, E. W. 1999. Support. In Eric Weisstein's World of Mathematics. CRC Press LLC, Boca Raton, FL. Also available at http://mathworld.wolfram.com.
ZOBEL, J. AND MOFFAT, A. 1998. Exploring the similarity space. ACM SIGIR For. 32, 1 (Spring), 18–34.

Received June 2003; revised July 2004; accepted March 2005