Query Expansion using Lexical-Semantic Relations

Ellen M. Voorhees
Siemens Corporate Research, Inc.
755 College Road East
Princeton, NJ 08540
ellen@scr.siemens.com
Abstract

Applications such as office automation, news filtering, help facilities in complex systems, and the like require the ability to retrieve documents from full-text databases where vocabulary problems can be particularly severe. Experiments performed on small collections with single-domain thesauri suggest that expanding query vectors with words that are lexically related to the original query words can ameliorate some of the problems of mismatched vocabularies. This paper examines the utility of lexical query expansion in the large, diverse TREC collection. Concepts are represented by WordNet synonym sets and are expanded by following the typed links included in WordNet. Experimental results show this query expansion technique makes little difference in retrieval effectiveness if the original queries are relatively complete descriptions of the information being sought, even when the concepts to be expanded are selected by hand. Less well developed queries can be significantly improved by expansion of hand-chosen concepts. However, an automatic procedure that can approximate the set of hand-picked synonym sets has yet to be devised, and expanding by the synonym sets that are automatically generated can degrade retrieval performance.
1 Introduction
Users of retrieval systems that use word matching as a basis for retrieval are faced with the challenge of phrasing their queries in the vocabularies of the documents they wish to retrieve. This difficulty is especially severe in large, full-text databases since such databases contain many different expressions of the same concept [1]. Yet the ability to retrieve documents from such databases is crucial in a wide range of applications: retrieving documentation in support of a legal case, facilitating the organization and retrieval of correspondence and forms in an office, filtering news feeds for articles of interest, finding relevant passages within the complete manual set of a complex system for the particular problem at hand, etc. One method of easing the user's burden when selecting query words is for the retrieval system to automatically expand the query by adding terms that are related to the words supplied by the user. The new terms can either be statistically related to the original query words (that is, the terms tend to co-occur with one another in documents) or chosen from lexical aids such as thesauri, controlled vocabulary schedules, and the like. Using statistical relations to expand query vectors is attractive since the relations are easily generated from the documents at hand, obviating the need for lexical aids, which are expensive to build and maintain. Unfortunately, such methods have had little success in improving retrieval effectiveness when used apart from relevance data [2, 3]. Indeed, Peat and Willett show there are limitations to the effectiveness one can expect from such systems [4]. (Note, however, that methods that exploit statistical relations but do not expand the query, such as Latent Semantic Indexing [5], have been more successful.) Using lexical aids as a source of related terms has met with some success in small experiments.
Salton and Lesk found that expansion by synonyms improved performance but expansion by broader or narrower terms selected from a hierarchical thesaurus was too inconsistent to be generally useful [6]. Wang, Vandendorpe, and Evens found that a variety of lexical-semantic relations improved retrieval performance [7]. However, each of these conclusions was drawn from experiments on very small collections using single-domain thesauri.

This paper examines the utility of query expansion by lexical-semantic relations in a large collection that spans several domains. Queries are expanded using the relations encoded in WordNet [8], a large, general-purpose lexical system built at Princeton University, and are run against the TREC collection [9]. To eliminate the confounding effects of expanding a poor selection of words, the terms that were expanded were chosen by hand. Thus, the results reported here represent an upper bound on the performance to be expected from a completely automatic procedure that uses this expansion strategy. Even in this best-case scenario, the expansion did not improve the effectiveness of queries that were relatively complete at the start. Less complete queries, consisting of a single sentence describing the topic of interest, were significantly improved by the expansion.

2 The Retrieval Environment

This section provides the background necessary to understand the context in which the experiments were carried out. The following section describes the experiments themselves, and the remaining section summarizes the conclusions the data support.

2.1 WordNet
The expansion procedure used in this work relies heavily on the information recorded in WordNet, a manually-constructed lexical system developed by George Miller and his colleagues at the Cognitive Science Laboratory at Princeton University [8]. WordNet's basic object is a set of strict synonyms, called a synset. Synsets are organized by the lexical relations defined on them, which differ depending on part of speech. For nouns (the only part of WordNet used in this study), the lexical relations include antonymy, hypernymy/hyponymy (is-a relation), and three different meronym/holonym (part-of) relations. The is-a relation is the dominant relationship, and organizes the synsets into a set of approximately ten hierarchies.¹ Figure 1 shows a piece of WordNet. The figure contains all the ancestors and descendants for the six senses of the noun swing. Also shown is the part-of relation as defined by the system, in that one of the senses, a child's toy, is part-of a playground.

2.2 TREC Collection
The TREC collection is a test collection being produced as a result of the TREC and Tipster workshops [9]. The part of the collection used in this work consists of the approximately 742,000 documents on TREC disks one and two, queries 101-150, and the set of relevance judgments produced after the TREC-2 and Tipster-3 evaluations. The TREC documents consist of English prose obtained from a variety of sources including newspapers, abstracts of technical papers, and the Federal Register. There are some SGML-like tags in the documents to delineate the bibliographic parts of the document (document number, title/headline, author, etc.). Other tags that mark special punctuation in the body of a document were ignored in this work. The documents were indexed completely automatically using the standard SMART indexing routines [10] (i.e., tokenization, stop word removal, and stemming) to produce an inverted index of document vectors.
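The indexing steps named above (tokenization, stop word removal, stemming, inverted index) can be illustrated with a minimal Python sketch. It is not the SMART implementation: the stop list, the suffix-stripping `crude_stem` function, and the two sample documents are hypothetical stand-ins chosen only to make the example self-contained.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop list; SMART uses its own, much longer list.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to"}

def crude_stem(word):
    """Strip a few common suffixes. A stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Tokenize, drop stop words, stem, and count term frequencies."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)

def build_inverted_index(docs):
    """Map each stem to {document id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for stem, tf in index_terms(text).items():
            index[stem][doc_id] = tf
    return dict(index)

# Hypothetical two-document collection.
docs = {"d1": "Testing new cancer drugs", "d2": "The drug market"}
index = build_inverted_index(docs)
```

Each entry of `index` lists the documents containing a stem together with the within-document frequency, which is the information the term weighting schemes used later in the paper need.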
The text of a TREC query, or, in TREC parlance, topic statement, is a complex natural-language statement of need, as shown in Figure 2. Each topic statement has a set of fields flagged by special markers (the words enclosed in angle brackets). The Narrative field usually provides a particularly detailed description of what constitutes a relevant document; the Concepts field lists words and phrases that the creator of the statement thinks are related to the topic. A shorter version of each topic statement is also available. This shorter version, the Summary Statement, is usually a single sentence describing the search request. (The Summary Statement is frequently, but not always, identical to the Description field. The Summary Statement for the topic shown in Figure 2 is the sentence given in the Description field.)

For this work, I added a new field containing a list of hand-selected WordNet synsets to the topic statements. My goal in selecting synsets for a particular topic was to pick synsets that emphasized the important concepts of the topic; i.e., I added synsets that contain nouns germane to the topic. Since one purpose of the experiments is to investigate the efficacy of different types of lexical-semantic relations in query expansion, I did not restrict myself to adding only synsets that contain a word of the original topic text. Instead, the choice of synsets was governed by my understanding of the full topic statement. One aspect of the expansion problem that is not addressed when starting with hand-selected synsets is sense resolution, since the synset of the correct sense of an ambiguous word was selected. Early experiments demonstrated that adding many synsets to a topic worked very poorly. An average of 2.7 synonym sets were selected per topic (minimum 0, maximum 6).

Topic 122, shown in Figure 2, provides an example of how synsets were selected. The topic asks for information about bringing cancer-fighting drugs to market. The original topic text never mentions 'pharmaceutical'; however, I added the synset {pharmaceutical}, a child of the synset {drug}. In the is-a hierarchy {drug} has many children (stimulants, intoxicants, sedatives, etc.) that are not related to cancer-fighting, so I chose the more specific {pharmaceutical} to avoid over-generalizing the expanded query. The complete list of synsets added to topic 122 is {cancer}, {skin-cancer}, and {pharmaceutical}.

Some topics contain important concepts that have no corresponding synset. Occasionally, the missing synset is a gap in WordNet; for example, toxic waste, genetic engineering, and sanctions meaning economic disciplinary measures are not defined in version 1.3 of WordNet. More often, the important concept was a proper noun or highly technical term that one wouldn't expect to be in WordNet. SDI or Star Wars, for example, is an important concept for topics 101 and 102 but does not occur in WordNet. Nothing was added to the topic texts for concepts that lacked corresponding synsets in these experiments, although making some provision for them would improve retrieval performance.

¹ The actual structure is not quite a hierarchy since a few synsets have more than one parent.

[Figure 1. Relations for the six senses of the noun swing in WordNet.]

2.3 The Expansion Procedure
Once the text of the topics is annotated with synsets, the remainder of the processing is automatic. Selected fields of the original topic statements are indexed using the standard SMART routines. The synsets listed in the synset section of the topic text are then used to add terms to the query.

Domain: Medical & Biological
Topic: RDT&E of New Cancer Fighting Drugs
Description: Document will report on the research, development, testing, and evaluation (RDT&E) of a new anti-cancer drug developed anywhere in the world.
Narrative: A relevant document will report on any phase in the worldwide process of bringing new cancer fighting drugs to market, from conceptualization to government marketing approval. The laboratory or company responsible for the drug project, the specific type of cancer(s) which the drug is designed to counter, and the chemical/medical properties of the drug must be identified.
Concept(s):
1. cancer, leukemia
2. drug, chemotherapy

Figure 2. Topic 122.

Given a synset, there is a wide choice of words to add to a query: one can add only the synonyms within the synset, or all descendants in the is-a hierarchy, or all words in any synset related to the given synset regardless of link type, etc. The expansion procedure is parameterized to facilitate comparing the effectiveness of a variety of these expansion schemes. The procedure used to add synsets is invoked using a parameter that specifies, for each lexical relation included in WordNet, the maximum length of a chain of that type of link that may be followed. A chain begins at each synset listed in the synset section of a topic statement and may contain only links of a single type; it is followed until the maximum length for that link type is reached. All synonyms contained within each synset traversed are added to the query.

As an example, consider the synsets shown in Figure 1. If the synset selected for a topic is the one that corresponds to the golf meaning of swing, and chains of hyponym (child) links are limited to length one, then golf, stroke, swing, shot, approach, drive, slice, hook, and putt would be added to the query; chip and pitch would not be added. If hyponym chains of length two may be followed, then chip and pitch would be added as well. If the selected synset is the one containing swing meaning plaything, then mechanical, device, plaything, toy, trapeze, and playground may be added through different link types.

Collocations such as change_of_location in Figure 1 are broken into their component words, stop words such as of are removed, and the remaining words are stemmed. Stems added through different lexical relations are kept separate using the extended vector space model introduced by Fox [11]. A query vector in the extended model is comprised of subvectors of different concept types (called ctypes), where each ctype corresponds to a different lexical relation. A query potentially has eleven ctypes: one for original query terms, one for added synonyms, and one each for the other relation types contained within the noun portion of WordNet (each half of asymmetric relations has its own ctype). An original query term that is a member of an added synset appears in both respective ctypes. Similarly, a word that is related to a selected synset through two different relations appears in both ctypes.

The similarity between a document vector $D$ and an extended query vector $Q$ is computed as the weighted sum of the similarities between $D$ and each of the query's subvectors:

$$\mathrm{sim}(D, Q) = \sum_i \alpha_i \, D \cdot Q_i$$

where $\cdot$ denotes the inner product of two vectors, $Q_i$ is the $i$th subvector of $Q$, and $\alpha_i$, a real number, reflects the importance of ctype $i$ relative to the other ctypes.

Terms in document vectors are weighted using the lnc weights suggested by Buckley et al. [12]; that is, the weight of a term in the document is set to $1.0 + \ln(tf)$, where $tf$ is the number of times the term occurs in the document, and is then normalized by the square root of the sum of the squares of the weights in the vector (cosine normalization). Query terms are weighted using ltc: the log term frequency factor above is multiplied by the term's inverse document frequency, and the weights in the ctype representing original query terms are normalized by the cosine factor. Weights in additional ctypes are normalized using the length computed for the original terms' ctype. This normalization strategy allows the original query term weights to be unaffected by the expansion process and keeps the weights in each ctype comparable with one another.
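The chain-following expansion and the extended similarity function described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the system used in the experiments: the synset identifiers, the miniature graph in `SYNSETS` and `LINKS` (loosely modelled on the plaything sense of swing in Figure 1), the stop list, and the unit query weights are all hypothetical, stemming is omitted, and only the lnc document weighting is implemented (the ltc query weighting is left out for brevity).

```python
import math
from collections import Counter

# Hypothetical miniature noun graph; not real WordNet data.
SYNSETS = {
    "swing_toy": ["swing"],
    "plaything": ["plaything", "toy"],
    "mechanical_device": ["mechanical_device"],
    "trapeze": ["trapeze"],
    "playground": ["playground"],
}
# LINKS[relation][synset] lists the synsets one link away.
LINKS = {
    "hypernym": {"swing_toy": ["plaything", "mechanical_device"]},
    "hyponym": {"swing_toy": ["trapeze"]},
    "holonym": {"swing_toy": ["playground"]},
}
STOP_WORDS = {"of", "the", "a"}

def words_of(synset_id):
    """Break collocations into component words and drop stop words."""
    out = set()
    for entry in SYNSETS[synset_id]:
        out.update(w for w in entry.split("_") if w not in STOP_WORDS)
    return out

def expand(synset_id, max_chain):
    """Return {ctype: words}; each chain follows links of one relation only."""
    subvectors = {"synonym": words_of(synset_id)}
    for relation, limit in max_chain.items():
        frontier, found = [synset_id], set()
        for _ in range(limit):
            frontier = [t for s in frontier
                        for t in LINKS.get(relation, {}).get(s, [])]
            for target in frontier:
                found |= words_of(target)
        if found:
            subvectors[relation] = found
    return subvectors

def lnc(term_counts):
    """Document weights: 1 + ln(tf), followed by cosine normalization."""
    raw = {t: 1.0 + math.log(tf) for t, tf in term_counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()}

def similarity(doc, query_subvectors, alpha):
    """sim(D, Q) = sum over ctypes i of alpha_i * (D . Q_i)."""
    return sum(
        alpha[ctype] * sum(doc.get(t, 0.0) * w for t, w in sub.items())
        for ctype, sub in query_subvectors.items()
    )

# Chains of length one for every relation in the toy graph.
expanded = expand("swing_toy", {"hypernym": 1, "hyponym": 1, "holonym": 1})
query = {"original": {"swing": 1.0}}
query.update({c: {w: 1.0 for w in ws} for c, ws in expanded.items()})
alpha = {"original": 1.0, "synonym": 0.5, "hypernym": 0.5,
         "hyponym": 0.5, "holonym": 0.5}
doc = lnc(Counter(["swing", "swing", "toy", "playground"]))
score = similarity(doc, query, alpha)
```

Keeping one subvector per relation is what lets a single word, such as swing here, contribute once through the original-terms ctype and again through the synonym ctype, each scaled by its own $\alpha_i$.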
3 Experiments

3.1 Full Topic Statement
The purpose of this investigation is to determine the efficacy of expanding a query by lexical-semantic relations. Given a set of concepts to be expanded, the effectiveness of an expanded run depends on the link types followed during the expansion and the relative weight given to each link type (the $\alpha$'s in the similarity function above), so a variety of different schemes must be tested. Table 1 shows the 11-point average precision value and percent difference over the unexpanded run for the different combinations evaluated using the full topic statement (except the "Definitions" field) plus synsets. Four expansion strategies were tried: expansion by synonyms only; expansion by synonyms plus all descendants in the is-a hierarchy; expansion by synonyms plus parents and all descendants in the is-a hierarchy; and expansion by synonyms plus any synset directly related to the given synset (i.e., a chain of length 1 for all link types). The $\alpha$ for the original terms subvector was usually greater than the $\alpha$ for the other subvectors to reflect the assumption that user-supplied terms are generally superior to automatically added ones. The runs in which the original terms' $\alpha$ was less than or equal to another $\alpha$ tested this assumption. Clearly, the expansion is ineffective: none of the expansion strategies significantly improves the performance of the unexpanded query. Indeed, the difference in performance between an expanded and unexpanded run for individual queries is very small for most expanded runs. Individual query performance differs more for more aggressive expansion strategies (i.e., expanding using longer chains of links and weighting added terms more heavily), but across the set of queries the aggregate performance is worse for aggressively expanded queries. In an earlier set of experiments, the most effective expanded run was the one that expanded a query synset by any synset directly related to it and had $\alpha = .5$ for all added subvectors [13].
While this combination is not optimal for these queries, it has the advantage of being a straightforward choice of expansion parameters. Thus, this expansion strategy, which will be called the standard expansion strategy, is used for the experiments described in the next section.

Table 1. Combinations of expansion strategies and relation weights tested.

Unexpanded: ave. prec. .3586

Expansion by synonyms only
orig terms $\alpha$   synonyms $\alpha$   ave. prec.   % change
1                     .1                  .3614        +0.8
1                     .3                  .3639        +1.5
1                     .5                  .3634        +1.3
1                     .8                  .3629        +1.2

Expansion by synonyms plus all descendants
orig terms $\alpha$   synonyms $\alpha$   descendants $\alpha$   ave. prec.   % change
1                     .1                  .1                     .3617        +0.9
1                     .3                  .1                     .3639        +1.5
1                     .3                  .3                     .3635        +1.4
1                     .5                  .1                     .3635        +1.4
1                     .5                  .3                     .3637        +1.4
1                     .5                  .5                     .3622        +1.0
1                     .8                  .1                     .3614        +0.8
1                     .8                  .3                     .3612        +0.7
1                     .8                  .5                     .3603        +0.5

Expansion by synonyms plus parents and all descendants
orig terms $\alpha$   synonyms $\alpha$   descendants $\alpha$   parents $\alpha$   ave. prec.   % change
1                     .1                  .1                     .1                 .3617        +0.9
1                     .3                  .1                     .1                 .3640        +1.5
1                     .3                  .3                     .1                 .3639        +1.5
1                     .3                  .3                     .3                 .3647        +1.7
1                     .5                  .1                     .1                 .3639        +1.5
1                     .5                  .3                     .1                 .3638        +1.5
1                     .5                  .3                     .3                 .3646        +1.7
1                     .5                  .5                     .1                 .3624        +1.0
1                     .5                  .5                     .3                 .3628        +1.2
1                     .5                  .5                     .5                 .3627        +1.1
1                     .8                  .1                     .1                 .3622        +1.0
1                     .8                  .3                     .1                 .3617        +0.9
1                     .8                  .3                     .3                 .3614        +0.8
1                     .8                  .5                     .1                 .3605        +0.5
1                     .8                  .5                     .3                 .3605        +0.5
1                     .8                  .5                     .5                 .3609        +0.6
1                     1                   1                      1                  .3511        -2.1
1                     …                   …                      …                  .3350        -6.6

Expansion by synonyms plus any directly related synset
orig terms $\alpha$   synonyms $\alpha$   other $\alpha$   ave. prec.   % change
1                     .3                  .1               .3629        +1.2
1                     .3                  .3               .3630        +1.2
1                     .5                  .1               .3624        +1.0
1                     .5                  .3               .3620        +0.9
1                     .5                  .5               .3608        +0.6
1                     .3                  .5               .3604        +0.5
1                     1                   1                .3491        -2.7

3.2 Less Detailed Topic Statements

Query expansion is a recall-enhancing technique designed to overcome problems caused by differing vocabularies. To test the hypothesis that expansion is unhelpful for the original queries because the very complete statement provided by a TREC topic leaves no vocabulary problem to overcome, two new sets of queries were derived from shorter versions of the topic statements. One query set was derived using the Summary Statement only (Summary). Another query set was derived from the Summary Statement plus the Concepts field of the full topic statement (SmryCon). The new queries were expanded using exactly the same set of synsets as the queries derived from the full version of the topic statement (Full).

Table 2 compares the lengths of the different versions of the queries. The table contains the mean number of original query terms and the mean ratio of additional terms to original terms. The mean number of additional terms is the same (17.56) for each version since the same set of synsets is expanded each time.

Table 2. Length statistics for different versions of queries.

                                                    Full    SmryCon   Summary
Mean number of original terms                       52.54   29.22     11.02
Mean ratio of additional terms to original terms    .36     .77       1.71

Figures 3 and 4 contain the retrieval results for the two new versions of the queries; in each case the expanded run uses the standard expansion strategy. Expansion does not improve the queries derived from the Summary Statement plus Concepts field, but significantly improves the Summary Statement only version (35% improvement in 11-point average precision). Note, however, that the overall level of effectiveness obtained by the expanded Summary queries is less than that of the unexpanded full topic queries (39% degradation in 11-point average precision).

[Figure 3. Effectiveness of queries derived from Summary Statement and Concept fields. Curves plotted against recall for the full unexpanded, SmryCon unexpanded, and SmryCon expanded runs.]

[Figure 4. Effectiveness of queries derived from Summary Statement. Curves plotted against recall.]

3.3 Automatic Selection of Synsets

Given that short queries have the potential to be significantly improved by expansion, it is necessary to see if the potential can be realized by a completely automatic procedure. While it is possible to present users a list of candidate synsets and have them select the synsets to expand, choosing the correct synsets is a tedious process, and a poor choice can be worse than not expanding [13]. Figure 5 provides a high-level description of the algorithm developed to select synsets. The algorithm is based on the observation [14] that users need to select both the important concepts of the query and the correct sense of those concepts. Using the same reasoning as is used for inverse document frequency weights [15], importance is approximated by the number of documents in which a query term occurs: a term occurring in more than N documents is not expanded. Sense resolution is approximated by requiring a new term to be related to at least two original query terms before it is included in the expanded query.

for (each query word w) {
    if (w not already expanded and document frequency of w < N) {
        expand all synsets containing w, producing kin list of w
    }
}
for (each relative in the set of kin lists) {
    if (relative occurs in more than 1 list)
        add relative to query vector
}

Figure 5. Procedure to automatically select synonym sets to expand.

A series of experiments tested the procedure's effectiveness. The runs on the Summary Statements tested two values of N: 70,000, approximately 10% of the collection, and 35,000, approximately 5% of the collection; two limits on the lengths of chains to follow (all link types treated identically): 1 and 2; and different $\alpha$ values: .3, .5, and .8. Table 3 shows the 11-point average precision values obtained for these runs. As can be seen, none of the runs materially changes the performance of the unexpanded queries. Inspection of the query terms that resulted from the automatic selection procedure suggests that the requirement that a term appear in two different lists is not a good approximation to sense disambiguation. The correct senses of words contained in a short query seldom have common relatives. Instead, the words that appear in more than one list are likely to be fairly general terms with more than one sense themselves. For example, since collocations are split into their components during the expansion process, general nouns such as system tend to appear in multiple lists.
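The selection procedure of Figure 5 can be written as a short Python sketch. The function name, the `relatives_of` lookup, and the sample document frequencies and relatives below are hypothetical illustrations; in the experiments the relatives come from following WordNet links from every synset containing the word.

```python
from collections import Counter

def select_expansion_terms(query_words, doc_freq, relatives_of, max_df):
    """Return the relatives that occur in more than one kin list (Figure 5)."""
    kin_lists = {}
    for w in query_words:
        # Skip words already expanded and words occurring in max_df or more documents.
        if w not in kin_lists and doc_freq.get(w, 0) < max_df:
            kin_lists[w] = set(relatives_of(w))
    counts = Counter(r for kin in kin_lists.values() for r in kin)
    return {r for r, n in counts.items() if n > 1}

# Hypothetical document frequencies and relatives, for illustration only.
doc_freq = {"cancer": 9000, "drug": 120000, "leukemia": 800}
relatives = {
    "cancer": ["disease", "tumor", "leukemia"],
    "drug": ["medicine", "agent", "disease"],
    "leukemia": ["cancer", "disease", "blood"],
}
added = select_expansion_terms(
    ["cancer", "drug", "leukemia"], doc_freq,
    lambda w: relatives.get(w, []), max_df=70000,
)
```

In this toy run, drug is too frequent to be expanded, so only the kin lists of cancer and leukemia are compared, and disease is the single relative they share. The example also hints at the weakness reported above: the terms that survive the two-list requirement tend to be general ones.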
Table 3. Effectiveness of expansion strategies on Summary Statement queries when expanding automatically selected synsets.

                                       ave. prec.   % change
Unexpanded                             .1634

N=70,000; max chain length=1
    $\alpha$ = .3                      .1627        -0.5
    $\alpha$ = .5                      .1603        -1.9
    $\alpha$ = .8                      .1543        -5.6
N=70,000; max chain length=2
    $\alpha$ = .3                      .1633        -0.1
    $\alpha$ = .5                      .1557        -4.7
    $\alpha$ = .8                      .1402        -14.2
N=35,000; max chain length=1
    $\alpha$ = .3                      .1636        +0.1
    $\alpha$ = .5                      .1635        +0.1
    $\alpha$ = .8                      .1639        +0.3
N=35,000; max chain length=2
    $\alpha$ = .3                      .1645        +0.7
    $\alpha$ = .5                      .1642        +0.5
    $\alpha$ = .8                      .1617        -1.0

4 Conclusion

The experiments discussed here demonstrate that query expansion by general lexical-semantic relations provides little benefit when a user supplies a detailed query. Since query expansion is a recall-enhancing technique, it is not surprising that longer queries benefit less than shorter queries. However, the longer queries are by no means doing a perfect job of retrieval, and they can be improved by other methods such as relevance feedback [16]. The success of these other expansion techniques suggests that the most useful relations are idiosyncratic to the particular query in the context of the particular document collection.

Nonetheless, users frequently do not supply a detailed query. In this case, lexical-semantic relations have the potential to improve an initial query, though the expanded query is unlikely to be as effective as a better formulated user-supplied query. The challenge now lies in finding an automatic procedure that is able to select appropriate concepts to expand.
References

1. David C. Blair and M. E. Maron. Full-text information retrieval: Further analysis and clarification. Information Processing and Management, 26(3):437-447, 1990.
2. A. F. Smeaton and C. J. van Rijsbergen. The retrieval effects of query expansion on a feedback document retrieval system. Computer Journal, 26:239-246, 1983.
3. C. T. Yu, C. Buckley, and G. Salton. A generalized term dependency model in information retrieval. Information Technology: Research and Development, 2:129-154, 1983.
4. Helen J. Peat and Peter Willett. The limitations of term co-occurrence data for query expansion in document retrieval systems. Journal of the American Society for Information Science, 42(5):378-383, 1991.
5. Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.
6. G. Salton and M. E. Lesk. Computer evaluation of indexing and text processing. In Gerard Salton, editor, The SMART Retrieval System: Experiments in Automatic Document Processing, pages 143-180. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1971.
7. Yih-Chen Wang, James Vandendorpe, and Martha Evens. Relational thesauri in information retrieval. Journal of the American Society for Information Science, 36(1):15-27, January 1985.
8. George Miller. Special Issue, WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 1990.
9. Donna K. Harman. The first Text REtrieval Conference (TREC-1), Rockville, MD, U.S.A., 4-6 November, 1992. Information Processing and Management, 29(4):411-414, 1993.
10. Chris Buckley. Implementation of the SMART information retrieval system. Technical Report 85-686, Computer Science Department, Cornell University, Ithaca, New York, May 1985.
11. Edward A. Fox. Extending the Boolean and Vector Space Models of Information Retrieval with P-norm Queries and Multiple Concept Types. PhD thesis, Cornell University, 1983. University Microfilms, Ann Arbor, MI.
12. Chris Buckley, Gerard Salton, and James Allan. Automatic retrieval with locality information using SMART. In D. K. Harman, editor, Proceedings of the First Text REtrieval Conference (TREC-1), pages 59-72. NIST Special Publication 500-207, March 1993.
13. Ellen M. Voorhees. On expanding query vectors with lexically related words. In D. K. Harman, editor, Proceedings of the Second Text REtrieval Conference (TREC-2), 1993. In press.
14. Ellen M. Voorhees and Yuan-Wang Hou. Vector expansion in a large collection. In D. K. Harman, editor, Proceedings of the First Text REtrieval Conference (TREC-1), pages 343-351. NIST Special Publication 500-207, March 1993.
15. Karen Sparck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11-21, March 1972.
16. Chris Buckley, James Allan, and Gerard Salton. Automatic routing and ad-hoc retrieval using SMART: TREC 2. In D. K. Harman, editor, Proceedings of the Second Text REtrieval Conference (TREC-2), 1993.