DEPENDENCY PARSING FOR INFORMATION RETRIEVAL
D. P. Metzler, T. Noreault, L. Richey, B. Heidorn
Information Science Department, University of Pittsburgh, Pittsburgh, PA 15260 USA

This paper describes the development of a parser based on the Moulton and Robinson (1981) dependency theory of syntax, and several strategies by which we are attempting to apply the outputs of this parser to the processes of Information Retrieval. We first discuss the limits of present Information Retrieval theory and the potential benefits of linguistic analysis for Information Retrieval. Next we briefly present the Moulton and Robinson theory, contrast it to rewrite rule based theories, and outline its general advantages as an approach to natural language processing. Next we describe the parser we have implemented based on the Moulton and Robinson theory, and some of the implementation issues we have addressed. Finally, we discuss several strategies by which this parser could be applied to Information Retrieval, and the problems involved in this application.

1. INFORMATION RETRIEVAL: THE LIMITS OF KEY WORD APPROACHES
Present Information Retrieval techniques are based on the matching of key words in a query with key words in a document or document representation.
Techniques such as probabilistic indexing or the use of
thesaurus relations are used to extend the range of useful relations between terms, but these techniques are all based on overall statistical relationships among the terms of the collection or the language as a whole.
These techniques do not offer any way of capturing the shifts in
meaning of a term as it is used in different contexts or the meanings of combinations of terms as they are used together. This inability to represent differences in how terms are used in a document or query places an upper bound on the performance of Information Retrieval systems.
In fact, an analysis of experimental results shows that new techniques, while achieving statistically significant improvement in performance, offer only slight gains in absolute performance (e.g., McGill et al., 1979).
We feel that the existing techniques employed in Information Retrieval research have succeeded in effectively utilizing the information available within the keyword approach to document representation. To gain substantial improvement, it will be necessary to perform a deeper analysis of both the document and the query in order to obtain a more precise match. Two obvious categories of such potential "deeper analyses" are syntactic and semantic analyses of the document and of the natural language expression of the query; however, there are difficulties with each of these approaches. Semantic processing is simply not yet feasible for such open ended applications as Information Retrieval. Syntactic parsing has also not proven to be very useful for Information Retrieval. In part this has been due simply to the lack of a parser that is efficient enough to handle large amounts of full text, or flexible enough to handle incomplete or ungrammatical strings, or the relationships among the words of separate sentences. Moreover, it is quite difficult to use the level of detail and complexity provided by a standard parser without some means of extracting generalizations over the syntactic structures. The Moulton and Robinson based parser promises to address all of these issues.
2. THE MOULTON AND ROBINSON MODELS

There are two essential aspects of the Moulton and Robinson theory. There is a structural model, which refers to the nature of the underlying conceptual relations that are presumed to result from syntactic parsing, and a processing model, which refers to the nature of the parsing process itself.

2.1 Underlying Representations: The Structural Model
The Moulton and Robinson theory, which is a particularly pure version of syntactic dependency grammar (e.g., Hudson, 1976), suggests that underlying conceptual representations encode only two relations, scope and dependency. Scope, which is usually binary, refers to the specification of which words or concepts are immediately related to each other. In a phrase such as "fire engine dog," "fire" and "engine" are in each other's scope, and together specify a meaning which only as a unit relates to "dog." Scope, therefore, specifies a nonordered hierarchical structure describing the relationships among the words of the sentence.

Within a pair of words or constituents that are in each other's scope, it is almost always the case that one element plays a dominant role in determining the pair meaning, while the other modifies that concept. Thus, "fire engine" refers to an engine of a particular sort. Taken together, this compound modifies "dog" in the phrase "fire engine dog." Since "dog" is dominant in this phrase, the phrase as a whole refers to a kind of a dog, rather than a kind of fire or a kind of engine. Each binary pair of concepts that share each other's scope is marked with respect to dependency. In figure 1, the dominant branches are indicated with an asterisk.
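The scope and dependency relations just described can be illustrated with a small sketch. The tuple encoding and function names here are our own, not part of the Moulton and Robinson formalism:

```python
# Illustrative only: a structure is either a word (a string) or a
# binary scope pair (left, right, dominant), where dominant is "L" or
# "R" and marks the dominant branch (the asterisked branch of figure 1).

def head(structure):
    """Return the dominant content word of a scope/dependency structure."""
    if isinstance(structure, str):
        return structure  # a bare word is its own head
    left, right, dominant = structure
    return head(left) if dominant == "L" else head(right)

# "fire engine dog": "fire" and "engine" share scope ("engine" dominant);
# that pair, as a unit, then modifies "dog" ("dog" dominant).
fire_engine = ("fire", "engine", "R")
fire_engine_dog = (fire_engine, "dog", "R")

print(head(fire_engine))      # engine -- a "fire engine" is a kind of engine
print(head(fire_engine_dog))  # dog -- the whole phrase denotes a kind of dog
```

Dependency thus falls out of scope plus a single dominance mark per pair, which is all that the retrieval strategies of section 4 rely on.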
Moulton and Robinson contend that this very simple syntax, with an adequate reliance on semantics and pragmatics, is enough to specify the meaning of sentences and larger units of language.

2.2 The Processing Model

The second aspect of the Moulton and Robinson theory concerns the nature of the syntactic processes which mediate between the underlying hierarchical structures and surface strings.
The essential nature of the Moulton and Robinson model of parsing is that it minimizes the role of external abstract rules (e.g., phrase grammar rules) that are applied to the linguistic string, and the control structures that are necessary to apply those rules. The control structure of the system is extremely simple. It substitutes for rule and control complexity a high degree of data complexity, which in effect encodes the relationships that are more usually encoded as rules. This difference provides an extremely high degree of modularity and flexibility and generally avoids the problems of rule ordering which have plagued other computational and formal approaches to language parsing.
The data structures which carry the bulk of the computational mechanisms in this model are individual modules. Each module can be thought of as a four sided entity in that it has four edges by which it can be linked to other modules. Each edge contains codes which enable two modules to be linked if and only if they have matching codes. Lexical modules have lexical codes (potentially links to their semantic representations) on their bottom edges. The top edges of lexical modules, like all edges of the structure modules, contain abstract codes whose only meaning is determined by the connections which they permit. The sides of lexical modules are blank.

Figure 1. Scope and dependency structure for "fire engine dog"; the dominant branches are indicated with an asterisk.

The majority of the complexity of syntactic structure is accounted for by abstract structure modules whose edges contain only abstract codes. The work of the parser is reduced to finding a set of structure modules that can be connected to the lexeme modules of an input string such that no outside edges are left with unmatched obligatory connection codes. (Edges may have optional codes or no codes at all.)
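As a rough sketch of the module idea, the four-edged entity might be encoded as follows. The class layout, code strings, and linking test are our own illustration; the paper gives no concrete encoding:

```python
# A module has four edges (top, bottom, left, right), each carrying a
# connection code, an optional code, or nothing (a blank edge).

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Edge:
    code: Optional[str] = None   # None means a blank edge
    optional: bool = False       # optional codes need not be matched

@dataclass
class Module:
    name: str
    top: Edge = field(default_factory=Edge)
    bottom: Edge = field(default_factory=Edge)
    left: Edge = field(default_factory=Edge)
    right: Edge = field(default_factory=Edge)

def can_link(a: Edge, b: Edge) -> bool:
    """Two edges can be linked iff both carry the same non-blank code."""
    return a.code is not None and a.code == b.code

# A lexical module: a lexical code on the bottom edge, an abstract code
# on the top edge, blank sides; and a structure module with only
# abstract codes (the code names "N" and "Det" are hypothetical).
dog = Module("dog", top=Edge("N"), bottom=Edge("lex:dog"))
np = Module("NP-builder", bottom=Edge("N"), left=Edge("Det", optional=True))

print(can_link(np.bottom, dog.top))   # True: matching abstract code "N"
print(can_link(np.left, dog.top))     # False: the codes differ
```

A completed parse, in these terms, is an assembly in which every non-optional outside code has found a match.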
A few simple topological rules allow for stretching unordered underlying representations to map them onto the linear input string.

The underlying conceptual structure of an input string is directly available from a completed structural model of the input by abstracting the hierarchical structure. The branching of the hierarchy determines scope relations, while dependency is determined by which element of a scoped pair is linked to the higher constituents of the structure. (See figure 2.) Variation in the allowable structures is permitted by the existence of optional codes on the edges.

Figure 2. Module assembly for "fire engine dog," showing the abstract connection codes (e.g., N2-7,10, N5, N6) on the module edges.

It is important to note that although the underlying structures of the Moulton and Robinson model are much sparser than conventional syntactic trees, the parser modules implicitly contain all the information required for conventional syntactic parsing, and, in fact, it is not difficult to derive conventional descriptions from the parser. One thus has available two levels of analysis: the Moulton and Robinson scope and dependency descriptions, and conventional constituent labels.
It is possible not only to move from one to the other as the needs of a language processing strategy dictate, but also to augment or alter the descriptions at one level according to the information supplied by the other level.
For instance, one might traverse scope and dependency trees with checks based on the nature of particular constituents.

3. THE IMPLEMENTATION

We have begun work on an implementation of this theory, based not only on the work of Moulton and Robinson, but also on the SCRYP parser (Gruenewald, 1981). Like Gruenewald, we have adapted the Moulton and Robinson processing model by writing all modules in terms of mandatory binary structures, rather than tertiary structures with optional parts.
We are interested also in implementing explicit heuristics of the sort that people use (Clark and Clark, 1977), not only because of our interest in cognitive simulation but also because we are concerned with developing a computationally feasible parser for practical applications.

The parse is accomplished by searching for a way in which the lexical modules that correspond to the string and some structural modules can be connected in a hierarchical structure that preserves the relationships of the original input. (The underlying structure itself is unordered, but its hierarchical structure is determined by the surface string order.) First, lexical modules are assigned to the words of the string. Next, each of the pairs of lexical modules is connected in any possible way that preserves the word order of the original string. Later passes successively add these simple structures together in more and more complex ways, always constrained by the need to maintain the correct surface order of the original string. Thus, the model is basically a bottom up parser, but combines certain advantages of both bottom up and top down parsers with some advantages that are not found in either of these two as they have been traditionally implemented.
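The pass structure just described might be sketched as follows. This is a deliberate simplification: a licensing predicate stands in for the module-matching machinery, and all names are our own:

```python
from itertools import permutations

def parse_passes(words, licensed, max_passes=4):
    """Bottom-up passes: on each pass, try to join every structure with
    every structure that follows it in surface order, keeping whatever
    the (stand-in) grammar licenses."""
    # Pass 0: one single-word structure per input word; each structure
    # carries the set of word positions it covers, so surface order can
    # be enforced when structures are combined.
    structures = [(frozenset([i]), w) for i, w in enumerate(words)]
    seen = set(structures)
    for _ in range(max_passes):
        added = []
        for (pos_a, a), (pos_b, b) in permutations(structures, 2):
            # combine only disjoint structures, with all of a before b
            if pos_a & pos_b or max(pos_a) > min(pos_b):
                continue
            candidate = (pos_a | pos_b, (a, b))
            if licensed(a, b) and candidate not in seen:
                seen.add(candidate)
                added.append(candidate)
        if not added:
            break
        structures += added
    return structures

# Toy run: license everything; a real licenser would consult edge codes.
result = parse_passes(["fire", "engine", "dog"], lambda a, b: True)
complete = [s for pos, s in result if pos == frozenset([0, 1, 2])]
print(complete)  # both binary groupings of the three-word string
```

Note that the very first pass already builds the non-adjacent pair ("fire", "dog"), mirroring the point below that high level relationships, such as subject and main verb, can be built immediately.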
For instance, although the parser is basically bottom up, the fact that all possible pairwise ordered combinations of terms are explored in the first pass means that a high level relationship such as that between a subject and a main verb can be built immediately. These high level relations would then be available for confirmation by general semantic or pragmatic processes. In a typical bottom up parser, such high level relationships cannot be entertained until all the lower level structures have already been built. In a typical top down parser, high level relationships are hypothesized at an early stage, but these relationships are not tied to any data in the string until the lower level structures have been built.

This type of parser is, however, subject to the combinatorial problem, since there is the possibility of an exponentially increasing number of structures built on each pass unless there are strong constraints on the ways that structures are allowed to combine.
Two ways to control this problem are (1) by locally restricting the ways that modules are permitted to combine, and (2) by using heuristic rules to determine the order in which structures are built, restrict the number of structures built, and/or restrict the number of structures that are maintained for consideration. It has been our experience so far that as we increase the number of structural modules, the increasing specificity of the modules offsets to some degree the combinatorial increase. Since we do not believe that this local restriction alone will adequately constrain the combinatorial explosion, we have begun work on the explicit use of heuristics to improve efficiency.

We have so far partially implemented two of the "syntactic strategies" discussed by Clark and Clark (1977) that relate to the separation of noun phrase and prepositional phrase processing from that of the entire input string. The explicit use of these strategies, with a look ahead for the end of a constituent, and a recursive call to the parser to deal only with the local constituent, has resulted in large reductions in the number of structures generated and processing time, and, in fact, may roughly reduce the problem to linear complexity.
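A sketch of this constituent-isolation idea follows. It is our own toy version: a determiner-triggered look-ahead stands in for the Clark and Clark strategies, and the non-constituent material is left unanalyzed:

```python
DETERMINERS = {"the", "a", "an"}  # toy trigger lexicon for the sketch

def find_np(words):
    """Look ahead for a determiner-initiated noun phrase; return its span."""
    for i, w in enumerate(words):
        if w in DETERMINERS:
            j = i + 1
            while j < len(words) and words[j] not in DETERMINERS:
                j += 1  # extend to the next determiner or the end
            return i, j
    return None

def parse(words):
    """Parse with noun-phrase isolation: the local constituent is handed
    to a recursive call, and its result re-enters as a single unit."""
    span = find_np(words)
    if span is None:
        return tuple(words)  # stand-in for the full module-matching parse
    i, j = span
    inner = parse(list(words[i + 1:j]))  # recursive call on the constituent
    return tuple(words[:i]) + ((words[i], inner),) + parse(list(words[j:]))

print(parse(["see", "the", "fire", "engine"]))
```

Because the interior of each noun phrase is parsed locally and only its result participates in later combination, the pairwise explosion is confined to much shorter spans.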
In addition, we are starting to develop several other strategies which are basically computational in their nature, although they too can be related to psychological considerations. One strategy, for instance, would maintain a window of recently developed structures in which one would look for the terms of a new input string. When a term is found in this structure, modules would be tried first that could replicate the structures in which the term was recently used.
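One possible shape for that window strategy is sketched below; the paper describes the idea only in prose, so the class, its size, and its lookup policy are entirely our own assumptions:

```python
# Keep a bounded window of recently built structures; when a new input
# term was seen recently, suggest the structure it last appeared in, so
# the modules that built that structure can be tried first.

from collections import deque

class RecencyWindow:
    def __init__(self, size=50):
        self.window = deque(maxlen=size)  # oldest entries fall out

    def add(self, term, structure):
        self.window.append((term, structure))

    def suggest(self, term):
        """Most recent structure this term appeared in, if any."""
        for seen_term, structure in reversed(self.window):
            if seen_term == term:
                return structure
        return None

w = RecencyWindow(size=3)
w.add("fire", ("fire", "engine"))
w.add("dog", (("fire", "engine"), "dog"))
print(w.suggest("fire"))  # ('fire', 'engine') -- try this pattern first
print(w.suggest("cat"))   # None -- no recent use to exploit
```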
4. STRATEGIES FOR INFORMATION RETRIEVAL

This parser promises to be useful for Information Retrieval for a variety of reasons. First, its speed would allow for either preprocessing of large amounts of full text or real time processing of full text surrounding retrieved key words for the purposes of estimating their relevance. Second, the formal simplicity of its output suggests a variety of heuristic strategies for estimating the relatedness between the uses of a term in a text and in a query. Third, the parser can handle incomplete or ungrammatical strings.

These factors, and the nature of the parser, suggest a variety of information retrieval strategies. Some of these make use only of the dependency and scope relations of the Moulton and Robinson theory, while others utilize some additional conventional syntactic information mediated by or made available through the dependency parser. Although it is expected that these strategies will prove useful in estimating the relevance of the use of terms in a query to the use of those terms in a text, thus improving precision, it is anticipated that some of these strategies will reject relevant documents. The empirical test of these costs, and the comparison to the costs of other means of improving precision, such as including additional query terms, awaits the implementation of a relatively complete version of this parser. Some of the general approaches we have been exploring and illustrations of their implementation follow.
In general, we are planning an Information Retrieval environment in which the user has available all the standard facilities (e.g., Boolean combinations of terms, stemming, etc.), but in addition has the facility to utilize a limited set of basic natural language structures, such as noun phrases (including prepositional phrases), simple sentences, and simple embedded clauses, to express the relations between terms (including adjectives and verbs). Such complex terms could themselves be treated as units by the conventional Information Retrieval processes.

4.1 Pattern Matching

Our original hope was that the formal simplicity of the Moulton and Robinson model would lend itself fairly directly to relatively simple pattern matching. Figure 3 illustrates the power of this approach. A query for "approximate string matching" would match "approximate matching of strings" perfectly in terms of the scope and dependency relations among the three terms.

Figure 3. Scope and dependency structures for the query "approximate string matching" and for the text strings "approximate matching of strings," "approximate matching," "matching approximations," and "approximate number of matching strings."
In addition, the string
"approximate matching" matches the query in terms of the scope and dependency relations among the two terms which are present.
Clearly,
this permits a more delicate specification of a query than does the simple use of Boolean operators.
Permitting the query to match text
with a subset of the query terms widens the recall of the query, as would adding an OR term, while insisting that the terms have the appropriate scope and dependency relations tends to restrict the matching to appropriate uses of the terms in the text.
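This kind of scope/dependency matching can be sketched as follows, reusing a tuple encoding of scope pairs, (left, right, dominant-side), that is our own illustration rather than the authors' implementation:

```python
def head(s):
    """Dominant content word of a structure (a word or a scope pair)."""
    if isinstance(s, str):
        return s
    left, right, dominant = s
    return head(left) if dominant == "L" else head(right)

def dep_pairs(s, pairs=None):
    """Collect (dependent-head, dominant-head) word pairs from a structure."""
    if pairs is None:
        pairs = set()
    if isinstance(s, tuple):
        left, right, dominant = s
        dom, sub = (left, right) if dominant == "L" else (right, left)
        pairs.add((head(sub), head(dom)))
        dep_pairs(left, pairs)
        dep_pairs(right, pairs)
    return pairs

def matches(query, text):
    """Match unless the text reverses a dependency the query asserts;
    query terms absent from the text are simply not checked, which
    gives the subset behavior described above."""
    q, t = dep_pairs(query), dep_pairs(text)
    return not any((dom, sub) in t for sub, dom in q)

# Query "approximate string matching": both modifiers depend on "matching".
query = ("approximate", ("string", "matching", "R"), "R")
print(matches(query, ("approximate", "matching", "R")))  # True: subset match
print(matches(query, ("matching", "approximate", "R")))  # False: reversed
```

With this behavior, "approximate matching" still matches the query, while a structure whose dependency is reversed, as in "matching approximations," is rejected.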
"Matching approximations"
and "approximate number of matching strings" are two examples of text that do not match the query in terms of scope and dependency.
The latter
is particularly interesting in that the terms of the query and text appear in identical order. The implementation of this approach is not without difficulties, however.
These are due principally to the fact that the
relationship between two terms is not only dependent on their structural relationship, but also on the semantics of the other terms involved in the structure.
We are exploring the possibility that the structural
relations of terms in a query can be matched against the relations of those terms in a text utilizing a combination of general pattern matching procedures and special word and word class specific rules. The preposition "by" illustrates this point.
The parser
treats all prepositions as dominant over the noun phrase portion of prepositional phrases.
This can be useful, for instance, in
distinguishing between direct objects, which are the most dominant term under a transitive verb, and indirect objects, which are parsed as modifying a preposition.
However, in a passive construction,
the most
dominant term in the sentence structure becomes the prepositional phrase whose head is the word "by."
To identify the head noun of such a construction as the head concept of the sentence, it is necessary either to attach special procedures to the word "by" or to modify the pattern matching procedures to look for reduced patterns consisting only of words of a particular class, especially nouns.
We are currently investigating
both approaches.

4.2 Indices of Relatedness

Rather than trying to match patterns directly, it is possible to derive summary estimates of the similarity or relatedness between pairs of query terms and pairs of terms in candidate documents.
A first pass at such an index would assign a high value when a pair appears in each with the same dependency relation, a low value when the dependency is reversed (e.g., "fire engines" vs. "engine fires"), and an intermediate score when the dependency relation in one or both is not determined. One refinement of this approach would make use of the hierarchical distance between the terms in the two structures.

4.3 Weighting the Importance of Terms

A variety of strategies involve using syntactic information to weight the importance of individual query terms found in the text.
One
particularly simple but promising strategy is to discount any noun term found to be dependent on another noun, unless the dominant noun is also a query term.
This strategy,
carried out within noun phrases, would
discount terms used only to modify other nouns, whether by noun noun modification or prepositional phrase.
A text which contained "fire
engine" but not "fire" would be unlikely to be directly relevant to a "fire" query.
Similarly, a text that contained "skyscrapers in Seattle,"
but not "Seattle" as a head of a noun phrase, would be unlikely to be related to a query which contained "Seattle" without "skyscrapers." This strategy can also be carried out on the sentential level.
At this level it has the effect of demanding that any query noun
be found as the dominant noun
(e.g., underlying subject) in the text,
unless the dominant noun of the sentence is also a query term.
The assumption here, of course, is that a relevant document is likely to mention query terms as the topics of sentences.

4.4 Isolation of Key Portions of Text

Variants of the previous approach isolate for consideration only the more important portions of text.
Such strategies might, for instance, ignore key words found in embedded clauses (unless the query included embedded clauses), or ignore any but the two most dominant nouns in a sentence. (These nouns are typically the underlying subject and direct object.)

4.5 Simple Sentences

The previous strategies have focused essentially on the relations among nouns in a text.
Some variants of these approaches also allow for the graceful use of adjectives, since they can be considered in relation to the particular noun on which they are dependent.
The
dependency parser also allows one to utilize the relationship between two
(or more) nouns specified by a verb.
One might, for example, be interested in retrieving possible examples of attacks by Poland on Germany, without retrieving the numerous documents concerning the reverse. The parser produces very similar underlying dependency structures not only for active and passive sentences, but also for active and passive nominalizations of these concepts, such as "Poland's attack on Germany" or "the attack by Poland on Germany." In each case, "Poland" is the dominant content word of the construction, "attack" is dependent on "Poland," and "Germany" is dependent on "attack." Rather than querying with a Boolean combination of "Poland," "Germany" and "attack," which would not specify the nature of the relationships among these concepts, one would query with the simple sentence "Poland attacks Germany." Of course verbs introduce new problems of synonymy and paraphrase; however, as this example may illustrate, these problems may be less severe than those one may face in the absence of verb specification.

4.6 Cross Sentence Relations

The Moulton and Robinson theory offers no structural analysis of connected discourse in the sense of a hierarchical description of the constituent structure of large units of text. Nor does it address pronoun reference. It is, however, possible to overlay the descriptions of individual sentences when they contain overlapping nouns. These overlapping structural descriptions can provide dependency information concerning the relations between terms in separate sentences. Although intersentence relations may be harder to specify than intrasentence relations, they may nonetheless prove useful, for instance in conjunction
with strategy 4.2.

REFERENCES

Clark, H. & Clark, E. (1977). Psychology and Language. New York: Harcourt Brace Jovanovich.

Gruenewald, P. J. (1981). SCRYP, the syntax crystal parser: a computer implementation. In Moulton and Robinson.

Hudson, R. A. (1976). Arguments for a Non-Transformational Grammar. Chicago: University of Chicago Press.

McGill, M., Koll, M. & Noreault, T. (1979). An Evaluation of Factors Affecting Document Ranking by Information Retrieval Systems. Final Report to the National Science Foundation, NSF-IST-78-10454.

Moulton, J. & Robinson, G. (1981). The Organization of Language. New York: Cambridge University Press.