based grammar, using a graph-unification-based representation language, using ...... advanced here in favor of a best-first enumeration order would lose force.
From: AAAI-86 Proceedings. Copyright ©1986, AAAI (www.aaai.org). All rights reserved. A PARSER FOR PORTABLE NL INTERFACES USING GRAPH-UNIFICATION-BASED GRAMMARS
Kent Wittenburg
and Computer
Microelectronics
may
Abstract This
paper
presents
of a parser
the reasoning
for the Lingo
the selection
and
that
uses
by
a best-first
flexibility
language
interfaces
on natural
in the selection a syntactically
allowing
grammars exhaustive
control
for these for
best-first
in particular enumeration
development.
at
of the parsing algorithm based grammar, using
a number
a
other
on an agenda.
processing
of
advantages
representation
of
managed
language
tuning
It
applications
parsing
of
this
languages
advantages
that
advantages
for
particular
choice
for
are outlined,
acrue
to
graph-
as well
this
approach
based greater enhance
this
short
greater
design
relatively
to new applications,
sophistication
required
but
also,
for interfaces
of the future.
choice,
the
next
step
for grammar representation and itself. The representation language
unification-based formalism, (1984) and Shieber (1984). languages have had a
tremendous
computational
and
linguistics,
for
constrained domains, offer a modularity in design that in the long run. Not only does the
applications
first
run
highly
transportability the
to knowledge-based Given
the in
grammars payoffs
possible
formalism grammar
in
systems
modularity it makes
domains while at the same time allowing of the search space during grammar
Efficiency
unification-based
structure
natural
offer
syntactically will achieve
design
graph-unification-based representation language, using Combinatory Categorial Grammars, and adopting a one-to-many mapping from syntactic bracketings to semantic representations in certain cases. The algorithm chosen is a variant of chart parsing offers
Corporation
unsophisticated
behind
project
MCC. The major factors were the choices of having
Technology
was
choose
to
a
an approach to the chosen was a graph-
in particular, one based on Karttunen Graph-unification-based representation impact
in fact
have
in been
the
field
of
incorporated
in
one form or another into a number of influential linguistic theories (see Shieber 1986 for discussion). Among the many advantages of
as
a
by
requiring
graph-unification-based
formalism
no special
training
for
are
(i)
grammar
it
is
writers
easy
with
to
use,
linguistics
virtue of its use of an agenda as a control structure. We also mention two useful refinements to the basic best-first chart parsing algorithm that have been implemented in the Lingo
backgrounds; (ii) it is a language separable from any particular machine-dependent implementation and thus amenable to optimizations at many levels; (iii) it avoids the typical explosion of
project.
ad
hoc
procedural
declarative variety
1. Introduction In designing the
first
used
a portable
crucial
in the
Existing there
NL
those
to
(NL)
can
(Martin,
a semantically
interface,
require
it be syntactically
of the
one
based,
along general
Appelt,
and
Pereira
based
grammar
lines:
grammar 1983),
particular
of
versus to
grammatical
other
things
to generate
Our
based.
these
the
choice
Categorial Though this
for
an
untested
approach
to
(iii)
order
partially
grammar domain. offer the
must The
advantage
for
new
each
and
be rewritten syntactically domain
and
Robustness
in
of parsing
systems which, along to separately; though initial design entwined with One
of the
language grammar.
the
able
to avoid
achieving
first
rehacking
a greater
syntactic
in the
the
level
variations
is a component
grammar
of generality of
the
decisions was
in the
to go with
It was felt that,
although
come
MCC
for
Lingo
a general,
grammar,
and the
being
flexible, (v)
it is order
same
a purely
accommodating free,
grammar
a which
rules
can
be
input.
and
extendable dealing
base,
was
was
1982,
applications particularly
Combinatory
Steedman
1985).
to date,
we felt
promising
to empty
it
lexical
relative
in
the
it handles English extraction phenomena clauses, etc.) efficiently and elegantly
offers
rewrite
rules
a method
free
word (iv)
it
ambiguity
it is particularly suitable designs; and (vi) it is able rule
grammar
or complicated
(ii) it accounts for more of English approaches without special rules
formalism; with
the
Steedman
language
grammar
resorting
passing schemes; than alternative
to and
of
accounting
order
with
suggests
and
heuristic
for left-to-right, to accomplish
feature
coordination or ad hoc for
free
a simple
and
word easily
natural
techniques
for
rule
preferencing;
(v)
incremental all this with
processing a very small
to the alternatives.
syntactically-based
with semantics generally, must be attended robustness is obviously not precluded by this
choice, it does not the grammar itself.
interfaces
the
from scratch, in general, for each new based grammars, on the other hand,
of being
sophistication
and
(Ades
respects: (i) relative (wh-questions,
operations;
coverage
the very
that
approach
following
suffer
syntactic
is
as to analyze.
in natural
generally
of
means
Grammar
withough
patchiness
in it
theories;
as well
domain, e.g., Plume (Hayes, Andersen, amd Safier 1985). The semantically based grammars offer customization of the entire robustness of parsing along domain system for the domain; However, they sensitive lines is an advantage usually cited. from
(iv)
of
grammar
or semantically
be classified
use a syntactically
TEAM
use
language
is whether
systems
that
e.g., that
that
interface
are those
English,
natural
decisions
system
of
among used
operations
language;
free
since
project
it
is
not
Given the
the
subject
three of this
design
decisions
paper,
parser for NL interface representation language
just
namely, applications have just
we
mentioned,
the
selection
using the mentioned.
we now come and
design
grammars
and
to
of a the
on natural
syntactically-based
semantically-based
grammars
NATURAL
LANGUAGE
/
1053
2. Charts First
let
criteria,
us
ask
independently following
of
what the
desiderata
we
choices
should
should
expect
related
to
speak
from
the
the
parser The
grammar.
for themselves:
structure
makes
it
possible to develop sophisticated grammar development that the state of a parse can be examined at any point.
the
tools
so
agendas
is
Another
-
presence
point
its relation A formally sound ensure termination,
l
basis for the completeness,
algorithm in order and tractability.
to
for
debugging The
l
facilities
ability
domains systems The
l
development
grammar
inspection
and
to tune
particular
right-left,
or
language
input.
grammars
potential
to
into
integrate
for
and
for
purposes
to any
heuristic
parsing
middle-out in
in
this
the
area
effect
paths
with
if necessary.
algorithm
able
the
respect
VS.
Partial
and
an
of and
can
be
attention
can
be
time,
a property
at any
agenda
processing
of
it is hard
any
heuristics
to suspend
to control
natural completely
agenda,
possibility
Thus
with
is
as appropriate
of the chart
of being
left-right,
processing
respect
on
as long
with
proceeds
In fact, can be designed. any mix of these orders;
the
later
chart
the control
ordering
arbitrary
promising
general
of
of
can be achieved
produces
processing
contextual
the
version
the
control
about
of whether
Enumeration
operations to allow
which less
factors
by
directed
applied
making
some
designs
agenda designed
in particular
efficient
semantics
preferencing
these
steps.
of parsing
such that prototypes can be developed.
factors
maximize
that
worth
to questions
controlled Modes
l
of an inspectable
on
resuming
such
to imagine
a more
decisions.
tuning. A
l
design
credit
that
individual
principle
allows
schemes
performance
to
maximum
place chart
for
adaptation
to
flexibility
and
formal
future
adaptation
to
soundness,
the
extolled
by
Kay
most
and
others
and
pieces
of evidence
One
of the
their have
widespread adoption. Charts, figured prominently in important
persuasive
from the computer natural language
science
systems,
e.g., Slocum
not
reason
obvious
some variant of of charts have be repeated
here.
in favor
e.g.,
Earley
is that
(1970),
in
e.g., Kaplan (1973), Bear and (1981), Ford, Bresnan, and Kaplan Patil (1981), Shieber (1985), and in (1981).
through
whatever
reason,
in
published
most
designs, have
despite
written
that
of the
possibility
order
in
driven
chart
applied
However,
with
fact
for
existing
a Wizard-of-Oz
operations agenda
to
the
(1973)
to control
ways.
other
control
Kaplan
the
Research
hand,
seems
to
into have
concerns (e.g., Ford, by the need to develop
the
needs
of NL interface
Retaining
shortly the
without
maximum having
grammars of ‘the
agenda
are (i) as a means
development,
an agenda
The advantages of having a parser go well beyond to scrap
is a major relative
structure. of integrating
flexibility
ease Other
an agenda matters of for
or radically
advantage. of adding possible
semantics
and
later alter
We will meta-level uses of an contextual
And,
to
complete
our
list
/ ENGINEERING
It seems safe to say that no added to the parser, and Martin,
are
interface
Our
choices
relating
approach
to
add
matter Church,
the
representation
language
what and
and
further weight to this argument It is a commonly acknowledged
enumeration.
unification of mostly because
graphs tends to be expensive unification is destructive of the
the
against fact that
computationally, graphs on which
it
1984). The expense is particularly is called (see, e.g., Karttunen evident in chart parsing because, by definition, one needs to retain edge
graphs
create
in
new
their
edge
original
graphs
state
that
before
reflect
unification
the
result
originals. Optimizations of the unification been suggested as a means for coping with 1985;
Karttunen
and
obviously
welcome,
calls
unification
to
As
While
place
to cut
in
detailed
interpretations is that
rule
firings
that which
place.*
(1985), extraction is the
identical
lead
to
optimizations
are the
complete
a
consequence certain
potential
for
Another
through
identical
Grammars
also
is inappropriate.
and
content.
paths the
as the
is to minimize Avoiding
Categorial
multiple
unifying
line.
in parsing
have are
such costs
enumeration
grammars
as well
of
algorithm itself have this problem (Pereira
Combinatory
Wittenburg
these
there
first
in this
handling
in
this
of
in for
the
step
exhaustive
mechanism coordination
paths,
1985).
properties that
is
Kay
another
is a first
Certain suggest
the
solution.
of
the
kinds
of
ambiguous way
to say
search space of The obvious
method to use in such a domain avoids enumeration of all since there is no obvious reason to do so. This general plan
* It is relevant
has been
1054
Patil
application.
search
of
and
language
improvements
others.
Church,
exhaustive enumeration in the face of such data lead to satisfactory performance for a natural
by
among
Martin,
Patil added many, will probably not
unification
(1980),
experiment,
coverage
for certain kinds a corpus collected
product.
optimizations
factors into the scoring; (ii) as a framework for credit assignment schemes to automate adaptation to individual performance situations; and (iii) as the control mechanism for parallel processing of parsing steps, a natural use of agendas pointed out Kay
of extensive
each
enumeration
to a parser
example
an agenda
flexible
on
also
systems.
code or existing
see an
and
by psycholinguistic 1982), not specifically
efficiency.
adjustments
breadth-first
(1980)
of using
maximally
is in fact the key ingredient. as a control mechanism for increased
exhaustive Kay
parsers,
been mainly driven Bresnan, and Kaplan efficient
seems
work
the
enumeration agenda
there
chart parsing in the attention in the more in chart parsing For to be an association of chart parsing
grammars
found 958 parses for the sentence In as much as allocating is u tough job I would like to have the total costs related to
grammar
despite the popularity of there has been relatively little quarters to the role of agendas
based
amounts of syntactic ambiguity For example, working with
(1981) costs
exhaustive However, literature, theoretical
syntactically
admit staggering of constructions.
of charts is or something very similar, theoretical work on parsing
perspective,
research, Karttunen (1979), Thompson (1982), Martin, Church, and applied
will
most
Enumeration
While many chart-parsing algorithms use control schemes that exhaustively enumerate the search space, there are a number of reasons why exhaustive enumeration is unsuitable for NL interface applications given the choices mentioned. The most obvious
situations.
to begin in constructing a parser is with parsing (Kay 1980). The many advantages
been
2.1. Exhaustive
incorporating
automate
A design that does not preclude parallel processing schemes.
l
For
in
assignment
to report,
using
in parsing
(at least
based
structure
on unpublished
sharing
performance.
temporarily)
shelved
work
by David
Wroblewski,
not yield the expected may In fact, unification with structure in the Lingo
project.
that
dramatic sharing
has
been
with
embraced
in the
theoretical
formal
devices
utilized
the
Grammar.
It
strategies which
has
been
context;
has no reason There
is
one that
so
as
interpretation. by Church
in
Among
parse,
mapping in the
avoiding
one
and
Marcus,
assume grammar
that
that are
and
For more
if we
generating all reason we can than
the
the
of
reached
from syntactic bracketings case of extraposition from
once we the same
mentioned
by
one
only
member
one
l
there
Despite
all these
to noun
have
one
phrase
interpretation
or
that
against
there
are
a grammar
of
happens
have
such
for
l
under
toggled
such
between
arbitrary
points
controlled
chart
it need
not
involve
The
to be explored. during
be
It is very
circumstances.
exhaustive in between, parsing
Thus
and
The
is the
offers
this
sort
we can
that
important
to so
that
ask
On
The
can
be
as well
as
for.
Agenda
first
algorithm
(Earley
ordering
schemes
associated
for exhaustive string during itself
is ill-suited
have
been
to
get
some
partial
(Slocum
for
anything
along
with
with
chart
parsing,
efforts Given
distinguish,
then,
suggestions
in the
with
our
what with
other
breadth-
in such
breadth-first
motivation
to
be
for heuristically some
form
for
there a way
proposed ordering
of breadth-first
as
ordering keeping
rule search should
from
other
in Nilsson
of
all,
sense
has,
of
Nilsson
under
of merging
i.e.,
certain
nodes
alternate
the
paths
are
in the
through
of little
search
the
importance;
algorithm
is not
of the
algorithm,
optimality
the
of the total number of graph that are expanded,
a
nodes will
in the have a
on efficiency.
actions
(1980).
appearing
on a chart
can be looked
at
to take.**
We
chart parsing, then, graph-searching procedure
now
turn
3. The Best-First
to an overview
allow a presented
of the
algorithm
Algorithm
but
that
could
purpose
here is not to present the best-first chart parsing Readers may consult the original sources or in depth. (1986b) for detail in this respect. What I wish to do is certain features of the algorithm that help achieve the
outlined
The
parser
reflection Nilsson, two
above.
of and
enumeration
this
importance
project
Raphael
1968)
has of
been the
dubbed
A*
in its design.
previously
published
Astro,
search The
chart
which
algorithm algorithm
parsing
is a (Hart,
we use is
algorithms
respects.
or unary
We assume a grammar whose rules have only daughters. Such a simplification is a consequence
in
binary of the
** but
to
for the than
Intuitively,
continue
first
the
itself.
been
just
parse,
of
These characteristics of simplification of the general
a collection What is of could return
best
effect
solution
hand,
bearing
less general
(e.g., Robinson 1982, Heidorn 1982, Slocum 1984). maximum benefit for our purposes is a design that the
graph
can be irrevocable,
database
of
to the
other
possible
goals
respect to some mechanism by
even restricted breadth-first We of the alternatives. about
as
set of edges
algorithm Wittenburg highlight
is designed
enumeration,
grammars basically
additional
is
literature
achieved
exhaustive
at partitioning
enumeration
1984).
grammar with such a control
but
firings to a minimum, however, is less desirable than some
of parses
1970),
enumeration of the parsing. Although
or
of flexibility.
Order
Earley
in
scheme
implicit
lengths
graph
the
The
My 2.2. Enumeration
as a standard
graph).
search algorithm; the as the closed nodes in a best-first other data structure we need is one to keep track of open nodes, which can be viewed as an agenda of
exhaustive
enumeration,
best
is of
backtracking.
itself the
relative
strong
development
a parser
minimal
search
is
a
enumeration Assuming
grammar
chart
control
i.e., a measure complete search
fails in an application (whether will probably arise in which all
an event
Best-first
is
choice
those
to
using
mode.
in general
attractive
search.
commutative
the
in which,
exhaustive
development
when parsing circumstances
&ill
using
is
thus
thus admissability strong factor.
An example
attachment
arguments
a more
scheme.
to an and/or
system
search
within
that undesirable rule interactions can be eliminated. Also it is important to take note of the performance characteristics of the parser
l
analysis
more
path.
that
graph
(1980);
l
arguments
space
be
is a depth-first,
of charts
search graph appropriately; thus, there may be no need to check whether newly added nodes have already been generated in the search graph.
one. The is different
syntactic
by a different
deems
to continue
assuming
some
backtracking,
can be represented
conditions,
would yield the full set of attachments of view while “low” attachment would
mode,
anticipated
are
to
l
set.
there will be times “hard” or “soft”), the search
may
of prepositional
semantic
in an application enumeration in
We
processing
order
a strength
control
of heuristic search
The
for
interpretation
no reason
have reached the first solution in these cases
are reachable
of the first
a semantic
we have
assigned
“high” attachment the semantic point
yield
reach
above.
path,
that
be a treatment
say, from
to then
interpretations
interpretations would
able
one path,
paths reach
reason
set
are
than
semantic
breadth-first
since
require
(as opposed
Fleck
are advanced
to
but
“best-first”
The
l
Rich et al. (1986) g ive an account of how such ideas can be extended to a variety of syntactic constructions and to the semantic representation itself. Given this picture, we again have a case of a search domain in which it is inappropriate to generate all through
do not
perspective
phrases.
paths.
subsequent
alternatives design,
they
semantic in work
Hindle,
arguments
if
parses
course a well-known paradigm in the A.I. literature. A number of interesting observations can be made about chart parsing from the
exhaustive
Let us by the
with respect to previously surfaced
various
the
backtracking
a given
a so-called
admitted
(1983),
(1986a)
fire
other
heuristic
one can.
be ambiguous general idea has
Pereira
enumerate it necessary.
to control
to a successful
analyses
one-to-many interpretations
semantic
forward
use
in order to
syntactic to This
dealing Categorial
should
appropriate
motivation for be mentioned.
(1980),
a
one
grammars
further should
In Wittenburg
(1983).
most
as one proceeds
individual
designed
are
literature
Combinatory
that
these
to fire all the rules
enumeration certain
with
rules
as long
using
suggested
in processing grammar
linguistics in
exercised; which
viewed
closed open
have
not
as closed
yet
nodes
discussed
here,
successors
of that
nodes
nodes
an edge edge
in a heuristic
are options been
must
have
The
exercised. be understood
is placed
search
which
on the
are options
generated
observation
in light chart
algorithm been
that
of the fact
concurrent
with
during chart
that the
that
edges
in the
have
the search can
be
algorithm
expansion
of all
on the agenda.
NATURAL
LANGUAGE
!
1055
grammars
being
inherent
used
restriction
parsing algorithms right hand sides technique
the
to
MCC
chart
generalize arbitrary
Lingo
to
Earley
(1970).
the
algorithm
already binary general
Most that using
grammars length by
by
Categorial
project,
parsing.
for complicating
Combinatory
Grammars;
an
Lingo
project;
published chart have rules with the dotted rule
and
there
are no penalties,
However, in this
the effect
not
there
way
of dotted
The the
The
second
to
formal
operations
point
previously
about
this
published such
as
chart
ones
parsing
is that
is in
transformations
algorithm
it or
does
with
not
register
maintence
aspects
upon
follows
the basic
permit
setting
based
on
heuristic
function
to
Developing trial, and
the error,
and
the
start
as a basis
of Nilsson
for the
order
*agenda*
2. Initialize
*chart*.
3. LOOP:
if *agenda*
as
main
loop;
of the
the
best
action
5. If *best-edge*
from
the
and apply edge added
remove
it
the
terminating
conditions,
is non-nil,
the set of M on the
generate
Install
the members
a heuristic
ordering
A4 of
edge
successors
generate and
also
step,
in chart
which
see
if
a major
achieves formalisms appears
initialization.
between checking to
We make a rule
call
can be
in step is
6 of
chart,
leading
while actually applying highest ranked action
to The
to augmentations
a rule on the
on the
agenda
call is done only when invoking agenda. Given this distinction,
only, the we
can make make use of an optimized test function for checking edge leaving actual unification with its graphs for rule application, Since the latter destructive effects to applying a rule call. operation is invoked much less often than the generate successors step, there are unification-based themselves
for
major savings grammars, the
test
in at
function.
unification costs. For graph least three options present First
is
Shieber’s
notion
of
restricted unification (Shieber 1985). Second, Karttunen (1986) has suggested a scheme where the destructive effects of unification can be undone. Last, a more “porous”, but more efficient check-graphs function could be used that is nondestructive of its The last of these options is the one used in the graph arguments.
1056
node
heuristic expansion.
some
general
factors
that
are a
function. any rule
to the
call is the span
chart
if the
rule
fires
from the In general,
be favored.***
content
in the a role. lexical
for designing that
word
others
heuristics
tend
corpora. and
take will
in a rule
advantage
be
of
statistically
domains
of discourse.
to become
experience Automated
gathering
involved
that
senses
in certain
will
through
and
edges
Differentiating among the sense assignments is one
heuristics
some
than
such
make
/ ENGINEERING
mention
parsing
here
algorithm.
much
more
refined
with particular grammars, techniques for statistical
heuristic
refinement
are
always
a
used
the generate
to optimize
the
grammar
in this
basic for
way
successors
also
allows
step.
The
for further
partitioning
heuristic
of
control
actions.
4.1. Avoiding Many chart grammars
of two useful refinements to the The first introduces a technique
for unary rules more restrictive, thus **a** Also we useless unary rule applications. to partition the rules of the grammar that is
environments
avoiding ultimately review a technique
of the parser’s
a critical likely
and actually applying a rule to a set of edges. operation is a part of generating the successors of a new
on the
given
a
4. Refinements
chart
Generating Successors that The critical feature of this algorithm efficiency advantage for graph-unification-based
succeed checking
a
algorithms
of
scheme.
3.2.
loop,
of
all
choice
its
*agenda*,
making
the main
the
in scoring
should
sophisticated
We
distinction
as with
is
possibility.
*best-edge*
in the
factor
likely
lexicons, information
exit
7. Go LOOP.
found
once they
that are more likely to lead to success should higher intrinsic scores, and play a role in **** a rule call.
fact
Naturally, and
satisfies
successors. following
we mention
to be added
linguistic
more
the action, Set *bestto the *chart*, if any;
algorithm,
search,
a heuristic
edge
method
successfully. 6. If
Here
spans
The
failure.
*agenda*,
successfully
4.
successors
call should also play scores of ambiguous
from the *agenda*, edge* to the new else, NIL.
to apply
because
in performance,
The rules be given
l
it
(1980:64).
exit with
our algorithm
downgrade
successfully. This span can be computed spans of the daughter edges in the rule call.
the 4. Select
the
An important
l
l
is empty,
given
right set of heuristics is the product of intuition, and depends upon the particulars of the grammar
domain.
scoring 1. Initialize
6 fail
in step
graph
for designing
of the system.
suffices
organization
in step
iteration
choice a slight
3.3. The heuristic function Critical to the success of this
longer 3.1. The Main Loop The following procedure
except
generated
are chosen
for backtracking. In was done by Kaplan (1973), nor is it designed our experience, these simplifications allow more flexibility in some of the agenda
calls
the best
using
rules
remainder of the grammar consists of unary rules, effect of altering the complex categories in some
respect
if rule
it seems
is no
when
achieved in the categories of such grammars. The rules that are needed are a small, fixed number of combination rules, operating over these complex
categories. which have way.
of
introduced
motivation fact only very
in
due
useless parsing
make
unary rule proposals algorithms developed
for
context-free
use of relations
in the grammar which can prune collecting this sort of (1980) mentions
the total search space. reachability information Calls to unary actions.
Kay into tables which help control parsing rule productions are an obvious candidate
to
from the search space for Combinatory It is a general fact about unary rules that
attempt
to
Categorial
prune
Grammars.
*et Note unary
that
this
scheme
by itself
will tend
to favor
binary
rule
applications
over
recover
from
rule applications.
**** Among ungrammatical
the input.
rules
may
These,
be
some
in general,
whose
will receive
function a lower
is
to
priority.
***** agenda
This
augmentation
items
involving
also unary
helps
rule calls.
to establish
a heuristic
See Wittenburg
scoring
(1986b)
technique
for details.
for
they
tend
side
of
to be overly the
usefulness call
succeed
often
side. effect
a
out
rule
an
now
typically, by
to combine
edges
like
this
of
with
in chart
operations
more
under
consideration
be taken
the
itself
right-handthe
grammar
ultimate
Even when a unary edge to the chart,
additional
to be unable
subsequent
must
i.e.,
measure
application.
superfluous
of making edge
poor
in adding
turns
Adding
new
promiscuous,
is
of a unary
can
edge
rule
edges
Thus
rule that
on either
parsing
has
expensive
the
since
unary
rules. we
follows:
A relation call
node
exists
some
branching exhaustive sisters shown.
the
that
has
been
e&ended
sister
A is an extended
sister
node
(Y that
dominator
dominator
of B. In the
of A are
shown
of node
of A
as all
and
B if and
only
if there
p where
cr is a non-
@ is a non-branching
derivation
B’s that
tree
can
stand
below,
extended
in the
relation
/\/\ P I =
Q I =
P I =
I B
I A
I B
different
weights
the checking
to partitions
of subsets
as a whole.
of rules
that
are less
likely or perhaps whose checking is more expensive than other subsets. Also, it is possible to define a single check for an entire grammar
partition
consideration
that
in one fell swoop
of any of those
rules
can eliminate
the further
for the edge sets in question.
this
to be useful in this It is defined as
of node
is a sister
exhaustive
found relation.
assigning
in all generate
What is successors procedures involving any adjacent edges. called for then is the computation of some grammar relation that can be used to further restrict the conditions for applications of regard
by
we can postpone
5. Evaluation Since
the
critical
makeup
role
difficult
and further
in
of the
a parser
to evaluate
when chart
heuristic based
and different some of the
grammars arguments
enumeration
order application,
language,
Combinatory
heuristic
itself
plays
graph
search,
of the design
non-heuristically use different
than the advanced
would
interface
function
on
the efficiency
compared to other parsing systems that
study a
it
is
in the general
case
based designs. For representation languages
ones used here in
lose
such
force.
in the Lingo project, favor of a best-first
However,
a graph-unification-based Categorial Grammars,
given
an
NL
representation a many-to-one
and
mapping from syntactic bracketings to semantic representations, it is hard to imagine that any radically different alternative could beat the parsing program we have outlined here on the grounds of efficiency, indicates first
clarity, and flexibility. Experience in the Lingo an overwhelming performance improvement with
parsing
design
breadth-first
when
design
compared
that
to a non-selective, in
figured
our
original
and
refinement
project a best-
bottom-up, bootstrapping
operation. The
extended
precompute
sister
a grammar
the left-hand-sides must
of
(1986b),
that
unary
successors match
hand-side
is used
table
of each
unary-rule-call edge
relation
one
unary
rule.
a consequence
holds
rule.
of this
following
the
extended
is that
the
sisters
some
unary
4.2. Mets An
rule
calls
Agenda
additional
which
involve
for
sisters out in
for
adjacent
of the leftWittenburg
generate
successors
step now has to consider as successors of a given edge those unary rule calls which apply to the edge directly, those
We
condition
is now that
of the extended As pointed move
way.
An additional
of a new edge
at least
that
in the
not but
the edge as an extended
just also
sister.
Items
augmentation
that
we have
implemented
involves
adding
Current
research methods
as
changes
involving
the
tuning
of grammars
of the research
inspectable includes
buffering focused
approximations
the
rules
in
in
the
grammar,
It is also possible
that generates these basic agenda these rule calls. This augmentation iteration
of checking
the
edges
into n steps corresponding rules into n cells.
weeding
to define
out
a meta
all
rules
agenda
item
items, i.e., that itself is a way of breaking
with
respect
to a partitioning
to the whole
could
lead
[i, P, E’], cell of the new
edge
added.
where
i is a heuristic
grammar and
its
Exercising
rules, adjacent
rule
calls
added
edges
an agenda
the edges in E’ with respect only. Any rules which then to
the
score
item
at
effect that in the items of this new will have the form P is a partition
assignment,
E’ is a triple
and
the
of this
time form
that
represents
the
new
involves
edge
the was
checking
to the grammar rules of partition pass these further tests will result agenda
of
a
type
that
we
to
of
We are using
particular
corpora
of heuristic further
design
a set of tools which
take
of the chart. designing credit
for the
advantage Longer-term assignment
to specific performance situations. of incorporating some form of
agenda structure on local areas
so that attention of the chart.
can be Initial
that such a factor would improve the which tends to be best when considered in than globally. Also, left-to-right buffering
cascading an
of real-time
work
reported
design important
feeding step
into semantic and towards any future
parsing.
also
Rich
made
offered
Wroblewski including
graph
unification
also
wish
and
advice
the
Lingo
extensive
valuable
David work, interface
on here I gratefully
contributions
has also the coding and
to the
the
parser
to thank
involved many others at MCC acknowledge the contribution of In particular, team to this work.
comments
Lauri
on parsing
on
to the research earlier
drafts
of
itself this
and
paper.
made significant contributions to this of structure-sharing algorithms for
design
and
and
coding
grammar
Karttunen
of the window
development for
system
environment.
his long-standing
support
matters.
P in
References
assumed
previously.
1. Ades, Words.
The advantages of incorporating are that it is possible to heuristically
a
besides the author. other members of Elaine
Adding this sort of agenda item has the generate successors step, we produce agenda type at the first pass. These new agenda items
consideration
Acknowledgements The
grammar
for
components,
generates apart the
of the set of grammar
a
control structure the possibility of
indicates experience reliability of heuristics, a local fashion rather
against
all
as
agenda.
within the incrementally
pragmatic
fail the test.
evaluation
well
schemes to automatically adapt We also count the possibility
a new type of agenda item. In the basic best-first algorithm, agenda items are made up of rule calls over a set of edges. These agenda items are generated by checking these edges which
involves
scoring
A.
and
M.
Linguistics
Steedman.
1982.
and Philosophy
On
the
Order
of
4: 517-558.
this new type of agenda item order the iteration over the
NATURAL
LANGUAGE
/
1057
I
2. Bear,
J.,
Phrase
and
L.
Karttunen.
Structure
Parser.
1979. Texas
PSG:
A
Linguistic
17. Nilsson,
Simple
Forum
15:
Palo
N.
Alto,
1980. Ca.:
Principles
of Artificial
Intelligence.
Tioga.
l-46. 18. Pereira, 3. Church,
K.
1980.
On Memory MIT Processing.
Language [Available Club.]
through
4. Earley,
J.
the
in Natural Dissertation.
University
Linguistics
Indiana
An
1970.
Algorithm.
Limitations Doctoral
Efficient
Communications
Context-Free of the ACM
F.
note
Parsing
and
for the Heuristic
Paths.
IEEE
7. Hayes, P., Caseframe
B. Raphael.
Determination
Transactions
1968.
E.,
J.
of the
Computational
Barnett,
K.
of Minimum
on SSC
23rd
Meeting
Linguistics,
J.
8. Heidorn,
G.
Computed Proceedings
Wittenburg,
and
22. Shieber.
of the Association
Experience
for
S. 1984.
for Linguistic pp. 362-366.
an
Easily
Metric for Ranking Alternative Parses. of the 20th Meeting of the Association
Computational
Linguistics,
23. Shieber,
R.
193-241.
A
1973.
Rustin
(ed.),
pp. 82-84.
New York:
10. Karttunen,
General
Natural
Processor.
24. Shieber,
pp.
Algorithmics.
L. 1986.
HUG:
for unification-based International
A development
grammars.
and CSLI,
Stanford
ms.,
Association
for
Computational
SRI
M.
Sharing Meeting
Linguistics,
Algorithm
1980.
Schemata
in Syntactic Processing. Xerox Center, tech report no. CSL-80-12.
J.
M., D, Hindle, about Talking
21st
Annual
Computational 15. Martin, P., Transportability Interface
and M. Fleck. about Trees.
Meeting
Linguistics, D.
of
pp.
and
Data
Palo
Alto
In Proceedings
16. Martin, W., K. Church, of a Preliminary Analysis Algorithm: Theoretical and tech
report
Association
for
pp. 129-136.
Appelt, and F. Pereira. 1983. and Generality in a Natural-Language
System.
of IJCAI,
/ ENGINEERING
pp.574-581.
and R. Patil. Breadth-First Experimental
no. MIT/LCS/TR-261.
An Introduction
In
of the Association
for
to Unification-Based Chicago:
University
of
forthcoming. 1984.
METAL: The Paper presented
LRC Machine at the ISSCO
Translation, April 2-6, 1984, [Working paper no. LRC-84-2,
Research
26. Steedman, M. the Grammar
Center,
University
1985. Dependency of Dutch and
of
Texas
at
and Coordination in Language English.
Chart H. 1981. in PSG. In Proceedings
27. Thompson, Schemata Meeting
1981. Parsing Results.
Parsing
and
of the
Association
for
19th
Rule Annual
Computational
pp. 167-172.
28. Wittenburg, Categorial
tech
the
of
presented
1983. D-Theory: In Proceedings of
the
Parsing
Formalisms.
.]
Society 14. Marcus, Talking
to Extend
pp. 145-152.
Grammar.
Press,
Linguistics,
Structures Research
Meeting
Language of Coling84, Linguistics.
61:523-568.
133-136. 13. Kay,
to
Linguistics
University.
L., and M. Kay. 1985. Structure In Proceedings of the 23rd Trees.
12. Karttunen, with Binary
23rd
Translation System. Tutorial on Machine Lugano, Switzerland.
environment
Unpublished
of a Computer
Restriction
Linguistics,
S. 1986.
25. Slocum,
Austin
11. Karttunen,
for
In
Processing,
L. 1984. Features and Values. Proceedings 28-33. Association for Computational pp.
of Coling, Linguistics.
of the
Syntactic
25:27-37.
Design
Using
of the
Approaches
Language
Grammar
for Complex-Feature-Based
Computational
In for
The
S. 1985.
Chicago 9. Kaplan,
A
of the ACM
Information. Proceedings Association for Computational
Algorithms
with
DIAGRAM:
Cost
4:100-107.
pp. 153-160.
1982.
1982.
Communications
Proceedings
1058
Analysis.
Corporation.
Dialogues.
A Formal
P. Andersen, and S. Safier. 1985. Semantic Parsing and Syntactic Generality. In
Proceedings
MIT
Language
SRI International.
G. Whittemore. 1986. Ambiguity and Procrastination in NL Interfaces. Technical report no. HI-073-86, Microelectronics and Computer Technology
21. Robinson, P., N. Nilsson,
the
Center,
MIT.
6. Hart,
R.
for Natural
13:94-102. 20. Rich,
Basis
Logic
275, A.I.
19. Pereira, F. 1985. A Structure-Sharing Representation for Unification-Based Grammar Formalisms. In Proceedings of the 23rd Meeting of the Association for Computational Linguistics, pp. 137-144.
5. Ford, M., J. Bresnan, and R. Kaplan. 1982. A Competence-Based Theory of Syntactic Closure. In The Mental Representation of J. Bresnan (ed.), Cambridge, Grammatical Relations, pp. 727-796. Mass.:
1983.
Technical
K. 1985. Grammars at
the
Some Properties of Combinatory of Relevance to Parsing. Paper
Annual
of America,
report
K. 1986a. To appear Anaphora. Volume 20: Discontinuous
30. Wittenburg,
[MCC
tech
K.
1986b.
Technical Search. Microelectronics Corporation.
of
the
27-30,
Linguistics
Seattle.
[MCC
no. HI-012-86.1
29. Wittenburg,
Academic.
Meeting
December
Extraposition
report
and
from
in Syntax and Constituencies.
NP
as
Semantics, New York:
no. HI-118-85.1
Parsing
as
Heuristic Graph HI-075-86,
no. report Computer
Technology