A Parser for Portable NL Interfaces Using Graph-Unification-Based

0 downloads 0 Views 1MB Size Report
based grammar, using a graph-unification-based representation language, using ...... advanced here in favor of a best-first enumeration order would lose force.
From: AAAI-86 Proceedings. Copyright ©1986, AAAI (www.aaai.org). All rights reserved. A PARSER FOR PORTABLE NL INTERFACES USING GRAPH-UNIFICATION-BASED GRAMMARS

Kent Wittenburg

and Computer

Microelectronics

may

Abstract This

paper

presents

of a parser

the reasoning

for the Lingo

the selection

and

that

uses

by

a best-first

flexibility

language

interfaces

on natural

in the selection a syntactically

allowing

grammars exhaustive

control

for these for

best-first

in particular enumeration

development.

at

of the parsing algorithm based grammar, using

a number

a

other

on an agenda.

processing

of

advantages

representation

of

managed

language

tuning

It

applications

parsing

of

this

languages

advantages

that

advantages

for

particular

choice

for

are outlined,

acrue

to

graph-

as well

this

approach

based greater enhance

this

short

greater

design

relatively

to new applications,

sophistication

required

but

also,

for interfaces

of the future.

choice,

the

next

step

for grammar representation and itself. The representation language

unification-based formalism, (1984) and Shieber (1984). languages have had a

tremendous

computational

and

linguistics,

for

constrained domains, offer a modularity in design that in the long run. Not only does the

applications

first

run

highly

transportability the

to knowledge-based Given

the in

grammars payoffs

possible

formalism grammar

in

systems

modularity it makes

domains while at the same time allowing of the search space during grammar

Efficiency

unification-based

structure

natural

offer

syntactically will achieve

design

graph-unification-based representation language, using Combinatory Categorial Grammars, and adopting a one-to-many mapping from syntactic bracketings to semantic representations in certain cases. The algorithm chosen is a variant of chart parsing offers

Corporation

unsophisticated

behind

project

MCC. The major factors were the choices of having

Technology

was

choose

to

a

an approach to the chosen was a graph-

in particular, one based on Karttunen Graph-unification-based representation impact

in fact

have

in been

the

field

of

incorporated

in

one form or another into a number of influential linguistic theories (see Shieber 1986 for discussion). Among the many advantages of

as

a

by

requiring

graph-unification-based

formalism

no special

training

for

are

(i)

grammar

it

is

writers

easy

with

to

use,

linguistics

virtue of its use of an agenda as a control structure. We also mention two useful refinements to the basic best-first chart parsing algorithm that have been implemented in the Lingo

backgrounds; (ii) it is a language separable from any particular machine-dependent implementation and thus amenable to optimizations at many levels; (iii) it avoids the typical explosion of

project.

ad

hoc

procedural

declarative variety

1. Introduction In designing the

first

used

a portable

crucial

in the

Existing there

NL

those

to

(NL)

can

(Martin,

a semantically

interface,

require

it be syntactically

of the

one

based,

along general

Appelt,

and

Pereira

based

grammar

lines:

grammar 1983),

particular

of

versus to

grammatical

other

things

to generate

Our

based.

these

the

choice

Categorial Though this

for

an

untested

approach

to

(iii)

order

partially

grammar domain. offer the

must The

advantage

for

new

each

and

be rewritten syntactically domain

and

Robustness

in

of parsing

systems which, along to separately; though initial design entwined with One

of the

language grammar.

the

able

to avoid

achieving

first

rehacking

a greater

syntactic

in the

the

level

variations

is a component

grammar

of generality of

the

decisions was

in the

to go with

It was felt that,

although

come

MCC

for

Lingo

a general,

grammar,

and the

being

flexible, (v)

it is order

same

a purely

accommodating free,

grammar

a which

rules

can

be

input.

and

extendable dealing

base,

was

was

1982,

applications particularly

Combinatory

Steedman

1985).

to date,

we felt

promising

to empty

it

lexical

relative

in

the

it handles English extraction phenomena clauses, etc.) efficiently and elegantly

offers

rewrite

rules

a method

free

word (iv)

it

ambiguity

it is particularly suitable designs; and (vi) it is able rule

grammar

or complicated

(ii) it accounts for more of English approaches without special rules

formalism; with

the

Steedman

language

grammar

resorting

passing schemes; than alternative

to and

of

accounting

order

with

suggests

and

heuristic

for left-to-right, to accomplish

feature

coordination or ad hoc for

free

a simple

and

word easily

natural

techniques

for

rule

preferencing;

(v)

incremental all this with

processing a very small

to the alternatives.

syntactically-based

with semantics generally, must be attended robustness is obviously not precluded by this

choice, it does not the grammar itself.

interfaces

the

from scratch, in general, for each new based grammars, on the other hand,

of being

sophistication

and

(Ades

respects: (i) relative (wh-questions,

operations;

coverage

the very

that

approach

following

suffer

syntactic

is

as to analyze.

in natural

generally

of

means

Grammar

withough

patchiness

in it

theories;

as well

domain, e.g., Plume (Hayes, Andersen, amd Safier 1985). The semantically based grammars offer customization of the entire robustness of parsing along domain system for the domain; However, they sensitive lines is an advantage usually cited. from

(iv)

of

grammar

or semantically

be classified

use a syntactically

TEAM

use

language

is whether

systems

that

e.g., that

that

interface

are those

English,

natural

decisions

system

of

among used

operations

language;

free

since

project

it

is

not

Given the

the

subject

three of this

design

decisions

paper,

parser for NL interface representation language

just

namely, applications have just

we

mentioned,

the

selection

using the mentioned.

we now come and

design

grammars

and

to

of a the

on natural

syntactically-based

semantically-based

grammars

NATURAL

LANGUAGE

/

1053

2. Charts First

let

criteria,

us

ask

independently following

of

what the

desiderata

we

choices

should

should

expect

related

to

speak

from

the

the

parser The

grammar.

for themselves:

structure

makes

it

possible to develop sophisticated grammar development that the state of a parse can be examined at any point.

the

tools

so

agendas

is

Another

-

presence

point

its relation A formally sound ensure termination,

l

basis for the completeness,

algorithm in order and tractability.

to

for

debugging The

l

facilities

ability

domains systems The

l

development

grammar

inspection

and

to tune

particular

right-left,

or

language

input.

grammars

potential

to

into

integrate

for

and

for

purposes

to any

heuristic

parsing

middle-out in

in

this

the

area

effect

paths

with

if necessary.

algorithm

able

the

respect

VS.

Partial

and

an

of and

can

be

attention

can

be

time,

a property

at any

agenda

processing

of

it is hard

any

heuristics

to suspend

to control

natural completely

agenda,

possibility

Thus

with

is

as appropriate

of the chart

of being

left-right,

processing

respect

on

as long

with

proceeds

In fact, can be designed. any mix of these orders;

the

later

chart

the control

ordering

arbitrary

promising

general

of

of

can be achieved

produces

processing

contextual

the

version

the

control

about

of whether

Enumeration

operations to allow

which less

factors

by

directed

applied

making

some

designs

agenda designed

in particular

efficient

semantics

preferencing

these

steps.

of parsing

such that prototypes can be developed.

factors

maximize

that

worth

to questions

controlled Modes

l

of an inspectable

on

resuming

such

to imagine

a more

decisions.

tuning. A

l

design

credit

that

individual

principle

allows

schemes

performance

to

maximum

place chart

for

adaptation

to

flexibility

and

formal

future

adaptation

to

soundness,

the

extolled

by

Kay

most

and

others

and

pieces

of evidence

One

of the

their have

widespread adoption. Charts, figured prominently in important

persuasive

from the computer natural language

science

systems,

e.g., Slocum

not

reason

obvious

some variant of of charts have be repeated

here.

in favor

e.g.,

Earley

is that

(1970),

in

e.g., Kaplan (1973), Bear and (1981), Ford, Bresnan, and Kaplan Patil (1981), Shieber (1985), and in (1981).

through

whatever

reason,

in

published

most

designs, have

despite

written

that

of the

possibility

order

in

driven

chart

applied

However,

with

fact

for

existing

a Wizard-of-Oz

operations agenda

to

the

(1973)

to control

ways.

other

control

Kaplan

the

Research

hand,

seems

to

into have

concerns (e.g., Ford, by the need to develop

the

needs

of NL interface

Retaining

shortly the

without

maximum having

grammars of ‘the

agenda

are (i) as a means

development,

an agenda

The advantages of having a parser go well beyond to scrap

is a major relative

structure. of integrating

flexibility

ease Other

an agenda matters of for

or radically

advantage. of adding possible

semantics

and

later alter

We will meta-level uses of an contextual

And,

to

complete

our

list

/ ENGINEERING

It seems safe to say that no added to the parser, and Martin,

are

interface

Our

choices

relating

approach

to

add

matter Church,

the

representation

language

what and

and

further weight to this argument It is a commonly acknowledged

enumeration.

unification of mostly because

graphs tends to be expensive unification is destructive of the

the

against fact that

computationally, graphs on which

it

1984). The expense is particularly is called (see, e.g., Karttunen evident in chart parsing because, by definition, one needs to retain edge

graphs

create

in

new

their

edge

original

graphs

state

that

before

reflect

unification

the

result

originals. Optimizations of the unification been suggested as a means for coping with 1985;

Karttunen

and

obviously

welcome,

calls

unification

to

As

While

place

to cut

in

detailed

interpretations is that

rule

firings

that which

place.*

(1985), extraction is the

identical

lead

to

optimizations

are the

complete

a

consequence certain

potential

for

Another

through

identical

Grammars

also

is inappropriate.

and

content.

paths the

as the

is to minimize Avoiding

Categorial

multiple

unifying

line.

in parsing

have are

such costs

enumeration

grammars

as well

of

algorithm itself have this problem (Pereira

Combinatory

Wittenburg

these

there

first

in this

handling

in

this

of

in for

the

step

exhaustive

mechanism coordination

paths,

1985).

properties that

is

Kay

another

is a first

Certain suggest

the

solution.

of

the

kinds

of

ambiguous way

to say

search space of The obvious

method to use in such a domain avoids enumeration of all since there is no obvious reason to do so. This general plan

* It is relevant

has been

1054

Patil

application.

search

of

and

language

improvements

others.

Church,

exhaustive enumeration in the face of such data lead to satisfactory performance for a natural

by

among

Martin,

Patil added many, will probably not

unification

(1980),

experiment,

coverage

for certain kinds a corpus collected

product.

optimizations

factors into the scoring; (ii) as a framework for credit assignment schemes to automate adaptation to individual performance situations; and (iii) as the control mechanism for parallel processing of parsing steps, a natural use of agendas pointed out Kay

of extensive

each

enumeration

to a parser

example

an agenda

flexible

on

also

systems.

code or existing

see an

and

by psycholinguistic 1982), not specifically

efficiency.

adjustments

breadth-first

(1980)

of using

maximally

is in fact the key ingredient. as a control mechanism for increased

exhaustive Kay

parsers,

been mainly driven Bresnan, and Kaplan efficient

seems

work

the

enumeration agenda

there

chart parsing in the attention in the more in chart parsing For to be an association of chart parsing

grammars

found 958 parses for the sentence In as much as allocating is u tough job I would like to have the total costs related to

grammar

despite the popularity of there has been relatively little quarters to the role of agendas

based

amounts of syntactic ambiguity For example, working with

(1981) costs

exhaustive However, literature, theoretical

syntactically

admit staggering of constructions.

of charts is or something very similar, theoretical work on parsing

perspective,

research, Karttunen (1979), Thompson (1982), Martin, Church, and applied

will

most

Enumeration

While many chart-parsing algorithms use control schemes that exhaustively enumerate the search space, there are a number of reasons why exhaustive enumeration is unsuitable for NL interface applications given the choices mentioned. The most obvious

situations.

to begin in constructing a parser is with parsing (Kay 1980). The many advantages

been

2.1. Exhaustive

incorporating

automate

A design that does not preclude parallel processing schemes.

l

For

in

assignment

to report,

using

in parsing

(at least

based

structure

on unpublished

sharing

performance.

temporarily)

shelved

work

by David

Wroblewski,

not yield the expected may In fact, unification with structure in the Lingo

project.

that

dramatic sharing

has

been

with

embraced

in the

theoretical

formal

devices

utilized

the

Grammar.

It

strategies which

has

been

context;

has no reason There

is

one that

so

as

interpretation. by Church

in

Among

parse,

mapping in the

avoiding

one

and

Marcus,

assume grammar

that

that are

and

For more

if we

generating all reason we can than

the

the

of

reached

from syntactic bracketings case of extraposition from

once we the same

mentioned

by

one

only

member

one

l

there

Despite

all these

to noun

have

one

phrase

interpretation

or

that

against

there

are

a grammar

of

happens

have

such

for

l

under

toggled

such

between

arbitrary

points

controlled

chart

it need

not

involve

The

to be explored. during

be

It is very

circumstances.

exhaustive in between, parsing

Thus

and

The

is the

offers

this

sort

we can

that

important

to so

that

ask

On

The

can

be

as well

as

for.

Agenda

first

algorithm

(Earley

ordering

schemes

associated

for exhaustive string during itself

is ill-suited

have

been

to

get

some

partial

(Slocum

for

anything

along

with

with

chart

parsing,

efforts Given

distinguish,

then,

suggestions

in the

with

our

what with

other

breadth-

in such

breadth-first

motivation

to

be

for heuristically some

form

for

there a way

proposed ordering

of breadth-first

as

ordering keeping

rule search should

from

other

in Nilsson

of

all,

sense

has,

of

Nilsson

under

of merging

i.e.,

certain

nodes

alternate

the

paths

are

in the

through

of little

search

the

importance;

algorithm

is not

of the

algorithm,

optimality

the

of the total number of graph that are expanded,

a

nodes will

in the have a

on efficiency.

actions

(1980).

appearing

on a chart

can be looked

at

to take.**

We

chart parsing, then, graph-searching procedure

now

turn

3. The Best-First

to an overview

allow a presented

of the

algorithm

Algorithm

but

that

could

purpose

here is not to present the best-first chart parsing Readers may consult the original sources or in depth. (1986b) for detail in this respect. What I wish to do is certain features of the algorithm that help achieve the

outlined

The

parser

reflection Nilsson, two

above.

of and

enumeration

this

importance

project

Raphael

1968)

has of

been the

dubbed

A*

in its design.

previously

published

Astro,

search The

chart

which

algorithm algorithm

parsing

is a (Hart,

we use is

algorithms

respects.

or unary

We assume a grammar whose rules have only daughters. Such a simplification is a consequence

in

binary of the

** but

to

for the than

Intuitively,

continue

first

the

itself.

been

just

parse,

of

These characteristics of simplification of the general

a collection What is of could return

best

effect

solution

hand,

bearing

less general

(e.g., Robinson 1982, Heidorn 1982, Slocum 1984). maximum benefit for our purposes is a design that the

graph

can be irrevocable,

database

of

to the

other

possible

goals

respect to some mechanism by

even restricted breadth-first We of the alternatives. about

as

set of edges

algorithm Wittenburg highlight

is designed

enumeration,

grammars basically

additional

is

literature

achieved

exhaustive

at partitioning

enumeration

1984).

grammar with such a control

but

firings to a minimum, however, is less desirable than some

of parses

1970),

enumeration of the parsing. Although

or

of flexibility.

Order

Earley

in

scheme

implicit

lengths

graph

the

The

My 2.2. Enumeration

as a standard

graph).

search algorithm; the as the closed nodes in a best-first other data structure we need is one to keep track of open nodes, which can be viewed as an agenda of

exhaustive

enumeration,

best

is of

backtracking.

itself the

relative

strong

development

a parser

minimal

search

is

a

enumeration Assuming

grammar

chart

control

i.e., a measure complete search

fails in an application (whether will probably arise in which all

an event

Best-first

is

choice

those

to

using

mode.

in general

attractive

search.

commutative

the

in which,

exhaustive

development

when parsing circumstances

&ill

using

is

thus

thus admissability strong factor.

An example

attachment

arguments

a more

scheme.

to an and/or

system

search

within

that undesirable rule interactions can be eliminated. Also it is important to take note of the performance characteristics of the parser

l

analysis

more

path.

that

graph

(1980);

l

arguments

space

be

is a depth-first,

of charts

search graph appropriately; thus, there may be no need to check whether newly added nodes have already been generated in the search graph.

one. The is different

syntactic

by a different

deems

to continue

assuming

some

backtracking,

can be represented

conditions,

would yield the full set of attachments of view while “low” attachment would

mode,

anticipated

are

to

l

set.

there will be times “hard” or “soft”), the search

may

of prepositional

semantic

in an application enumeration in

We

processing

order

a strength

control

of heuristic search

The

for

interpretation

no reason

have reached the first solution in these cases

are reachable

of the first

a semantic

we have

assigned

“high” attachment the semantic point

yield

reach

above.

path,

that

be a treatment

say, from

to then

interpretations

interpretations would

able

one path,

paths reach

reason

set

are

than

semantic

breadth-first

since

require

(as opposed

Fleck

are advanced

to

but

“best-first”

The

l

Rich et al. (1986) g ive an account of how such ideas can be extended to a variety of syntactic constructions and to the semantic representation itself. Given this picture, we again have a case of a search domain in which it is inappropriate to generate all through

do not

perspective

phrases.

paths.

subsequent

alternatives design,

they

semantic in work

Hindle,

arguments

if

parses

course a well-known paradigm in the A.I. literature. A number of interesting observations can be made about chart parsing from the

exhaustive

Let us by the

with respect to previously surfaced

various

the

backtracking

a given

a so-called

admitted

(1983),

(1986a)

fire

other

heuristic

one can.

be ambiguous general idea has

Pereira

enumerate it necessary.

to control

to a successful

analyses

one-to-many interpretations

semantic

forward

use

in order to

syntactic to This

dealing Categorial

should

appropriate

motivation for be mentioned.

(1980),

a

one

grammars

further should

In Wittenburg

(1983).

most

as one proceeds

individual

designed

are

literature

Combinatory

that

these

to fire all the rules

enumeration certain

with

rules

as long

using

suggested

in processing grammar

linguistics in

exercised; which

viewed

closed open

have

not

as closed

yet

nodes

discussed

here,

successors

of that

nodes

nodes

an edge edge

in a heuristic

are options been

must

have

The

exercised. be understood

is placed

search

which

on the

are options

generated

observation

in light chart

algorithm been

that

of the fact

concurrent

with

during chart

that the

that

edges

in the

have

the search can

be

algorithm

expansion

of all

on the agenda.

NATURAL

LANGUAGE

!

1055

grammars

being

inherent

used

restriction

parsing algorithms right hand sides technique

the

to

MCC

chart

generalize arbitrary

Lingo

to

Earley

(1970).

the

algorithm

already binary general

Most that using

grammars length by

by

Categorial

project,

parsing.

for complicating

Combinatory

Grammars;

an

Lingo

project;

published chart have rules with the dotted rule

and

there

are no penalties,

However, in this

the effect

not

there

way

of dotted

The the

The

second

to

formal

operations

point

previously

about

this

published such

as

chart

ones

parsing

is that

is in

transformations

algorithm

it or

does

with

not

register

maintence

aspects

upon

follows

the basic

permit

setting

based

on

heuristic

function

to

Developing trial, and

the error,

and

the

start

as a basis

of Nilsson

for the

order

*agenda*

2. Initialize

*chart*.

3. LOOP:

if *agenda*

as

main

loop;

of the

the

best

action

5. If *best-edge*

from

the

and apply edge added

remove

it

the

terminating

conditions,

is non-nil,

the set of M on the

generate

Install

the members

a heuristic

ordering

A4 of

edge

successors

generate and

also

step,

in chart

which

see

if

a major

achieves formalisms appears

initialization.

between checking to

We make a rule

call

can be

in step is

6 of

chart,

leading

while actually applying highest ranked action

to The

to augmentations

a rule on the

on the

agenda

call is done only when invoking agenda. Given this distinction,

only, the we

can make make use of an optimized test function for checking edge leaving actual unification with its graphs for rule application, Since the latter destructive effects to applying a rule call. operation is invoked much less often than the generate successors step, there are unification-based themselves

for

major savings grammars, the

test

in at

function.

unification costs. For graph least three options present First

is

Shieber’s

notion

of

restricted unification (Shieber 1985). Second, Karttunen (1986) has suggested a scheme where the destructive effects of unification can be undone. Last, a more “porous”, but more efficient check-graphs function could be used that is nondestructive of its The last of these options is the one used in the graph arguments.

1056

node

heuristic expansion.

some

general

factors

that

are a

function. any rule

to the

call is the span

chart

if the

rule

fires

from the In general,

be favored.***

content

in the a role. lexical

for designing that

word

others

heuristics

tend

corpora. and

take will

in a rule

advantage

be

of

statistically

domains

of discourse.

to become

experience Automated

gathering

involved

that

senses

in certain

will

through

and

edges

Differentiating among the sense assignments is one

heuristics

some

than

such

make

/ ENGINEERING

mention

parsing

here

algorithm.

much

more

refined

with particular grammars, techniques for statistical

heuristic

refinement

are

always

a

used

the generate

to optimize

the

grammar

in this

basic for

way

successors

also

allows

step.

The

for further

partitioning

heuristic

of

control

actions.

4.1. Avoiding Many chart grammars

of two useful refinements to the The first introduces a technique

for unary rules more restrictive, thus **a** Also we useless unary rule applications. to partition the rules of the grammar that is

environments

avoiding ultimately review a technique

of the parser’s

a critical likely

and actually applying a rule to a set of edges. operation is a part of generating the successors of a new

on the

given

a

4. Refinements

chart

Generating Successors that The critical feature of this algorithm efficiency advantage for graph-unification-based

succeed checking

a

algorithms

of

scheme.

3.2.

loop,

of

all

choice

its

*agenda*,

making

the main

the

in scoring

should

sophisticated

We

distinction

as with

is

possibility.

*best-edge*

in the

factor

likely

lexicons, information

exit

7. Go LOOP.

found

once they

that are more likely to lead to success should higher intrinsic scores, and play a role in **** a rule call.

fact

Naturally, and

satisfies

successors. following

we mention

to be added

linguistic

more

the action, Set *bestto the *chart*, if any;

algorithm,

search,

a heuristic

edge

method

successfully. 6. If

Here

spans

The

failure.

*agenda*,

successfully

4.

successors

call should also play scores of ambiguous

from the *agenda*, edge* to the new else, NIL.

to apply

because

in performance,

The rules be given

l

it

(1980:64).

exit with

our algorithm

downgrade

successfully. This span can be computed spans of the daughter edges in the rule call.

the 4. Select

the

An important

l

l

is empty,

given

right set of heuristics is the product of intuition, and depends upon the particulars of the grammar

domain.

scoring 1. Initialize

6 fail

in step

graph

for designing

of the system.

suffices

organization

in step

iteration

choice a slight

3.3. The heuristic function Critical to the success of this

longer 3.1. The Main Loop The following procedure

except

generated

are chosen

for backtracking. In was done by Kaplan (1973), nor is it designed our experience, these simplifications allow more flexibility in some of the agenda

calls

the best

using

rules

remainder of the grammar consists of unary rules, effect of altering the complex categories in some

respect

if rule

it seems

is no

when

achieved in the categories of such grammars. The rules that are needed are a small, fixed number of combination rules, operating over these complex

categories. which have way.

of

introduced

motivation fact only very

in

due

useless parsing

make

unary rule proposals algorithms developed

for

context-free

use of relations

in the grammar which can prune collecting this sort of (1980) mentions

the total search space. reachability information Calls to unary actions.

Kay into tables which help control parsing rule productions are an obvious candidate

to

from the search space for Combinatory It is a general fact about unary rules that

attempt

to

Categorial

prune

Grammars.

*et Note unary

that

this

scheme

by itself

will tend

to favor

binary

rule

applications

over

recover

from

rule applications.

**** Among ungrammatical

the input.

rules

may

These,

be

some

in general,

whose

will receive

function a lower

is

to

priority.

***** agenda

This

augmentation

items

involving

also unary

helps

rule calls.

to establish

a heuristic

See Wittenburg

scoring

(1986b)

technique

for details.

for

they

tend

side

of

to be overly the

usefulness call

succeed

often

side. effect

a

out

rule

an

now

typically, by

to combine

edges

like

this

of

with

in chart

operations

more

under

consideration

be taken

the

itself

right-handthe

grammar

ultimate

Even when a unary edge to the chart,

additional

to be unable

subsequent

must

i.e.,

measure

application.

superfluous

of making edge

poor

in adding

turns

Adding

new

promiscuous,

is

of a unary

can

edge

rule

edges

Thus

rule that

on either

parsing

has

expensive

the

since

unary

rules. we

follows:

A relation call

node

exists

some

branching exhaustive sisters shown.

the

that

has

been

e&ended

sister

A is an extended

sister

node

(Y that

dominator

dominator

of B. In the

of A are

shown

of node

of A

as all

and

B if and

only

if there

p where

cr is a non-

@ is a non-branching

derivation

B’s that

tree

can

stand

below,

extended

in the

relation

/\/\ P I =

Q I =

P I =

I B

I A

I B

different

weights

the checking

to partitions

of subsets

as a whole.

of rules

that

are less

likely or perhaps whose checking is more expensive than other subsets. Also, it is possible to define a single check for an entire grammar

partition

consideration

that

in one fell swoop

of any of those

rules

can eliminate

the further

for the edge sets in question.

this

to be useful in this It is defined as

of node

is a sister

exhaustive

found relation.

assigning

in all generate

What is successors procedures involving any adjacent edges. called for then is the computation of some grammar relation that can be used to further restrict the conditions for applications of regard

by

we can postpone

5. Evaluation Since

the

critical

makeup

role

difficult

and further

in

of the

a parser

to evaluate

when chart

heuristic based

and different some of the

grammars arguments

enumeration

order application,

language,

Combinatory

heuristic

itself

plays

graph

search,

of the design

non-heuristically use different

than the advanced

would

interface

function

on

the efficiency

compared to other parsing systems that

study a

it

is

in the general

case

based designs. For representation languages

ones used here in

lose

such

force.

in the Lingo project, favor of a best-first

However,

a graph-unification-based Categorial Grammars,

given

an

NL

representation a many-to-one

and

mapping from syntactic bracketings to semantic representations, it is hard to imagine that any radically different alternative could beat the parsing program we have outlined here on the grounds of efficiency, indicates first

clarity, and flexibility. Experience in the Lingo an overwhelming performance improvement with

parsing

design

breadth-first

when

design

compared

that

to a non-selective, in

figured

our

original

and

refinement

project a best-

bottom-up, bootstrapping

operation. The

extended

precompute

sister

a grammar

the left-hand-sides must

of

(1986b),

that

unary

successors match

hand-side

is used

table

of each

unary-rule-call edge

relation

one

unary

rule.

a consequence

holds

rule.

of this

following

the

extended

is that

the

sisters

some

unary

4.2. Mets An

rule

calls

Agenda

additional

which

involve

for

sisters out in

for

adjacent

of the leftWittenburg

generate

successors

step now has to consider as successors of a given edge those unary rule calls which apply to the edge directly, those

We

condition

is now that

of the extended As pointed move

way.

An additional

of a new edge

at least

that

in the

not but

the edge as an extended

just also

sister.

Items

augmentation

that

we have

implemented

involves

adding

Current

research methods

as

changes

involving

the

tuning

of grammars

of the research

inspectable includes

buffering focused

approximations

the

rules

in

in

the

grammar,

It is also possible

that generates these basic agenda these rule calls. This augmentation iteration

of checking

the

edges

into n steps corresponding rules into n cells.

weeding

to define

out

a meta

all

rules

agenda

item

items, i.e., that itself is a way of breaking

with

respect

to a partitioning

to the whole

could

lead

[i, P, E’], cell of the new

edge

added.

where

i is a heuristic

grammar and

its

Exercising

rules, adjacent

rule

calls

added

edges

an agenda

the edges in E’ with respect only. Any rules which then to

the

score

item

at

effect that in the items of this new will have the form P is a partition

assignment,

E’ is a triple

and

the

of this

time form

that

represents

the

new

involves

edge

the was

checking

to the grammar rules of partition pass these further tests will result agenda

of

a

type

that

we

to

of

We are using

particular

corpora

of heuristic further

design

a set of tools which

take

of the chart. designing credit

for the

advantage Longer-term assignment

to specific performance situations. of incorporating some form of

agenda structure on local areas

so that attention of the chart.

can be Initial

that such a factor would improve the which tends to be best when considered in than globally. Also, left-to-right buffering

cascading an

of real-time

work

reported

design important

feeding step

into semantic and towards any future

parsing.

also

Rich

made

offered

Wroblewski including

graph

unification

also

wish

and

advice

the

Lingo

extensive

valuable

David work, interface

on here I gratefully

contributions

has also the coding and

to the

the

parser

to thank

involved many others at MCC acknowledge the contribution of In particular, team to this work.

comments

Lauri

on parsing

on

to the research earlier

drafts

of

itself this

and

paper.

made significant contributions to this of structure-sharing algorithms for

design

and

and

coding

grammar

Karttunen

of the window

development for

system

environment.

his long-standing

support

matters.

P in

References

assumed

previously.

1. Ades, Words.

The advantages of incorporating are that it is possible to heuristically

a

besides the author. other members of Elaine

Adding this sort of agenda item has the generate successors step, we produce agenda type at the first pass. These new agenda items

consideration

Acknowledgements The

grammar

for

components,

generates apart the

of the set of grammar

a

control structure the possibility of

indicates experience reliability of heuristics, a local fashion rather

against

all

as

agenda.

within the incrementally

pragmatic

fail the test.

evaluation

well

schemes to automatically adapt We also count the possibility

a new type of agenda item. In the basic best-first algorithm, agenda items are made up of rule calls over a set of edges. These agenda items are generated by checking these edges which

involves

scoring

A.

and

M.

Linguistics

Steedman.

1982.

and Philosophy

On

the

Order

of

4: 517-558.

this new type of agenda item order the iteration over the

NATURAL

LANGUAGE

/

1057

I

2. Bear,

J.,

Phrase

and

L.

Karttunen.

Structure

Parser.

1979. Texas

PSG:

A

Linguistic

17. Nilsson,

Simple

Forum

15:

Palo

N.

Alto,

1980. Ca.:

Principles

of Artificial

Intelligence.

Tioga.

l-46. 18. Pereira, 3. Church,

K.

1980.

On Memory MIT Processing.

Language [Available Club.]

through

4. Earley,

J.

the

in Natural Dissertation.

University

Linguistics

Indiana

An

1970.

Algorithm.

Limitations Doctoral

Efficient

Communications

Context-Free of the ACM

F.

note

Parsing

and

for the Heuristic

Paths.

IEEE

7. Hayes, P., Caseframe

B. Raphael.

Determination

Transactions

1968.

E.,

J.

of the

Computational

Barnett,

K.

of Minimum

on SSC

23rd

Meeting

Linguistics,

J.

8. Heidorn,

G.

Computed Proceedings

Wittenburg,

and

22. Shieber.

of the Association

Experience

for

S. 1984.

for Linguistic pp. 362-366.

an

Easily

Metric for Ranking Alternative Parses. of the 20th Meeting of the Association

Computational

Linguistics,

23. Shieber,

R.

193-241.

A

1973.

Rustin

(ed.),

pp. 82-84.

New York:

10. Karttunen,

General

Natural

Processor.

24. Shieber,

pp.

Algorithmics.

L. 1986.

HUG:

for unification-based International

A development

grammars.

and CSLI,

Stanford

ms.,

Association

for

Computational

SRI

M.

Sharing Meeting

Linguistics,

Algorithm

1980.

Schemata

in Syntactic Processing. Xerox Center, tech report no. CSL-80-12.

J.

M., D, Hindle, about Talking

21st

Annual

Computational 15. Martin, P., Transportability Interface

and M. Fleck. about Trees.

Meeting

Linguistics, D.

of

pp.

and

Data

Palo

Alto

In Proceedings

16. Martin, W., K. Church, of a Preliminary Analysis Algorithm: Theoretical and tech

report

Association

for

pp. 129-136.

Appelt, and F. Pereira. 1983. and Generality in a Natural-Language

System.

of IJCAI,

/ ENGINEERING

pp.574-581.

and R. Patil. Breadth-First Experimental

no. MIT/LCS/TR-261.

An Introduction

In

of the Association

for

to Unification-Based Chicago:

University

of

forthcoming. 1984.

METAL: The Paper presented

LRC Machine at the ISSCO

Translation, April 2-6, 1984, [Working paper no. LRC-84-2,

Research

26. Steedman, M. the Grammar

Center,

University

1985. Dependency of Dutch and

of

Texas

at

and Coordination in Language English.

Chart H. 1981. in PSG. In Proceedings

27. Thompson, Schemata Meeting

1981. Parsing Results.

Parsing

and

of the

Association

for

19th

Rule Annual

Computational

pp. 167-172.

28. Wittenburg, Categorial

tech

the

of

presented

1983. D-Theory: In Proceedings of

the

Parsing

Formalisms.

.]

Society 14. Marcus, Talking

to Extend

pp. 145-152.

Grammar.

Press,

Linguistics,

Structures Research

Meeting

Language of Coling84, Linguistics.

61:523-568.

133-136. 13. Kay,

to

Linguistics

University.

L., and M. Kay. 1985. Structure In Proceedings of the 23rd Trees.

12. Karttunen, with Binary

23rd

Translation System. Tutorial on Machine Lugano, Switzerland.

environment

Unpublished

of a Computer

Restriction

Linguistics,

S. 1986.

25. Slocum,

Austin

11. Karttunen,

for

In

Processing,

L. 1984. Features and Values. Proceedings 28-33. Association for Computational pp.

of Coling, Linguistics.

of the

Syntactic

25:27-37.

Design

Using

of the

Approaches

Language

Grammar

for Complex-Feature-Based

Computational

In for

The

S. 1985.

Chicago 9. Kaplan,

A

of the ACM

Information. Proceedings Association for Computational

Algorithms

with

DIAGRAM:

Cost

4:100-107.

pp. 153-160.

1982.

1982.

Communications

Proceedings

1058

Analysis.

Corporation.

Dialogues.

A Formal

P. Andersen, and S. Safier. 1985. Semantic Parsing and Syntactic Generality. In

Proceedings

MIT

Language

SRI International.

G. Whittemore. 1986. Ambiguity and Procrastination in NL Interfaces. Technical report no. HI-073-86, Microelectronics and Computer Technology

21. Robinson, P., N. Nilsson,

the

Center,

MIT.

6. Hart,

R.

for Natural

13:94-102. 20. Rich,

Basis

Logic

275, A.I.

19. Pereira, F. 1985. A Structure-Sharing Representation for Unification-Based Grammar Formalisms. In Proceedings of the 23rd Meeting of the Association for Computational Linguistics, pp. 137-144.

5. Ford, M., J. Bresnan, and R. Kaplan. 1982. A Competence-Based Theory of Syntactic Closure. In The Mental Representation of J. Bresnan (ed.), Cambridge, Grammatical Relations, pp. 727-796. Mass.:

1983.

Technical

K. 1985. Grammars at

the

Some Properties of Combinatory of Relevance to Parsing. Paper

Annual

of America,

report

K. 1986a. To appear Anaphora. Volume 20: Discontinuous

30. Wittenburg,

[MCC

tech

K.

1986b.

Technical Search. Microelectronics Corporation.

of

the

27-30,

Linguistics

Seattle.

[MCC

no. HI-012-86.1

29. Wittenburg,

Academic.

Meeting

December

Extraposition

report

and

from

in Syntax and Constituencies.

NP

as

Semantics, New York:

no. HI-118-85.1

Parsing

as

Heuristic Graph HI-075-86,

no. report Computer

Technology