understanding natural language through parallel processing of ... - IJCAI

UNDERSTANDING NATURAL LANGUAGE THROUGH PARALLEL PROCESSING OF SYNTACTIC AND SEMANTIC KNOWLEDGE: AN APPLICATION TO DATA BASE QUERY

R. COMINO (*), R. GEMELLO (*), G. GUIDA (**), C. RULLENT (*), L. SISTO (*), M. SOMALVICO (**)

(*) CSELT - Centro Studi e Laboratori Telecomunicazioni S.p.A. - Via G. Reiss Romoli, 274 - 10148 Torino (Italy)

(**) Milan Polytechnic Artificial Intelligence Project, Milano (Italy)

ABSTRACT

This paper describes the main features of the PARNAX system for natural language access (in Italian) to an ADABAS data base. The core of the system is the analyzer, which includes parallel processing of syntactic and semantic knowledge. It is argued that this feature (together with the new macro- and micro-analysis technique, which is only briefly mentioned in this paper) allowed the system to reach a good linguistic coverage while still ensuring an acceptable degree of efficiency. After the basic architecture and operation of PARNAX have been described, attention is focused on the parallel syntactic/semantic analyzer, which is illustrated in detail. The advantages obtained through parallelism are also briefly discussed. Examples of PARNAX operation are presented. References to related works are mentioned, and directions for future research are outlined.

1. INTRODUCTION

This paper discusses a research project devoted to the design of a robust and effective parser for the Italian language that can support a large linguistic coverage while still ensuring an acceptable degree of efficiency. The project has been partially based on previous results by the authors (Guida and Somalvico, 1979, 1980; Guida and Tasso, 1982) and presents several original contributions. Among these are: a two-level analysis strategy that includes a macro-analysis and a micro-analysis phase; a full model for semantic processing that is made up of two sequential phases, namely, a nondeterministic part that validates and completes the activity of a syntactic analyzer, followed by a deterministic part that constructs the output internal representation; a parallel algorithm that manages the two cooperating processes of syntactic analysis and the nondeterministic part of semantic analysis.

This research project is supported by the implementation of an experimental system, called PARNAX, which is presently running in an INTERLISP version on a Siemens 7748 at CSELT (Torino, Italy). PARNAX is a natural language interface to an ADABAS data base containing information on the staff of a sample company.

In this paper, attention is focused on the parallel strategy developed for syntactic and semantic processing. At each step during the parsing of an utterance, the syntactic analyzer proposes candidate syntactic structures for a component of that utterance that fit the given set of syntactic rules. Similarly, the nondeterministic semantic analyzer constructs candidate semantic structures according to a given set of semantic rules. Among these, only those structures that can be associated to a corresponding syntactic structure are validated and will be further considered in the following steps of the processing. All other candidate structures are discarded, thus considerably limiting the search space which is actually expanded during the analysis. This enables the system to reduce the nondeterminism of the analysis process and, therefore, to operate with improved efficiency.

2. BASIC SYSTEM ARCHITECTURE

PARNAX allows casual users to access an ADABAS data base in Italian. It translates natural language requests into NATURAL programs (NATURAL is the ADABAS formal query language) in two steps. First, the natural language query is processed by the ANALYZER, which generates a semantic representation expressed in an internal formalism called METANATURAL. METANATURAL is an intermediate language which should allow a sufficient degree of independence of the understanding process from the details of the data base logical schema. Then, the FORMALIZER transforms the METANATURAL query into a NATURAL program. This is eventually supplied to the DBMS, which provides the user with the desired answer. System tailoring and updating is ensured by a KNOWLEDGE BASE MANAGEMENT SYSTEM.

Figure 1 shows the basic architecture and mode of operation of the ANALYZER, which constitutes the core of the system. The analysis is performed at two levels: the upper level, called macro-analysis, takes into account the outer sentence structure and suggests possible splitting and normalization of complex or syntactically unusual sentences into simpler fragments; the lower level, called micro-analysis, parses the sentence fragments and returns the obtained METANATURAL subtrees to the upper level, which composes them into the final METANATURAL tree.

Fig. 1 - Basic mode of operation of the ANALYZER

Macro-analysis operation is mainly rule-based: a set of structural rules of the type <pattern, fragmentation-normalization action, composition action> is used both by the fragmentation and normalization module, for analysing the outer surface structure of the input sentence, and by the composition module, for building up the output METANATURAL tree. Macro-analysis also includes a paraphrase and dialogue module that manages the user-system interaction sequences and generates echoing paraphrases of the input requests.

Micro-analysis consists of three phases. The first phase, namely lexical analysis, classifies the elementary components of the input fragment (words or simple constructs and idioms) according to their lexical type. Its operation is based on a nondeterministic finite-state recognizer that utilizes the lexical part of the dictionary, which contains lexical roots and models (Courtin, 1977).

The second phase includes two parallel processes: the syntactic analysis, which is based on the dependency grammar model (Hays, 1964; Courtin, 1977), and the first part of the semantic analysis, called semantic validation. The knowledge bases utilized in this phase are the dependency rules, the semantic rules, and the semantic part of the dictionary. The result of the syntactic analysis and semantic validation is a set of validated trees. A validated tree is a modified dependency tree, where semantically meaningless components have been pruned and appropriate semantic variables have been added. Each node of the tree is therefore associated both to a semantic variable that expresses the kind of information of the corresponding sentence fragment, and to a pointer to a procedure (semantic module), which will be used later to assign the correct value to the variable.

The third phase, namely semantic interpretation, is a deterministic process devoted to constructing the output METANATURAL tree. It operates through a bottom-up activation of the semantic modules that are referred to in validated trees, and assigns the appropriate values to the corresponding semantic variables.

3. IMPROVING NONDETERMINISTIC ANALYSIS THROUGH PARALLELISM

Parallelism between syntactic analysis and the first part of the semantic analysis (semantic validation) is used in PARNAX to improve the efficiency of the parsing process. Both syntactic analysis and semantic validation are nondeterministic processes that can be viewed as problems to be solved by a reduction-to-subproblems mechanism.

As far as syntactic analysis is concerned, the problem is to construct all the correct dependency trees for a given sentence. This problem, once a word in the sentence has been chosen as a candidate root of the tree, can be split into subproblems. Each of them consists in finding all the dependency subtrees whose roots (determined through dependency rules) are just one step below the root of the dependency tree corresponding to the basic problem. Many groups of alternative subproblems may arise, since many groups of roots can be generated according to the dependency rules.

A similar situation arises in the case of semantic validation, where the problem consists in finding all the correct semantic trees, and the subproblems in finding the semantic subtrees whose possible roots are determined by the semantic rules. Both semantic and syntactic problems could be separately solved by two distinct search processes in their respective search spaces (AND-OR graphs), which are generally very large.
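The reduction-to-subproblems view described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PARNAX implementation (which is written in INTERLISP): the problem names `ROOT`, `A`, `B`, `C`, `D`, the `REDUCTIONS` table and the `solutions` function are all hypothetical. Each problem has alternative reductions (the OR branches), and each reduction is a group of subproblems that must all be solved (the AND branches).

```python
from itertools import product

# Hypothetical AND-OR graph. Each problem maps to its alternative
# reductions (OR); each reduction is a tuple of subproblems that must
# all be solved (AND). An empty tuple marks a problem that is solved
# without any further reduction. These names are illustrative only.
REDUCTIONS = {
    "ROOT": [("A", "B"), ("C",)],
    "A": [()],
    "B": [("D",), ()],
    "C": [],            # no reduction applies: a dead end
    "D": [()],
}


def solutions(problem, graph):
    """Return every solution tree for `problem` as (problem, [subtrees])."""
    trees = []
    for reduction in graph.get(problem, []):      # OR: try each alternative
        sub_solutions = [solutions(sub, graph) for sub in reduction]
        # AND: every subproblem needs a solution; product() yields
        # nothing as soon as one subproblem has no solution at all.
        for combination in product(*sub_solutions):
            trees.append((problem, list(combination)))
    return trees


if __name__ == "__main__":
    for tree in solutions("ROOT", REDUCTIONS):
        print(tree)
```

In this toy graph the reduction through `C` fails, so only the reductions through `A` and `B` contribute solution trees. An exhaustive enumeration of this kind expands every alternative, which is why the size of each search space matters when the two problems are solved separately.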

Our choice is to solve both problems jointly, using two parallel, step-by-step search processes in the two search spaces. To this purpose a correlation between the two search spaces is necessary. This is defined in terms of an association criterion among problems in the two spaces (which is generally a many-to-many relation). According to this, for each problem P in one search space only those reductions of P are taken into account which have at least one association among the possible sets of successor problems of the problem P' associated, in the other space, with P. Thus, most of the search for nonexistent solutions (semantic trees without corresponding syntactic trees, and vice versa) is avoided, with the desired result of constructing, with considerably less effort, only the set of the semantic trees which are also syntactically valid (validated trees).

In the current implementation of PARNAX the parallel processing strategy outlined above is realized in a sequential way, in which syntactic and semantic steps alternate in the analysis, due to the limitations of the available INTERLISP system.

4. AN EXAMPLE

A simplified example of parallel syntactic/semantic processing is shown in Figure 3. The focus is centered on one step of the analysis concerning the segment "NATI A TORINO PRIMA DEL 1958" (for the English translation refer to the Appendix). Lexical and semantic types are extracted from the dictionary and are self-explanatory. As for the structure of semantic rules, taking rule 1 as an example, its meaning is the following: a segment conveying the information @SELECTION (i.e., how to select an object from a set) must contain a word of lexical type "verb" and semantic type "AT"; inside that segment the information @VALUE must be found (it is underlined), while @DATE and @COMPARATIVE may possibly be found, in any order. F13 is the pointer to the semantic module used during the bottom-up activation of the semantic interpretation phase. The structure of dependency rules is evident considering rule 1 as an example: an adverb may be the dependent of a verb; the weights (-20, +10) determine the possible positions of the dependent (Courtin, 1977).

At the point of the analysis shown in Figure 3, two associated problems are considered: the former (P) consists in looking for a dependency tree rooted in "NATI", the latter (M) consists in looking for a semantic tree rooted in @SELECTION. They are reduced to five and four subproblems respectively. Only subproblems SP1 and SP3 can be associated to subproblems SM1 and SM3 respectively. All other subproblems are pruned, since no problem in {SM2, SM4} can be associated to some problem in {SP2, SP4, SP5}. The analysis continues with problems SP1, SP3, SM1, and SM3, but only SP1 and SM1 will lead to the solution (boldface in the figure), while SP3 and SM3 will fail in the following steps.

5. THE POWER OF PARALLELISM

Some empirical experiments have been carried out to estimate the benefits obtained through parallel execution of syntactic and semantic processing. Two samples A and B of about 50 and 30 sentences respectively have been considered. A has been chosen from a set of sentences proposed by casual users; B has been constructed to contain a broad spectrum of different ways of expressing the same request. Each sentence in the two samples has been parsed first using the syntactic analyzer alone (I), and then using the parallel syntactic/semantic processor (II), thus generating the complete validated trees as well. The following parameters have been estimated: the cardinality of the syntactic search space expanded in both cases I and II, and the analysis time for syntactic processing alone in case I and for parallel syntactic/semantic processing in case II. The mean values of the ratios of the above parameters in cases I and II (case I/case II), for both samples A and B, have been computed. The table of Figure 2 illustrates the results obtained.

                        Sample A    Sample B
CARDINALITY RATIO         1.81        2.61
ANALYSIS TIME RATIO       1.36        2.03

Fig. 2 - Experimental results of parallel syntactic/semantic processing.

Note that the sum of the time required for both syntactic and semantic analysis (performed in parallel) is less than the time needed for the syntactic analysis alone. We also note that the benefits of parallel processing increase with the length of the sentences.

6. EXPERIMENTAL ACTIVITY

An experimental version of PARNAX has been fully implemented at CSELT (Torino, Italy) in the last two years. The system is connected to an ADABAS data base containing information on the staff of a sample company. It supports free Italian language interaction (including single sentences as well as sequences of interrelated utterances), and is able to deal with ungrammatical sentences, ellipsis and telegraphic forms, a class of anaphoric references, and multiple queries. PARNAX can also engage the user in a limited dialogue that includes echoing the user's request with a system-generated paraphrase, presenting alternative interpretations when reference ambiguities occur, and asking the user for validation of the proposed interpretation in the cases where other interpretations could exist.

PARNAX is written in INTERLISP and is running on a Siemens 7748. PARNAX dimensions are about 300 kbytes for algorithms and 350 kbytes for knowledge bases (the present version includes about one thousand lexical roots, 9 structural rules, 130 dependency rules, 70 semantic rules, and 27 semantic modules). The average time needed to parse an average-length sentence (9 words) is approximately 3 CPU seconds (without accessing the DBMS).

Figure 4 reports a short dialogue with PARNAX. Figure 5 shows a small sample of different ways of expressing the same request (all of which are correctly understood by PARNAX) that should give an idea of the linguistic coverage of the system. For the English translations of the examples refer to the Appendix.

> CONTA I DIPENDENTI DELLA DIVISIONE INFORMATICA NATI DOPO IL 1949 E COMUNICAMI LA LORO CATEGORIA
* la richiesta contiene due distinte domande
1. numero dei dipendenti con divisione Informatica e anno di nascita maggiore di 1949
   14
2. categoria professionale dei dipendenti con Divisione Informatica e anno di nascita maggiore di 1949
   COGNOME   NOME      CATEGORIA
   Rossi     Mario     I
   Verdi     Claudio   D
> QUAL E' IL LIVELLO DI QUELLI TRA LORO CHE SONO LAUREATI
* livello dei dipendenti con divisione informatica, anno di nascita maggiore di 1949 e titolo di studio laurea
   COGNOME   NOME      LIVELLO
   Rossi     Mario     8

Fig. 4 - A sample dialogue.

> dimmi l'anno di nascita dei dipendenti laureati
> vorrei sapere l'anno di nascita di tutti i laureati
> dimmi quando sono nati i dipendenti che sono laureati
> anno di nascita laureati
> vorrei sapere qual è l'anno in cui sono nati tutti quei dipendenti che hanno conseguito la laurea
> vorrei sapere l'anno di nascita dei dipendenti con titolo di studio laurea
> dimmi in che anno sono nati coloro che sono laureati
> dimmi l'anno in cui sono nati i dipendenti in possesso di laurea
> dimmi l'anno di nascita di chi ha conseguito la laurea
> dei laureati dimmi l'anno di nascita
> dei dipendenti che sono laureati voglio sapere l'anno in cui sono nati
> cerca i dipendenti il cui titolo di studio è laurea e comunicami l'anno in cui sono nati
> qual è la data di nascita dei dipendenti che hanno la laurea?
> qual è l'anno di nascita di quelli che si sono laureati
> qual è l'anno di nascita dei dipendenti che sono in possesso di laurea?
> laureati anno nascita

Fig. 5 - A sample of different ways of expressing the same request.

7. CONCLUSION

The parallel syntactic/semantic analyzer developed within the PARNAX project shares some features with the RUS/PSI-KLONE System (Bobrow and Webber, 1980). Although an in-depth comparison of the two approaches is difficult, two major differences seem worth outlining:

- The way in which syntactic and semantic processing interact, which is strictly step-by-step sequential in RUS/PSI-KLONE and far more oriented towards a real parallelism in PARNAX;

- The structure of semantic analysis, which is intended as a single process in RUS/PSI-KLONE, while it is split into two phases, namely nondeterministic validation and deterministic interpretation, in PARNAX.

Among the most promising research directions that deserve further attention we mention the refinement of both syntactic and semantic models used by PARNAX to better fit the issues of parallel processing, and the study of improved parallel search strategies.

As a concluding remark we note that the use of parallel algorithms that utilize several and diverse sources of knowledge is not limited to the analysis of natural language, but represents a core topic in many AI problems and could be of interest in several applications.

APPENDIX

Figure 3

List the employees born in Turin before 1958

Figure 4

> count the employees in the computer science department born after 1949 and let me know their professional category
* the request contains two queries.
1. number of employees having department computer science and birth year greater than 1949.
2. professional category of the employees having department computer science and birth year greater than 1949
> what is the level of those among them who received a doctor degree
* level of employees having department computer science, birth year greater than 1949, and doctor degree.

Figure 5

> tell me the birth year of the employees with a doctor degree.

REFERENCES

[1] Bobrow R.J., Webber B.L. 1980. Knowledge representation for syntactic/semantic processing. Proc. 1st Annual Nat. Conf. on Artificial Intelligence, AAAI, Stanford.

[2] Courtin J. 1977. Algorithmes pour le traitement interactif des langues naturelles. Thèse, Université Scientifique et Médicale de Grenoble. Grenoble, France.

[3] Guida G., Somalvico M. 1979. A two level modular system for natural language understanding. Proc. Int. Joint Conf. on Artificial Intelligence, Tokyo, Japan, 34(3-347.

[4] Guida G., Somalvico M. 1980. Interacting in natural language with artificial systems: the DONAU project. Information Systems 5, 333-344.

[5] Guida G., Tasso C. 1982. NLI: A robust interface for natural language person-machine communication. Int. Journal of Man-Machine Studies 17, 417-433.

[6] Hays D.G. 1964. Dependency Theory: a formalism and some observations. Memorandum RM4087 P.R., The Rand Corporation.
