S.p.A. - Via G. Reiss Romoli, 274 - 10148 Torino (Italy) ntelligence Project ... tom-up activation of the semantic modules that are referred to in validated trees, and ...
U N D E R S T A N D I N G N A T U R A L L A N G U A G E T H R O U G H P A R A L L E L PROCESSING O F S Y N T A C T I C A N D S E M A N T I C K N O W L E D G E : A N A P P L I C A T I O N T O D A T A BASE Q U E R Y
R. C O M I N O (*), R. G E M E L L O (*), G. G U I D A ( * * ) , C. R U L L E N T ( • ) , L. SISTO (*), M. S O M A L V I C O ( * * ) (*) C S E L T - C e n t r o S t u d i e L a b o r a t o r i T e l e c o m u n i c a z i o n i S.p.A. - Via G. Reiss R o m o l i , 2 7 4 - 10148 T o r i n o (Italy) ( * * ) M i l a n Polytechnic A r t i f i c i a l I ntelligence Project, M i l a n o ( I t a l y ) ABSTRACT
d u r i n g the parsing of
This paper describes the main features of
the P A R N A X
system f o r natural language access (in Italian) to an A D A B A S
proposes
an utterance,
the syntactic analyzer
candidate syntactic structures for a c o m p o n e n t of
that utterance that f i t the given set
data base. T h e core of the system is c o n s t i t u t e d by the analyzer that
semantic structures according to a given set of semantic rules.
includes
parallel
processing
of
syntactic
and
semantic
nondeterministic
semantic
of syntactic rules. S i m i l a r l y ,
the
analyzer constructs candidate
knowledge. It is argued that this feature (together w i t h the new
A m o n g these o n l y those structures t h a t can be associated to a
macro
corresponding syntactic s t r u c t u r e are validated
and
micro-analysis t e c h n i q u e w h i c h
is o n l y s h o r t l y
and
will
be
m e n t i o n e d in this paper) allowed the system to reach a good
f u r t h e r considered in the f o l l o w i n g steps of the processing. A l l
linguistic coverage, still ensuring an acceptable degree of effici-
other
ency. A f t e r the basic architecture and operation of P A R N A X
l i m i t i n g the search space w h i c h is actually expanded d u r i n g the
have been described, a t t e n t i o n is focused on the parallel syn-
analysis. This enables the system to reduce the n o n d e t e r m i n i s m
tactic/semantic
analyzer
advantages
obtained
discussed.
Examples
which
through of
is
illustrated
parallelism
PARNAX
in
are
operation
detail. also
are
The
shortly
candidate
structures are discarded,
thus considerably
of the analysis process, a n d , therefore, to operate w i t h improved e f f i c i e n c y .
presented.
References to related w o r k s are m e n t i o n e d , and directions for
2.
BASIC SYSTEM A R C H I T E C T U R E
f u t u r e research are o u t l i n e d .
P A R N A X allows casual users to access an A D A B A S data base
in
Italian.
It
translates natural
language requests
into
N A T U R A L programs (the A D A B A S f o r m a l query language) i n
1. INTRODUCTION This paper discusses a research project devoted to the
t w o steps. First the natural language q u e r y is processed by the
design of a robust and effective parser for Italian language, that
ANALYZER,
can support a large linguistic coverage, still ensuring an accept-
sed
that generates a semantic representation
in an
internal f o r m a l i s m ,
expres-
METANATURAL.
able degree of e f f i c i e n c y . T h e project has been partially based
METANATURAL
on previous results by the authors (Guida and Somalvico, 1979,
a l l o w a sufficient degree of independence of the understanding
1 9 8 0 ; Guida and Tasso,
1982) and presents several original
is an
called
intermediate language w h i c h should
process of the details of the data base logical schema. T h e n , the
c o n t r i b u t i o n s . A m o n g these are: a two-level analysis strategy
FORMALIZER
that includes a macro-analysis and a micro-analysis phase; a
full
transforms
the M E T A N A T U R A L
query
into a
N A T U R A L p r o g r a m . This is eventually supplied to the
model for semantic processing that is made up of t w o sequential
DBMS w h i c h provides the user w i t h the desired answer. System
phases,
tailoring
namely, a
nondeterministic
part that validates and
completes the a c t i v i t y of a syntactic analyzer, f o l l o w e d by a
Figure
d e t e r m i n i s t i c part t h a t constructs the o u t p u t internal representation;
a parallel
algorithm
that manages the t w o cooperating
processes of syntactic analysis and the n o n d e t e r m i n i s t i c part of semantic analysis.
and updating
MANAGEMENT
is ensured
by a
KNOWLEDGE
BASE
SYSTEM 1
shows
the basic architecture and
o p e r a t i o n of the A N A L Y Z E R , that constitutes
mode of
the core of the
system. T h e analysis is p e r f o r m e d at t w o levels: the upper level, called
macro-analysis,
takes
into
account
the
outer
sentence
This research project is supported by the i m p l e m e n t a t i o n
structure and suggests possible s p l i t t i n g and n o r m a l i z a t i o n of
of an e x p e r i m e n t a l system, called PARNAX, w h i c h is presently
c o m p l e x or syntactically unusual sentences i n t o simpler frag-
r u n n i n g in an I N T E R L I S P version on Siemens 7 7 4 8 at C S E L T
ments; the lower level, called microanalysis, parses the sentence
( T o r i n o , I t a l y ) . P A R N A X is a natural language interface to an
fragments and returns the o b t a i n e d M E T A N A T U R A L subtrees
A D A B A S data base c o n t a i n i n g i n f o r m a t i o n on the staff of a
to the upper level that w i l l compose t h e m i n t o the final M E T A -
sample c o m p a n y .
N A T U R A L tree.
In this paper, a t t e n t i o n is focused on the parallel strategy
Macro-analysis o p e r a t i o n is m a i n l y rule-based: a set of
developed for syntactic and semantic processing. At each step
structural rules of the t y p e < p a t t e r n , f r a g m e n t a t i o n - n o r m a l i z a -
664
R. Comino et al.
Micro-analysis consists of three phases. T h e first phase, namely lexical analysis, classifies the elementary c o m p o n e n t s of the i n p u t
fragment (words or simple constructs and idioms)
according
to their lexical t y p e .
Its o p e r a t i o n is based on a
n o n d e t e r m i n i s t i c finite-state recognizer that utilizes the lexical part of the dictionary, w h i c h contains lexical (Courtin, cesses:
roots and models
1977). T h e second phase includes t w o parallel pro
the syntactic analysis, w h i c h is based on the dependency
grammar model (Hays, 1 9 6 4 , C o u r t i n , 1977), and the first part of the semantic analysis, called semantic validation edge bases u t i l i z e d in this phase are semantic
rules,
and
the semantic
The knowl
the dependency rules, the
part
of the
dictionary.
The
result of the syntactic analysis and semantic v a l i d a t i o n is a set of validated trees. A validated tree is a m o d i f i e d dependency tree,
w h e r e semantically meaningless c o m p o n e n t s have been
p r u n e d and a p p r o p r i a t e semantic variables have been added Each node of the tree is therefore associated b o t h to a semantic variable that
expresses
the
kind
of
i n f o r m a t i o n o1 the cor
responding sentence fragment, and to a pointer to a procedure (semantic
module),
which
will
be
later
used
to assign
the
correct value to the variable. T h e t h i r d phase, namely semantic interpretation.
is a
d e t e r m i n i s t i c process devoted
to construct
the o u t p u t M E T A N A T U R A L tree. It operates t h r o u g h a bott o m - u p activation of the semantic modules that are referred to in validated
trees, and assignes the a p p r o p r i a t e values to the
corresponding semantic variables. 3. IMPROVING THROUGH
NONDETERMINISTIC ANALYSIS
PARALLELISM
Parallelism between syntactic analysis and the first part of the semantic analysis (semantic validation) is used in PAR N A X to improve the efficiency o1 the parsing process. B o t h s y n t a c t i c analysis and semantic v a l i d a t i o n are n o n d e t e r m i n i s t i c processes, that can be viewed as problems to be solved by a reduction-to-subproblems
mechanism
As
far
as
syntactic
analysis is concerned the p r o b l e m is to construct all the correct dependency trees for a given sentence. This p r o b l e m , once a w o r d in the sentence has been chosen as a candidate root o1 the tree, can be split ted i n t o subproblems. Each o1 t h e m consists in f i n d i n g all
the dependency subtrees whose roots (determined
t h r o u g h dependency rules) are just one step b e l o w the root of the dependency tree corresponding to the basic p r o b l e m . M a n y groups of alternative subproblems may arise, since m a n y groups Fig. 1 - Basic mode of operation of the ANAL YZER
of roots can be generated according to the dependency rules. A similar s i t u a t i o n arises in t h e case of semantic v a l i d a t i o n , w h e r e
t i o n a c t i o n , c o m p o s i t i o n action > is used b o t h by the fragmen-
the p r o b l e m consists in f i n d i n g all the correct semantic
tation and normalization m o d u l e
and the subproblems in
for
analysing the outer surface
s t r u c t u r e of t h e i n p u t sentence, and by the composition m o d u l e
possible
for
tic
building
analysis
also
manages
the
up
the o u t p u t
includes
METANATURAL
a paraphrase and dialogue
user-system
interaction
paraphrases of the i n p u t requests.
tree.
Macro-
m o d u l e that
and generates echoing
sequences
rules.
Both
separately solved
of
roots are d e t e r m i n e d
semantic by
two
trees,
f i n d i n g the semantic subtrees whose and
by
the seman-
syntactic p r o b l e m s c o u l d
d i s t i n c t search
processes
in
be
their
respective search spaces ( A N D - O R graphs), w h i c h are generally very large.
R. Comino et al.
Our choice is to solve both problems in a joined way, using t w o parallel, step-by-step search processes in the t w o
665
lead to the solution (boldface in Figure), while SP3 or SM3 w i l l fail in f o l l o w i n g steps.
search spaces. To this purpose a correlation between the t w o search spaces is necessary. This is defined in terms of an associa-
5 . T H E POWER O F P A R A L L E L I S M
t i o n criterion among problems in the t w o spaces (which is generally
a
many-to-many relation).
According to this,
for
each problem P in one search space only those reductions of P are taken
into account w h i c h have at least one association
among the possible sets of successor problems of the problem P' associated, in the other space, w i t h P Thus, most of the search for non existing solutions (semantic trees w i t h o u t corresponding syntactic trees, and vicoversa) is avoided, w i t h the desired result of constructing, w i t h considerably minor effort, only the set
of the semantic trees w h i c h are also syntactically valid
(validated trees).
empirical
experimentations have been done to
syntactic and semantic processing. T w o samples A and B of about 50 and 30 sentences respectively have been considered. A has been chosen f r o m a set of sentences proposed by casual users; B has been constructed to contain a broad spectrum of different ways of expressing the same request. Fach sentence in the t w o samples has been parsed first using the syntactic analyzer alone (I), and then using the parallel syntactic/semantic processor (II) thus generating the complete validated trees as well. The following parameters have been estimated, cardinality
In the current implementation of P A R N A X the parallel processing strategy above outlined is realized in a sequential way
Some
estimate the benefits obtained through parallel execution of
in which syntactic and semantic steps alternate in the
analysis, due to the limitations of the available I N T E R L I S P system.
of the syntactic search space expanded in both cases I and I I , and analysis time for syntactic processing alone in case I and parallel syntactic/semantic processing
in case
I I . The mean
values of the ratios of the above parameters in cases I and II (case I/case I I ) , tor both samples A and B, have been computed. 1 he table of Figure 2 illustrates the results obtained.
4. AN EXAMPLE A
simplified
Note that the sum of the time required for both syntactic example
of
parallel
syntactic/semantic
and semantic analysis (performed in parallel) is less than the
processing is shown in Figure 3. The focus is centered on one
t i m e needed for the syntactic analysis alone. We also outline
step of the analysis concerning the segment " N A T I A T O R I N O
that the benefits of parallel processing increase w i t h the length
PRIMA D L L
of the sentences.
1 9 5 8 " (for the english translation refer to the
Appendix) Lexical dictionary
and
semantic
types are
and are self explanatory.
extracted
from
the
For what concerns the
structure of semantic rules, taking rule I as an example, its meaning is the f o l l o w i n g one. a segment conveying the intorma
Sample A
Sample B
C A R D I N A L I T Y R A 1 l()
181
2.61
ANALYSIS TIME RATIO
1.36
2.03
Fig. 2 - Experimental results of parallel syntactic/semantic processing.
t i o n (A) S E L L C I O N (i.e., how to select an object f r o m a set) must contain a w o r d of lexical type " v e r b " and semantic type "AT"
inside that segment the i n f o r m a t i o n ( V A L U E must be
6.
EXPERIMENTAL ACTIVITY An
experimental
version of
PARNAX
has been fully
f o u n d (it is underlined) while ("'DATE and ( " ' C O M P A R A T I V E
implemented at CSELT (Torino, Italy) in the last t w o years.
may possibly be f o u n d , in any order. F 13 is the pointer to the
The system is connected to an A D A B A S data base containing
semantic module used during the
i n f o r m a t i o n on the staff of a sample company. It supports free
semantic
interpretation
phase.
b o t t o m - u p activation of the
The structure of dependency
Italian language interaction (including single sentences as well as
rules is evident considering rule 1 as an example; an adverb may
sequences of interrelated utterances), and is able to deal w i t h
be the dependent of a verb; the weights (-20, +10) deter-
ungrammatical sentences, ellipsis and telegraphic: forms, a class
mine the possible positions of the dependent (Courtin, 1977).
of anaphoric references, and m u l t i p l e queries. P A R N A X can
At the point of the analysis shown in Figure 3, t w o
also engage the user in a limited dialogue, that includes echoing
associated problems are considered; the former (P) consists in
user's request w i t h a system generated paraphrase, presenting
looking for a dependency tree rooted in " N A T I " , the latter (M)
alternative
consists in looking for a semantic tree rooted in © S E L E C T I O N .
and asking the user tor validation of the proposed interpretation
They are reduced to five and four subproblems respectively.
in the cases where other interpretations could exist.
interpretations when reference ambiguities occur,
O n l y subproblems SP1 and SP3 can be associated to subproblems
P A R N A X is w r i t t e n in I N T E R L I S P and is running on
SM1 and SM3 respectively. A l l other subproblems are pruned
Siemens 7748. P A R N A X dimensions are about 300 kbytes for
since no problem in { SM2, SM4 } c a n be associated to some
algorithms and 350 kbytes tor knowledge bases (the present
problem in { SP2, SP4, SP5 }. The analysis continues w i t h
version includes about one thousand lexical roots, 9 structural
problems SP1, SP3, S M 1 , and SM3, but o n l y SPI and S M I w i l l
rules, 130 dependency rules, 70 semantic rules, and 27 semantic
566 R. Comino et al.
R. Comino et al. 667
modules). The average time needed to parse an average-length
-
The way in w h i c h syntactic and semantic processing interact, w h i c h is s t r i c t l y step-by-step sequential in R U S / P S I - K L O N E
sentence (9 words) is a p p r o x i m a t e l y 3 CPU seconds ( w i t h o u t
and far more oriented towards a real parallelism in P A R N A X ;
accessing the DBMS). Figure 4 reports a short dialogue w i t h P A R N A X . F igure b
-
The structure of semantic analysis, w h i c h is intended as a
shows a small sample of d i f f e r e n t ways of expressing the same
single process in R U S / P S I - K L O N E w h i l e it is splitted in t w o
request
bases, namely nondeterministic validation and deterministic
(all
ot w h i c h are correctly understood by P A R N A X )
i n t e r p r e t a t i o n in P A R N A X .
that should give an idea of the linguistic coverage of the system.
A m o n g most promising research directions that deserve
For the English translations of the examples refer to the Appen-
f u r t h e r a t t e n t i o n we m e n t i o n the refinement of b o t h syntactic
dix.
and semantic models used by P A R N A X to better f i t the issues >
CONTA I DIPENDENTI D E L L A DIVISIONE INFORMALICAL N A T I DOPO IL 1949 E C O M U N I C A M l LA L O R O C A T E G O R I A
of
parallel
and
the
study
of
improved
parallel
*
la richiesta contienc due distinte domande
1.
numero dei dipendenti con divisione Informatica e anno di nascita maggiore di 1949 14
search
categoria professionals dei dipendenti con Divisione Inforrnatica e
represents a core t o p i c in many Al problems and could be of
2.
anno di nascita maggiore di 1949 COGNOMF
NOME
CATEGORIA
Rossi
Mario
I
Verdi
Claudio
D
As a concluding remark we note that the use of parallel algorithms t h a t u t i l i z e several and diverse sources of
knowledge is not l i m i t e d to the analysis of natural language but
interest in several applications.
APPENDIX Figure 3
>
processing,
strategies.
OUAL E' IL LIVELLO DI OUFLLI CHE ERA LORO SONO LAUREATE *
List the employees born in Turin before 1958
Figure 4. >
count the employees in the computer science department born after 1949 and let me know their professional category
livello dei dipendenti con divisione inforrnatica, anno di nascita
*
the request contains t w o queries.
maggiore di 1949 e t i t o l o di studio laurea
1.
number of employees having department computer science and
COGNOME
NOME
LIVELLO
Rossi
Mario
8
birth year greater than 1949. 2.
professional
category
of
the
employees
having
department
computer science and birth year greater than 1949 >
what is the level of those among them w h o received a doctor degree *
Fig. 4 - A sample dialogue.
level of employees having department computer science, birth year greater than 1949, and doctor degree.
> > >
d i m m i I'anno di nascita dei dipendenti laureati vorrei sapere I'anno di nascita di t u t t i i laureati d i m m i quando sono nati i dipendenti chc sono laureati
> > > >
anno di nascita laureati vorrei sapere qual e I'anno in cui sono nati t u t t i quei dipendenti che hanno conseguito la laurea vorrei sapere I'anno di nascita dei dipendenti con t i t o l o di studio laurea d i m m i in che anno sono nati coloro che sono laureati
>
d i m m i I'anno in cui sono nati i dipendenti in possesso di laurea
>
d i m m i I'anno di nascita di chi ha conseguito la laurea
>
dei laureati d i m m i I'anno di nascita
>
dei dipendenti che sono laureati voglio sapere I'anno in cui sono nati
>
cerca i dipendenti il cui t i t o l o di studio e laurea e comunicami I'anno in cui sono nati
tell me the birth year of the employees w i t h a doctor degree.
REFERENCES [ 1 ] Bobrow R.J.,
Webber B.L
>
qual e la data di nascita dei dipendenti che hanno la laurea? qual e I'anno di nascita di quelli che si sono laureati
> qual e I'anno di nascita dei dipendenti che sono in possesso di laurea?
[2]
Courtin guages
J.
Conf.
on Arti-
Stan ford.
1977. Algorithmes pour
naturelles.
Knowledge representation for 1st Annual Nat.
These,
University
le
traitement interactif des lan-
Scientifique et Medicale de
Grenoble. Grenoble, France. [ 3 ] Guida G., Sornalvico M. 1979. A t w o level modular system for natural language
laureati anno nascita
1980.
syntactic/semantic processing. Proc. ficial Intelligence, A A A I,
> >
Figure 5 >
understanding.
Proc.
Int.
Joint
Conf.
on
Artificial Intel-
ligence T o k y o , Japan, 34(3-347. Fig. 5 - A sample of different ways of expressing the same request. [ 4 ] Guida G., Sornalvico M. 1980. Interacting in natural language w i t h artificial 7. CONCLUSION
systems,
The parallel syntactic/semantic analyzer developed w i t h i n
[ b ] Guida G., Tasso C. 1982. N L I . A robust interface for natural language
the P A R N A X project shares some features w i t h the RUS/PSI-
person-machine
KLONE
7 7,417-433.
System
the D O N A U project. Information Systems 5, 333-
344.
( B o b r o w and
Webber,
1980). A l t h o u g h an
in-depth comparison of the t w o approaches major differences seem w o r t h being o u t l i n e d :
is d i f f i c u l t , t w o
c o m m u n i c a t i o n . Int.
Journal
of Man-Machine Studies
[ 6 ] Hays D.G. 1964. Dependency Theory: a formalism and some observations. Memorandum RM4087 P.R., The Rand Corporation.