HANDLING SYNTACTICAL AMBIGUITY IN MACHINE TRANSLATION
Vladimir Pericliev
Institute of Industrial Cybernetics and Robotics
Acad. O. Bontchev Str., bl. 12, 1113 Sofia, Bulgaria
ABSTRACT
The difficulties met in resolving syntactical ambiguity in MT can be at least partially overcome by preserving the syntactical ambiguity of the source language in the target language. An extensive study of the correspondences between the syntactically ambiguous structures in English and Bulgarian has provided a solid empirical basis in favor of such an approach. Similar results could be expected for other sufficiently related languages as well. The paper concentrates on the linguistic grounds for adopting the approach proposed.

1. INTRODUCTION
Syntactical ambiguity, as part of the ambiguity problem in general, is widely recognized as a major difficulty in MT. To solve this problem, the efforts of computational linguists have been mainly directed to the process of analysis: a unique analysis is sought (semantical and/or world knowledge information being basically employed to this end), and only after such an analysis has been obtained does the process of synthesis begin. On this approach, in addition to the well-known difficulties of a general-linguistic and computational character, two principal embarrassments are encountered. It leaves us entirely incapable of processing, first, sentences with "unresolvable syntactical ambiguity" (with respect to the disambiguation information stored), and, secondly, sentences which must be translated ambiguously (e.g. puns and the like).

In this paper, the burden of solving the syntactical ambiguity problem is shifted from the domain of analysis to the domain of synthesis of sentences. Thus, instead of trying to resolve such ambiguities in the source language (SL), syntactically ambiguous sentences are synthesized in the target language (TL) which preserve their ambiguity, so that the user himself rather than the parser disambiguates the ambiguities in question.

This way of handling syntactical ambiguity may be viewed as an illustration of a more general approach, outlined earlier (Penchev and Pericliev 1982, Pericliev 1983, Penchev and Pericliev 1984), concerned also with other types of ambiguities in the SL translated by means of syntactical, and not only syntactical, ambiguity in the TL.

In this paper, we will concentrate on the linguistic grounds for adopting such a manner of handling syntactical ambiguity in an English to Bulgarian translation system.

2. PHILOSOPHY
This approach may be viewed as an attempt to simulate the behavior of a man-translator who is linguistically very competent, but is quite unfamiliar with the domain he is translating his texts from. Such a man-translator will be able to say what words in the original and in the translated sentence go together under all of the syntactically admissible analyses; however, he will be, in general, unable to decide which of these parses "make sense". Our approach is an obvious way out of this situation. And it is in fact not infrequently employed in the everyday practice of more "smart" translators.

We believe that the capacity of such translators to produce quite intelligible translations is a fact that can have a very direct bearing on at least some trends in MT. Resolving syntactical ambiguity, or, to put it more accurately, evading syntactical ambiguity in MT following a similar human-like strategy is only one instance of this.

There are two further points that should be made in connection with the approach discussed. We assume as more or less self-evident that:

(i) MT should not be intended to explicate texts in the SL by means of texts in the TL, as previous approaches imply, but should only translate them, no matter how ambiguous they might happen to be;
(ii) Since ambiguities almost always pass unnoticed in speech, the user will unconsciously disambiguate them (as in fact he would have done, had he read the text in the SL); this, in effect, will not diminish the quality of the translation in comparison with the original, at least insofar as ambiguity is concerned.
3. THE DESCRIPTION OF SYNTACTICAL AMBIGUITY IN ENGLISH AND BULGARIAN
The empirical basis of the approach is provided by an extensive study of syntactical ambiguity in English and Bulgarian (Pericliev 1983), accomplished within the framework of a version of dependency grammar using dependency arcs and bracketings. In this study, from a given list of configurations for each language, all logically admissible ambiguous strings of three types in English and Bulgarian were calculated. The first type of syntactically ambiguous strings is of the form:
(1) A --1,2--> B,

e.g. The statistician studied(V) the whole year(PP), in which the arc from studied(V) to the whole year(PP) can be labeled either adv.mod (how long?) or obj.dir (what?),

where A, B, ... are complexes of word-classes, "--->" is a dependency arc, and 1, 2, ... are syntactical relations. The second type is of the form:

(2) A --1--> B

Case A provides a possibility for literal English into Bulgarian translation, while there is no such possibility for sentences containing strings classed under Case B.

English strings which can be literally translated into Bulgarian comprise, roughly speaking, the majority and the most common of the strings that appear in real English texts. Informally, these strings can be grouped into several large classes of syntactically ambiguous constructions, such as constructions with "floating" word-classes (Adverbs, Prepositional Phrases, etc. acting as slaves either to one or to another master-word), constructions with prepositional and post-positional adjuncts to conjoined groups, constructions with several conjoined members, constructions with symmetrical predicates, some elliptical constructions, etc. Due to space limitations, a few English phrases with their literal translations will suffice as an illustration of Case A. (Further on, syntactical relations as labels of arcs will be omitted where superfluous in marking the ambiguity.)
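As a minimal sketch (not part of the original study), an ambiguous string of the kinds described above can be represented by listing, for each dependent, every admissible pair of master-word and syntactical relation; the readings of the string are then the combinations of one such pair per dependent. The names Attachment and readings are hypothetical, and the relation labels given to the prepositional phrase in the second example are an assumption made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Attachment:
    """One admissible analysis of a dependent: its master-word and the relation."""
    master: str
    relation: str


def readings(alternatives):
    """Enumerate all readings of an ambiguous string.

    `alternatives` maps each dependent to the list of its admissible
    attachments; a reading picks exactly one attachment per dependent.
    """
    dependents = list(alternatives)
    for choice in product(*(alternatives[d] for d in dependents)):
        yield dict(zip(dependents, choice))


# "The statistician studied the whole year": one arc, two possible relations.
statistician = {
    "the whole year": [
        Attachment("studied", "adv.mod"),  # studied for how long?
        Attachment("studied", "obj.dir"),  # studied what?
    ],
}

# A "floating" prepositional phrase with two candidate master-words
# (relation labels here are assumed, not taken from the paper).
floating = {
    "with the long bat": [
        Attachment("hit", "adv.mod"),
        Attachment("dog", "attrib"),
    ],
}

if __name__ == "__main__":
    for reading in readings(statistician):
        print(reading)
    for reading in readings(floating):
        print(reading)
```

Under this representation a literal (Case A) translation is one in which the Bulgarian string admits the same set of alternatives for the corresponding dependents.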
... krasivi(Adj) ze...(N) ... momicheta(N)

4.2. Case B: Non-Literal Translation

English strings which cannot be literally translated into Bulgarian are those which contain: (i) word-classes (Vinf, Gerund) not present in Bulgarian, and/or (ii) syntactical relations (e.g. "composite": language <--- theory, etc.) not present in Bulgarian, and/or (iii) other differences (in global syntactical organization, agreement, etc.).

It will be shown how certain English strings falling under this heading are related to Bulgarian strings preserving their ambiguity. A way to overcome difficulties with (ii) and (iii) is exemplified on a very common (complex) string, viz. Adj/N/Prt+N/N's+N (e.g. stylish gentlemen's suits).

As an illustration, we confine ourselves here to the problems met with (i), and, more concretely, to English strings containing Vinf. These strings are mapped onto Bulgarian strings containing a da-construction or a verbal noun (Vinf generally being translated either way). E.g. the Vinf in

(8) a. He promised(V) to please(Vinf) mother
       [promised ---> to please: obj.dir or adv.mod]

(promised what or why?) is rendered by a da-construction in agreement with the subject, preserving the ambiguity:

    b. Toj obeshta(V) da zaradva(da-constr) majka
       [obeshta ---> da zaradva: obj.dir or adv.mod]

In the string

(9) a. I have(V) instructions(N) to study(Vinf)
       [to study depends either on instructions (attrib) or on have (obj.dir)]

(what instructions or I have to study what?) Vinf can be rendered alternatively by a da-construction or by a prepositional verbal noun:

    b. Az imam(V) instruktsii(N) da ucha(da-constr)
    c. Az imam(V) instruktsii(N) za uchene(PrVblN)

Yet in other strings, e.g. The chicken(N) is ready(Adj) to eat(Vinf) (the chicken eats or is eaten), in order to preserve the ambiguity the infinitive should be rendered by a prepositional verbal noun: Pileto(N) e gotovo(Adj) za jadene(PrVblN), rather than with the finite da-construction, since in the latter case we would obtain two unambiguous translations: Pileto e gotovo da jade (the chicken eats) or Pileto e gotovo da se jade (the chicken is eaten), and so on.

For some English strings no syntactically ambiguous Bulgarian strings could be put into correspondence, so that a translation with our method proved to be an impossibility. E.g.

(10) He found(V) the mechanic(N) a helper(N)
     [the mechanic: obj.dir and a helper: predicative, or the mechanic: obj.indir and a helper: obj.dir]

(either the mechanic or someone else is the helper) is such a sentence, due to the impossibility in Bulgarian for two non-prepositional objects, a direct and an indirect one, to appear in a sentence.

4.3. Multiple Syntactical Ambiguity

Many very frequently encountered cases of multiple syntactical ambiguity can also be handled successfully within this approach. E.g. a phrase like Cybernetical devices and systems for automatic control and diagnosis in biomedicine, with more than 30 possible parsings, is amenable to literal translation into Bulgarian.

4.4. Semantically Irrelevant Syntactical Ambiguity

Disambiguating syntactical ambiguity is an important task in MT only because different meanings are usually associated with the different syntactical descriptions. This, however, is not always the case. There are some constructions in English the syntactical ambiguity of which cannot lead to multiple understanding. E.g. in sentences of the form A is not B (He is not happy), in which the adverbial particle not is either a verbal negation (He isn't happy) or a non-verbal negation (He's not happy), the different syntactical trees will be interpreted semantically as synonymous: 'A is not B' <==> 'A is not-B'.

We should not worry about finding Bulgarian syntactically ambiguous correspondences for such English constructions. We can choose one analysis arbitrarily, since either of the syntactical descriptions will provide correct information for our translational purposes. Indeed, the construction above has no ambiguous Bulgarian correspondence: in Bulgarian the negating particle combines either with the verb (then it is written as a separate word) or with the adjective (in which case it is prefixed to it). Either construction, however, will yield a correct translation: Toj ne e radosten or Toj e neradosten.
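The treatment of the cases discussed so far can be summarized in a small sketch, given here only as an illustration under stated assumptions: among the candidate TL renderings, pick one whose set of readings coincides with that of the SL string; if the ambiguity is semantically irrelevant, any rendering will do; otherwise the method is inapplicable. The function name choose_rendering and the informal reading labels are hypothetical; the candidate renderings are those quoted in the text for The chicken is ready to eat and He is not happy.

```python
def choose_rendering(sl_readings, candidates, semantically_irrelevant=False):
    """Pick a TL rendering that preserves the SL ambiguity.

    sl_readings: set of (informal) reading labels of the SL string.
    candidates:  dict mapping each TL rendering to its set of readings.
    Returns a rendering, or None when the method is inapplicable.
    """
    # Prefer a rendering that is ambiguous in exactly the same way.
    for rendering, tl_readings in candidates.items():
        if set(tl_readings) == set(sl_readings):
            return rendering
    # Semantically irrelevant ambiguity: any analysis gives a correct translation.
    if semantically_irrelevant and candidates:
        return next(iter(candidates))
    # No ambiguity-preserving correspondence exists.
    return None


chicken = {
    "Pileto e gotovo da jade": {"eats"},
    "Pileto e gotovo da se jade": {"is eaten"},
    "Pileto e gotovo za jadene": {"eats", "is eaten"},
}

negation = {
    "Toj ne e radosten": {"verbal negation"},
    "Toj e neradosten": {"non-verbal negation"},
}

if __name__ == "__main__":
    print(choose_rendering({"eats", "is eaten"}, chicken))
    print(choose_rendering({"verbal negation", "non-verbal negation"},
                           negation, semantically_irrelevant=True))
    # No candidate at all, as for "He found the mechanic a helper".
    print(choose_rendering({"helper is the mechanic", "helper is someone else"}, {}))
```

The exact-match test on reading sets is a deliberate simplification: it treats readings as opaque labels and says nothing about how those labels would be obtained from a parser.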
4.5. A Lexical Problem

Certain difficulties may arise even after English syntactically ambiguous strings have been mapped onto ambiguous Bulgarian ones. These difficulties are due to the different behavior of certain English lexemes in comparison to their Bulgarian equivalents. This behavior is displayed in the phenomenon we call "intralingual lexical-resolution of syntactical ambiguity" (the substitution of lexemes in the SL with their translational equivalents from the TL results in the resolution of the syntactical ambiguity).

For instance, in spite of the existence of ambiguous strings in both languages of the form Verb(tr/itr) ---> Noun, with some particular lexemes (e.g. shoot(tr/itr) ==> zastreljam(tr) or streljam(itr)), in which to one English lexeme there correspond two in Bulgarian (one only transitive, and the other only intransitive), the ambiguity in the translation will be lost. This situation explains why it seems impossible to translate ambiguously into Bulgarian examples containing verbs of the type given, or verbal nouns formed from such verbs, as is the case in The shooting of the hunters. This problem, however, could be generally tackled in the translation into Bulgarian, since it is a language usually providing a series of forms for a verb: transitive, intransitive, and transitive/intransitive, which are more or less synonymous (for more details, cf. Penchev and Pericliev (1984)).

5. CONCLUDING REMARKS

To conclude, some syntactically ambiguous strings in English have literal, others non-literal, and still others no correspondences in Bulgarian. In summary, of a total of approximately 200 simple strings treated in English, more than 3/4 can, and fewer than 1/4 cannot, be literally translated; about half of the latter strings can be put into correspondence with syntactically ambiguous strings in Bulgarian preserving their ambiguity. This gives quite strong support to the usefulness of our approach in an English into Bulgarian translation system.

Several advantages of this way of handling syntactical ambiguity can be mentioned.

First, in the processing of the majority of syntactically ambiguous sentences within an English into Bulgarian translation system it dispenses with semantical and world knowledge information at the very low cost of studying the ambiguity correspondences in both languages. It could be expected that investigations along this line will prove to be fruitful for other pairs of languages as well.

Secondly, whenever this way of handling syntactical ambiguity is applicable, what was impossible for previous approaches, namely translating sentences with unresolvable ambiguity, or with verbal jokes and the like, turns out to be an easily attainable task.

Thirdly, the approach seems to have a very natural extension to another principal difficulty in MT, viz. coreference (cf. the three-way ambiguity of Jim hit John and then he (Jim, John or neither?) went away and the same ambiguity of toj (= he) in its literal translation into Bulgarian: Djim udari Djon i togava toj (?) si otide).

And, finally, there is yet another reason for adopting the approach discussed here. Even if we choose to go another way and (somehow) disambiguate sentences in the SL, almost certainly their translational equivalents will again be syntactically ambiguous, and quite probably will preserve the very ambiguity we tried to resolve. In this sense, for the purposes of MT (or other man-oriented applications of CL) we need not waste our efforts to disambiguate e.g. sentences like John hit the dog with the long bat or John hit the dog with the long wool, since, even if we have done that, the correct Bulgarian translations of both these sentences are syntactically ambiguous in exactly the same way, the resolution of ambiguity thus proving to be an entirely superfluous operation (cf. Djon udari kucheto s dalgata palka and Djon udari kucheto s dalgata valna).

6. REFERENCES

Jordanskaja, L. 1967. Syntactical ambiguity in Russian (with respect to automatic analysis and synthesis). Scientific and Technical Information, Moscow, No. 5, 1967. (in Russian).
Penchev, J. and V. Pericliev. 1982. On meaning in theoretical and computational semantics. In: COLING-82, Abstracts, Prague, 1982.
Penchev, J. and V. Pericliev. 1984. On meaning in theoretical and computational semantics. Bulgarian Language, Sofia, No. 4, 1984. (in Bulgarian).
Pericliev, V. 1983. Syntactical Ambiguity in Bulgarian and in English. Ph.D. Dissertation, ms., Sofia, 1983. (in Bulgarian).