Document not found! Please try again

Technical Correspondence Automatic Clustering of Languages

3 downloads 0 Views 584KB Size Report
Therefore the most general problem is to determine for each pair of languages how similar or how dissimilar they are. Is Spanish closer to Latin than English to ...
Technical Correspondence Automatic Clustering of Languages V l a d i r n i r B a t a g e l j *t University of Ljubljana

Toma~ Pisanski t University of Ljubljana

D a m i j a n a Ker~i~* Jo~ef Stefan Institute

Automatic clustering of languages seems to be one possible application that arose during our study of mathematical methods for computing dissimilarities between strings. The results of this experiment are discussed. 1. Introduction

The purpose of this paper is to show that current mathematics and computer science can offer expertise to various "soft" sciences, e.g., linguistics. Sixty-five languages are automatically grouped into clusters according to the analysis of sixteen common words. The authors regard the results presented in this paper merely as an example of a possible application of cluster analysis to linguistics. The results should not be regarded as conclusive but rather as suggestions to linguists that similar projects can be carried out on a m u c h greater scale, hopefully yielding similar results and better understanding of language families. This is by no means the first application of mathematical methods to this problem; see for instance Kruskal, Dyen, and Black (1971) and Sujold~i~ et al. (1987). 2. P r o b l e m and Data

It is more or less clear that some words are similar in certain languages and dissimilar in other languages. Obviously two languages are similar if most words are similar. Therefore the most general problem is to determine for each pair of languages h o w similar or h o w dissimilar they are. Is Spanish closer to Latin than English to Danish? In general, perhaps such quantitative questions do not always make sense. But suppose we decide to make an experiment. Suppose we decide to measure dissimilarity between two languages by defining it in a strict mathematical manner. From the linguistic viewpoint this m a y be quite absurd. Nevertheless we have defined certain ways to measure dissimilarity between two words and used this to measure dissimilarity between two languages. There are several ways one can define such a dissimilarity. In this paper we will show some examples. The choice of the dissimilarity will of course influence the outcome. It is interesting that changing the choice of the dissimilarity does not affect the outcome too drastically. It is for the linguists to tell whether this can be interpreted by saying that the results are stable, i.e., "almost independent" of the choice of dissimilarity functions and make sense for the languages.

• Supported in part by the Research Council of Slovenia. t Department of Mathematics,University of Ljubljana, Ljubljana,Slovenia. :~Department of Digital Communications,Jo~ef Stefan Institute, Ljubljana,Slovenia.

(~) 1992 Associationfor Computational Linguistics

Computational Linguistics .

Volume 18, Number 3

Let u be a w o r d in a language L1 and let v be its translation into another language L2. Let d(u, v) be a dissimilarity m e a s u r e or simply dissimilarity between the two words as it is described below. Hence d(u, v) is a nonnegative integer. In order to make things simpler we assume that both languages are written in the same alphabet. Let us give some examples for dissimilarity d(u, v). (a)

Assume that dl(U, v) is the m i n i m u m n u m b e r of the letters that have to be inserted or deleted in order to change u into v. For example:

u v

= =

belly bauch.

Obviously in order to transform u into v w e have to delete the letters "elly" and insert the letters "auch." Hence

dl (belly, bauch ) = 8. (b)

The second possibility is the smallest n u m b e r of substitutions, deletions, and insertions to change u into v. In our example:

u = belly, v = bauch, d2(belly, bauch) = 4.

(c)

We have to substitute the letters "elly" with letters "auch" and this is the shortest w a y to change u into v. Both dl(u, v) and d2(u, v) are called the Levenshtein distance (Kruskal 1983). We can measure dissimilarity between two words also with the length of their shortest c o m m o n supersequence (LSCS). A n y " w o r d " (string) z is a s u p e r s e q u e n c e of a word u if it can be obtained from u b y inserting letters into it. For example: if u = belly, v = bauch, then some possibilities for their shortest c o m m o n supersequence are "bellyauch," "bealulcyh," "belauchly,"... They all contain 9 characters. Therefore,

d3 (belly, bauch ) -- 9. There are other possibilities for defining dissimilarity d(u, v) that have been used in data analysis; see for instance Kashyap and O o m m e n (1983). .

340

In our study we have used only written languages and dialects. We used transliterations into standard Latin (English) alphabet. The data were p r o v i d e d from a variety of sources such as native speakers and dictionaries. However, transliterations were not checked. The translations were not given b y experts; hence it is quite likely that there are several inconsistencies present both in translations and in transliterations. Obviously the choice of a particular m e t h o d of transliteration and translation m a y influence the outcome. The letters that do not appear in the Latin alphabet were changed into similar letters of the Latin alphabet. For example: in the Slovenian alphabet there are three nonstandard letters ~, ~, ~. We have chosen to omit diacritical marks: c, s, and z. A possible alternative w o u l d be to use ch, sh, zh. Also we omit diacritical marks in other languages. For instance: ~i, fi, ~ are represented as a.

Vladimir Batagelj et al.

Language Language

L1 L2

Language

Lm

Automatic Clustering of Languages

1.

2. . . .

n.

Wll

w12

• • •

w21

w22

• • •

Win W2n

Wml

Wm2

•. •

Wmn

Figure 1 Data array.

3.

We have chosen 16 English words. Actually, we have started with data in Hartigan's Clustering Algorithms, page 243. Later we used The Concise Dictionary of 26 Languages in Simultaneous Translation to expand the data. Over 30 people all over the world have given corrections and data for lesser known languages and dialects. The resulting data are given in Appendix A. Only linguists should carefully select the words that would be used in the "real" project. We hope that they will contact us in order to carry out the "big" project. For some well-studied sets of words the reader should consult Kruskal, Dyen, and Black (1971) and Sujold~i4 et al. (1987).

4.

The computer program for computing dissimilarity measure uses the data about the languages in the large array shown in Figure 1. There are m languages and n words in each language. We have selected m = 65 languages and n = 16 words. Note that Appendix A gives essentially this array for our experiment. For instance L1 = Albanian, wl~ = keq.

5.

Once we select a dissimilarity measure d(u, v) between two words, the next step is to define the dissimilarity D(Li, Lj) between two languages. There are many possibilities. We decided to take the sum of dissimilarity measures of words. Mathematically, it is defined as: D(Li, Lj) -~ d(Wil, Wjl ) q-d(wi2,wj2) - } - . . . q-d(win,Wjn).

We would like to point out that this is studied by data analysis; the reader is referred to Hartigan (1971) for further discussion and background.

. The next step is to select an appropriate clustering method. There are many different methods available (Hartigan 1971). We wanted to have the results expressed in the form of a binary tree (see Aho, Hopcroft, and Ullman 1974 for the discussion of binary trees) or more precisely in the form of a dendrogram; see for instance Anderberg (1973) and Gordon (1981). We selected Ward's method, which tends to give realistic results. This method is discussed in Anderberg (1973) and Gordon (1981).

341

Computational Linguistics

Volume 18, Number 3

3. Results and Comments

The results are presented in Appendix B in the form of three dendrograms. Each of them corresponds to a specified dissimilarity measure. The three results are not identical; however, they are quite similar. If we cut the dendrogram horizontally at any height we obtain a partition of the set of the languages into a certain number of parts that we call clusters. The dendrogram tells us how many clusters are suitable for data that we analyze. The number of clusters we obtain from the cut at the largest "jump" of two neighboring levels of the union. Looking at our three dendrograms we can easily notice that our data form five clusters: •

Slavic



Germanic



Romance



Indic



all others.

We can also notice that first the Slavic branch is formed. Next the Germanic and the Romanic languages form their groups (clusters) nearly at the same point. At the end the Indic languages are branching off the others. The remaining languages do not form any other evident cluster. See Figure 2. The five clusters that are formed are very stable. Any pair of languages classified in one of our clusters in the first dendrogram are also in the same class in the other two dendrograms. Notice that in some clusters languages also form subclusters. For example look at the Germanic languages in any dendrogram where two parts are very pronounced: the Scandinavian languages and the German-related languages and dialects. It is interesting that the simplest dissimilarity measure dl (i.e., the number of insertions and deletions) gives the best separation of languages.

SL

GERMANICROM ~ 7 " INDIC Figure 2 Family tree of languages.

342

OTHERS

Vladimir Batagelj et al.

Automatic Clustering of Languages

We can mention that clusters we found with cluster analysis are very close to the language families established in linguistics (Kruskal, Dyen, and Black 1971). Obviously one could ask the following questions or problems that can only be answered by a large-scale project. 1.

In our case all treated words have equal weight. The similarity measure between two languages can also be defined in such a w a y that different weights (based on linguistic theory) are given to the words a n d / o r transformations.

2.

H o w much does the choice of words influence the final tree structure? In our analysis English belongs to the Germanic cluster, w h e n we k n o w that it also has a strong Romance component.

3.

Obviously a larger number of words would give a more accurate picture. The question is: how much and in what w a y do the results vary if we increase the number of words?

4.

H o w much would the results differ if we study spoken language instead of written language? We can consider for example some phonetic properties of written letters or strings of letters.

5.

A n y choice of transliteration introduces a "systematic error" in the results. One way of eliminating such an error would be to test for patterns and then not to penalize patterns that occur often. For example: if we find that "tch" ~ "zh" very often then we would not count it every time it occurs but only once.

Of course for such precise analysis one needs m u c h better knowledge of the linguistic field than we have as laypersons. References

Aho, A. V.; Hopcroft, J. E.; and Ullman, J. D. (1974). The Design and Analysis of Computer Algorithms. Addison Wesley. Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press. Gordon, A. D. (1981). Classification. Chapman and Hall. Hartigan, J. A. (1971). Clustering Algorithms. John Wiley. Kashyap, R. L., and Oommen, B. J. (1983). "A common basis for similarity measures involving two strings." Intern. J. Computer Math., 13: 17-40. Kruskal, Joseph B. (1983). "An overview of sequence comparison: Time warps, string edits, and macromolecules." SIAM Review,

25(2): 201-237. Kruskal, Joseph B.; Dyen, Isidore; and Black, Paul. (1971). "Some results from the vocabulary method of reconstructing languages trees." In Lexico-Statistics in Genetic Linguistics, Proceedings of the Yale Conference, Yale University. Sujold~iG A.; Simunovi4; Finka B.; Bennett L. A.; Angel J. L.; Roberts D. E; and Rudan P. (1987). "Linguistic microdifferentation on the Island of Kor~ula." Anthropol. Ling., 28: 405-432. The Concise Dictionary of 26 Languages in Simultaneous Translation, compiled by P. M. Bergman. A Signet Book from New America Library.

343

Computational Linguistics

Volume 18, Number 3

Appendix A. Sixteen Words in Sixty-Five Languages

ALBANIAN AR. TUNISIAN 1 BAH. MALAYSIA 2 BEN GALI BERBER BULGARIAN BYELORUSSIAN CATALAN CH. CANTONESE 3 CH. MANDARIN 4 CROATIAN CROAT. CAKAVSKIs CROAT. KAJKAVSKI6 CZECH DAN IS H DUTCH E N G LIS H ESPERANTO F IN N IS H F RE NC H GERMAN GER. BAVARIAN 7 GER. SWISS D. 1~ GER. SWISS D. 2 9 GREEK NEW GREEK OLD H E B R EW H IN D I H U N GA R IA N IN DO N ESIA N ITALIAN IT. N. LOMBARDY 1° IT. VENETII D. n

1 2 3 4 5 6 7 8 9 10 11

1.

2.

3.

4.

gjithcka ilkul semua sob akith vseki use tot chyun dou sve se sve vsechno all geheel all cio kaikki tout alle ail-zam aui alles olos holos kol sab minden semua tutto tu:t tut

keq xiab jahat kharap diri los kepski dolent waai b u hao los los los spatny slet slecht bad malbona huono mauvais schlecht schlecht schlaecht schlaecht kakos kakos ra kharab rossz buruk male catiiv brut

bark kirsh perut pet aaboudh korem brukha panxa tou d u zi trbuh trbuh trebuh bricho bug buik belly ventro vatsa ventre bauch wampn buch buch kilya koilia beten pet has perut ventre pansa panza

galm akhal hitam kalo averkan ceren chrni negre hak hei crn crn crn cerny sort zwart black nigra musta noir schwarz schwoaz schwarz schwarz mavros mavros shachor kala fekete hitam nero negher caif

ARABIC TUNISIAN BAHASA MALAYSIA CHINESE CANTONESE CHINESE M A N D A R I N CROAT IAN CAKAVSKI - Dialect of Croat CROATIAN KAJKAVSKI - Dialect of Croat G E R M A N BAVARIAN G E R M A N SWISS DIALECT - Bernese O b e r l a n d G E R M A N SWISS DIALECT - N o r t h e a s t e r n S w i t z e r l a n d ITALIAN N O R T H E R N LOMBARDY ITALIAN VENETII DIALECT - distinct from Venetians

344

Vladimir Batagelj et al. IRISH JAPANESE KANNADA LATIN LATVIAN LITHUANIA N MACEDONIAN MALAYALAM MALTESE MAORI MARAATHI NORWEGIAN ORIYA PANJABI PERSIAN POLISH PO RTU GUESE RAJASTHANI ROMANIAN RUSSIAN SANSKRIT SERBIAN SLOVAK SLOVENIAN SPA NISH SWAHILI SWEDISH TAMIL T ELUGU T URKISH UKRAINIAN WELSH C

Automatic Clustering of Languages vile zenbu yella totus visi vise site ellam kollox katoa sarva alle sabu sab hame wszystko todo sab tot vse sara sve vsetko vse todo ote alla ellaam antha butun vse pawb

ALBANIAN AR. TUNISIAN BAHASA MALAYSIA BENGALl BERBER BULGARIAN BYELORUSSIAN CATALAN CH. CANTONESE CH. MANDARIN CROATIAN

olc warui ketta malus slikts blogas los cheetta trazin kino waeet daarlig kharap bura bad zly mau kharab rau plokhoi bura los zly slab mal baya daolig keduthy chedda fena pohane drwg

bolg hara hoatti venter veders pilvas stomak vayaru zaqq hoopara poat mage peta pet shekam brzuch barriga pet burta brjukho paat trbuh brucho trebuh vientre tumbo mage vayiru kadupu karin zhevit bola

dubh kuroi kahri niger melns jaudas crn karuppu iswed hiwahiwa kaale svart kala kala siah czarny negro kalo negru cjornji kala crn cierny crn negro karipia svart karuppu nalla kara chorne du

5.

6.

7.

8.

asht adhum tulang harh ighass kost kostka os gwat si kost

dite yuum hari din as den dzen' dia yat tian dan

vdes met mati mora amath umiram pamertsi morir sei si umrijeti

pi ushrub minum khaoa sew pi pits' beure yam he piti

345

Computational Linguistics CROAT. CAKAVSKI kost CROAT. KAJKAVSKI kost C7 EC H kost DANISH ben DUTCH bot ENGLISH bone ES P ERANTO osto FINNISH luu F RENC H os GERMAN knochen GER. BAVARIAN gnocha GER. SWISS D. 1 chnoche GER. SWISS D. 2 chnoche GREEK NEW kokalo GREEK OLD kokkalos HEBREW etsem HINDI haddi HUNGARIAN csont INDONESIAN tulang ITALIAN osso IT. N. LOMBARDY oss IT. VENETII D. os IRISH chaimh JAPANESE hone KAN NADA yalabu LATIN os LATVIAN kauls LITHUANIAN kaulas MACEDONIAN koska MALAYALAM ellu MALTESE gtradma MAORI iwi MARAATHI haad NORWEGIAN ben ORIYA hada PAN J ABI hadi PERSIAN ostokhan POLISH kosc PORTUGUESE osso RAJASTHANI haddi ROMANIAN . os RUSSIAN kost SANSKRIT haddi S ERBIAN kost SLOVAK kost S LOVENIAN kost SPA NISH hueso SWAHILI mfupa

346

Volume 18, Number 3 dan dan den dag dag day tago paiva jour tag dag tag dag mera hemera yom din nap hari giorno di' di la hi dina dies diena dena den divasam gurnata maeuao diwas dag dina din ruz dzien dia din zi den' din dan den dan dia siku

umret umreti umrit at doe sterven to die morti varjata mourir sterben schteam staerbe staerbe petheno thneskein lamut marna hal mati morire muri' morir doluidh shinu satta rnori nomirt numire umira marikkuka miet hemo marney aa doe mariba marna mordan umrzec morrer marno a muri umirat marna umret zomriet urnreti morir kufov

pit piti piti at drikke drinken to drink trinki juoda boire trinken saufn trinke drinke pino pinein lishtot pina iszik minum bere bever bever olaim nomu kudi bibere dzert gerti pie kudikkuka xorob inu piney aa drikke pieeba pina nushidan pic beber peeno a bea pit peena piti pit piti beber nywa

Vladimir Batagelj et al.

Automatic Clustering of Languages

SWEDISH TAM IL T E LUGU T UR KIS H UKRAINIAN WELSH C

ben elumbu yamuka kemik kistka asgwrn

dag naal thinam gun den' dydd

att doe irappu chavu olmek vmerte marw

att dricka kuditthal thagu icmek pihte yfed

ALBANIAN AR. TUNISIAN BAH. MALAYSIA BENGALI BERBER BULGARIAN BYE kO R USS IA N CATALAN CH. CANTONESE CH. MANDARIN CROATIA N CROAT. CAKAVSKI CROAT. KAJKAVSKI CZECH DANISH DUTCH ENGLISH ESPERANTO F IN N IS H F RE NC H GERMAN GER. BAVARIAN GER. SWISS D. 1 GER. SWISS D. 2 GREEK NEW GREEK OLD H E B R EW H IN D I HUNGARIAN IN DO N ESIA N ITALIAN IT. N. LOMBARDY IT. VENETII D. IRIS H 3A PAN ES E KA N NA DA LATIN LATVIA N LITHUANIAN MACEDONIAN

9. vesh wdhin telinga kan amazough uho vukha orella yi sheng uho uho vuho ucho ore oor ear orelo korva oreille ohr oa-waschln ohr ohr afti us ozen kan ful telinga orecchio urecia recia cluas mimi kivi auris ausis auses uvo

10. ha akul makan khaoa atch jaim estsi menjar sik chi j esti jist jesti jisti at spise eten to eat mangi syoda manger essen essn aesse aesse troo trogein leechol khana eszik makan mangiare pacha' magnar ithim taberu tinnu edere est valgit jade

11. ve adhum telur dim thamalalt jaice yaika ou dan dan j aje jaje joje vejce aeg ei egg ovo muna oeuf ei oar ei ei avgho oon beytsah anda tojas telur uovo o:v ovo ubh tamago tatti ovum ola kiesinis jajce

12. sy ain mata chokh thit oko voka ull ngan yen jin oko oko oko oko oje oog eye okulo silma oeil auge augn oug oug mati blemma a'yin ankh szem mata occhio o:ch ocio suil me kannu oculus acis akys oko

347

Computational Linguistics MALAYALAM MALTESE MAORI MARAATHI NORWEGIAN ORIYA PANJABI PERSIAN POLISH PO RTU GUES E RAJASTHANI ROMANIAN RUSSIAN SANSKRIT SERBIAN SLOVAK SLOVENIAN S PAN ISH SWAH ILI SWEDISH TAMIL TELUGU TURKISH UKRAINIAN WELSH C

ALBANIAN AR. TUNISIAN BAH. MALAYSIA BENGALl BERBER BULGARIAN BYELORUSSIAN CATALAN CH. CANTONESE CH. MANDARIN CROATIAN CROAT. CAKAVSKI CROAT. KAJKAVSKI CZECH DAN ISH D UTC H ENGLISH ESPERANTO FINNISH

348

Volume 18, Number 3 chhevy widna pokoraringa kaan oere kana kan gush ucho orelha kon orechie ukho kaan uho ucho uho oreja sikio oera kaathu chevi kulak ukho clust

thinnuka kiel haupa khaney aa spise khaiba khana khordan jesc comer khano a minca jest khana jesti jest jesti comer la att aeta saapiduthal thinadam yemek yiste bwyta

mutta bajda heeki undey egg anda anda tokhm jajko ovo ando ou jajtso anda jaje vajce jajce huevo yai aegg muttai kuddu yumurta jajtse wy

kannu gtrajn kaikamo dohlaa oeye akhee akh chashm oko olho onkh ochi glaz aankh oko oko oko ojo jicho oega kann kallu goz oko llygad

13.

14.

15.

16.

ate baba ayah baba vava otec bats'ka pare ba fu qin otac otac oca otec fader vader father patro isa

peshk semica ikan mach ahithiw riba ryba peix yu yu riba riba riba ryba risk vuur fish fiso kala

pese xamsa lima panch khamsa pet pyats cinc ng wu pet pet pet pet fern vijf five kvin viisi

kembe sak kaki pa akajar noga naga peu geuk jiao stopalo taban stopalo noha fod voet foot piedo jalka

Vladimir Batagelj et al. FRENCH GERMAN

pere vater GER. BAVARIAN fadda GER. SWISS D. 1 fatter GER. SWISS D. 2 fatter GREEK NEW pateras GREEK OLD pater HEBREW aba HINDI bap HUNGARIAN atya INDONESIAN ayah ITALIAN padre IT. N. LOMBARDY pader IT. VENETII D. pare IRISH athair JAPANESE chichi KANNADA appa LATIN pater LATVIAN tevs LITHUANIAN tevas MACEDONIAN tatko MALAYALAM acchan MALTESE missier MAORI paapara MARAATHI wa-dil NORWEGIAN far ORIYA bapa PANJABI bapa PERSIAN pedar POLISH ojciec PORTUGUESE pai RAJASTHANI baap ROMANIAN tata RUSSIAN otjec SANSKRIT baap S ERBIAN otac SLOVAK otec SLOVEN IAN oce SPA NISH padre SWAHILI baba SWEDISH fader TAMIL appaa T ELUGU nanna TU RKISH baba UKRAINIAN bat'ko WELSH C tad

Automatic Clustering of Languages poisson fisch fiesch fisch fisch psari opsarion dag machli hal ikan pesce pe's pes iasc sakana meena piscis zivis zuves riba meen trut ika maasaa risk machchha ikan mahi ryba peixe machli peste riba machli riba ryba riba pez samaki risk meen chapa balik rihba pisgodyn

cinq fuenf fimfe fuef fuef pende pente chamesh panch ot lima cinque chinq zinque cuigear go aidu quinque pieci penke pet anju transa rima paach fem pancha lima panz piec cinco ponch cinci pjat paanch pet pet pet cinco tano fern ainthu ayithu bes pyat pump

pied fuss fuass fuess fuess podhi pus regel paer lab kaki piede pe pie cos ashi paad pes kaja koja stapalo kaUu sieq wae paaool fot pada kaki pa stopa pe pug picior noga pea'r stopalo noha noga pie mguu fot kaal kalu ayak noha troed

349

Computational Linguistics

Volume 18, Number 3

Appendix B. Clustering Results

CLUSE ward [0.00,680.00] Insertion-Deletion MAORI PERSIAN FINNISH BERBER HUNGARIAN TURKISH JA PANESE ALBANIAN WELSH C IRISH CHINESE CA CHINESE MA SWAHILI HEBREW ARABIC TUN MALTESE BAH. MALAY INDONESIAN LITHUANIAN LATVIAN GREEK NEW GREEK OLD MALAYALAM TAMIL KANNADA TELUGU HINDI SANSKRIT RAJASTHANI PANJABI BENGALI ORIYA MARAATHI ITALIAN N. IT.VENETI ROMANIAN PORTUGUESE SPANISH CATALAN FRENCH ITALIAN LATIN ESPERANTO GERMAN SW1 GERMAN SW2 GERMAN DUTCH GERMAN BAV DANISH NORWEGIAN SWEDISH ENGLISH CROATIAN SERBIAN CROATIAN K CROATIAN C SLOVENIAN BULGARIAN MACEDONIAN CZECH SLOVAK POLISH BYELORUSSI RUSSIAN UKRAINIAN

350

37 42 64

~_

5

24 59 28 1 63 27 10 11 53 22 6O 34 2 25 ~ ' 32 65 ~ 20 21 ~ 36 57 30 58 23 48 % 45 41 4 40 35 38 62 46 44 52 9 18 26 31 17 55 56 19 15 3 14 39 54 16 13 49 29 8 51 6 33 - 12 50 43 7 47 61

% / .1~ V - ]

L_

] J ~_~

I

%

] I ]

Vladimir Batagelj et al.

CLUSE

Automatic Clustering of Languages

ward [0.00,435.00]

Insertion-Deletion-Substitution JAPANESE SWAHILI PERSIAN TURKISH ARABIC TUN HEBREW BERBER MALTESE HUNGARIAN IRISH CHINESE CA CHINESE MA ALBANIAN WELSH C TELUGU FINNISH MAORI BAH. MALAY INDONESIAN LITHUANIAN LATVIAN GREEK NEW GREEK OLD MALAYALAM TAMIL KANNADA HINDI RAJASTHANI SANSKRIT PANJABI BENGALI ORIYA MARAATHI ITALIAN N. IT.VENETI CATALAN ROMANIAN PORTUGUESE SPANISH FRENCH ITALIAN ESPERANTO LATIN GERMAN SWl GERMAN SW2 GERMAN DUTCH GERMAN BAV NORWEGIAN SWEDISH DANISH ENGLISH CROATIAN SERBIAN CROATIAN K CROATIAN C SLOVENIAN BULGARIAN MACEDONIAN BYELORUSSI UKRAINIAN RUSSIAN CZECH SLOVAK POLISH

28 53 42 59

I--

60

k]_

22 5 34

27 10 11 163 58 64-,

I ~ ~-~

372

2s3

}

32 6s ~ 2 0 ~ 21 _ _ 1 36 -S7 ,. 3O 23

~'~

48

41 4 4O 3S 38 ~ ~ 62~ g ~ 46 44

-,

s2 ~ l 18 26

-

17 31 S5 19

15 S4 14 16

,,

13

49 29 8 6

33 7 61 47 12 $0 43~

" '

351

Computational Linguistics

Volume 18, Number 3

CLUSE ward[0.00,420.00] LSCS - Length of their Shortest Common Supersequence CHINESE CA CHINESE MA ALBANIAN HUNGARIAN JAPANESE SWAHILI TURKISH IRISH WELSH C PERSIAN FINNISH HEBREW ARABIC TUN MAORI BERBER MALTESE BAH. MALAY INDONESIAN LITHUANIAN LATVIAN GREEK NEW GREEK OLD KANNADA MALAYALAM TAMIL TELUGU HINDI PANJABI SANSKRIT BENGALI RAJASTHANI ORIYA MARAATHI CATALAN IT.VENETI ITALIAN N. ROMANIAN PORTUGUESE SPANISH FRENCH LATIN ESPERANTO ITALIAN GERMAN SWl GERMAN SW2 GERMAN DUTCH GERMAN BAV DANISH NORWEGIAN SWEDISH ENGLISH CROATIAN SERBIAN CROATIAN C CROATIAN K SLOVENIAN BULGARIAN MACEDONIAN CZECH SLOVAK RUSSIAN POLISH BYELORUSSI UKRAINIAN

352

10 11 1 24 28 53 59 27 63 42 64 22 60 37 5

34 2 25 32 65 2O 21 30 36 57 58 23 41 48 4 45 40 35 9 62 38 46 44 52 18 31 17 26 55 56 19 15 3 14 39 54 16 13 49 8 29 51 6 33 12 5O 47 43 7 61

t--

Suggest Documents