Scientometrics (2013) 95:371–384 DOI 10.1007/s11192-012-0818-2

Inconsistent transliteration of Iranian university names: a hazard to Iran’s ranking in ISI Web of Science

Mohammad Reza Falahati Qadimi Fumani • Marzieh Goltaji • Pardis Parto



Received: 20 May 2012 / Published online: 21 July 2012 © Akadémiai Kiadó, Budapest, Hungary 2012

Abstract Today, university ranking has become a critical issue worldwide. Each university is identified by a surface form under which its entire performance is assessed. This article provides a clear picture of the inconsistencies observed in the way affiliated authors record Iranian university titles and clarifies the negative impact of such inconsistencies on the position of Iranian universities in global university ranking systems. To collect the various surface forms of Iranian university names, ISI Web of Science was searched with the keywords Cu = Iran and py = 2000–2009. Only MSRT universities were considered. Two experts holding M.A. degrees listed all variant forms of each university under a single name. The form publicized on a university’s website was taken as its entry name. The major sources of variation identified were: acronyms, misspellings, abbreviations, space variations, syntactic permutation, the application of vowels/consonants and vowel/consonant combinations, /a/ vs. /aa/, Tashdid, Kasra ezafe, redundancy, downcasing, the voiceless glottal stop /ʔ/, shortening, and deletion of titles. It was found that, as things stand, Iranian universities are not receiving the rank they really deserve simply because authors affiliated with a university use its title inconsistently. It is recommended that authors follow the surface form publicized by their university on its website, seek the help of an editor, and not be credited for articles whose affiliation forms deviate from those publicized on the websites. A spell checker, in the form of an add-in, is urgently needed to homogenize Iranian university surface forms by replacing variants with the proposed dominant form.

M. R. Falahati Qadimi Fumani (✉)
Faculty of Computational Linguistics Research Department, Regional Information Center for Science and Technology (RICeST), Shiraz, Iran

M. Goltaji
Islamic World Science Citation Center, Shiraz, Iran

P. Parto
Department of Evaluation and Collection Development, RICeST, Shiraz, Iran


Keywords University ranking systems · Iranian universities · ISI Web of Science · Persian-English transliteration · Misspellings · Information retrieval · Persian orthography · Downcasing

Introduction

Recognition of terms is a major component of any natural language processing (NLP) software, and in this respect the surface forms of words are also of great importance. Any inconsistency in the orthography of words may hurt an information retrieval (IR) system and decrease recall: an end user may fail to retrieve relevant documents that are available in the database simply because the user keyed in a word with a surface form different from the one used in those documents. ‘Program’ and ‘programm’ in English, and two variant spellings of the Persian word for ‘even’, are two examples. Falahati Qadimi Fumani (2010) devoted a full chapter of his dissertation to describing such inconsistencies in Persian orthography, and elaborated on them in two further articles (Falahati Qadimi Fumani and Ramachandra, 2008, 2011). Such inconsistencies are so widespread that hundreds of articles have already been written on the issue by Persian scholars. LIS experts, as mediators between information producers and users, as well as linguists, have long emphasized the importance of standardizing the orthography of terms, especially since the expansion of information databases (Morteza’i, 2001). Such orthographic inconsistencies not only hurt IR but may also damage the position of universities in global university ranking systems such as ISI, Shanghai, etc. This is simply because tracing the research and scientific output of universities (often in the form of books, articles, etc.) is a major component of ranking them. As a routine, all publications related to a specific university are listed under that university and then analyzed. Hence, if all authors affiliated with a university use the same title for it, all of those publications can be linked to that university.
But if various forms of a university name exist, the works carrying them will be grouped into different classes according to the specific variant used by their authors. Under such circumstances the university is the great loser, because ISI recognizes, and accordingly analyzes, each university by one and only one surface form. That is, some scientific publications will be excluded from the final ranking simply because authors used multiple titles, rather than a single one, to refer to the university with which they are affiliated. In fact, ISI will treat each variant as a separate item and will accordingly present a separate analysis and ranking for each. For each university, therefore, at best only part of its real performance will be reflected, and its position in the ranking system will be underestimated. A large number of researchers around the globe have worked on university ranking and the evaluation of researchers. Chung and Park (2012), for instance, examined the Web visibility of researchers in the field of communication. Schulz and Manganote (2012) analyzed the Country Profiles, the open-access data from the Science Watch service of ISI Thomson Reuters. They discussed the advantages of defining a Country Profile Index (CPI), a tool for diagnosing the activity of a country’s scientific community and its possible strengths and weaknesses. Feng, Yong, Xiaolong and Wei (2012) discussed the evaluation of research universities in mainland China, Hong Kong and Taiwan, considering two variables of research production: quality and quantity.
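The splitting effect described above can be illustrated with a minimal Python sketch. The affiliation strings and the variant-to-dominant lookup below are hypothetical (they are not taken from ISI Web of Science), and the sketch simply assumes, as stated above, that the ranking system groups records by exact surface form.

```python
from collections import Counter

# Hypothetical affiliation strings, one per article record.
RECORDS = ["Shiraz Univ", "Univ Shiraz", "Shiraz Univ", "Shiraz Uni",
           "Razi Univ", "Razi Unvi"]

# Hypothetical lookup from each surface form to the dominant (website) form.
DOMINANT = {
    "Shiraz Univ": "Shiraz Univ", "Univ Shiraz": "Shiraz Univ",
    "Shiraz Uni": "Shiraz Univ", "Razi Univ": "Razi Univ",
    "Razi Unvi": "Razi Univ",
}

def count_exact(records):
    """One entry per exact surface form, so variants split the total."""
    return Counter(records)

def count_merged(records, lookup):
    """One entry per dominant form; unmapped strings are kept unchanged."""
    return Counter(lookup.get(r, r) for r in records)

print(count_exact(RECORDS)["Shiraz Univ"])             # 2
print(count_merged(RECORDS, DOMINANT)["Shiraz Univ"])  # 4
```

Under exact-string grouping the hypothetical ‘Shiraz Univ’ entry is credited with only half of its four records; the other two are scattered over ‘Univ Shiraz’ and ‘Shiraz Uni’.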


Based on the brief introduction above, the main objective of the present article is to provide a clear picture of the inconsistencies observed in the way affiliated authors record Iranian university titles, and to clarify the negative impact of such inconsistencies on the position of Iranian universities in global university ranking systems.

Literature review

Extensive work has already been published on Persian script and the problems it faces in a computing environment. Some problems are rooted in the writing system itself, but others are due to the lack of full-fledged, well-established and well-observed standards for Persian orthography; even where a standard exists, language users do not follow it completely. Proposals for modifying Persian script date back half a century (Naseh, 2004). Emami (1992) stressed the need for the Persian alphabet to undergo modifications to fit the computing environment. Morteza’i (2001) discussed the problems the Persian writing system poses for information retrieval. Among those discussing computers and the writing system, Horri (1993), San’ati (1992) and Ma’soumi-Hamadani (2002) are relevant examples; all three concentrated on the difficulties of Persian computing. Akbarnejad (1997) discussed the difficulties that space variability may introduce into information storage and retrieval systems. Ahmadi-Birjandi (1973) elaborated on the two possibilities in the Persian alphabet, namely joining and disjoining letters and morphemes; in his view, this dis/joining contrast is a great burden on Persian computing. Kaboli (1995) discussed space variation within the framework of word-formation processes. Hendi (2002) focused on compound formation and proposed a method to standardize the writing of compounds, terms whose inner structure admits a number of space variants. In his view, compounds must be treated as an important issue in information retrieval. Some researchers, such as Sama’i (2004), discussed inconsistency in the use of punctuation marks. He emphasized that punctuation is an instrument that greatly helps us communicate meaning, so any inconsistency in its use can cause problems in the communication process.
Tayyeb (1992), in his article ‘‘Homography in Persian’’, reiterated that the absence of short vowels is the root of 85 % of homographs in Persian. Other researchers examined the way terms from English are handled in Persian. Behzadi (1996), for example, examined the inconsistencies in the transliteration of English borrowed terms. Pourjavadi (2003) put forward suggestions to standardize Penglish, that is, writing Persian with the English alphabet, which quite often happens in mobile SMSs, email exchanges and similar situations, e.g. when Persian university names are written in English letters. (In fact, recording Persian university names in English letters is the main theme of the present article.) As examples, he referred to the availability of ‘oo’, ‘ou’ and ‘u’ for the long close back vowel, and of ‘ee’, ‘ii’, ‘ie’, ‘y’ and ‘i’ for the long close front vowel. He called for the production of guidebooks aimed at standardizing the recording of Persian words in Penglish in SMSs, email exchanges and on the Web. The inconsistencies observed in the Persian writing system do not imply that there are no guidebooks on the principles of writing on the market. In fact, there are many (Sami’i-Gilani, 2000; Jahanshahi, 1981; Yahaghi & Naseh, 1992; Saffarpour, 2001; IAPLL, 2007). Najafi’s (2005) book ‘‘Ghalat Nanevisim’’ (Let’s Write Correct Persian) was also written to make the writing of different authors, and of native speakers in general, more consistent. So the problem is not a lack of guidebooks; rather, the real source of the problem is that even these sources show various discrepancies and, more importantly, in many cases (mostly those that cause serious problems in Persian computing) they offer no clear-cut recommendations. Even in the book by the IAPLL (2007), the authors in many cases leave it to writers to opt for one possibility rather than the other(s). This is despite Mar’ashi’s (2002) insistence that the IAPLL is the only authorized body that must take responsibility for revising the Persian writing system. He recommended that the IAPLL propose a grammar embodying 28 letters for the 28 sounds of Persian.
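Pourjavadi’s examples suggest one simple, partial remedy for matching purposes: folding the alternative spellings of the two long vowels into a single letter before comparing names. The Python sketch below is an illustration under stated assumptions, not a proposed standard. It folds only ‘oo’ and ‘ou’ to ‘u’ and ‘ee’, ‘ii’ and ‘ie’ to ‘i’, and deliberately leaves ‘y’ and single ‘o’ or ‘i’ untouched, because ‘y’ is also a consonant (as in ‘Yazd’) and single ‘o’ also spells a short vowel. Since ‘ie’ is common in English words, the function is meant for Persian name tokens only.

```python
import re

# Alternative spellings of two Persian long vowels (after Pourjavadi's examples),
# each folded to one letter. 'y' and single 'o'/'i' are left alone on purpose.
_FOLDS = {"oo": "u", "ou": "u", "ee": "i", "ii": "i", "ie": "i"}
_PATTERN = re.compile("|".join(_FOLDS))  # oo|ou|ee|ii|ie, scanned left to right

def fold_vowels(word: str) -> str:
    """Lower-case the word and replace each listed digraph by its folded letter."""
    return _PATTERN.sub(lambda m: _FOLDS[m.group(0)], word.lower())

print(fold_vowels("Toosi"), fold_vowels("Tusi"))          # tusi tusi
print(fold_vowels("Shahrood"), fold_vowels("Shahroud"))  # shahrud shahrud
```

Under these rules ‘Toosi’/‘Tusi’ and ‘Shahrood’/‘Shahroud’ fold together, while ‘Ferdowsi’ and ‘Ferdosi’ stay distinct; a fuller scheme would need decisions that only a transliteration standard can supply.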

Methodology

Instruments

To collect the various surface forms of Iranian university names, ISI Web of Science was searched with the keywords Cu = Iran and py = 2000–2009; only data from the 2000–2009 time span were considered and analyzed. To mark one surface form as dominant, Iranian university websites (the English web pages) were used: the title appearing on a university’s website was taken as the dominant form of that university’s name. These dominant forms are presented in Table 1. In this article, only MSRT (Ministry of Science, Research and Technology) universities were considered.

Participants

Two ISC (Islamic World Science Citation Center) experts, each with an M.A. degree, a good command of information science and at least 5 years of job experience, retrieved the whole collection of Iranian university surface forms available in ISI Web of Science, with the objective of listing all surface forms related to a given university under a single entry. They also visited the website of each Iranian university to identify the surface form used there, and marked that form as the dominant surface form to which all other surface forms of that university could be referred. This does not mean that the form used on each website was perfect and problem-free; in fact, such forms also had orthographic and linguistic problems. Rather, this strategy was adopted only to save space, report a single surface form for each university, and fit all university names into a single table (Table 1). The strategy seemed justified since, logically, all faculty members affiliated with a university are expected to use the surface form provided by that university. This could, at least, serve as an easy way to reduce divergence in the orthography of university names.
Procedure

To carry out the study, the book by Goltaji and Alinejad Chamazkoti (2011) was used as the main source of data. Having searched ISI Web of Science with the keywords Cu = Iran and py = 2000–2009, those authors had extracted all variant forms of Iranian university titles, listing all variant forms of a university under a single entry. The surface form appearing on a university’s website was used as the entry name. For each university name, the total number of variants available, including the dominant form, was also recorded (e.g. 4 for ‘Ilam Univ’ or 6 for ‘Shiraz Univ’). After this step, the variants were inspected and analyzed linguistically; that is, they were divided into classes according to the roots of the orthographic inconsistencies observed in university names. Sources of variation included ‘use of abbreviations’, ‘shift’, ‘conversion of vowels’, ‘the availability of parallel vowel/consonant letters to represent the same phone’, etc. These factors were used as the basis of the analysis presented under Data analysis. While linking the variant forms to their university names, some problems arose, which were handled as follows. Firstly, two surface forms were treated as distinct even when they differed in only a single character, e.g. ‘Shahrekord Uni’ vs. ‘Shahrekord Univ’, or when they contained the same character set in a different order, e.g. ‘Shiraz Univ’ vs. ‘Univ Shiraz’. Secondly, some surface forms could be linked to more than one university name. This mostly happened with acronyms: SUT, for example, could stand for both ‘Sahand Univ Technol’ and ‘Sharif Univ Technol’. Because the acronym SUT does not reveal which university is meant, it was added to both entries. There were, of course, few such cases in the data studied.

Data analysis

The main objective of the present article was as follows:

• To provide a clear picture of the inconsistencies observed in the way affiliated authors record Iranian university titles and, accordingly, to clarify the negative impact of such inconsistencies on the position of Iranian universities in global university ranking systems.

To attain this objective, a series of analyses was carried out. Table 1 presents, in descending order, the 84 MSRT universities studied (research institutes were not considered), each with its number of orthographic variations, all extracted from ISI Web of Science.

Table 1 Iranian universities’ dominant names and their total variant forms

University name* | No.**
Tarbiat Modares Univ | 136
Payame Noor Univ*** | 114
Amirkabir Univ Technol | 108
KN Toosi Univ Technol | 94
Ferdowsi Univ Mashhad | 67
Iran Univ Sci & Technol | 60
Tarbiat Moallem Univ | 57
Shahid Beheshti Univ | 56
Shahid Bahonar Univ Kerman | 50
Azarbaijan Univ Tarbiat Moallem | 48
Shahid Chamran Univ Ahvaz | 45
Univ Mohaghegh Ardabili | 40
Bu Ali Sina Univ | 37
Univ Sistan & Baluchestan | 35
Isfahan Univ Technol | 34
Gorgan Univ Agr Sci & Nat Resources | 33
Babol Noshirvani Univ Technol | 32
Allameh Tabatabai Univ | 29
Alzahra Univ | 25
Vali e Asr Univ Rafsanjan | 24
Imam Khomeini Int Univ | 23
Tarbiat Moallem Univ Sabzevar | 23
Sharif Univ Technol | 23
Urmia Univ | 21
Shahid Rajaee Teacher Training Univ | 21
Univ Tehran | 20
Shahrekord Univ | 20
Univ Mazandaran | 19
Univ Isfahan | 18
Power & Water Univ Technol | 18
Sahand Univ Technol | 17
Damghan Univ | 16
Univ Appl Sci & Technol | 16
Univ Tabriz | 15
Shahrood Univ Technol | 15
Lorestan Univ | 14
Sari Agr Sci & Nat Resources Univ | 13
Univ Gilan | 13
Univ Kurdistan | 11
Persian Gulf Univ | 10
Khoramshahr Marine Sci & Technol Univ | 10
Petr Univ Technol | 9
Ramin Univ Agr & Nat Resources | 9
Yasouj Univ | 9
Yazd Univ | 9
Inst Adv Studies Basic Sci | 8
Razi Univ | 8
Zanjan Univ | 8
Shahed Univ | 8
Mazandaran Univ Sci & Technol | 8
Semnan Univ | 7
Univ Kashan | 7
Shiraz Univ | 6
Hormozgan Univ | 6
Arak Univ | 5
Univ Bonab | 5
Bojnord Univ | 5
Univ Zabol | 5
Ilam Univ | 4
Tafresh Univ | 4
Jundi Shapour Univ | 4
Shiraz Univ Technol | 4
Fasa Univ | 4
Golestan Univ | 4
Gonbad High Educ Ctr | 4
Art Univ Isfahan | 4
Imam Sadiq Univ | 3
Univ Birjand | 3
Chabahar Maritime Univ | 3
Univ Qom | 3
Dr Shariaty Coll | 2
Imam Reza Univ | 2
Kerman Grad Univ Technol | 2
Urmia Univ Technol | 2
Police Univ | 2
Univ Maragheh | 2
Malayer Univ | 2
Univ Art | 2
Birjand Univ Technol | 1
Qom Univ Technol | 1
Kermanshah Univ Technol | 1
Hamedan Univ Technol | 1
Neishabour Univ | 1
Tabriz Islam Art Univ | 1

* The name reported for each university is the variant extracted from that university’s website.
** The frequencies reported are the sum of the frequencies of all variants listed under a single university name.
*** The case of ‘Payame Noor Univ’ differs from the rest of the items. In all other cases we deal with a single university and the variants of its name, but Payame Noor University has many branches within Iran, mostly marked by city names, so the real number of variants for each branch is much less than 114. Nevertheless, for ease of discussion and for the sake of comprehensiveness, the data for all branches were merged under a single title, ‘Payame Noor Univ’.
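The linking rules described in the Procedure (forms differing by a single character or in word order are kept as distinct variants, and an ambiguous acronym such as SUT is added to every entry it may denote) can be sketched as a small index. The sketch below is illustrative only: the variant/entry pairs are hand-written examples, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Illustrative (surface form, dominant entry) pairs; not the study's data.
PAIRS = [
    ("Shahrekord Univ", "Shahrekord Univ"),
    ("Shahrekord Uni", "Shahrekord Univ"),   # one character apart: separate variant
    ("Shiraz Univ", "Shiraz Univ"),
    ("Univ Shiraz", "Shiraz Univ"),          # same words, different order: separate variant
    ("SUT", "Sahand Univ Technol"),          # ambiguous acronym: linked to both entries
    ("SUT", "Sharif Univ Technol"),
]

def build_index(pairs):
    """Map each exact surface form to the set of entries it may denote."""
    index = defaultdict(set)
    for variant, entry in pairs:
        index[variant].add(entry)
    return index

def ambiguous(index):
    """Surface forms linked to more than one university."""
    return sorted(v for v, entries in index.items() if len(entries) > 1)

def variants_per_entry(index):
    """Number of distinct surface forms (dominant form included) per entry."""
    counts = Counter()
    for entries in index.values():
        counts.update(entries)
    return counts

idx = build_index(PAIRS)
print(ambiguous(idx))                              # ['SUT']
print(variants_per_entry(idx)["Shahrekord Univ"])  # 2
```

Counting per entry in this way mirrors the per-university totals reported in Table 1, with an ambiguous acronym contributing one variant to each university it may denote.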
Based on Table 1, a total of 1668 orthographic variations were observed for the 84 universities analyzed, that is, about 20 variants per university name on average. Some university titles showed far more orthographic variation than others: ‘Tarbiat Modares Univ’, ‘Payame Noor Univ’ and ‘Amirkabir Univ Technol’, with 136, 114 and 108 variants respectively, showed the highest numbers. In contrast, ‘Birjand Univ Technol’, ‘Qom Univ Technol’, ‘Kermanshah Univ Technol’, ‘Hamedan Univ Technol’, ‘Neishabour Univ’ and ‘Tabriz Islam Art Univ’ stood at the bottom with only 1 variant each. In 41 universities the number of variants was 10 or more (from 10 to 136); in the other 43, fewer variants were observed (from 9 down to 1). These figures show the extent to which authors are inconsistent in presenting their affiliation in their articles. Despite the great strides Iranian universities have made to improve their position in global ranking systems, it appears that what they get is much less than what they really deserve and have truly achieved. In simple terms, by presenting various surface forms for their universities, authors have unintentionally lowered the ranking of the universities with which they are affiliated, since in ISI Web of Science each variant is taken as a distinct entry and the data listed under each variant are analyzed separately.

In what follows, the major patterns of inconsistency observed in university titles are introduced. The major classes discussed are: acronyms, misspellings, abbreviations, space variations, syntactic permutation, the application of vowels/consonants and vowel/consonant combinations, /a/ vs. /aa/, Tashdid, Kasra ezafe, redundancy, upper- and lower-case letters (downcasing), the voiceless glottal stop /ʔ/, deletion of some terms/letters (shortening), and deletion of titles.

As indicated in Table 2, one root of variation in Iranian university titles is the parallel use of acronyms instead of full words. Two classes of acronym use were observed in the data. Part I of Table 2 shows examples of full acronyms, e.g. ‘Bu Ali Sina Univ’ → ‘BASU’ or ‘Shahid Bahonar Univ Kerman’ → ‘SBUK’, whereas Part II shows examples of partial acronyms, in which the initials of some words appear together with abbreviated or full forms of other words, e.g. ‘Tarbiat Modares Univ’ → ‘TM Univ’.
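The summary figures drawn from Table 1 can be re-checked directly from its counts. The Python sketch below transcribes the 84 counts of Table 1 in descending order and computes the total, the mean and median number of variants per university, and how many universities have at least 10 variants; the variable names are only illustrative.

```python
import statistics

# Number of variant forms per university, transcribed from Table 1 (descending).
COUNTS = [
    136, 114, 108, 94, 67, 60, 57, 56, 50, 48, 45, 40, 37, 35,
    34, 33, 32, 29, 25, 24, 23, 23, 23, 21, 21, 20, 20, 19,
    18, 18, 17, 16, 16, 15, 15, 14, 13, 13, 11, 10, 10,
    9, 9, 9, 9, 8, 8, 8, 8, 8, 7, 7, 6, 6, 5, 5,
    5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3,
    2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1,
]

total = sum(COUNTS)
mean = total / len(COUNTS)
median = statistics.median(COUNTS)
ten_or_more = sum(1 for c in COUNTS if c >= 10)
nine_or_fewer = len(COUNTS) - ten_or_more

print(len(COUNTS), total, round(mean, 1), median)  # 84 1668 19.9 9.0
print(ten_or_more, nine_or_fewer)                  # 41 43
```

The median (9) is well below the mean (about 20), which reflects how strongly a handful of titles at the top of Table 1 dominate the total.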
Authors may opt for one form rather than another, an inconsistency that has inevitably led to an array of surface forms for each university title.

Table 2 Inconsistencies rooted in acronyms

Part I (full acronyms)
No. | University name | Acronym(s)
1 | Bu Ali Sina Univ | BASU
2 | Imam Khomeini Int Univ | IKIU
3 | Tarbiat Modares Univ | TMU
4 | Isfahan Univ Technol | IUT
5 | Shahid Rajaee Teacher Training Univ | SRTTU
6 | Univ Sistan & Baluchestan | USB

Part II (partial acronyms)
No. | University name | Partial acronym(s)
1 | Tarbiat Modares Univ | TM Univ; TMU Univ
2 | Imam Khomeini Int Univ | IKI Univ
3 | Payame Noor Univ | PN Univ
4 | Khaje Nasir Toosi Univ Technol | KN Toosi U Technol
5 | Sharif Univ Technol | Sharif U T
6 | Univ Tehran | U Tehran

Table 3 illustrates samples of misspellings observed in university titles. Of the 1666 entries inspected, 576 (34.57 %), roughly one third, contained misspellings. This may indicate that articles in ISI journals are not being edited or proofread by someone with a good command of both Persian and English. A non-Iranian editor or proofreader has no way of knowing that ‘Tabrix’ or ‘Tariz’ is wrong and that ‘Tabriz’ is the standard form. Nor will browsing the Web help the proofreader resolve the problem, since the Web also contains various surface forms. Some misspellings, however, should have been noticed by ISI journal editors even without any Persian background. For example, they should have known that ‘Unvi’ in ‘Razi Unvi’ is wrong; the correct form is ‘Razi Univ’ or, better, ‘Razi University’. The logical conclusion is that journals often trust the affiliation information submitted by Iranian authors. One way to remove this problem is for Iranian authors to seek help from Iranian editors or colleagues who are also proficient in English, or at least to use the surface form given on the university’s website.

Table 3 Inconsistencies rooted in misspellings
No. | Correct form | Misspelling(s)
1 | Tabriz | Tariz; Tabrix; Trabriz; Tabrize; Tebriz
2 | Tarbiat | Taebiat; Tabiat; Tabriat; Tarbat; Tarbayat; Tarbeiyat; Tarbia; Tarbial; Tarbian; Tarbiart; Tarblat; Tarbita; Tarbit; Tarbist; Tarbiate; Tariat; Tarbyat; etc.
3 | Ahvaz | Akhvaz
4 | Persian Gulf Univ | Persian Calf Univ
5 | Teacher Training Univ | Teacher Trianing Univ
6 | Shiraz | Shiran
7 | Rajaee | Rahaee; Radjaei; Rajae; Rajee; Rajaei; Rajaii

As shown in Table 4, authors have used abbreviations clumsily, with various abbreviations for a single term. At times the abbreviated form seems awkward and bizarre, e.g. the use of ‘Engn’ alongside ‘Eng’ for ‘Engineering’ (as in ‘Engn Fac Bonab’ for ‘Univ Bonab’), or of ‘U’ and ‘Unv’ alongside ‘Univ’, to give only a few examples.

Table 4 Inconsistencies rooted in the application of abbreviations
No. | Word | Sample abbreviated forms
1 | Technology | Tech, Technol, Techno
2 | University | U, Univ, Unv*, Unvi*, Uuniv*, Unuiv*, Uinv*
3 | Science | Sci
4 | Petroleum | P as in (PUT); Petr as in (Petr Univ)
5 | Engineering | Eng; Engn
6 | Graduate | Grad as in (Kerman Grad Univ Technol)
7 | Center | Ctr as in (Gonabad High Educ Ctr)
* An asterisk marks ill-formed words.

According to Table 5, space variation (especially zero-space and full-space variants, half space being the third type) can give rise to various surface forms. In ISI Web of Science, two strings of words or letters are treated as distinct even if the only difference between them is the space variant used. Thus ‘Khajeh Nasir’ and ‘KhajehNasir’, or ‘Kashan Univ’ and ‘KashanUniv’, are deemed different entries, though they are actually two surface forms of a single entity.

Table 5 Inconsistencies rooted in space variations
No. | With space | Without space
1 | Khajeh Nasir | Khajehnasir
2 | Kashan Univ | KashanUniv
3 | Amir Kabir Univ | Amirkabir Univ
4 | Bu Ali Univ | Buali Univ
5 | Bu Ali Sina Univ | Bu AliSina Univ
6 | Imam Reza Univ | Imamreza Univ Mashhad

Table 6 reveals that word order can also produce university title variations: ‘Arak Univ’ and ‘Univ Arak’ are treated as different, though both are variants of the same university name. The permutation pattern observed in Table 6 is ‘A B → B A’. More extended permutations may also occur, especially in longer phrases, e.g. ‘A B C → C A B, C B A, A C B, B A C and B C A’. Quite a few such cases were observed in the data, e.g. ‘Univ Bu Ali Sina’ → ‘Bu Ali Sina Univ’, where only the word ‘Univ’ has been moved. As indicated in Table 7, authors have also been inconsistent in transliterating Persian vowels and consonants into English. For instance, ‘o’, ‘ow’, ‘ou’, ‘oo’ and ‘u’ have all been used for the half-close back vowel. Similarly, ‘q’ and ‘gh’ have both been used for each of two different Persian letters, both pronounced as a voiced velar fricative. At times the variation has been remarkably wide and awkward, which highlights the urgent need to standardize the transliteration of Persian vowels and consonants.

Some more variation sources

In addition to the above general classes, some further sources of variation were observed; a brief account of them follows:

(I)

/a/ vs. /aa/ Iranian authors transliterating Persian university titles into English often make no distinction between the open front spread vowel ‘a’ /æ/, as in ‘cat’, and the open back round vowel ‘aa’ /ɒː/, as in ‘car’. In ‘Shahed Univ’ vs. ‘Shaahed Univ’, for instance, ‘a’ and ‘aa’ have both been used for the sound /ɒː/. Similarly, ‘a’ has often been used for /ɒː/, as in ‘Zanjan’, ‘Baluchestan’, ‘Chamran’ and ‘Hamadan’: in all these words the first ‘a’ is pronounced /æ/, and the second (in ‘Hamadan’, the third) stands for /ɒː/. Only rarely have authors used ‘a’ and ‘aa’ consistently for /æ/ and /ɒː/ respectively, e.g. ‘Univ Mazandaraan’. Other authors have, of course, used forms like ‘Univ Mazandaran’, where ‘a’ stands for both /æ/ and /ɒː/ even within a single word.

Table 6 Inconsistencies rooted in syntactic permutation
No. | Form 1 | Form 2
1 | Arak Univ | Univ Arak
2 | Golestan Univ | Univ Golestan
3 | Mashhad Univ | Univ Mashhad
4 | Kurdestan Univ | Univ Kurdestan
5 | Malayer Univ | Univ Malayer
6 | Art Univ | Univ Art
7 | Police Univ | Univ Police

Table 7 Inconsistencies rooted in variations in the application of vowels/consonants and vowel/consonant combinations
No. | Vowel(s) | Variations | Consonant(s) | Variations
1 | /u:/ as in ‘you’ | oo (Toosi, Shahrood), o (Tossi), u (Tusi), ou (Shahroud) | /g/ | g (Gorgan), Gh (Ghorgan*)
2 | /o/ as in ‘book’ | u (Jundi), o (Jondi) | voiced velar fricative (cf. the ‘r’ of French ‘merci’) | Q (Qom), Gh (Damghan)
3 | /i/ as in ‘seed’ | Beheshtee, Shahid, Shaheed | /h/ | h (Allameh), omitted (Allame)
4 | the vowel of ‘know’ | Ferdoosi, Ferdosi, Ferdowsi, Ferdousi

(II) Tashdid Tashdid means doubling a single letter and putting more emphasis on it; compared with other letters, such letters are pronounced with more force and duration. Some authors ignored Tashdid and used the simple form, e.g. ‘Modares’ rather than ‘Modarres’ and ‘Khoramshahr’ rather than ‘Khorramshahr’. At times Tashdid was misapplied where it should not be, e.g. ‘Illam Univ’ rather than ‘Ilam Univ’. Ignoring or misapplying Tashdid thus contributed further to the emergence of university title variations.

(III) Kasra ezafe In Persian, Kasra ezafe functions somewhat like ‘of’ in the English phrase ‘the house of the president’. It is shown by a subscript diacritic in Persian orthography, but it is optional and most Iranians skip it in writing. The analysis revealed that authors sometimes ignored it altogether (e.g. ‘Univ Tarbiat Modarres’, where the Kasra ezafe after ‘Tarbiat’ is missing) and sometimes represented it with different symbols (a joined ‘e’ or a separate ‘E’), e.g. ‘Univ Tarbiate Modarres’ and ‘Univ Tarbiat E Modarres’, where ‘e’ and ‘E’ after ‘Tarbiat’ represent Kasra ezafe. Another example is ‘Univ Shahr E Kord’ vs. ‘Shahrekord Univ’. So at least three variants arise from Kasra ezafe alone, adding to the overall collection of orthographic variations.

(IV) Redundancy Some terms have been used redundantly by authors when giving their affiliation. The term ‘Univ’ is quite often redundant in university titles, as in ‘IUST Univ’: expanded, this title reads ‘Iran Univ Sci & Technol Univ’, in which the double use of ‘Univ’ is clearly redundant. ‘Univ TMU’, i.e. ‘Univ Tarbiat Modarres University’, is another example.

(V) Upper- and lower-case letters (downcasing) Acronyms are usually formed with upper-case letters, but some authors used upper- and lower-case letters inconsistently. For example, some authors used ‘TMU’ for ‘Tarbiat Modarres University’, while others wrote ‘Tmu’, as in ‘Tmu Univ’, with upper-case ‘T’ for ‘Tarbiat’ and lower-case ‘m’ and ‘u’ for ‘Modarres’ and ‘University’. Such cases were abundant in the data.
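Several of the classes above (space variation, permutation of ‘Univ’, redundancy, downcasing and ill-formed abbreviations of ‘University’) could be neutralized for matching purposes by comparing a computed key instead of raw strings. The Python sketch below is one possible approach under simplifying assumptions: the abbreviation table is a hypothetical subset built from Table 4, every ‘Univ’ token is dropped and re-appended once, and the remaining tokens are joined without spaces. The key is meant only for grouping; the form displayed should still be the dominant one.

```python
# Hypothetical abbreviation table (subset of Table 4), keyed by lower-case token.
ABBREV = {
    "university": "univ", "univ": "univ", "u": "univ", "unv": "univ",
    "unvi": "univ", "uuniv": "univ", "unuiv": "univ", "uinv": "univ",
    "technology": "technol", "tech": "technol", "techno": "technol",
}

def match_key(name: str) -> str:
    """Comparison key ignoring case, 'Univ' spelling/order/repetition and spaces."""
    tokens = [ABBREV.get(t, t) for t in name.lower().split()]
    has_univ = "univ" in tokens
    core = [t for t in tokens if t != "univ"]            # drop every 'univ' token
    return "".join(core) + ("univ" if has_univ else "")  # re-append once, no spaces

print(match_key("Univ Bu AliSina"))   # bualisinauniv
print(match_key("Bu Ali Sina Univ"))  # bualisinauniv
```

This heuristic matches ‘KashanUniv’ only because ‘Univ’ comes last; a fused form such as ‘UnivKashan’ would still need manual linking, and a key this coarse can also merge distinct names, so its groupings would need human review.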


(VI) Voiceless glottal stop /ʔ/ Authors did not use any symbol for the voiceless glottal stop in university titles. Instead, the letters ‘a’ and ‘aa’ after ‘o’, in ‘Moallem’ and ‘Moaallem’ respectively, have been used to stand for it: the sound /ʔ/ precedes ‘a’ and ‘aa’ in these two words, and follows ‘a’ before ‘ii’ in ‘Tabatabaii’. None of these variants seems to work. Authors could instead have used a simple superscript (an apostrophe) for /ʔ/, writing ‘Moallem’ and ‘Tabatabaii’ as ‘Mo’allem’ and ‘Tabaatabaa’ii’ respectively.

(VII) Deletion of some terms/letters (shortening) In some cases, words of the title had been deleted by the authors, which gave rise to further variants; ‘Petr Univ Technology’, for instance, was shortened to ‘Petr Univ’ by some authors. Differences in pronunciation due to regional and social accents also led to orthographic variations. ‘Mashhad’, for example, also appeared as ‘Mashad’ and ‘Meshed’, with ‘h’ deleted in the former and ‘a’ changed to ‘e’ in the latter.

(VIII) Deletion of titles By title we mean words like ‘Dr’, ‘Prof.’, ‘Shahid’ (meaning ‘Martyr’), etc. Some authors used such titles while others omitted them, hence further divergence in university names. ‘Dr Shariaty Coll’ vs. ‘Shariaty Coll’ is one example, with ‘Dr’ missing in the latter. Moreover, such titles appeared in different forms (full, acronym and abbreviated): ‘Shahid’, for instance, also appeared as ‘S’ and ‘Sh’.
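The abstract’s call for a spell-checker add-in that replaces variants with the dominant form could start from a nearest-match lookup. The Python sketch below assumes Levenshtein edit distance as the matching criterion (the article does not specify one), a small hypothetical lexicon of words taken from the Table 1 names, and an arbitrary threshold of two edits. Several misspellings from Table 3, and the misapplied Tashdid in ‘Illam’, are a single edit away from the intended word.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

# Hypothetical lexicon of words taken from the Table 1 names.
LEXICON = ["Tabriz", "Ilam", "Tarbiat", "Shiraz", "Ahvaz", "Rajaee"]

def suggest(word, lexicon=LEXICON, max_dist=2):
    """Closest lexicon word within max_dist edits (case-insensitive), else None."""
    w = word.lower()
    best = min(lexicon, key=lambda cand: levenshtein(w, cand.lower()))
    return best if levenshtein(w, best.lower()) <= max_dist else None

for typo in ["Tabrix", "Illam", "Tarbiart", "Akhvaz", "Karaj"]:
    print(typo, "->", suggest(typo))
```

A word outside the lexicon, such as ‘Karaj’ here, gets no suggestion. A real add-in would need a much larger lexicon and human confirmation, since short names can fall within two edits of unrelated words.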

Discussion

Based on the analyses in the previous section, the following points can be made. Firstly, a surprisingly large number of variants were observed in the transliteration of Iranian university names into English. This diversity could be due to a number of reasons, including the lack of standards for transliterating Persian words into English, authors’ limited interest in having their articles edited and proofread by a qualified editor, and the dearth of effective regulations to prevent authors from introducing different surface forms. Of the 1668 title forms inspected, 576 (34.57 %) contained misspellings. This was found to be mostly due to authors’ carelessness and, in part, to journal editors’ unfamiliarity with the orthography of Persian university names. Such journals can certainly be criticized for having missed some obvious misspellings, such as ‘Unvi’ for ‘Univ’. Misspelling was not the only source of problems observed; in fact, some 15 sources of inconsistency were discussed in this article. Secondly, such inconsistencies will downgrade the ranking of Iranian universities in global ranking systems. Some may claim that if authors had followed the orthography publicized by their affiliated university on its website, the issue of multiple surface forms would have been settled completely, but in reality this is not the case. To clarify this point, four sample universities, each with 8 variants, were inspected, and the number of articles under each variant was recorded. The data in Fig. 1 show that most authors do use the surface form publicized on their affiliated university’s website; nevertheless, the share of other forms is far from marginal.
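Obvious misspellings such as ‘Unvi’ for ‘Univ’ could be caught with approximate string matching against the small set of abbreviations that recur in affiliation strings. The following sketch uses Python’s standard `difflib.get_close_matches`; the vocabulary and the 0.7 similarity cutoff are illustrative assumptions, not values derived from the data above. Because short tokens are risky to correct automatically, tokens already in the vocabulary are kept and anything below the cutoff is left untouched.

```python
from difflib import get_close_matches

# Illustrative vocabulary of abbreviations common in affiliation strings (an assumption).
STANDARD_TOKENS = ["Univ", "Sci", "Technol", "Med", "Coll", "Inst", "Res", "Ctr",
                   "Adv", "Studies", "Basic"]


def correct_tokens(name: str, vocab=STANDARD_TOKENS, cutoff: float = 0.7) -> str:
    """Replace tokens that closely resemble a standard abbreviation."""
    fixed = []
    for tok in name.split():
        if tok in vocab:
            fixed.append(tok)  # already standard
            continue
        # Best candidate whose similarity ratio is at least `cutoff`, if any.
        match = get_close_matches(tok, vocab, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else tok)
    return " ".join(fixed)


print(correct_tokens("Unvi Tehran"))  # Univ Tehran
```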
Fig. 1 Percentage of dominant versus other surface forms in 4 Iranian university titles

In ‘Mazandaran Univ Sci & Technol’, ‘Shahed Univ’, ‘Zanjan Univ’ and ‘Inst Adv Studies Basic Sci’, 15.38 % (12 out of 66), 10 % (66 out of 594), 28.71 % (172 out of 727) and 31.33 % (26 out of 57) of the surface forms, respectively, differed from the surface form publicized on the university’s website. For ‘Inst Adv Studies Basic Sci’, such deviant forms make up almost one-third of all forms used by its authors. Thus, the mere application of the dominant form would not resolve the problem completely, though it could reduce it to a great extent. There is an urgent need for a standard system for transliterating Persian words into English. Such a system could be packaged as add-in software for Microsoft Word, acting as a spell checker for Iranian university names: any form other than the default form would be identified and converted into the proposed standard form. This is the topic of another article by the authors.
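The percentages above compare occurrences of each variant with the form publicized on the university’s website. A sketch of that computation, assuming the affiliation records for one university have already been extracted as a list of strings (the data below is invented for illustration, and `deviation_share` is a hypothetical helper):

```python
from collections import Counter


def deviation_share(records: list[str], official: str) -> tuple[float, Counter]:
    """Percentage of records whose affiliation differs from the official form."""
    if not records:
        raise ValueError("no records")
    deviant = Counter(r for r in records if r != official)
    share = 100 * sum(deviant.values()) / len(records)
    return round(share, 2), deviant


# Invented toy data: 6 records use the official form, 4 use variants.
records = ["Univ X"] * 6 + ["X Univ"] * 3 + ["Unvi X"]
share, deviant = deviation_share(records, official="Univ X")
print(share)                  # 40.0
print(deviant.most_common())  # [('X Univ', 3), ('Unvi X', 1)]
```

Returning the `Counter` of deviant forms alongside the percentage makes it easy to see which variants account for most of the deviation.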

Recommendations

Based on the findings, the following recommendations can be made:

• There is a need to standardize the transliteration of Persian words in general, and of Persian university titles in particular, into English.
• Persian authors are advised to have their articles revised by a person who masters both English and Persian, or to involve such a person as a coauthor. This will surely reduce title variations and also improve the quality of the articles in terms of grammar, etc.
• University faculty members are promoted to higher ranks based on a number of factors, of which their publications form a core element. Rules could be ratified, or existing laws reinforced, so that just as authors receive credit for their articles, they are also penalized for not using the university title publicized on their university’s website. (In October 2011, the MSRT approved a law stating that an article by a faculty member will be considered for promotion only if the author(s) have included the affiliation publicized on the website of the university in which they work.)
• A standard list of Iranian university names could be produced and embedded as add-in software in Microsoft Word to act as a spell checker for university titles. This last recommendation could be the most tangible and effective one.
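In its simplest form, the spell checker proposed in the last recommendation could map an affiliation string to the closest entry in a standard list. The sketch below is not the authors’ tool. It assumes the standard list holds the forms publicized on university websites (here, the four forms discussed with Fig. 1), normalizes case, ‘&’ and punctuation, and uses `difflib.get_close_matches` with an assumed cutoff of 0.8. It returns `None` when no entry is close enough, leaving the decision to a human. Acronyms such as ‘IUST’ would still need a separate alias table.

```python
import re
from difflib import get_close_matches

# Assumed standard list: the website forms of the four universities discussed with Fig. 1.
STANDARD_NAMES = [
    "Mazandaran Univ Sci & Technol",
    "Shahed Univ",
    "Zanjan Univ",
    "Inst Adv Studies Basic Sci",
]


def _key(name: str) -> str:
    """Case-fold, treat '&' as 'and', and drop punctuation and extra spaces."""
    name = name.casefold().replace("&", " and ")
    return " ".join(re.sub(r"[^\w\s]", " ", name).split())


def suggest(name: str, standard=STANDARD_NAMES, cutoff: float = 0.8):
    """Return the closest standard form for `name`, or None if none is close enough."""
    by_key = {_key(s): s for s in standard}
    match = get_close_matches(_key(name), list(by_key), n=1, cutoff=cutoff)
    return by_key[match[0]] if match else None


print(suggest("Zanjan Unvi"))                      # Zanjan Univ
print(suggest("Mazandaran Univ Sci and Technol"))  # Mazandaran Univ Sci & Technol
print(suggest("Univ Tehran"))                      # None
```

Returning `None` instead of forcing a match keeps the tool from silently merging a genuinely different university into the nearest listed one.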


Concluding remarks

This article tackled orthographic variation in Iranian university titles. The extent of such variation was found to be very wide, with roots in a variety of issues including misspellings, abbreviations, space variations, syntactic permutation, application of vowels/consonants and vowel/consonant combinations, /a/ vs. /aa/, Tashdid, Kasra ezafe, redundancy, upper and lower case letters (downcasing), the voiceless glottal stop /ʔ/, deletion of some terms/letters (shortening) and deletion of titles. Positioning Iranian universities in global ranking systems is today regarded as an important issue, and the MSRT has, as a high priority, adopted policies to promote the ranking of Iranian universities at the global scale. It was found that, in the present situation, Iranian universities are not receiving the rank they deserve simply because authors affiliated to a university use various titles to stand for its name; authors have proved highly inconsistent in this regard. It was recommended that authors follow the surface form publicized on their university’s website and use the help of an editor while writing their articles, and that, just as they are rewarded for their publications, they be penalized by not having their articles credited when they deviate from the publicized surface form. A spell checker, in the form of add-in software, is highly needed to homogenize Iranian university surface forms by replacing variants with the proposed default form.

References

Ahmadi-Birjandi, A. (1973). Ghesse-ye por ghosse-ye ettesal va enfesal [The sorrowful story of joining and disjoining]. Yaghma, 26(7), 473–475.
Akbarnejad, S. (1997). Fasele-ye khali miyan-e vajeha dar zakhire va bazyabi-ye rayane’i-ye ettela’at [The issue of inner and outer word spaces in information storage and retrieval]. Faslname-ye Ketab (pp. 49–56). Berlin: Spring and Summer Issue.
Behzadi, M. (1996). Shive-ye zabt-e a’lam-e engelisi dar Farsi [A method for recording English proper nouns in Persian]. Tehran: Markaz-e Nashr-e Daneshgahi, Ketabkhane-ye melli-ye jomhoori-ye eslami-ye Iran.
Chung, C. J., & Park, H. W. (2012). Web visibility of scholars in media and communication journals. Scientometrics. doi:10.1007/s11192-012-0707-8, pp. 1–9.
Emami, K. (1992). Lozoom-e baznegari dar shive-ye khatt-e Farsi [The need to revise Persian writing system]. Adine, 73(74), 18–19.
Falahati Qadimi Fumani, M. R. (2010). Proposing a model of automatic key phrase indexing for a specific type of Persian scientific articles based on a linguistically enriched statistical approach. India: Kuvempu Institute of Kannada Studies, University of Mysore.
Falahati Qadimi Fumani, M. R. (2011). The Persian Agrovoc in an indexing context. Int. J. Index. (The Indexing), 29, 23–29.
Falahati Qadimi Fumani, M. R., & Ramachandra, C. S. (2008). The concept of stopwords in Persian chemistry articles: A discussion in automatic indexing. Glossa, 4(1), 146–164.
Feng, L., Yong, Y., Xiaolong, G., & Wei, Q. (2012). Performance evaluation of research universities in Mainland China, Hong Kong and Taiwan: based on a two-dimensional approach. Scientometrics, 90, 531–542. doi:10.1007/s11192-011-0544-1.
Goltaji, M., & Alinejad Chamazkoti, F. (2011). Motale’e-ye ‘ashoftegi-ye negaresh-e nam-e daneshgahhaye vezarat-e ‘olum, tahqiqat va fannavari dar paygah-e tamson roiterz va yekdast sazi-ye nam-e ‘anha [Iran’s MSRT university title variations in ISI Web of Science: The need for consistency]. Shiraz: Takht-e Jamshid Publications.
Hendi, S. (2002). Dastoor-e khatt-e Farsi: shive’i dar negaresh-e kalameha-ye morakkab [Persian writing system grammar: a method to write compound terms]. Aamoozesh-e Zaban va Adab-e Farsi, 16(63), 27–31.
Horri, A. (1993). Kampiyuter va rasm-ol-khatt-e Farsi [Computer and Persian writing system]. Payam-e Ketabkhane, 3(1), 6–11.
IAPLL. (2007). Dastoor-e khatt-e Farsi [Persian writing system grammar] (7th ed.). Tehran: Farhangestan Publications.
Jahanshahi, (1981). Rahnamay-e nevisande va virayesh [A guide for writers and editing]. Tehran: Shooray-e Ketab-e Koodak. Farhangname-ye Koodakan va Nojavanan.
Kaboli, I. (1995). Vajesazi va bifasele nevisi [Word formation and joining of compound term elements]. Adine, 97, 56–59.
Mar’ashi, A. A. (2002). Chegoone ba doshvarihay-e khatt-e farsi kenar biya’im? [How to deal with the difficulties in Persian writing system?] Technoloji-ye Amoozeshi, 17(137), 28–32.
Ma’soumi-Hamadani, H. (2002). Khatt-e Farsi va rayane [Persian writing system and computer]. Nashr-e Danesh, 19(2), 2–6.
Morteza’i, L. (2001). Masaa’ele zabaan va khatt-e Faarsi dar zakhire va baazyaabi-ye ettelaa’aat [The problems with Persian orthography in information retrieval and storage]. Faslnaame-ye Ettelaa’resaani [Ettelaa’resaani Quarterly], 17(1,2), 24–29.
Najafi, A. (2005). Ghalat Nanevisim. Farhang-e doshvarihay-e zaban-e Farsi [Let’s write correct Persian. A dictionary of difficulties in Persian writing] (14th ed.). Tehran: Markaz-e Nashr-e Daneshgahi.
Naseh, M. A. (2004). Negahi be Payannamehay-e daneshgahi dar zamine-ye khatte Farsi (1974–2003) [An overview of academic theses on Persian writing system (1974–2003)]. Name-ye Farhangestan, 6(3), 47–50.
Pourjavadi, N. A. (2003). Dar jabolsay-e internet: zaroorat-e khatt-e latini baray-e Farsi [In the Internet: the need for the Penglish]. Nashr-e Danesh, 20(2), 2–5.
Saffarpour, A. (2001). Olgoohaa-ye yaaddehi-yaadgiri-ye gaam be gaam-e enshaa-ye Faarsi [Teaching and learning step-wise patterns of Persian spelling]. Tehran: Mo’asese
Sama’i, S. M. (2004). Karbord-e neshaneha dar khatt-e Farsi [Use of punctuation marks in Persian orthography]. Oloom-e Ettela’ Resani, 19(1/2), 8–12.
Sami’i-Gilani, A. (2000). Negaresh va virayesh [Writing and editing] (2nd ed.). Tehran: SAMT Publications.
San’ati, M. (1992). Doshvariha-ye zaban-e Farsi ba kampiyuter [Difficulties in Persian computing]. Adine, 72, 56–57.
Schulz, P. A., & Manganote, E. J. T. (2012). Revisiting country research profiles: learning about the scientific cultures. Scientometrics. doi:10.1007/s11192-012-0696-7, pp. 1–15.
Tayyeb, (1992). Homography in Persian. Res. J. Isfahan Univ. (Humanities), 4, 15–38.
Yahaghi, M. J., & Naseh, M. M. (1992). Rahnamay-e negaresh va virayesh [A guide to writing and editing]. Tehran: Astan-e Ghods-e Razavi Publications.
