Scientometrics (2013) 95:371–384 DOI 10.1007/s11192-012-0818-2
Inconsistent transliteration of Iranian university names: a hazard to Iran’s ranking in ISI Web of Science
Mohammad Reza Falahati Qadimi Fumani • Marzieh Goltaji • Pardis Parto
Received: 20 May 2012 / Published online: 21 July 2012
Akadémiai Kiadó, Budapest, Hungary 2012
Abstract
Today, university ranking has become a critical issue worldwide. Each university is identified by a surface form under which its whole performance is assessed. This article aims to provide a clear picture of the inconsistencies observed in the way affiliated authors record Iranian university titles, and to clarify the negative impact of such inconsistencies on the position of Iranian universities in global university ranking systems. To collect the various surface forms of Iranian university names, ISI Web of Science was searched with the keywords Cu = Iran and py = 2000–2009. Only MSRT universities were considered. Two experts holding M.A. degrees listed all variant forms of a single university under that name. The form publicized on a university’s website was taken as its entry name. The major sources of variation identified were as follows: acronyms, misspellings, abbreviations, space variations, syntactic permutation, application of vowels/consonants and vowel/consonant combinations, /a/ vs. /aa/, Tashdid, Kasra ezafe, redundancy, downcasing, the voiceless glottal stop sound /?/, shortening and deletion of titles. It was found that, at present, Iranian universities are not receiving the rank they deserve simply because authors affiliated to a university use university title forms inconsistently. It is recommended that authors follow the surface form publicized by universities on their websites, seek the help of an editor, and not be credited for their articles when the forms deviate from those publicized on the websites. A spell checker, as add-in software, is highly needed to homogenize Iranian university surface forms by replacing the variants with the dominant form proposed.
M. R. Falahati Qadimi Fumani (corresponding author)
Faculty of Computational Linguistics Research Department, Regional Information Center for Science and Technology, RICeST, Shiraz, Iran
e-mail: [email protected]

M. Goltaji
Islamic World Science Citation Center, Shiraz, Iran

P. Parto
Department of Evaluation and Collection Development, RICeST, Shiraz, Iran
Keywords University ranking systems · Iranian universities · ISI Web of Science · Persian-English transliteration · Misspellings · Information retrieval · Persian orthography · Downcasing
Introduction
Recognition of terms is a major component of any natural language processing (NLP) software. In this regard, the surface forms of words are also of great importance. Any inconsistency in the orthography of words may hurt an information retrieval (IR) system and will decrease recall. An end user may fail to retrieve relevant documents, despite their availability in the database, simply because they keyed in a word with a surface form different from the one used in those documents; ‘program’ and ‘programm’ in English, and the two variant spellings of the word meaning ‘even’ in Persian, are two examples. Falahati Qadimi Fumani (2010) devoted a full chapter of his dissertation to the description of such inconsistencies in Persian orthography. He also elaborated on them in two more articles (Falahati Qadimi Fumani and Ramachandra, 2008, 2011). These inconsistencies are so widespread that hundreds of articles have already been written on the issue by Persian scholars. LIS experts, as mediators between information producers and users, as well as linguists, have long emphasized the importance of standardizing the orthography of terms, especially following the expansion of information databases (Morteza’i, 2001).
Such orthographic inconsistencies not only hurt IR, but may also damage the position of universities in global university ranking systems like ISI, Shanghai, etc. This is simply because the research and scientific output of universities (often in the form of books, articles, etc.) is traced and used as a major component in ranking them. As a routine, all publications related to a specific university are listed under that university and then analyzed. Hence, if all authors affiliated to a university use the same title as the university name, all those publications can be linked to that university. But if various forms of a university name are in use, the works carrying the variants will be grouped into different classes according to the specific variant used by the authors. Under such circumstances, the university will be the great loser because ISI recognizes, and accordingly analyzes, each university by one and only one surface form. That is, a number of scientific publications will be excluded from the final ranking simply because authors used multiple titles, rather than a single one, to refer to the university to which they are affiliated. In fact, ISI will consider each variant a single item and will present a separate analysis and ranking for each. For each university, then, at best a part of its real performance will be reflected, and hence its position in the ranking system will be underestimated.
A large number of researchers around the globe have worked on university ranking and the evaluation of researchers. Chung and Park (2012), for instance, examined the Web visibility of researchers in the field of communication. Schulz and Manganote (2012) analyzed the Country Profiles, the open access data from ISI Thomson Reuters’ Science Watch. They discussed the advantages of defining a Country Profile Index (CPI), a tool for diagnosing the activity of the scientific community of a country and its possible strengths and weaknesses. Feng, Yong, Xiaolong and Wei (2012) discussed the evaluation of research universities in mainland China, Hong Kong and Taiwan. They considered two variables, namely the quality and the quantity of research production in research universities.
Based on the brief introduction presented above, the main objective of the present article is to provide a clear picture of the inconsistencies observed in recording Iranian university titles by their affiliated authors and to clarify the negative impact of such inconsistencies in positioning Iranian universities in global university ranking systems.
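To make the fragmentation effect described in this introduction concrete, the following minimal Python sketch groups a handful of affiliation strings by exact surface form, which is the behaviour the text attributes to ISI. The record list and its counts are hypothetical and serve only as an illustration; they are not taken from the data of this study.

```python
from collections import Counter

# Hypothetical affiliation strings from ten papers of one university.
records = ["Shiraz Univ"] * 6 + ["Univ Shiraz"] * 3 + ["Shiraz Uni"] * 1

# Exact-match grouping: every distinct string becomes its own entry.
by_exact_form = Counter(records)

dominant = "Shiraz Univ"
credited = by_exact_form[dominant]      # papers found under the dominant form
total = sum(by_exact_form.values())     # papers the university really produced
lost_share = 1 - credited / total       # share of output filed under other entries

print(by_exact_form)
print(f"credited {credited} of {total} papers, {lost_share:.0%} filed elsewhere")
```

Under these assumed counts, four of the ten papers would be analyzed under entries other than the dominant one.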
Literature review
Extensive work has already been published on Persian script and the problems it faces in a computing environment. Some problems are rooted in the writing system itself, but some are due to the unavailability of full-fledged, well-established and well-observed standards for Persian orthography; even where a standard exists, it is not followed completely by language users. Proposals for modifying Persian script go back half a century (Naseh, 2004). Emami (1992) stressed the need for the Persian alphabet to undergo modifications to fit the computing environment. Morteza’i (2001) discussed the problems the Persian writing system faces in information retrieval. Among those discussing the issue of computers and the writing system, Horri (1993), San’ati (1992) and Ma’soumi-Hamadani (2002) are relevant examples. All three works concentrated on the difficulties in Persian computing. Akbarnejad (1997) discussed the difficulties space variability may introduce into information storage and retrieval systems. Ahmadi-Birjandi (1973) elaborated on the two possibilities in the Persian alphabet, namely joining and disjoining of letters and morphemes. To him, this joining/disjoining contrast is a great burden on Persian computing. Kaboli (1995) discussed space variation within the framework of word formation processes. Hendi (2002) focused on compound formation and proposed a method to standardize the writing of compounds, terms that embody a number of space variants within their inner structures. To him, compounds must be tackled as an important issue in information retrieval endeavors. Some researchers, like Sama’i (2004), discussed inconsistency in the application of punctuation marks. He underlined that punctuation is an instrument that helps greatly in communicating meaning, and thus any inconsistency in its application could cause problems in the communication process.
Tayyeb (1992), for instance, in his article ‘‘Homography in Persian’’ reiterated that the absence of short vowels was the root of 85 % of homographs in Persian. Yet other researchers examined the way terms from English are handled in Persian. Behzadi (1996), for example, examined the inconsistencies in the transliteration of terms borrowed from English. Pourjavadi (2003) put forward suggestions to standardize Penglish (writing Persian using the English alphabet, which quite often happens in mobile SMSs, email exchanges and other similar situations, e.g. when Persian university names are written using English letters; in fact, the recording of Persian university names using English letters is the main theme of the present article). As a few examples, he referred to the availability of ‘oo’, ‘ou’ and ‘u’ for the long close back vowel and ‘ee’, ‘ii’, ‘ie’, ‘y’ and ‘i’ for the long close front vowel. He called for the production of guide books with the objective of standardizing the recording of Persian words in Penglish in SMSs, email exchanges and on the Web. The inconsistencies observed in the Persian writing system do not, and must not, imply that there are no guide books on principles of writing on the market. In fact, there are abundant books (Sami’i-Gilani, 2000; Jahanshahi, 1981; Yahaghi & Naseh, 1992; Saffarpour, 2001; IAPLL, 2007). Najafi’s (2005) book entitled ‘‘Ghalat Nanevisim’’ (Let’s Write Correct Persian) was also written to make the writing of different authors, and of native speakers in general, more consistent. So the problem is not the unavailability of guide books; rather, the real source of the problem is that even in such sources various discrepancies are observed and, more importantly, in many cases (mostly cases which cause a lot of problems in Persian computing) no clear-cut recommendations are put forward. Even in the book written by the IAPLL (2007), in many cases the authors have left it to writers to opt for one possibility rather than the other(s). This is while Mar’ashi (2002) reiterated that the IAPLL is the only authorized body that must take responsibility for revising the Persian writing system. He recommended that a grammar embodying 28 letters for the 28 sounds of Persian be proposed by the IAPLL.
Methodology
Instruments
To collect the various surface forms of Iranian university names, ISI Web of Science was searched with the keywords Cu = Iran and py = 2000–2009, so that only data from the time span 2000–2009 were considered and analyzed. To mark one surface form as dominant, Iranian university websites (the English web pages) were consulted; that is, the title appearing on a university’s website was taken as the dominant form of that university name. These dominant forms are presented in Table 1. In this article, only MSRT (Ministry of Science, Research and Technology) universities were considered.
Participants
Two ISC (Islamic World Science Citation Center) experts, each with an M.A. degree, a good command of information science and at least 5 years of job experience, retrieved the whole collection of Iranian university surface forms available in ISI Web of Science, with the objective of listing all surface forms related to a given university name under a single entry. They further visited the website of each Iranian university to identify the surface form used there, and marked that form as the dominant surface form to which all other surface forms of that university could be referred. This, of course, does not mean that the form used on each website was perfect and problem free (in fact, such forms also bore orthographic and linguistic problems); rather, this strategy was adopted only to save space, report only one surface form for each university, and enable the inclusion of all university names in a single table (Table 1). The strategy seemed justified since, logically, all faculty affiliated to a university are expected to draw on the surface form provided by that university. This could, at least, act as an easy way to reduce divergence in the orthography of university names.
Procedure
To carry out the study, the book by Goltaji and Alinejad Chamazkoti (2011) was drawn on as the main source of data. Having visited ISI Web of Science, they had extracted all variant forms of Iranian university titles, listing all variant forms of a university under a single entry, through the keywords Cu = Iran and py = 2000–2009. To use one form as the entry term, the surface form appearing on a university’s website was taken as the entry name. For each university name, the total number of variants available, including the dominant form, was also recorded (e.g. 4 for ‘Ilam Univ’, or 6 for ‘Shiraz Univ’). After this step, the variants were inspected and analyzed linguistically, that is, variations were divided into different classes based on the roots of the inconsistencies observed in the orthography of university names. Some sources of variation observed included ‘use of abbreviations’, ‘shift’, ‘conversion of vowels’, ‘the availability of parallel vowel/consonant letters to represent the same phone’, etc. Such factors were used as the basis of the analysis presented in the data analysis section.
While linking the variant forms to their relevant university names, some problems were observed, which were handled as follows:
Firstly, two surface forms were differentiated from each other even when they differed in only one character, e.g. ‘Shahrekord Uni’ vs. ‘Shahrekord Univ’, or when they contained the same character set but in a different order, e.g. ‘Shiraz Univ’ versus ‘Univ Shiraz’.
Secondly, surface forms were observed which could be linked to multiple university names. This mostly happened with acronyms. For example, SUT could be linked both to ‘Sahand Univ Technol’ and to ‘Sharif Univ Technol’. In the acronym SUT the location of the university is not known, for which reason it was added to both entries. There were not, of course, many such cases in the data studied.

Data analysis
The main objective of the present article was as follows:
• To provide a clear picture of the inconsistencies observed in recording Iranian university titles by their affiliated authors and, accordingly, to clarify the negative impact of such inconsistencies on positioning Iranian universities in global university ranking systems.
To attain the above objective a series of analyses was carried out as follows: Table 1 presents, in descending order, the list of 84 MSRT universities (research institutes were not considered), each with its number of orthographic variations, all extracted from ISI Web of Science.

Table 1 Iranian universities’ dominant names and their total variant forms
University name* | No.** | University name | No. | University name | No.
Tarbiat Modares Univ | 136 | Univ Isfahan | 18 | Bojnord Univ | 5
Payame Noor Univ*** | 114 | Power & Water Univ Technol | 18 | Univ Zabol | 5
Amirkabir Univ Technol | 108 | Sahand Univ Technol | 17 | Ilam Univ | 4
KN Toosi Univ Technol | 94 | Damghan Univ | 16 | Tafresh Univ | 4
Ferdowsi Univ Mashhad | 67 | Univ Appl Sci & Technol | 16 | Jundi Shapour Univ | 4
Iran Univ Sci & Technol | 60 | Univ Tabriz | 15 | Shiraz Univ Technol | 4
Tarbiat Moallem Univ | 57 | Shahrood Univ Technol | 15 | Fasa Univ | 4
Shahid Beheshti Univ | 56 | Lorestan Univ | 14 | Golestan Univ | 4
Shahid Bahonar Univ Kerman | 50 | Sari Agr Sci & Nat Resources Univ | 13 | Gonbad High Educ Ctr | 4
Azarbaijan Univ Tarbiat Moallem | 48 | Univ Gilan | 13 | Art Univ Isfahan | 4
Shahid Chamran Univ Ahvaz | 45 | Univ Kurdistan | 11 | Imam Sadiq Univ | 3
Univ Mohaghegh Ardabili | 40 | Persian Gulf Univ | 10 | Univ Birjand | 3
Bu Ali Sina Univ | 37 | Khoramshahr Marine Sci & Technol Univ | 10 | Chabahar Maritime Univ | 3
Univ Sistan & Baluchestan | 35 | Petr Univ Technol | 9 | Univ Qom | 3
Isfahan Univ Technol | 34 | Ramin Univ Agr & Nat Resources | 9 | Dr Shariaty Coll | 2
Gorgan Univ Agr Sci & Nat Resources | 33 | Yasouj Univ | 9 | Imam Reza Univ | 2
Babol Noshirvani Univ Technol | 32 | Yazd Univ | 9 | Kerman Grad Univ Technol | 2
Allameh Tabatabai Univ | 29 | Inst Adv Studies Basic Sci | 8 | Urmia Univ Technol | 2
Alzahra Univ | 25 | Razi Univ | 8 | Police Univ | 2
Vali e Asr Univ Rafsanjan | 24 | Zanjan Univ | 8 | Univ Maragheh | 2
Imam Khomeini Int Univ | 23 | Shahed Univ | 8 | Malayer Univ | 2
Tarbiat Moallem Univ Sabzevar | 23 | Mazandaran Univ Sci & Technol | 8 | Univ Art | 2
Sharif Univ Technol | 23 | Semnan Univ | 7 | Birjand Univ Technol | 1
Urmia Univ | 21 | Univ Kashan | 7 | Qom Univ Technol | 1
Shahid Rajaee Teacher Training Univ | 21 | Shiraz Univ | 6 | Kermanshah Univ Technol | 1
Univ Tehran | 20 | Hormozgan Univ | 6 | Hamedan Univ Technol | 1
Shahrekord Univ | 20 | Arak Univ | 5 | Neishabour Univ | 1
Univ Mazandaran | 19 | Univ Bonab | 5 | Tabriz Islam Art Univ | 1
* The variant reported, in each case, as university name was extracted from the website of each university
** The frequencies reported cover the sum of the frequency of appearance of all variants listed under a single university name
*** The case of ‘Payame Noor Univ’ is different from the rest of the items. In all other cases we deal with a single university and variants of its name, but Payame Noor University has many branches within Iran, mostly marked by city names. So the real amount of variation for each branch is much less than 114. Nevertheless, for ease of discussion, and for the sake of being comprehensive, the data related to all branches were merged under one single title, namely ‘Payame Noor Univ’
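As a quick arithmetic check, the per-university counts of Table 1 can be summed and averaged directly. The sketch below simply transcribes the 84 counts from the table, column by column; the variable names are illustrative and not part of the original study.

```python
from statistics import median

# Counts transcribed from Table 1, one list per column pair (28 rows each).
col1 = [136, 114, 108, 94, 67, 60, 57, 56, 50, 48, 45, 40, 37, 35,
        34, 33, 32, 29, 25, 24, 23, 23, 23, 21, 21, 20, 20, 19]
col2 = [18, 18, 17, 16, 16, 15, 15, 14, 13, 13, 11, 10, 10, 9,
        9, 9, 9, 8, 8, 8, 8, 8, 7, 7, 6, 6, 5, 5]
col3 = [5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3,
        2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1]
counts = col1 + col2 + col3

total = sum(counts)                           # 1668
mean = total / len(counts)                    # about 19.86
two_digit = sum(1 for c in counts if c >= 10) # universities with 10 or more variants
one_digit = len(counts) - two_digit           # universities with 9 or fewer

print(len(counts), total, round(mean, 2), median(counts), two_digit, one_digit)
```

The mean is pulled up by a few universities with very many variants; the median of the same counts is 9.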
Based on Table 1, in all, 1668 orthographic variations were observed for the 84 universities under analysis, that is, on average about 20 variants for each university name (1668/84 ≈ 19.9). The key point observed in this table is that more orthographic variations were found in some university titles than in others: ‘Tarbiat Modares Univ’, ‘Payame Noor Univ’ and ‘Amirkabir Univ Technol’, with 136, 114 and 108 variants respectively, showed the highest number of variations. In contrast, ‘Birjand Univ Technol’, ‘Qom Univ Technol’, ‘Kermanshah Univ Technol’, ‘Neishabour Univ’ and finally ‘Tabriz Islam Art Univ’ stood at the bottom with only 1 variant. In 41 universities, the total number of variants observed was a two-digit or larger number, between 10 and 136. In the 43 other universities fewer variations were observed, between 1 and 9. Such figures show the extent to which authors are inconsistent when presenting their affiliation in their articles. Despite the great strides made by Iranian universities to promote their position in global ranking systems, it appears that what they get is much less than what they deserve and have truly achieved. In simple terms, by presenting various surface forms for their universities, authors have unintentionally reduced the ranking of the universities to which they are affiliated, since in ISI Web of Science each variant is taken as a distinct entry and the data listed under each variant are analyzed separately.
In what follows, some major patterns of inconsistency observed in university titles are introduced. The major classes discussed are as follows: acronyms, misspellings, abbreviations, space variations, syntactic permutation, application of vowels/consonants and vowel/consonant combinations, /a/ vs. /aa/, Tashdid, Kasra ezafe, redundancy, upper and lower case letters (downcasing), the voiceless glottal stop sound /?/, deletion of some terms/letters (shortening) and deletion of titles.
As indicated in Table 2, one root of variation in Iranian university titles is the parallel application of acronyms rather than full words. Based on the data, two classes of acronym application were observed: Part I of Table 2 shows examples of full acronyms, e.g. ‘Bu Ali Sina Univ’ → ‘BASU’ or ‘Shahid Bahonar Univ Kerman’ → ‘SBUK’, whereas Part II shows examples of partial acronyms, in which the initials of some words plus abbreviated or full forms of other words are present in the title, e.g. ‘Tarbiat Modares Univ’ → ‘TM Univ’.
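The two acronym patterns just described, and the ambiguity of an acronym such as SUT noted in the procedure, can be sketched in a few lines of Python. The function names `full_acronym` and `partial_acronym`, and the choice of keeping only the token ‘Univ’ unabbreviated, are assumptions made for this illustration rather than a description of how authors or ISI actually form these strings.

```python
from collections import defaultdict

def full_acronym(name: str) -> str:
    """Initial of every alphabetic token, upper-cased ('&' is skipped)."""
    return "".join(tok[0].upper() for tok in name.split() if tok[0].isalpha())

def partial_acronym(name: str, keep=frozenset({"Univ"})) -> str:
    """Initials for most tokens, but tokens listed in `keep` stay as they are."""
    out, run = [], ""
    for tok in name.split():
        if tok in keep:
            if run:                 # flush the initials collected so far
                out.append(run)
                run = ""
            out.append(tok)
        elif tok[0].isalpha():
            run += tok[0].upper()
    if run:
        out.append(run)
    return " ".join(out)

names = ["Bu Ali Sina Univ", "Shahid Bahonar Univ Kerman",
         "Univ Sistan & Baluchestan", "Sahand Univ Technol", "Sharif Univ Technol"]

# Inverse index: acronym -> every university name that produces it.
index = defaultdict(list)
for n in names:
    index[full_acronym(n)].append(n)
ambiguous = {a: ns for a, ns in index.items() if len(ns) > 1}

print(full_acronym("Bu Ali Sina Univ"))         # BASU
print(partial_acronym("Tarbiat Modares Univ"))  # TM Univ
print(ambiguous)                                # SUT maps to two universities
```

The inverse index makes the linking problem visible: an acronym that maps to more than one name cannot be assigned to a single entry without further information such as the city.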
Authors may opt for one form rather than another, an inconsistency which has inevitably led to an array of surface forms for each university title.

Table 2 Inconsistencies rooted in acronyms
No. | Part I: University name | Acronym(s) | Part II: University name | Partial acronyms
1 | Bu Ali Sina Univ | BASU | Tarbiat Modares Univ | TM Univ; TMU Univ
2 | Imam Khomeini Int Univ | IKIU | Imam Khomeini Int Univ | IKI Univ
3 | Tarbiat Modares Univ | TMU | Payame Noor Univ | PN Univ
4 | Isfahan Univ Technol | IUT | Khaje Nasir Toosi Univ Technol | KN Toosi U Technol
5 | Shahid Rajaee Teacher Training Univ | SRTTU | Sharif Univ Technol | Sharif U T
6 | Univ Sistan & Baluchestan | USB | Univ Tehran | U Tehran

Table 3 illustrates samples of misspellings observed in university titles. Out of the 1666 entries inspected, 576 (34.57 %) contained misspellings, that is, roughly one-third of all entries analyzed. This may indicate that articles in ISI journals are not being edited or proofread by a person with a good command of both Persian and English. A non-Iranian editor or proofreader has no way of knowing that ‘Tabrix’ or ‘Tariz’ are wrong and that ‘Tabriz’ is the standard form. Nor will browsing the Web resolve the problem, since on the Web the editor or proofreader will come across various surface forms. Of course, some misspellings should have been noticed by ISI journal editors even without any Persian background. For example, they should have known that ‘Unvi’ in ‘Razi Unvi’ is wrong; the correct form is ‘Razi Univ’ or, better, ‘Razi University’. So the logical conclusion is that journals often trust the affiliation information submitted by Iranian authors. One way to remove this problem is for Iranian authors to seek help from Iranian editors or colleagues who have also mastered English, or at least to use the surface form available on the university’s website.

Table 3 Inconsistencies rooted in misspellings
No. | Correct form | Misspelling
1 | Tabriz | Tariz; Tabrix; Trabriz; Tabrize; Tebriz
2 | Tarbiat | Taebiat; Tabiat; Tabriat; Tarbat; Tarbayat; Tarbeiyat; Tarbia; Tarbial; Tarbian; Tarbiart; Tarblat; Tarbita; Tarbit; Tarbist; Tarbiate; Tariat; Tarbyat; etc.
3 | Ahvaz | Akhvaz
4 | Persian Gulf Univ | Persian Calf Univ
5 | Teacher Training Univ | Teacher Trianing Univ
6 | Shiraz | Shiran
7 | Rajaee | Rahaee; Radjaei; Rajae; Rajee; Rajaei; Rajaii
8 | Persian Gulf Univ | Persian Calf Univ

As shown in Table 4, authors have used abbreviations in a clumsy way. Various abbreviations have been used for a single term. At times, the abbreviated form used seems awkward and bizarre: the use of ‘Engn’ along with ‘Eng’ for ‘Engineering’ (as in ‘Engn Fac Bonab’ for ‘Univ Bonab’), or of ‘U’ and ‘Unv’ along with ‘Univ’, are only a few examples.

Table 4 Inconsistencies rooted in the application of abbreviations
No. | Word | Sample abbreviated forms
1 | Technology | Tech, Technol, Techno
2 | University | U, Univ, Unv*, Unvi*, Uuniv*, Unuiv*, Uinv*
3 | Science | Sci
4 | Petroleum | P as in (PUT); Petr as in (Petr Univ)
5 | Engineering | Eng; Engn
6 | Graduate | Grad as in (Kerman Grad Univ Technol)
7 | Center | Ctr as in (Gonabad High Educ Ctr)
* Asterisk shows ill-formed words

According to Table 5, space variation (especially the zero and full space variants, half space being the third type) can result in the emergence of various surface forms. In ISI Web of Science, two strings of words or letters are taken as distinct even if the only difference between them is the application of a different space variant. On this basis, ‘Khajeh Nasir’ and ‘KhajehNasir’, or ‘Kashan Univ’ and ‘KashanUniv’, are deemed different entries, though they are actually two variant surface forms of a single entity.

Table 5 Inconsistencies rooted in space variations
No. | With space | Without space
1 | Khajeh Nasir | Khajehnasir
2 | Kashan Univ | KashanUniv
3 | Amir Kabir Univ | Amirkabir Univ
4 | Bu Ali Univ | Buali Univ
5 | Bu Ali Sina Univ | Bu AliSina Univ
6 | Imam Reza Univ | Imamreza Univ Mashhad

Table 6 reveals that word order can also result in university title variations. On this basis, ‘Arak Univ’ and ‘Univ Arak’ are considered different, though both are variants of the same university name. The permutation model observed in Table 6 is ‘A B → B A’. More extended forms of permutation may also occur, especially in longer phrases, i.e. ‘A B C → C A B, C B A, A C B, B A C and B C A’. Quite a few such cases were observed in the data studied, e.g. ‘Univ Bu Ali Sina’ → ‘Bu Ali Sina Univ’, where only the word ‘Univ’ has been permuted.

Table 6 Inconsistencies rooted in syntactic permutation
No. | Form 1 | Form 2
1 | Arak Univ | Univ Arak
2 | Golestan Univ | Univ Golestan
3 | Mashhad Univ | Univ Mashhad
4 | Kurdestan Univ | Univ Kurdestan
5 | Malayer Univ | Univ Malayer
6 | Art Univ | Univ Art
7 | Police Univ | Univ Police

As indicated in Table 7, authors have also been inconsistent in transliterating Persian vowels and consonants into English. For instance, ‘o’, ‘ow’, ‘ou’, ‘oo’ and ‘u’ have all been used to stand for the half close back vowel. Similarly, ‘q’ and ‘gh’ have both been drawn on to stand for each of two Persian consonant letters, voiced velar fricative phonemes. At times, the variations have been unbelievably vast and awkward, a fact that highlights the urgent need for standardizing the transliteration of Persian vowels and consonants.

Table 7 Inconsistencies rooted in variations in the application of vowels/consonants and vowel/consonant combinations
No. | Vowel(s) | Variations | Consonant(s) | Variations
1 | /u:/ as in ‘you’ | oo (Toosi, Shahrood), o (Tossi), u (Tusi), ou (Shahroud) | /g/ | g (Gorgan), Gh (Ghorgan*)
2 | /o/ as in ‘book’ | u (Jundi), o (Jondi) | /ʁ/, as in the French word ‘merci’ /mɛʁsi/ | Q (Qom), Gh (Damghan)
3 | /i/ as in ‘seed’ | Beheshtee, Shahid, Shaheed | /h/ | h (Allameh), omitted (Allame)
4 | diphthong as in ‘know’ | Ferdoosi, Ferdosi, Ferdowsi, Ferdousi | … | …

Some more variation sources
In addition to the above general classes, some more variation sources were observed, a brief account of which is given below:
(I) /a/ vs. /aa/
It appears that Iranian authors often make no distinction between the open front spread vowel ‘a’ /æ/, as in ‘cat’, and the open back round vowel ‘aa’ /ɑ:/, as in ‘car’, when transliterating Persian university titles into English. In ‘Shahed Univ’ vs. ‘Shaahed Univ’, for instance, ‘a’ and ‘aa’ have both been used to stand for the sound /ɑ:/. Similarly, ‘a’ has often been used to stand for /ɑ:/, as in ‘Zanjan’, ‘Baluchestan’, ‘Chamran’ and ‘Hamadan’. In all these words, the first occurrence of ‘a’ sounds /æ/ and the second occurrence (in fact, the third one in ‘Hamadan’) stands for /ɑ:/. Only rarely have authors used the letters ‘a’ and ‘aa’ consistently to stand for /æ/ and /ɑ:/ respectively, e.g. ‘Univ Mazandaraan’. Other authors have, of course, used forms like ‘Univ Mazandaran’, where ‘a’ stands both for /æ/ and for /ɑ:/ even in a single word.
(II) Tashdid
Tashdid simply means repeating a single letter and putting more emphasis on it. Compared to other letters, the pronunciation of such letters needs more force and duration. Some authors ignored such letters and used their simple forms (without Tashdid) rather than the emphasized forms, e.g. ‘Modares’ rather than ‘Modarres’ and ‘Khoramshahr’ rather than ‘Khorramshahr’. At times, misapplication of Tashdid was observed: authors used it where they should not, e.g. ‘Illam Univ’ rather than ‘Ilam Univ’. So ignoring Tashdid, or misapplying it, acted as another contributing factor to the emergence of university title variations.
(III) Kasra ezafe
In Persian, Kasra ezafe functions something like ‘of’ in the English phrase ‘The house of the president’. It is shown by an underscript diacritic in Persian orthography. It is, however, an optional symbol that most Iranians tend to skip in writing. The data analysis revealed that authors sometimes ignored this symbol altogether (e.g. ‘Univ Tarbiat Modarres’, where the Kasra ezafe after the second word is missing), and sometimes used different symbols (i.e. a joined or disjoined ‘e’ or ‘E’) to represent it, e.g. ‘Univ Tarbiate Modarres’ and ‘Univ Tarbiat E Modarres’, where the ‘e’ and ‘E’ after ‘Tarbiat’ represent Kasra ezafe. Another example is ‘Univ Shahr E Kord’ versus ‘Shahrekord Univ’. So at least three variations are observed regarding Kasra ezafe, which ultimately add to the collection of orthographic variations.
(IV) Redundancy
Some terms have been used redundantly by authors while providing their affiliation information in their publications. The examples below will clarify the point.
The term ‘Univ’ has quite often been used redundantly in university titles, as in ‘IUST Univ’: the full form of this title would be ‘Iran Univ Sci & Technol Univ’, in which the double application of the word ‘Univ’ is clearly redundant. ‘Univ TMU’, the abbreviated form of ‘Univ Tarbiat Modarres University’, is just another example.
(V) Upper and lower case letters (downcasing)
Quite often upper case letters are used to form acronyms. Some authors, however, used upper and lower case letters inconsistently. As an example, some authors used ‘TMU’ to stand for ‘Tarbiat Modarres University’, but others drew on the form ‘Tmu’, as in ‘Tmu Univ’. Here, the author has used the upper case letter ‘T’ for ‘Tarbiat’ and the lower case letters ‘m’ and ‘u’ for ‘Modarres’ and ‘University’ respectively. Such cases were observed abundantly in the data under study.
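Three of the surface-level sources discussed in this section (space variation, word order and letter case) can be neutralized with simple comparison keys. The following is a minimal sketch under the assumption that two keys are computed per name; the function names `spaceless_key` and `order_key` are invented for this example and do not describe any existing ISI facility.

```python
def spaceless_key(name: str) -> str:
    """Lower-case the name and drop all whitespace (space and case variants)."""
    return "".join(name.lower().split())

def order_key(name: str) -> str:
    """Lower-case the name and sort its tokens (permutation and case variants)."""
    return " ".join(sorted(name.lower().split()))

# Space variants from Table 5 collapse under the first key.
print(spaceless_key("Bu Ali Univ") == spaceless_key("Buali Univ"))           # True
print(spaceless_key("Bu AliSina Univ") == spaceless_key("Bu Ali Sina Univ")) # True

# Permutations from Table 6 collapse under the second key.
print(order_key("Arak Univ") == order_key("Univ Arak"))                      # True
print(order_key("Univ Bu Ali Sina") == order_key("Bu Ali Sina Univ"))        # True

# Case variants such as 'Tmu Univ' collapse under either key.
print(order_key("Tmu Univ") == order_key("TMU Univ"))                        # True
```

Neither key addresses misspellings or transliteration differences, and sorting tokens could in principle merge two genuinely different institutions that share the same words, so such keys are better suited to proposing candidate matches than to merging entries automatically.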
(VI) Voiceless glottal stop /?/
Authors did not use any dedicated symbol to represent the voiceless glottal stop in university titles. The letters ‘a’ and ‘aa’ after ‘o’ in ‘Moallem’ and ‘Moaallem’, respectively, have been used by authors to stand for the voiceless glottal stop sound. The sound /?/ is present before ‘a’ and ‘aa’ in the above two words, or after ‘a’ and before ‘ii’ in ‘Tabatabaii’. The point is that none of these variations seems to work. Rather, authors could have used the simple superscript mark (’) to stand for the glottal stop sound /?/. In this way, the words ‘Moallem’ and ‘Tabatabaii’ could have been written as ‘Mo’allem’ and ‘Tabaatabaa’ii’ respectively.
(VII) Deletion of some terms/letters (shortening)
Cases were observed where title words had been deleted by the authors, which gave rise to more title variations. ‘Petr Univ Technology’, for instance, was changed into ‘Petr Univ’ by some authors. Sometimes differences in pronunciation, due to regional and social accents, also led to orthographic variations. The word ‘Mashhad’, for example, also appeared as ‘Mashad’ and ‘Meshed’, with ‘h’ deleted in the former and ‘a’ changed into ‘e’ in the latter.
(VIII) Deletion of titles
By title we mean words like ‘Dr’, ‘Prof.’, ‘Shahid’ (meaning ‘Martyr’), etc. Such titles were drawn on by some authors but ignored by others, hence the divergence in university names. ‘Dr Shariaty Coll’ versus ‘Shariaty Coll’ can be cited as one example, where ‘Dr’ is missing in the latter. Further, such titles appeared in different forms (full, acronym-wise and abbreviated forms). The word ‘Shahid’, for instance, appeared as ‘S’ and ‘Sh’ as well.
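Several of the transliteration-level sources listed in this section (parallel vowel spellings, /a/ vs. /aa/, Tashdid, and the apostrophe for the glottal stop) can be folded into a rough comparison key. The sketch below is only one possible set of folding rules, chosen here to cover the examples quoted in the text; it is not a proposed transliteration standard, and the function name `fold_transliteration` is an assumption of this illustration.

```python
import re

def fold_transliteration(word: str) -> str:
    """Map one name token to a rough key that ignores common spelling choices."""
    w = word.lower()
    w = w.replace("'", "").replace("’", "")   # apostrophe used for the glottal stop
    w = re.sub(r"gh|q", "q", w)               # 'gh' and 'q' treated as one class
    w = re.sub(r"oo|ou|ow|u", "o", w)         # parallel back-vowel spellings
    w = re.sub(r"ee|ii|ie|y", "i", w)         # parallel front-vowel spellings
    w = re.sub(r"(.)\1+", r"\1", w)           # doubled letters: Tashdid, 'aa' vs 'a'
    return w

for group in (["Toosi", "Tusi", "Tossi"],
              ["Ferdoosi", "Ferdosi", "Ferdowsi", "Ferdousi"],
              ["Modares", "Modarres"],
              ["Moallem", "Moaallem", "Mo’allem"],
              ["Mashhad", "Mashad"]):
    print(group, "->", {fold_transliteration(w) for w in group})
```

Each group above collapses to a single key. The rules are deliberately aggressive and are meant for individual name tokens only, so a key of this kind can also merge words that are genuinely different; it is best read as a way of flagging likely variants for a human to confirm.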
Discussion
Based on the analyses in the previous section, the following points can be made. Firstly, a surprisingly large number of variations were observed in transliterating Iranian university names into English. This diversity could be due to a number of reasons, including the lack of standards for the transliteration of Persian words into English, authors’ limited interest in having their articles edited and proofread by a qualified editor, and the dearth of effective laws to prevent authors from introducing different surface forms. From among 1668 title forms inspected, 576 (34.57 %) entries embodied misspellings. This was shown to be mostly due to the carelessness of authors and, in part, to the unfamiliarity of journal editors with the orthography of Persian university names. Such journals could, of course, be criticized with certainty for having missed some clear examples of misspellings, like ‘Unvi’ vs. ‘Univ’. Misspelling was not the only source of the problems observed; in fact, some 15 sources of inconsistency were discussed in this article. Secondly, such inconsistencies will downgrade the ranking of Iranian universities in global ranking systems. Some may claim that had authors followed the orthography publicized by their affiliated university through its website, the issue of various surface forms would have been settled totally, but in reality this is not the case. To clarify this point, four sample universities, each with 8 variants, were inspected, and the number of articles under each variant was recorded. Based on the data in Fig. 1, it is true that most authors draw on the surface form publicized by their affiliated universities on their websites; nevertheless, the number of forms other than the dominant one is not that marginal.
In ‘Mazandaran Univ Sci & Technol’, ‘Shahed Univ’, ‘Zanjan Univ’ and ‘Inst Adv Studies Basic Sci’, 15.38 % (12 out of 66), 10 % (66 out of 594), 28.71 % (172 out of 727) and 31.33 % (26 out of 57) of the surface forms, respectively, differed from the form publicized on the university website. Such deviant forms comprise almost one-third of the total forms used by authors of ‘Inst Adv Studies Basic Sci’. So, merely applying the dominant form would not resolve the problem completely, though it would reduce it to a great extent.

Fig. 1 Percentage of dominant versus other surface forms in 4 Iranian university titles

There seems to be an urgent need for a standard system for transliterating Persian words into English. Such a system could be packaged as add-in software for Microsoft Word and act as a spell checker for Iranian university names. Any form other than the default form could be identified by this software and converted into the proposed standard form. This is, of course, the topic of another article by the authors.
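The shares quoted for the four sample universities can be re-derived from the printed counts. The sketch below assumes that each printed pair gives the number of records with deviant forms followed by the number with the dominant form, so that the share is deviant / (deviant + dominant). This reading is an assumption and is not stated in the text; the names `COUNTS` and `deviant_share` are hypothetical.

```python
# (deviant, dominant) counts as printed in the text for the four universities.
# Treating each pair as "deviant versus dominant" is an assumption of this check.
COUNTS = {
    "Mazandaran Univ Sci & Technol": (12, 66),
    "Shahed Univ": (66, 594),
    "Zanjan Univ": (172, 727),
    "Inst Adv Studies Basic Sci": (26, 57),
}

def deviant_share(deviant, dominant):
    """Percentage of all records that use a non-dominant surface form."""
    return round(100 * deviant / (deviant + dominant), 2)

for name, (deviant, dominant) in COUNTS.items():
    print(f"{name}: {deviant_share(deviant, dominant)} %")
```

Under this reading the printed counts give 15.38 %, 10.0 % and 31.33 % for ‘Mazandaran Univ Sci & Technol’, ‘Shahed Univ’ and ‘Inst Adv Studies Basic Sci’, matching the text. The ‘Zanjan Univ’ pair gives 19.13 % rather than 28.71 %, so either the reading or one of the printed numbers for that university should be checked against Fig. 1.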
Recommendations

Based on the findings, the following recommendations can be made:

• There is a need to standardize the transliteration of Persian words in general, and Persian university titles in particular, into English.
• Persian authors are advised to have their articles revised by a person who has mastered both English and Persian. They may also enlist such a person as a coauthor. This will surely reduce title variations, and it will also improve the quality of the articles in terms of grammar, etc.
• University faculty members are promoted to higher ranks based on a number of factors, among which their publications form a core element. Rules could be ratified, or existing laws reinforced, so that just as authors receive credit for their articles, they are also penalized for not observing the university title publicized on the website of that university. (In October 2011 the MSRT approved a law stating that an article by a faculty member will be considered for their promotion provided the author(s) have included the affiliation publicized on the website of the university in which they work.)
• A standard list of Iranian university names could be produced and embedded as add-in software in Microsoft Word to act as a spell checker for university titles.

This last recommendation could be the most tangible and effective one.
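The last recommendation can be prototyped independently of any word-processor add-in. The sketch below is illustrative only and is not the software proposed by the authors: it assumes a small standard list (here the four forms discussed above), uses approximate string matching from Python's standard `difflib` module as a stand-in for whatever matching the real tool would use, and the names `STANDARD_FORMS`, `standardize` and the cutoff of 0.8 are hypothetical choices.

```python
import difflib

# Hypothetical standard list; a real one would cover all MSRT universities.
STANDARD_FORMS = [
    "Mazandaran Univ Sci & Technol",
    "Shahed Univ",
    "Zanjan Univ",
    "Inst Adv Studies Basic Sci",
]

def _key(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())

_BY_KEY = {_key(form): form for form in STANDARD_FORMS}

def standardize(name, cutoff=0.8):
    """Return the standard form closest to `name`, or None if none is close."""
    key = _key(name)
    if key in _BY_KEY:                      # differs only in case or spacing
        return _BY_KEY[key]
    matches = difflib.get_close_matches(key, list(_BY_KEY), n=1, cutoff=cutoff)
    return _BY_KEY[matches[0]] if matches else None

print(standardize("Zanjan Unvi"))     # misspelling mapped to the standard form
print(standardize("shahed   UNIV"))   # case and spacing variation
print(standardize("Unknown Coll"))    # no sufficiently close standard form
```

Returning `None` for names that are not close to any listed form, rather than forcing a match, keeps the tool from silently rewriting affiliations that are absent from the standard list. The cutoff would need tuning on real data.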
Concluding remarks

This article tackled orthographic variations in Iranian university titles. The extent of such variations was found to be very wide, with roots in a variety of issues including misspellings, abbreviations, space variations, syntactic permutation, application of vowels/consonants and vowel/consonant combinations, /a/ vs. /aa/, Tashdid, Kasra ezafe, redundancy, upper and lower case letters (downcasing), the voiceless glottal stop sound /?/, deletion of some terms/letters (shortening) and deletion of titles. The positioning of Iranian universities in global ranking systems is today regarded as an important issue, and the MSRT has, as a high priority, adopted policies to promote the ranking of Iranian universities at the global scale. It was found that at present Iranian universities are not receiving the rank they really deserve, simply because authors affiliated to a university use various titles to stand for the university name; authors have proved highly inconsistent in this regard. It was recommended that authors follow the surface form publicized by universities on their websites, use the help of an editor while writing their articles, and, just as they are rewarded for their publications, be penalized by not being credited for articles that deviate from the publicized surface form. A spell checker, as add-in software, is highly needed to homogenize Iranian university surface forms by replacing the variants with the proposed default form.