Development of Document Image Database for Handwritten Indic Script - A State-of-the-art
Sk Md Obaidullah Aliah University
[email protected]
Nibaran Das Jadavpur University
[email protected]
Kaushik Roy WB State University
[email protected]
Ziaur Rahaman Aliah University
[email protected]
Abstract – Script identification is a complex real-life problem for a multilingual, multi-script country like India, where 22 official languages are written using 13 different scripts. Document image processing researchers have developed several techniques for identifying unknown scripts in multi-script document images, and many works are still in progress. An important issue in this field is the availability of standard databases on which training, testing and validation can be carried out. Many databases have been developed by research groups and individual researchers for various Indic scripts, but to date no review giving precise information about them has been reported in the literature. This paper presents a state-of-the-art review of the handwritten document databases (both offline and online) developed by different researchers for Indic scripts. The authors' ongoing efforts to build their own database are also described. Finally, the challenges associated with database development are discussed.
Keywords – Script Identification, Indic Script, Multi-Script Document, Document Image Database

I. Introduction
In India there are 22 official languages, which are written using 13 different scripts [1]. Including English, which is very popular in India, the number of languages becomes 23. Official documents written in multiple scripts are therefore very common in our country; postal documents and filled-in pre-printed forms are good examples of multi-script documents. An Optical Character Recognizer (OCR) is a tool used to extract text from document images. For a multi-script country like India, designing a general OCR system for all official languages is a very challenging task. To overcome this, an automatic script identification system can be developed and its output fed to a script-dependent OCR. In the OCR research community, identifying the scripts of a document image that contains more than one script is generally classified as the intra-document script identification problem, whereas in the inter-document script identification problem each document is considered to be written in one script only. For inter-document script identification, page-level identification is possible, but intra-document script identification may have to be performed at block, line, word or even character level. In many countries automated multi-script identification systems are already available [2, 3, 4, 5], but in our country researchers have only recently started showing interest in this area. A system to identify the script of document images is therefore a pressing need. One of the main concerns in building such a system is the availability of a standard database on which training, testing and validation can be carried out. Many researchers have developed their own databases to test their systems. Some databases are made available to other researchers through a URL, whereas a few others can be obtained on request from the developer. This paper attempts to provide a state-of-the-art review of the handwritten document image databases developed for Indic scripts. These databases can be used for many applications such as segmentation, writer verification, handwriting recognition and script identification.

The paper is organized as follows: Section II briefly discusses the official languages and scripts of India. Indic script databases are discussed in Section III. The authors' contribution is presented in Section IV. Challenges associated with database development are discussed in Section V. Conclusions and future scope are provided in Section VI, followed by the references.

II. Official Languages and Scripts of India
In India there are officially 22 languages [1], and 13 scripts are used to write them. Apart from these, English is also a popular language, which is written in the Roman script. Figure 1 shows a map of India with the languages associated with the different states.
Fig. 1 A map showing different languages and scripts for different states [34]
Table 1 shows the official languages of India and the scripts used to write them, with their population distribution. The Writers (M) value is given once per script, on the first row of the languages that share it.

Table 1 Official languages with total speakers and their scripts [1]
Sl. No. | Language | Speakers (M) | Script | Writers (M)
1 | Assamese | 16.8 | Bangla | 211.5
2 | Bengali | 181 | Bangla |
3 | Manipuri | 13.7 | Bangla |
4 | Bodo | 0.5 | Devnagari | 328.23
5 | Hindi | 182 | Devnagari |
6 | Konkani | 7.6 | Devnagari |
7 | Maithili | 34.7 | Devnagari |
8 | Marathi | 68.1 | Devnagari |
9 | Nepali | 13.9 | Devnagari |
10 | Sanskrit | 0.03 | Devnagari |
11 | Sindhi | 21.4 | Devnagari |
12 | Santhali | 6.2 | Roman | 334.2
13 | English | 328 | Roman |
14 | Dogri | 3.8 | Dogri | 3.8
15 | Gujarati | 46.5 | Gujarati | 46.5
16 | Kannada | 3.63 | Kannada | 3.63
17 | Kashmiri | 5.6 | Kashmiri | 5.6
18 | Malayalam | 35.9 | Malayalam | 35.9
19 | Oriya | 31.7 | Oriya | 31.7
20 | Punjabi | 1.05 | Punjabi | 1.05
21 | Tamil | 65.7 | Tamil | 65.7
22 | Telugu | 69.8 | Telugu | 69.8
23 | Urdu | 60.6 | Urdu | 60.6
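The writer totals in Table 1 are the sums of the speaker counts of the languages that share a script. The short check below recomputes the three multi-language totals from the table values; the variable and function names are illustrative and not part of the original work.

```python
from collections import defaultdict

# (language, speakers in millions, script) for the multi-language scripts of Table 1
TABLE_1_ROWS = [
    ("Assamese", 16.8, "Bangla"), ("Bengali", 181, "Bangla"), ("Manipuri", 13.7, "Bangla"),
    ("Bodo", 0.5, "Devnagari"), ("Hindi", 182, "Devnagari"), ("Konkani", 7.6, "Devnagari"),
    ("Maithili", 34.7, "Devnagari"), ("Marathi", 68.1, "Devnagari"),
    ("Nepali", 13.9, "Devnagari"), ("Sanskrit", 0.03, "Devnagari"),
    ("Sindhi", 21.4, "Devnagari"),
    ("Santhali", 6.2, "Roman"), ("English", 328, "Roman"),
]

# Writer totals as printed in Table 1 (millions)
REPORTED_WRITERS = {"Bangla": 211.5, "Devnagari": 328.23, "Roman": 334.2}


def writers_per_script(rows):
    """Sum the speaker counts of all languages written in the same script."""
    totals = defaultdict(float)
    for _language, speakers, script in rows:
        totals[script] += speakers
    # Round to two decimals to hide floating-point noise
    return {script: round(total, 2) for script, total in totals.items()}


script_totals = writers_per_script(TABLE_1_ROWS)
for script, reported in REPORTED_WRITERS.items():
    print(script, script_totals[script], "reported:", reported)
```

For the remaining scripts a single language uses the script, so the writer total equals the speaker count.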
III. Indic Script Database
This paper presents a state-of-the-art review of the handwritten document image databases developed by researchers for Indic scripts. The work is reported with respect to the script identification problem [32, 33]. Script identification systems can be broadly classified into two categories depending on the nature of the input document image: printed and handwritten script identification systems. Handwritten script identification systems can in turn be classified into two types: offline and online.

A. Database for Offline Script
Offline script identification has received extra attention because offline documents are used far more than online documents in day-to-day life. For offline document images, the script identification process starts after the writing or printing is complete. The document is then digitized with an optical scanner or digital camera, and the resulting image is supplied as input to the identification system. Examples of such documents are postal documents, filled-in pre-printed application forms and official letters. In many cases an offline document contains more than one script, generally a local language mixed with English, so script identification is a must before choosing the appropriate OCR for the target language. In a typical offline system, the script or scripts of the document image are identified first, and the language-specific OCR is then chosen from the OCR bank for the final recognition. Many researchers have worked on script identification from offline printed or handwritten document images, and this work is still in progress. The offline handwritten databases developed by different researchers are reported below.

CMATERgt1.2.2.3
CMATER, the Centre for Microprocessor Applications for Training Education and Research, is a research group at Jadavpur University. It has developed many databases of Indic scripts, mainly Bangla, Roman and Devnagari, which are freely available for non-commercial research. The databases are publicly available at https://code.google.com/p/cmaterdb [6, 7]. The CMATERgt1.2.2.3 database contains 150 document images in which Roman script is mixed with Bangla script. Ground truth labeling is provided, with Bangla script marked in blue and Roman script in red. All image files are saved in BMP format.

CMATERgt1.5.1
This is also a publicly available database from CMATER, Jadavpur University, Kolkata, available online at https://code.google.com/p/cmaterdb [6, 7]. Each document page contains Devnagari and Roman script mixed at word level, with Devnagari marked in blue and Roman in red. A total of 150 document pages are available. All image files are saved in BMP format.

CMATERdb3
This is a publicly available character-level database located at https://code.google.com/p/cmaterdb/ [8, 9, 10, 11, 12]. It has several sub-versions: CMATERdb3.1.1, CMATERdb3.1.2, CMATERdb3.1.3.1, CMATERdb3.1.3.2, CMATERdb3.1.3.3, CMATERdb3.1.4, CMATERdb3.2.1, CMATERdb3.2.3 and CMATERdb3.2.4. Together these databases contain more than 5000 characters and digits of the Bangla, Devnagari, Telugu and Arabic scripts. All image files are saved in BMP format.

CENPARMI-U
The Center for Pattern Recognition and Machine Intelligence (CENPARMI) is a research group at Concordia University.
CENPARMI-U is an Urdu handwriting database developed by M. W. Sagheer et al. in 2009 [13]. It contains isolated digits, 44 isolated characters, 57 Urdu finance-related words, numeral strings with and without decimal points, and five special symbols. Around 18000 words are available in this database.

IAM-Handwriting-DB
The IAM database is one of the most popular databases in the field of optical character recognition. It is a large repository of offline, online and historical document image databases. IAM-Handwriting-DB is a Roman script database developed by U. Marti and H. Bunke [14, 15] that contains 1539 pages, 5685 sentences, 13353 lines and 115320 words. It can be used for various applications related to offline handwritten text recognition from document images. An automatic word segmentation scheme was developed by M. Zimmermann and H. Bunke [16] to extract the words. All images are provided in PNG format, and the preprocessing information is provided in XML format [17].

KHTD
KHTD is a benchmark Kannada database developed by A. Alaei et al. in 2011 [18]. It contains 204 unconstrained handwritten document pages comprising 4298 lines and 26115 words. To capture as much variety as possible, around 51 writers contributed to it. Ground truth based on both pixel information and content information is provided. KHTD is freely available for non-commercial research.

Devnagari-DB
Devnagari is one of the most popular scripts in India, with 328.23 million users. Devnagari-DB, a Devnagari script database, was developed by V. J. Dongre and V. H. Mankar in 2012 [19]. It contains 20305 characters and 5137 digits, and almost 750 writers contributed to it. All images are stored in TIFF format. The database is publicly available at https://code.google.com/p/devnagari-database [20].

UHSD
UHSD is an unconstrained Urdu sentence-level database developed by A. Raza et al. in 2012 [21]. The corpus is a collection of forms written by native Urdu speakers in their own handwriting. Around 400 forms were collected, and about 200 writers contributed.

QUWI
QUWI is the Qatar University Writer Identification dataset, developed by A. Raza et al. in 2012 [22]. It is a sentence-level database containing Roman and Arabic scripts. 1017 volunteers of different ages and genders were asked to write a predefined text on a given form, and in this way around 4068 forms were collected.

CVL-DataBase
M. Diem et al. developed this database in 2013 [23]. It contains seven different handwritten texts, of which six are Roman and one is a German text. About 311 writers contributed 2163 forms. Images are stored in RGB color at 300 dpi. In
the ICDAR and ICFHR writer identification contests, the best algorithms were evaluated using the CVL-database [23].

Tamil-DB
Tamil-DB is a word-level database of handwritten Tamil city names developed by Thadchanamoorthy et al. in 2013 [24]. It contains 265 city names written in the Tamil script. The dataset is freely available for non-commercial use.

Bangla-Numeral-DB
B. B. Chaudhuri developed this Bangla numeral database in 2006 [25]. It contains 23392 offline isolated numerals and numeral strings.

Tamil-Kannada-DB
This database was developed at the MILE lab of IISc Bangalore. It was built in 2010 by B. Nethravathi et al. [26] as part of work on Tamil and Kannada handwriting recognition. About 100000 words were collected for each of the Tamil and Kannada scripts, and 600 users of different ages and sexes contributed.

Table 2 summarizes all the offline databases reported so far.

Table 2 Offline handwritten document image databases for Indic scripts
Sl. No. | Database Name & Year | Scripts Considered | Level | Size
1 | CMATERgt1.2.2.3 [6, 7], 2012 | Bangla, Roman | Word | 150 document pages, scripts distributed at word level
2 | CMATERgt1.5.1 [6, 7], 2012 | Devnagari, Roman | Word | 150 document pages, scripts distributed at word level
3 | CMATERdb3 [8, 9, 10, 11, 12], 2012 | Bangla, Devnagari, Telugu, Arabic | Character, Digit | More than 5000 characters
4 | CENPARMI-U [13], 2009 | Urdu | Word, Character, Digit | 18000 words
5 | IAM-Handwriting-DB [14, 15], 1999, 2002 | Roman | Document, Line, Sentence, Word | 1539 pages, 5685 sentences, 13353 lines, 115320 words, 657 writers
6 | KHTD [18], 2011 | Kannada | Document, Line, Word | 204 documents, 4298 lines, 26115 words
7 | Devnagari-DB [19], 2012 | Devnagari | Character, Digit | 20305 characters, 5137 digits
8 | UHSD [21], 2012 | Urdu | Sentence | 400 forms
9 | QUWI [22], 2012 | Roman, Arabic | Sentence | 4068 forms
10 | CVL-DataBase [23], 2013 | Roman, German | Sentence | 2163 forms
11 | Tamil-DB [24], 2013 | Tamil | Word | 265 city names
12 | Bangla-Numeral-DB [25], 2006 | Bangla | String, Character | 23392 offline isolated numerals
13 | Tamil-Kannada-DB [26], 2010 | Tamil, Kannada | Word | About 100000 words per script
B. Database for Online Script
In an online script identification system the script is recognized in real time, which is why such systems are also called real-time or dynamic document processing systems. A special writing device is used, which captures temporal or dynamic information including the direction, number, duration and order of the strokes. Strictly speaking, the term "document image database" does not apply to online systems; we use the term "online document database" instead. Compared with offline script identification, far fewer works have been reported on online script identification. The few standard databases developed for online documents are reported here.

IAM-OnDB
This is an online handwritten Roman script database developed by M. Liwicki and H. Bunke in 2005 [27]. It contains forms of unconstrained handwritten text acquired with the E-Beam System [28]. It comprises more than 1700 acquired forms, more than 13000 isolated and labeled text lines in online format, and more than 86000 word instances from an 11059-word dictionary. Around 221 writers contributed to it. It is available at [29] for non-commercial research purposes.

IAM-onDo-DB
IAM-onDo-DB is a very useful online document-level database developed by E. Indermühle, M. Liwicki and H. Bunke in 2010 [30]. All data were acquired using a Logitech IO2 digital pen. The documents contain various types of content, both text and non-text, such as text blocks, graphics, tables and formulas, produced by writers who were given a predefined template. Details of this database are available at [31].

Bangla-Numeral-DB
This Bangla numeral database was built in 2006 [25]. It contains online and offline numeral strings and isolated Bangla numerals. 8348 online numeral strings are available for research purposes.

Table 3 Online handwritten databases for Indic scripts
Sl. No. | Database Name & Year | Scripts Considered | Level | Size
1 | IAM-OnDB [27], 2005 | Roman | Document, Line, Sentence, Word | More than 1700 acquired forms, 13049 isolated and labeled text lines, 86272 word instances from an 11059-word dictionary
2 | IAMonDo-DB [30], 2010 | Roman | Document, Words, Lines | 941 documents, 68841 words, 7616 text lines in text blocks, 1478 text blocks
3 | Bangla-Numeral-DB [25], 2006 | Bangla | String, Character | 8348 online numerals
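The temporal information that distinguishes online from offline data (number, order, duration and direction of strokes) can be illustrated with a small data structure. The sketch below is only an illustration: the Stroke class, the (x, y, t) point layout and the summarize function are hypothetical and do not reproduce the file format of any database listed in Table 3.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# One pen sample: x, y position and timestamp in seconds (assumed layout)
Point = Tuple[float, float, float]


@dataclass
class Stroke:
    """A pen-down to pen-up trace recorded by the writing device."""
    points: List[Point]

    def duration(self) -> float:
        # Time between the first and last sample of the stroke
        return self.points[-1][2] - self.points[0][2]

    def direction_deg(self) -> float:
        # Overall direction from the first to the last sample, in degrees
        (x0, y0, _), (x1, y1, _) = self.points[0], self.points[-1]
        return math.degrees(math.atan2(y1 - y0, x1 - x0))


def summarize(strokes: List[Stroke]) -> dict:
    """Collect stroke count, writing order, durations and directions."""
    ordered = sorted(strokes, key=lambda s: s.points[0][2])  # order by start time
    return {
        "stroke_count": len(ordered),
        "durations": [s.duration() for s in ordered],
        "directions_deg": [s.direction_deg() for s in ordered],
    }


# Two strokes given out of order: a vertical one written second, a horizontal one first
sample_strokes = [
    Stroke([(0.0, 0.0, 0.50), (0.0, 2.0, 0.80)]),
    Stroke([(0.0, 0.0, 0.00), (3.0, 0.0, 0.25)]),
]
stroke_summary = summarize(sample_strokes)
print(stroke_summary)
```

None of this information is recoverable from a scanned page, which is why online databases store pen trajectories rather than images.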
IV. Authors' Contribution
We are also working towards the development of a corpus of official Indic script document images in both printed and handwritten categories. To date about 900 printed and 400 handwritten document-level images of different Indic scripts have been collected, and the work is still in progress. The database created by the authors is freely available to document image processing researchers for non-commercial use.

Printed
More than 900 printed document pages have been collected to date, and collection is still in progress. They comprise 300 Bangla, 100 Devnagari, 200 Roman, 60 Gujarati, 60 Oriya, 60 Telugu, 60 Kannada, 30 Kashmiri, 30 Malayalam and 30 Urdu script images. We have tried to collect real-life printed script data from different sources such as books and articles. The documents were digitized using an HP scanner.
Fig. 3 (Printed) Sample images from our database of (a) Bangla, (b) Devnagari, (c) Roman, (d) Oriya, (e) Urdu, (f) Gujarati, (g) Telugu, (h) Kannada, (i) Malayalam, (j) Kashmiri script documents in grayscale
Handwritten
About 400 handwritten document images have been prepared, comprising 200 Bangla, 50 Roman, 50 Devnagari, 50 Urdu, 30 Malayalam and 30 Oriya script images.
Fig. 4 (Handwritten) Sample images from our database of (a) Bangla, (b) Devnagari (c) Roman (d) Oriya (e) Urdu (f) Gujarati script documents in grayscale
Preprocessing
A two-stage preprocessing mechanism has also been developed by the authors to represent the data in binary (0-1) format when required. All collected documents are digitized using an HP flatbed scanner at 300 dpi. The digitized images are initially stored with 256 intensity levels. A two-stage binarization algorithm is then applied to generate two-tone images. In the first stage, pre-binarization is performed with a local window-based algorithm to get an idea of the different regions of interest (ROIs). Depending on the window size, some hollow or stray effects may be produced. To overcome this, RLSA (run length smoothing approach) is applied to the pre-binarized image. This compensates for the limitations of the local binarization method and generates small blocks/components. Finally, component labeling is performed: each component generated by RLSA is selected and mapped onto the original grayscale image to obtain the corresponding zones of the original image. The final binary (0-1) image is generated by applying a histogram-based global binarization algorithm to these regions of the original image [3].

V. Challenges
The main challenge for OCR research on Indic scripts is the availability of benchmark databases. There are 13 official scripts in our country, but as Table 4 shows, no standard database has yet been built for many of them. For scripts such as Gurumukhi and Gujarati many works have been reported, but no standard database development has been reported in the literature. The difficulty of developing databases for all Indian scripts lies in geographical barriers arising from differences in culture, language, religion etc. As the work is voluntary, many writers are unwilling when asked to contribute a few lines in their native language. A sincere, long-term effort is required to achieve the goal of building handwritten databases for all official Indic scripts.

Table 4 Script-wise reported databases for Indic scripts
Sl. No. | Script | Reported Work
1 | Bangla | [6, 7, 8, 9, 10, 11, 12, 25]
2 | Devnagari | [6, 7, 8, 9, 10, 11, 12, 19]
3 | Roman | [6, 7, 14, 15, 22, 23]
4 | Dogri | -
5 | Gujarati | -
6 | Kannada | [18, 26]
7 | Kashmiri | -
8 | Malayalam | Authors
9 | Oriya | Authors
10 | Punjabi | -
11 | Tamil | [24, 26]
12 | Telugu | [8, 9, 10, 11, 12]
13 | Urdu | [13, 21]
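The two-stage binarization described under Preprocessing in Section IV can be sketched as follows. This is a minimal pure-Python illustration and not the authors' implementation: the local mean threshold, the horizontal-only RLSA, the use of component bounding boxes as regions, Otsu's method as the histogram-based global threshold, and all function names and parameter values are assumptions made for the example.

```python
from collections import deque


def local_prebinarize(gray, win=1, offset=10):
    """Stage 1 (assumed rule): ink if darker than the local window mean minus offset."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - win), min(h, y + win + 1))
                    for i in range(max(0, x - win), min(w, x + win + 1))]
            out[y][x] = 1 if gray[y][x] < sum(vals) / len(vals) - offset else 0
    return out


def rlsa_horizontal(binary, max_gap=2):
    """Fill background runs of at most max_gap pixels lying between two ink pixels."""
    out = [row[:] for row in binary]
    for row in out:
        last_ink = None
        for x, v in enumerate(row):
            if v == 1:
                if last_ink is not None and 0 < x - last_ink - 1 <= max_gap:
                    for i in range(last_ink + 1, x):
                        row[i] = 1
                last_ink = x
    return out


def component_boxes(binary):
    """Label 4-connected components; return bounding boxes (x0, y0, x1, y1)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0, x1, y1 = min(x0, cx), min(y0, cy), max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def otsu_threshold(values):
    """Histogram-based global threshold (Otsu) over integer intensities 0-255."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total, sum_all = len(values), sum(values)
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def two_stage_binarize(gray):
    """Pre-binarize, smooth with RLSA, then threshold each region of the gray image."""
    h, w = len(gray), len(gray[0])
    pre = local_prebinarize(gray)
    result = [[0] * w for _ in range(h)]
    for x0, y0, x1, y1 in component_boxes(rlsa_horizontal(pre)):
        region = [gray[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
        uniform = min(region) == max(region)  # no histogram split possible
        t = otsu_threshold(region)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                result[y][x] = pre[y][x] if uniform else int(gray[y][x] <= t)
    return result


# Tiny grayscale example: two dark blobs (20) on a light background (200)
sample_gray = [[200] * 8, [200, 20, 20, 200, 20, 20, 200, 200], [200] * 8]
sample_binary = two_stage_binarize(sample_gray)
print(sample_binary[1])
```

In the example, RLSA merges the two blobs into one region, and the region-wise global threshold then restores the light gap between them as background.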
VI. Conclusion and Future Scope
In this paper a state-of-the-art review of the development of offline and online databases for official Indic scripts has been presented, based on the works reported in the literature so far. The databases are reported with their name, year of creation, volume, storage format etc. It can be observed that database development for official Indic scripts has grown rapidly over the last few years: of the 16 reported databases (13 offline and 3 online), 12 were developed in 2009 or later. During development, a large number of writers of varying age, sex and educational background were involved in order to capture maximum variability within the datasets.
Fig. 5 Year wise database development statistics
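The year-wise statistics behind Fig. 5 can be recomputed from the years listed in Tables 2 and 3. The snippet below is a small illustrative tally; the list layout and variable names are assumptions, and IAM-Handwriting-DB is counted under its first publication year, 1999.

```python
from collections import Counter

# (database, year, mode) as listed in Tables 2 and 3
DATABASES = [
    ("CMATERgt1.2.2.3", 2012, "offline"),
    ("CMATERgt1.5.1", 2012, "offline"),
    ("CMATERdb3", 2012, "offline"),
    ("CENPARMI-U", 2009, "offline"),
    ("IAM-Handwriting-DB", 1999, "offline"),
    ("KHTD", 2011, "offline"),
    ("Devnagari-DB", 2012, "offline"),
    ("UHSD", 2012, "offline"),
    ("QUWI", 2012, "offline"),
    ("CVL-DataBase", 2013, "offline"),
    ("Tamil-DB", 2013, "offline"),
    ("Bangla-Numeral-DB", 2006, "offline"),
    ("Tamil-Kannada-DB", 2010, "offline"),
    ("IAM-OnDB", 2005, "online"),
    ("IAMonDo-DB", 2010, "online"),
    ("Bangla-Numeral-DB", 2006, "online"),
]

# Number of databases per year of creation
per_year = Counter(year for _name, year, _mode in DATABASES)

# Number of databases per acquisition mode
per_mode = Counter(mode for _name, _year, mode in DATABASES)

# Databases created in 2009 or later
since_2009 = sum(1 for _name, year, _mode in DATABASES if year >= 2009)

for year in sorted(per_year):
    print(year, per_year[year])
print("total:", len(DATABASES), "since 2009:", since_2009)
```

With these entries the tally gives 16 databases in total, 12 of them from 2009 or later, with 2012 as the most productive year.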
We started this paper with reference to the script identification problem. To the best of our knowledge, no script identification system considering all official Indic scripts has been developed to date. This encourages us to take up the challenge of building an automatic script identification system for all 13 official Indic scripts. We have faced the problem of inadequate datasets for the official Indic scripts, which is why we are building our own offline handwritten database for all 13 official scripts. Our future plans include strengthening this database-building process and benchmarking the results for the script identification problem.

References
1. http://www.rajbhasha.nic.in/8thschedulehin.pdf accessed on 01.11.2014.
2. D. Ghosh et al., “Script Recognition- A Review”, IEEE Transactions on PAMI, 32(12), 2010, pp. 2142-2161.
3. K. Roy, U. Pal and B. B. Chaudhuri, “A System for Neural Network based Word-wise Handwritten Script Identification for Indian Postal Automation”, Second International Conference on Intelligent Sensing and Information Processing and Control, 2005, pp. 581-586.
4. J. Hochberg et al., “Automatic Script Identification from Document Images Using Cluster based Templates”, IEEE Transactions on PAMI, 19(2), 1997, pp. 176-181.
5. Lijun Zhou et al., “Bangla/English Script Identification Based on Analysis of Connected Component Profiles”, LNCS, vol. 3872, 2006, DOI: 10.1007/11669487_22.
6. R. Sarkar, N. Das, S. Basu, M. Kundu, M. Nasipuri, D. K. Basu, “CMATERdb1: a database of unconstrained handwritten Bangla and Bangla–English mixed script document image”, International Journal on Document Analysis and Recognition, vol. 15, issue 1, 2012, pp. 71-83.
7. S. Basu, C. Chaudhury, M. Kundu, M. Nasipuri, D. K. Basu, “Text Line Extraction from Multi Skewed Handwritten Documents”, Pattern Recognition, Elsevier, vol. 40, no. 6, 2007, pp. 1825-1839.
8. N. Das, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, and D. K. Basu, “A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application”, Applied Soft Computing, vol. 12, 2012, pp. 1592-1606.
9. N. Das, J. M. Reddy, R. Sarkar, S. Basu, M. Kundu, M. Nasipuri, and D. K. Basu, “A statistical–topological feature combination for recognition of handwritten numerals”, Applied Soft Computing, vol. 12, 2012, pp. 2486-2495.
10. N. Das, K. Acharya, R. Sarkar, S. Basu, M. Kundu, and M. Nasipuri, “A Novel GA-SVM Based Multistage Approach for Recognition of Handwritten Bangla Compound Characters”, Proceedings of the International Conference on Information Systems Design and Intelligent Applications, Visakhapatnam, India, vol. 132, S. Satapathy et al. (Eds.), Springer Berlin/Heidelberg, 2012, pp. 145-152.
11. N. Das, S. Basu, R. Sarkar, M. Kundu, M. Nasipuri, and D. K. Basu, “Handwritten Bangla Compound character recognition: Potential challenges and probable solution”, in 4th Indian International Conference on Artificial Intelligence, Bangalore, 2009, pp. 1901-1913.
12. N. Das, S. Basu, R. Sarkar, M. Kundu, M. Nasipuri, and D. K. Basu, “An Improved Feature Descriptor for Recognition of Handwritten Bangla Alphabet”, in International Conference on Signal and Image Processing, Mysore, India, 2009, pp. 451-454.
13. M. W. Sagheer, C. L. He, N. Nobile, C. Y. Suen, “A New Large Urdu Database for Off-Line Handwriting Recognition”, in Proceedings of the 15th International Conference on Image Analysis and Processing, 2009, pp. 538-546.
14. U. Marti and H. Bunke, “A full English sentence database for off-line handwriting recognition”, in Proc. of the 5th Int. Conf. on Document Analysis and Recognition, 1999, pp. 705-708.
15. U. Marti and H. Bunke, “The IAM-database: An English Sentence Database for Off-line Handwriting Recognition”, Int. Journal on Document Analysis and Recognition, vol. 5, 2002, pp. 39-46.
16. M. Zimmermann and H. Bunke, “Automatic Segmentation of the IAM Off-line Database for Handwritten English Text”, in Proc. of the 16th Int. Conf. on Pattern Recognition, vol. 4, 2000, pp. 35-39.
17. http://www.iam.unibe.ch/fki/databases/iam-handwriting-database accessed on 01.11.2014.
18. A. Alaei, P. Nagabhushan, U. Pal, “A Benchmark Kannada Handwritten Document Dataset and Its Segmentation”, in Proceedings of International Conference on Document Analysis and Recognition, 2011, pp. 140-145.
19. Vikas J Dongre, Vijay H Mankar, “Development of comprehensive Devnagari numeral and character database for offline handwritten character recognition”, Journal of Applied Computational Intelligence and Soft Computing (ACISC), Hindawi Publishing Corporation, 2012, doi:10.1155/2012/871834.
20. https://code.google.com/p/devnagari-database/ accessed on 01.11.2014.
21. A. Raza, I. Siddiqi, A. Abidi, F. Arif, “An Unconstrained Benchmark Urdu Sentence Database with Automatic Line Segmentation”, in Proceedings of International Conference on Frontiers in Handwriting Recognition, 2012, pp. 491-496.
22. A. Raza, I. Siddiqi, A. Abidi, F. Arif, “QUWI: An Arabic and English Handwriting Dataset for Offline Writer Identification”, in Proc. of International Conference on Frontiers in Handwriting Recognition, 2012, pp. 746-751.
23. M. Diem, S. Fiel, F. Kleber, R. Sablatnig, “CVL-Database: An Offline Database for Writer Retrieval, Writer Identification and Word Spotting”, in Proc. of the 12th Int. Conference on Document Analysis and Recognition (ICDAR), 2013, pp. 560-564.
24. S. Thadchanamoorthy, N. D. Kodikara, H. L. Premaretne, U. Pal, F. Kimura, “Tamil Handwritten City Name Database Development and Recognition for Postal Automation”, in Proceedings of the 12th Int. Conference on Document Analysis and Recognition (ICDAR), 2013, pp. 793-797.
25. B. B. Chaudhuri, “A complete handwritten numeral database of Bangla-A major Indic script”, in Proceedings of the 10th International Workshop on Frontiers of Handwriting Recognition, La Baule, France, 2006, pp. 379-384.
26. B. Nethravathi, C. P. Archana, K. Shashikiran, A. G. Ramakrishnan, V. Kumar, “Creation of a huge annotated database for Tamil and Kannada OHR”, in Proceedings of IWFHR, 2010, pp. 415-420.
27. M. Liwicki and H. Bunke, “IAM-OnDB - an On-Line English Sentence Database Acquired from Handwritten Text on a Whiteboard”, 8th Intl. Conf. on Document Analysis and Recognition, 2005, vol. 2, pp. 956-961.
28. http://www.e-beam.com/home.html accessed on 01.11.2014.
29. http://www.iam.unibe.ch/fki/databases/iam-on-line-handwriting-database accessed on 01.11.2014.
30. E. Indermühle, M. Liwicki, and H. Bunke, “IAMonDo-database: an Online Handwritten Document Database with Non-uniform Contents”, Proc. 9th Int. Workshop on Document Analysis Systems, 2010.
31. http://www.iam.unibe.ch/fki/databases/iam-online-document-database accessed on 01.11.2014.
32. S. M. Obaidullah, S. K. Das, K. Roy, “A System for Handwritten Script Identification From Indian Document”, Journal of Pattern Recognition Research, vol. 8, no. 1, 2013, pp. 1-12.
33. S. M. Obaidullah, S. K. Das, K. Roy, “Structural Feature Based Approach for Script Identification from Printed Indian Document”, in International Conference on Signal Processing and Integrated Network, Noida, 2014, pp. 120-124.
34. http://commons.wikimedia.org/wiki/File:States_of_South_Asia.png visited on 01.11.2014.