Canadian Journal on Artificial Intelligence, Machine Learning and Pattern Recognition Vol. 3 No. 3, May 2012

Creation of precise alphabet fonts of early Brahmi script from photographic data of ancient Sri Lankan inscriptions

Dammi Bandara (1,2), Nalin Warnajith (1), Atsushi Minato (1) and Satoru Ozawa (1)
(1) Graduate School of Science & Engineering, Ibaraki University, Hitachi 316-8511, Japan
(2) Department of Archaeology, Colombo 7, Sri Lanka

Abstract: Inscriptions are used as resources for studying the ancient history of countries throughout the world. Brahmi is one of the most important ancient scripts of South Asia. It became the matrix of the Devanagari script used for Sanskrit and Hindi, and over the last two thousand years it gave rise to the Burmese, Khmer, Thai, Lao and Tibetan scripts, among others. The aim of this paper is to study the evolution of Brahmi script into Sinhala script on the basis of ancient Sri Lankan documents inscribed on stone surfaces. With the aid of modern computer image processing techniques, precise alphabet fonts of early Brahmi script have been produced from photographic data of ancient Sri Lankan inscriptions. It has been shown that the produced fonts can be used to establish a method for the automatic reading of ancient inscriptions by computer.

Keywords: archaeology, Brahmi inscriptions, image processing, alphabet fonts

I. INTRODUCTION
Ancient Sri Lankans used stone surfaces for writing. All ancient documents inscribed on stone surfaces are called "inscriptions". Sri Lankan inscriptions broadly fall into four classes: cave, rock, slab and pillar inscriptions [1,2]. Documents inscribed on the drip ledges of caves are called cave inscriptions; documents inscribed on natural rock are called rock inscriptions; and slab and pillar inscriptions are documents inscribed on worked stone such as slabs and pillars [1]. Most of the inscriptions are records of donations to temples. Analysis of the inscriptions has revealed the following: (1) donations were made for the maintenance of temples; (2) inscriptions gave instructions on how the donated goods should be used and issued orders for the tasks needed for the welfare of the temple and its monks; (3) inscriptions were used as a medium of propaganda; (4) inscriptions gave instructions on social, cultural and economic matters; and (5) donors sought to immortalize their names [3]. To date, at least 3500 stone inscriptions have been discovered in various parts of Sri Lanka, and about 1500 of them have been copied by the stone-rubbing method [ ]. In addition to historical, social, religious and political information, they provide remarkable evidence on language and literature [4,5].

Sri Lanka is fortunate in the diversity and richness of its historical sources, especially literary sources such as the Mahavamsa and Deepawamsa [6]. In addition, there are foreign sources in several languages that are useful for studying the history of Sri Lanka [6]. All of these literary sources were written in later periods. Inscriptions, by contrast, are of inestimable value because, unlike most literary sources, they are contemporaneous with the events they record; if they have escaped the ravages of man and nature, they remain in their original form. The early inscriptions are free of poetic embellishment and merely record events without didactic or pedagogic objectives, unlike the Mahavamsa, which was written for a particular purpose. This greatly enhances the historical value of inscriptions [7].

When a newly unearthed inscription is found, the letters inscribed on the stone surface must first be identified. The stone surface is usually contaminated by various kinds of noise such as scratches, cracks and voids. Moreover, the same letter sometimes takes different shapes depending on the skill of the inscriber and the tool used. Until recently, letters have been identified by human judgement. We wish to introduce a more scientific method of letter identification, and the first step toward such a method is to produce alphabet fonts of the ancient scripts. An alphabet of early Brahmi script has already been compiled by archaeologists by studying the common features of letters found in inscriptions [5,8]. Their method relies on human judgement and handwriting, so the shapes of the letters can differ slightly depending on who drew them. The aim of this research is to create more precise alphabet fonts of early Brahmi script in Sri Lanka by computer analysis of a large number of inscriptions, without depending too much on human judgement.

II. EVOLUTION OF LETTERS IN SRI LANKA
According to the Lalitavistara, a text of the 3rd century AD, Brahmi was the principal script of its day and was studied by the Bodhisattva Siddhartha Gautama [9]. In the 19th century, the Asokan inscriptions were deciphered by James Prinsep. The Asokan script was found to be the oldest in the region, and it was named Brahmi after the Lalitavistara [9]. Inscriptions of ancient Sri Lanka dating from the 3rd century BC to the middle of the 1st century AD are written in a script similar to the Asokan script of India. Likewise, the script is similar to that on the gateways of the stupas at Bharhut and Sanchi in India. Thus, the scripts of ancient Sri Lanka were influenced by Indian scripts and were slightly developed forms of them. From the 3rd century BC to the present, Sri Lankan scripts have developed and changed as shown in Figure 1. The ancient letters shown there are hand-drawn and therefore depend on human judgement.

The evolution of letters can be divided into five periods: the Early Brahmi period [3rd century BC to early 1st century AD], the Later Brahmi period [later 1st century AD to 3rd century AD], the Transitional Brahmi period [4th to 7th century AD], the Mediaeval Sinhala period [8th to 11th century AD] and the Modern Sinhala period [12th century AD to present]. This paper focuses on Early Brahmi script.

Fig. 1. Evolution of Brahmi script into Sinhala script: (1) Early Brahmi period, (2) Later Brahmi period, (3) Transitional Brahmi period, (4) Mediaeval Sinhala period, (5) Modern Sinhala period.

Early Brahmi script was used between the 3rd century BC and the 1st century AD [1]. Its alphabet had 38 letters, of which the ancient Sri Lankans usually used only 25 [10]. In some inscriptions, different symbols (shapes) were used to represent the same letter. For example, the early Brahmi letter corresponding to the Roman letter "A" is very common in these inscriptions, but its shape varies slightly, as shown in Figure 2. The differences lie in the angularity or curvature of the strokes attached to the left of the vertical stroke [10].

Fig. 2. Examples of Brahmi letters for "A" from almost the same period.

Dr. S. Paranavitana, one of the great archaeologists of Sri Lanka, stated that such slight differences cannot be regarded as stages in the evolution of letters. They may arise from the condition of the surface, the geographical situation of the surface, variations in the training and scripting experience of the inscriber, and the properties of the tools used (composition, sharpness, shape, etc.). It is therefore necessary to look for an ideal shape of each letter that is free from these distortions. The next section describes the methodology for finding an ideal shape for each letter of the early Brahmi alphabet.

III. COMPUTATIONAL METHOD OF PRODUCING BRAHMI ALPHABET FONTS
The source data for this study are previously published grayscale images of early Brahmi script [6]. These images were taken from paper copies of inscriptions called estampages (rubbed copies), which remain the main and most popular method of copying inscriptions in Sri Lanka. The method consists of the following steps: (1) wet the inscription with water and lay a special paper on it; (2) beat the paper until the shapes of the letters are impressed on it; (3) ink the paper by rubbing its surface with an inked cloth, and let the paper dry to some extent; and (4) remove the paper from the stone.

Let us now explain how ideal alphabet fonts are produced, taking as an example the symbol for the letter "A". Fifty sample letters for "A" on estampages from almost the same period were selected, and these sample images were processed by computer in the following stages.

A. Scanning and separation of sample letters
An optical image scanner was used to digitize the sample images. The estampages were scanned at a resolution of 300 pixels/inch and saved in JPG format. Adobe Photoshop CS3 was used for the primary image processing, and each letter was separated as shown in Figure 3. For further processing, each separated image was set to a fixed size of 1 inch x 1 inch, i.e., 300 x 300 pixels at the scanning resolution.

Fig. 3. Separation of sample letters from the scanned image.

The orientation of the letters in an original inscription depends mainly on the position of the stone. We removed the tilt of each image with image editing software so that the baseline of the image became horizontal. The extra space generated by this alignment was filled with black. These processes are illustrated in Figure 4.

Fig. 4. Images (a) and (c) are in their original orientation; (b) and (d) show the images after manual rotation.

C. Majority algorithm for creating the font image
To find the ideal shape of each letter of the early Brahmi alphabet, many sample letters from the same period were examined and the font was produced on the basis of a "majority algorithm", which works as follows: (1) the values of the pixel at the same position (i, j) in all sample images are examined, and (2) the majority value (either 1 or 0) is assigned to pixel (i, j) of the font image. Mathematically, the font image produced by the majority algorithm is expressed by the array data F(i, j) defined by

$$
F(i,j) = \begin{cases} 1, & \dfrac{1}{N}\displaystyle\sum_{k=1}^{N} f_k(i,j) > 0.5 \\[8pt] 0, & \dfrac{1}{N}\displaystyle\sum_{k=1}^{N} f_k(i,j) \le 0.5 \end{cases} \tag{3}
$$

where F(i, j) is the array data of the produced font, f_k(i, j) is the array data of the k-th sample image, (i, j) is the pixel position (0 ≤ i ≤ 299 and 0 ≤ j ≤ 299), and N is the number of samples.
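The majority rule of equation (3) reduces to an integer comparison: (1/N) Σ f_k(i, j) > 0.5 holds exactly when 2 Σ f_k(i, j) > N, so a tie (possible for even N) gives 0. The C sketch below assumes, for illustration only, that each aligned sample is a row-major array of 0/1 bytes with pixel (i, j) at index j * w + i; the function name `majority_font` and this storage layout are hypothetical and are not taken from the authors' program.

```c
#include <assert.h>
#include <stddef.h>

/* Majority algorithm of equation (3), sketched under an assumed layout:
 * each of the n aligned samples is a w x h row-major array of 0/1 bytes,
 * with pixel (i, j) (column i, row j) stored at index j * w + i.
 * font[] receives F(i, j) in the same layout. */
void majority_font(const unsigned char *samples[], size_t n,
                   int w, int h, unsigned char *font)
{
    assert(n > 0 && w > 0 && h > 0);
    for (int j = 0; j < h; ++j) {
        for (int i = 0; i < w; ++i) {
            size_t idx = (size_t)j * (size_t)w + (size_t)i;
            size_t ones = 0;                  /* sum_k f_k(i, j) */
            for (size_t k = 0; k < n; ++k)
                ones += samples[k][idx];
            /* (1/N) * sum > 0.5  <=>  2 * sum > N; a tie yields 0 */
            font[idx] = (unsigned char)(2 * ones > n);
        }
    }
}
```

For the letter "A" in this study, n would be the number of selected samples (10 to 50 in Table I) and w = h = 300. Using integer arithmetic avoids any rounding question at the 0.5 threshold.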

To study the effect of the number of samples, and how noise affects the produced font image, this font-producing process was carried out for different numbers of samples and for two different sets of samples, i.e., with and without removal of major noise. Noise was removed manually with the brush tool of Adobe Photoshop CS3. Note that only isolated noise between letters was removed; noise on the edges of a letter was left in place because it might be part of the letter. Figure 5 shows examples of the two kinds of sample letter images.

B. Conversion of images into array data
Using MATLAB image processing software, each sample image in JPG format was converted into a text file in which black and white pixels were expressed by 0 and 1, respectively. This text file was further processed by the authors' own C program, which produced the following C array data for each sample letter image:

$$
f(i,j) = \begin{cases} 1, & \text{if the pixel at } (i,j) \text{ is white} \\ 0, & \text{otherwise (black)} \end{cases} \tag{1}
$$

Here, (i, j) denotes the position of the pixel located in the i-th column and the j-th row. The letter image data are expressed as a 300 x 300 two-dimensional integer array, i.e., 0 ≤ i ≤ 299 and 0 ≤ j ≤ 299. The resulting array data can be used for further mathematical processing to create ideal fonts. The mathematical processing was coded in the C programming language in a Unix environment.
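Equation (1), together with the center-of-mass centering described in the paragraph that follows, can be sketched in C as below. Everything here is an illustrative assumption rather than a detail given in the paper: the row-major byte layout (pixel (i, j) at index j * w + i), the assumption that the 0/1 text file lists pixels row by row, the choice of column (w - 1) / 2 as the "middle" of the image, and all function names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Equation (1): read a text of '0'/'1' characters (assumed row by row;
 * other characters such as newlines are ignored) into a w x h byte array,
 * pixel (i, j) at index j * w + i. Returns 0 on success, -1 if too short. */
int parse_letter_text(const char *text, unsigned char *img, int w, int h)
{
    size_t need = (size_t)w * (size_t)h, got = 0;
    for (const char *p = text; *p != '\0' && got < need; ++p)
        if (*p == '0' || *p == '1')
            img[got++] = (unsigned char)(*p - '0');
    return got == need ? 0 : -1;
}

/* Center of mass in x: sum_{i,j} i f(i,j) / sum_{i,j} f(i,j), in pixels.
 * Returns -1.0 for an image with no white (letter) pixels. */
double center_of_mass_x(const unsigned char *img, int w, int h)
{
    double sum_if = 0.0, sum_f = 0.0;
    for (int j = 0; j < h; ++j)
        for (int i = 0; i < w; ++i)
            if (img[(size_t)j * (size_t)w + (size_t)i]) {
                sum_if += i;
                sum_f += 1.0;
            }
    return sum_f > 0.0 ? sum_if / sum_f : -1.0;
}

/* Shift the letter horizontally so its center of mass lands on column
 * (w - 1) / 2 (an assumed convention), filling vacated pixels with
 * black (0). Pixels pushed past the image border are dropped. */
void center_horizontally(const unsigned char *src, unsigned char *dst,
                         int w, int h)
{
    assert(src != dst);
    memset(dst, 0, (size_t)w * (size_t)h);
    double cx = center_of_mass_x(src, w, h);
    if (cx < 0.0)
        return;                               /* blank image stays black */
    double d = (w - 1) / 2.0 - cx;
    int shift = (int)(d >= 0.0 ? d + 0.5 : d - 0.5);   /* nearest pixel */
    for (int j = 0; j < h; ++j)
        for (int i = 0; i < w; ++i) {
            int t = i + shift;
            if (t >= 0 && t < w)
                dst[(size_t)j * (size_t)w + (size_t)t] =
                    src[(size_t)j * (size_t)w + (size_t)i];
        }
}
```

With w = h = 300 this matches the 300 x 300 font space of the paper. Only the horizontal position is adjusted, since the letter height already fills the image area vertically.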

The next step is to find the center of mass of the letter, which is needed to fix the horizontal position of the image in a font space of 300 x 300 pixels. The x-coordinate of the center of mass was calculated using the following formula:

$$
X_{\text{center of mass}} = \frac{\sum_{i,j} i\, f(i,j)}{\sum_{i,j} f(i,j)} \tag{2}
$$

Here, X_center_of_mass is expressed in pixels, and the summation is taken over all pixels of the image, i.e., i = 0 to 299 and j = 0 to 299. The image was then shifted horizontally so that its center of mass coincided with the middle of the image area. Note that the vertical position was fixed automatically, since the height of the letter fits the image area. The empty pixels generated by this shift were filled with black, i.e., the corresponding array elements were set to zero.

Fig. 5. Images (a), (b) and (c) are in their original state; images (d), (e) and (f) are after removal of major isolated noise.

Table I: Examples of font images produced by the "majority algorithm". Font type A is produced from samples with noise and Font type B from samples without noise. N is the number of samples. The matching between the two types of fonts is given as a percentage.

N     Font type A     Font type B     Matching
10    [font image]    [font image]    87.1%
20    [font image]    [font image]    88.2%
30    [font image]    [font image]    90.1%
40    [font image]    [font image]    91.3%
50    [font image]    [font image]    90.9%

Using the two kinds of sample images, two types of font images were produced for different numbers of samples; the results are shown in Table I. The table shows that about 30 samples are needed to obtain a stable font shape, even when the noise has been removed. Comparing the matching percentages between the fonts produced from samples with and without noise, we conclude that the effect of noise removal is not significant when the number of samples is sufficient.

IV. APPLICATION OF PRODUCED FONTS
Let us now examine whether the produced fonts can be used to establish a method for the automatic identification of Brahmi letters by computer. This is the first step toward a computer system for the automatic reading of ancient inscriptions in Sri Lanka. The examination was carried out as follows: (1) an inscription containing 13 letters, found in the Nācciyārmalai area of Trincomalee district, Eastern Province [5] (see Figure 6), was taken as the sample data; (2) the letters were separated, and each letter was expressed as text data of 0s and 1s representing 300 x 300 pixels; (3) the text data for all letters were combined (see Figure 7) into one large text file representing the full image of the inscription, hereafter called the "formatted inscription image"; (4) from this text data, C array data G(i, j), where 0 ≤ i ≤ 3899 and 0 ≤ j ≤ 299, were produced in the same way as described in Section III-B; and (5) the correlation function in the x-direction (horizontal direction) between the font data F(i, j) and the sample data G(i, j) was calculated. Here, the correlation function P(x) is defined as follows:

$$
P(x) = \frac{\sum_{i,j} G(i + x, j)\, F(i,j)}{\sum_{i,j} F(i,j)} \times 100 \tag{4}
$$

The summation is taken over 0 ≤ i ≤ 299 and 0 ≤ j ≤ 299. When the font window overlaps exactly one letter of the formatted inscription image, the correlation function gives the matching percentage between the font and that letter pattern. In the actual examination, two kinds of font data (Font type A and Font type B in the bottom row of Table I, i.e., N = 50) and two kinds of sample inscription (formatted inscription images (a) and (b) in Figure 7) were compared. The results are shown in Figures 8 and 9. The numbers of effective pixels of the reference font image, Σ F(i, j), are 18,120 for Font type A (with noise) and 16,323 for Font type B (without noise).
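Steps (3) to (5) can be sketched in C as follows. The sketch assumes 0/1 byte arrays in row-major order (pixel (i, j) at index j * width + i) and places the font window at horizontal offset x, multiplying F(i, j) by G(i + x, j), so that x runs from 0 to gw - fw. The function names `concat_letters`, `correlation_at` and `correlation_curve` are hypothetical, not the authors' code. With 13 letters of 300 x 300 pixels, gw = 3900 and x runs from 0 to 3600, the horizontal range plotted in Figures 8 and 9.

```c
#include <assert.h>
#include <stddef.h>

/* Step (3): place n letter images (each lw x h, row-major, pixel (i, j)
 * at j * lw + i) side by side into G, which holds (n * lw) x h bytes. */
void concat_letters(const unsigned char *letters[], int n, int lw, int h,
                    unsigned char *G)
{
    size_t gw = (size_t)n * (size_t)lw;
    for (int k = 0; k < n; ++k)
        for (int j = 0; j < h; ++j)
            for (int i = 0; i < lw; ++i)
                G[(size_t)j * gw + (size_t)k * (size_t)lw + (size_t)i] =
                    letters[k][(size_t)j * (size_t)lw + (size_t)i];
}

/* Step (5): P(x) = 100 * sum G(i + x, j) F(i, j) / sum F(i, j), summed
 * over the fw x h font window. Returns -1.0 if the window does not fit
 * inside G at offset x or the font has no letter pixels. */
double correlation_at(const unsigned char *G, int gw,
                      const unsigned char *F, int fw, int h, int x)
{
    if (x < 0 || x > gw - fw)
        return -1.0;
    long overlap = 0, font_pixels = 0;
    for (int j = 0; j < h; ++j)
        for (int i = 0; i < fw; ++i) {
            int f = F[(size_t)j * (size_t)fw + (size_t)i];
            font_pixels += f;
            overlap += f * G[(size_t)j * (size_t)gw + (size_t)(i + x)];
        }
    return font_pixels > 0 ? 100.0 * (double)overlap / (double)font_pixels
                           : -1.0;
}

/* Evaluate P(x) for every offset x = 0 .. gw - fw; p needs gw - fw + 1
 * entries (3601 for the 13-letter inscription). */
void correlation_curve(const unsigned char *G, int gw,
                       const unsigned char *F, int fw, int h, double *p)
{
    assert(gw >= fw);
    for (int x = 0; x <= gw - fw; ++x)
        p[x] = correlation_at(G, gw, F, fw, h, x);
}
```

The denominator is the number of effective font pixels (18,120 or 16,323 for the two fonts in this study), so P(x) is the percentage of font pixels that coincide with white pixels of the inscription at offset x.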

Fig. 6. Original inscription found in the Nācciyārmalai area, Trincomalee district, Eastern Province, Sri Lanka.


Fig. 7. Two kinds of formatted inscription images: (a) with noise, (b) without major isolated noise. The x-axis shows pixel positions along the images.

[Plot: correlation function P(x), 0 to 70, versus horizontal position x in pixels, 0 to 3600; curves for Font type A and Font type B]

Fig. 8. Correlation function curves calculated for formatted inscription image (a) (with noise) in Fig. 7, using the two types of reference font image.

[Plot: correlation function P(x), 0 to 60, versus horizontal position x in pixels, 0 to 3600; curves for Font type A and Font type B]

Fig. 9. Correlation function curves calculated for formatted inscription image (b) (without noise) in Fig. 7, using the two types of reference font image.

The best match between the reference font image and a letter pattern in the formatted inscription image occurs at the position of the maximum peak of the P(x) curve. Figures 8 and 9 show that all four curves take their maximum in the range 2400 ≤ x ≤ 2699, which corresponds to the region of the correct letter. In this sense, the presence of noise in the sample and font images is not a serious problem.

As expected, the matching is sharpest for the noiseless font with the noiseless sample. The matching percentage is higher in Figure 8 than in Figure 9 because the noise in the sample also contributes to the correlation. It can be concluded that the correlation function method successfully finds the correct letter matching the font.
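Locating the maximum peak of P(x) and relating it to a letter position can be sketched as follows. The names `argmax_curve` and `letter_cell` are hypothetical, and mapping a peak to a cell by integer division is a simplification assumed here: with 300-pixel letter cells, any x in 2400 ≤ x ≤ 2699 falls in cell 8 (counting from 0), i.e., the ninth of the 13 letters.

```c
#include <assert.h>

/* Position x of the maximum of a sampled curve p[0 .. n-1]; the first
 * occurrence wins on ties. */
int argmax_curve(const double *p, int n)
{
    assert(n > 0);
    int best = 0;
    for (int x = 1; x < n; ++x)
        if (p[x] > p[best])
            best = x;
    return best;
}

/* Hypothetical helper: 0-based index of the lw-pixel-wide letter cell
 * that a non-negative horizontal position x falls in. */
int letter_cell(int x, int lw)
{
    assert(x >= 0 && lw > 0);
    return x / lw;
}
```

Applied to a full P(x) curve of 3601 values, `letter_cell(argmax_curve(p, 3601), 300)` would give the index of the letter that best matches the font under these assumptions.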

V. CONCLUSIONS
This study concerns the shapes of ancient letters in Sri Lanka. The shape of a letter changes over time; moreover, even within the same period, slightly different shapes have been found for a single letter. In this study, many sample letters from the same period were examined to find an ideal shape for each letter of the early Brahmi alphabet, and fonts of early Brahmi script were produced on the basis of a "majority algorithm". At least 30 samples of the same letter were needed to obtain a stable font shape. To compare the produced font image with an inscription image, a correlation function method was proposed. The correlation function method was found to identify the correct letter matching the font image successfully, and noise in the font and sample images is not a serious problem. This information is valuable for the authors' next work: producing a digital repository of the Brahmi alphabet with a function for the automatic reading of ancient inscriptions by computer. The design of the digital repository will be reported elsewhere.

BIOGRAPHIES

Dammi Bandara: She received a Bachelor of Science degree in Archaeology and a Master's degree in Archaeology from the University of Kelaniya, Sri Lanka. She works as a Research Assistant in the Department of Archaeology, Colombo, Sri Lanka, and is currently pursuing her PhD at Ibaraki University, Japan. Her research concerns ancient inscriptions and computational approaches in archaeology.

Nalin Warnajith: He received a Bachelor of Science degree and a Postgraduate Diploma in Information Technology from the University of Kelaniya, Sri Lanka. He worked as a Systems Analyst at the University of Kelaniya, Sri Lanka, and is currently pursuing his PhD at Ibaraki University, Japan. His research areas are e-learning and the development of computerized systems for multimedia learning.

Professor Satoru OZAWA was born in Tokyo in 1948. Prof. OZAWA completed his undergraduate study in Applied Physics at the Tokyo University of Education (1972). He received his MSc in Physics (1974) and his Doctor of Science degree (1977) from the same institution. In 1979, Dr. OZAWA began working as an assistant professor at Ibaraki University, Japan, where he has been a professor of Synergetics since 1992. In his long career, Prof. OZAWA has served as Director of the Information Processing Center of Ibaraki University (1997-2009) and worked as a JICA Expert (1998, 1999, 2000-1, 2002, 2003). He was also a Visiting Professor at the University of Leeds (1993) and an Invited Professor at the University of Heidelberg (1993-4).

ACKNOWLEDGMENT
This work was partly supported by the Department of Archaeology, Colombo, Sri Lanka. The first and second authors contributed equally to this work.

REFERENCES
[1] M. Dias, Lakdiwa Sellipiwalin heliwana Sinhala Bhashawe Prathyartha namayange vikashanaya, p. 1, Department of Archaeology, Colombo, Sri Lanka, 1996.
[2] Rev. K. Amarawansha, Lakdiwa Sellipi, p. 10, M. D. Gunasena, 1969.
[3] A. S. Hettiarachchi, Investigation of 2nd, 3rd and 4th century Inscriptions, Inscriptions, p. 78, Department of Archaeology, Colombo, Sri Lanka, 1990.
[4] A. S. Hettiarachchi, Investigation of 2nd, 3rd and 4th century Inscriptions, Inscriptions, pp. 55, 105, Department of Archaeology, Colombo, Sri Lanka, 1990.
[5] L. S. Perera, Lanka Ithihasaye Muulashra, Journal of Sri Lankan History, vol. 1, p. 62, Vidyalankara University, Sri Lanka.
[6] L. S. Perera, Lanka Ithihasaye Muulashra, Journal of Sri Lankan History, vol. 1, p. 47, Vidyalankara University, Sri Lanka, 1964.
[7] L. S. Perera, Lanka Ithihasaye Muulashra, Journal of Sri Lankan History, vol. 1, pp. 40, 66, Vidyalankara University, Sri Lanka.
[8] S. Paranavitana, Inscriptions of Ceylon, Department of Archaeology, Colombo, Sri Lanka, 1970.
[9] M. H. Sirisoma, Brahmi inscriptions of Sri Lanka from 3rd century B.C. to 65 A.D., Inscriptions, p. 6, Department of Archaeology, Colombo, Sri Lanka, 1990.
[10] B. Gunasekara, "Akuru Upatha" (in Sinhala), Godage, Colombo, Sri Lanka, 1996.
