Preprocessing and recognition of characters in container codes

Ismael Salvador Igual, Gabriela Andreu García, Alberto Pérez Jiménez
Departamento de Informática de Sistemas y Computadores
Universidad Politécnica de Valencia
Camino de Vera s/n, 46071, Valencia, Spain
{issalig,gandreu,aperez}@disca.upv.es

Abstract

This paper describes the recognition of container code characters. The system has to deal with outdoor images, which usually contain damaged characters, and must obtain an answer in real time. Top-hat transformation, equalization and adaptive cropping have been applied to reduce the noise and correct the inclusion box. The training corpus is grown artificially by applying controlled deformations to the samples, which increases the recognition rate. Approximate nearest neighbours are used to perform a fast search over the large corpus.

1. Introduction

Currently, in the majority of trading ports, the gates are controlled by human inspection and manual registration. Using computer vision and pattern recognition techniques, it is possible to build a system located at the gates of the port that records the entry or exit of each container. This type of system allows automatic container identification. The general goal is to design an automatic real-time system to control the gates of trading ports. To achieve this goal, it is necessary to develop a wide range of techniques: image preprocessing, image segmentation, feature extraction and pattern classification. The process can be quite complex: the system has to deal with outdoor scenes, days with different weather (sunny, cloudy), changes in lighting conditions (day, night) and dirty and damaged containers (see Figure 1). It is also necessary to consider that the truck is not stopped when the image is acquired, so a progressive scan camera was used. The acquisition and image segmentation processes were presented in a previous work [6]; in this paper we present the feature extraction, the training process and the results obtained with the classifiers.

Usually, the design of a pattern recognition system involves the following activities: data collection, feature choice, model choice, training and evaluation. The next sections describe the techniques used in each of these stages and present experimental results.

2. Data Collection

The segmentation process was carried out by applying the following steps: top-hat morphological operator, thresholding, connected component labeling and size filtering. A previous work [6] describes this technique in more detail. A total of 627 real images were used to obtain 10550 sub-images of code characters. The image set was acquired over several days under different lighting conditions. Moreover, digits and letters can be light or dark, and they appear on both plain and non-plain surfaces. Non-plain surfaces can produce deformations and shading in the code characters (see Figure 1). These conditions make the recognition task difficult.
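The segmentation chain above (threshold, connected component labeling, size filter) can be sketched in pure Python. This is an illustrative toy, not the authors' implementation; the function names, connectivity choice and threshold values are assumptions.

```python
# Sketch of the segmentation chain: binarize, label 4-connected
# components with a flood fill, then discard too-small components.

def threshold(img, t):
    """Binarize a greyscale image (list of lists) at threshold t."""
    return [[1 if v > t else 0 for v in row] for row in img]

def label_components(binary):
    """4-connected component labeling via stack-based flood fill."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                current += 1
                stack = [(y, x)]
                labels[y][x] = current
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = current
                            stack.append((ny, nx))
    return labels, current

def size_filter(labels, n, min_size):
    """Keep only the labels of components with at least min_size pixels."""
    sizes = [0] * (n + 1)
    for row in labels:
        for v in row:
            sizes[v] += 1
    return [lab for lab in range(1, n + 1) if sizes[lab] >= min_size]

# Toy image: one character-sized blob and a single-pixel speck.
img = [[0, 200, 200, 0, 0, 0],
       [0, 200, 200, 0, 0, 0],
       [0, 200, 200, 0, 0, 90],
       [0, 0, 0, 0, 0, 0]]
labels, n = label_components(threshold(img, 50))
kept = size_filter(labels, n, min_size=3)  # the speck is filtered out
```

The size filter plays the role described above: it discards segmented blobs too small to be code characters.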

3. Preprocessing

The goal of feature extraction is to characterize an object to be recognized by measurements whose values are very similar for objects in the same category and very different for objects in different categories. Feature extraction is applied to the sub-images obtained in the segmentation step. In our case, two pre-processing techniques have been studied to represent the object: the Sobel filter and the top-hat operator [5]. The Sobel filter did not obtain good results: applied to the object area it introduced too much noise, and it only contributed edge information, while the edges are usually badly damaged. The other pre-processing technique studied was the top-hat morphological operator.
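The weakness noted above can be seen in a minimal Sobel sketch: the filter responds only at intensity transitions, so a flat character stroke contributes nothing away from its (often damaged) borders. The kernels are the standard Sobel masks; the toy image is an assumption.

```python
# Minimal Sobel gradient-magnitude sketch (|gx| + |gy|), illustrating
# that the response is concentrated on edges only.

SX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
SY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel

def sobel_magnitude(img):
    """|gx| + |gy| on interior pixels of a greyscale list-of-lists image."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out

# A uniform vertical stripe: the response appears only at its borders,
# while the stripe interior gives zero.
img = [[0, 0, 255, 255, 255, 0, 0]] * 5
mag = sobel_magnitude(img)
```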

1051-4651/02 $17.00 (c) 2002 IEEE

Figure 1. Examples of container codes

3.1. Top-hat operator

Morphological operations work with two images: the original image and a structuring element (SE). The primary binary morphological operations, dilation and erosion, are naturally extended to greyscale by the use of min and max operations [7]. In the present work the input data are greyscale images f(x, y), which can be thought of as sets of points p = (x, y, f(x, y)) in three-dimensional Euclidean space, so greyscale morphological operations may be regarded as three-dimensional binary morphology. Let f be the original image and b the SE. The greyscale opening γ and closing φ operators are defined in terms of the erosion ε_b and the dilation δ_b as:

γ(f) = δ_b(ε_b(f))   (1)

φ(f) = ε_b(δ_b(f))   (2)

where the erosion and dilation are defined as follows:

[ε_b(f)](x, y) = min_{(i,j)∈b} [f(x + i, y + j) − b(i, j)]   (3)

[δ_b(f)](x, y) = max_{(i,j)∈b} [f(x + i, y + j) + b(i, j)]   (4)

An important idea underlying top-hats is the use of knowledge about shape characteristics that are not shared by the relevant image structures [7]. Relevant structures can be removed by performing an opening or a closing with a SE that does not fit them; an arithmetic difference then recovers these structures. This arithmetic difference is the basis of the definition of the morphological white (WTH) and black (BTH) top-hats:

WTH(f) = f − γ(f)   (5)

BTH(f) = φ(f) − f   (6)

Sometimes it is easier to remove the relevant objects than to find the irrelevant ones. An important parameter in this transformation is the shape and size of the SE [4], which depends on the geometry of the structures to be found. If we want to detect dark valleys (black characters) of width smaller than w, a black top-hat with a SE slightly larger than w should be used, so that the closing step removes them (for bright peaks, the opening used by the WTH plays the same role). These transformations produce a greyscale image.

3.2. Feature extraction

As described in the previous section, for each sub-image it is necessary to apply the suitable top-hat (WTH or BTH). A parameter defined in the segmentation process evaluates whether the segmented image contains objects brighter than the background (white characters) or vice versa, so once the image is segmented we know the appropriate operator to use in the feature extraction pre-processing. The feature extraction process is carried out on each of the sub-images obtained by the segmentation process. These sub-images can contain a container code character or other objects that we consider noise. In order to obtain the characteristics of these objects, the following steps are carried out: top-hat (WTH or BTH), equalization, cropping and scaling. Figure 2 illustrates the process.

Original → Top-hat → Equalization and cropping → Scaling → 0,3,2,0,197,...,250,3,1

Figure 2. Steps carried out to extract features for each object
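The greyscale morphology of Section 3.1 can be sketched as local min/max operations with a flat square SE. This is a hedged illustration with toy values, not the authors' code; a flat SE makes the −b(i, j) and +b(i, j) terms of (3)-(4) vanish.

```python
# Greyscale erosion/dilation as local min/max (equations (3)-(4) with
# a flat (2r+1)x(2r+1) SE) and the white top-hat WTH(f) = f - opening(f).

def erode(img, r):
    """Greyscale erosion: local minimum over the SE window."""
    h, w = len(img), len(img[0])
    return [[min(img[j][i]
                 for j in range(max(0, y - r), min(h, y + r + 1))
                 for i in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]

def dilate(img, r):
    """Greyscale dilation: local maximum over the same window."""
    h, w = len(img), len(img[0])
    return [[max(img[j][i]
                 for j in range(max(0, y - r), min(h, y + r + 1))
                 for i in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]

def white_top_hat(img, r):
    """WTH(f) = f - opening(f), equation (5): keeps bright structures
    narrower than the SE."""
    opened = dilate(erode(img, r), r)
    return [[f - o for f, o in zip(frow, orow)]
            for frow, orow in zip(img, opened)]

# A bright peak narrower than the 3x3 SE: the opening removes it,
# so the white top-hat recovers exactly that peak.
img = [[10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10],
       [10, 10, 200, 10, 10],
       [10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10]]
wth = white_top_hat(img, r=1)
```

The black top-hat (6) is the dual: `erode(dilate(img, r), r)` minus the image, which would recover dark characters on a bright background.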

After applying the top-hat to a sub-image, a greylevel image is obtained; it usually has a very poor dynamic range, and the background is always darker than the object. Next, an equalization spreads out the grey levels of this image until they reach white. This technique increases the dynamic range and consequently improves the image contrast (see Figure 2); it also increases the distance between the grey values of the object pixels and those of the background pixels. The inclusion box obtained in the segmentation step may contain the character together with some spots. Thus, if we have two characters of the same class, one with a little noise and the other without it, their feature vectors can be quite different after rescaling (see Figure 3). Therefore, the cropping process has been applied in order to ensure that, in the considered sub-image, the character touches its edges.

Figure 3. Cropping in a noisy character

Given the equalized image E(p, q) with pixels e(x, y), for x = 0...p − 1, y = 0...q − 1, the minimum g1 and maximum g2 grey values of E(p, q) are computed as follows:

g1 = min {e(x, y) : e(x, y) ∈ E(p, q)}   (7)

g2 = max {e(x, y) : e(x, y) ∈ E(p, q)}   (8)
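The adaptive cropping built on g1, g2 and the percentage parameter m of the cropping condition can be sketched as follows. The toy image and the m value are assumptions for the example; border rows and columns whose pixels all fall within m% of the grey range above g1 are treated as background and stripped.

```python
# Sketch of the adaptive cropping: compute the grey range (g1, g2)
# of the sub-image, then strip boundary rows/columns whose pixels all
# lie in [g1, g1 + (m/100)(g2 - g1)].

def grey_range(img):
    flat = [v for row in img for v in row]
    return min(flat), max(flat)          # g1, g2

def crop(img, m):
    """Remove near-background border rows, then columns (m in [0, 100])."""
    g1, g2 = grey_range(img)
    limit = g1 + m / 100.0 * (g2 - g1)
    rows = list(range(len(img)))
    cols = list(range(len(img[0])))

    def bg_row(y):
        return all(img[y][x] <= limit for x in cols)

    def bg_col(x):
        return all(img[y][x] <= limit for y in rows)

    while len(rows) > 1 and bg_row(rows[0]):
        rows.pop(0)
    while len(rows) > 1 and bg_row(rows[-1]):
        rows.pop()
    while len(cols) > 1 and bg_col(cols[0]):
        cols.pop(0)
    while len(cols) > 1 and bg_col(cols[-1]):
        cols.pop()
    return [[img[y][x] for x in cols] for y in rows]

# A bright character block with a dark, slightly noisy border: the
# cropped result is the tight box around the character.
img = [[0,  0,   0,   0],
       [0, 30, 200, 200],
       [0, 30, 200, 200],
       [0,  0,   0,   0]]
out = crop(img, m=45)
```

After this step the cropped sub-image would be scaled to the fixed grid, so that the character touches the edges of its box regardless of surrounding noise.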

The row i is cropped if all of its pixels satisfy the following condition:

∀y: e(i, y) ∈ [g1, (m/100)(g2 − g1) + g1]   (9)

where m ∈ [0, 100]. The cropping process begins at the first row and stops when it finds a row that does not fulfil the above condition. The same actions are carried out starting from the last row (p − 1), continuing with row p − 2, and so on. Next, the columns are processed in the same way. Finally, the sub-image is scaled to a 12x24 grid, obtaining 288 grey values. These 288 features were used to represent the objects.

4. Experiments

In this task 47 classes were considered: 26 for letters, 10 for digits, 10 for digits inside a square (see the squared 8 in the last code of Figure 1) and 1 for a noise class that contains the most frequent errors, such as labels, mirrors, tyres and some background objects. A total of 7638 samples were used for the training corpus; the test set has 2912 samples. In order to perform the search we have used approximate k-nearest neighbours (ANN) [1], which obtains good results in a very short time. ANN combines the simplicity and performance of classical k-nearest neighbours [3] with the kd-tree structure [2]. A kd-tree is a binary tree where each node represents a region of a k-dimensional space. Each internal node also contains a hyperplane (a linear subspace of dimension k − 1) dividing the region into two disjoint sub-regions, each inherited by one of its sons.

Preliminary experiments revealed that making no distinction between plain digits and digits inside a square gave better results (see Table 1), so we finally used 37 classes. To check whether the segmentation step obtains an adequate area for the objects, we increased and decreased the height and width of these areas. A border of 2 means adding 2 extra pixels at the bottom, top, left and right sides, and border 0 means the original area. Table 1 shows that the best result is obtained with the original area.

Border   square (%)   no square (%)
0        90.1         92.3
1        88.7         90.8
2        88.2         90.6
3        86.3         88.9
4        82.8         85.3

Table 1. Results of considering square digits as a different class versus making no distinction from non-squared ones

Although the original area works well, we have also applied the adaptive cropping explained in the previous section. Figure 4 shows the results with different croppings and borders. The best recognition rate is achieved with border 1 and a cropping percentage of 45%. Higher values of m decrease the recognition rate; this means that too many rows and columns are cropped and the character loses its identity.

Border     Recognition (%)
0          92.37
0+1        92.80
0+1+2      92.78
0+1+2+3    92.79

Table 2. Combination of several borders
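The kd-tree search described above can be illustrated with a tiny pure-Python sketch. This is an exact 1-nearest-neighbour query for clarity (the ANN library adds an approximation factor to the pruning test); the toy 2-d points stand in for the 288-dimensional feature vectors.

```python
# Minimal kd-tree: build by splitting on alternating coordinates,
# then search with branch-and-bound pruning.

def build(points, depth=0):
    """Recursively split points on the median of the current axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, best=None):
    """Visit the near side first; explore the far side only when the
    splitting plane is closer than the current best distance."""
    if node is None:
        return best
    if best is None or dist2(node["point"], query) < dist2(best, query):
        best = node["point"]
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    if diff ** 2 < dist2(best, query):   # plane may hide a closer point
        best = nearest(far, query, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
nn = nearest(tree, (9, 2))
```

In the approximate variant, the pruning test is relaxed by a factor (1 + ε), which is what keeps the search fast even with the large training corpus used here.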

Figure 4. Results for different borders and cropping values for the original corpus (recognition rate, from 0.80 to 0.94, against the cropping percentage m, from 0 to 80, for borders −1 to 4)

Getting samples for a corpus is always a laborious and expensive task. Thus, a simple way of adding variability to our corpus consists in growing it artificially by adding transformations of the original objects (see Figure 5). Translations in the 8 directions, rotations and several borders have been considered.

m (%)   Recognition (%)
0       93.92
25      94.33
30      94.54
35      94.78
40      94.81
45      94.26
50      93.92
55      93.54
60      93.34

Table 3. Recognition rate for the mixed corpus
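The corpus-growing transformations (one-pixel translations in the 8 directions and small rotations) can be sketched as below. The nearest-neighbour rotation and the particular set of angles applied to one sample are illustrative assumptions, not necessarily the authors' exact interpolation or expansion scheme.

```python
import math

def translate(img, dy, dx, fill=0):
    """Shift a list-of-lists image by (dy, dx), padding with fill."""
    h, w = len(img), len(img[0])
    return [[img[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w
             else fill for x in range(w)] for y in range(h)]

def rotate(img, degrees, fill=0):
    """Rotate about the centre with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    a = math.radians(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # inverse-map the output pixel into the source image
            sy = cy + (y - cy) * math.cos(a) - (x - cx) * math.sin(a)
            sx = cx + (y - cy) * math.sin(a) + (x - cx) * math.cos(a)
            ry, rx = round(sy), round(sx)
            if 0 <= ry < h and 0 <= rx < w:
                out[y][x] = img[ry][rx]
    return out

def grow(img):
    """One sample becomes 15: the original, 8 one-pixel translations
    and rotations of ±2, ±4, ±6 degrees (an illustrative subset of
    the paper's 32-fold expansion)."""
    shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dy, dx) != (0, 0)]
    out = [img] + [translate(img, dy, dx) for dy, dx in shifts]
    out += [rotate(img, d) for d in (-6, -4, -2, 2, 4, 6)]
    return out

sample = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
variants = grow(sample)
```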

Figure 5. Original, NW translation, 6 degrees rotation

Individual transformations by themselves (see Table 2 and Figure 6) do not increase the performance very much; however, if we combine the best transformations for each corpus a higher recognition rate is achieved.

Figure 6. Results for different fixed borders and rotations (recognition rate, from 0.90 to 0.93, against rotation angle, from 0 to 6 degrees, for borders 0, 1 and 2)

The new corpus is composed of the following sets: the original training corpus (border 0), its translations of 1 and 2 pixels, and its rotations of 2, 4 and 6 degrees; plus border 1, its translations of 1 pixel, and its rotations of 2, 4 and 6 degrees. It is important to note that the corpus size has been multiplied by 32, giving 244416 samples. However, the fast search provided by the kd-tree permits real-time classification. This new corpus obtains a 93.92% recognition rate. The next step is to explore which cropping value gives the best result. Table 3 shows that the best rate is 94.81%, obtained for m = 40%. This result is approximately 2.5% better than the best one achieved with the original corpus. These results show that, by adding variability through simple transformations, the recognition rate can be improved at low cost.

5. Conclusions

Container code characters are usually damaged and noisy, which affects the inclusion box obtained in the segmentation step. In order to check the inclusion box, experiments with fixed borders and with an adaptive cropping have been carried out; the latter improves the results on the original corpus. Moreover, controlled distortions of the characters have been added to the training corpus. Combining this artificially grown training corpus with the adaptive cropping, a good improvement of the recognition rate is achieved. Although the corpus is quite large, ANN performs the search within the time constraints of the system. In the immediate future we plan to apply principal component analysis (PCA) to reduce the dimensionality of the data and consequently the search time; moreover, after applying PCA the results are expected to improve.

References

[1] S. Arya, D. Mount, N. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the ACM, 1998.
[2] J. L. Bentley, B. W. Weide, and A. C. Yao. Optimal expected-time algorithms for closest point problems. ACM Transactions on Mathematical Software, 1980.
[3] R. Duda, P. Hart, and D. Stork. Pattern Classification. John Wiley & Sons, 2nd edition, 2001.
[4] I. Giakoumis and I. Pitas. Digital restoration of painting cracks. In IEEE Int. Symposium on Circuits and Systems (ISCAS'98), June 1998.
[5] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Addison-Wesley, 1993.
[6] I. Salvador, G. Andreu, and A. Pérez. Detection of identifier code in containers. In IX Spanish Symposium on Pattern Recognition and Image Analysis, volume 1, May 2001.
[7] P. Soille. Morphological Image Analysis: Principles and Applications. Springer-Verlag, 1999.
