2010 International Conference on Pattern Recognition

CUDA implementation of deformable pattern recognition and its application to MNIST handwritten digit database

Yoshiki Mizukami∗, Katsumi Tadamura∗, Jonathan Warrell†, Peng Li‡ and Simon Prince‡
∗ Graduate School of Science and Engineering, Yamaguchi University, Ube, Japan; Email: {mizu,tadamura}@yamaguchi-u.ac.jp
† Department of Computing, Oxford Brookes University, Oxford, UK
‡ Department of Computer Science, University College London, London, UK
1051-4651/10 $26.00 © 2010 IEEE, DOI 10.1109/ICPR.2010.493

Abstract: In this study, we propose a deformable pattern recognition method with a CUDA implementation. To obtain a proper correspondence between the foreground pixels of the input and prototype images, a pair of distance maps is generated from them, in which each pixel value is based on the distance to the nearest foreground pixel. A regularization technique then computes the horizontal and vertical displacements from these distance maps. The dissimilarity is measured using the eight-directional derivatives of the input and prototype images, in order to exploit characteristic information on the curvature of line segments that might be lost after deformation. Prototype-parallel displacement computation on CUDA and gradual prototype elimination are employed to reduce the computation time without sacrificing accuracy. A simulation shows that the proposed method with the k-nearest neighbor classifier achieves an error rate of 0.57% on the MNIST handwritten digit database.

Keywords: handwritten character recognition; displacement computation; graphics processing unit; compute unified device architecture

I. INTRODUCTION

Deformable approaches are a challenging topic in computer vision and pattern recognition. One of the earliest studies is the rubber mask technique proposed by Widrow [1]. Deformable approaches have since been applied to various problems such as face, object, and character recognition.

Many researchers have studied character recognition using the Modified NIST (MNIST) handwritten digit database [2]. Their methods fall mainly into three categories: statistical, multilayer neural network, and deformable approaches. Support vector machines (SVMs) are very promising among statistical approaches, and DeCoste and Schölkopf proposed an SVM-based method that uses artificially generated prototypes [3]. Ranzato et al. proposed a convolutional network with an unsupervised method for learning sparse, overcomplete features [4]. As a deformable approach for establishing sub-pixel correspondence between input and prototype images, Belongie et al. proposed a three-step displacement computation [5]: first, many reference points on the contour are characterized by their shape contexts, which are obtained from their geometric relationship with the remaining points; second, an optimization problem is solved to pair the reference points on one image with points on the other image; and finally, regularized thin-plate splines provide a sub-pixel correspondence. Several methods that give a pixel-wise correspondence have also been proposed [6], [7]. Keysers et al. studied a non-linear deformation model with an 18-dimensional local context vector based on vertical and horizontal Sobel filters [6].

A regularization-based deformable recognition approach for computing sub-pixel correspondence between input and prototype images has been studied since 1994 [8], [9]. Its iterative equations, derived from the calculus of variations, are very simple, but the total computational cost is proportional to the image size. This problem becomes especially serious for characters with complicated shapes. One solution is to reduce the dimensionality efficiently by extracting features from the image. In 1998, this deformable approach was applied to Chinese characters with complicated shapes [10], where the dimensionality of the image was reduced by employing a directional feature. It was also shown that combining a statistical classifier with the deformable approach could improve performance. Nevertheless, reducing the computational cost of this deformable approach remains highly desirable.

Recently, graphics processing units (GPUs) have attracted attention in research and development because of their fast parallel computing performance [11]. Formerly, it was not easy for ordinary programmers to implement their algorithms on GPUs, since doing so required knowledge of graphics programming and its languages. The compute unified device architecture (CUDA), however, has succeeded in providing a simple and powerful platform [12].

In this study, we propose a novel deformable pattern recognition method based on the regularization framework. Distance maps are generated from the input and prototype images, and the correspondence between them is then computed iteratively. The computation time is drastically reduced by a CUDA implementation with prototype-parallel displacement computation (PPDC) and gradual prototype elimination (GPE). The dissimilarity between the input and prototype images is calculated from eight-directional derivatives and the computed displacement. The performance is evaluated on the MNIST database.

[Figure 1: The proposed method. Panels: (a) input, (b) prototypes (before deformation), (c) prototypes (after deformation).]
[Figure 2: Displacement function (u, v). Labels: input f, prototype g, point (x, y), displacements u(x, y) and v(x, y).]
II. PROPOSED METHOD

As illustrated in Fig. 1, the proposed method consists of three main procedures: displacement computation, dissimilarity measurement, and classification. This section describes them in detail.

A. Displacement computation

The displacement computation in the proposed method [9] is based on a regularization method originally developed for the stereo matching problem [13]. As shown in Fig. 2, the input and prototype images are denoted by $f(x, y)$ and $g(x, y)$, respectively. The horizontal and vertical displacements at coordinate $(x, y)$ on $g$ are represented by a pair of displacement functions, $(u(x, y), v(x, y))$. The optimum displacement between the two images is obtained by minimizing the functional $E$:

$$E(u, v) = P(u, v) + \lambda S(u, v), \tag{1}$$

$$P(u, v) = \iint \big(f(x+u, y+v) - g(x, y)\big)^2 \, dx \, dy, \tag{2}$$

$$S(u, v) = \iint \big(u_x^2 + u_y^2 + v_x^2 + v_y^2\big) \, dx \, dy, \tag{3}$$

where $P$ is the squared Euclidean distance between the two images after applying the displacement, $S$ is a stabilizing functional that imposes a smoothness constraint on $(u, v)$, and $\lambda$ is the so-called regularization parameter controlling the effect of $S$. The Euler–Lagrange equations of Eq. (1) are $\lambda \nabla^2 u = f_x (f - g)$ and $\lambda \nabla^2 v = f_y (f - g)$, with $f$, $f_x$, and $f_y$ evaluated at $(x+u, y+v)$; approximating the Laplacian on the pixel grid as $\nabla^2 u \approx 4(\bar{u} - u)$ and solving for $u$ (and likewise for $v$) gives the following iterative equations:

$$u^{[t+1]}(x, y) = \bar{u}^{[t]} - \frac{1}{4\lambda} f_x\big(x+\bar{u}^{[t]}, y+\bar{v}^{[t]}\big)\Big(f\big(x+\bar{u}^{[t]}, y+\bar{v}^{[t]}\big) - g(x, y)\Big), \tag{4}$$

$$v^{[t+1]}(x, y) = \bar{v}^{[t]} - \frac{1}{4\lambda} f_y\big(x+\bar{u}^{[t]}, y+\bar{v}^{[t]}\big)\Big(f\big(x+\bar{u}^{[t]}, y+\bar{v}^{[t]}\big) - g(x, y)\Big), \tag{5}$$

where the first terms on the right-hand side, $(\bar{u}^{[t]}, \bar{v}^{[t]})$, are the averages over the four neighboring pixels. The second terms require sub-pixel values of $f$, $f_x$, and $f_y$, which are calculated by bilinear interpolation of the four surrounding pixel values. For stability, Eqs. (4) and (5) incorporate a modification [9]: the averaged displacement $(\bar{u}^{[t]}, \bar{v}^{[t]})$ is also used in the second terms instead of $(u^{[t]}, v^{[t]})$.

B. CUDA implementation with PPDC

Since the iterative computation above consists of local parallel processing, a CUDA implementation is employed to reduce the computation time. Note, however, that a single pair of input and prototype images is too small to take advantage of GPU performance. This study therefore employs prototype-parallel displacement computation (PPDC) [14]: three large plates are generated in main memory by arranging the input images, the prototypes, and the displacement functions, respectively, at regular intervals, and these plates are then transferred collectively to device memory. The GPU computes the displacements for multiple pairs of input and prototype images and updates the displacement function plate (see Fig. 3). After the parallel computation of displacement between the input image plate and the prototype image plate, the displacement function plate is transferred back to main memory. The bilinear interpolation function of CUDA is used for fast computation of Eqs. (4) and (5).

[Figure 3: Prototype-parallel displacement computation.]
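To make the update rule of Eqs. (4) and (5) in Section II-A concrete, here is a minimal, sequential CPU sketch in Python. It is not the authors' CUDA code, and several details are our assumptions: images are lists of rows indexed `img[y][x]`, $f_x$ and $f_y$ are central differences, and borders are handled by clamping and replication. All function names are hypothetical. On a GPU, the body of the inner double loop is roughly what one thread would evaluate for its own pixel.

```python
import math


def zeros(h, w):
    return [[0.0] * w for _ in range(h)]


def bilinear(img, x, y):
    """Sub-pixel value of img (rows indexed img[y][x]) at real (x, y), from the
    four surrounding pixels; coordinates are clamped to the image (assumed)."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def gradients(img):
    """f_x and f_y by central differences, one-sided at the border (assumed)."""
    h, w = len(img), len(img[0])
    fx, fy = zeros(h, w), zeros(h, w)
    for y in range(h):
        for x in range(w):
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            fx[y][x] = (img[y][xr] - img[y][xl]) / max(xr - xl, 1)
            fy[y][x] = (img[yd][x] - img[yu][x]) / max(yd - yu, 1)
    return fx, fy


def four_neighbor_mean(a):
    """u_bar: mean of the four neighbors, replicating border pixels."""
    h, w = len(a), len(a[0])
    return [[(a[y][max(x - 1, 0)] + a[y][min(x + 1, w - 1)]
              + a[max(y - 1, 0)][x] + a[min(y + 1, h - 1)][x]) / 4.0
             for x in range(w)] for y in range(h)]


def compute_displacement(f, g, lam, iterations=100):
    """Iterate Eqs. (4)-(5) from zero; (u, v) are defined on the grid of g."""
    h, w = len(g), len(g[0])
    fx, fy = gradients(f)
    u, v = zeros(h, w), zeros(h, w)
    for _ in range(iterations):
        ub, vb = four_neighbor_mean(u), four_neighbor_mean(v)
        nu, nv = zeros(h, w), zeros(h, w)
        for y in range(h):
            for x in range(w):
                xs, ys = x + ub[y][x], y + vb[y][x]  # averaged displacement
                r = bilinear(f, xs, ys) - g[y][x]    # f(x+u_bar, y+v_bar) - g(x, y)
                nu[y][x] = ub[y][x] - bilinear(fx, xs, ys) * r / (4.0 * lam)
                nv[y][x] = vb[y][x] - bilinear(fy, xs, ys) * r / (4.0 * lam)
        u, v = nu, nv  # Jacobi-style: every pixel is updated from step-t values
    return u, v
```

For example, if `f` is a horizontal ramp shifted one pixel to the right of `g` and λ = 1, a single iteration from zero displacement sets every `u` to 0.25, a quarter of the way toward the true shift of 1, while `v` stays 0 because the ramp has no vertical gradient.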
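The plate arrangement behind PPDC can be sketched as follows. This is an illustration under our own assumptions: tiles are laid out row-major with `cols` tiles per plate row, separated by an optional `gap` of background pixels (the paper says only that images are arranged "at regular intervals"), and the helper names are hypothetical. A gap of background pixels would help keep bilinear samples near a tile edge from blending in pixels of the neighboring tile.

```python
def make_plate(images, cols, gap=0, fill=0.0):
    """Tile equally sized images (lists of rows) into one plate, row-major,
    `cols` tiles per plate row, with `gap` pixels of `fill` between tiles."""
    h, w = len(images[0]), len(images[0][0])
    rows = (len(images) + cols - 1) // cols
    plate = [[fill] * (cols * (w + gap)) for _ in range(rows * (h + gap))]
    for i, img in enumerate(images):
        oy, ox = (i // cols) * (h + gap), (i % cols) * (w + gap)
        for y in range(h):
            plate[oy + y][ox:ox + w] = img[y]  # copy one row of tile i
    return plate


def locate(plate_y, plate_x, h, w, cols, gap=0):
    """Map a plate pixel (e.g. the one handled by a GPU thread) to
    (pair index, y, x) inside its tile, or None if it lies in a gap."""
    ty, y = divmod(plate_y, h + gap)
    tx, x = divmod(plate_x, w + gap)
    if y >= h or x >= w:
        return None
    return ty * cols + tx, y, x


def split_plate(plate, n, h, w, cols, gap=0):
    """Cut the first n tiles back out of a plate, e.g. the displacement plate
    after it has been copied back to main memory."""
    tiles = []
    for i in range(n):
        oy, ox = (i // cols) * (h + gap), (i % cols) * (w + gap)
        tiles.append([plate[oy + y][ox:ox + w] for y in range(h)])
    return tiles
```

For one input image `f` and `n` prototypes, the input plate would hold `n` copies of `f`, for instance `make_plate([f] * n, cols)`, so that tile `i` of the input, prototype, and displacement plates all belong to pair `i`; a single transfer per plate then replaces `n` small transfers.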
Gradual prototype elimination (GPE) is also adopted to reduce the computation time without sacrificing recognition accuracy [15]. The proposed method employs a coarse-to-fine strategy, in which the computational cost per pair is small at the early stages and becomes heavier at the later stages. Therefore, many prototypes are used at the early stages; the prototypes are then reordered according to their dissimilarities to the input image, and only the fewer prototypes closest to the input image are used at the later stages. GPE can thus be expected to keep the appropriate prototypes until the final stage while saving computational cost.

C. Distance maps for displacement computation

Since the proposed method employs a coarse-to-fine strategy, the most precise displacement is computed in the final stage. According to the iterative equations (4) and (5), the self-driving force, described by the second term on the right-hand side, is proportional to the derivative of the input image, so only coordinates around the foreground pixels of the input image have a self-driving force. In contrast, background coordinates far from the foreground pixels have no self-driving force; they are moved only passively by the smoothing effect of the first term. Note that, because this smoothing effect acts in both directions between the foreground and background regions, the region around a foreground pixel loses its proper displacement owing to the passive driving force from the background region. To overcome this problem, as shown in Fig. 4, a distance transform produces a pair of distance maps whose pixel values are based on the distance to the nearest foreground pixel, and the displacement computation is applied to these distance maps instead of the input and prototype intensity images.

[Figure 4: Distance maps.]

D. Classification with eight-directional derivative images

The proposed method measures the dissimilarities between pairs of input and prototype images and decides the class of the input image using the k-nearest neighbor classifier. Note that the displacement computation might forcibly associate line segments of different shapes in the input and prototype images, such as straight and curved ones, so that the characteristic information on the curvature of line segments or contours is lost. To exploit such information for classification, the Euclidean distance is measured on derivative images generated from the intensity images:

$$D(u, v) = \sum_{x, y} \sum_{k=0}^{7} \big(f^k(x+u, y+v) - g^k(x, y)\big)^2, \tag{6}$$

where the derivative images $f^k(x, y)$ and $g^k(x, y)$ are eight-directional derivatives at 45-degree intervals, as shown in Fig. 5. Note that, because the dissimilarity is measured with sub-pixel displacements, four directions alone are not sufficient.

[Figure 5: Eight-directional derivatives.]

III. SIMULATIONS

The proposed method was evaluated on the MNIST database, which includes 10,000 input (test) and 60,000 prototype (training) images. The images, originally 28×28 pixels, were extended to 32×32 pixels by padding a four-pixel region. The number of stages in the coarse-to-fine strategy was set to 3, with image sizes of 8×8, 16×16, and 32×32 pixels at the respective stages. The numbers of prototypes at each GPE stage, $L_n$, were 6,400, 1,600, and 400 ($n = 0, 1, 2$), respectively; the $L_0$ prototypes were selected by simple matching according to their distances to the input pattern. The number of iterations was 100. The threshold for separating foreground from background when making the distance maps was 150. These parameters were determined empirically from preliminary results. To ensure that the dissimilarity is symmetric, the displacement was computed twice, once with the input image and once with the prototype image as the standard, and the maximum of the two dissimilarities was used as the final dissimilarity. The system consisted of an Intel Core 2 Duo E8400 (3.0 GHz) and an NVIDIA GeForce GTX 295, running Windows Vista SP2 (x64) with Microsoft Visual C++ 2008 and CUDA Toolkit 2.3.

Table I shows the error rate of the proposed method for different values of the regularization parameter at the final stage, $\lambda_2$, and different numbers of nearest neighbors, $k$. The values of $\lambda_0$ and $\lambda_1$ were set to 7.5 and 1.6, respectively. The smallest error rate, 0.57%, was obtained with $\lambda_2 = 1.5$ and $k = 3$, and $k = 3$ gave the lowest error rate for every value of $\lambda_2$.

Table I
ERROR RATE OF THE PROPOSED METHOD (%)

λ2      k = 1   k = 3   k = 5   k = 7
1.3     0.64    0.61    0.65    0.73
1.4     0.63    0.58    0.68    0.76
1.5     0.65    0.57    0.68    0.79
1.6     0.66    0.59    0.69    0.77
1.7     0.68    0.62    0.68    0.77
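For the distance maps of Section II-C, a brute-force sketch is shown below. Several details are assumptions not fixed by the paper: a pixel counts as foreground when its intensity is at least the threshold (150 in Section III, with MNIST ink stored as bright values), distances are Euclidean, and the raw distance is used as the pixel value. Searching over every foreground pixel is adequate for 32×32 images; a linear-time distance transform would be the usual choice for larger images.

```python
import math


def distance_map(img, threshold=150):
    """Each output pixel is the Euclidean distance to the nearest foreground
    pixel (intensity >= threshold); foreground pixels themselves get 0."""
    h, w = len(img), len(img[0])
    fg = [(y, x) for y in range(h) for x in range(w) if img[y][x] >= threshold]
    if not fg:
        raise ValueError("image has no foreground pixels")
    return [[min(math.hypot(y - fy, x - fx) for fy, fx in fg)
             for x in range(w)] for y in range(h)]
```

Because every background pixel now carries a nonzero gradient pointing toward the stroke, the second terms of Eqs. (4) and (5) act on the whole image rather than only near the foreground, which is the motivation the paper gives for using distance maps.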
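The dissimilarity of Eq. (6) can be sketched as follows. The exact derivative operator is shown only in Fig. 5, which is not reproduced here, so this sketch assumes the forward difference toward each of the eight neighbors, $f^k(x, y) = f(x + dx_k, y + dy_k) - f(x, y)$, with clamped borders and bilinear interpolation for the sub-pixel term. The direction order and all names are hypothetical.

```python
import math

# Neighbor offsets (dx, dy) at 45-degree intervals, k = 0..7 (assumed order).
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]


def clamped(img, x, y):
    """Pixel value with integer coordinates clamped to the image border."""
    return img[min(max(y, 0), len(img) - 1)][min(max(x, 0), len(img[0]) - 1)]


def directional_derivatives(img):
    """Assumed definition: f^k(x, y) = f(x + dx_k, y + dy_k) - f(x, y)."""
    h, w = len(img), len(img[0])
    return [[[clamped(img, x + dx, y + dy) - img[y][x] for x in range(w)]
             for y in range(h)] for dx, dy in DIRECTIONS]


def bilinear(img, x, y):
    """Sub-pixel sample from the four surrounding pixels (clamped)."""
    x = min(max(x, 0.0), len(img[0]) - 1.0)
    y = min(max(y, 0.0), len(img) - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * clamped(img, x0 + 1, y0)
    bottom = (1 - ax) * clamped(img, x0, y0 + 1) + ax * clamped(img, x0 + 1, y0 + 1)
    return (1 - ay) * top + ay * bottom


def dissimilarity(f, g, u, v):
    """Eq. (6): sum over pixels and k of (f^k(x+u, y+v) - g^k(x, y))^2."""
    fk, gk = directional_derivatives(f), directional_derivatives(g)
    total = 0.0
    for k in range(len(DIRECTIONS)):
        for y in range(len(g)):
            for x in range(len(g[0])):
                d = bilinear(fk[k], x + u[y][x], y + v[y][x]) - gk[k][y][x]
                total += d * d
    return total
```

With this definition, directions k and k + 4 are not simple negatives of each other once they are sampled at sub-pixel positions, which is consistent with the paper's remark that four directions alone are not sufficient.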
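The three-stage coarse-to-fine schedule of Section III (8×8, 16×16, and 32×32 pixels) needs reduced images and a way to pass each stage's displacement to the next. The paper does not state how either is done. The sketch below assumes 2×2 block averaging for reduction and nearest-neighbor upsampling of (u, v), with the values doubled because one coarse pixel spans two fine pixels. Names are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (assumed reduction)."""
    h, w = len(img), len(img[0])
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w // 2)] for y in range(h // 2)]


def pyramid(img, stages=3):
    """Images from coarsest to finest, e.g. 8x8, 16x16, 32x32 for 32x32 input."""
    levels = [img]
    for _ in range(stages - 1):
        levels.append(downsample(levels[-1]))
    return levels[::-1]


def upsample_displacement(u):
    """Initial displacement for the next, finer stage: nearest-neighbor
    upsampling, doubled because pixel units halve in size."""
    return [[2.0 * u[y // 2][x // 2] for x in range(2 * len(u[0]))]
            for y in range(2 * len(u))]
```

Under this scheme, each stage would run the iteration of Eqs. (4) and (5) starting from the upsampled result of the previous stage instead of from zero, with its own regularization parameter $\lambda_n$.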
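The GPE procedure described in Section II-B, with $L_n$ = 6,400, 1,600, and 400 in Section III, can be sketched as a candidate list that is rescored and truncated after each stage. The callbacks are hypothetical stand-ins: `initial_distance` for the "simple matching" that picks the $L_0$ prototypes, and `stage_dissimilarity` for running the displacement computation at stage n's resolution and evaluating the dissimilarity.

```python
def gradual_prototype_elimination(prototypes, keep, initial_distance,
                                  stage_dissimilarity):
    """Return (index, dissimilarity) pairs for the prototypes that survive
    the last stage, sorted from most to least similar.

    keep[n] is L_n, the number of prototypes processed at stage n."""
    # L_0 candidates: the prototypes closest to the input by simple matching.
    candidates = sorted(range(len(prototypes)),
                        key=lambda i: initial_distance(prototypes[i]))[:keep[0]]
    for n in range(len(keep)):
        scores = {i: stage_dissimilarity(n, prototypes[i]) for i in candidates}
        candidates.sort(key=lambda i: scores[i])  # reorder by dissimilarity
        if n + 1 < len(keep):
            candidates = candidates[:keep[n + 1]]  # keep the L_{n+1} closest
    return [(i, scores[i]) for i in candidates]
```

The expensive callback runs $\sum_n L_n$ times (8,400 with the paper's settings) instead of 60,000 × 3 times, and a finer stage can still reorder the survivors. For instance, with toy numeric "prototypes", a stage-2 score that differs from the earlier ones changes the final ranking among the two prototypes kept.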
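Finally, the decision step combines the symmetric dissimilarity of Section III (displacement computed with each image as the standard and the maximum taken) with k-nearest-neighbor voting. The paper does not say how voting ties are broken; this sketch assumes the label of the nearest tied prototype wins. `one_way` is a hypothetical callback that would compute the displacement with its second argument as the standard and return the dissimilarity of Eq. (6).

```python
from collections import Counter


def symmetric_dissimilarity(a, b, one_way):
    """Maximum of the two one-way dissimilarities, so that d(a, b) == d(b, a)."""
    return max(one_way(a, b), one_way(b, a))


def knn_classify(dissimilarities, labels, k=3):
    """Majority vote among the k prototypes with the smallest dissimilarity;
    a tie goes to the label of the nearest tied prototype (assumed rule)."""
    order = sorted(range(len(dissimilarities)),
                   key=lambda i: dissimilarities[i])[:k]
    votes = Counter(labels[i] for i in order)
    best = max(votes.values())
    for i in order:  # scan from nearest to farthest
        if votes[labels[i]] == best:
            return labels[i]
```

In the paper's setting, `dissimilarities` would hold the final-stage values for the 400 prototypes that survive GPE, and k = 3 gave the lowest error rates in Table I.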
We also investigated the effects of the distance map (DM) and the eight-directional derivative image (8DDI). When intensity images (IMs) were used both for computing the displacement and for measuring the dissimilarity, the error rate was 0.96% ($k = 3$, $\lambda_n = \{25 \times 10^3, 25 \times 10^3, 300 \times 10^3\}$); using 8DDIs for measuring the dissimilarity slightly improved the rate to 0.89%. Meanwhile, when DMs were used for computing the displacement and IMs for measuring the dissimilarity, an error rate of 0.98% was obtained ($k = 5$, $\lambda_n = \{7.5, 1.6, 1.5\}$). Comparison with the results in Table I shows that using DMs and 8DDIs together is essential for improving accuracy.

Finally, we report how much the CUDA implementation shortened the displacement computation. With 6,400, 1,600, and 400 prototypes in the respective GPE stages, the total time for computing the displacements per input pattern was 500 ms with the CUDA implementation, compared with 20 s with an ordinary CPU implementation: a speedup of about 40 times.

IV. CONCLUSION

A deformable pattern recognition method with a CUDA implementation was proposed. To achieve a proper correspondence between input and prototype images, a pair of distance maps was generated. The dissimilarity was measured using the eight-directional derivatives of the input and prototype images. Prototype-parallel displacement computation on CUDA and gradual prototype elimination were employed to reduce the computation time without sacrificing accuracy. The simulation showed that the proposed method achieves an error rate of 0.57% on the MNIST handwritten digit database. This rate is slightly better than that of the previous sub-pixel deformable approach (0.63%) [5], and the proposed approach, based on the simple iterative equations (4) and (5), is more straightforward than that approach, which requires a three-step displacement computation. Although a pixel-wise deformable approach achieved a lower error rate of 0.52% [6], the difference is small (0.05 percentage points, i.e., about five of the 10,000 test images). It therefore seems worthwhile to continue studying the possibilities of different deformable approaches. The best result, 0.39%, was reported for a learning network [4]. Deformable approaches essentially do not require advanced learning procedures for multi-class separation, so they can be expected to apply more easily to problems with a large number of classes. In the future, we would like to compare these recognition methods on large-class sets such as the Roman alphabet or Chinese characters. Although the error rate obtained in this study is not better than those of some previous works, the proposed method appears to be a promising application of the regularization-based deformable approach and CUDA technology to pattern recognition problems.

REFERENCES

[1] B. Widrow, "The rubber mask technique – II. Pattern storage and recognition," Pattern Recognition, vol. 5, no. 3, pp. 199–211, 1973.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[3] D. DeCoste and B. Schölkopf, "Training invariant support vector machines," Machine Learning, vol. 46, pp. 161–190, 2002.
[4] M. A. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, "Efficient learning of sparse representations with an energy-based model," in NIPS, 2006. [Online]. Available: http://nips.cc/Conferences/2006/Program/event.php?ID=425
[5] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE PAMI, vol. 24, no. 4, pp. 509–522, Apr. 2002. [Online]. Available: http://dx.doi.org/10.1109/34.993558
[6] D. Keysers, T. Deselaers, C. Gollan, and H. Ney, "Deformation models for image recognition," IEEE PAMI, vol. 29, no. 8, pp. 1422–1435, 2007.
[7] S. Uchida and H. Sakoe, "A monotonic and continuous two-dimensional warping based on dynamic programming," in Proc. 15th ICPR, vol. 1, Aug. 1998, pp. 521–524.
[8] Y. Mizukami, K. Koga, and T. Torioka, "A handwritten character recognition system using hierarchical extraction of displacement," IEICE, vol. J77-D-II, no. 12, pp. 2390–2393, Dec. 1994 (in Japanese).
[9] Y. Mizukami and K. Koga, "A handwritten character recognition system using hierarchical displacement extraction algorithm," in Proc. 13th ICPR, vol. 3, 1996, pp. 160–164.
[10] Y. Mizukami, "A handwritten Chinese character recognition system using hierarchical displacement extraction based on directional features," Pattern Recognition Letters, vol. 19, no. 7, pp. 595–604, 1998.
[11] J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, and T. J. Purcell, "A survey of general-purpose computation on graphics hardware," in Eurographics 2005, State of the Art Reports, Aug. 2005, pp. 21–51.
[12] J. Fung and S. Mann, "Using graphics devices in reverse: GPU-based image processing and computer vision," in Proc. IEEE Int'l Conf. on Multimedia & Expo, Jun. 2008, pp. 9–12.
[13] R. March, "Computation of stereo disparity using regularization," Pattern Recognition Letters, vol. 8, no. 3, pp. 181–188, Mar. 1988.
[14] Y. Mizukami and K. Tadamura, "GPU implementation of deformable pattern recognition using prototype-parallel displacement computation," in Proc. DEFORM06, 2006, pp. 71–80.
[15] Y. Mizukami, T. Sato, and K. Tanaka, "Handwritten digit recognition by hierarchical displacement extraction with gradual prototype eliminations," in Proc. 15th ICPR, vol. 3, 2000, pp. 339–342.