A Robust Neural Network Based Object Recognition System and its SIMD Implementation

Alfredo Petrosino (1), Giuseppe Salvi (2)

(1) INFM - University of Salerno, Via S. Allende, 84081 Baronissi (Salerno), ITALY, [email protected]
(2) Istituto per la Ricerca sui Sistemi Informatici Paralleli, IRSIP-CNR, Via P. Castellino 111, 80131 Naples, ITALY

Abstract. Recognition of objects is a particularly demanding problem if one considers that each image must be interpreted in milliseconds (usually at 30 or 40 frames/second). The problem becomes more difficult if the objects are distorted and/or partially occluded. In this case a sequence of local features has to be extracted, combined into a global shape description and classified as belonging to pre-defined sets of known shapes (reference shapes). In this paper we propose a massively parallel object recognition system, which makes use of the multi-polygonal approximation scheme for the extraction of rotation and translation invariant shape features, in connection with artificial neural networks for the parallel classification of the extracted features. The system has been successfully applied to recognizing aircraft shapes in different sizes and orientations, with the addition of noise, distortion and occlusion. Timings on the Connection Machine 200 are also reported.

1 Introduction

Recognition of shapes is a fundamental task in computer vision [22]. It proves useful in many applications, such as automatic target recognition (ATR), characterization of biomedical images and identification of industrial parts for product assembly. Object recognition is in itself still an open problem. In the last decades, successful attempts have been made at recognizing isolated objects with complete and well-defined boundaries. The usual process consists of extracting the boundary by applying some segmentation technique to the original grey scale image and then applying edge detectors and thinning algorithms to the binary image to keep the boundary width as small as possible. Once the boundary has been established, the feature extraction task begins by extracting relevant features from each object present in the scene. The features are chosen so as to be invariant with respect to the object position, size and orientation. In addition, many of these applications require the recognition to take place in real time and also under heavy noise and occlusion. The problem is demanding if we consider a sequence of colour images with a resolution of 512 × 512 pixels (8-24 bits per pixel) which are to be handled at a rate of 30 frames/second. Processing and recognizing the contents of such images amounts to processing 23.6 million bytes/second. Thus, the use of more and more sophisticated parallel architectures and computational procedures, capable of dealing with different data structures at the different levels of processing and interpretation, is natural and strongly needed.
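As a quick check of the data rate quoted above, and assuming the upper end of the stated colour depth (24 bits, i.e. 3 bytes per pixel), the arithmetic works out as follows:

```latex
\[
512 \times 512 \ \text{pixels} \times 3 \ \tfrac{\text{bytes}}{\text{pixel}} \times 30 \ \tfrac{\text{frames}}{\text{s}}
= 23\,592\,960 \ \tfrac{\text{bytes}}{\text{s}} \approx 23.6 \times 10^{6} \ \tfrac{\text{bytes}}{\text{s}}
\]
```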
However, the use of parallel architectures gives rise to new problems concerning the choice of the mode of parallel computation (SIMD, MIMD) and of the most efficient programming language. To make a suitable choice, benchmarks on different images are necessary, analyzing and reporting not only the total time of the entire recognition process, but also the time taken by each processing step. The aim is to keep under control both the qualitative accuracy of the process and the time needed to carry it out.

Current literature on parallel object recognition essentially deals with graph matching approaches implemented in parallel [1], tree search algorithms [10], geometric hashing [3, 24] and parallel hypothesis generation [23]. Here we show that a complete set of computer vision tasks implementing filtering, feature extraction and statistical classification can be efficiently implemented as a whole system on a massively parallel SIMD architecture.

There are many techniques available to describe an object based on its boundary, including Shape Numbers [8, 7], Moments [12, 6], Fourier Descriptors [25, 17, 9], the Hough Transform [2] and the Medial Axis Transform (MAT) [4]. Among them, the polygonal approximation (especially that of high order) [16] combined with the Circular AutoRegressive (CAR) model approach, originally proposed by Kashyap and Chellappa in 1981 [14], represents a good solution to the problems of occlusion, distortion and noise, while remaining a method of low computational cost. We shall use a modified version of the CAR model in combination with Artificial Neural Networks (ANNs) as nonparametric classifiers. The modifications introduced in the CAR model are based on the consideration of one set of predictive parameters, but related to different models of the shape under examination, each corresponding to a different polygonal approximation. We shall refer to it as the Multi-Polygonal AutoRegressive Model (MPARM). In addition, ANNs are adopted due to their ability to adjust when given new information, their massive neuronal parallelism, which is typical of the SIMD parallel mode, and their fault tolerance to missing, confusing and/or noisy data.

The paper is organized as follows.
Section 2 briefly describes the proposed approach, whereas Section 3 reports an ensemble of experimental recognition results and the times taken on the Connection Machine 200.

[Figure 1: block diagram. Block labels: grey level image, filtering, dynamic thresholding, gradient based edge detection, binary image, thinning, multi-polygonal CAR model feature extraction, feature vector, neural classifier, class.]

Fig. 1. The overall scheme of the object recognition system.

2 The Algorithm

The overall system is depicted in Fig. 1 [20] and its operations can be summarized as follows. The first stage is dedicated to detecting the object boundary, after cleaning out the noise from the image by applying a linear smoothing filter and a strongly noise-independent segmentation procedure, based on the image entropy optimization between foreground and background [13]. The boundary is then coded by using elementary but salient features. We adopt and compare two strategies for the feature extraction: the first approach uses the variational angle sequence [15] (see Fig. 2 a)), while the second and more common approach uses the sequence of Euclidean distances of points along the shape boundary from the centroid [5] (see Fig. 2 b)).

[Figure 2: two panels, a) and b), with boundary points labelled A, B, C, angles θ1, θ2 and distances d1, d2, d3.]

Fig. 2. The adopted shape description models: angle of variation a) and distance from centroid b)
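The two elementary feature sequences can be illustrated with a minimal sketch. It rests on assumptions not stated in the text: the contour is given as an ordered, closed list of polygon vertices; the centroid is taken as the mean of those vertices; and the "angle of variation" is approximated by the signed turning angle between consecutive sides. The function names `centroid_distances` and `turning_angles` are hypothetical.

```python
import math

def centroid_distances(points):
    """Euclidean distance of each vertex from the centroid (Scheme II style).

    Assumption: the centroid is the mean of the sampled vertices.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [math.hypot(x - cx, y - cy) for x, y in points]

def turning_angles(points):
    """Signed turning angle (radians) at each vertex of a closed polygon.

    Assumption: this stands in for the 'angle of variation' of Scheme I.
    """
    n = len(points)
    angles = []
    for i in range(n):
        ax, ay = points[i - 1]            # previous vertex (wraps around)
        bx, by = points[i]                # current vertex
        cx, cy = points[(i + 1) % n]      # next vertex (wraps around)
        ux, uy = bx - ax, by - ay         # incoming side
        vx, vy = cx - bx, cy - by         # outgoing side
        angles.append(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))
    return angles

# A counter-clockwise square and a translated copy of it.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
shifted = [(x + 7, y - 3) for x, y in square]
print(centroid_distances(square))
print(turning_angles(square))
```

Both sequences depend only on relative geometry, so the translated copy yields the same values; a rotation of the shape likewise leaves them unchanged up to rounding.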

The sequence of shape features is modelled as a Circular Auto-Regressive (CAR) process [14], which is a parametric equation that expresses each sample of an ordered set of data samples as a linear combination of a specified number of previous samples plus an error term. Since the sequence is assumed to be circular, it is invariant to rotation and translation. The form of the model is:

\[ y_i = \alpha_0 + \sum_{k=1}^{m} \alpha_k \, y_{i-k}, \qquad \forall i = 0, 1, \ldots, n-1 \tag{1} \]

where $y_i$ is the current primary feature; $y_{i-k}$ is the feature detected $k$ steps before the current feature; $\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_m$ are the unknown CAR coefficients; $m$ is the model order. Let us indicate with $\alpha = [\alpha_0, \alpha_1, \ldots, \alpha_m]$ the least square error (LSE) estimate of the CAR model. To improve the representativity of this solution the following method is adopted. If we fix $n$, the number of line segments in the polygonal approximation of the shape, then $p = \lfloor B/n \rfloor$ pixels, with $B$ the number of contour pixels, will lie on the contour between two end points, and $(p-1)$ polygonal approximations of the shape through model (2) are possible, depending on the starting point. The sequences of primary features generated for each of them may be slightly different. To overcome this problem we adopt an iterative scheme consisting of solving $p-1$ systems each having $n$ equations and $m+1$ unknowns. It is based on the consideration that the $p-1$ sequences are obtained in some order (clockwise or counterclockwise); firstly, the CAR vector for the first sequence of primary features is computed, then for two sequences, three sequences and so on. The process is repeated $p-1$ times, according to the number of sequences or polygonal approximations. In particular, denoting by $\alpha_j$ the solution of the $j$-th system thus constructed, $j = 1, \ldots, p-1$, let us define

\[ \varepsilon_j = \frac{\alpha_j^{T} \alpha_{j-1}}{\|\alpha_j\|_2 \, \|\alpha_{j-1}\|_2}, \quad j = 2, \ldots, p-1, \qquad \text{and} \qquad \varepsilon_1 = \|\alpha_1\|_2 \]

a measure of similarity. The final feature vector $\alpha$ is set to the solution of the $s$-th system, where $s = \arg\min_{0 \le j \le p-1} \{\varepsilon_j\}$. We shall refer to this way of proceeding as the Multi-Polygonal Auto-Regressive Model (MPARM). For the sake of clarity, in the following we denote as Scheme I the MPARM applied to the variational angles, and as Scheme II the MPARM applied to the Euclidean distances from the centroid.

The sequences of CAR parameters are finally fed into an Artificial Neural Network (ANN) trained with the Conjugate-Gradient (CG) algorithm [21] to classify the planar objects. The process is repeated for the overall set of reference images. After learning, an unknown object passing through all the previous stages is classified, correctly or not, by the ANN with frozen weights.

The overall system described above has been implemented on a reconfigurable SIMD machine. The implemented algorithms were: convolution on a 3D mesh, thresholding on a 2D mesh, thinning and feature description on a 2D mesh, and conjugate gradient for neural learning on a 1D mesh. We report in the next section the recognition performance and the parallel execution times of the overall system and refer to [18, 19] for a detailed description of the parallel algorithms.
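The least-squares fit of the CAR coefficients and the selection among successive solutions can be sketched as below. This is one possible reading under stated assumptions, not the authors' implementation: the j-th system is assumed to stack the equations of the first j feature sequences; successive solutions are compared through their normalized inner product for j ≥ 2 only (the j = 1 case is left out); and the normal equations are solved by plain Gaussian elimination. All function names (`solve`, `design`, `lse`, `car_fit`, `mparm_select`) are hypothetical.

```python
import math

def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def design(y, order):
    """Rows [1, y[i-1], ..., y[i-m]] with targets y[i]; indices wrap circularly."""
    n = len(y)
    rows = [[1.0] + [y[(i - k) % n] for k in range(1, order + 1)] for i in range(n)]
    return rows, list(y)

def lse(rows, targets):
    """Least-squares estimate through the normal equations (X^T X) a = X^T t."""
    dim = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(dim)] for p in range(dim)]
    xty = [sum(r[p] * t for r, t in zip(rows, targets)) for p in range(dim)]
    return solve(xtx, xty)

def car_fit(y, order):
    """Coefficients [a0, a1, ..., am] of the circular autoregressive model."""
    return lse(*design(y, order))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mparm_select(sequences, order):
    """Assumed reading: fit on the first j sequences stacked, then pick the
    solution whose normalized inner product with its predecessor is minimal."""
    rows, targets, solutions = [], [], []
    for seq in sequences:
        r, t = design(seq, order)
        rows += r
        targets += t
        solutions.append(lse(rows, targets))
    if len(solutions) == 1:
        return solutions[0]
    scores = [cosine(solutions[j], solutions[j - 1]) for j in range(1, len(solutions))]
    best = min(range(len(scores)), key=scores.__getitem__)
    return solutions[best + 1]

# A sampled sinusoid obeys y[i] = 2*cos(w)*y[i-1] - y[i-2] exactly.
n, w = 16, 2 * math.pi / 16
y = [math.sin(w * i) for i in range(n)]
shifted = y[5:] + y[:5]                 # same contour, different starting point
coeffs = car_fit(y, order=2)
selected = mparm_select([y, shifted], order=2)
print(coeffs, selected)
```

Because the indices wrap around, changing the starting point only permutes the equations of the system, which is why the fit on `shifted` agrees with the fit on `y`.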

Fig. 3. Reference shapes.

3 Experimental Results

3.1 Recognition Performance

The task faced was the recognition of five differently shaped aircraft (Fig. 3). The data set was formed by changing each aircraft in size, orientation and position. Specifically, the objects were rotated in 5° steps and translated to random positions within the image boundaries. The size of each aircraft was varied from 0.25 to 1.25 times the size of the original. By doing so, 5040 shapes were generated. For each shape the representative CAR vector was then extracted by using both the angles of variation and the Euclidean distances from the centroid as elementary features. The data set was then subdivided into 3600 shapes for training and 1440 shapes for testing the neural classifiers. We carried out various experiments with CAR models of different orders m. The value m = 20 turned out to be optimal for our purposes. Table 1 shows the classification performance on the training and test sets for selected classifiers with different numbers of hidden units and shape description schemes.

Fig. 4. A typical recognition process: (a) the original image rotated, translated and scaled with Gaussian noise added (σ = 30); (b) after applying median filtering and dynamic thresholding; (c) after applying the edge detection; (d) after applying thinning; (e) the shape recognized.

Recognition Results on Noisy Objects. Noise and distortion effects were introduced by adding random noise to the boundary points of 360 object shapes. Each contour pixel was assigned a probability p of retaining its original coordinates in the image plane and a probability q = 1 − p of being randomly assigned the coordinates of one of its neighboring pixels. The degree of noise grows with the noise level q. We chose q equal to 1/3, i.e. on average one pixel in three was selected to change its position. The set of noisy shapes was subdivided into 270 shapes for training and 90 shapes for testing the neural classifiers. Table 2 reports the classification performance on the 90 noisy test shapes for the neural classifiers used in Section 3.1. As expected, the classification performance was substantially degraded. To obtain better classification performance we re-trained the previous neural classifiers with the 270 noisy shapes only and, after convergence, tested them on the 90 noisy test patterns. The results obtained are reported in Table 3. The best performance was achieved with 35 hidden neurons by using as elementary features both the angles of variation and the Euclidean distances.

Recognition Results on Occluded Objects. A moderate amount of occlusion is accommodated in our experiments. The occluded shapes were obtained by cutting off a portion of the generated shapes with a straight line. The restriction on occlusion is that all the major portions, or branches, should remain; therefore, no major geometric property is changed. The set of occluded shapes was subdivided into 540 shapes for training and 180 shapes for testing the neural classifiers. Table 4 shows the classification performance on the 180 occluded shapes for the neural classifiers of Section 3.1. To improve the classification performance we re-trained the previous neural classifiers with the 540 occluded shapes. Table 5 shows the classification performance of the neural classifiers trained with occluded shapes.
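The two degradations described above can be mimicked with a short sketch. It makes simplifying assumptions that are not in the text: "neighboring pixels" is taken to mean the 8-neighbourhood, and occlusion is approximated by discarding the contour pixels on one side of a straight line rather than by cutting the filled shape and re-extracting its boundary. The names `perturb_boundary` and `occlude` are hypothetical.

```python
import random

# Offsets of the 8-neighbourhood (assumed meaning of "neighboring pixels").
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def perturb_boundary(points, q, rng):
    """Keep each contour pixel with probability p = 1 - q, otherwise move it
    to a randomly chosen neighbouring pixel."""
    noisy = []
    for x, y in points:
        if rng.random() < q:
            dx, dy = rng.choice(NEIGHBOURS)
            noisy.append((x + dx, y + dy))
        else:
            noisy.append((x, y))
    return noisy

def occlude(points, a, b, c):
    """Keep only the contour pixels lying on the side a*x + b*y <= c of a line."""
    return [(x, y) for x, y in points if a * x + b * y <= c]

rng = random.Random(0)                       # fixed seed for repeatability
contour = [(x, 10) for x in range(20)]       # toy contour: a horizontal run
noisy = perturb_boundary(contour, q=1 / 3, rng=rng)
clipped = occlude(contour, a=1, b=0, c=14)   # drop pixels with x > 14
print(len(noisy), len(clipped))
```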

3.2 Timings

The parallel algorithms were implemented in the C* language on the Connection Machine 200 (CM-200) [11], a fine-grain SIMD parallel computer configured with 8192 1-bit wide processors and equipped with the Data Vault. The latter is a mass-storage system capable of holding five gigabytes of data and transferring them at a rate of 40 Mbytes per second. We remark that the CM-200 supports a powerful concept called virtual processor (VP), according to which the memory of each physical processor is space-shared within a program in a user-transparent way. For example, a user can specify that an application uses 128K processors, even if only 8K are available. Furthermore, it is possible to configure all the processors of the Connection Machine into a multidimensional binary cube interconnection network by using special grid connections, called NEWS, between the processors. Once configured, the CM-200 can be used as a 2D mesh of processors. As the computation progresses from one vision step to the next, uniform partitioning is adopted. In the first stage of the shape recognition scheme (filtering, segmentation, edge extraction, thinning), the adopted re-mapping mode ensures considerable performance gains, since at this stage there is no idle processor. This is not true for the second stage (feature extraction), where a contour following algorithm is involved and the features are extracted only for the boundary pixels determined by the previous processing stage. There is of course evidence of a slowdown, which weighs heavily on the global performance reached. A MIMD solution should, in this case, give better performance. By contrast, the neural classifier, where no processors are idle, yields an efficient performance overall.

Table 6 gives the overall system performance from the first vision step, linear smoothing filtering, to the neural classification for the two schemes adopted and for different image dimensions (N = 64, 128, 256). Fig. 4 shows a typical recognition task on an image corrupted by Gaussian noise, where the object to recognize is translated, rotated and scaled. One source of performance degradation of Scheme I with respect to Scheme II is the need to calculate the trigonometric function involved in the computation of the angles of variation, which is slow on the Connection Machine 200. Commonly, a table lookup method is useful for computing an approximation to these functions. However, indexed addressing, which is required by a lookup table, is as slow as indirect access on the Connection Machine 200 when each PE needs to access its table at a different address.
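The VP ratios quoted with the timings are consistent with a simple mapping, sketched here under the assumption (not stated explicitly in the text) that each pixel of the N × N image is assigned to one virtual processor on the 8192-processor machine. The function name `vp_ratio` is hypothetical.

```python
import math

def vp_ratio(n, physical_processors=8192):
    """Virtual processors per physical processor for an n x n image,
    assuming one pixel per virtual processor (illustrative assumption)."""
    return max(1, math.ceil(n * n / physical_processors))

for n in (64, 128, 256):
    print(n, vp_ratio(n))
# prints:
# 64 1
# 128 2
# 256 8
```

For N = 64 the image has fewer pixels (4096) than there are physical processors, so the ratio stays at 1.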

4 Summary and Conclusions

In this paper, a massively parallel system for the recognition of 2D shapes by translation, scale and rotation invariant boundary representations is reported. The noise and occlusion immunity of the system has also been tested. This paper stems from the consideration that in the last decades a lot of computational vision algorithms have emerged as solutions for designing efficient computer vision systems. Many of them are inherently sequential, and results regarding their efficiency and applicability on parallel platforms are not reported. Others, parallel in nature, need to be tested on existing parallel platforms in order to establish their advantages and/or disadvantages when considered in a more complex vision framework. Thus, the selection of efficient vision steps and their validation and testing on existing parallel platforms are mandatory and profitable for architecture designers. Indeed, one should note that many special purpose architectures have been designed for image processing, while few have been dedicated to both the processing and the recognition stages.

Our aim was to propose efficient parallel versions on SIMD machines of a selected set of algorithms that are very effective from a qualitative point of view. Specifically, two methods for extracting primary features are employed and compared: angles of variation and Euclidean distances from the centroid. As selector of features we adopted a scheme that we named Multi-Polygonal Auto-Regressive Model (MPARM), which works better than the basic CAR model in the presence of radical changes over the contour, and when the sample points are too far apart from each other. In addition, ANNs with CG learning were adopted. Extensive experimentation was carried out with a set of objects, arbitrarily translated, rotated and scaled, as well as with noisy and lightly occluded objects. The recognition scheme using angles of variation as primary features can recognize more unknown patterns than the scheme adopting the centroidal profile representation (99.44% against 98.39% of correct recognition).

The parallel system has also been implemented on a specific SIMD architecture, the Connection Machine 200, and both recognition schemes have been evaluated in terms of implementation efficiency. The scheme based on the centroidal profile representation turned out to be more efficient, but this is largely due to the slowdown caused by the trigonometric evaluations required for the extraction of the angles of variation. Specifically, the time required from the initial image capture from the Data Vault to the display of the recognized object is approximately 0.3 seconds on an 8K-processor Connection Machine running at 8 MHz, for images of dimension up to 128 × 128 (VP ratio = 2), strongly outperforming the timings reported in [23].

Acknowledgements

This work was supported by CNR Progetto Finalizzato 'Robotica' and by the INFM Research Unit of the University of Salerno.

References

1. R. Allen, L. Cinque, S. Tanimoto, L. Shapiro, D. Yasuda, 'A parallel algorithm for graph matching and its MasPar implementation', IEEE Trans. on Parallel and Dist. Systems, vol. 8, n. 5, pp. 490-500 (1997).
2. D. H. Ballard, 'Generalizing the Hough transform to detect arbitrary shapes', Pattern Recognition, 13, 2, pp. 111-122 (1981).
3. O. Bourdon, G. Medioni, 'Object Recognition Using Geometric Hashing on the Connection Machine', Proc. of Intern. Conf. on Pattern Recognition, pp. 596-600 (1988).
4. H. Blum, 'A transformation for extracting new descriptions of shape', Proc. Symp. Models Perception Speech Visual Form, MIT Press (1964).
5. C. C. Chang, S. M. Hwang, D. J. Buehrer, 'A shape recognition scheme based on the relative distances of feature points from the centroid', Pattern Recognition, 24, 11, pp. 1053-1063 (1991).
6. S. A. Dudani, K. J. Breeding, R. B. McGhee, 'Aircraft identification by moment invariants', IEEE Trans. Computer, 26, 1, pp. 39-46 (1977).
7. H. Freeman, L. S. Davis, 'A corner-finding algorithm for chain-coded curves', IEEE Trans. Computer, 26, pp. 297-303 (1977).
8. R. C. Gonzalez, P. Wintz, Digital Image Processing, Reading, MA, Addison-Wesley (1977).
9. G. H. Granlund, 'Fourier preprocessing for hand print character recognition', IEEE Trans. Computer, 21, 2, pp. 195-201 (1972).
10. J. G. Harris, A. M. Flynn, 'Object recognition using the Connection Machine's router', Proc. of Computer Vision and Pattern Recognition, pp. 134-139 (1986).
11. W. D. Hillis, The Connection Machine, MIT Press, Cambridge, MA (1985).
12. M. Hu, 'Visual pattern recognition by moment invariants', IRE Trans. Inform. Theory, 8, pp. 179-187 (1962).
13. J. N. Kapur, P. K. Sahoo, 'A new method for grey level image thresholding using the entropy of the histogram', Computer Vision, Graphics and Image Processing, 29 (1985).
14. R. Kashyap, R. Chellappa, 'Stochastic models for closed boundary analysis: representation and reconstruction', IEEE Trans. Inf. Theory, 27, 5, pp. 109-119 (1981).
15. N. R. Pal, P. Pal, A. K. Basu, 'A new shape representation scheme and its application to shape discrimination using a neural network', Pattern Recognition, 26, 4, pp. 543-551 (1993).
16. T. Pavlidis, 'Algorithms for shape analysis of contour and waveforms', IEEE Trans. Patt. Anal. Machine Intell., 2, 4, pp. 301-312 (1980).
17. E. Persoon, K. S. Fu, 'Shape discrimination using Fourier descriptors', IEEE Trans. System Man & Cybernetics, 7, 3 (1977).
18. A. Petrosino, G. Salvi, 'A two subcycle parallel thinning algorithm and its parallel implementation on SIMD machines', IEEE Trans. on Image Processing, to appear (1999).
19. A. Petrosino, G. Salvi, R. Vaccaro, 'Data parallel object recognition using artificial neural networks', IRSIP Tech. Report (1998).
20. A. Petrosino, G. Salvi, 'Recognizing planar objects with neural networks', Proceedings of Intern. Conf. on Neural Networks ICANN '94, M. Marinaro and P. Morasso eds., pp. 835-838, Springer Verlag (1994).
21. M. J. D. Powell, 'Restart procedures for the Conjugate Gradient method', Mathematical Programming, 12, pp. 241-254 (1977).
22. P. Suetens, P. Fua, A. J. Hanson, 'Computational strategies for object recognition', ACM Computing Surveys, 24, 1 (1992).
23. L. W. Tucker, C. R. Feynman, D. M. Fritzsche, 'Object recognition using the Connection Machine', Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 871-877 (June 1988).
24. C.-L. Wang, V. K. Prasanna, H. J. Kim, A. A. Khokhar, 'Scalable Data Parallel Implementations of Object Recognition Using Geometric Hashing', Journal of Parallel and Distributed Computing (1994).
25. C. T. Zahn, R. Z. Roskies, 'Fourier descriptors for plane closed curves', IEEE Trans. Computer, 21, 3, pp. 269-281 (1972).

NET        Scheme I (a)        Scheme II (b)
           TRAIN    TEST       TRAIN    TEST
20-5-5     100.0    99.72      100.0    99.44
20-10-5    100.0    98.88      100.0    98.88
20-15-5    100.0    99.16      100.0    98.88
20-20-5    100.0    99.16      100.0    98.88
20-25-5    100.0    99.16      100.0    99.44
20-30-5    100.0    99.44      100.0    99.16
20-35-5    100.0    99.16      100.0    99.16
20-40-5    100.0    99.16      100.0    99.16

Table 1. The recognition performance (% correct) by using Scheme I (a) and Scheme II (b).

This article was processed using the LATEX macro package with LLNCS style

NET        Scheme I (a)    Scheme II (b)
           TEST            TEST
20-5-5     89.11           91.11
20-10-5    88.66           89.11
20-15-5    88.0            88.66
20-20-5    88.44           89.11
20-25-5    88.22           88.22
20-30-5    89.33           89.11
20-35-5    88.44           89.77
20-40-5    88.66           89.11

Table 2. The recognition performance by using Scheme I (a) and Scheme II (b) on noisy shapes.

NET        Scheme I (a)        Scheme II (b)
           TRAIN    TEST       TRAIN    TEST
20-5-5     96.83    95.33      98.54    98.44
20-10-5    100.0    97.77      99.23    97.33
20-15-5    100.0    97.55      99.82    98.44
20-20-5    100.0    98.44      99.82    99.33
20-25-5    100.0    97.11      98.80    98.00
20-30-5    100.0    97.55      99.57    98.22
20-35-5    100.0    98.66      99.74    98.66
20-40-5    100.0    98.22      99.82    98.66

Table 3. The recognition performance by using Scheme I (a) and Scheme II (b) for neural classifiers trained on noisy shapes.

NET        Scheme I (a)    Scheme II (b)
           TEST            TEST
20-5-5     92.21           87.88
20-10-5    91.88           76.22
20-15-5    89.55           78.66
20-20-5    92.99           81.33
20-25-5    89.55           78.10
20-30-5    91.22           81.77
20-35-5    90.33           79.66
20-40-5    89.44           82.66

Table 4. The recognition performance by using Scheme I (a) and Scheme II (b) on occluded shapes.

NET        Scheme I (a)        Scheme II (b)
           TRAIN    TEST       TRAIN    TEST
20-5-5     100.0    97.59      97.43    96.11
20-10-5    100.0    98.70      100.0    98.14
20-15-5    100.0    99.07      100.0    96.85
20-20-5    100.0    99.25      100.0    98.33
20-25-5    100.0    99.25      100.0    97.59
20-30-5    100.0    99.25      100.0    97.22
20-35-5    100.0    99.44      100.0    97.40
20-40-5    100.0    99.44      100.0    98.33

Table 5. The recognition performance by using Scheme I (a) and Scheme II (b) for neural classifiers trained on occluded shapes.

Image size (VP ratio)        Scheme I     Scheme II
N = 64  (VP ratio = 1)       0.2595958    0.231238
N = 128 (VP ratio = 2)       0.384843     0.3357263
N = 256 (VP ratio = 8)       1.0183437    0.8930011

Table 6. The experimental times (in seconds) for the whole computer vision process as a function of the size of the image (N × N) with the angles of variation (Scheme I) and the Euclidean distances from the centroid (Scheme II).
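To put the timings of Table 6 in perspective, the short sketch below derives two quantities that the paper does not tabulate: the ratio between the Scheme I and Scheme II times, and the corresponding frame rate of Scheme II. The variable names are illustrative; the input values are copied from the table.

```python
# Times in seconds from Table 6: image size N -> (Scheme I, Scheme II).
timings = {
    64: (0.2595958, 0.231238),
    128: (0.384843, 0.3357263),
    256: (1.0183437, 0.8930011),
}

# Ratio > 1 means Scheme I (angles of variation) is slower than Scheme II.
ratios = {n: t1 / t2 for n, (t1, t2) in timings.items()}
frames_per_second = {n: 1.0 / t2 for n, (_, t2) in timings.items()}

for n in timings:
    print(f"N={n}: Scheme I / Scheme II = {ratios[n]:.3f}, "
          f"Scheme II frames/s = {frames_per_second[n]:.2f}")
```

On these figures Scheme I takes roughly 12 to 15 percent longer than Scheme II at every image size.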
