A Component-Based System for Car Detection

13 downloads 0 Views 27KB Size Report
Mohan et. al. ... to test a component-based SVM approach, similar to Mohan [6, 5] and Heisele ... DMS-9872936, and National Science Foundation Contract No.
A Component-Based System for Car Detection

Kevin Thompson, Bernd Heisele & Tomaso Poggio Artificial Intelligence Laboratory and The Center for Biological and Computational Learning Massachusetts Institute of Technology Cambridge, Massachusetts 02139

@ MIT

http://www.ai.mit.edu

The Problem: The goal of this project is to detect passenger cars based on the results of several different component classifiers, each of which is trained only to recognize a small piece of a vehicle. We also hope to develop a component extraction process which can be generalized to other classes of objects. Motivation: Heisele [3, 4] was able to classify faces very accurately using a component-based approach and an SVM classifier. We want to extend that technique first to recognize cars, and then other objects. Cars were selected because intuitively, they would seem to have strong characteristic components. Recent work in Agarwal & Roth [1] suggests that a component-based approach is suitable for cars. Previous Work: In Papageorgiou and Poggio [7], cars were detected using overcomplete Haar wavelet features and a support vector machine classifier. Perona et. al. [9] applied a framework for unsupervised feature extraction, based on the use of an ”interest operator” from [2] to select potentially useful feature regions from high-pass filtered images. After correlating and clustering potential features in the positive data set, features of sufficient cluster size were validated in terms of how well they predict the object within a statistical model. Schneiderman and Kanade [8] used features that were probability distributions on one or more levels of wavelet coefficients. Histograms of these single- or multi-level wavelet features were used in a statistical model to detect cars. In addition, separate detectors were trained to recognize eight different car orientations, so that by combining their results the system was tolerant to a large variance in orientation. The idea of using complex, spatially localized component inputs to classify objects is evident to some degree in Perona [9], in the form of the image regions found by the interest operator. Mohan et. al. [6, 5] describes a pedestrian detection system which uses SVM classifiers on Haar wavelet data to detect head, arm, and leg components within specific regions of an image. These classifier outputs are then given to a final SVM, which classifies the pattern as a pedestrian or not. Heisele et al. [3, 4] applies a similar technique to face detection, using histogram-equalized grayscale components. This problem proved particularly suitable to component-based detection, due to the availability of 3D morphable head models for generating synthetic data. Two dimensional images generated from this data have known correspondence points, based on their projection from the 3D model, which can be exploited to automatically extract features around these reference points. Most recently, Agarwal & Roth [1] have attempted automated component extraction and component-based recognition on cars. They used the same interest operator as Perona [9] for selecting fixed-size components. For classification, they generate very large, very sparse binary feature vectors which contain information on these components’ multiplicity and their spatial relationships. They use the SNoW learning system to classify these sparse feature vectors. Approach: The goal of this project is to test a component-based SVM approach, similar to Mohan [6, 5] and Heisele [3, 4], on the domain of car detection. This technique will involve component selection, component classifier training and testing, feature selection for the final SVM, and optimization. Some of the potential benefits of this system include high recognition performance, based on the good results of those two previous component-based SVM systems, and a degree of rotation invariance (as described in Heisele [2]), thanks to minimal changes in specific components under object rotation. For component selection, we have tried manually selecting a small number of example components and then using correlation within windowed constraints to build component training sets. These training sets are handrefined based on their errors and re-correlated. We will also be testing automated extraction techniques, using an 144

interest operator to pick out starting points for component building and ”growing” the components as in Heisele [3, 4]. The architecture of the completed system will be similar to those described in the Mohan and Heisele papers. Components will be searched for within restricted search windows on a particular image. The output SVM will be trained on the output levels of the selected component classifiers, as well as on their position data. Component classifier output levels are fairly straightforward to get, but there are a few options for how to represent the position data. We will be testing deviation from mean location within the search windows (used by Heisele), intercomponent distances (used by Perona), and normalized inter-component distances. The results in [1] suggest that the angle between components may also be important. Impact: We hope to show that SVM component-based classification, which has already been successful at classifying faces, can be applied successfully to other domains. A general technique for component selection will allow this technique to be applied to a broad variety of problems. Future Work: Over the course of this research, we will be trying different techniques at component selection and different metrics for component classification. Afterwards, we will see how well our component selection and classification technique works on other types of objects. Research Support: Office of Naval Research (DARPA) Contract No. N00014-00-1-0907, National Science Foundation (ITR/IM) Contract No. IIS-0085836, National Science Foundation (ITR) Contract No. IIS-0112991, National Science Foundation (KDI) Contract No. DMS-9872936, and National Science Foundation Contract No. IIS9800032. Additional support was provided by: AT&T, Central Research Institute of Electric Power Industry, Center for e-Business (MIT), DaimlerChrysler AG, Compaq/Digital Equipment Corporation, Eastman Kodak Company, Honda R&D Co., Ltd., ITRI, Komatsu Ltd., Merrill-Lynch, Mitsubishi Corporation, NEC Fund, Nippon Telegraph & Telephone, Oxygen, Siemens Corporate Research, Inc., Sumitomo Metal Industries, Toyota Motor Corporation, WatchVision Co., Ltd., and The Whitaker Foundation. References: [1] S. Agarwal and D. Roth. Learning a sparse representation for object detection. In Proc. 7th Europeaon Conference on Computer Vision, volume 4, pages 113–130, Copenhagen, Denmark, 2002. [2] R.M. Haralick and L.G. Shapiro. Computer and Robot Vision II. Addison-Wesley, 1993. [3] B. Heisele, P. Ho, and T. Poggio. Face recognition with support vector machines: global versus componentbased approach. In Proc. 8th International Conference on Computer Vision, volume 2, pages 688–694, Vancouver, 2001. [4] B. Heisele, T. Serre, M. Pontil, and T. Poggio. Component-based face detection. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 657–662, Hawaii, 2001. [5] A. Mohan. Object detection in images by components. AI Memo 1664, Massachusetts Institute of Technology, Cambridge, MA, June 1999. [6] A. Mohan, C. Papageorgiou, and T. Poggio. Example-based object detection in images by components. In IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 23, pages 349–361, April 2001. [7] C. Papageorgiou and T. Poggio. A trainable object detection system: Car detection in static images. AI Memo 1673, Massachusetts Institute of Technology, Cambridge, MA, April 1999. [8] H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, pages 746–751, 2000. [9] M. Weber, M. Welling, and P. Perona. Unsupervised learning of models for recognition. In Proc. 6th European Conference on Computer Vision, volume 1, pages 18–32, Dublin, Ireland, 2000.

145