Fast Human Detection Using Node-Combined Part Detector

2011 18th IEEE International Conference on Image Processing

FAST HUMAN DETECTION USING NODE-COMBINED PART DETECTOR

Song CAO

Genquan DUAN, Haizhou AI

Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

ABSTRACT

Detecting people under occlusion and in articulated poses remains a challenging problem in computer vision. To achieve fast and accurate human detection, this paper proposes the Node-Combined Part Detector (NCPD) Model. We make two major contributions: (1) We propose a novel method, torso-nodes combination, to integrate part detectors. (2) We adopt stable part detectors described by Associated Pairing Comparison Features (APCF) and trained with the Real AdaBoost algorithm. The new algorithm is much faster than previous work while maintaining accuracy competitive with state-of-the-art human detection systems. It also performs better in the low false alarm range. In average time per image, our algorithm achieves a speedup of about 10x over the Deformable Part based Model (DPM) and over 125x over the Poselet Model.

Index Terms: Object Detection, Node-Combined Part Detector, Occlusion, High Articulation

1. INTRODUCTION

Object detection, the task of locating objects in images (e.g. face detection [1] and pedestrian detection [2]), is well studied in computer vision. However, detecting people under occlusion and high articulation remains a big challenge. There are two main difficulties in human detection: 1) Humans are non-rigid objects, which causes variations in contour, shape and color, so it is hard for one holistic classifier to describe all situations and variations. 2) Occlusions arise from a multitude of accessories such as backpacks, clothes and bags, or from other persons and objects. To handle this challenge, part based models have become popular [3] [4] [5]; they can be regarded as providing more variables to describe a highly varied object. But how should these part detectors be selected and trained? How can they be integrated into an efficient, robust human detector?

Various algorithms have been proposed for human detection to deal with occlusion or articulated pose.
The Deformable Part based Model (DPM) [5], based on Histograms of Oriented Gradients (HOG) features [2] combined with the Latent Support Vector Machine (LSVM) training strategy, was proposed for object detection. In it, several part detectors are learned within the model root (a bounding box of the object). The authors established a star model in which each part detector has a deformable position relative to the model root. The inner part detectors give a better description of the inner details of an object, which provides more information for object detection.

Poselet is an innovative work, first proposed in [6], that achieves state-of-the-art results in the detection and segmentation of humans on PASCAL Visual Object Classes (PASCAL VOC) [7]. In Poselet, the authors randomly select patches from the training images as seed poselets (a poselet can be folded hands, occluded legs, hands held up and so on). Each poselet is described by HOG features and trained with a linear SVM. The randomly selected poselet detectors are then clustered, and each makes its own prediction of the potential human location. Many weak, randomly selected poselets together indicate the human position, and the method has achieved state-of-the-art results in PASCAL VOC human detection in recent years. However, two issues exist in the Poselet based detection algorithm. The first is that it is relatively time-consuming, because much of the time is spent on detecting poselets and exploiting context among them. The second is that most of the randomly selected poselet detectors have relatively low accuracy, and most poselets indicate the same body parts, such as the face and head shoulder.

Reviewing progress in detection problems, boosting-trained detectors, e.g. for face detection [1] and pedestrian detection [8], have proven to be efficient and accurate. To achieve a highly efficient detection algorithm, we propose the Node-Combined Part Detector (NCPD) Model, which involves four stable part detectors described by Associated Pairing Comparison Features (APCF) and trained with the Real AdaBoost algorithm. Our approach is an experimental study of AdaBoost based part detectors for human detection.
We consider precise and well-trained part detectors to be the key to real-time human detection under occlusion and high articulation. We pick several stable part detectors and integrate them through the torso-nodes, as demonstrated in Fig. 1. Our stable part detectors should not only have high detection accuracy but also cover most of the poselets used in [4]. Therefore, in our implementation, four stable part detectors (i.e. face, head shoulder, upper body, whole body) are adopted.


We integrate the stable part detectors through torso-nodes to establish our NCPD Model. This new human detection algorithm speeds up the detection procedure significantly while maintaining accuracy competitive with existing state-of-the-art methods.

Fig. 1. NCPD Model. The left image is the structure of our NCPD model. The right image explicitly demonstrates our stable part detectors.

Our contributions are summarized as follows: (1) The Node-Combined Part Detector (NCPD) Model is proposed to integrate stable part detectors with torso-nodes. (2) Stable part detectors are learned by AdaBoost using APCF features, which yields high efficiency in human detection.

The rest of this paper is organized as follows. Section 2 gives an overview of our approach. Section 3 presents the NCPD Model proposed in this paper. Section 4 describes the training methods for our stable part detectors. Quantitative experiments and evaluations on the PASCAL VOC test datasets are carried out in Section 5. Finally, the conclusion and future work are given in Section 6.

2. OVERVIEW OF OUR APPROACH

Our approach contains three main steps. The first step is to train our part detectors. To improve human detection accuracy, we require our part detectors to be robust and to cover parts with few variations. Based on this idea, we train detectors for the face, head shoulder, upper body and whole body, as explained in Sec. 4. The second step is to integrate our stable part detectors into an efficient, robust human detector. We propose the Node-Combined Part Detector (NCPD) Model in Sec. 3.2, where each stable part detector predicts the position of the torso-nodes. Finally, post-processing is performed by non-maximum suppression. Following this procedure, we obtain an efficient human detector that achieves competitive results on several challenging datasets.

3. NODE COMBINED PART DETECTOR (NCPD) MODEL

3.1. Stable Part Detectors

We consider that a human under high articulation and occlusion can be described by many variables. Assuming there are N poselets in the human detection system, where each poselet represents a variable, each person can be described by an N-length vector based on the poselet representation. In a detection problem, however, an N-dimensional space (usually N > 150) is large and makes the detection task considerably more complex. Observing that some variables are redundant and carry the same semantic meaning (e.g. many poselets are similar to the face), we reduce the dimension by using a limited number of principal variables. In practice, we use stable part detectors as the principal variables, since they vary less across highly articulated or occluded humans. Motivated by [3], we define our part detectors to be face, head shoulder, upper body and whole body. These four detectors are stable and suitable for human detection. Even in the Poselet framework, most of the effective poselets are similar to these four body parts; conversely, these four stable parts cover most of the useful poselets when poselets are applied to detection. We also considered adding more detectors, such as legs, left body and right body. However, these parts vary widely and are less discriminative against the background. To achieve high accuracy and efficiency, we do not adopt them in our current algorithm.

3.2. Integration of Stable Part Detectors

In other tree-structured models [5] [9], all parts are integrated through one model root. Based on the empirical observation that the torso is always below the head with few spatial variations, and similar to Pictorial Structures [9] [10], our Node-Combined Part Detector (NCPD) Model takes the torso as its root. However, unlike [10], we adopt a new method, named torso-nodes combination, to integrate our stable part detectors into an efficient, robust human detector.

Our method applies the idea of Hough voting and uses the distribution of the root configuration, instead of the root's spatial center, to integrate the stable part detectors. After running all four stable part detectors, suppose we obtain n part recalls, ranked in descending order of detection score as $P_1, P_2, \ldots, P_n$, so that $P_1$ is the highest-probability part recall. Let $L_i(N_1^i, N_2^i, N_3^i, N_4^i)$ represent the root configuration of part $P_i$. We can regard $L_i$ as the torso-nodes distribution, where each $N_k^i$ is a Gaussian distribution learned from the training dataset. (In our implementation, the four torso-nodes are the left/right shoulders and left/right hips.) We integrate two part detector recalls i and j using the Kullback-Leibler divergence as follows:

$$S_{ij} = \sum_{k=1}^{4} \left[ D_{KL}(N_k^i, N_k^j) + D_{KL}(N_k^i, N_k^j) \right] \qquad (1)$$

where $S_{ij}$ is an integration distance. If $S_{ij}$ is no larger than a threshold, then parts $P_i$ and $P_j$ belong to the same person. We integrate part recalls starting from the highest-scoring one. We adopt this greedy search procedure because it uses the most reliable information first, which gives a computational advantage. The final human detection score is the sum of the scores of all part recalls that belong to one potential human location. In this way, we integrate our stable part detectors under a framework of spatial consistency, using information from the less varied torso-nodes. An example of the integration strategy is shown in Fig. 2.
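The greedy torso-nodes integration can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each torso-node is a 2-D Gaussian with diagonal covariance, that the two terms of Eq. (1) are meant as the two directions of the KL divergence (the printed equation repeats the same term), and that every unassigned part recall is compared against the current highest-scoring seed. All function names, the node offsets in `make_nodes`, and the threshold value are hypothetical.

```python
import math

def kl_diag_gauss(p, q):
    """KL(p || q) for two Gaussians with diagonal covariance.
    Each Gaussian is a (means, variances) pair of equal-length sequences."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total

def integration_distance(nodes_i, nodes_j):
    """Assumed symmetrized form of Eq. (1): sum over the four torso-nodes."""
    return sum(kl_diag_gauss(a, b) + kl_diag_gauss(b, a)
               for a, b in zip(nodes_i, nodes_j))

def greedy_integrate(parts, threshold):
    """parts: list of (score, torso_nodes). Returns (summed_score, member_indices)
    for each potential human, seeded from the highest-scoring recall first."""
    order = sorted(range(len(parts)), key=lambda i: parts[i][0], reverse=True)
    assigned, humans = set(), []
    for i in order:
        if i in assigned:
            continue
        members = [i]
        assigned.add(i)
        for j in order:
            if j not in assigned and \
                    integration_distance(parts[i][1], parts[j][1]) <= threshold:
                members.append(j)
                assigned.add(j)
        humans.append((sum(parts[m][0] for m in members), members))
    return humans

def make_nodes(x, y, var=25.0):
    """Hypothetical torso-node layout: two shoulders and two hips around (x, y)."""
    offsets = [(-20, 0), (20, 0), (-15, 60), (15, 60)]
    return [((x + dx, y + dy), (var, var)) for dx, dy in offsets]

parts = [(3.0, make_nodes(100, 100)),   # e.g. a face recall
         (2.0, make_nodes(102, 101)),   # e.g. an upper body recall, same person
         (1.5, make_nodes(300, 120))]   # a recall far away
humans = greedy_integrate(parts, threshold=4.0)
print(humans)
```

In this toy input the first two recalls predict nearly identical torso-nodes and are merged into one potential human, while the third stays separate.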

Fig. 2. Integration of Stable Part Detectors. Red, yellow and blue bounding boxes show the detection recalls of the face, head shoulder and upper body detectors respectively. Because the torso-nodes distributions of the face and upper body are close, they are integrated into the same potential human location.

4. TRAINING STABLE PART DETECTORS

4.1. Weak Features

HOG features combined with a linear SVM are a classic method in pedestrian detection. They capture gradient information well but have a high computational cost in both memory and time. We consider both gradient and appearance features important in detection, and therefore adopt Associated Pairing Comparison Features (APCF) [8], which have proven very efficient and accurate in pedestrian detection. APCF describes, to some extent, the invariance of color and gradient of an object, and it contains two essential elements: Pairing Comparison of Color (PCC) and Pairing Comparison of Gradient (PCG). A PCC is a Boolean color comparison of two granules and a PCG is a Boolean gradient comparison of two granules, where a granule is a square window patch. For more details, please refer to [8].

4.2. The Training Algorithm

Real AdaBoost [11] is used to learn a Nested Cascade Detector [12] for part detection. Interested readers are referred to [11] [12] for more details.
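To make the two ingredients above concrete, the sketch below shows a Boolean comparison of two granules and a confidence-rated weak classifier in the style of Real AdaBoost [11], where each feature bin b outputs 0.5 ln(W+[b] / W-[b]). It is a simplified illustration under several assumptions: the comparison is done on mean grayscale intensity only (the gradient comparison is omitted), several comparisons are packed into one bin index, and all names, the `margin` parameter and the smoothing constant `eps` are hypothetical. See [8] and [12] for the actual feature definition and cascade training.

```python
import math

def granule_mean(image, x, y, size):
    """Mean value of a size x size square patch with top-left corner (x, y)."""
    total = 0.0
    for row in image[y:y + size]:
        total += sum(row[x:x + size])
    return total / (size * size)

def pairing_comparison(image, g1, g2, margin=0.0):
    """Boolean comparison of two granules, each given as (x, y, size)."""
    return granule_mean(image, *g1) >= granule_mean(image, *g2) + margin

def feature_bin(image, pairs):
    """Pack several Boolean comparisons into one integer bin index."""
    index = 0
    for g1, g2 in pairs:
        index = (index << 1) | int(pairing_comparison(image, g1, g2))
    return index

def real_adaboost_lut(bins, labels, weights, n_bins, eps=1e-6):
    """Confidence-rated weak classifier as a look-up table over feature bins."""
    w_pos = [0.0] * n_bins
    w_neg = [0.0] * n_bins
    for b, y, w in zip(bins, labels, weights):
        if y > 0:
            w_pos[b] += w
        else:
            w_neg[b] += w
    return [0.5 * math.log((wp + eps) / (wn + eps))
            for wp, wn in zip(w_pos, w_neg)]

# Two toy 4x4 "images": bright-over-dark (positive) and dark-over-bright (negative).
image_a = [[9] * 4, [9] * 4, [1] * 4, [1] * 4]
image_b = [[1] * 4, [1] * 4, [9] * 4, [9] * 4]
pairs = [((0, 0, 2), (0, 2, 2)), ((2, 0, 2), (2, 2, 2))]

bins = [feature_bin(image_a, pairs), feature_bin(image_b, pairs)]
lut = real_adaboost_lut(bins, labels=[+1, -1], weights=[0.5, 0.5], n_bins=4)
print(bins, lut)
```

Bins that received only positive weight get a positive confidence, bins with only negative weight get a negative one, and bins that were never observed output zero.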

5. EXPERIMENTS

We use the PASCAL VOC 2009 training dataset for training, where we annotated the positions of the four stable parts and torso-nodes. To demonstrate the effectiveness and efficiency of our NCPD Model, we run experiments on the PASCAL VOC test datasets, using the same criterion as the PASCAL VOC detection competition: a detection is regarded as a true positive only if the ratio of its overlap area to union area with the ground truth exceeds 50%. However, unlike previous work on the Deformable Part based Model (DPM) and Poselet, we do not use a bounding box adjustment strategy as a post-processing procedure, although this adjustment is reported to improve detection average precision by about 1% to 3%. All experiments are run on a computer with an Intel Core 2 at 2.63GHz and 4GB RAM.

Performance comparison. We compare detection accuracy with two of the best human detection methods, Deformable Part based Model (DPM) and Poselet. The comparison with our NCPD Model is shown in Fig. 3. These ROC curves are based on the part of the PASCAL test datasets that was released with annotations. Our NCPD Model gives a somewhat higher detection rate (by about 5%) than the existing methods. We achieve better detection accuracy than Poselet on PASCAL VOC 2008 and 2010, and similar performance on PASCAL VOC 2009. However, we do not outperform DPM on PASCAL VOC 2010.

Speed comparison. We measure the speedup rate of our model. The average times per image for each model and the NCPD model speedup rates are summarized in Table 1 and Table 2, where the PASCAL VOC 2008, 2009 and 2010 test datasets were used. The results show that Poselet is a time-consuming method. Our NCPD Model achieves a speedup of about 10x over DPM and over 125x over Poselet. Cascade DPM [13] has already improved the speed of DPM; however, our method still reaches a speedup of about 2x over it. Moreover, as reported in [13], cascade DPM may suffer a loss in accuracy compared with the original version of DPM in order to achieve high efficiency.

Fig. 4 shows some results comparing Poselet with our method. Our method deals better with occlusion and articulated pose than Poselet (e.g. (a)(b) in Fig. 4). Our NCPD model also shows its effectiveness when integrating part detectors (e.g. (c)(d) in Fig. 4). The torso-nodes combination idea helps us obtain higher performance at low false alarm rates by effectively integrating our boosted stable part detectors. We therefore achieve a fast and accurate human detection algorithm using our NCPD model.


Table 1. Average time per image for different models.

PASCAL test dataset   2008     2009     2010
Poselet               112s     118s     121s
DPM                   8.95s    9.03s    9.01s
NCPD model            0.89s    0.87s    0.93s
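The speedup rates reported for the NCPD model follow directly from these timings, as the ratio of a baseline's average time per image to that of the NCPD model. The short sketch below recomputes them from the Table 1 values; the dictionary layout and function name are illustrative only.

```python
# Average time per image in seconds, copied from Table 1.
avg_time_s = {
    "Poselet": {2008: 112.0, 2009: 118.0, 2010: 121.0},
    "DPM": {2008: 8.95, 2009: 9.03, 2010: 9.01},
    "NCPD": {2008: 0.89, 2009: 0.87, 2010: 0.93},
}

def speedup(baseline, year):
    """Speedup of the NCPD model over a baseline on one test dataset."""
    return avg_time_s[baseline][year] / avg_time_s["NCPD"][year]

for name in ("Poselet", "DPM"):
    rates = [round(speedup(name, year), 1) for year in (2008, 2009, 2010)]
    print(name, rates)
# Poselet [125.8, 135.6, 130.1]
# DPM [10.1, 10.4, 9.7]
```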


Fig. 3. ROC curves comparison for three different models. (a) PASCAL 2008 dataset (197 pictures, 412 annotations). (b) PASCAL 2009 dataset (72 pictures, 162 annotations). (c) PASCAL 2010 dataset (505 pictures, 737 annotations).

Table 2. NCPD model speedup rate (based on average time per image).

PASCAL test dataset   2008     2009     2010
cf. Poselet           125.8x   135.6x   130.1x
cf. DPM               10.1x    10.4x    9.7x
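The ROC curves count a detection as a true positive under the PASCAL VOC overlap criterion used in the experiments: the ratio of intersection area to union area between the detected and annotated boxes must exceed 50%. A minimal sketch of that check is given below. It assumes boxes are given as (x1, y1, x2, y2) in continuous coordinates, without the one-pixel offset some toolkits add; the function names are hypothetical.

```python
def box_area(box):
    """Area of a box given as (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def overlap_ratio(a, b):
    """Intersection area divided by union area of two boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    return inter / (box_area(a) + box_area(b) - inter)

def is_true_positive(detection, ground_truth, threshold=0.5):
    """A detection counts only if the overlap ratio exceeds the threshold."""
    return overlap_ratio(detection, ground_truth) > threshold

ground_truth = (0, 0, 10, 10)
print(is_true_positive((0, 0, 10, 6), ground_truth))    # overlap 0.6
print(is_true_positive((0, 0, 10, 5), ground_truth))    # overlap exactly 0.5
print(is_true_positive((20, 20, 30, 30), ground_truth)) # disjoint boxes
```

Note that a detection with an overlap of exactly 0.5 is rejected in this sketch, because the criterion is written as a strict inequality.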

Fig. 4. Detection Results. The first row shows the detection results of Poselet. The second row shows the detection results of our NCPD model.

6. CONCLUSION

In this paper, we focus on human detection under occlusion and high articulation, which remains a challenging problem in computer vision. We propose the Node-Combined Part Detector (NCPD) Model, which integrates stable part detectors using less varied torso-nodes into an efficient and robust human detector. Unlike most previous part based work, we use AdaBoost with APCF features to train our part detectors. Our approach performs well under occlusion and high articulation, and it demonstrates competitive detection accuracy and fast speed for human detection. We conclude that the model described in this paper for detecting people is equally applicable to other object categories. This is the subject of ongoing research.

7. ACKNOWLEDGEMENT

This work is supported by National Science Foundation of China under grant No.61075026.

8. REFERENCES

[1] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proc. CVPR, 2001.
[2] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. CVPR, 2005.
[3] G. Duan, H. Ai, and S. Lao, “A structural filter approach to human detection,” in Proc. ECCV, 2010.
[4] L. Bourdev, S. Maji, T. Brox, and J. Malik, “Detecting people using mutually consistent poselet activations,” in Proc. ECCV, 2010.
[5] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, 2010.
[6] L. Bourdev and J. Malik, “Poselets: Body part detectors trained using 3D human pose annotations,” in Proc. ICCV, 2009.
[7] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision, vol. 88, no. 2, 2010.
[8] G. Duan, C. Huang, H. Ai, and S. Lao, “Boosting associated pairing comparison features for pedestrian detection,” in Proc. ICCV Workshop, 2009.
[9] P. Felzenszwalb and D. Huttenlocher, “Pictorial structures for object recognition,” International Journal of Computer Vision, vol. 61, no. 1, pp. 55–79, 2005.
[10] M. Andriluka, S. Roth, and B. Schiele, “Pictorial structures revisited: People detection and articulated pose estimation,” in Proc. CVPR, 2009.
[11] R. E. Schapire and Y. Singer, “Improved boosting algorithms using confidence-rated predictions,” Machine Learning, pp. 297–336, 1999.
[12] C. Huang, H. Ai, B. Wu, and S. Lao, “Boosting nested cascade detector for multi-view face detection,” in Proc. ICPR, 2004.
[13] P. Felzenszwalb, R. Girshick, and D. McAllester, “Cascade object detection with deformable part models,” in Proc. CVPR, 2010.
