2012 15th International IEEE Conference on Intelligent Transportation Systems Anchorage, Alaska, USA, September 16-19, 2012

Nighttime Pedestrian Detection by Selecting Strong Near-Infrared Parts and an Enhanced Spatially Local Model

Yi-Shu Lee, Yi-Ming Chan, Li-Chen Fu, Fellow, IEEE, Pei-Yung Hsiao, Li-An Chuang, Yi-Hsiang Chen and Ming-Fang Luo

Abstract—We propose a nighttime pedestrian detection method for a moving vehicle equipped with a camera and near-infrared lighting. Objects in the nighttime environment reflect the projected infrared light; in some cases, however, clothes absorb most of the infrared, leaving the corresponding part of the pedestrian invisible. To deal with this, we use a part-based pedestrian detection method built on feature points marked on parts. Because the computational load is high, selecting effective parts becomes imperative. In this work, we analyze the relation between the detection rate and processing time and the number and type of parts. In addition, traditional training of a part detector normally requires a large number of occlusion samples. To overcome this problem, we learn the spatial relationship between every pair of parts, so that the confidence of the detected parts can be enhanced even when some parts are occluded. To refine pedestrians after detection, we use two filters and a segmentation method to verify their bounding boxes. The proposed system is verified by experiments, and appealing results are demonstrated.

I. INTRODUCTION

Pedestrian detection is an important topic in the field of intelligent transportation systems. Its purpose is to alert drivers when pedestrians are near the vehicle, thereby reducing the probability of accidents. Since pedestrians are hard to observe at nighttime, detecting them is even more critical for driver assistance than in the daytime. Illumination at night generally comes from artificial sources such as streetlamps or vehicle headlights. Because nighttime illumination is weaker than daylight, many nighttime pedestrian detection (NPD) systems use an infrared camera together with extra illumination. There are two types of infrared cameras: the far-infrared (FIR) camera and the near-infrared (NIR) camera. The NIR camera offers higher resolution, higher dynamic range, less noise, and lower cost than the FIR camera, so the NIR camera with extra illumination has become the more popular choice in NPD systems.

Yi-Shu Lee, Yi-Ming Chan, Li-Chen Fu, Li-An Chuang and Yi-Hsiang Chen are with the Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, ROC. Ming-Fang Luo is with the Chung-Shan Institute of Science and Technology, Taiwan, ROC. Pei-Yung Hsiao is with the Department of Electrical Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, ROC. This research is partially sponsored under the project NSC99-2221-E-390-039-MY3. 978-1-4673-3063-3/12/$31.00 ©2012 IEEE

There are several difficulties in nighttime pedestrian detection: a) High contrast. Some clothing materials absorb infrared light, making parts of the pedestrian invisible in the image and causing a partial-occlusion problem, as shown in Figure 1(a). b) Pose variations. Different poses lead to intra-class appearance variations (Figure 1(b)). c) Occlusion. A pedestrian may be occluded by other pedestrians or objects; methods that rely on the appearance of the entire body tend to fail in these cases.

Human detection methods can be categorized into two types: holistic-based and part-based [1]. Holistic-based methods encode the entire human body into the feature space [2-10]. On the one hand, since they assume the entire body is visible, they are not suitable for occluded cases; on the other hand, they have difficulty handling gesture variations. Gavrila [11] clusters training samples by human posture and then trains a holistic detector for each posture. Nevertheless, the number of possible postures is too large to train holistic detectors for all situations. To address these problems, part-based detectors have been proposed [12-18]. They divide human images into several parts and train a detector for each part. Part-based methods obtain better detection results than holistic-based methods [1] in partially occluded cases, because a part detector does not require information about the entire body.

Part selection is an important issue for a part-based detection method. Learning-based methods select meaningful parts that are frequently detected in the feature space. Some methods define parts at specific locations in the training data [14]. Felzenszwalb et al. [15] describe a mixture of multi-scale deformable part models, which treats unknown part positions as latent variables in an SVM framework. Nevertheless, this kind of part selection does not model the relation between parts: it may combine a pair of side-view legs with a front-view head, and that combination leads to false results. Therefore, a better part selection method is needed.

Bourdev and Malik [17, 18] proposed a novel part-based detection method named "poselet". It includes a part selection method that addresses the above problems and outperforms Felzenszwalb et al. [15] in the person category of the PASCAL VOC challenges. They mark keypoints on the human body and define a poselet as a human body part with a specific keypoint configuration; for each poselet, a detector is trained on its appearance features. This approach, however, provides neither a way to trim the number of poselets nor a way to detect partially occluded pedestrians. To deal with those problems, we employ


different keypoint annotation and part selection methods in the classifier construction, and also propose a new spatial relation model to facilitate the required enhancement. The main contributions of this paper are: a) a strong human part selection method for the near-infrared environment that covers most human walking postures (Figure 1(b)) and improves detector performance, and b) a model that enhances the spatial relations between parts to address the challenge of occlusions.

The system flowchart is shown in Figure 2. In the training phase, rich informative parts are selected to represent a pedestrian, and the training samples of each part are collected. We encode those patches with the HOG feature, use a linear SVM to train the part classifiers, and then learn the spatial relationships between parts. Finally, a subset of the parts is selected as the final features of the pedestrian. In the detection phase, the first step is part detection; the detected parts are then grouped by their spatial relationships, and the human confidence of each group is evaluated.

This paper is organized as follows. Section II presents our strong human part selection method. Section III presents the position relation model. Section IV presents our two filters. Section V shows the experimental results, and the last section concludes.

II. STRONG NEAR-INFRARED PART SELECTION

We define a part as a fixed-ratio rectangular region containing some keypoints in a positive training sample. To choose parts, the works in [17, 18] select many random windows from the training set and then prune the set of part candidates. Nevertheless, there are better ways to choose parts and keypoint types for nighttime pedestrian detection. Two issues arise when choosing candidate parts: (1) Which parts are better? and (2) How many parts are enough?
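The per-part training step described above (a HOG descriptor fed to a linear SVM) can be illustrated with a minimal sketch. In practice one would use a real HOG extractor and an SVM package; here a plain sub-gradient hinge-loss trainer and toy two-dimensional "features" stand in for both, so all data and names below are hypothetical:

```python
import random

def train_linear_svm(samples, labels, epochs=200, lr=0.01, C=1.0):
    """Train a linear SVM by sub-gradient descent on the hinge loss.

    samples: feature vectors (stand-ins for HOG descriptors);
    labels: +1 for the part, -1 for background.
    """
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin: hinge loss is active
                w = [wi + lr * (C * y * xi - wi) for wi, xi in zip(w, x)]
                b += lr * C * y
            else:           # only the regularizer shrinks w
                w = [wi - lr * wi for wi in w]
    return w, b

def score(w, b, x):
    """Signed decision value, used as the part-detector confidence."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Toy separable data: "part" patches have a large first feature.
random.seed(0)
pos = [[1.0 + random.random(), random.random()] for _ in range(20)]
neg = [[-1.0 - random.random(), random.random()] for _ in range(20)]
w, b = train_linear_svm(pos + neg, [1] * 20 + [-1] * 20)
assert all(score(w, b, x) > 0 for x in pos)
assert all(score(w, b, x) < 0 for x in neg)
```

Each of the 2000 candidate parts gets its own such classifier, and the detection confidence is the signed decision value.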
In this section, we explain how to choose parts and analyze the relation between the number of parts and the performance.

A. Rich Informative Part Candidate Generation
Similar to the H3D dataset [18], we use 2D keypoints to define the human pose. In our positive training dataset, humans are annotated with 9 types of keypoints (eyes, ears, nose, shoulders, elbows, wrists, hips, knees, and ankles). Except for the nose, there are two keypoints of every type, so there are 17 keypoints on a pedestrian in total. Most of the keypoints lie on joints, and we can easily sketch the human skeleton from them (Figure 4(a)). The features of


Figure 2 The system overview.
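The detection phase summarized in the flowchart (detect parts, group them by spatial consistency, then score each group) can be sketched as follows. The consistency test here is a deliberate simplification of the paper's Gaussian position-relation model (Section III), and the detections, names, and tolerances are hypothetical:

```python
def consistency(det_a, det_b, tol=10.0):
    """Fraction of shared keypoints whose predicted positions agree
    within `tol` pixels -- a simplified stand-in for the paper's
    Gaussian position-relation model."""
    shared = det_a["kps"].keys() & det_b["kps"].keys()
    if not shared:
        return 0.0
    ok = sum(1 for k in shared
             if abs(det_a["kps"][k][0] - det_b["kps"][k][0]) <= tol
             and abs(det_a["kps"][k][1] - det_b["kps"][k][1]) <= tol)
    return ok / len(shared)

def group_parts(dets, thresh=0.5):
    """Greedily group mutually consistent part detections;
    each resulting group is one pedestrian hypothesis."""
    groups = []
    for d in dets:
        for g in groups:
            if all(consistency(d, other) >= thresh for other in g):
                g.append(d)
                break
        else:
            groups.append([d])
    return groups

# Two pedestrians: parts of the same person predict nearby keypoints.
dets = [
    {"name": "head", "kps": {"shoulder": (100, 50), "hip": (100, 120)}},
    {"name": "legs", "kps": {"hip": (103, 118), "knee": (101, 160)}},
    {"name": "head", "kps": {"shoulder": (300, 55), "hip": (301, 125)}},
]
groups = group_parts(dets)
assert len(groups) == 2
assert len(groups[0]) == 2 and len(groups[1]) == 1
```

The head and leg detections of the first pedestrian agree on where the hip should be, so they merge; the second head predicts keypoints far away and forms its own group.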

a NIR pedestrian are usually bilaterally symmetric, as shown in Figure 4(b). If we regarded left and right keypoints as different keypoints, we might fail to group a part because its facing direction is detected incorrectly; thus, we do not differentiate left from right keypoints. To choose effective parts, Bourdev et al. randomly select 500 windows (seed windows) from the training set [17, 18]. To reduce the search space of parts, we constrain where seed windows are chosen. In the nighttime environment, the images captured by a NIR camera are noisy and textureless, but some silhouettes can be found in most nighttime pedestrian videos, such as the omega shape of the head and shoulders. Figure 3 shows three typical parts of a pedestrian. In our previous work [19], we separated the pedestrian into three parts. To spread seed windows over distinct contour features, we choose windows around three regions: the head-shoulder region, the body region, and the leg region. The regions overlap because the shoulder and hip areas contain rich information. We select 2000 random seed windows from the training set around these three regions. For each seed window, we extract patches from other training examples whose keypoint configuration is similar to the seed window's, as measured by the Procrustes distance. Since we regard left and right keypoints as the same, we collect training samples with a similar gesture structure but different facing directions.

B. Strong Part Classifier Construction and Selection
To detect parts in an image, we construct a HOG-SVM classifier for each part. To improve the performance and to


Figure 1 Nighttime pedestrian detection challenges. (a) An example of clothes absorbing infrared rays: the woman's clothes and pants absorb the infrared, making those parts invisible in the image. (b) Examples of human walking postures: when the viewpoint changes, the appearance of the legs differs considerably.

Figure 3 Three main contours of a pedestrian (from the original image): head-shoulder, body, and leg.
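The Procrustes-style comparison used when collecting training patches for a seed window can be sketched as below. This simplified version normalizes translation and scale only (rotation is omitted, since pedestrian parts are compared in an upright frame); the paper's exact variant may differ, and the coordinates are hypothetical:

```python
import math

def procrustes_distance(pts_a, pts_b):
    """Distance between two keypoint configurations: center each set
    on its centroid, scale to unit RMS size, then report the residual
    RMS distance between corresponding keypoints."""
    def normalize(pts):
        n = len(pts)
        cx = sum(x for x, _ in pts) / n
        cy = sum(y for _, y in pts) / n
        centered = [(x - cx, y - cy) for x, y in pts]
        scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
        return [(x / scale, y / scale) for x, y in centered]
    a, b = normalize(pts_a), normalize(pts_b)
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(a, b)) / len(a))

# A window whose keypoints are a shifted, scaled copy of the seed
# matches it exactly; a differently shaped configuration does not.
seed    = [(0, 0), (10, 0), (5, 20)]
shifted = [(100, 50), (120, 50), (110, 90)]   # seed * 2, translated
other   = [(0, 0), (10, 0), (5, 5)]
assert procrustes_distance(seed, shifted) < 1e-9
assert procrustes_distance(seed, other) > 0.1
```

Patches whose distance to the seed window falls below a threshold are collected as positive training samples for that part.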

reduce false positive cases, we propose a refinement algorithm to select a proper subset of the 2000 part candidates. In [17], a greedy search chooses a subset of poselets that have high cross-validation scores while not being close to other poselets in configuration space. In [18], the poselet covering the most examples is chosen first, and poselets covering the most not-yet-covered examples are then added incrementally. When the number of poselets is large enough, these methods can find appropriate poselets; but when the number of parts is reduced greatly, e.g., selecting 20 representative parts out of 2000, they may fail to keep a good detection rate. We therefore analyze part performance to select an appropriate number of parts subject to recall, precision, and processing time.

There are two phases in the part classifier selection procedure. In the first phase, we use the 2000 detectors to scan our training samples and compare the keypoint configuration of each detected window with the ground truth (Figure 5). We developed a Procrustes-like procedure to measure the difference between each part p and each detected sample d_i from the positive training dataset. After sorting the parts by accuracy, we group parts with the same keypoints and similar configurations. To handle the occlusion problem in the nighttime system, in the second phase we add the constraint that the selected parts must differ sufficiently from one another. We iteratively choose the final set of part classifiers according to the following equations:

p_i = argmax_p [ rank_a(p) + rank_d(p, S) ]    (1)
S = S ∪ { p_i }                                (2)

where rank_a(p) is the rank of part p in the accuracy-sorted list, rank_d(p, S) is the difference between p and the rest of the parts in the set S, and p_i is the ith selected part. Combining these two rank functions takes both the coverage and the accuracy into account, and thus improves the performance of the final classifiers.

Figure 5 Performance analysis comparing detected parts against ground-truth keypoint locations. The red points are the ground-truth keypoints; the green boxes and points are the detected parts and their keypoints. (a) A successfully matched part. (b) A part that fails to match.

III. ENHANCED SPATIALLY LOCAL MODEL

Figure 4 (a) An example of the 17 keypoints on a pedestrian and the skeleton sketched from them. (b) The left and right features of a pedestrian; from top left to bottom right: front, rear, left, and right views.

We propose a representation of the relation between parts without explicit connections. We add a new position relation model to record the relationship between two parts. It contains a keypoint distribution model for the keypoint locations within a part, and a location model for the spatial relationship between parts.

A. Position Relation Model Construction
We calculate the mean and covariance of the keypoints to model the probability that a keypoint appears at a specific position. The uncertainty of an interior keypoint i of a part is modeled by a 2D Gaussian:

inkp_i = (x_i, y_i, cov_i)    (3)

To learn the spatial relation between two parts, we first learn the vector between the part and each outer keypoint i:

outkp_i = (x_i, y_i, w_i)    (4)

where w_i is the weight of the keypoint vector. Figure 6 shows an example keypoint distribution model for a front head part. For every part, we estimate the relationship between the part and the keypoints that do not appear in it. To keep the model simple, we cluster similar relation vectors and represent each cluster by its median vector. Each cluster's weight is the ratio of the number of keypoints within the cluster to the total number of outer keypoints.

B. Grouping Parts by Their Spatial Relationship
A detected part provides the interior keypoint information and the probability of other keypoints occurring at specific locations. When two parts match according to their position models, we call them "consistent". The consistency between two detected parts α and β is defined as follows:

cons(α, β) = (1/2) [ Σ_{kp_i ∈ kp_α} match_{β,α}(i) / n(α) + Σ_{kp_j ∈ kp_β} match_{α,β}(j) / n(β) ]    (5)

where match_{β,α}(i) and match_{α,β}(j) are the maximum probabilities of the ith and jth keypoints as predicted by β and α, and n(α) and n(β) are the numbers of keypoints in parts α and β. The consistency between two parts should be symmetric; therefore, it is defined as the average of the matching scores from α to β and from β to α. Given a keypoint in the part α, the score of matching the part β's location vector or keypoint prediction can be expressed as:



Figure 7 (a) The original segmentation result. (b) The enhanced segmentation result using the road area filter.

Figure 6 A position relation model example for the head-shoulder part, represented by the blue rectangle in the image. (a) A 2D Gaussian model records the position of every interior keypoint. (b)(c)(d) The vectors to the outer keypoints. The orange point is the centroid of the part's keypoints.





 P kpi ,  xi , yi  if kpi  kp  kp  match ,  (i )   P kpi ,  xj , yj   wj otherwise max  j

 

 

where DI represents the distance image contributed by T

(6)

where is the Gaussian probability with . Two detected parts are grouped together when their consistency is greater than a threshold. The consistency between two part groups is defined as the average of the consistency between parts in different groups. Thus, at the end there will be several groups of parts produced; each group belongs to a pedestrian. IV. NIGHTTIME PEDESTRIAN DETECTION WITH NON-PEDESTRIAN FILTER In order to reduce the false alarm rate of detecting pedestrian by the Near-Infrared Parts, we filter out regions of non-pedestrian using light source filter and the road area filter. A. Light Source Filter There are many non-pedestrian bright objects in nighttime video, such as street lights, traffic signs, and headlights. In order to reduce the computational complexity and improve the stability of detection results, the light source filter has been devised to efficiently eliminate these non-pedestrian objects before the part detection step. It removes very bright areas with aspect ratio close to 1. Given a binary segmentation result of the input image with high threshold, we calculate the aspect ratio of every connected contour. If aspect ratio closes to 1, then we regard the region as ROI and to take the contour. We will determine whether the contour is circular or not. We apply the Chamfer distance transform [20] to calculate the distance between two contours. Assume IC is a contour of light candidate and IT is the circular template contour of IC. The transformation algorithm first finds a distance image for IT where each pixel value in the distance image denotes the distance to the nearest contour pixel. The next step is to superimpose IC selected as a corresponding contour over the distance image and then to sum up the values each of which is associated with the location corresponding to the contour pixel. The contour distance between IC and IT is defined as follow: ContourDistance( I C , DI T ) 

1 N



Ptk I C

DI T  Ptk 

the circular template IT, N is the number of contour points in the light candidate IC, and Ptk is an contour point belongs to light candidate IC. For consistency, we transform the distance measure into the probability form by a Gaussian function to limit the range from 0 to 1 and make the higher probability stands for the shorter distance. B. Road Area Filter Road information is a strong evidence of non-pedestrian. We assume that each pedestrian stands on the road surface, the bottom of the detection window contains road information. To each detection result R, we calculate the mean intensity of the bottom 1/5 of the R as the road surface intensity information. ∑ (

(

)

(8)

)

where r is the bottom 1/5 of video region R, and the is the mean road surface intensity. We set only the last row of R as road surface at the beginning, and then we update each pixel iteratively until there are no more pixels can be updated: ( )

{

| ( ) |( )

(

)|

|

(9)

where p is a pixel of detection result R, is the neighbor pixel with the most similar intensity to p. are the pre-defined thresholds to control the similarity of intensity. If the threshold is larger, the tolerable intensity difference between two pixels can be larger. In our case, and . We mask the intensity of road surface, to make the final output result better. After filtering the road area, we adopted a robust local thresholding segmentation approach [19] in the post-processing stage. Instead of only using horizontal projection, Lin et al. consider both vertical and horizontal projection at the same time. After obtaining the separated binary result image, we use connected component algorithm to get each final detection result. Figure 7 is an example segmentation result before and after applying the road area filter. V. EXPERIMENTS


The NPD experiments are presented in this section. First, we describe the environment setting, including the NIR camera

setting and the specifications of the videos. Next, part performance is evaluated via true positives, false positives, and false negatives. Finally, our work is compared with others.

A. Environment Setting
The nighttime pedestrian detection system uses two near-infrared projectors and one near-infrared camera, mounted at the front of the vehicle at the same height as the headlights. The resolution of the test images is 320×240 pixels. Most pedestrians in our dataset are upright or walking. Some cases of partial occlusion occur, including pedestrians walking across the road and clothes absorbing infrared rays. The detection system is tested off-line on our video sequences.
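The recall and precision used throughout the evaluation follow the standard definitions from true positives, false positives, and false negatives; a minimal helper, with made-up counts for illustration:

```python
def recall_precision(tp, fp, fn):
    """Recall = found pedestrians / all pedestrians;
    precision = correct detections / all detections."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Hypothetical example: 96 of 100 pedestrians found, 4 false alarms.
r, p = recall_precision(tp=96, fp=4, fn=4)
assert abs(r - 0.96) < 1e-9
assert abs(p - 0.96) < 1e-9
```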


B. Part Performance Analysis
Our positive dataset contains 952 upright pedestrians with keypoint annotations: 477 side-view, 199 back-view, and 176 front-view pedestrians. There are no occlusion cases in the training dataset. The negative dataset contains 539 images, of which 453 are from the INRIA dataset and 86 are from our own videos without pedestrians. The final combination of selected parts depends on how many parts we select; we want to cover the important appearance information while reducing the number of selected parts. If we choose fewer than 15 parts, we require more than 6 keypoints in each part; for 15 to 30 parts, more than 5 keypoints; for 30 to 50 parts, more than 4 keypoints; and when the number of parts is greater than 50,

Figure 8 Recall rate, precision rate, and processing time (fps) of the detector for different numbers of combined parts (5, 10, 15, 20, 35, 55, 80, and 110).

more than 3 keypoints in each part. In the distribution of the selected parts, about 3/5 come from the head-shoulder and body regions and 2/5 from the leg region. We analyze the recall, precision, and processing time for 5, 10, 15, 20, 35, 55, 80, and 110 parts (Figure 8), and select 15 parts for our final classifier (Figure 9).

C. Video Database and Detection Results
Our testing video dataset contains 8 videos with 6832 frames. The videos are shot in a basement and on urban roads with trees, covering many kinds of scenes: uniform environments, overexposure, low illumination, and full and partial occlusions. A partially occluded case is shown in Figure 10. Videos 1, 4, and 5 contain side-view pedestrians walking in front of the camera at distances of 20 to 40 meters. Videos 2, 3, and 6 contain front-view and back-view pedestrians at distances of 14 to 50 meters. Video 3 contains cases with major occlusions by other pedestrians and cases of clothes absorbing infrared light. Videos 5, 6, 7, and 8 are captured with a moving background. Except for Video 4, which is shot in the basement, all scenes are captured on urban roads. In the occlusion cases, if the overlap between two pedestrians is greater than half of a pedestrian's area, we skip the occluded pedestrian. We use recall and precision to analyze system performance: higher recall means the system finds more of the pedestrians in the testing dataset, and higher precision means it generates fewer false detections.

The performance of our system on our video dataset is shown in Table 1. In Videos 1 and 2, the intensity differences against the background are high, so segmentation is easy and both recall and precision are high. In Video 3, some women are partially occluded because their clothes absorb infrared rays and they also overlap with other pedestrians, which makes torso prediction hard. Video 4 is shot in the basement, where the cluttered background with complicated pipelines makes part detection hard. Videos 5 and 6 are shot in a dynamic environment with lights, trunks, and obstacles, so precision is easily affected by the cluttered background. Video 7 is shot from the side of the pedestrians. In Video 8, most of the scene has low illumination, which makes part detection hard. Some detection results are shown in Figure 11, and Figure 12 shows a comparison of ROC curves. In the occlusion and partial-occlusion cases, our detection results outperform the other methods.

VI. CONCLUSION

Figure 9 Some of the selected parts. These parts cover most walking postures, as in Figure 1(b).
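The part-count analysis of Figure 8 amounts to choosing an operating point: the smallest number of parts whose recall and precision still clear the desired floors, since fewer parts means faster detection. A sketch of that selection follows; the numbers are purely illustrative, not the paper's measurements:

```python
def pick_part_count(candidates, min_recall, min_precision):
    """Return the smallest part count whose recall and precision both
    clear the given floors, or None if no candidate qualifies."""
    for n, recall, precision, fps in sorted(candidates):
        if recall >= min_recall and precision >= min_precision:
            return n
    return None

candidates = [
    # (parts, recall, precision, fps) -- hypothetical values
    (5,  0.70, 0.80, 25.0),
    (10, 0.85, 0.88, 18.0),
    (15, 0.93, 0.93, 12.0),
    (35, 0.94, 0.93, 5.0),
]
assert pick_part_count(candidates, 0.90, 0.90) == 15
assert pick_part_count(candidates, 0.99, 0.99) is None
```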

Figure 10 (a) A case partially occluded by clothes that absorb infrared. (b) A case with partial and heavy occlusion.

We present a part-based pedestrian detection system for the nighttime environment. A pedestrian is represented by a set of human body parts, each with a specific and concise keypoint configuration. For each part, positive training samples are found in the training dataset with the same keypoint configuration; an appearance-based feature is used to train the part classifier, and the spatial relations between this part and the others are then learned. The feature selection procedure reduces the number of parts. In

TABLE 1 SYSTEM PERFORMANCE.

           Frames   Type                         Recall   Precision
Video1     450      Side, 40 m                   0.964    0.965
Video2     937      Front, 50-14 m               0.998    0.991
Video3     301      Front, 40-20 m               0.916    0.883
Video4     752      Side, 20 m                   0.914    0.919
Video5     228      Side, dynamic view           0.875    0.952
Video6     226      Back, dynamic view           0.996    0.906
Video7     1710     Dynamic distance and angle   0.872    0.947
Video8     2228     Dynamic distance and angle   0.935    0.873
Average                                          0.934    0.930
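As a consistency check, the Average row of Table 1 can be reproduced as the unweighted mean of the eight per-video rows:

```python
# Per-video (recall, precision) pairs from Table 1, Videos 1-8.
rows = [(0.964, 0.965), (0.998, 0.991), (0.916, 0.883), (0.914, 0.919),
        (0.875, 0.952), (0.996, 0.906), (0.872, 0.947), (0.935, 0.873)]
avg_recall = sum(r for r, _ in rows) / len(rows)
avg_precision = sum(p for _, p in rows) / len(rows)
# Matches the reported Average row (0.934 / 0.930) to within rounding.
assert abs(avg_recall - 0.934) < 0.001
assert abs(avg_precision - 0.930) < 0.001
```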

Figure 11 Some detection results on Videos 1-8; the yellow rectangles are the bounding boxes predicted by our system.

Figure 12 The ROC curves of the experiment under partial-occlusion cases.

our experiments, 15 strong parts are sufficient for our scenarios; reducing the number of parts maintains high detection efficiency. To detect pedestrians in an image, light sources are filtered out first; the parts are then matched against the image and grouped by their spatial relations, and each group of parts represents one pedestrian. We propose a road area filter and a segmentation method to refine and verify the final bounding boxes. In the future, finer part categorization methods will be studied; as in [15], a coarse-to-fine scheme may help improve the efficiency of the detector.

REFERENCES

[1] D. Geronimo, A. M. Lopez, A. D. Sappa, and T. Graf, "Survey of Pedestrian Detection for Advanced Driver Assistance Systems," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, pp. 1239-1258, 2010.
[2] A. Broggi, R. L. Fedriga, and A. Tagliati, "Pedestrian Detection on a Moving Vehicle: an Investigation about Near Infra-Red Images," in IEEE Intelligent Vehicles Symposium, 2006, pp. 431-436.
[3] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 886-893.
[4] M. Mahlisch, M. Oberlander, O. Lohlein, D. Gavrila, and W. Ritter, "A multiple detector approach to low-resolution FIR pedestrian recognition," in IEEE Intelligent Vehicles Symposium, 2005, pp. 325-330.
[5] F. Suard, A. Rakotomamonjy, A. Bensrhair, and A. Broggi, "Pedestrian Detection using Infrared images and Histograms of Oriented Gradients," in IEEE Intelligent Vehicles Symposium, 2006, pp. 206-212.
[6] Z. Li, W. Bo, and R. Nevatia, "Pedestrian Detection in Infrared Images based on Local Shape Features," in IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1-8.
[7] J. Ge, Y. Luo, and G. Tei, "Real-Time Pedestrian Detection and Tracking at Nighttime for Driver-Assistance Systems," IEEE Transactions on Intelligent Transportation Systems, vol. 10, pp. 283-298, 2009.
[8] E. Binelli, A. Broggi, A. Fascioli, S. Ghidoni, P. Grisleri, T. Graf, and M. Meinecke, "A modular tracking system for far infrared pedestrian recognition," in IEEE Intelligent Vehicles Symposium, 2005, pp. 759-764.
[9] M. Bertozzi, A. Broggi, A. Lasagni, and M. D. Rose, "Infrared stereo vision-based pedestrian detection," in IEEE Intelligent Vehicles Symposium, 2005, pp. 24-29.
[10] R. Arndt, R. Schweiger, W. Ritter, D. Paulus, and O. Lohlein, "Detection and Tracking of Multiple Pedestrians in Automotive Applications," in IEEE Intelligent Vehicles Symposium, 2007, pp. 13-18.
[11] D. M. Gavrila, "A Bayesian, Exemplar-Based Approach to Hierarchical Shape Matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 1408-1421, 2007.
[12] K. Mikolajczyk, C. Schmid, and A. Zisserman, "Human detection based on a probabilistic assembly of robust part detectors," in European Conference on Computer Vision, 2004, pp. 69-81.
[13] A. Mohan, C. Papageorgiou, and T. Poggio, "Example-based object detection in images by components," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, pp. 349-361, 2001.
[14] Y.-T. Chen, C.-S. Chen, Y.-P. Hung, and K.-Y. Chang, "Multi-class multi-instance boosting for part-based human detection," in IEEE International Conference on Computer Vision Workshops, 2009, pp. 1177-1184.
[15] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object Detection with Discriminatively Trained Part-Based Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, pp. 1627-1645, 2010.
[16] Z. Lin and L. S. Davis, "Shape-Based Human Detection and Segmentation via Hierarchical Part-Template Matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, pp. 604-618, 2010.
[17] L. Bourdev, S. Maji, T. Brox, and J. Malik, "Detecting People Using Mutually Consistent Poselet Activations," in European Conference on Computer Vision, 2010, pp. 168-181.
[18] L. Bourdev and J. Malik, "Poselets: Body part detectors trained using 3D human pose annotations," in IEEE International Conference on Computer Vision, 2009, pp. 1365-1372.
[19] Y.-C. Lin, Y.-M. Chan, L.-C. Chuang, L.-C. Fu, S.-S. Huang, P.-Y. Hsiao, and M.-F. Luo, "Near-infrared based nighttime pedestrian detection by combining multiple features," in IEEE Conference on Intelligent Transportation Systems, 2011, pp. 1549-1554.
[20] G. Borgefors, "Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, pp. 849-865, 1988.
