Pedestrian Registration in Static Images with Unconstrained Background

Lixin Fan, Kah-Kay Sung and Teck Khim Ng
School of Computing, National University of Singapore
Singapore 117543
[email protected]

Abstract

This paper introduces a human body contour registration method for static pedestrian images with unconstrained background. By using a statistical compound model to impose structural and textural constraints on valid pedestrian appearances, the matching process is robust to image clutter. Experimental results show that the proposed method registers pedestrian contours in complex backgrounds effectively.

Figure 1: Initial pedestrian training images.

1. Introduction

The pedestrian registration problem is to locate and label the pedestrian contour and features (e.g. head, hands and feet) in a given static image. The registration outcomes are represented with different graphical primitives such as points and lines (see Figure 9 for example results). We are interested in human registration in static images for two reasons. Firstly, human body registration has many potential applications, including image understanding [10] and content-based image indexing and retrieval [35]. The registration results can also be used to initialize a human body tracker, which is a key component in a traffic surveillance system. Secondly, pedestrian images represent a challenging class of highly cluttered images with unconstrained backgrounds. We believe that a successful pedestrian registration algorithm can readily generalize to other object classes (e.g. medical images) that share similar characteristics.

A different yet closely related problem to human body registration is human body detection. Given an input image, a human body detector determines whether there are any human bodies in the image. If the detection is positive, the detector returns the location and size of each human body in the given image. In this work, we assume that the pedestrian detection problem has already been solved by other methods (see e.g. [25, 29]), and our objective is to register the body contour and mark the pedestrian features given the initial location and size of pedestrian images.

1.1 Related Work

The major difficulty of pedestrian registration is in dealing with severe image clutter in complex outdoor scenes. Consider the example pedestrian images shown in Figure 1. One can identify many types of image background, ranging from distracting objects (zebra crossings, trees, other pedestrians etc.) to various lighting conditions and shadows. A successful pedestrian registration method must therefore be able to deal with this image clutter effectively. Many existing human registration methods designed to work with video sequences attempt to tackle this problem by either assuming a uniform background [20], or eliminating the complex image background using motion [2, 17, 15] or colour cues [34]. In addition, some of them also use inter-frame correlations to provide temporal constraints on the possible positions of features of interest [36, 30]. These methods, however, are not suitable for pedestrian registration in static images with unconstrained background, because motion, temporal and a priori colour information about pedestrians is not available for static images.

1.2 Our Approach

In this work, we adopt a statistical modeling approach to impose additional structural and textural constraints on valid pedestrian appearances. This approach, which is similar to our varying-pose face registration technique described in [12], is able to reliably match pedestrian feature points in cluttered environments.

To model plausible human body appearance and articulations, a statistical object model is used to simulate various image variations due to changes in object appearance, pose and lighting conditions etc. It has long been noted that one can learn such an object model from example views of objects by applying Principal Component Analysis (PCA) to training images [32, 22, 24]. A new model image can then be reconstructed as a linear combination of the eigenvectors extracted from the training images. To deal with large structural variations, Beymer, Jones, Vetter and Poggio [5, 16, 33], Craw [9], and Cootes and Taylor [19, 7, 8] proposed to model textural and structural variations separately, and improved the quality of reconstructed images significantly. In our work, we adopt this technique to learn a compound structural and textural pedestrian image model from training images.

Figure 2: A model-based pedestrian contour registration approach consists of three components: (1) a compound pedestrian model capturing permissible image variation; (2) a combined feature-texture similarity measure accounting for image differences between the model image and given pedestrian images; and (3) an iterative pedestrian registration algorithm used to find the best model parameters corresponding to the minimum of the proposed similarity measure.

Once the pedestrian image model is learnt, we formulate pedestrian registration in static images as a model-based image matching problem (see Figure 2). The model parameters are re-estimated in such a way that the similarity (error) measure between the input image and the model image is minimized. The pedestrian features are then marked with the model features defined by the optimized parameters. We refer to the overall modeling and registration methodology as view-based object modeling and registration (VOMR).

In addition to statistical object modeling, the other two issues to be considered are: (1) how to quantify the differences between given images with a reliable similarity (error) measure; and (2) how to search for the best model parameters using an efficient matching algorithm. In our work, following [12], we adopt a combined feature-texture similarity measure to account for both structural and textural differences between two images. In the process of estimating the pose parameters, we adopt a correspondence-map-based hill-climbing method, which avoids local minima more effectively and converges quickly.


Figure 3: Manual registration of training images. (a) Prototype body contour (white dots represent head, hands and feet); (b) Sparse contour point model; (c) Training image manual registration; and (d) Feature point correspondence map (FPCM) between (c) and (b).

We note that the view-based object modeling and registration approach is essentially an exemplar-based approach. This view-based modeling technique, however, differs from naive exemplar-based approaches in two aspects. Firstly, we preprocess the example images to obtain two types of example data: (1) a shape-normalized pedestrian image which has structural variation removed and captures textural variation only; and (2) a set of feature point correspondence maps (FPCMs) which represent the structural (pose) variations between training images and the prototype pedestrian body. Secondly, instead of storing all the sample data, the statistical modeling technique constructs a compound pedestrian image "model" by applying PCA to the pedestrian images and correspondence maps and combining textural and structural variation using an image warping process. Also note that this view-based approach is generic and can be used for other objects by simply changing the training data. Indeed, similar methods have been successfully applied to face registration [7, 12], medical image registration [8] and many others (see [1] for website links).

Figure 4: Preprocessed pedestrian training images which have structural variation removed. Also, the image backgrounds are masked.

In Section 2, we adopt the statistical modeling techniques to construct a pedestrian image model. Section 3 introduces the similarity measure and the matching algorithm used for pedestrian registration. Experimental results of pedestrian contour registration are illustrated in Section 4. We discuss the strengths and limitations of the proposed approach in Section 5, and Section 6 concludes the paper.
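As a rough, hypothetical illustration of the FPCM data described above (the array shapes and helper names below are our assumptions, not taken from the paper), an FPCM can be thought of as the per-point displacement of the prototype contour towards its manually registered position in a training image:

```python
import numpy as np

def build_fpcm(prototype_pts: np.ndarray, registered_pts: np.ndarray) -> np.ndarray:
    """Stack the (dx, dy) displacements of the 70 prototype contour points
    towards their manually registered positions into one 140-vector."""
    assert prototype_pts.shape == registered_pts.shape == (70, 2)
    return (registered_pts - prototype_pts).reshape(-1)

# Toy usage with synthetic points (image size 64 x 128 pixels, as in the paper).
rng = np.random.default_rng(0)
proto = rng.uniform(0, [64, 128], size=(70, 2))        # prototype (x, y) points
manual = proto + rng.normal(scale=2.0, size=(70, 2))   # manually registered points
fpcm = build_fpcm(proto, manual)                       # shape (140,)
```

One such vector per training image is what the structural PCA of Section 2.3 is later applied to.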


2. The Compound Pedestrian Image Model

We use images from the MIT Pedestrian database described in [25] to learn a pedestrian appearance model. These training images show people in different poses (frontal, rear, walking, and standing), under different lighting conditions, and with unconstrained backgrounds. Some example training images are shown in Figure 1. Adopting the statistical modeling technique, we decouple and model textural and structural image variation separately, and combine them using a compound pedestrian image model. The detailed processing procedures are described below.

2.1 The Preprocessing Stage

To learn the pedestrian model, we first need to decouple the textural and structural variations in the training images. This can be achieved by the following steps.

1. Pedestrian Prototype. We define a prototype body contour, which consists of 70 feature points and 70 line segments (Figure 3 (a)). These feature points represent the head, hands, and feet etc. Figure 3 (b) shows the sparse contour point model, which will later be used to quantify the difference between the given pedestrian image and the warped pedestrian model (see Section 3.1).

2. Manual Registration. We then manually register the prototype with the training images. Notice that when certain human body parts are occluded, we register the closest contour points instead (see Figure 3 (c) for an example). This approximation is good enough for our application.

3. Feature-based Image Warping. We apply the feature-based image warping technique to warp the training images with respect to the prototype body contour so that the structural variation of different pedestrian images is removed. We also mask the background regions outside the boundaries of the shape-normalized human body. We refer to [3, 8] for detailed descriptions of the warping process.

For each pedestrian image, the preprocessing steps generate two types of example data: (1) a shape-normalized pedestrian image which has structural variation removed and captures textural variation only (see Figure 4 for several examples); and (2) a feature point correspondence map (FPCM) which represents the structural (pose) variation between the training image and the prototype pedestrian body (see Figure 3 (d) for an example). Both the shape-normalized pedestrian images and the FPCMs are then used to construct the compound pedestrian model.

2.2 Shape-normalized Textural Variation Modeling

Once the training images are shape normalized and masked, we adopt the well-known "eigenface" approach [32, 23] to represent the texture variation only:

$I_{\mathrm{texture}} = \bar{I} + P_t\, b_t \qquad (1)$

where $P_t$ is an eigenvector matrix of the $n$ most significant eigenvectors obtained by applying PCA to the structurally normalized pedestrian images, and $\bar I$ is the mean shape-normalized image. The transformation vector $b_t$ describes the textural variation due to different factors. We refer to the elements of $b_t$ as texture parameters. We also keep the $n$ largest eigenvalues $\lambda^t_i$, which will be used to impose constraints on the texture parameters (see Section 3.1). The first 6 eigenvectors are illustrated in Figure 5. It is shown that, given enough training data, the learned model can effectively represent image variations due to different clothing, shadows and lighting conditions.

Figure 5: Pedestrian textural variation eigenvectors. (a) Mean pedestrian image; (b)-(g) Eigenvectors 1 to 6.
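The texture model in (1) can be sketched as follows. This is only an illustrative reconstruction (function and variable names are ours, not the authors'), applying PCA to the flattened, shape-normalized and masked training images; the 407 training images and 20 texture eigenvectors match the numbers reported in Section 4:

```python
import numpy as np

def learn_texture_model(images: np.ndarray, n_components: int = 20):
    """images: (num_examples, 64*128) array of shape-normalized grey images."""
    mean_img = images.mean(axis=0)
    centered = images - mean_img
    # Eigenvectors of the sample covariance via SVD of the centered data matrix.
    _, sing_vals, vt = np.linalg.svd(centered, full_matrices=False)
    P_t = vt[:n_components].T                              # eigenvector matrix
    eigvals = (sing_vals[:n_components] ** 2) / (len(images) - 1)
    return mean_img, P_t, eigvals

def reconstruct_texture(mean_img, P_t, b_t):
    """Eq. (1): I_texture = mean + P_t b_t."""
    return mean_img + P_t @ b_t

# Usage with synthetic data standing in for the 407 shape-normalized images.
rng = np.random.default_rng(0)
train = rng.random((407, 64 * 128))
mean_img, P_t, eigvals = learn_texture_model(train, n_components=20)
b_t = rng.normal(scale=np.sqrt(eigvals))                   # plausible texture params
model_texture = reconstruct_texture(mean_img, P_t, b_t)
```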

2.3 Structural Variation Modeling

Given a set of example FPCMs between the training pedestrian images and the prototype pedestrian body, one can learn a statistical structural variation model by applying PCA to the example FPCMs. Following [5, 16, 12], an FPCM $F$ can be approximated by:

$F \approx \bar{F} + P_s\, b_s \qquad (2)$

in which $P_s$ is an eigenvector matrix of the $m$ most significant eigenvectors and $\bar F$ is the mean FPCM. We will refer to the elements of the transformation vector $b_s$ as pose parameters. As in the case of the texture parameters, we also keep the $m$ largest eigenvalues $\lambda^s_i$, which will be used to impose constraints on the structural parameters (see Section 3.1). Figure 6 illustrates the first 6 eigenvectors of the learnt model, and Figure 7 depicts synthesized pedestrian shapes obtained using various shape parameters. It is shown that we can effectively simulate different pedestrian poses such as standing, walking and running.

Figure 6: Pedestrian structural variation eigenvectors.

Figure 7: Synthesized pedestrian shapes. Left-right: eigenvectors 1-6.
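A matching sketch of the structural model in (2), again with assumed helper names and the flattened 140-dimensional FPCM vectors from the earlier illustration, shows how shapes like those in Figure 7 can be synthesized by varying the pose parameters:

```python
import numpy as np

def learn_shape_model(fpcms: np.ndarray, n_components: int = 15):
    """fpcms: (num_examples, 140) array of training FPCM vectors."""
    mean_f = fpcms.mean(axis=0)
    _, sing_vals, vt = np.linalg.svd(fpcms - mean_f, full_matrices=False)
    P_s = vt[:n_components].T
    eigvals = (sing_vals[:n_components] ** 2) / (len(fpcms) - 1)
    return mean_f, P_s, eigvals

def synthesize_contour(prototype_pts, mean_f, P_s, b_s):
    """Eq. (2) applied to the prototype: displace each contour point by F(b_s)."""
    displacement = (mean_f + P_s @ b_s).reshape(-1, 2)
    return prototype_pts + displacement

# Sweeping one pose parameter at a time produces shape families as in Figure 7.
rng = np.random.default_rng(1)
fpcms = rng.normal(size=(407, 140))                 # synthetic stand-in data
proto = rng.uniform(0, [64, 128], size=(70, 2))
mean_f, P_s, eigvals = learn_shape_model(fpcms, n_components=15)
b_s = np.zeros(15)
b_s[0] = 2.0 * np.sqrt(eigvals[0])                  # move along the first eigenshape
contour = synthesize_contour(proto, mean_f, P_s, b_s)
```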

2.4 Combining Textural and Structural Variations

Finally, one can combine the textural and structural variations to model a pedestrian image. Using the notation of [5, 16, 12], a pedestrian image can be expressed as:

$I_{\mathrm{model}}(b_t, b_s) = \big(\bar{I} + P_t\, b_t\big) \circ \big(\bar{F} + P_s\, b_s\big) \qquad (3)$

where $\bar I$, $P_t$, $b_t$, $\bar F$, $P_s$ and $b_s$ are defined above, and the symbol $\circ$ denotes a feature-based image warping process which essentially shifts pixels in images according to a given FPCM [3, 8]. In subsequent sections, we will demonstrate how this model can be used for pedestrian registration.
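The combination in (3) can be sketched as follows. The paper relies on feature-based (Beier-Neely style) warping [3, 8]; the scattered-data interpolation used below is only a crude stand-in for that warp, and all helper names are assumptions:

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.ndimage import map_coordinates

H, W = 128, 64   # image height x width (paper: 64 x 128 pixels)

def warp_with_fpcm(texture_img: np.ndarray, proto_pts: np.ndarray,
                   fpcm: np.ndarray) -> np.ndarray:
    """texture_img: (H, W); proto_pts: (70, 2) as (x, y); fpcm: (140,)."""
    disp = fpcm.reshape(-1, 2)                       # (dx, dy) per contour point
    gy, gx = np.mgrid[0:H, 0:W]
    # Interpolate the sparse point displacements to a dense field (backward warp).
    dx = griddata(proto_pts[:, ::-1], disp[:, 0], (gy, gx),
                  method='linear', fill_value=0.0)
    dy = griddata(proto_pts[:, ::-1], disp[:, 1], (gy, gx),
                  method='linear', fill_value=0.0)
    coords = np.stack([gy - dy, gx - dx])            # source sampling positions
    return map_coordinates(texture_img, coords, order=1, mode='nearest')

def render_model(mean_img, P_t, b_t, proto_pts, mean_f, P_s, b_s):
    """Eq. (3): render the texture model, then warp it by the shape model's FPCM."""
    texture = (mean_img + P_t @ b_t).reshape(H, W)
    fpcm = mean_f + P_s @ b_s
    return warp_with_fpcm(texture, proto_pts, fpcm)
```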

2.5 Model Transformation

During the registration process, the model image $I_{\mathrm{model}}$ in (3) will be translated and scaled to best fit the given pedestrian image. This requires the estimation of the translation $t$ and the scaling $s$. In the proposed matching algorithm, we first initialize $t$ and $s$ based on the detection results of possible pedestrian candidates, and then iteratively estimate all the parameters $p = (b_t, b_s, t, s)$ (see Section 3.2).

3 Pedestrian Contour Registration

Within the model-based image matching framework, the similarity measure and the matching algorithm are the two important issues to be discussed below.

3.1 The Similarity Measure

The following similarity measure, or equivalently the error measure, works effectively for registration between given pedestrian images and the learnt compound pedestrian image model:

$E(p) = E_I + E^s_e + E^d_e + E_t + E_s \qquad (4)$

in which:

$E_I = \frac{1}{N}\sum_{(x,y)} \big[\, I_{\mathrm{input}}(x,y) - I_{\mathrm{model}}(x,y) \,\big]^2 \qquad (5)$

$E^s_e = \frac{1}{M}\sum_{j=1}^{M} d_j^2 \qquad (6)$

$E^d_e = \frac{1}{M}\sum_{j=1}^{M} \big( 1 - |\,\mathbf{n}_j \cdot \mathbf{t}_j\,| \big) \qquad (7)$

$E_t = \sum_i \max\big(0,\; |b_{t,i}| - 3\sqrt{\lambda^t_i}\,\big) \qquad (8)$

$E_s = \sum_i \max\big(0,\; |b_{s,i}| - 3\sqrt{\lambda^s_i}\,\big) \qquad (9)$

We shall explain each term in detail below. $E_I$ stands for the Sum of Squared Differences (SSD) between a given pedestrian image and the model image, masked by the warped human body shape; $N$ is the number of image pixels within the boundaries of the warped body shape. $E^s_e$ measures the edge spatial difference between the edge map of a given pedestrian image and the pedestrian prototype warped with the current pose parameters; $M$ is the number of feature points in the sparse pedestrian prototype (see Figure 3), and $d_j$ denotes the distance between a point in the warped pedestrian prototype and its nearest neighbor in the edge map of the input pedestrian image. $E^d_e$ measures the edge directional difference, in which $\mathbf{n}_j$ is the unit normal vector of a feature point in the warped pedestrian prototype, and $\mathbf{t}_j$ denotes the unit tangent vector of the nearest neighboring edge point in the edge map of the given pedestrian image. Note that $E_I$, $E^s_e$ and $E^d_e$ together account for the differences between the given pedestrian image and the model image $I_{\mathrm{model}}$.

To impose constraints on permissible textural and structural variation, we include the last two terms, $E_t$ and $E_s$, to penalize significant deviations of the model parameters from the learnt pedestrian model. If certain components of the texture (or structure) parameters exceed the range of $[-3\sqrt{\lambda_i}, +3\sqrt{\lambda_i}]$, penalties are imposed. Empirically, we find that this range is big enough to capture sufficient structural and textural variation, while small enough to let the registration algorithm successfully cope with severe image clutter. Finally, summing everything together, we have the similarity measure in (4).
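Under the reconstruction of (4)-(9) given above, the combined similarity measure can be sketched as a single function. The per-term inputs (masked model image, nearest-edge distances, normals and tangents) are assumed to be computed elsewhere, and the exact per-term forms and weighting in the paper may differ from this reconstruction:

```python
import numpy as np

def similarity_measure(input_img, model_img, mask,
                       dists, normals, tangents,
                       b_t, eig_t, b_s, eig_s):
    """input_img, model_img, mask: (H, W); dists: (M,) point-to-edge distances;
    normals, tangents: (M, 2) unit vectors; eig_t, eig_s: PCA eigenvalues."""
    inside = mask.astype(bool)
    n_pixels = max(int(inside.sum()), 1)
    e_ssd = ((input_img - model_img)[inside] ** 2).sum() / n_pixels            # (5)
    e_edge_spatial = (dists ** 2).mean()                                       # (6)
    e_edge_dir = (1.0 - np.abs((normals * tangents).sum(axis=1))).mean()       # (7)
    e_tex = np.maximum(0.0, np.abs(b_t) - 3.0 * np.sqrt(eig_t)).sum()          # (8)
    e_shape = np.maximum(0.0, np.abs(b_s) - 3.0 * np.sqrt(eig_s)).sum()        # (9)
    return e_ssd + e_edge_spatial + e_edge_dir + e_tex + e_shape               # (4)
```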

3.2 The Matching Algorithm

For the image matching algorithm, our ultimate objective is to find the best model parameters $(b_t, b_s, t, s)$ which correspond to the minimum of the proposed similarity (error) measure:

$p^* = \arg\min_{p}\; E(p), \qquad p = (b_t, b_s, t, s) \qquad (10)$

Given the complex form of the proposed similarity measure in (4), it is difficult to obtain an analytical solution of (10). Fortunately, we can seek a numerical solution by adopting the following iterative registration algorithm, in which the different model parameters are estimated in separate steps:

Initialize $t$ and $s$ based on the pedestrian detection results; initialize $b_t$ and $b_s$ to 0.
1. Fix the others, optimize $t$ and $s$;
2. Fix the others, optimize $b_s$;
3. Fix the others, optimize $b_t$;
4. Iterate 2 and 3 until convergence.

In the proposed registration algorithm, step 2 first aligns the structures of the two given images. Then step 3 synthesizes the textural variation due to changes in object appearance and lighting condition etc. The re-estimation of textural variation, in turn, can lead to more reliable extraction of feature points when we match object structures in subsequent iterations. This iterative estimation is similar in spirit to the Expectation-Maximization (EM) algorithm [27].

Note that in steps 1 and 3, we use a general optimization algorithm (i.e. the Levenberg-Marquardt method [26]) to estimate the model parameters $t$, $s$ and $b_t$. In step 2, however, we make use of a more domain-specific method, called FPCM-based hill-climbing, to estimate $b_s$. Given the pedestrian image and the current model image, we establish the feature point correspondence map using a simple closest edge point matching method. We then estimate the search direction $\Delta b_s$ by projecting this FPCM onto the eigenshape space, and iteratively re-estimate $b_s$ until convergence:

$b_s \leftarrow b_s + \Delta b_s, \qquad \Delta b_s = P_s^{\mathsf T}\, \hat{F} \qquad (11)$

where $\hat F$ denotes the FPCM estimated by closest edge point matching. In the early iterations, the closest edge point matching may be a poor approximation to the true correspondence. Applying (11) will bring the pedestrian edge points closer to the model contour points. As the iterations continue, the closest point matching will eventually approach the valid correspondence. The advantage of using FPCM-based hill-climbing is twofold. Firstly, FPCM hill-climbing is more effective in avoiding local minima, since the parameter update $\Delta b_s$ is not determined by gradient descent. Secondly, this domain-specific matching algorithm is deterministic and does not involve any stochastic process. Thus, it is more efficient compared with general-purpose stochastic methods such as simulated annealing [26]. Empirically, the proposed registration algorithm converges in only a few iterations.
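A structural sketch of the iterative registration algorithm of this section is given below. It is pseudocode-like: `model`, `model.residuals`, `closest_edge_fpcm` and `detection_box` are hypothetical placeholders rather than the authors' implementation, and SciPy's Levenberg-Marquardt solver stands in for the optimizer used in steps 1 and 3:

```python
import numpy as np
from scipy.optimize import least_squares

def register(input_img, model, detection_box, n_iters=10):
    # Step 0: initialize translation/scale from the pedestrian detector.
    t = np.asarray(detection_box.center, dtype=float)
    s = float(detection_box.scale)
    b_t = np.zeros(model.n_texture)
    b_s = np.zeros(model.n_shape)

    # Step 1: fix b_t, b_s; optimize t and s (Levenberg-Marquardt stand-in).
    def residual_ts(x):
        return model.residuals(input_img, x[:2], x[2], b_t, b_s)
    x = least_squares(residual_ts, np.r_[t, s], method='lm').x
    t, s = x[:2], float(x[2])

    for _ in range(n_iters):
        # Step 2: FPCM-based hill-climbing update of the pose parameters, Eq. (11).
        fpcm_hat = closest_edge_fpcm(input_img, model, t, s, b_t, b_s)  # hypothetical helper
        b_s = b_s + model.P_s.T @ fpcm_hat

        # Step 3: Levenberg-Marquardt refinement of the texture parameters.
        def residual_bt(x):
            return model.residuals(input_img, t, s, x, b_s)
        b_t = least_squares(residual_bt, b_t, method='lm').x

        # Step 4: iterate 2 and 3 until the error measure stops improving.
        if model.converged(input_img, t, s, b_t, b_s):
            break
    return b_t, b_s, t, s
```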

4 Experimental Results

In our experiments, we use the MIT Pedestrian image database, which consists of more than 900 pedestrian images with various body shapes and unconstrained backgrounds. The image size is 64 x 128 pixels. From 407 training images in the database, we first construct a pedestrian model with 20 eigenvectors capturing textural variations and 15 eigenvectors representing different poses.

4.1 Measurement Criterion

To measure the performance of the registration algorithm, we define a goodness criterion for each individual registration result as follows (a short code summary is given below):

– Good Registration: We declare a good registration as having both the contour and the feature points (e.g. head, hands and feet) correctly registered.

– Fair Registration: The result is considered a fair registration if it has its contour correctly registered but with 1 up to 5 out of the 70 feature points misaligned (i.e. feature points that are more than 5 pixels away from their correct position). Note that both good registrations and fair registrations are deemed successful registrations.

– Mis-Registration: A mis-registration is declared when the registration algorithm fails to converge, or when the registration output has the body contour and/or more than 5 feature points misaligned.

We run our proposed registration algorithm on two test databases. The performance of the proposed method is characterized by the ratio of the number of good, fair and mis-registrations to the total number of human bodies in each database.
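The goodness criterion above can be summarized in a small scoring helper. The thresholds (at most 5 misaligned points, 5-pixel tolerance) come from the criterion itself, while the inputs (ground-truth points, a separate contour check) are assumptions about how the evaluation is set up:

```python
import numpy as np

def classify_registration(converged: bool, contour_ok: bool,
                          pred_pts: np.ndarray, gt_pts: np.ndarray) -> str:
    """pred_pts, gt_pts: (70, 2) registered and ground-truth feature points."""
    if not converged or not contour_ok:
        return "mis-registration"
    # A feature point is misaligned if it lies more than 5 pixels from its
    # correct position.
    misaligned = int((np.linalg.norm(pred_pts - gt_pts, axis=1) > 5.0).sum())
    if misaligned == 0:
        return "good"
    if misaligned <= 5:
        return "fair"
    return "mis-registration"
```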

4.2 Experiment 1

In this experiment, we tested the proposed pedestrian contour registration method with 500 test images from the MIT Pedestrian image database. The purpose of this experiment is to evaluate the performance of the proposed pedestrian registration algorithm working with an ideal pedestrian detector, which outputs accurate initial translation and size parameters and has no false detections. Some registration results are illustrated in Figure 9. We note that the registration is generally robust to changes in body texture (e.g. varying clothing and shadow) and pose (e.g. frontal and rear, standing and walking). Furthermore, it is shown that the proposed registration algorithm is effective even when severe clutter (e.g. zebra crossings, trees, and other pedestrians) is present in the unconstrained background. Table 1 summarizes the success rate of the pedestrian registration algorithm. Among the 500 test images, 307 (61.4%) pedestrians are well registered; 131 (26.2%) images are fairly registered with a few feature point misalignments; and 62 (12.4%) images are mis-registered.

#Pedestrians    Good            Fair            Mis-registered
500             307 (61.4%)     131 (26.2%)     62 (12.4%)

Table 1: Pedestrian registration results; see the goodness criterion defined in Section 4.1.

4.3 Experiment 2

In the second experiment, we test the proposed algorithm using 30 images with more complex scenes (see Figure 11 for some examples). These images contain both pedestrians and people engaged in different activities (e.g. weddings, playing croquet etc.). The purpose of this experiment is to evaluate the performance of the proposed registration algorithm working with a real pedestrian detector which may produce false detections and inaccurate estimates of the size and position of people. We first use the pedestrian detection method proposed in [29] to locate 42 people of different appearance and pose in these images.¹

The experimental results show that the proposed human body registration algorithm rejects 1 (2%) false detection, successfully registers 34 (81%) and mis-registers 7 (17%) out of the 42 detected people. Figure 11 illustrates some registration results for 9 people. A detailed inspection shows that one pedestrian in image A is mis-registered due to large pose variation, and the feet of the wedding couple in image C are partially mis-registered due to the lack of reliable features.

¹ The pedestrian detection method missed 5 people in the test images. When evaluating the registration algorithm, we did not include these 5 misses.

5 Discussion

An exemplar-based object representation often consists of example 2D views covering a variety of poses, lighting conditions and shape deformations. For instance, in the view-based system of Breuel [6], two airplane toy models are represented by 32 views sampled from the upper half of the viewing sphere. In a pose-independent face recognition problem, Beymer [4] showed that, for identification purposes, one can reliably represent human faces with 15 views of varying-pose faces. One key challenge in the exemplar-based approach is to capture and represent the possible variation in pose, lighting and shape deformation using as few example images as possible. The view-based eigenspace method [31, 18, 32, 23] is one such efficient approach, using principal component analysis (PCA) [11, 14] to construct a compact representation from a large set of object images.

While the view-based eigenspace approach is proven to be efficient in modeling fixed-pose objects, it has difficulty in representing object images which are grossly misaligned. Nayar et al. [24] have demonstrated that the distribution of varying-pose objects often forms a non-convex yet connected region in the high-dimensional image space. This complex distribution violates the underlying Gaussian distribution assumption of the eigen-subspace approach, and therefore cannot be adequately captured within a low-dimensional subspace. To represent the distribution of varying-pose object images, Pentland et al. [22] and Schneiderman [28] proposed to use multiple pose-dependent eigen-subspace models. However, this approach often involves collecting example images for each individual subspace model, and it is difficult to cover arbitrary variation in pose.

Following [5, 7, 8, 9, 16, 19, 33], we decouple and model "textural" and "structural" variations separately, and combine both types of variation using an image warping process. The image warping process actually introduces a nonlinear transformation on the distribution of fixed-pose object images. While the two submodels simply represent two elliptical Gaussian distributions of textural and structural variation, the combination of these two components results in a much more complex manifold in image space. As an example, Figure 8 illustrates the distribution of a set of frontal face images and its warped counterparts synthesized by the proposed compound model [13]. One can see that the overall distribution forms a non-convex and connected region. We believe that the decoupling and combination of textural and structural variation allows us to reliably and efficiently represent grossly misaligned images with a parametric model.

Figure 8: Nonlinear distribution of varying-pose face images. Each data point represents a synthesized face image projected into the subspace spanned by the 3 most significant eigenvectors. (Pose 1: right rotated; pose 2: slightly right rotated; pose 3: frontal; pose 4: slightly left rotated; pose 5: left rotated.)

Finally, this view-based object modeling and registration approach requires that the object shape and distinctive feature points can be unambiguously identified in example object images. Therefore, this method cannot be applied to objects (e.g. cells) which have no well-defined shape and distinctive features.

6 Conclusions and Future Work

Working within a generalized view-based object modeling and registration (VOMR) framework, we tackle the problem of pedestrian contour registration in static images with complex background. By using a statistical compound model to impose structural and textural constraints on valid pedestrian appearances, the matching process is robust to image clutter. Experimental results show that the proposed method successfully registers complex pedestrian contours even when image backgrounds are heavily cluttered.

We note that mis-registrations are mainly due to two reasons. Firstly, some images are too blurry for meaningful features to be extracted reliably (see Figure 10 (a)). To extract more robust features, one could perform perceptual grouping on noisy features, as suggested in [21]. Secondly, there is too much variation in some pedestrian shapes (see Figure 10 (b)). To handle a wider range of pose variations, one can include more training examples and possibly use a mixture-of-Gaussians model. This is one of our current research directions.

Dedication

In memory of Kah-Kay Sung.

References

[1] http://www.wiau.man.ac.uk/~bim/asm links.html.

[2] A. M. Baumberg and D. C. Hogg. Learning Flexible Models from Image Sequences. Technical report, Division of Artificial Intelligence, School of Computer Studies, University of Leeds, October 1993.

[3] T. Beier and S. Neely. Feature-based image metamorphosis. In SIGGRAPH'92 Proceedings, pages 35-42, 1992. Chicago, IL.

[4] D. Beymer. Face Recognition under Varying Pose. AI Lab, Memo 1461, MIT, 1993.

[5] D. Beymer. Vectorizing Face Images by Interleaving Shape and Texture Computations. AI Lab, Memo 1537, MIT, Sept. 1995.

[6] T. M. Breuel. An Efficient Correspondence based Algorithm for 2D and 3D Model based Recognition. AI Lab, Memo 1259, MIT, 1993.

[7] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active Appearance Models. In H. Burkhardt and B. Neumann, editors, Proceedings of the European Conference on Computer Vision, volume 2, pages 484-498, 1998.

[8] T. F. Cootes and C. J. Taylor. Statistical Models of Appearance for Computer Vision. Technical report, Wolfson Image Analysis Unit, University of Manchester, (http://www.wiau.man.ac.uk), Dec. 2000.

[9] I. Craw, N. Costen, T. Kato, and S. Akamatsu. How Should We Represent Faces for Automatic Recognition? IEEE Trans. Pattern Analysis and Machine Intelligence, 21(8):725-736, August 1999.

[10] L. Davis, D. Harwood, and I. Haritaoglu. Ghost: A Human Body Part Labeling System Using Silhouettes. In ICPR98, page SA11, 1998.

[11] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, second edition, 1973.

[12] L. X. Fan and K. K. Sung. A Combined Feature-Texture Similarity Measure for Face Alignment Under Varying Pose. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 308-313, 2000.

[13] L. X. Fan and K. K. Sung. Model-Based Varying Pose Face Detection and Facial Feature Registration in Video Images. In Proc. of ACM Multimedia, pages 295-302, Los Angeles, USA, 2000.

[14] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.

[15] B. Heisele and C. Wohler. Motion-based Recognition of Pedestrians. In Proc. of IEEE International Conference on Pattern Recognition, pages 1325-1330, 1998.

[16] M. J. Jones and T. Poggio. Model-Based Matching by Linear Combinations of Prototypes. AI Lab, Memo 1583, MIT Artificial Intelligence Laboratory, Nov. 1996.

[17] S. O. Ju, M. J. Black, and Y. Yacoob. Cardboard People: A Parameterized Model of Articulated Motion. In Proc. of International Conference on Automatic Face and Gesture Recognition, pages 38-44, Los Alamitos, California, Oct. 1996.

[18] M. Kirby and L. Sirovich. Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces. IEEE Trans. Pattern Analysis and Machine Intelligence, 12:103-108, 1990.

[19] A. Lanitis, C. J. Taylor, and T. F. Cootes. Automatic Interpretation and Coding of Face Images using Flexible Models. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):743-756, July 1997.

[20] M. K. Leung and Y.-H. Yang. First Sight: A Human Body Outline Labeling System. IEEE Trans. Pattern Analysis and Machine Intelligence, 17(4):369-397, April 1995.

[21] G. Medioni, M.-S. Lee, and C.-K. Tang. A Computational Framework for Segmentation and Grouping. Elsevier Science, New York, 2000.

[22] B. Moghaddam and A. Pentland. Face Recognition using View-based and Modular Eigenspaces. In Proc. of SPIE, pages 12-21, 1994.

[23] B. Moghaddam and A. Pentland. Probabilistic Visual Learning for Object Representation. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):696-710, July 1997.

[24] S. Nayar and H. Murase. Dimensionality of Illumination Manifolds in Appearance Matching. In Int. Workshop on Object Representations for Computer Vision, 1996.

[25] M. Oren, C. Papageorgiou, and T. Poggio. Pedestrian Detection Using Wavelet Templates. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 193-199, 1997.

[26] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C. Cambridge University Press, second edition, 1992.

[27] R. A. Redner and H. Walker. Mixture Densities, Maximum Likelihood and the EM Algorithm. SIAM Review, 26:195-239, 1984.

[28] H. Schneiderman. A Statistical Approach to 3D Object Detection Applied to Faces and Cars. PhD thesis, Carnegie Mellon University, May 2000.

[29] H. Setyawan. Model-based Human Detection in Images. Master's thesis, National University of Singapore, 2001.

[30] H. Sidenbladh, M. J. Black, and D. J. Fleet. Stochastic Tracking of 3D Human Figures Using 2D Image Motion. In Proceedings of the European Conference on Computer Vision, 2000.

[31] L. Sirovich and M. Kirby. Low-dimensional Procedure for the Characterization of Human Faces. Journal of the Optical Society of America, 4(3):519-524, March 1987.

[32] M. A. Turk and A. P. Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.

[33] T. Vetter, M. J. Jones, and T. Poggio. A Bootstrapping Algorithm for Learning Linear Models of Object Classes. AI Lab, Memo 1600, MIT, Feb. 1997.

[34] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland. Pfinder: Real-Time Tracking of the Human Body. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):780-785, July 1997.

[35] Y. Xu, E. Saber, and A. Tekalp. Object Formation by Learning in Visual Database using Hierarchical Content Description. In ICIP99, page 26PO2, 1999.

[36] Y. Yacoob and L. Davis. Learned Temporal Models of Image Motion. In Proc. of IEEE International Conference on Computer Vision, pages 446-453, 1998.

Figure 9: Pedestrian registration results. Rows 1 and 2: good alignments. Row 3: fair alignments. Note that severe clutter (e.g. zebra crossings, trees, and other pedestrians) is present in the background.



Figure 10: Pedestrian mis-registration examples. Image (a) is mis-registered because it is too blurry for reliable features to be extracted. The person in image (b) is mis-registered because his left hand is placed on his head, exhibiting too much pose variation for the learnt pedestrian model.

Figure 11: Human body registration results in complex scenes.
