Human Figure Segmentation using Independent Component Analysis Grégory Rogez, Carlos Orrite-Uruñuela, Jesús Martínez-del-Rincón Aragon Institute for Engineering Research University of Zaragoza, María de Luna 1, 50018 Zaragoza, SPAIN e-mail: {grogez, corrite, jesmar}@unizar.es
Abstract. In this paper, we present a Statistical Shape Model for Human Figure Segmentation in gait sequences. Point Distribution Models (PDM) generally use Principal Component Analysis (PCA) to describe the main directions of variation in the training set. However, PCA makes assumptions about the data that do not always hold. In this work, we explore the potential of Independent Component Analysis (ICA) as an alternative shape decomposition for PDM-based Human Figure Segmentation. The resulting shape model enables accurate estimation of human figures despite segmentation errors in the input silhouettes and shows good convergence properties.
1. Introduction

Many works have attempted to accurately estimate the shape of the human body along a video sequence using 2D or 3D models of the object contour (e.g., see [1, 2]), and principal component analysis (PCA) is widely used for this purpose. In a previous work [3] we proposed a PCA-based statistical model for the detection and tracking of the human silhouette and the corresponding 3D skeletal structure. Following this approach, a shape model is generated from a training set (see Fig. 1) by extracting the mean shape and the variation modes using PCA. The model is then fitted to the silhouette extracted from the image by background subtraction, and the human posture is estimated from the contour obtained. The determination of the contour is a key factor for a good posture evaluation: the more precise the segmentation, the more accurate the estimate.

The main problem is that PCA assumes a Gaussian distribution of the input data. This assumption fails in the human figure case because the feature space is non-Gaussian, as Figure 1(c) illustrates. The non-Gaussianity of the landmark distribution is mainly caused by the non-linearity of the shape variation, which results from the natural curvature of the model: key points of the model move in a non-linear fashion within the image frame. This may lead to a wrong description of the dataset, so that the model can generate implausible shapes or fail to generate desired ones. Figure 6 shows how a bad detection of the body, e.g. a badly segmented silhouette, gives an unsatisfactory estimate of the contour, whereas a good shape model should help to find a correct and plausible shape that fits the blob.
Fig.1. (a) Contour extraction: the positions of 49 points of each shape are considered. Images from the “CMU Motion of Body Database” [4] have been used. On each picture, the 2D coordinates of the 49 landmarks were taken manually or semi-automatically and stored in shape vectors. (b) A resulting contour. We processed the sequences of 15 people (2 walking cycles) in a lateral view. After aligning and scaling the 2000 shapes, we generated our ASM model by PCA. (c) The data projected onto the first two modes.
This leads us to search for a new approach to generate our model. Independent Component Analysis [5] has produced some encouraging results in the biomedical image processing area [6]. ICA differs from PCA in that it seeks the directions in feature space that are most independent from each other, instead of the directions that best represent the data in a least-squares sense. As noted in [7], there are two main problems to consider when using ICA. First, the reliability of the estimated independent components is unknown: we do not know which components should be taken seriously. Second, most algorithms have random elements, so every run gives different results.

The goal of this work is to generate a reliable statistical shape model for human figure segmentation, using ICA to model the shape variations. In this paper we demonstrate the potential of ICA in non-linear shape modeling by applying it to the human figure case. Section 2 describes shape modeling with ICA. In Section 3, we apply the validation method to obtain a reliable model. We then give some results in Section 4, followed by a discussion in the conclusions.
2. ICA modeling of the Human Figure

ICA, also known as Blind Source Separation, was originally used to recover source signals from mixtures of unknown signals, without any knowledge other than the observations. It can also be used for feature extraction [8]. If we consider a human shape as a mixture of source signals (source shapes), we can express it as follows:
$$ dX = A S , \qquad (1) $$

where A is the matrix of mixing parameters, S the source shapes, and dX the matrix of the training set, defined as the matrix of the variations of the n training shape-vectors with respect to the mean shape:

$$ dX_i = X_i - \bar{X} , \quad i = 1 \dots n , \qquad (2) $$
where the $X_i$ are the training shape-vectors and $\bar{X}$ is the mean shape. To prevent overlearning, we pre-processed the data by PCA [5]. The goal of Blind Source Separation is to estimate the de-mixing matrix $\hat{W}$ that gives an estimate $\hat{S}$ of the original source shapes:

$$ \hat{S} = \hat{W} \, dX . \qquad (3) $$
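The centring of Eq. (2) and the PCA pre-processing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `center_shapes` and `pca_reduce`, the one-shape-per-row layout, the choice of 18 retained components, and the random toy data are all assumptions made for the example.

```python
import numpy as np

def center_shapes(X):
    """X: (n_shapes, 2 * n_landmarks) aligned shape vectors, one per row (assumed layout).
    Returns the mean shape and the deviations dX_i = X_i - mean (Eq. 2)."""
    mean_shape = X.mean(axis=0)
    return mean_shape, X - mean_shape

def pca_reduce(dX, n_components):
    """Project the deviations onto their first principal axes.
    Returns (scores, axes): scores is (n_shapes, k), axes is (k, dim)."""
    # SVD of the centred data: the rows of Vt are the principal axes,
    # ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(dX, full_matrices=False)
    axes = Vt[:n_components]
    return dX @ axes.T, axes

# Toy stand-in for a training set: 200 random "shapes" with 49 landmarks each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 98))
mean_shape, dX = center_shapes(X)
scores, axes = pca_reduce(dX, n_components=18)
```

The reduced `scores` would then be the input handed to an ICA routine; the number of components kept is a setting to be chosen for the data at hand.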
The de-mixing matrix can be found using different methods. In this work we used the FastICA algorithm developed by Hyvärinen and Oja [9]. As in the PCA case, the ICA model is constructed by combining the mean shape and the variation of each mode. The linear generative model is formulated as follows:

$$ X \approx \bar{X} + \hat{S} b , \qquad (4) $$
where b is the vector of weighting coefficients. If we vary the weight of one Independent Component (IC), we observe a variation of a certain amplitude with respect to the mean shape (see Fig. 2). To quantify the amplitude of the shape variation, we used the method given in [6]: we project all the shapes onto each IC and compute a histogram, whose width ω is taken as a measure of variation. To discard outliers and eliminate part of the noise, ω is calculated as follows: starting from the median value, the “surface of interest” of the histogram is determined by summing the values until a given percentage of the total surface is reached (Fig. 3).
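The generative model of Eq. (4) and the effect of varying a single weight can be illustrated with a short sketch. The names `synthesize_shape` and `vary_mode` are hypothetical, the modes are stored as columns of a `(dim, k)` array by assumption, and the mean shape and modes below are random placeholders rather than a trained model.

```python
import numpy as np

def synthesize_shape(mean_shape, modes, b):
    """Eq. (4): X ~ mean + S b, with the modes stored as columns of `modes`."""
    return mean_shape + modes @ b

def vary_mode(mean_shape, modes, k, amplitudes):
    """Shapes obtained by moving only the k-th weight, all others kept at zero."""
    shapes = []
    for a in amplitudes:
        b = np.zeros(modes.shape[1])
        b[k] = a
        shapes.append(synthesize_shape(mean_shape, modes, b))
    return np.array(shapes)

# Placeholder model: 49 landmarks (98 coordinates) and 18 modes.
rng = np.random.default_rng(1)
mean_shape = rng.normal(size=98)
modes = rng.normal(size=(98, 18))
sweep = vary_mode(mean_shape, modes, k=0, amplitudes=[-2.0, 0.0, 2.0])
```

With a zero weight the mean shape is recovered, and the displacement grows linearly with the weight along the chosen mode.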
Fig.2. Modes of PCA (left) and ICA (right) models.
Fig.3. Determination of ω with 95% of the surface considered.
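One possible reading of the width computation is sketched below. The exact procedure of [6] is not reproduced in the text, so the bin count, the rule of growing the interval by one bin on each side of the median bin per step, and the function name `histogram_width` are assumptions of this example.

```python
import numpy as np

def histogram_width(projections, fraction=0.95, bins=50):
    """Width of the histogram region around the median that holds
    `fraction` of the total area (one reading of the method in the text)."""
    counts, edges = np.histogram(projections, bins=bins)
    total = counts.sum()
    # Index of the bin that contains the median value.
    m = np.searchsorted(edges, np.median(projections), side="right") - 1
    m = min(max(m, 0), bins - 1)
    lo = hi = m
    covered = counts[m]
    # Grow the interval outwards from the median bin until enough area is covered.
    while covered < fraction * total and (lo > 0 or hi < bins - 1):
        if lo > 0:
            lo -= 1
            covered += counts[lo]
        if hi < bins - 1 and covered < fraction * total:
            hi += 1
            covered += counts[hi]
    return edges[hi + 1] - edges[lo]

# Toy projections of the training shapes onto one component.
rng = np.random.default_rng(3)
projections = rng.normal(scale=2.0, size=2000)
omega = histogram_width(projections, fraction=0.95)
```

Because the interval starts at the median and stops once the requested area is reached, isolated outliers in the tails do not inflate the width.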
Figure 2 shows two ICA-derived shape variation modes. For comparison, the first two PCA-derived shape variation modes are also shown. The two models have been generated with the same data. We can observe that the ICA mode variations are quite localized along the shape: for each mode, only a small part of the shape varies whereas the remainder is unaffected. In contrast, the PCA modes present a global shape variation distributed over the entire contour. Like most ICA algorithms, FastICA is stochastic, i.e. the result may differ between runs of the algorithm. Thus, the result obtained after a single run of FastICA cannot be trusted, and the reliability of the components has to be analyzed.
3. Validation of the ICA via Clustering

The method is based on estimating a large number of candidate independent components by running FastICA many times, and clustering the components obtained in the signal space. Each estimated independent component is one point in the signal space. We adapt the validation of the independent components proposed in [7] to our problem. First, the FastICA algorithm is run M times. The estimates of the demixing matrices $\hat{W}_i$ from each run i = 1, 2, ..., M are collected into a single matrix:

$$ \hat{W} = [\, \hat{W}_1^T \; \hat{W}_2^T \; \dots \; \hat{W}_M^T \,]^T . \qquad (5) $$
A good measure of the similarity between the estimated independent components is the absolute value of their mutual correlation coefficients $r_{ij}$, which are the elements of the matrix

$$ R = \hat{W} C \hat{W}^T , \qquad (6) $$

where C is the covariance matrix of the original data dX. For clustering, we need to transform the similarity matrix into a dissimilarity matrix with elements $d_{ij}$. A classic way to make this transformation is given by [10]:

$$ d_{ij} = 1 - r_{ij} . \qquad (7) $$
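Equations (5) to (7) can be sketched in a few lines. Several points here are assumptions of the example rather than details given in the text: each run is taken to return a `(k, d)` demixing matrix with one estimate per row, the product of Eq. (6) is rescaled by its diagonal so that its entries are correlation coefficients even when the estimates do not have unit variance, and random matrices stand in for actual FastICA runs.

```python
import numpy as np

def similarity_matrix(W_runs, C):
    """W_runs: list of (k, d) demixing matrices, one per run (assumed layout).
    C: (d, d) covariance matrix of the data.
    Returns the absolute correlations between all estimated components."""
    W = np.vstack(W_runs)      # Eq. (5): stack the estimates of all runs row-wise
    R = W @ C @ W.T            # Eq. (6)
    # Rescale to correlation coefficients (assumption: rows may not be unit variance).
    s = np.sqrt(np.diag(R))
    R = R / np.outer(s, s)
    return np.abs(R)

def dissimilarity_matrix(R_abs):
    """Eq. (7): d_ij = 1 - r_ij, with an exact zero diagonal."""
    D = 1.0 - R_abs
    np.fill_diagonal(D, 0.0)
    return np.clip(D, 0.0, 1.0)

# Toy data: 500 samples in 6 dimensions, 4 "runs" of 3 components each.
rng = np.random.default_rng(2)
dX = rng.normal(size=(500, 6))
dX -= dX.mean(axis=0)
C = np.cov(dX, rowvar=False)
W_runs = [rng.normal(size=(3, 6)) for _ in range(4)]
R_abs = similarity_matrix(W_runs, C)
D = dissimilarity_matrix(R_abs)
```

The resulting `D` is symmetric with values in [0, 1] and can be passed to any hierarchical clustering routine that accepts a precomputed distance matrix.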
Using the dissimilarity as a measure of distance, we decompose the data into several levels of nested partitioning (a tree of clusters), called a dendrogram. The points are successively joined into clusters when moving upwards in the dendrogram. A clustering of the data is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster (see Fig. 4a). A representative point is then computed for each cluster: we calculate the intra-cluster similarities and take as the centre of the cluster the point with the maximum sum of similarities to the other points in the cluster.

To better visualize the result, we apply Linear Discriminant Analysis for data visualization [11]. LD1 and LD2 are the first two linear discriminants, which map the samples with known class from the n-dimensional space to the plane in such a way that the ratio of the between-group variance to the within-group variance is maximized. The clusters and their interrelations are visualized in Figure 4b. Some clusters appear more compact, and of more interest, than others.

Following this methodology we have found a reliable linear non-orthogonal coordinate system. We can observe how each mode is localized along the shape and models a particular part of the human figure. For example, some modes are associated with the movement of the hands while another models the movement of the head. Most of the ICs are associated with the movement of the legs. This is understandable, since the principal variation that characterizes the evolution of a walking figure is the movement of the legs. Since there is no natural sorting criterion for the components in ICA, the components are sorted by their position along the shape (see Fig. 4c).
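The cut of the dendrogram and the choice of a representative point per cluster can be sketched as below. The linkage criterion is not stated in the text, so this example assumes single linkage, for which cutting the tree at a given level is equivalent to taking the connected components of the graph that links points whose dissimilarity does not exceed that level. The function names and the small similarity matrix are illustrative only.

```python
import numpy as np

def cut_single_linkage(D, level):
    """Cluster labels from cutting a single-linkage dendrogram at `level`:
    connected components of the graph with an edge wherever D <= level."""
    n = D.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:                      # depth-first traversal of one component
            i = stack.pop()
            for j in np.flatnonzero((D[i] <= level) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

def cluster_centres(S, labels):
    """For each cluster, the index of the point with the largest sum of
    similarities to the other members of that cluster."""
    centres = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        sub = S[np.ix_(members, members)]
        totals = sub.sum(axis=1) - np.diag(sub)   # leave out self-similarity
        centres[int(c)] = int(members[np.argmax(totals)])
    return centres

# Toy similarity matrix: estimates 0-2 agree with each other, as do 3-4.
S = np.full((5, 5), 0.1)
np.fill_diagonal(S, 1.0)
for i, j, s in [(0, 1, 0.97), (1, 2, 0.96), (0, 2, 0.92), (3, 4, 0.95)]:
    S[i, j] = S[j, i] = s
labels = cut_single_linkage(1.0 - S, level=0.1)
centres = cluster_centres(S, labels)
```

In this toy case estimate 1 is chosen as the centre of the first cluster because it has the highest total similarity to its two neighbours. A different linkage (for example average linkage) would require a full agglomerative procedure instead of the connected-components shortcut.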
Fig.4. (a) Dendrogram. Cutting it at the level dissimilarity = 0.1 gives 30 clusters, whereas cutting it at 0.4 gives 18 clusters. (b) Similarity graph of the estimates. Clusters are indicated by convex hulls. Lines connect estimates whose similarity is larger than a threshold; the darker the line, the stronger the similarity. (c) The 18 variation modes obtained, ordered along the shape.
4. Experimental Results

4.1 Results improvement in cases of bad detection

Our ICA-based model is now applied to images where our previous PCA-based model [3] fails because of bad detection. It is iteratively deformed to fit the blob (the silhouette) extracted from these images (see Figure 5). Some results are shown in Figure 6. The implausible shapes generated by PCA are corrected with ICA.
Fig. 5. Iterative Algorithm of the Contour Segmentation.
Fig. 6. Segmentation using PCA-based (top) and ICA-based (bottom) models.
4.2 Numerical Results

The model is now applied to a set of images of walking people from the MoBo database [4] that we previously processed manually to determine the contour of the person. We measure how close the contour estimated with our model is to this “good contour”. To that end, we define the metrics used for the performance evaluation. Two distances between shapes are considered. Suppose $S_i$ and $S_j$ are two shape vectors $(x_{i,1}, \dots, x_{i,n}, y_{i,1}, \dots, y_{i,n})$ and $(x_{j,1}, \dots, x_{j,n}, y_{j,1}, \dots, y_{j,n})$. First, the Euclidean distance $D_{ij}$ between these two shapes is given by:

$$ D_{ij} = \sqrt{ \sum_{k=1}^{n} \left( (x_{i,k} - x_{j,k})^2 + (y_{i,k} - y_{j,k})^2 \right) } . \qquad (8) $$
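As a small worked instance of Eq. (8), the sketch below computes the distance between two shape vectors stored as (x_1..x_n, y_1..y_n). It assumes the square root that the name "Euclidean distance" implies; the function name and the four-landmark example are illustrative.

```python
import numpy as np

def euclidean_shape_distance(Si, Sj):
    """Eq. (8) for shape vectors laid out as (x_1..x_n, y_1..y_n).
    Summing squared differences over the whole vector covers both x and y terms."""
    Si = np.asarray(Si, dtype=float)
    Sj = np.asarray(Sj, dtype=float)
    return float(np.sqrt(np.sum((Si - Sj) ** 2)))

# A unit square and the same square translated by (3, 4).
square = np.array([0.0, 1.0, 1.0, 0.0,   0.0, 0.0, 1.0, 1.0])
moved = square + np.array([3.0] * 4 + [4.0] * 4)
d = euclidean_shape_distance(square, moved)
```

Each of the four landmarks moves by a distance of 5, so the squared terms sum to 100 and the shape distance is 10.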
We also define a “Point to Curve” distance $D_{ij}$ between the landmarks of $S_i$ and the curve formed by the segments interpolated between the landmarks of $S_j$. Since $D_{ij}$ and $D_{ji}$ can have different values, we define this distance D as the mean value:

$$ D = \frac{D_{ij} + D_{ji}}{2} = \frac{1}{2} \sum_{k=1}^{n} ( d_{i,j,k} + d_{j,i,k} ) , \qquad (9) $$

where

$$ d_{i,j,k} = \min \left( (x_{i,k} - x_M)^2 + (y_{i,k} - y_M)^2 \right) , \quad (x_M, y_M) \in \mathrm{curv}(S_j) . \qquad (10) $$
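The point-to-curve distance can be sketched as follows, under some assumptions that are not stated in the text: the curve through the landmarks of a shape is taken to be a closed polyline, the per-landmark term is the squared distance as Eq. (10) is printed (apply a square root to obtain a length), and the helper names are hypothetical.

```python
import numpy as np

def to_points(S):
    """(x_1..x_n, y_1..y_n) -> (n, 2) array of landmarks."""
    S = np.asarray(S, dtype=float)
    n = S.size // 2
    return np.column_stack([S[:n], S[n:]])

def point_to_curve(Si, Sj):
    """Directed distance D_ij: sum over the landmarks of S_i of the squared
    distance to the closed polyline through the landmarks of S_j."""
    P, A = to_points(Si), to_points(Sj)
    B = np.roll(A, -1, axis=0)                  # segment k runs from A[k] to B[k]
    AB = B - A
    len2 = np.maximum((AB ** 2).sum(axis=1), 1e-12)
    # Projection parameter of every point on every segment, clamped to the segment.
    t = ((P[:, None, :] - A[None]) * AB[None]).sum(axis=2) / len2
    t = np.clip(t, 0.0, 1.0)
    closest = A[None] + t[..., None] * AB[None]  # (n_points, n_segments, 2)
    d2 = ((P[:, None, :] - closest) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

def symmetric_point_to_curve(Si, Sj):
    """Eq. (9): mean of the two directed distances."""
    return 0.5 * (point_to_curve(Si, Sj) + point_to_curve(Sj, Si))

# Same unit square, landmarks placed at different positions along the edges.
Si = np.array([0, 0.5, 1, 1, 1, 0.5, 0, 0,    0, 0, 0, 0.5, 1, 1, 1, 0.5])
Sj = np.array([0, 0.25, 1, 1, 1, 0.75, 0, 0,  0, 0, 0, 0.25, 1, 1, 1, 0.75])
D_same = symmetric_point_to_curve(Si, Sj)
```

The two landmark sets describe the same contour, so `D_same` is zero up to rounding even though their Euclidean distance is not.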
The idea of this new metric is to obtain a null distance between two contours that differ only by a displacement of some landmarks along the shape, which allows a better measure of convergence. Note that this distance is meaningful only if the Euclidean distance has a reasonable value: for example, a shape vector whose landmarks all coincide with a single landmark of another contour would have a null distance to it, although the two shapes are totally different.

We now apply our model to a set of 450 images (30 pictures for each of 15 persons) and run 20 iterations of the algorithm on each image. The Euclidean distances from the current corrected shape to the “good” shape and to the measured shape (determined on the silhouette blob) are calculated at each iteration, and the mean value of each distance is plotted. To evaluate the results obtained with ICA, we compare them with those obtained with PCA using the same number of components. Figure 7 shows the results.
Fig.7. Results obtained by the PDM based Human Figure Segmentation using PCA or ICA: Euclidean distances between corrected contour and “good contour” (left), and between corrected contour and measured contour (right) are given.
We can note that the distance to the “good” contour reaches its lowest value after 3 iterations with both methods; it then starts to increase in the PCA case, whereas it stays quite stable with ICA. We can also observe that the distance to the measured contour converges in both cases, but to a lower value with PCA than with ICA. This can be explained by the fact that the PCA model fits the blob exactly, including its possible defects, while the ICA model corrects them. The distance between the current corrected shape and the previous one is also calculated at each iteration to evaluate the convergence of the results (see Figure 8). Both methods converge, but the ICA method converges faster than the PCA one. This is mainly because ICA modes describe local variations whereas PCA modes describe global ones: at each iteration, the ICA model changes local parts of the shape while the PCA model moves almost all the landmarks.
Fig.8. Results obtained by the Human Figure Segmentation using PCA or ICA: “point to curve” distance between the corrected contour and the previous one.
5. Conclusions

This work shows the potential of Independent Component Analysis as a tool for extracting local shape variations. ICA gives a representation of the training dataset consisting of vectors that describe local deformations, whereas the vectors obtained by Principal Component Analysis describe global deformations. The first evaluation of Human Figure Segmentation using ICA produces encouraging results. Our shape model enables accurate estimation of the human figure despite segmentation errors in the input silhouettes and has good convergence properties: convergence is reached faster than with the PCA method. We also propose a new metric to measure this convergence. In future work, we could analyze the possibility of using this convergence in the human detection task, to decide whether the input silhouette is human or not. A more complete study would be needed to test and select the different settings.
Acknowledgments

G. Rogez is supported by an FPU grant AP2003-2257 and J. Martínez del Rincón is supported by an FPI grant BES-2004-3741, both from the Spanish Ministry of Education. This work is also supported by a grant TIC2003-08382-C05-05 from the Spanish Ministry of Sciences and Technology.
References

[1] A. Baumberg and D. Hogg. Learning deformable models for tracking the human body, in M. Shah and R. Jain (Ed.), Motion-Based Recognition, 3 (Dordrecht: Kluwer, 1997) 39-60.
[2] A. Blake and M. Isard. Active Contours, (Springer-Verlag, 1998).
[3] C. Orrite-Uruñuela, J. Martínez del Rincón, J.E. Herrero Jaraba, G. Rogez: 2D Silhouette and 3D Skeletal Models for Human Detection and Tracking. ICPR (4) 2004: 244-247
[4] The CMU Motion of Body (MoBo) Database, http://www.hid.ri.cmu.edu
[5] A. Hyvärinen, J. Karhunen and E. Oja, Independent Component Analysis. (Wiley Interscience, 2001)
[6] Üzümcü, M., Frangi, A.F., Reiber, J.H., Lelieveldt, B.P. Independent Component Analysis in Statistical Shape Models. In Sonka, M., Fitzpatrick, J.M., eds.: Proc. of SPIE. Volume 5032. (2003) 375-383
[7] J. Himberg, A. Hyvärinen and F. Esposito, Validating the independent components of neuroimaging time-series via clustering and visualization. Neuroimage, 22:3, pp.1214-1222, (2004).
[8] Bartlett, M.S., Movellan, J.R., Sejnowski, T.J. Face Recognition by Independent Component Analysis. IEEE Trans. on Neural Networks 13 (2002) 1450-1464
[9] A. Hyvärinen and E. Oja. A Fast Fixed-Point Algorithm for Independent Component Analysis, Neural Computation, 9(7), pp. 1483-1492, 1997
[10] B. Everitt, Cluster Analysis. Edward Arnold, London, third Edition (1993).
[11] Jaakko Peltonen and Samuel Kaski. Discriminative Components of Data. IEEE Transactions on Neural Networks, accepted for publication.