Wavelet Transform-Based Hierarchical Active Shape Model for Object Tracking
Hyunjong Ki, Jeongho Shin, and Joonki Paik
Image Processing and Intelligent Systems Laboratory, Department of Image Engineering
Graduate School of Advanced Imaging Science, Multimedia, and Film, Chung-Ang University
221 Huksuk-Dong, Tongjak-ku, Seoul 156-756, Korea
Tel: +82-02-820-5300, Fax: +82-02-817-5300
E-mail: [email protected]
Abstract- This paper proposes a hierarchical approach to the active shape model (ASM) using the wavelet transform. Local structure model fitting in the ASM plays an important role in model-based pose and shape analysis. The proposed algorithm can robustly find good solutions in complex images by using wavelet decomposition. We also describe how to apply Kalman filtering for ASM-based, real-time video tracking. The proposed algorithm has been tested on various sequences containing human motion to demonstrate the improved performance of the proposed object tracking.
0-7803-8639-6/04/$20.00 ©2004 IEEE.

I. INTRODUCTION

Model-based shape analysis methods make use of a prior model of what is expected in the image, and typically attempt to find the best match of the model to the data from the input image. The active shape model (ASM) is one such method. An ASM algorithm iteratively deforms the currently obtained shape to fit the image. This iterative update, referred to as local structure model fitting, plays an important role in model-based pose and shape analysis, and gives the ASM significantly increased efficiency and flexibility.

In spite of its generally acceptable performance for shape analysis, the ASM often requires non-trivial computational overhead due to the complicated processing involved in modeling. Wavelet transform-based hierarchical decomposition, however, makes it possible to reduce the computational load significantly. This method can robustly find good shapes in complicated images by using the subband properties of the wavelet transform. In this paper we present a wavelet transform-based hierarchical ASM algorithm to model the local structure of the object in the image. The main source of inspiration for our work is the initial experiments of Wolstenholme and Taylor [1]. A multi-resolution implementation using wavelet coefficients was provided by Davatzikos et al. [2]. In that method the image to be segmented is smoothed by a filter and then subsampled to obtain a sequence of multiscale versions of the input image. Liu and Hwang [3] extended the wavelet approach for object segmentation and tracking to a widely used generative model, the active contour model.

The rest of this paper is organized as follows. In section 2, the fundamental theory of the active shape model is revisited. In sections 3 and 4, we propose the wavelet-based hierarchical ASM combined with Kalman filtering for real-time tracking. We finally present the experimental results and conclusions in sections 5 and 6, respectively.

II. BASIC THEORY OF THE ASM

The ASM manipulates a shape model to describe the location and structure of an object in a target image. This approach includes two main steps: (i) statistical modeling of shape and appearance, and (ii) interpretation of the model to analyze new images. The ASM-based approach offers a potentially acceptable and stable solution to a variety of practical applications, particularly in the areas of robotics, medical imaging, and video tracking.

A. Landmark Points

If a shape is described by n points in the d-dimensional shape space, we represent the shape by an nd-element vector formed by concatenating the elements of the individual landmark point (LP) positions. In a two-dimensional image, we represent n LPs by a 2n-dimensional vector as

$x = [x_1, x_2, \ldots, x_n, y_1, \ldots, y_n]^T$. (1)
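As an illustration of the vector layout in (1), the following minimal sketch packs n two-dimensional landmark points into a 2n-element list (all x coordinates first, then all y coordinates) and unpacks it again. The function names `to_shape_vector` and `to_points` are hypothetical and chosen only for this example.

```python
def to_shape_vector(points):
    """Pack [(x1, y1), ..., (xn, yn)] into [x1, ..., xn, y1, ..., yn]."""
    xs = [float(x) for x, _ in points]
    ys = [float(y) for _, y in points]
    return xs + ys


def to_points(vector):
    """Inverse of to_shape_vector: split a 2n-vector back into n (x, y) pairs."""
    n = len(vector) // 2
    return list(zip(vector[:n], vector[n:]))


if __name__ == "__main__":
    landmarks = [(1, 2), (3, 4), (5, 6)]
    vec = to_shape_vector(landmarks)
    print(vec)
    print(to_points(vec))
```

Keeping the x and y coordinates in two contiguous halves matches the ordering in (1); an interleaved layout would work equally well as long as it is used consistently across the whole training set.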
The number of LPs should be determined according to the specific application. For instance, in order to locate and track a human body, only the position and size of the object are of interest, so a small number of LPs is enough for efficient, fast tracking. On the other hand, if we want to analyze the identity or expression of a human face, we need to use a larger number of LPs [4].

B. Alignment of the Training Set
The modeling process examines the statistics of the coordinates of the labeled points over the training set. In order to compare equivalent pairs of points from different shapes, the shapes must be aligned with respect to a set of axes. The complete alignment requires scaling, rotation, and translation of the training shapes so that they correspond to each other as closely as possible. We aim to minimize the weighted sum of squares of distances between equivalent points on the corresponding pair of shapes. The iterative alignment algorithm is summarized in Algorithm 1.
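The pairwise alignment described in this paragraph can be sketched as follows. This is a minimal example and not the authors' implementation: it assumes uniform weights (the paper uses a weighted sum), represents a shape as a list of (x, y) tuples, and uses the closed-form least-squares solution for a combined rotation and scaling after both shapes are centered. The function names `center` and `align_to` are hypothetical.

```python
def center(shape):
    """Translate a shape so that its center of gravity is at the origin."""
    n = len(shape)
    cx = sum(x for x, _ in shape) / n
    cy = sum(y for _, y in shape) / n
    return [(x - cx, y - cy) for x, y in shape]


def align_to(shape, target):
    """Align `shape` to `target` with translation, rotation, and scaling.

    Minimizes the (unweighted) sum of squared distances between
    corresponding points. Both shapes are centered first, so the result
    is expressed in the centered frame of `target`.
    """
    src, dst = center(shape), center(target)
    norm2 = sum(x * x + y * y for x, y in src)
    # a = s*cos(theta), b = s*sin(theta) of the optimal similarity transform
    a = sum(x * u + y * v for (x, y), (u, v) in zip(src, dst)) / norm2
    b = sum(x * v - y * u for (x, y), (u, v) in zip(src, dst)) / norm2
    return [(a * x - b * y, b * x + a * y) for x, y in src]


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    # The same square rotated by 90 degrees, scaled by 2, and shifted by (5, 5)
    moved = [(5, 5), (5, 7), (3, 7), (3, 5)]
    print(align_to(moved, square))
```

Solving for a and b directly avoids an explicit angle search; the scale and rotation angle can be recovered afterwards as sqrt(a^2 + b^2) and atan2(b, a) if they are needed.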
Algorithm 1: Alignment of the Training Set
1. Translate each example so that its center of gravity is at the origin.
2. Choose one example as an initial estimate of the mean shape and scale it so that $|\bar{x}| = 1$.
3. Record the first estimate as $\bar{x}_0$ to define the default reference frame.
4. Align all the shapes with the current estimate of the mean shape.
5. Re-estimate the mean from the aligned shapes.
6. Apply constraints on the current estimate of the mean by aligning it with $\bar{x}_0$ and scaling so that $|\bar{x}| = 1$.
7. Declare convergence if the estimate of the mean does not change significantly after an iteration; otherwise, return to step 4.

C. Shape Modeling Using Principal Component Analysis

Principal component analysis (PCA) is a way of identifying patterns in data and expressing the data by highlighting their similarities and differences. Given a set of multivariate observations, application of PCA generates a new set of variables called principal components. Each principal component is a linear combination of the original variables. All the principal components are orthogonal to each other, so there is no redundant information. The principal components as a whole form an orthogonal basis in the data space. This makes PCA a powerful tool for data analysis.

Algorithm 2: PCA algorithm
1. Compute the mean of the aligned training set of m shapes,
$\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i$. (2)
2. Compute the covariance matrix, C, of the training set,
$C = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^T$. (3)
3. Compute the eigenvectors, $\phi_i$, and eigenvalues, $\lambda_i$, of C. Sort the eigenvalues and the corresponding eigenvectors in descending order.
4. Compute the total variance, that is, the sum of all eigenvalues,
$V_T = \sum_i \lambda_i$. (4)
5. Choose the first t eigenvalues such that their sum is larger than a specific value, such as
$\sum_{i=1}^{t} \lambda_i \geq f_v V_T$, (5)
where $f_v$ represents the fraction of the total variation to be retained.
6. Model a new sample x as
$x = \bar{x} + \Phi b$, (6)
where $\Phi = [\phi_1, \ldots, \phi_t]$ contains the first t eigenvectors and b is the vector of shape parameters.

... If $L > 0$ then $L \to L - 1$. The final result is given by the parameters after convergence at level 0.
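The steps of Algorithm 2 can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: shape vectors are plain lists, the leading eigenpairs are extracted by power iteration with deflation (a swapped-in technique chosen to avoid external libraries), and the total variance is taken as the trace of the covariance matrix, which equals the sum of its eigenvalues. The names `leading_eigenpair`, `build_shape_model`, and `reconstruct`, and the default value 0.98 for the retained fraction, are assumptions made for this example.

```python
import math


def leading_eigenpair(mat, iters=500):
    """Power iteration for the largest eigenpair of a symmetric PSD matrix."""
    d = len(mat)
    # Start from the column with the largest diagonal entry (assumed not
    # orthogonal to the leading eigenvector).
    j = max(range(d), key=lambda k: mat[k][k])
    v = [mat[i][j] for i in range(d)]
    for _ in range(iters):
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        v = [sum(mat[i][k] * v[k] for k in range(d)) for i in range(d)]
    lam = math.sqrt(sum(c * c for c in v))  # |C v| for unit v
    return lam, [c / lam for c in v]


def build_shape_model(shapes, f_v=0.98, iters=500):
    """Return (mean, modes, eigenvalues) keeping a fraction f_v of the variance."""
    m, d = len(shapes), len(shapes[0])
    mean = [sum(s[i] for s in shapes) / m for i in range(d)]            # step 1
    devs = [[s[i] - mean[i] for i in range(d)] for s in shapes]
    cov = [[sum(v[i] * v[j] for v in devs) / (m - 1) for j in range(d)]
           for i in range(d)]                                           # step 2
    total = sum(cov[i][i] for i in range(d))                            # step 4
    modes, eigvals = [], []
    while len(modes) < d and sum(eigvals) < f_v * total:                # steps 3, 5
        lam, phi = leading_eigenpair(cov, iters)
        eigvals.append(lam)
        modes.append(phi)
        for i in range(d):  # deflation: remove the component just found
            for j in range(d):
                cov[i][j] -= lam * phi[i] * phi[j]
    return mean, modes, eigvals


def reconstruct(mean, modes, b):
    """Step 6: x = mean + sum_k b[k] * modes[k]."""
    return [mean[i] + sum(bk * phi[i] for bk, phi in zip(b, modes))
            for i in range(len(mean))]


if __name__ == "__main__":
    training = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]
    mean, modes, eigvals = build_shape_model(training)
    print(mean, modes, eigvals)
    print(reconstruct(mean, modes, [math.sqrt(2.0)]))
```

In practice a library eigensolver for symmetric matrices would normally replace the power iteration; the deflation loop is used here only because it stops as soon as the retained variance reaches the chosen fraction, so eigenvectors with negligible variance are never computed.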
For this experiment, 50 shapes were used as the training set for PCA, and a 7-pixel-wide profile was used for each landmark point. The initial landmark points were placed manually, as shown in Fig. 6. The length of the local profiles determines the search area for minimizing the Mahalanobis distance. The larger the profile length, the wider the search area, at the cost of increased computational complexity. Moreover, using an effective derivative profile according to the orientation of the LPs can improve model training.

In order to show the performance of the proposed method, we tested the non-hierarchical, hierarchical, and wavelet-based hierarchical methods on the image shown in Fig. 7. In the existing hierarchical and the proposed wavelet-based approaches, level 0 represents the original resolution, level 1 the half-sized resolution, and level 2 the quarter-sized resolution. We performed 5 iterations at level 2, another 5 iterations at level 1, and finally 10 iterations at level 0. For the wavelet-based approach, we performed wavelet decomposition using the Daubechies (7,9) filter. For the non-hierarchical approach we performed 10 iterations.

In our experiments, we use the sum of differences between the manually assigned and the estimated landmark points as the error measure. As shown in Table I, the proposed method gives, in most cases, better fitting results than the existing methods, as expected from the theory. As shown in Figs. 7 and 8, the proposed method yields better fitting results than the existing methods with the same profile length. In tracking objects, there are many cases where we have to deal with occlusion. Fig. 9 shows the tracking results for occluded objects. The proposed method provided satisfactory results on occluded objects (up to 65% occlusion).
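The coarse-to-fine schedule used in these experiments (5 iterations at level 2, 5 at level 1, and 10 at level 0) can be sketched as a simple driver loop. This is only an illustration under assumptions: `fit_step` is a hypothetical callback standing in for one ASM update at a given level, the wavelet decomposition that produces each level's image is not shown, and landmark coordinates are assumed to double when moving to the next finer level because each level halves the resolution.

```python
def coarse_to_fine(landmarks, fit_step, iterations=(10, 5, 5)):
    """Run a hierarchical fit from the coarsest level down to level 0.

    landmarks  : list of (x, y) positions at full resolution (level 0)
    fit_step   : hypothetical callback fit_step(points, level) -> points,
                 performing one model-fitting iteration at that level
    iterations : iterations[L] is the number of iterations at level L
    """
    top = len(iterations) - 1
    scale = 2 ** top
    # Map the initial landmarks into the coordinate frame of the coarsest level.
    points = [(x / scale, y / scale) for x, y in landmarks]
    for level in range(top, -1, -1):
        for _ in range(iterations[level]):
            points = fit_step(points, level)
        if level > 0:
            # Moving to the next finer level doubles the resolution.
            points = [(2 * x, 2 * y) for x, y in points]
    return points


if __name__ == "__main__":
    visited = []

    def dummy_step(points, level):
        # Placeholder update: records the level and leaves the points unchanged.
        visited.append(level)
        return points

    result = coarse_to_fine([(8.0, 4.0)], dummy_step)
    print(visited)
    print(result)
```

Because the coarse levels are a quarter and a half of the original size, a fixed-length profile covers a proportionally larger area of the scene there, which is what lets the early iterations correct large placement errors cheaply.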
Fig. 5. Selected frames from a set of test images: (a) Indoor_1, (b) Indoor_2, (c) Outdoor_1, (d) Outdoor_2, (e) Outdoor_3, (f) Outdoor_4.
Table I. Mean squared error of the fitted shapes using the existing and the proposed methods for indoor and outdoor images.

Test image   Non-hierarchical   Hierarchical   Wavelet-based hierarchical
Indoor 1     319.54             274.25         245.45
Indoor 2     227.35             193.12         185.24
Outdoor 1    323.14             294.84         229.41
Outdoor 2    294.48             254.03         25?.12
Outdoor 3    348.54             314.14         291.25

Fig. 7. Model fitting results of the 3 different methods for the indoor image: (a) Non-hierarchical method, (b) Hierarchical method, (c) The proposed method.

VI. CONCLUSION
A technique has been presented for recognizing and tracking a moving non-rigid object or person in a video sequence. The hierarchical representation of shape in terms of its wavelet transform allows the statistical shape prior to capture fine as well as coarse shape characteristics, along with the orientation of the landmark points. However, all the methods proposed in this paper fail to capture shape variations that are not present in the training samples, no matter how many samples we use to train the model. Nevertheless, the proposed method adapts well to changing environments and partial occlusion, providing solutions for many practical applications in the areas of imaging and video technology.

Fig. 8. Model fitting results of the 3 different methods for the outdoor image: (a) Non-hierarchical method, (b) Hierarchical method, (c) The proposed method.

Fig. 9. Result of the occluded image sequences.
ACKNOWLEDGMENT

This research was supported by the Korean Ministry of Science and Technology under the National Research Lab Project and by the Ministry of Information and Communication under the Information Technology Research Center Project.

REFERENCES
[1] C. B. H. Wolstenholme and C. J. Taylor, "Wavelet compression of active appearance models," in Medical Image Computing and Computer-Assisted Intervention, MICCAI, pp. 544-554, 1999.
[2] C. Davatzikos, X. Tao, and D. Shen, "Hierarchical active shape models, using the wavelet transform," IEEE Transactions on Medical Imaging, vol. 22, no. 3, March 2003.
[3] J.-C. Liu and W.-L. Hwang, "Active contour model using wavelet modulus for object segmentation and tracking in video sequences," International Journal of Wavelets, Multiresolution and Information Processing, vol. 1, no. 1, pp. 93-113, 2003.
[4] A. Koschan, S. Kang, J. K. Paik, B. R. Abidi, and M. A. Abidi, "Color active shape models for tracking non-rigid objects," Pattern Recognition Letters, vol. 24, pp. 1751-1765, July 2003.
[5] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Training models of shape from sets of examples," in British Machine Vision Conference, pp. 9-18, September 1992.
[6] G. Welch and G. Bishop, "An introduction to the Kalman filter," Technical Report, Department of Comp. Sc. and Engg., Univ. of North Carolina at Chapel Hill, 2002.
[7] S. J. McKenna, Y. Raja, and S. Gong, "Tracking colour objects using adaptive mixture models," Image and Vision Computing, vol. 17, pp. 225-231, 1999.
[8] T. F. Cootes, A. Hill, C. J. Taylor, and J. Haslam, "The use of active shape models for locating structures in medical images," Information Processing in Medical Imaging, pp. 33-47, 1993.
[9] S. Tanimoto and T. Pavlidis, "A hierarchical data structure for picture processing," Computer Graphics and Image Processing, vol. 4, pp. 104-119, 1975.
[10] Q. Tian, N. Sebe, E. Loupias, and T. S. Huang, "Image retrieval using wavelet-based salient points," Journal of Electronic Imaging, vol. 10, no. 4, pp. 835-849, 2001.
[11] A. Baumberg, "Hierarchical shape fitting using an iterated linear filter," Image and Vision Computing, vol. 16, pp. 329-335.
[12] J. Zheng, K. P. Valavanis, and J. M. Gauch, "Noise removal from color images," Intelligent and Robotic Systems, vol. 7, pp. 257-285, 1993.
[13] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: active contour models," International Journal of Computer Vision, vol. 1, no. 4, pp. 321-331, 1988.
[14] A. Jain, Fundamentals of Digital Image Processing. Prentice Hall, 1989.
[15] A. Hill and C. Taylor, "Automatic landmark generation for point distribution models," Proceedings of the British Machine Vision Conference, 1994.
[16] F. Leymarie and M. Levine, "Tracking deformable objects in the plane using an active shape contour model," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 6, pp. 617-634, 1993.