ARTICULATED REGISTRATION OF 3D HUMAN GEOMETRY TO X-RAY IMAGE

Sang N. Le
Jayashree Karlekar
Anthony C. Fang
Department of Computer Science, National University of Singapore
[email protected]
[email protected]
ABSTRACT

A fully automatic 3D-2D articulated registration algorithm for aligning a whole-body human geometry to a 2D X-ray image is presented. Domain prior in the form of a hierarchical model encapsulating the kinematic structure of the human body is exploited. A deformation operator is then defined with respect to the hierarchical model. An iterative search using ICP is performed to estimate the optimal deformation parameters. Empirical results indicate good matching rates of 94% and above. Requiring minimal user intervention, the method is suited for large-scale data acquisition in human anthropometric studies.

Index Terms— 3D-2D registration, articulated registration, model-image registration, human geometry segmentation, human stick/wire model

1. INTRODUCTION

In diagnostic imaging, information provided by various imaging modalities is often complementary. Consequently, the registration of images acquired from different sources is of particular interest [1]. Beyond image domains, registration is often extended to relate image space to physical space, where spatial models are transformed to align with their corresponding forms in image space. Such transformations are common in atlas-assisted preoperative planning and are also used when quantifying the anomalies of anatomical shapes.

Anatomy-based image-to-space registration is typically confined to the volume enclosing the structure of interest. For rigid structures such as bones, registration is relatively straightforward and reduces to finding the appropriate translation and rotation parameters. For deformable structures, non-linear optimization is often used to search for the optimal transform. In either case, finding the registration parameters typically requires correspondence features such as anatomical landmarks [2], contour points and surfaces [3], voxel intensities [4] or some combination of the above. In this paper, we address the problem of articulated registration.
(This research was supported by grant R-252-000-256-305 from A*STAR/SERC.)
[email protected]

In this domain, multiple rigid segments are interconnected at joints where skin and soft tissues deform elastically to maintain surface continuity across bends [5]. The registration of these articulated bodies to images poses a unique challenge beyond the realm of anatomy-based registration. The possibility of self-occlusion, together with the constraint that usually only one projection image is available, renders the task significantly more challenging (and ill-posed) than the rigid-object-to-image registration problem. Existing approaches to articulated registration target specific anatomical structures such as hand/fingers [5], spinal cord [6, 7] and histological slices [8]. A majority of these methods rely on manual intervention to identify anatomical regions of rigid and non-rigid parts in order to prescribe a parametrized transformation to each region. Problems of a similar nature arise in model-based human motion capture, where model-to-image registration is done to prepare the system states for motion tracking. However, in order to function well under widely varying conditions, the geometric priors in vision systems are typically gross approximations, employing primitives such as cylindrical or ellipsoidal blobs [9].

This paper presents a single-image 3D-2D registration algorithm that non-linearly deforms an articulated human geometry to a full-body X-ray image. The system is designed to measure human anthropometry by combining separately acquired distributions of physical mass and spatial volume. The registration process thus reconciles the multimodal information into a single space. We exploit domain prior to ensure that the process is fully automatic, requiring no manual identification of correspondences or placement of landmarks (§2). The method is thus feasible for large-scale anthropometry studies where manual processing of individual subject data is a major concern. Our domain prior, in the form of a hierarchical stick model, encapsulates a kinematic structure with degrees of freedom (DOF) pertaining to a human.
The model segments the articulated geometry into piecewise rigid parts, which are then registered to image space in a hierarchical manner. The proposed approach uses the contours of the projected geometry as features to estimate the transformation parameters. The results of the algorithm are presented in §3; §4 presents conclusions and discussion.
Fig. 1. (a) 2D X-ray image I(x, y); (b) 3D geometry G; (c) Domain prior in the form of stick model; (d) Skeleton embedded into geometry G; (e) Orthographic projection of G with color-encoded regions; (f) External body contour I_C of X-ray image I(x, y).

2. 3D-2D ARTICULATED REGISTRATION

A novel 3D-2D intra-subject registration approach between the 2D X-ray image I(x, y) (Fig. 1(a)) and the external geometry G of the subject is presented in this section. The image I(x, y) is obtained from a full-body X-ray scanner (Hologic QDR-1000/W), while G is obtained from a commercially available laser-based scanner (Fig. 1(b)).

The general correspondence problem between the image I(x, y) and geometry G is plagued by issues of ambiguity. Although I(x, y) typically captures the subject in a supine pose, the pose associated with G may be very different. Given the complexity of the human form, direct shape matching approaches are prone to local minima and false matches. Furthermore, a general free-form deformation model would afford excessive DOF, allowing G to contort in unnatural and physically impossible ways. Our approach estimates the best-fit deformation parameters for the articulated geometry by employing an articulated deformation operator with domain prior. Essentially, the domain prior restricts the space of allowable deformations to those that are physically plausible. The pose ambiguity issue is resolved with the assumption that the polyhedral geometry G is a closed and oriented manifold with no self-occlusions in the direction of projection.

2.1. Skeleton Fitting to G

Articulated registration begins with the embedding of a stick model, thereby segmenting the object into piecewise rigid parts. Due to the complex nature of articulated objects, skeleton fitting is traditionally carried out manually by first marking the anatomical joint locations and then establishing relationships between them [5, 6, 7].
A semi-automatic method proposed in [10] extracts the stick model (up to a scale) from user-defined 2D landmark points by using statistical anthropometric information, whereas the approach presented in [11] assumes domain prior in the form of a stick model to locate joints in 3D geometry.

To make the articulated registration process fully automatic, a stick model encapsulating the kinematic constraints of the human body is assumed. The model consists of 17 sticks and 13 joints as shown in Fig. 1(c). Joint 1 has 6 DOF while the rest have 3 DOF each. The stick model is then embedded into geometry G using the approach presented in [11]. Fig. 1(d) illustrates the typical result after the skeletal prior is embedded in G. The vertices of G are then partitioned according to each vertex's relationship with the underlying skeletal structure (Fig. 1(e)). The segmentation is performed by assigning a unique color k to each vertex v_j such that w_jk = max_i {w_ji}, where weight w_ji expresses the degree to which vertex v_j ∈ G is influenced by bone i of the skeleton. As a consequence, vertices that are predominantly influenced by the same bone are clustered together.

2.2. Contour Matching and Transformation

With the embedded kinematic structure coupled with an associated deformation operator, G can now be transformed within the space induced by the operator. The domain prior limits these deformations to plausible ones by excluding deformations that break the geometric continuity of G. The next step estimates the optimal parameters such that the associated transform is an optimal registration between G and I(x, y). Since the deformation subspace is non-linear, we can expect a solution that is by and large locally optimal. Contours and silhouettes are by far the most stable and readily available features on an object. We thus formulate the parameter search as a function of geometric contour matching.
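The max-weight segmentation rule of §2.1 can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation; the function name and toy weights are hypothetical, and the skinning weights are assumed to be given by the skeleton-embedding step of [11].

```python
import numpy as np

def segment_vertices(weights):
    """Partition mesh vertices into piecewise rigid parts.

    weights: (n_vertices, n_bones) array; weights[j, i] is the degree
    to which vertex j is influenced by bone i of the embedded skeleton.
    Returns, for each vertex, the label k satisfying w_jk = max_i {w_ji},
    i.e. the index of its dominant bone.
    """
    return np.argmax(weights, axis=1)

# Toy example: 3 vertices influenced by 2 bones.
w = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.6, 0.4]])
labels = segment_vertices(w)  # → [0, 1, 0]
```

Vertices sharing a label are exactly those predominantly influenced by the same bone, which is what the color-encoded rendering in Fig. 1(e) visualizes.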
Fig. 2. (a)-(e) Geometry G in various starting poses and skeleton progression during articulated registration. The overlap measures before and after registration are shown for each pose: (a) 54% → 96%; (b) 52% → 95%; (c) 49% → 96%; (d) 49% → 94%; (e) 51% → 94%. Geometry after registration is also shown in (a) and (b). Despite the assumption of no self-occlusion, our algorithm is able to register reasonably self-occluded poses (d, e) to the X-ray image.

Let T^i ∈ SE(3) denote the world transformation of segment i of the skeleton. Transformations are obtained by point correspondences between the silhouette of the projection of geometry G, which we denote Π(G), and the external body contour I_C extracted from the X-ray image I(x, y) (Fig. 1(f)). To obtain the transformation T^i for segment i, silhouette segmentation is performed by rendering each rigid part of geometry G under orthographic projection (Fig. 1(e)). The contour I_C is obtained using a flood-fill algorithm starting from a point inside the body trunk. Let the set of generalized parameters T^i ∈ SE(3) corresponding to all segments be an m-dimensional vector Θ ∈ R^m. The 3D-2D registration problem thus involves finding a set of values Θ such that the silhouette of the deformed G matches the external body contour I_C:

    arg min_Θ ‖Π(G) − I_C‖²    (1)
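The mismatch term in Eq. (1) is evaluated between point sets, so in an ICP setting it amounts to summing squared distances from each projected-silhouette point to its nearest contour point. A minimal NumPy sketch of that residual follows; it is an illustration under our own naming (`contour_mismatch` and the toy square are hypothetical), not the paper's code, and a real implementation would use a spatial index rather than a dense distance matrix.

```python
import numpy as np

def contour_mismatch(proj_pts, xray_pts):
    """ICP-style residual for Eq. (1): the sum of squared distances
    from each point of the projected silhouette Π(G) to its nearest
    point on the X-ray body contour I_C.
    Both inputs are (n, 2) arrays of 2D contour points."""
    # All pairwise squared distances: (n_proj, n_xray).
    d2 = ((proj_pts[:, None, :] - xray_pts[None, :, :]) ** 2).sum(axis=2)
    # For each projected point, keep the closest contour point.
    return d2.min(axis=1).sum()

# Toy contours: a unit square versus the same square shifted by (0.1, 0).
sq = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
err = contour_mismatch(sq + [0.1, 0.0], sq)  # 4 points × 0.1² = 0.04
```

Driving this residual to a minimum over the deformation parameters Θ, segment by segment, is what the gradient-descent ICP search in §2.2 performs.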
To simulate the lying position of the subject, we include a global bias towards poses that minimize the geometry depth, Σ_j v_j|_z, subject to v_j|_z > 0, for all vertices v_j ∈ G. Finally, we solve for the deformation parameters Θ through a gradient-descent minimization of the ICP (iterative closest point) distance penalty function [12].

2.3. Hierarchical Registration

After obtaining the world transformations T^i, the registration of G involves transforming each vertex v_j ∈ G as a weighted sum of all transformations:

    v'_j = Σ_i w_ji T^i(v_j),    (2)

where T^i ∈ SE(3) is the world transformation of body segment i and w_ji expresses the degree to which its position is influenced by the transformation T^i. This linear "blending" of geometry against multiple joint transformations is one of several ways to obtain smoothly deforming geometry around articulated joints [13]. Although the scheme is susceptible to artifacts such as collapsing volumes, our experience indicates that the registration technique does not produce "twists" severe enough to cause degenerate deformations.

A final detail is to ensure that G and I_C are appropriately scaled. We introduce a uniform scale factor in G such that its projected area matches the fixed area of I_C. The 3D-2D articulated registration algorithm is summarized as follows:

Algorithm ArticulatedRegistration(G, I)
1. Embed skeletal prior into G to obtain vertex weights w_ji for each v_j ∈ G
2. Segment G using weights w_ji
3. Obtain contour I_C corresponding to I(x, y)
4. While (improvement to overlap% is significant):
   (a) Obtain segmented contours Π(G) by rendering G with color-encoded segments
   (b) Compute ScaleFactor as the ratio of areas enclosed by I_C and Π(G); scale G by ScaleFactor
   (c) Apply ICP on each segmented contour in a hierarchical order, starting with the torso and branching outwards to the limbs and extremities
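The weighted blend of Eq. (2) is standard linear blend skinning, and it can be sketched compactly. The following NumPy version is a generic illustration under assumed conventions (homogeneous 4×4 transforms, row-vector vertices, hypothetical function name), not the paper's implementation.

```python
import numpy as np

def blend_vertices(verts, weights, transforms):
    """Linear blend skinning, as in Eq. (2): each registered vertex is
    the weighted sum of its positions under every segment transform T^i.

    verts:      (n, 3) rest-pose vertices of G
    weights:    (n, b) skinning weights w_ji
    transforms: (b, 4, 4) homogeneous world transforms T^i in SE(3)
    """
    n = len(verts)
    homo = np.hstack([verts, np.ones((n, 1))])           # (n, 4)
    # Position of every vertex under every bone transform: (b, n, 4).
    moved = np.einsum('bij,nj->bni', transforms, homo)
    # Weighted sum over bones: v'_j = Σ_i w_ji · T^i(v_j).
    blended = np.einsum('nb,bni->ni', weights, moved)
    return blended[:, :3]

# Toy check: blend identity with a unit translation along x at weight 0.5.
T0 = np.eye(4)
T1 = np.eye(4); T1[:3, 3] = [1.0, 0.0, 0.0]
v = np.array([[0.0, 0.0, 0.0]])
w = np.array([[0.5, 0.5]])
out = blend_vertices(v, w, np.stack([T0, T1]))  # → [[0.5, 0.0, 0.0]]
```

As noted above, this scheme can exhibit volume collapse near strongly bent joints; pose space deformation [13] is one of the alternatives for cases where that matters.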
3. RESULTS

Quantitative results for the articulated registration of geometry G and the X-ray image are presented. With only a single-view X-ray image, the transformation parameters T^i are constrained to
lie in-plane, giving rise to 3 DOF (1 for rotation and 2 for translation) for joint 1 and 1 DOF for each of the remaining joints. Image I(x, y) is obtained from a DXA scanner (Hologic QDR-1000/W) at a resolution of 106 × 150 pixels, with each pixel representing a rectangular area of 0.618 cm × 1.303 cm, resulting in a whole-body scan area of 65.5 cm × 195.5 cm. Geometry G typically has 20000–25000 vertices and 50000–60000 triangles.

Fig. 3. (a)-(g) Empirical results illustrate the intermediate states of the transformed mesh at iterations 1–5, 10 and 30, with overlap measures (a) 54%, (b) 73%, (c) 86%, (d) 91%, (e) 94%, (f) 96%, (g) 96%. Top row: G is superimposed on I_C, highlighting the degree of overlap. The overlap measure is the intersect-to-union ratio of the areas covered by the two patches. The algorithm begins with an overlap percentage of 54% and converges to 96% near the tenth iteration.

Fig. 2 depicts the skeleton progressions during articulated registration for various starting poses. Geometry and skeleton are more spread out initially. Fig. 2(a) and (b) also show the geometry G after registration, which is aligned with I(x, y) without being contorted. Note that the self-occluded poses in Fig. 2(d) and (e) are successfully registered to the X-ray image. Fig. 3 illustrates the iterative results for the articulated registration of Fig. 2(a). The initial percentage of overlap between G and image I(x, y) was 54%, which increased to 96% after registration.

4. CONCLUSION

We have presented a fully automatic 3D-2D articulated registration algorithm for the whole human geometry. Empirical results show that the proposed articulated registration algorithm performs reasonably well for various starting poses relative to the target image, which would not have been possible with segment-wise registration. As the algorithm inherently uses ICP, it may not perform well if the overlap between target and source is very small.
In the current framework we assumed in-plane rotation only, as extraction of arbitrary rotation parameters may not be feasible given the constraint of a single image, the non-availability of other landmark features and the highly articulated nature of the subject. The problem addressed here is similar in context to model-based markerless human motion capture [9], in which motion capture is achieved by minimizing the discrepancy between the projection of geometry G and silhouettes in multiple views, as opposed to the single view used here.
5. REFERENCES

[1] J. A. Maintz and M. A. Viergever, "A survey of medical image registration," Medical Image Analysis, vol. 2, no. 1, pp. 1–36, 1998.
[2] H. J. Johnson and G. E. Christensen, "Consistent landmark and intensity-based image registration," IEEE Trans. on Medical Imaging, vol. 21, no. 5, pp. 450–461, 2002.
[3] J. Feldmar, N. Ayache, and F. Betting, "3D-2D projective registration of free-form curves and surfaces," Computer Vision and Image Understanding, vol. 65, pp. 403–424, 1997.
[4] W. M. Wells, P. Viola, H. Atsumi, S. Nakajima, and R. Kikinis, "Multi-modal volume registration by maximization of mutual information," Medical Image Analysis, vol. 1, pp. 35–51, 1996.
[5] M. A. Martin-Fernandez, E. Munoz-Moreno, M. Martin-Fernandez, and C. Alberola-Lopez, "Articulated registration: Elastic registration based on a wire-model," in Medical Imaging 2005: Image Processing, J. M. Fitzpatrick and J. M. Reinhardt, Eds., 2005, pp. 182–191.
[6] A. du Bois d'Aische, M. De Craene, and S. K. Warfield, "An articulated registration method," in Proc. Int. Conf. on Image Processing, 2005.
[7] J. A. Little, D. L. G. Hill, and D. J. Hawkes, "Deformations incorporating rigid structures," Computer Vision and Image Understanding.
[8] V. Arsigny, X. Pennec, and N. Ayache, "Polyrigid and polyaffine transformations: A new class of diffeomorphisms for locally rigid or affine registration," Proc. of MICCAI, vol. 2, pp. 829–837, 2003.
[9] D. A. Forsyth, O. Arikan, L. Ikemoto, J. O'Brien, and D. Ramanan, "Computational studies of human motion: Part 1, tracking and motion synthesis," Foundations and Trends in Computer Graphics and Vision, vol. 1, no. 2/3, pp. 77–254, 2005.
[10] C. Barron and I. A. Kakadiaris, "On the improvement of anthropometry and pose estimation from single uncalibrated image," Machine Vision and Applications, vol. 14, no. 4, pp. 229–236, 2003.
[11] I. Baran and J. Popović, "Automatic rigging and animation of 3D characters," ACM Trans. Graph., vol. 26, no. 3, 2007.
[12] S. Rusinkiewicz and M. Levoy, "Efficient variants of the ICP algorithm," in Proc. Int. Conf. on 3D Digital Imaging and Modeling, 2001, pp. 145–152.
[13] J. P. Lewis, M. Cordner, and N. Fong, "Pose space deformation: A unified approach to shape interpolation and skeleton-driven deformation," in Proc. 27th Annual Conf. on Computer Graphics and Interactive Techniques, 2000, pp. 165–172.