
IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 32, NO. 7, JULY 2013


Spine Segmentation in Medical Images Using Manifold Embeddings and Higher-Order MRFs Samuel Kadoury*, Member, IEEE, Hubert Labelle, and Nikos Paragios, Fellow, IEEE

Abstract—We introduce a novel approach for segmenting articulated spine shape models from medical images. A nonlinear low-dimensional manifold is created from a training set of mesh models to establish the patterns of global shape variations. Local appearance is captured from neighborhoods in the manifold once the overall representation converges. Inference with respect to the manifold and shape parameters is performed using a higher-order Markov random field (HOMRF). Singleton and pairwise potentials measure the support from the global data and shape coherence in manifold space respectively, while higher-order cliques encode geometrical modes of variation to segment each localized vertebra model. Generic feature functions learned from ground-truth data assign costs to the higher-order terms. Optimization of the model parameters is achieved using efficient linear programming and duality. The resulting model is geometrically intuitive, captures the statistical distribution of the underlying manifold and respects image support. Clinical experiments demonstrated promising results in terms of spine segmentation. Quantitative comparison to expert identification yields an accuracy of 1.6 ± 0.6 mm for CT imaging and of 2.0 ± 0.8 mm for MR imaging, based on the localization of anatomical landmarks.

Index Terms—Articulated deformable models, higher-order Markov random fields, nonlinear manifold embeddings, three-dimensional (3-D) spine segmentation.

I. INTRODUCTION

STATISTICAL models of shape variability have been successful in addressing fundamental vision tasks such as segmentation or registration in computer vision and medical image analysis. Such models help to understand the distribution in appearance of a group of shapes and offer an efficient parametrization of the geometric variability in a given cluster. These models have been used extensively for localized structures. On the other hand, object constellations and pose estimation, which have been dedicated mainly to body [1] and

Manuscript received December 12, 2012; revised January 14, 2013; accepted January 21, 2013. Date of publication April 25, 2013; date of current version June 26, 2013. This work was supported in part by STEREOS+ of Medicen, INRIA and Fonds de Recherche du Quebec sur la Nature et les Technologies grants. Asterisk indicates corresponding author. *S. Kadoury is with École Polytechnique de Montréal, Montréal, QC, H3C 3A7 Canada, and also with the Sainte-Justine Hospital Research Center, Montréal, QC, H3T 1C5 Canada (e-mail: [email protected]). H. Labelle is with the Sainte-Justine Hospital Research Center, Montréal, QC, H3T 1C5 Canada (e-mail: [email protected]). N. Paragios is with École Centrale de Paris, École des Ponts-ParisTech and INRIA Saclay, Ile-de-France, 92295 Chatenay-Malabry, France (e-mail: nikos. [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMI.2013.2244903

hand-pose estimation [2], demonstrate significant challenges in ensuring consistency in geometric variability. Various statistical approaches for the 3-D modeling of structures are based on shape analysis. Active shape models (ASM) [3] and active appearance models (AAM) [4] have been successful in recovering object geometries obtained from dense collections of data points. Implicit representations are an alternative formulation [5] to address model-based segmentation while, more recently, numerous methods based on point distribution models (PDM) and embedding on various geometric spaces (spherical [6]) have been proposed. However, model-based segmentation of single objects typically leads to fitting errors when no clear object boundary is visible, when similar structures are in close vicinity, or in the presence of pathologies.

Simultaneous multi-object inference is often beneficial compared to the separate segmentation of individual objects or sections. In [7], a data-driven method for building a human shape model incorporates both articulated and nonrigid deformation; however, it includes no prior, which fails in cases of occlusions, and it requires a manual selection of landmarks. On the other hand, expected observations were integrated as log likelihoods in a model combining both pictorial structures and Markov random fields (MRFs) [8], though prior knowledge was not directly inferred from sets of articulations.

To this end, deformable models represent powerful tools for recovering the shape of a patient's anatomy when only partial information or sparse image data is available. In orthopaedics, 3-D spine models generated from medical images can assist specialists in the diagnosis of deformations and in the surgical planning of patients, by providing an accurate segmentation and landmark localization for the complex articulated spine segment. A number of methods for the segmentation of vertebrae and vertebral structures in CT and MR images have been proposed in the literature. Deformable models combining geometrical shape knowledge [9] or a training set of edge detectors [10], using template matching techniques [11] or based on region growing [12], typically offer only geometrical descriptors for single vertebrae. Parametric modeling was proposed as an alternative to extract relevant clinical measures [13], but still could not provide anatomical landmark locations. Normalized graph-cut approaches limited to single cross-sections were also presented to segment MR images [14], [15]. A drawback is that all of the above-mentioned methods treat each vertebra individually instead of as a global shape, which remains a considerable limitation in segmental spinal surgery where intervertebral orientation and translation vary substantially. For example, by applying a statistical modeling of intervertebral transformations with gradient-based appearance, the authors in [16] achieved



detection and segmentation of the spinal column from CT images. However, the optimization is also based on gradient descent, prone to nonlinearity and local minima.

The main limitation of these methods, besides dependencies on initialization and landmark selection, is that traditional body-pose estimation methods are prone to fail in medical imaging. The aim of articulated models is to deduce the anatomical structure instead of providing the most appropriate pose configuration for object constellations based on prior information [17]. Given the high dimensionality and complex nonlinear transformations of the underlying structure, commonly used linear statistics are inapplicable for articulated structures. Recent concepts on Riemannian manifolds in tensor spaces are particularly suited for this application [18] and have been applied for object segmentation with nonlinear models [19]. Nonlinear embeddings have been investigated in numerous studies based on probabilistic Gaussian [20] or spectral latent variables [21]. Manifold learning algorithms such as Laplacian eigenmaps or locally linear embedding [22] map high-dimensional observation data that are presumed to lie on a nonlinear manifold onto a single global coordinate system of lower dimensionality. They preserve the neighborhood relationships of similar object geometries, thereby revealing the underlying structure of the data which can be used for statistical modeling. On the other hand, inferring a model from the underlying manifold is a novel concept but far from being trivial, and relies on a cost function which includes visual support and prior constraints. Our aim is to overcome these major limitations by offering a generic framework which captures global variability in a nonlinear embedding to infer new articulated spine models with annotated landmarks from manifold space, without requiring user interactivity.

In this paper, we introduce a deformable articulated body instantiation method through a statistical modeling of inter-object transformations. We use nonlinear manifold embeddings created from a training set to infer constellations which account for both small and large deformations. Our principal contribution lies in the representation of the model and in using higher-order MRFs (HOMRF) for inferring articulated objects directly from low-dimensional parameters. While in the authors' previous work manifolds were used as a prerequisite step to reconstruct a model [23], they represent here the centerpiece of the proposed workflow, where the low-dimensional space describes the optimization domain for possible solutions of the inferred articulated shape model. The graph involves costs related both to the data and to prior geometrical dependencies, as well as higher-order cliques [24]. Recent advances in the area of discrete optimization which explore the duality theorem of linear programming [25] are exploited to obtain the lowest potential of the designed MRF objective function.

We now describe the framework of the approach, whose outline is illustrated in Fig. 1. The first phase consists of creating a nonlinear, low-dimensional embedding from a training dataset of annotated articulated spine models completed with triangulated mesh models. Local neighborhoods are determined with a pairwise articulated distance metric using intervertebral transformations. From these local manifold regions, individual shape variations are learned given the strong correlation between global deformation and local


Fig. 1. Flowchart diagram of the proposed manifold-based articulated shape segmentation method used for the extraction of the spinal column from medical images.

shape morphology. An integrated and interconnected HOMRF graph is used as a basis to determine the optimal manifold coordinates, driven by the target image data. This graph involves costs related to image support, prior geometrical dependencies, and cliques describing local shape variations. Vertebral mesh vertices are adapted through the higher-order potentials via a learning-based method to estimate optimal boundary locations with the use of feature functions. A careful selection of the intrinsic dimensionality and parameter settings is performed to properly model the nonlinear space.

II. MANIFOLD EMBEDDING OF ARTICULATED SPINES

The input to our method is a sample of articulated spine models which comprises a set of learning shapes. These shapes are a constellation of vertebral triangular meshes, each annotated with anatomical landmarks defined as characteristic points uniquely localized across a set of objects. We first build an articulated shape manifold from a training database by embedding the data into a low-dimensional sub-space whose dimensionality corresponds to the domain of admissible variations. Local vertebral appearance is determined via an analysis of variations within a sub-patch of the manifold.

A. Representation of Articulated Deformable Models

The geometric model of the spine consists of an interconnection of objects. For each local shape, we recover a triangular mesh whose vertices correspond to approximately the same surface locations from one shape to another. Additionally, every local shape is annotated with landmarks on each personalized model in order to rigidly register each object to its upper neighbor. The resulting rigid transforms are stored for each inter-object link. These transforms can also be determined via an ICP-like algorithm to recover the extrinsic parameters.
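Since the inter-vertebral transforms are obtained by rigidly registering each vertebra to its upper neighbor through the annotated landmarks, the following minimal sketch (our own illustration, not the authors' code; the function names, landmark count and the use of SciPy are assumptions) shows one way such rigid transforms can be estimated and composed. An ICP-like scheme would replace the known correspondences with iteratively re-estimated closest points.

```python
# Sketch (not the paper's implementation): estimating one inter-vertebral rigid
# transform from corresponding anatomical landmarks given as (N, 3) arrays.
import numpy as np
from scipy.spatial.transform import Rotation


def rigid_transform_from_landmarks(src, dst):
    """Least-squares rotation R and translation t such that dst ~= R(src) + t."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    # Kabsch alignment of the centered landmark clouds
    R, _ = Rotation.align_vectors(dst - c_dst, src - c_src)
    t = c_dst - R.apply(c_src)
    return R, t


def compose(R1, t1, R2, t2):
    """Composition (R1, t1) o (R2, t2): apply (R2, t2) first, then (R1, t1)."""
    return R1 * R2, R1.apply(t2) + t1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lower = rng.normal(size=(6, 3))          # e.g., six vertebral landmarks
    R_true = Rotation.from_euler("xyz", [5, -3, 10], degrees=True)
    upper = R_true.apply(lower) + np.array([0.0, 4.0, 25.0])
    R, t = rigid_transform_from_landmarks(lower, upper)
    print(np.allclose(R.apply(lower) + t, upper, atol=1e-6))
```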


Hence, an articulated deformable model (ADM) is represented by a series of local inter-object rigid transformations (translation and rotation) between each vertebra. To perform global shape modeling of the shape, we convert this series into an absolute representation

A = [T_1, T_1 ∘ T_2, ..., T_1 ∘ T_2 ∘ ... ∘ T_L]    (1)

using recursive compositions. The relationship between the shape model and the ADM is that the articulation vector controls the position and orientation of the object constellation. The ADM can achieve deformation by modifying the vector of rigid transformations which, taken in its entirety, performs a global deformation of the spine. The transformations are expressed in the local coordinate system (LCS) of the lower object, which can be defined proper to the object's main axes of deformation. The center of transformation is located at the midpoint of the mesh. Each rigid transformation is the combination of a rotation matrix R and a translation vector t. The scaling factor is normalized and is thus not considered. We formulate the rigid transformation of a triangular mesh model as T = {R, t}, so that a vertex x is mapped to Rx + t. Composition is given by T_1 ∘ T_2 = {R_1 R_2, R_1 t_2 + t_1}.

B. Nonlinear Manifold Embedding of ADMs

Let us consider a set of articulated shape models expressed by the absolute vector representation A_i, of dimensionality D. The aim is to create a low-dimensional manifold consisting of points y_i of lower dimensionality d, based on [22]. In such a framework, if an adequate number of data points is available, then the underlying manifold is considered to be "well-sampled". Therefore, it can represent the underlying population structure. In the sub-cluster corresponding to a pathological population, each point of the training set and its neighbors would lie within a locally linear patch.

1) Nearest Neighbor Selection: The main limitation of embedding algorithms is the assumption of Euclidean metrics in the ambient space to evaluate similarity between sample points. In our approach, we define a new metric in the space of articulated structures that accommodates for anatomical spine variability in the pathological population. It adopts the intrinsic nature of the Riemannian manifold geometry, allowing us to discern between articulated shape deformations in a topologically invariant framework. The K closest neighbors are selected for each point using a distortion metric which is particularly suited for geodesic metrics, and which estimates the distance between articulated models represented by the feature vectors described in (1). The distance measure can therefore be expressed as a sum of articulation deviations

(2)

where the canonical representation encodes the intrinsic orientation parameters. The difference between analogous articulations is computed within the geodesic framework

(3)

The first term evaluates intrinsic (translation) distances in the L2 norm. Using the geodesics, it is possible to define a diffeomorphism between rotation neighborhoods and a tangent plane, where the exponential map transforms vectors of the tangent plane to a point in the manifold which is reached by the geodesic in a unit time, and the logarithmic map is its inverse. The rotational distances are therefore computed with a norm based on the geodesic distances in the manifold. This is feasible since rotations are nonsingular, invertible matrices. One can now proceed to the manifold reconstruction using the local support in the high-dimensional data.

2) Embedding Algorithm: The manifold reconstruction weights are estimated by assuming the local geometry of the patches can be described by linear coefficients that permit the reconstruction of every model point from its neighbors. In order to determine the value of the weights, the reconstruction errors are measured using the following objective function:

E(W) = Σ_i ‖A_i − Σ_j W_ij A_j‖²    (4)

subject to W_ij = 0 if A_j is not a neighbor of A_i, and Σ_j W_ij = 1 for every i.    (5)

Here, A_i is the absolute vector describing an articulated model as described above, and E(W) sums the squared distances between all data points and their corresponding reconstructed points. The weights W_ij represent the importance of the jth data point to the reconstruction of the ith element. The algorithm then maps each high-dimensional vector A_i to a low-dimensional point y_i. These internal coordinates are found with a cost function minimizing the reconstruction error

Φ(Y) = Σ_i ‖y_i − Σ_j W_ij y_j‖²    (6)

with M = (I − W)^T (I − W) as a sparse and symmetric matrix enclosing the reconstruction weights and Y spanning the y_i's. The optimal embedding, up to a global rotation, is obtained from the bottom d + 1 eigenvectors of M, which minimizes the cost function as a simple eigenvalue problem. The eigenvectors form the embedding coordinates. The coordinates can be translated by a constant displacement without affecting the overall cost Φ(Y). The eigenvector corresponding to the smallest eigenvalue corresponds to the mean value of the embedded data and can be discarded to obtain an embedding centered at the origin.
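To make the embedding step concrete, here is a minimal sketch (our own illustration, not the paper's code) that selects neighbors with a translation-plus-geodesic-rotation distance and computes a locally linear embedding in the spirit of [22]; the helper names, the neighborhood size k, the target dimensionality d and the regularization constant are assumptions.

```python
# Sketch of the embedding stage, assuming each spine is given as a list of
# (Rotation, translation) pairs, one per vertebra; not the paper's implementation.
import numpy as np
from scipy.spatial.transform import Rotation


def articulated_distance(spine_a, spine_b):
    """Sum of per-vertebra deviations: translation L2 norm + geodesic rotation distance."""
    d = 0.0
    for (Ra, ta), (Rb, tb) in zip(spine_a, spine_b):
        d += np.linalg.norm(ta - tb) ** 2
        d += (Ra.inv() * Rb).magnitude() ** 2   # geodesic distance on the rotation group
    return np.sqrt(d)


def lle_embedding(features, dists, k=8, d=3, reg=1e-3):
    """Locally linear embedding with neighbors chosen by a precomputed distance matrix."""
    n = features.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dists[i])[1:k + 1]        # skip the point itself
        Z = features[nbrs] - features[i]            # local patch, centered at point i
        G = Z @ Z.T
        G += reg * np.trace(G) * np.eye(k)          # regularized local Gram matrix
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()                    # reconstruction weights sum to 1
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                         # drop the constant bottom eigenvector
```

Here `features` would hold the flattened absolute articulation vectors and `dists` the pairwise distances computed with `articulated_distance`.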


Hence, a new ADM can be inferred in the embedded space as a low-dimensional point by finding its optimal manifold coordinates.

3) Inverse Mapping to the Ambient Space: To obtain the articulation vector for a new embedded point in the ambient space (image domain), one has to determine the representation in high-dimensional space based on its intrinsic coordinates. We first assume an explicit mapping f from manifold space to the ambient space. The inverse mapping is then performed by estimating the relationship between the embedded coordinates y and the articulation vectors A as a joint distribution, such that there exists a smooth functional valid within a local neighborhood. Theoretically, the manifold should follow the conditional expectation

f(y) = E[A | y] = ∫ A p(A, y) dA / p(y)    (7)

which captures the overall trend of the data in the embedded space. Here, both p(y) (the marginal density of y) and p(A, y) (the joint density) are unknown. Based on the Nadaraya–Watson kernel regression [26], we replace the densities by kernel functions in a conditional expectation setting [27]. The Gaussian regression kernels require the neighbors of y to determine the bandwidths, so that the estimator includes all data points in the neighborhood of y. Plugging these estimates in (7), this gives

f̂(y) = Σ_i K(y − y_i) A_i / Σ_i K(y − y_i)    (8)

By assuming the kernel is symmetric about the origin, we propose to integrate in the kernel regression estimator the manifold-based distortion metric which is particularly suited for geodesic metrics and articulated diffeomorphisms. This generalizes the expectation such that the observations are defined in manifold space

(9)

which integrates the distance metric defined in (3) and updates the estimate using the closest neighbors of the point in manifold space. This constrains the regression to be valid for similar data points in its vicinity, since locality around a point in the embedding preserves locality in the ambient space.
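A minimal sketch of the inverse mapping by Nadaraya–Watson regression is given below (our own illustration, with assumed function and parameter names). For readability it uses Euclidean distances between manifold coordinates; to mirror (9), the distance computation would be swapped for the articulated distortion metric over the selected neighbors.

```python
# Sketch of the inverse mapping (manifold point -> articulation vector) via
# Nadaraya-Watson kernel regression over the K nearest training neighbors.
import numpy as np


def inverse_map(y_new, Y_train, A_train, k=8, bandwidth=None):
    """Estimate the high-dimensional articulation vector for a new embedded point.

    Y_train: (n, d) manifold coordinates of the training models.
    A_train: (n, D) corresponding absolute articulation vectors.
    """
    dists = np.linalg.norm(Y_train - y_new, axis=1)
    nbrs = np.argsort(dists)[:k]
    if bandwidth is None:
        bandwidth = dists[nbrs].max() + 1e-12          # wide enough to cover the K neighbors
    w = np.exp(-0.5 * (dists[nbrs] / bandwidth) ** 2)  # Gaussian kernel weights
    w /= w.sum()
    return w @ A_train[nbrs]                           # kernel-weighted average of neighbors
```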

C. Local Shape Appearances in the Manifold

The key idea of capturing local vertebra appearances lies on the assumption that global deformations, represented in a local neighborhood of the manifold, will also manifest similar local geometries due to the same type of pathological deviation affecting shape morphology. The motivation stems from the fact that global shape deformations belonging to the same class will induce similar local shape patterns (for example, a wedging effect). We assume here that local appearances follow a linear distribution within the low-dimensional manifold. Hence, given a data point and its neighbors, the local shape model representing an element of the ADM is obtained by building a particular class of shapes given the set of examples. We approximate the distribution of the shape using a parameterized linear model by computing the principal vectors of variation from the shape samples. We compute the eigenvalues and corresponding eigenvectors so that a new vertebra can be instantiated from the mean shape of the neighboring local objects deformed by a weight vector applied to the principal modes. The weight vector represents the unknown variables in the problem at hand and will be assigned with label sets (see Section III-B) in order to optimally warp individual instances for new local shape models.
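The local shape model can be illustrated with the following sketch (not the authors' implementation): a PCA over the vertebral meshes of the manifold neighbors, and instantiation of a new vertebra from the mean shape plus weighted principal modes. The helper names and the number of retained modes are assumptions.

```python
# Sketch of the local shape model: PCA over the vertebral meshes of the manifold
# neighbors, and instantiation of a new vertebra as mean shape + weighted modes.
import numpy as np


def build_local_pca(neighbor_meshes, n_modes=5):
    """neighbor_meshes: (k, 3 * n_vertices) flattened vertex coordinates of the K neighbors."""
    mean_shape = neighbor_meshes.mean(axis=0)
    centered = neighbor_meshes - mean_shape
    # Eigen-decomposition of the sample covariance via SVD
    _, sing_vals, Vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = sing_vals**2 / (len(neighbor_meshes) - 1)
    modes = Vt[:n_modes]                        # principal vectors of variation
    return mean_shape, modes, eigenvalues[:n_modes]


def instantiate(mean_shape, modes, weights):
    """New vertebra instance: mean shape deformed along the principal modes."""
    return mean_shape + weights @ modes
```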

III. SEGMENTATION THROUGH HOMRF OPTIMIZATION

Once an appropriate shape modeling is determined (globally and locally), a successful inference between the image and the manifold must be accomplished. We describe here how a new ADM is deformed, the similarity criteria, as well as the adopted optimization procedure which fits the shape model to the sparse data. We first search for the optimal embedded manifold point of the global ADM by applying displacement vectors belonging to the nonlinear manifold space. Such a strategy offers an ideal compromise between image support and the prior constraints. Second, we seek the optimal weight vector by applying potential weights relating to the individual shape variations described in a localized sub-patch. Bringing these two objectives together, the segmentation of the model to the image, given an initial model (corresponding to a point in manifold space), is obtained by

(10)

The energy of inferring the model in the image is modeled by both the global shape and the local objects in a graph representation defined in a low-dimensional domain. This involves a data-related term expressing the image cost and a global prior term measuring the deformation between low-dimensional vectors in a neighborhood. Furthermore, we introduce a higher-order term, expressed by clique variables, to link together the principal modes of variation for a vertebral shape into higher-order quadratic functions. The energy can be decoupled into a global and local optimization scheme controlled by weighting parameters

(11)

The output is the shape model, i.e., a constellation of vertebral triangular meshes, each annotated with characteristic anatomical landmarks. The inferred manifold point describes the ADM (the articulated pose estimation), while the estimated weight vector controls the appearance of each vertebra. We explain in this section how we define the global and local energy terms.
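As a schematic illustration of the decoupled energy in (11), the sketch below combines a data term, a pairwise manifold prior and a higher-order local-shape term with two weighting parameters; all names, signatures and default values are assumptions rather than the paper's interface.

```python
# Sketch of the decoupled HOMRF energy of (11): image data term + manifold prior
# + higher-order local-shape term, combined with weighting parameters.
from typing import Callable, Sequence
import numpy as np


def homrf_energy(y: np.ndarray,
                 w_locals: Sequence[np.ndarray],
                 data_term: Callable[[np.ndarray], float],
                 prior_term: Callable[[np.ndarray], float],
                 higher_order_term: Callable[[Sequence[np.ndarray]], float],
                 lam: float = 1.0,
                 gamma: float = 0.0) -> float:
    """y: manifold coordinates of the global pose; w_locals: per-vertebra mode weights.

    gamma stays at 0 while the global search runs and is switched on once the
    manifold neighborhood has stabilized (see Section III-C and Section V-C).
    """
    return data_term(y) + lam * prior_term(y) + gamma * higher_order_term(w_locals)
```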


A. Rigid Alignment of the ADM

The global alignment of the model with the target (i.e., image data) primarily drives the deformation of the ADM in the first phase of convergence, controlled by a weighting term such that the weights decrease when the neighbors in a manifold region stabilize. The purpose here is to estimate the set of articulations describing the global model shape by determining its optimal representation in the embedded sub-space. This is performed by reformulating the global representation using the inverse mapping in (9), with deformations expressed in the ambient space. This represents the model in image space based on its manifold-space coordinates. The global cost is expressed as


(12)

Since the transformations are implicitly modeled in the absolute representation, we can formally consider the singleton image-related term as a summation of costs associated with the objects of the ADM

(13)

where (13) is a modular term seeking to minimize the distance between the mesh vertices of the inferred ADM and the image data by a rigid transformation of the vertices. This globally modifies the mesh vertices in order to align them with the image. In (13), the term is evaluated over each mesh vertex, its outward-pointing normal, and the image gradient at that location. This modular term effectively measures the strength of the edges over the front-facing vertices corresponding to the inferred model. A novel prior constraint for the rigid alignment is introduced to model pairwise potentials between low-dimensional features in manifold space, represented by the second term in (11)

(14)

This potential measures the distance between pairs of points, from the current point coordinates to a prior distribution built with the points in the manifold neighborhood. It ensures smoothness of the cost function so that the deformation applied to the point coordinates is regular in the vicinity of the model. The pairwise potential returns the probability that the normalized geodesic distance between the coordinates of the new point and each of its neighbors in manifold space belongs to a normal distribution

(15)

The normalized distance is measured with respect to a density function determined from the samples in the neighborhood, assigning a cost based on the probability that the distance belongs to a Gaussian distribution, with mean and variance calculated from the vectors in the neighborhood.

B. Nonrigid Adaptation of Local Shapes

We now provide the framework to determine the weight values applied to the local shape geometry for each of the ADM's components. The weights applied to each principal mode of variation in a vertebral level, determined in Section II-C, are encoded as the unknown variables in the third term of the HOMRF energy function. We parameterize the ensemble of clique potentials with variables taking on costs if the cliques are added with weight vectors. The third term of (11) is described as a higher-order functional

(16)

Independent clique variables, assigned to each of the localized objects, are treated as a graph minimization problem. Cliques link the low-dimensional variables (eigenmodes from a local PCA), which describe the principal modes of variation of an individual vertebra level, in order to find the optimal vertebral configuration given the image data support. To simplify the prior term in (16), the energy term is transformed into quadratic functions of a given order, based on the eigenvalues of the local model from Section II-C. We therefore parameterize the compact higher-order potentials by a list of labeling deviation cost functions and corresponding costs, with a maximum cost that the potential can assign to any labeling. The potential cost functions encode how the cost changes as the labeling moves away from some desired labeling

(17)

where a state variable is defined for each clique. We introduce a deviation function in which a weight is added to the clique cost if a clique variable is assigned a given label, using the Kronecker delta function, which returns 1 if its two arguments are equal and 0 otherwise. It should be noted that the higher-order potential (17) is a generalization of the potential defined in (16). This transformation method can be seen as a generalization of the method proposed in [28] for transforming the robust Potts model potentials, and can help for difficult optimization problems which are NP-hard to solve. The costs assigned to each clique representing a vertebral level sum the cost of each individual triangle of a mesh model within the image domain. The cost for each triangle is obtained by searching along the vertex normal direction to detect the optimal boundary position, determined by a learning-based approach which we will detail in Section IV. It therefore determines the potential cost for driving that particular triangle to the optimal edge, introducing a tradeoff of how far the directed gradient distance can diverge from the statistical shape instance.
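The following sketch illustrates the type of compact higher-order potential described above: the clique cost grows with the weighted deviation of the eigenmode labels from a desired labeling and is truncated at a maximum cost, in the spirit of the robust potentials of [28]. The function name, the weighting and the truncation are assumptions.

```python
# Sketch of a compact higher-order clique potential in the spirit of (17).
def clique_potential(labels, desired, weights, base_cost, max_cost):
    """labels, desired: one integer label per clique variable; weights: per-variable deviation cost."""
    deviation = sum(w for lab, des, w in zip(labels, desired, weights) if lab != des)
    return min(base_cost + deviation, max_cost)   # robust truncation at the maximum cost


# Example: a 3-variable clique where one eigenmode label deviates from the desired one.
print(clique_potential([2, 0, 1], [2, 1, 1], [0.5, 0.8, 0.3], base_cost=1.0, max_cost=3.0))
```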


In summary, one can integrate the global data and prior terms along with local shape terms parameterized as the higher-order cliques [24], by combining (12), (14), (16)

(18)

There are two sets of variables to be estimated in the energy minimization: those assigned to the quantized space in manifold coordinates, and those relating to the local weight parameters.

C. Energy Minimization

The optimization strategy of the resulting energy term (18) in the continuous domain is not a straightforward problem. The convexity of the solution domain is not guaranteed, while gradient-descent optimization approaches are prone to nonlinearity and local minima. We therefore considered results obtained from discrete optimization approaches [25]. By initializing the solution at the origin of the manifold, with all local shapes initialized with the mean model of their respective vertebral level, we can approximate the deformation of the shape model towards the optimal solution given a desired accuracy and quantization step. If we consider that displacing the coordinates of the sub-domain point by a given displacement is equivalent to assigning the corresponding label to that point, the energy is defined as

(19)

The quantization of the label set is crucial to achieve both segmentation accuracy and computational efficiency. We adopt a coarse-to-fine approach which continuously increases the number of displacements while decreasing the search space. Furthermore, to account for previously searched labels, an incremental approach is used where at each iteration we look for the set of labels that improves the current solution, which is a temporal minimization problem. Then (18) can be rewritten as a labeling problem

(20)

We solve the minimization of the last term in (20) by transforming the higher-order functionals into quadratic functions [28] using a multi-state switching variable which determines the deviation function so as to assign the lowest cost

(21)

where the switching variable acts as a cost-assigning function depending on the state variable.

Algorithm 1 Minimization procedure of the HOMRF: initialize the primal variables with random labels and the dual variables at zero; repeat the primal-dual updates until convergence of the global shape; transform the higher-order terms into equivalent quadratic functions with auxiliary variables; then repeat the joint update of global and local variables until convergence.
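For illustration, the coarse-to-fine, incremental label search over the manifold coordinates could look like the following sketch (a generic local search, not FastPD and not the authors' code); the schedule values and the energy callable are assumptions.

```python
# Sketch of the coarse-to-fine, incremental label search: at each level the
# displacement step shrinks and only moves that improve the current solution are kept.
import itertools
import numpy as np


def coarse_to_fine_search(energy, y0, initial_step=1.0, levels=4, shrink=0.5):
    """energy: callable mapping manifold coordinates to a scalar cost; y0: starting point."""
    y = np.asarray(y0, dtype=float)
    best = energy(y)
    step = initial_step
    for _ in range(levels):
        improved = True
        while improved:                              # incremental: keep labels that improve E
            improved = False
            for delta in itertools.product((-step, 0.0, step), repeat=y.size):
                candidate = y + np.asarray(delta)
                cost = energy(candidate)
                if cost < best:
                    y, best, improved = candidate, cost, True
        step *= shrink                               # refine the quantization of the label set
    return y, best
```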

Finally, we apply a primal-dual algorithm called FastPD [25] which can efficiently solve the problem in a discrete domain by exploiting the duality theory of linear programming. Compared to other methods which do not provide optimal solutions or require long computational times to approximate the global minimum, the advantage of FastPD lies in its generality and efficient computational speed. It also guarantees that the generated solution will be the best approximation of the true global optimum without the condition of linearity. The step-by-step structure of the minimization process for (18) is shown in Algorithm 1, with the primal and dual variables and the higher-order variables updated in turn. In Algorithm 1, the first step initializes the primal variables with random labels and the dual variables at 0. During each inner iteration, the main update of the primal and dual variables takes place in a step which solves the max-flow problem in an appropriately constructed graph to find the optimal solution for the label assignment. Once the global shape has converged, the algorithm performs the transformation of a general higher-order function to an equivalent quadratic function, which involves the addition of a number of auxiliary variables. These are initialized with random labels. In the final phase, both global and local variables are updated, with the last steps solving the higher-order problem by assigning labels based on the computed deviation costs.

IV. COST OF LOCAL SHAPE ADAPTATION

The localized shape variations are triggered once the energy function in (18) has converged towards an articulated pose estimate, with a low-dimensional point stabilizing in manifold space. For each vertebral clique, the designed process assigns a cost based on how far each detected mesh triangle feature point in the target image lies from the mean eigenvalue shape. We tackle this problem by opting for a mesh analysis technique


similar to [16], [29], which determines the distance of the model vertices to their optimal locations. The strategy searches for the point in the target image with an optimal boundary criterion based on features learned from expert segmentations. Therefore, for each mesh triangle, a search along the normal for the optimal compromise between boundary detection and distance from the original position is performed to determine the cost. The sum over each vertex in the entire vertebral mesh provides the global clique cost. The cost of the vertices is therefore determined according to the following search term:

(22)

In (22), the sample space used to find the closest detected feature function (described below) is traversed with a step size specifying the distance between two points along the normal direction. Since the vertex configuration of the initial vertebral mesh is ideal with respect to geometrical smoothness and anatomical correspondence, we wish to maintain that particular distribution. A weighting factor therefore controls the tradeoff of how far the directed gradient distance can diverge from the statistical shape instance. Because object boundaries are not easily captured, especially in MR imaging due to varying transitions of image intensities, edge detection is performed using a learning-based approach, unlike our previous approach which relied on a set of empirical values [23]. The proposed method computes a pool of edge detector candidates originating from a set of feature functions that were trained from a dataset of manually segmented ground-truth data [30]. This process enables the identification of the optimal target position for both CT and MR imaging. It can then differentiate between candidate positions according to an expected boundary appearance. A set of possible feature function candidates is evaluated in terms of the accuracy with which the target points are detected in a simulated search.
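A minimal sketch of the normal-direction boundary search behind (22) is given below, assuming a learned feature function that scores candidate boundary points; the function name, the search range, the step size and the distance weight are assumptions.

```python
# Sketch of the boundary search of (22): sample candidate points along each
# triangle normal and trade off the learned feature response against the
# distance from the current position.
import numpy as np


def boundary_cost(center, normal, feature_fn, search_range=10.0, step=1.0, dist_weight=0.1):
    """Return the best candidate point along the normal and its cost.

    feature_fn(point) is assumed to return a learned boundary response,
    larger meaning a more likely vertebral edge at that location.
    """
    normal = normal / np.linalg.norm(normal)
    best_point, best_cost = center, np.inf
    for offset in np.arange(-search_range, search_range + step, step):
        candidate = center + offset * normal
        cost = dist_weight * offset**2 - feature_fn(candidate)   # distance penalty vs. edge response
        if cost < best_cost:
            best_point, best_cost = candidate, cost
    return best_point, best_cost
```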

Object boundaries in general can be detected by projecting the image gradient onto the triangle normal. Each triangle is assigned the edge detector candidate (from the pool described above) minimizing the boundary detection error given the set of reference meshes. This suppresses the effect of edges which deviate from the expected surface orientation. Furthermore, gradients exceeding a certain threshold can be damped, yielding

(23)

where the argument represents the vertex coordinates of a target point in the volume. The feature function can be made even more discriminative by encoding additional knowledge that can be used to reject edges if the image violates some criteria

(24)

where a set of image quantities generates over 300 candidates from all possible combinations, and a range of allowed values is learned from the training data. If at least one of the image quantities falls outside of the

Fig. 2. Annotated landmarks on the triangulated mesh model of the first lumbar vertebra, with the four pedicle tips and two vertebral body centers.

learned range, the edge will be rejected by setting the feature response to 0. The sign accounts for the expected transition of image intensities along the triangle normal (bright-to-dark or dark-to-bright). These specific intervals, personalized for each vertebra, tend to describe the normal range of cortical bone thickness.

V. RESULTS

A. Training Data

The manifold was built from a database containing 711 scoliotic spines demonstrating several types of deformities. Each spine model in the database was obtained from biplanar radiographic stereo-reconstructions [31]. It is modelled with 12 thoracic and five lumbar vertebrae (17 in total), represented by six landmarks on each vertebra (four pedicle extremities and two endplate center points). These were manually identified by an expert on the radiographic images. Once a 3-D point-based vertebra model was obtained, each vertebra was fitted with a triangulated mesh surface using generic vertebra priors obtained from a serial CT-scan reconstruction of a cadaver specimen. Models were segmented using a marching cubes algorithm on 1-mm-thick CT-scan slices taken at 1-mm steps throughout the dry spine and warped to the landmarks with a B-spline deformation. The atlas is composed of 17 cadaver vertebrae (12 thoracic and five lumbar), where meshes comprised between 3831 and 6942 vertices depending on the vertebral level. The same six precise anatomical landmarks (four pedicle tips and two on the vertebral body) were annotated on each triangulated model, as shown in Fig. 2. The coordinates of each anatomical point were used to determine the LCS of each vertebral body, which defines the position and rotation (i.e., the ground-truth 3-D pose).

B. Generating the Manifold

As explained earlier, the dimensionality of the locally linear embedding modulates the accuracy of the inferred spine model. Our formulation of the problem is factored by the value of this dimensionality and hence is adapted based on the reconstruction accuracy of the prior models. To evaluate the influence of this parameter, we studied the performance of our algorithm by varying its value with respect to the residual variance between known model representations and their equivalents from the embedding.


We found that errors stabilize once a prespecified error threshold is attained. The optimal neighborhood size was determined based on the number of significant weights obtained by the minimization of the cost function in (4). Another parameter of significant importance for representing the spine geometry is the number of modes of variation for the local vertebral shapes. We found this parameter to be primarily affected by the global deformation type of the entire spine rather than by the specific vertebral level. The optimal compromise was found with a fixed number of modes for the entire dataset; however, it is reasonable to assume that an adaptable number which varies based on a particular spine cluster would yield more accurate results. Fig. 3 displays the resulting embedding from the training data of 711 spine models. We then tested the inverse mapping function, generating an articulation vector in the ambient space from an embedded point in manifold space. For each manifold coordinate in the training dataset, we obtained the reverse transformation and evaluated the deviation of the intervertebral transformation vector from its known parameters. In Table I, we present the quantitative evaluation using three error metrics, namely the angular error (AE) measured in degrees, the magnitude of differences (MOD), similar to maximal differences, measured in millimeters, and the mean centroid distance (MCD). Thus, we can measure the deviation of the inverse transformations compared to the ground-truth. Results obtained from two other kernel functions (Fisher and radial basis functions (RBF)), in addition to the Riemannian kernel function (RKF) we proposed in this work, were used to assess the performance of the regression based in a conditional expectation setting. Results were categorized into five different classes corresponding to different types of deformity. The average AE and MOD errors were lower for RKF than for the other kernels, particularly for severe deformations (C2, C4) where the improvement becomes statistically significant. This confirms the advantage of integrating geodesic distance metrics between sample points based on articulation distortions when estimating joint and marginal densities.

Fig. 3. Low-dimensional manifold embedding of the spine dataset comprising 711 models exhibiting various types of deformities. The sub-domain was used to estimate both the global shape pose costs and individual shape instances based on local neighborhoods.

TABLE I. INVERSE MAPPING ERRORS BASED ON ARTICULATION VECTORS FOR EACH CASE IN THE TRAINING DATASET, GROUPED INTO FIVE CLASSES OF DEFORMATION MODELS. PATIENTS WERE CLASSIFIED AS NORMAL (C1), RIGHT-THORACIC (C2), LEFT-LUMBAR (C3), RIGHT-THORACIC-LEFT-LUMBAR (C4), AND LEFT-THORACIC (C5)

C. Optimization Framework

A series of simulation experiments was carried out to determine optimal parameter settings for the HOMRF optimization framework and to evaluate the performance of the method under a controlled environment. Our evaluation set consisted of five preoperative scoliotic patients exhibiting different types of mild curvatures (15°–50° Cobb angles), unseen to the training database. The intervertebral transformations of these five spine models were computed to obtain ground-truth data and used for comparison to determine the optimal set of parameters. The image support for the singleton and higher-order data terms was based on point-to-surface distances between the inferred and original models. On each target data, uniformly distributed random noise was added to obtain realistic conditions. To analyze the influence of the pairwise and higher-order weights on the accuracy of the model, we varied their values but kept all other parameters fixed. By increasing the pairwise weight, the smoothness of the intervertebral articulations increases, thus obtaining a more coherent global structure with physiologically consistent anatomical curves due to the pairings with the neighboring models. However, increasing it also hinders the image data support, restraining the global alignment based on the image gradient, which affects the vertebra segmentations. Using a lower value, on the other hand, allows for better localized shape adaptations, but the general shape of the spine diverges since the optimization is primarily driven by the gradient field. As for the term which regulates the mesh analysis process to detect optimal boundary locations based on a set of learned features, a careful selection must be made in order to protect the global optimization. The advantage here is that this parameter remains at zero as long as the global search is processed. Once the neighborhood within manifold space has stabilized, the third term of the energy function is included in the minimization process. The optimal compromise for both weights was determined as shown in Fig. 4(a) and (b). Models were initialized at the centroid of the training set. One observation was the difficulty, in the early stages of the process, of determining the correct subregion within the embedding which corresponds to the appropriate global deformity class, where the pairwise potentials tend to greatly penalize the energy due to inconsistencies in the neighborhood articulations. To improve the first phase of convergence, a manual classification of the input model into the appropriate manifold cluster allows a better initialization of the model, which leads to improved accuracy, as shown in Fig. 5.


Fig. 4. (a) Mean error as a function of the parameter balancing the unary and pairwise potentials, used to determine its optimal value. A careful balance between the pairwise constraints and the image attraction of the unary potential must be made. (b) Mean error as a function of the parameter which dictates the importance of the higher-order potentials that drive the part-based adaptation.

Fig. 5. Comparison of the residual mean error with initialization based on a prior classification (dark gray) and with no initialization (light gray).

D. Spine Segmentation From Medical Images

1) Image Data: This final part of the study consisted of testing the proposed approach on two clinical datasets. The first consisted of volumetric CT scans (resolution: 0.8 × 0.8 mm, thickness: 1–2 mm) of the lumbar and main thoracic regions obtained from 21 different patients, acquired for operative planning purposes. The MR dataset comprised multi-parametric volumetric data (resolution: 1.3 × 0.9 mm, thickness: 1 mm) of eight patients acquired for diagnostic purposes. For this study, only the T1 sequence was selected for the experiments. All patients in both datasets (29 in total) had 12 thoracic and five lumbar vertebrae. Both CT and MR data were manually annotated with 3-D landmarks by an expert in radiology, corresponding to the left and right pedicle tips as well as the midpoints of the vertebral body. Segmentation of the vertebrae from the CT and MR slices was also made by the same operator. For each case of the CT dataset, the MRF inference method was applied to segment the CT volume, with the parameters in (22) set to adjust individual shape instances. For the MR dataset, different parameter values were chosen to handle the varying image resolution and slice spacing. Quantitative assessment consisted of measuring landmark accuracy as well as inferred surface distance errors.

2) CT Imaging Experiments: We first evaluated the model accuracy in CT images by computing the correspondence of the


inferred vertebral mesh models to the segmented target structures. As a preprocessing step, a rough thresholding was performed on the whole volume to filter out noise artefacts. The overall surface-to-surface comparison results between the inferred 3-D vertebral models issued from the proposed manifold-based HOMRF framework and the known segmentations were first calculated. The mean errors are 2.2 ± 1.5 mm (range: 0.6–5.4 mm) for thoracic vertebrae and 2.8 ± 1.9 mm (range: 0.7–8.1 mm) for lumbar vertebrae. A qualitative assessment of the 3-D model from the CT images is presented for the thoracolumbar region in Fig. 6, with the outline of the vertebral body extracted from the inferred mesh model. One can observe accurate delineation of the geometrical models on selected multi-planar views. In fact, it is still very challenging to precisely capture the exact vertebra geometry (with transverse and spinous processes) given limited visibility and varying patient morphology. One can observe that the part-based vertebral body meshes follow the cortical region of the anatomical structures, as dictated by the local search algorithm. Fig. 7(a) presents box diagrams with the overall representation of differences based on Dice coefficients, comparing the proposed method with a recently proposed method based on a gradient search of the image using standard linear PCA to capture shape variations. Such a task constitutes the equivalent of using a purely image-based gradient-descent method to align the model, and demonstrates the importance of integrating such high-level constraints during the optimization. Dice scores are slightly lower in the lumbar segment, due in part to a greater variability observed in the training population used as prior for building the manifold. This increased variability may therefore add complexity to determining anatomically coherent deformations and incorporate some ambiguity into the pairwise term included in the optimization scheme. The validation results presented above show that the accuracy of our segmentation method is comparable to the ground-truth 3-D representation obtained from tomography. Table II presents the results from this experiment, with 3-D landmark mean and standard deviation differences to annotations made by an expert in radiology. While these types of data cannot be considered as ground-truth, as the annotated landmarks are also prone to human variability, this gives a good indication of the degree of convergence of the method by computing statistical relevance using paired t-tests. The overall mean difference (method versus expert) for the selected cases was 1.6 ± 0.6 mm for the pedicle and endplate landmarks, which is a significant improvement over the performance of the PDM at 4.5 ± 2.2 mm. The errors also yield statistically significantly lower standard deviations compared to a manual technique (0.64 mm versus 1.76 mm).

3) MR Imaging Experiments: For the experiments involving the segmentation of 3-D spine models from MR images, an anisotropic filtering combining diffusion and shock filters was applied to the images in order to reduce inhomogeneity due to the magnetic and motion artefacts which hinder the bony boundaries. The surface-to-surface comparison showed encouraging results (thoracic: 2.9 ± 1.8 mm, lumbar: 3.0 ± 1.9 mm) based on differences to ground-truth, while the box-whisker diagrams with the Dice coefficient comparison for each vertebral level are shown in Fig. 7(b). As in the previous experiments


Fig. 6. The final 3-D model of the vertebral body after minimization of the HOMRF function, inferring the global and individual shape parameters of the vertebral body, shown in the mid-axial (left), mid-coronal (middle), and mid-sagittal (right) planes with projected anatomical landmarks. Circles represent landmarks from the PDM approach.

TABLE II LANDMARK LOCALIZATION ACCURACY PROVIDED BY THE INFERRED SPINE MODELS ON CT IMAGES FOR INDIVIDUAL VERTEBRAE CONCERNING BOTH PEDICLE AND ENDPLATE REGIONS IN THORACIC AND LUMBAR SEGMENTS

Fig. 7. Volumetric comparison of inferred mesh models, computing surface errors based on Dice overlap coefficients (manual segmentation versus proposed method) with a gradient-descent appearance model approach (in dark gray). (a) Results from CT imaging. (b) Results from MR imaging.

with CT imaging, ground-truth data was generated by manually segmenting the structures, with the models validated by an expert in radiology. The percentages were then compared to a PCA-like segmentation method capturing the linear shape variations, as well as to a model-based approach using a generalized Hough transform and piecewise affine transformations. An illustrative result of the 3-D model from the MR images is shown in Fig. 8. As difficult as the CT inference is, the MR problem represents an even greater challenge as the image resolution is

more limited and the inter-slice spacing is increased compared to CT. Modeling of the statistical properties of the shape variations and global pose becomes even more important in this case, as it relies heavily on the nonlinear distribution of the patient morphology. The accuracy of the method is still comparable to ground-truth, but not as reliable as in the case of CT imaging. Additionally, the local mesh adaptation process brings an improvement to the surface-to-surface accuracy. Table III presents the results from the comparison of 3-D vertebral landmarks to those identified by an expert in radiology. The overall mean distances (method versus expert) for the selected cases were 2.0 ± 0.8 mm for the pedicle landmarks and 2.1 ± 0.8 mm for the vertebral body landmarks. The global 3-D point landmark difference was 2.0 ± 0.8 mm, which is also significantly better than a piecewise PDM approach. When we separate the point-to-point differences into specific anatomical regions of the 3-D models issued from the proposed technique, the mean difference was 2.4 ± 1.0 mm for lumbar and 1.9 ± 0.7 mm for thoracic vertebrae.

VI. DISCUSSION AND CONCLUSION

We proposed a method to segment articulated deformable models (ADM) from medical images using manifold embeddings. Our main contribution consists in modeling complex, nonlinear patterns of prior deformations in a Riemannian framework which is used to directly infer a new spine model from a patient's image data. Articulated mesh models are optimized via a HOMRF framework using statistical knowledge in a manifold which captures both global pose and local shape variations. To this end, we introduced a novel conditional regression kernel to map samples to the ambient space based on neighbors selected by an articulated distance metric that includes intrinsic


Fig. 8. Qualitative assessment of the spine model segmentation results from MR images, with a series of MPR slices with corresponding geometrical vertebral models. Circles represent landmarks from the PDM approach.

TABLE III LANDMARK LOCALIZATION ACCURACY OF THE INFERRED SPINE MODELS FROM MR IMAGES

and orientation properties. One observation of the nonlinear embedding is that it avoids creating shape distortions and eliminates the need to solve large dynamic-programming problems, thus saving computational time and memory space. It incorporates the ability to reproduce, in a simple fashion within a low-dimensional sub-space, the nonlinearity which is present in the distribution of a particular organ or population. Linear analysis performed on localized manifold patches helps to learn the variations of individual components. Shape instances are then warped to the image based on gradient edge alignment and learned intensity feature functions.

In our previous work [23], a preoperative articulated model was reconstructed with a manifold embedding obtained from a dataset of 3-D spine centerlines. This simple manifold provided sufficient information to initialize the reconstruction process with regards to the global pose estimation prior to the procedure. Due to the intrinsic nature of the data, standard Euclidean distances could also be used to estimate neighborhoods in the embedding. However, in the context of an unsupervised segmentation with no prior information from an unseen image, the embedding in [23] could not offer sufficient information for an accurate statistical representation or compensate for highly variable appearances in individual vertebral shapes. The manifold presented here describes the variations in the articulated structure, which is much more complex to embed in a low-dimensional domain since the high-dimensional data does not lie in a Euclidean space but rather in a Riemannian domain. Towards this end, a geodesic measure was developed to efficiently capture differences in articulated structures within the dataset, which would be representative of the underlying structure. Still, this complex structure proved to be beneficial not only for inferring the global pose, but for the instantiation of local shapes as well (Fig. 3). Ultimately, the manifold represents the anchoring point of our approach by proposing a novel but simple scheme for modeling the distribution of shape variabilities: optimization within the simplified nonlinear domain of the articulated shape space.

Compared to active appearance models using a PCA representation and to a model-based approach using piecewise affine transformations, the proposed method demonstrated the benefit of using a nonlinear manifold representation. It decreased landmark localization errors by 2.9 mm, which is significant for applications requiring high levels of accuracy, such as surgical navigation for example. The method is aimed at obtaining a 3-D articulated shape representation of the spine and, to the best of our knowledge, is the first 3-D inference or segmentation method performed directly within manifold space without requiring a prior model as baseline. An additional novelty of the method is provided by the formulation of a simultaneous global and local shape inference framework, which creates new shape instances directly from the simplified manifold domain. This is accomplished by analyzing variations in articulations and shape morphology, respectively. This approach significantly reduces the degrees of freedom of the problem. To determine an accurate and personalized articulated geometrical model of the spine, we propose a higher-order optimization technique based on MRF graphs to infer intervertebral articulations. Ultimately, we believe this approach will help orthopaedists and surgeons to learn the variations of spinal shape and offer better planning for complex corrective procedures.

Modular data terms applied to CT and MR imaging were shown to achieve satisfactory results in capturing transition regions near vertebral edges. Furthermore, the method introduces prior knowledge with respect to the allowable geometric dependencies between the relative positions of vertebrae, constituting another promising direction to increase the level of accuracy. Such a concept was accomplished through a piecewise decomposition of each vertebra by a labelling scheme of local eigenmodes, which improves the accuracy and the precision of the results. In terms of image segmentation from MRI, the intensity gradients of bone structures are primarily situated at the edges of the vertebral body, driving the alignment of the shape instances towards the edges of the vertebral body. By including a set of intensity feature functions learned from a training set of segmented MR models, the method prevents the alignment of the model to strong intensity gradients of soft tissues. However, to potentially increase the accuracy of segmentation and cover a wider range of deformations, multiple atlases of spine pathologies may be considered.


In previous methods, statistical deformable models were often dedicated to single anatomical structures. Shape analysis of ADMs, on the other hand, has been sparsely investigated due to the difficulty in constraining the higher number of transformation variables. The method we propose not only allows the inference of shape deformations of object constellations using discrete optimization techniques, but can also simplify the understanding of a particular pathology. Our framework performs the pose search directly from the learned manifold, which facilitates robust pose recovery from noisy and corrupted inputs as well as reconstruction of the input. Furthermore, pose estimation is facilitated by fully taking advantage of recent advances in Potts models, instead of an SVD discrete optimization, to solve for new manifold coordinates as described in [32]. The proposed framework can be extended to other applications in vision (human body pose or hand tracking) and medical imaging (articulated lower limbs or forearm [33]), to model object constellations of similar shape and size, where local shapes are mutually dependent.

The proposed method promises to facilitate and accelerate quantitative image analysis for clinical diagnostics in MR or CT. Further experiments should be carried out to determine the accuracy of the resulting model with an appropriate initialization, by automatically classifying the global pose into the appropriate cluster prior to the optimization phase. Future directions of our research are the extension of the model to enable the extraction of other articulated bony structures such as the lower limbs, as well as the adaptation of the model to different anatomical variants with missing or additional vertebrae. Integration of advanced graphical models in the local mesh adaptation process can potentially increase the delineation of calcified portions of bony structures. Finally, real-time inference of articulated models, motion tracking, and image guidance based on higher-order clique decomposition [34] are other directions which would be beneficial towards clinical adoption.

REFERENCES

[1] M. Andriluka, S. Roth, and B. Schiele, "Pictorial structures revisited: People detection and articulated pose estimation," in IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 1014–1021. [2] M. de La Gorce, N. Paragios, and D. Fleet, "Model-based hand tracking with texture, shading and self-occlusions," in IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8. [3] T. Cootes, C. Taylor, D. Cooper, and J. Graham, "Active shape models—Their training and application," CVIU, vol. 61, no. 1, pp. 38–59, 1995. [4] T. Cootes, G. Edwards, and C. Taylor, "Active appearance models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681–685, Jun. 2001. [5] M. Rousson and N. Paragios, "Prior knowledge, level set representations and visual grouping," Int. J. Comput. Vis., vol. 76, pp. 231–243, 2008. [6] D. Nain, S. Haker, A. Bobick, and A. Tannenbaum, "Multiscale 3-D shape representation and segmentation using spherical wavelets," IEEE Trans. Med. Imag., vol. 26, no. 4, pp. 598–618, Apr. 2007. [7] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis, "SCAPE: Shape completion and animation of people," in SIGGRAPH, 2005, pp. 408–416. [8] M. Kumar, P. Torr, and A. Zisserman, "OBJ CUT," in IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 18–25. [9] A. Mastmeyer, K. Engelke, C. Fuchs, and W.
REFERENCES

[1] M. Andriluka, S. Roth, and B. Schiele, "Pictorial structures revisited: People detection and articulated pose estimation," in IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 1014–1021.
[2] M. de La Gorce, N. Paragios, and D. Fleet, "Model-based hand tracking with texture, shading and self-occlusions," in IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8.
[3] T. Cootes, C. Taylor, D. Cooper, and J. Graham, "Active shape models—Their training and application," Comput. Vis. Image Understand., vol. 61, no. 1, pp. 38–59, 1995.
[4] T. Cootes, G. Edwards, and C. Taylor, "Active appearance models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681–685, Jun. 2001.
[5] M. Rousson and N. Paragios, "Prior knowledge, level set representations and visual grouping," Int. J. Comput. Vis., vol. 76, pp. 231–243, 2008.
[6] D. Nain, S. Haker, A. Bobick, and A. Tannenbaum, "Multiscale 3-D shape representation and segmentation using spherical wavelets," IEEE Trans. Med. Imag., vol. 26, no. 4, pp. 598–618, Apr. 2007.
[7] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis, "SCAPE: Shape completion and animation of people," in SIGGRAPH, 2005, pp. 408–416.
[8] M. Kumar, P. Torr, and A. Zisserman, "OBJ CUT," in IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 18–25.
[9] A. Mastmeyer, K. Engelke, C. Fuchs, and W. Kalender, "A hierarchical 3-D segmentation method and the definition of vertebral body coordinate systems for QCT of the lumbar spine," Med. Image Anal., vol. 10, pp. 560–577, 2006.
[10] J. Ma, L. Lu, Y. Zhan, X. Zhou, M. Salganicoff, and A. Krishnan, "Hierarchical segmentation and identification of thoracic vertebra using learning-based edge detection and coarse-to-fine deformable model," in Proc. MICCAI, 2010, pp. 19–27.
[11] Z. Peng, J. Zhong, W. Wee, and J. Lee, "Automated vertebra detection and segmentation from the whole spine MR images," in Proc. IEEE 27th Annu. Int. Conf. Eng. Med. Biol. Soc., Jan. 2005, pp. 2527–2530.
[12] Y. Kim and D. Kim, "A fully automatic vertebra segmentation method using 3-D deformable fences," Comput. Med. Imag. Graph., vol. 33, pp. 343–352, 2009.
[13] D. Stern, B. Likar, F. Pernus, and T. Vrtovec, "Parametric modelling and segmentation of vertebral bodies in 3-D CT and MR spine images," Phys. Med. Biol., vol. 56, pp. 7505–7522, 2011.
[14] S.-H. Huang, Y.-H. Chu, S.-H. Lai, and C. Novak, "Learning-based vertebra detection and iterative normalized-cut segmentation for spinal MRI," IEEE Trans. Med. Imag., vol. 28, no. 10, pp. 1595–1605, Oct. 2009.
[15] J. Carballido-Gamio, S. Belongie, and S. Majumdar, "Normalized cuts in 3-D for spinal MRI segmentation," IEEE Trans. Med. Imag., vol. 23, no. 1, pp. 36–44, Jan. 2004.
[16] T. Klinder, J. Ostermann, M. Ehm, A. Franz, R. Kneser, and C. Lorenz, "Automated model-based vertebra detection, identification, and segmentation in CT images," Med. Image Anal., vol. 13, pp. 471–482, 2009.
[17] P. Felzenszwalb and D. Huttenlocher, "Pictorial structures for object recognition," Int. J. Comput. Vis., vol. 61, no. 1, pp. 55–79, 2005.
[18] P. Khurd, S. Baloch, R. Gur, C. Davatzikos, and R. Verma, "Manifold learning techniques in image analysis of high-dimensional diffusion tensor magnetic resonance images," in IEEE Conf. Comput. Vis. Pattern Recognit., 2007, pp. 1–7.
[19] D. Cremers, T. Kohlberger, and C. Schnorr, "Shape statistics in kernel space for variational image segmentation," Pattern Recognit., vol. 36, pp. 1929–1943, 2003.
[20] N. Lawrence and A. Hyvarinen, "Probabilistic non-linear principal component analysis with Gaussian process latent variable models," J. Mach. Learn. Res., vol. 6, pp. 1783–1816, 2005.
[21] A. Kanaujia, C. Sminchisescu, and D. Metaxas, "Spectral latent variable models for perceptual inference," in Int. Conf. Comput. Vis., 2007, pp. 1–8.
[22] S. Roweis and L. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, pp. 2323–2326, 2000.
[23] S. Kadoury, H. Labelle, and N. Paragios, "Automatic inference of articulated spine models in CT images using high-order Markov Random Fields," Med. Image Anal., vol. 15, pp. 426–437, 2011.
[24] N. Komodakis and N. Paragios, "Beyond pairwise energies: Efficient optimization for higher-order MRFs," in IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 2985–2992.
[25] N. Komodakis, N. Paragios, and G. Tziritas, "MRF energy minimization and beyond via dual decomposition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 3, pp. 531–552, Mar. 2011.
[26] E. A. Nadaraya, "On estimating regression," Theory Probab. Appl., vol. 10, pp. 186–190, 1964.
[27] B. Davis, P. Fletcher, E. Bullitt, and S. Joshi, "Population shape regression from random design data," in Proc. Int. Conf. Comput. Vis., 2007, pp. 1–7.
[28] C. Rother, P. Kohli, W. Feng, and J. Jia, "Minimizing sparse higher order energy functions of discrete variables," in IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 1382–1389.
[29] M. Kaus, V. Pekar, C. Lorenz, R. Truyen, S. Lobregt, and J. Weese, "Automated 3-D PDM construction from segmented images using deformable models," IEEE Trans. Med. Imag., vol. 22, no. 8, pp. 1005–1013, Aug. 2003.
[30] O. Ecabert, J. Peters, H. Schramm, et al., "Automatic model-based segmentation of the heart in CT images," IEEE Trans. Med. Imag., vol. 27, no. 9, pp. 1189–1201, Sep. 2008.
[31] S. Kadoury, F. Cheriet, C. Laporte, and H. Labelle, "A versatile 3-D reconstruction system of the spine and pelvis for clinical assessment of spinal deformities," Med. Biol. Eng. Comput., vol. 45, pp. 591–602, 2007.
[32] A. Elgammal and C.-S. Lee, "Inferring 3-D body pose from silhouettes using activity manifold learning," in IEEE Conf. Comput. Vis. Pattern Recognit., 2004, pp. 681–688.
[33] S. Kadoury and N. Paragios, "Towards shape constellation inference through higher-order MRF optimization in nonlinear embeddings," INRIA, Tech. Rep. RT-0376, 2010.
[34] H. Ishikawa, "Transformation of general binary MRF minimization to the first-order case," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 6, pp. 1234–1249, Jun. 2011.