Fusing Markov Random Fields with Anatomical Knowledge and Shape Based Analysis to Segment Multiple Sclerosis White Matter Lesions in Magnetic Resonance Images of the Brain

Stephan Al-Zubi (a, c), Klaus Toennies (a, c), Nils Bodammer (b, d), Hermann Hinrichs (b, e)
(a) Institute for Simulation and Graphics, (b) Department of Neurology II, Otto-von-Guericke University of Magdeburg
(c) {stephan, klaus}@isg.cs.uni-magdeburg.de, (d) [email protected], (e) [email protected]

ABSTRACT
This paper proposes an image analysis system to segment multiple sclerosis lesions in magnetic resonance (MR) brain volumes consisting of 3 mm thick slices using three channels (images showing T1-, T2- and PD-weighted contrast). The method uses the statistical model of Markov Random Fields (MRF) at both low and high levels. The MRF uses three types of neighborhood system: (1) Voxel to voxel: a low-level heterogeneous neighborhood system is used to restore noisy images. (2) Voxel to segment: a fuzzy atlas, which indicates the probability distribution of each tissue type in the brain, is registered elastically with the MRF. It is used by the MRF as a-priori knowledge to correct misclassified voxels. (3) Segment to segment: remaining lesion candidates are processed by a feature based classifier that looks at unary and neighborhood information to eliminate more false positives. The output of the algorithm was compared with an expert's manual segmentation.

Keywords: Markov Random Field, Registration, Multiple Sclerosis, Classification, Shape based analysis.
1. INTRODUCTION
Multiple sclerosis (MS) is a disease of the central nervous system. It arises when the myelin sheaths protecting nerve axons break down, causing plaques. As a result, 90% to 95% of lesions occur within white matter tissue. Certain parts of the nervous system are more affected by these MS plaques: the anterior angles of the lateral ventricles, the corpus callosum and periventricular areas. Lesions vary considerably in size, from a few millimeters to lesions involving the entire centrum semiovale [1]. MS plaques in white matter are ellipsoidal in shape. Their intensity is characterized by a bright center, with intensity decreasing away from the center.
Figure 1. Left: example of a slice showing PD contrast. Middle: lesion mask obtained by intensity based classification alone and the actual lesion mask. Right: manual segmentation. As can be seen, much grey matter is falsely classified as lesion.
Segmentation based on intensity alone will not succeed because the intensity histograms of grey matter and lesions overlap in MR images (see Figure 1). This means that any classifier based on voxel intensity alone will classify some grey matter as lesions and some lesions as grey matter. This effect is evident where the partial volume effect exists (e.g. where cerebrospinal fluid (CSF) and grey matter are averaged) [2]. This makes it necessary to employ higher level knowledge about the distribution of grey and white matter in the brain and about lesion features. There are a number of approaches that employ different models for segmentation.

Some researchers proposed matching a brain volume with an anatomical atlas to create grey/white mask constraints used to label lesions correctly. Warfield et al. [2] propose segmenting the cortex by region growing and constraining its boundary by an elastically registered anatomical atlas. This yields a white matter mask containing lesions and white matter, which can then be separated by a simple intensity based classifier. In Warfield et al. [4] a statistical k-NN classifier for grey levels is used to classify tissues based on a set of manually selected prototypes. A spatial mask obtained by elastic matching adds spatial features to the k-NN classifier, resolving the ambiguity of overlapping histograms. The elastic model is re-aligned with the volume using the improved segmentation. The algorithm iteratively applies k-NN with elastic matching until convergence. Similar work can be found in Kaus et al. [3] for segmenting tumors and in Kamber et al. [5], who propose the use of a brain tissue probability model. This model gives the probability distribution of each tissue type within a standardized Talairach space. A set of 12 segmented brain volumes was registered into Talairach space and averaged to construct the model. This model was used as a mask to confine the search to white matter and as a way to provide geometric features used to classify lesions (e.g. the probability of ventricular CSF is useful in detecting periventricular lesions).

Another way to classify lesions is to use a feature space collected from candidate lesions to sort out false positives. Ardizzone and Pirrone [6] use the fuzzy c-means algorithm, first obtaining a set of oversegmented regions and then applying a re-clustering phase.
The re-clustering uses shape and intensity features to label or split unknown clusters. After this phase three masks corresponding to WM, WM+GM and WM+GM+CSF are built. The holes in these masks correspond to candidate lesions. A feature vector for each candidate is passed through a neural classifier to decide whether it is a lesion or not. The features include contact with WM, GM and CSF, mean intensity, shape measures such as compactness and elongation, and position measures such as the distance from the ventricular area.

Udupa et al. [7] use fuzzy connectedness to detect connected fuzzy objects representing white matter, grey matter and CSF. The remaining holes represent locations of potential lesions. False positives are eliminated with operator assistance. Fuzzy connectedness is defined by a measure that computes a connected component such that the strength of connection between any two pixels in the segment is above a certain threshold. The strength of connection between pixels is determined by their distance and intensity homogeneity.

Johnston et al. [8] combine the iterated conditional modes (ICM) algorithm, which is used to segment noisy images, with morphological operators. The volume is initially segmented by ICM using all labels. The segmentation is post-processed to obtain the white matter + lesion mask. The partial volume effect causes the areas around lesions to be classified as grey matter. These gaps in the mask are closed by binary morphology. The resulting mask is processed again with ICM, this time classifying only two tissue types. Small lesions are eliminated from the final output.

Kirshnan and Atkins [9] use the PD/T2 ratio image to eliminate magnetic field inhomogeneity. They then apply a series of thresholds to extract the WM and CSF masks. The lesions are identified by intensity thresholding as the brightest spots, and the masks are then applied to remove false lesions outside white matter.
To summarize, many authors aim to find the white matter mask containing the lesions and then segment the lesions afterwards. Some apply registration with an anatomical model to determine where grey matter is. Others apply feature based classifiers to eliminate false positives. The method presented in this paper combines alignment with an anatomical model, feature based classification and image restoration using Markov Random Fields.

This paper is structured as follows: section 2 introduces Markov Random Fields, which are used to restore noisy data at the voxel level and later to eliminate false positives among candidate lesion segments. Section 3 describes the processing pipeline used to classify MS lesions and surveys some literature pertaining to each step. Section 4 shows the experimental results, followed by a discussion of future directions.
2. MARKOV RANDOM FIELDS (MRF)
Let $Y = \{y_{i,j,k,m} : 1 \le i \le N_x,\ 1 \le j \le N_y,\ 1 \le k \le N_z,\ 1 \le m \le M\}$ be a multi-modal input image of dimension $N_x \times N_y \times N_z$ with $M$ channels. Let $X = \{x_{i,j,k} : 1 \le i \le N_x,\ 1 \le j \le N_y,\ 1 \le k \le N_z\}$ be the corresponding segmentation of $Y$, with each $x_{i,j,k}$ taking a value from a set $L = \{1, \ldots, l\}$ of labels. Let $\Omega$ be the space of all segmentations, let $x \in \Omega$ be an instance of $X$, and similarly let $y$ be an instance of $Y$. The objective is to maximize the posterior probability $P(X \mid Y)$. By Bayes' theorem, $P(X \mid Y) \propto P(Y \mid X)\,P(X)$, so we get

$$x^{*} = \arg\max_{x \in \Omega} P(y \mid x)\, P(x) \qquad (1)$$
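The term $P(y \mid x)$ in (1) is modeled as multivariate Gaussian additive noise which is independent of location. Let $\mu_p$, $p \in L$, be the vector of mean intensity values for label $p$ and $\Sigma_p$ the corresponding covariance matrix, and let $y_{i,j,k}$ denote the vector of the $M$ channel intensities at voxel $(i,j,k)$. Then we can define $P(y \mid x)$ as

$$P(y \mid x) = \prod_{i,j,k} (2\pi)^{-\frac{M}{2}} \left|\Sigma_{x_{i,j,k}}\right|^{-\frac{1}{2}} \exp\!\left(-\frac{1}{2}\left(y_{i,j,k} - \mu_{x_{i,j,k}}\right)' \Sigma_{x_{i,j,k}}^{-1} \left(y_{i,j,k} - \mu_{x_{i,j,k}}\right)\right) \qquad (2)$$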
The advantage of using the multivariate normal (2) is that it can better compensate for correlated tissue variations due to the partial volume effect [16]. $P(x)$ in (1) is the regularization term, in which each voxel interacts with its neighbors to restore noisy images by inferring the correct value from the neighbors. The homogeneous multi-level logistic (MLL) distribution [11] is used to define $P(x)$. Specifically, for each voxel a first order neighborhood system consisting of its six adjacent voxels is defined:

$$\eta_{x_{i,j,k}} = \{x_{i\pm1,j,k},\ x_{i,j\pm1,k},\ x_{i,j,k\pm1}\} \qquad (3)$$
A potential function is defined on cliques of one voxel and on cliques of adjacent voxel pairs. Specifically, given two adjacent sites $u$, $u'$:

$$V_2(u, u') = \begin{cases} \beta & u = u' \\ -\beta & \text{otherwise} \end{cases} \qquad (4)$$

For each label $p \in L$ we define an a-priori expectation $\alpha_p$, so that the single site potential is

$$V_1(x_{i,j,k}) = \alpha_{x_{i,j,k}} \qquad (5)$$
Using (4) and (5), $P(x)$ is defined as

$$P(x) = \frac{1}{Z} \exp\!\left(\sum_{\text{adjacent}(u,u')} V_2(u, u') + \sum_{u \in x} V_1(u)\right) \qquad (6)$$
$Z$ is the normalizing constant, called the partition function. The local optimization method of iterated conditional modes (ICM) by Besag [10] works by iteratively updating one site at a time in raster scan order until the number of label changes per iteration falls below a threshold. When optimizing a single site ($x_{i,j,k} = u$) we only have to maximize (1) for that site, which simplifies to the minimization of

$$U(u) = \frac{1}{2}\left(y_{i,j,k} - \mu_u\right)' \Sigma_u^{-1} \left(y_{i,j,k} - \mu_u\right) + \frac{1}{2}\ln\left|\Sigma_u\right| - \sum_{u' \in \eta_u} V_2(u, u') - \alpha_u \qquad (7)$$
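As an illustration of how equation (7) drives a single ICM update, the following minimal sketch evaluates $U(u)$ for every candidate label at one voxel and picks the minimizer. It is not the authors' implementation: the function names, the 0-based label indexing and the list-of-arrays parameter layout are assumptions made for the example.

```python
import numpy as np

def local_energy(y_vox, label, neighbor_labels, means, covs, alpha, beta):
    """Energy U(u) of equation (7) for one voxel and one candidate label.

    y_vox           : intensity vector of the voxel (length M)
    neighbor_labels : current labels of the (up to six) adjacent voxels
    means, covs     : per-label mean vectors and covariance matrices
    alpha, beta     : single-site prior per label and pair interaction weight
    Labels are 0-based here (an assumption of this sketch).
    """
    diff = y_vox - means[label]
    data_term = 0.5 * diff @ np.linalg.inv(covs[label]) @ diff
    data_term += 0.5 * np.log(np.linalg.det(covs[label]))
    # V2 is +beta for an agreeing neighbour and -beta otherwise (equation 4).
    v2_sum = sum(beta if label == n else -beta for n in neighbor_labels)
    return data_term - v2_sum - alpha[label]

def icm_update(y_vox, neighbor_labels, means, covs, alpha, beta):
    """Return the label that minimises U(u) at a single site."""
    energies = [local_energy(y_vox, p, neighbor_labels, means, covs, alpha, beta)
                for p in range(len(means))]
    return int(np.argmin(energies))
```

A full ICM pass would call `icm_update` for every voxel in raster order and repeat until few labels change. With a strong `beta`, a voxel whose intensity is ambiguous is pulled towards the label of its neighbors, which is the restoration effect described above.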
3. METHOD
The algorithm pipeline goes through the following steps (see Figure 2):
(1) Initial grey level segmentation.
(2) Heterogeneous clique computation for the MRF.
(3) Magnetic field inhomogeneity correction.
(4) Image restoration by the ICM algorithm.
(5) Registration with a tissue probability distribution atlas.
(6) Using the aligned atlas to correct misclassified voxels.
(7) Eliminating more false positive lesions by applying an MRF classifier on lesion segments. It looks at shape features of those segments and their neighborhood.

Figure 2. Algorithm pipeline. The initial T1, T2, PD data pass through: (1) initial grey level segmentation; (2) compute heterogeneous clique weights; (3) magnetic field correction; (4) ICM; (5) register atlas with segmented data; (6) apply atlas on areas marked as lesions and regions that surround them using ICM; (7) apply shape based ICM to eliminate false positives; (8) add resulting segmentation as an instance to the atlas.

The following are the individual steps in detail.
3.1. Initial Grey Level Segmentation
The first step in the pipeline uses a multivariate Gaussian grey level classifier, where the sample mean and standard deviation are acquired from manual segmentations of some brain slices (see Figure 1).

3.2. Compute Heterogeneous Clique Weights
The interaction between neighbors can be further improved if we include edge information in $P(x)$. The voxel-voxel weight $\beta$ in $V_2$ (see eq. 4) is redefined as a local field $\{\beta_{u,v}\}$ between voxels $u$ and $v$ that adjusts to likely edges:

$$\beta_{u,v} = \frac{\beta}{1 + \left(\nabla y_\sigma \cdot \vec{u}\right)^2 / \lambda^2} \qquad (8)$$

where $\vec{u}$ is the direction between $u$ and $v$, and $\nabla y_\sigma$ is the gradient of a Gaussian smoothed image. This method is a low cost approximation to the heterogeneous field which improves the behavior of the ICM algorithm (see Figure 3). Many authors include edge information in the Markov Random Field [12, 13, 14]. Equation (8) is an adaptation of the anisotropic diffusion filters found in Weickert [15].

3.3. Intensity Inhomogeneity Correction
The image $Y$ is intensity corrected for magnetic field inhomogeneities by using the white matter mask found in step (1) and modeling the inhomogeneity as a multiplicative field ($y^{\text{corrected}}_{i,j,k} = y_{i,j,k}\, m_{i,j,k}$). A grid of 3D points is defined, and at each point a local estimate of the mean grey value of white matter is calculated from a neighborhood containing at least $Q$ pixels. The multiplicative field is defined as the ratio of the global to the local average ($m_{i,j,k} = \mu_{\text{global}} / \mu_{\text{local}}(i,j,k)$). Bilinear interpolation provides the estimate for the remaining voxels. This approach is similar to the one used in Rajapakse et al. [21].
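The correction described in section 3.3 can be sketched for a single 2D slice as follows. This is a simplified illustration, not the authors' code: the grid spacing `step`, the window-growing strategy used to reach at least `q_min` white matter pixels, and the restriction to 2D are all assumptions of the example.

```python
import numpy as np

def bias_field_2d(img, wm_mask, step=8, q_min=4):
    """Multiplicative field m = mu_global / mu_local, estimated on a coarse
    grid from white-matter pixels and bilinearly interpolated to all pixels."""
    wm_mask = np.asarray(wm_mask, dtype=bool)
    mu_global = img[wm_mask].mean()
    rows = np.arange(0, img.shape[0], step)
    cols = np.arange(0, img.shape[1], step)
    grid = np.ones((len(rows), len(cols)))
    for a, r in enumerate(rows):
        for b, c in enumerate(cols):
            half = step
            while True:
                r0, r1 = max(r - half, 0), min(r + half + 1, img.shape[0])
                c0, c1 = max(c - half, 0), min(c + half + 1, img.shape[1])
                win = wm_mask[r0:r1, c0:c1]
                # Grow the window until it holds at least q_min WM pixels.
                if win.sum() >= q_min or half > max(img.shape):
                    break
                half *= 2
            if win.sum() > 0:
                grid[a, b] = mu_global / img[r0:r1, c0:c1][win].mean()
    # Separable bilinear interpolation: first along columns, then along rows.
    tmp = np.array([np.interp(np.arange(img.shape[1]), cols, g) for g in grid])
    field = np.array([np.interp(np.arange(img.shape[0]), rows, tmp[:, j])
                      for j in range(img.shape[1])]).T
    return field
```

The corrected slice is then `img * bias_field_2d(img, wm_mask)`. Pixels beyond the last grid point reuse the nearest grid value, which is a simplification at the image border.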
Figure 3. Heterogeneous MRF neighborhood weights in X, Y, Z directions
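The directional weights shown in Figure 3 follow equation (8). A minimal sketch, assuming the input volume has already been Gaussian smoothed and that the weight between a voxel and its neighbor along an axis is approximated by the partial derivative at the voxel itself (for axis-aligned neighbors, $\nabla y_\sigma \cdot \vec{u}$ reduces to that partial derivative):

```python
import numpy as np

def clique_weights(y_smooth, beta, lam):
    """Per-axis interaction weights beta_{u,v} of equation (8).

    y_smooth : Gaussian-smoothed single-channel volume (smoothing is assumed
               to have been applied beforehand)
    beta     : homogeneous interaction weight of equation (4)
    lam      : contrast parameter lambda
    Returns one weight volume per axis (X, Y, Z for a 3D input).
    """
    grads = np.gradient(np.asarray(y_smooth, dtype=float))
    return [beta / (1.0 + g ** 2 / lam ** 2) for g in grads]
```

Across a strong edge the weight drops towards zero, so neighbors on opposite sides of the edge barely influence each other, while in flat regions the weight stays at `beta`.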
3.4. Applying the ICM Algorithm to Restore Misclassified Voxels
The image is restored using the method described in section 2.

3.5. Registering Atlas with Segmented Data X
In this paper the atlas $P_{\text{atlas}} = \{p_{i,j,k}(l)\}$ is a probability distribution of segments, where $p_{i,j,k}(l)$ represents the probability that voxel $(i,j,k)$ has label $l$. The transformation $T = T_{\text{affine}} + T_{\text{B-Spline}}$ is used to warp the atlas to the input image $y$ or its segmentation $x$, where:
$T_{\text{affine}}$ represents the global extrinsic transform of 9 parameters defined by the affine space.
$T_{\text{B-Spline}}$ models the elastic intrinsic deformation. B-splines were used because of (1) their built-in smoothness, (2) their locality of control, which enables fast registration, and (3) their small parameter space, which includes only the xyz distances between control points (for more details see Rueckert et al. [17] and Studholme et al. [19]).

The normalized entropy measure [18], $(H(A) + H(B)) / H(A,B)$, where $H(A,B)$ is the joint entropy, enables the atlas to be registered with the image $y$ directly because of its ability to compare images of different modalities. A better similarity measure between $P_{\text{atlas}}$ and $x$ is defined by

$$S(P_{\text{atlas}}, x) = \frac{1}{n} \sum_{i,j,k} p_{i,j,k}(x_{i,j,k}) \qquad (9)$$
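Equation (9) is simply the mean atlas probability assigned to the labels actually present in the segmentation. A minimal sketch, assuming the atlas is stored as an array with the label index on the last axis and labels are 0-based (both layout choices are assumptions of this example):

```python
import numpy as np

def atlas_similarity(p_atlas, x):
    """S(P_atlas, x) of equation (9).

    p_atlas : float array of shape (Nx, Ny, Nz, L), per-voxel label probabilities
    x       : integer array of shape (Nx, Ny, Nz), labels in 0..L-1
    """
    # Pick, at every voxel, the atlas probability of the label found in x.
    probs = np.take_along_axis(p_atlas, x[..., None], axis=-1)
    return float(probs.mean())
```

During registration this value would be recomputed after each trial deformation of the atlas, and the deformation that maximizes it would be kept.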
It is possible to build an atlas by registering and averaging several brain volumes into a probability distribution model; see for example Chen et al. [20] and Studholme et al. [19]. In this paper an initial atlas was obtained by segmenting a simulated volume from BrainWeb [22] and then blurring the distribution with a Gaussian kernel. The blurring improves registration by reducing the chance of local minima in the fitting process. As more segmented images are obtained, we can refine the atlas by registering each segmentation with $P_{\text{atlas}}$ and adding the new instance to form a new distribution:

$$P_{\text{atlas}}^{\,n+1} = \frac{n\, P_{\text{atlas}}^{\,n} + x}{n + 1} \qquad (10)$$

The atlas can also include the a-priori probability distribution of lesions in the brain, where $p_{i,j,k}(l_{\text{lesion}})$ represents the expectancy of a lesion at location $(i,j,k)$. Initially we set $p_{i,j,k}(l_{\text{lesion}})$ to a scaled white matter probability, $\alpha\, p_{i,j,k}(l_{\text{white-matter}})$, which gets refined as more segmentation instances are added.
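The refinement of equation (10) is a running average over registered segmentations. A minimal sketch, assuming the new segmentation has already been warped into atlas space and is converted to a one-hot label indicator before averaging (the array layout and 0-based labels are assumptions of this example):

```python
import numpy as np

def update_atlas(p_atlas, x, n):
    """Running-average atlas update of equation (10).

    p_atlas : float array (Nx, Ny, Nz, L), atlas built from n instances
    x       : integer array (Nx, Ny, Nz), registered segmentation, labels 0..L-1
    n       : number of instances already averaged into p_atlas
    """
    # One-hot encode x so that each voxel contributes probability 1 to its label.
    one_hot = np.eye(p_atlas.shape[-1])[x]
    return (n * p_atlas + one_hot) / (n + 1)
```

Because both terms sum to one over the label axis at every voxel, the updated atlas remains a valid probability distribution.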
3.6. Lesion Detection
Using the atlas registered in section 3.5, we can differentiate between true and false lesions. This is done by reapplying ICM at segments labeled as lesions and at their neighboring voxels, with an additional term in equation (7) that uses $P_{\text{atlas}}$:

$$U(u) = \frac{1}{2}\left(y_{i,j,k} - \mu_u\right)' \Sigma_u^{-1} \left(y_{i,j,k} - \mu_u\right) + \frac{1}{2}\ln\left|\Sigma_u\right| - \sum_{u' \in \eta_u} V_2(u, u') - \alpha_u + \gamma \ln\left(1 - p_{i,j,k}(u)\right) \qquad (11)$$

Figure 4. Improvement after applying the anatomical atlas. Left: the lesions of Figure 1. Middle: lesions reclassified by applying the anatomical atlas. Right: the expert's manual segmentation. The atlas was able to eliminate a great deal of false positives. The remaining false positives are handled by the shape based MRF.
This has the effect of adding spatial information to each voxel, resulting in a better classification (see Figure 4). Registration errors may still misclassify certain voxels, so feature based classification is needed to improve the results.

3.7. Applying Shape Based MRF
False positives remaining after section 3.6 are further eliminated by defining individual candidate lesions as shape units in a Markov Random Field. Each shape unit can be reassigned a state {MS lesion, grey matter} based on shape and neighborhood features. An iterative process similar to ICM is used to re-label those units until convergence. The steps are as follows:

1. The watershed transform is used to isolate most individual plaques in each XY slice, using the property that the intensity is generally bright at the center and decreases towards the edges. The connected components thus formed within white matter generally have an elliptical shape (unless they contact other structures such as the ventricular system). Per-slice connected components are used instead of 3D connected components because the Z aspect ratio of 3:1 makes the calculation of good shape features in 3D difficult (lesions generally do not persist for more than 2-4 slices, and the partial volume effect makes them stick together in the Z direction).
2. The shape units calculated in step 1 are considered random variables (sites) which may be relabeled as {lesion, grey matter}: $S = \{s_1, \ldots, s_k\}$, $L_S : S \to \{l_{\text{grey\_matter}}, l_{\text{lesion}}\}$.
3. Unary features are computed for each shape unit. They measure (1) compactness, (2) elongation and (3) distance from the center of the ventricular area ($\varphi_{\text{unary}} : S \to v_{\text{unary}}$).
4. A neighborhood system ($\eta^S_z : S \to 2^S$) is defined for each shape unit, where $\eta^S_z(s_i)$ contains all other shape units that are in contact with $s_i$ in the Z direction. Similarly, we define $\eta^S_{xy}$ as the neighborhood that specifies contact in the XY direction. Binary features are computed for each shape unit. They include (1) the area of contact with grey matter and (2) the area of contact with other shape units in the neighborhood system $\{\eta^S_z, \eta^S_{xy}\}$ ($\varphi_{\text{binary}} : S \to v_{\text{binary}}$).
5. A binary classifier $C : S \times \eta^S_z \times \eta^S_{xy} \to (P_{\text{lesion}}, P_{\text{grey\_matter}})$ is defined. It assigns a higher probability that a shape unit is a lesion whenever the unit is more oval, contacts more other lesions or has a certain location. It assigns a higher probability that the unit is grey matter when it contacts more grey matter or has a certain location.
6. Using the neighborhood system and shape features, an iterated conditional modes algorithm is defined in which each site is assigned a state of lesion or grey matter based on the classifier $C$. The algorithm is iterated a few times until convergence (see Figure 5).

Figure 5. Left: lesion mask after applying the anatomical atlas. Middle: lesion mask after applying the shape based MRF. Right: expert's manual segmentation.

The algorithm uses contact information between shape units to mimic what the expert does. For example, when the expert finds an isolated oval region in white matter and, looking at the previous slice, discovers that this structure is connected to cortical grey matter, he concludes that it must be a continuation of cortical matter. If instead it contacts another lesion, then it is likely to be a continuation of that lesion. As another example, elongated structures embedded in grey matter are less likely to be lesions than classification errors.
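The unary features of step 3 can be illustrated with a small moment-based sketch. The paper does not give formulas for compactness or elongation, so the definitions below are assumptions: elongation is taken as the ratio of the principal axis lengths of the region, and compactness as the ratio of the region's area to the area of the ellipse with the same second moments (close to 1 for a filled ellipse, lower for ragged or hollow regions). The function name and the `ventricle_center` argument are hypothetical.

```python
import numpy as np

def shape_features(mask, ventricle_center):
    """Return (compactness, elongation, distance) for one 2D shape unit.

    mask             : 2D boolean array, True inside the shape unit
    ventricle_center : (row, col) of the ventricular centre, assumed given
    """
    rows, cols = np.nonzero(mask)
    coords = np.stack([rows, cols]).astype(float)
    centroid = coords.mean(axis=1)
    cov = np.cov(coords, bias=True)
    # Each pixel is a unit square, which adds a variance of 1/12 per axis
    # and keeps the matrix positive definite even for a single pixel.
    cov = cov + np.eye(2) / 12.0
    lam_minor, lam_major = np.linalg.eigvalsh(cov)
    elongation = float(np.sqrt(lam_major / lam_minor))
    # A filled ellipse with these moments has area 4*pi*sqrt(lam1*lam2).
    ellipse_area = 4.0 * np.pi * np.sqrt(lam_major * lam_minor)
    compactness = float(len(rows) / ellipse_area)
    distance = float(np.linalg.norm(centroid - np.asarray(ventricle_center, float)))
    return compactness, elongation, distance
```

A classifier such as $C$ in step 5 could take these values, together with the contact areas of step 4, as its feature vector. A round plaque yields elongation near 1, whereas a thin sliver of misclassified cortex yields a much larger value.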
4. RESULTS
Experiments were conducted on three brain volumes with T1-, T2- and PD-weighted contrast. Each modality is of dimension 256×256×48 with a voxel size of 0.97×0.97×3.0 mm. Manual segmentations of the multiple sclerosis lesions for the three volumes were obtained from a medical expert. Two measures were used to compare the manual and automatic segmentation:
1. Similarity index [8]: the similarity index between two segments $A_1$, $A_2$ is a number in [0, 1] defined by

$$S = 2\,\frac{|A_1 \cap A_2|}{|A_1| + |A_2|} \qquad (12)$$
2. Automatic and manual segmentation volumes, which show the quantity of false positives eliminated after each stage of the algorithm.

Figure 6 shows the similarity index for a typical MS patient after each stage of the algorithm. The ICM algorithm restored the image and yielded a 5% improvement in the similarity index for the whole volume. Using the atlas yields a 13% improvement in similarity. Using the shape based ICM yields a further 13% improvement in similarity, especially after slice -6, as shown in the figure. Figure 7 shows the automatic versus the manual segmentation volumes for the same patient. Initially the overall correlation between the manual and the intensity based segmentation is 0.3. The correlation rose to 0.87 after applying the atlas, because most false positives were eliminated at this stage. After applying the shape based ICM the correlation rose to 0.95, as can be seen from the closely matching curves. Experiments with the other two patients yielded per-slice correlations of 0.93 and 0.91 between the volumes of manual and automatic segmentation.
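Both evaluation measures are straightforward to compute from binary lesion masks. The sketch below assumes masks stored as arrays with slices along the last axis, and reads the reported correlation as the Pearson correlation between per-slice lesion volumes; that reading, and the function names, are assumptions of this example.

```python
import numpy as np

def similarity_index(a1, a2):
    """Similarity index of equation (12) between two binary masks."""
    a1 = np.asarray(a1, dtype=bool)
    a2 = np.asarray(a2, dtype=bool)
    denom = a1.sum() + a2.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * np.logical_and(a1, a2).sum() / denom

def per_slice_volume_correlation(manual, auto):
    """Pearson correlation between per-slice lesion voxel counts.

    manual, auto : binary volumes of shape (Nx, Ny, Nz), slices on the last axis
    """
    vm = np.asarray(manual, dtype=bool).reshape(-1, manual.shape[-1]).sum(axis=0)
    va = np.asarray(auto, dtype=bool).reshape(-1, auto.shape[-1]).sum(axis=0)
    return float(np.corrcoef(vm, va)[0, 1])
```

Calling `similarity_index` on one slice at a time gives a per-slice curve of the kind plotted in Figure 6, while the per-slice voxel counts correspond to the curves of Figure 7.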
Figure 6. Similarity index after each stage of the algorithm pipeline for one patient volume. (Plot of similarity index, 0 to 0.9, against slice number, -20 to 4, with one curve each for intensity based segmentation, ICM segmentation, atlas segmentation and shape ICM.)

Figure 7. Manual segmentation volume vs. automatic segmentation volume for each stage of the algorithm. (Plot of volume of MS lesions, 0 to 3500, against slice number, -20 to 4, with one curve each for manual segmentation, intensity based segmentation, ICM segmentation, atlas segmentation and shape ICM.)
5. CONCLUSION AND FUTURE WORK
A method to segment multiple sclerosis lesions was presented. It employs a three stage algorithm which first segments and restores the image at a low level; an anatomical atlas is then used to disambiguate between lesions and grey matter. Finally, a shape based MRF, in which shape units consist of lesion slices and the neighborhood system represents contact with other lesions and grey matter, is used to eliminate false positives. Future work will concentrate on refining the shape based classifier, for example by (1) defining a spatial probability map for contact information, (2) improving the feature space to include more lesion properties and (3) improving the classification rules to bring them closer to those of the expert.
6. REFERENCES
1. C. Poster, An Atlas of Multiple Sclerosis, The Parthenon Publishing Group, 1988.
2. S. Warfield, J. Dengler, J. Zaers, R. Guttman, W. Wells, G. Ettinger, J. Hiller, R. Kikinis, “Automatic identification of grey matter structures from MRI to improve the segmentation of white matter lesions”, J Image Guid Surg 1, pp. 326-338, 1996.
3. M. Kaus, S. Warfield, F. Jolesz, R. Kikinis, “Adaptive Template Moderated Brain Tumor Segmentation in MRI”, Bildverarbeitung für die Medizin, pp. 102-106, 1999.
4. S. Warfield, M. Kaus, F. Jolesz, R. Kikinis, “Adaptive, Template Moderated, Spatially Varying Statistical Classification”, Medical Image Analysis 4(1), pp. 43-55, 2000.
5. M. Kamber, R. Shinghal, L. Collins, G. Francis, A. Evans, “Model-Based 3D Segmentation of Multiple Sclerosis Lesions in Magnetic Resonance Brain Images”, IEEE Trans. on Medical Imaging 14(3), pp. 442-453, 1995.
6. E. Ardizzone, R. Pirrone, “An Architecture for the Recognition and Classification of Multiple Sclerosis Lesions in MR Images”, IDAMAP, 1999.
7. J. Udupa, L. Wei, S. Samarasekera, Y. Miki, M. Vanbuchem, R. Grossman, “Multiple sclerosis lesion quantification using fuzzy-connectedness principles”, IEEE Trans. on Medical Imaging 16(2), pp. 598-609, 1996.
8. B. Johnston, M. Atkins, B. Mackiewich, M. Anderson, “Segmentation of Multiple Sclerosis Lesions in Intensity Corrected Multispectral MRI”, IEEE Trans. on Medical Imaging 15(2), 1996.
9. K. Kirshnan, M. Atkins, “Segmentation of Multiple Sclerosis in MRI - An Image Analysis Approach”, SPIE Conf. on Image Processing, pp. 1106-1116, San Diego, 1998.
10. J. Besag, “On the statistical analysis of dirty pictures”, J. Roy. Statist. Soc. B 48(3), pp. 259-302, 1986.
11. S. Lakshmanan, D. Haluk, “Simultaneous Parameter Estimation and Segmentation of Gibbs Random Fields Using Simulated Annealing”, IEEE Trans. on Pattern Analysis and Machine Intelligence 11(8), pp. 799-813, 1989.
12. S. Geman, D. Geman, “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images”, IEEE Trans. on Pattern Analysis and Machine Intelligence 6(6), 1984.
13. S. Nadabar, A. Jain, “Parameter Estimation in Markov Random Field Contextual Models Using Geometric Models of Objects”, IEEE Trans. on Pattern Analysis and Machine Intelligence 18(3), 1996.
14. R. Aykroyd, “Bayesian Estimation for Homogeneous and Inhomogeneous Gaussian Random Fields”, IEEE Trans. on Pattern Analysis and Machine Intelligence 20(5), 1998.
15. J. Weickert, “Anisotropic Diffusion in Image Processing”, B.G. Teubner Stuttgart, 1998.
16. M. Desco, J. Gispert, S. Reig, A. Santos, J. Pascau, N. Malpica, P. Garcia-Barreno, “Statistical Segmentation of multidimensional brain datasets”, SPIE Medical Imaging, pp. 184-193, 2001.
17. D. Rueckert, L. Sonoda, C. Hayes, D. Hill, M. Leach, D. Hawks, “Nonrigid Registration Using Free-Form Deformations: Application to Breast MR Images”, IEEE Trans. on Medical Imaging 18(8), pp. 712-721, 1999.
18. C. Studholme, D. Hill, D. Hawks, “An overlap invariant entropy measure of 3D medical image alignment”, Pattern Recognition 32, pp. 71-86, 1999.
19. C. Studholme, V. Cardenas, M. Weiner, “Multi scale image and multi scale deformation of brain anatomy for building average brain atlases”, SPIE Medical Imaging, pp. 557-568, 2001.
20. M. Chen, T. Kanade, D. Pomerleau, J. Schneider, “Probabilistic Registration of 3-D Medical Images”, tech. report CMU-RI-TR-99-16, Robotics Institute, Carnegie Mellon University, 1999.
21. J. Rajapakse, J. Giedd, J. Rapoport, “Statistical Approach to Segmentation of Single-Channel Cerebral MR Images”, IEEE Trans. on Medical Imaging 16(2), pp. 176-186, 1997.
22. Brainweb, http://www.bic.mni.mcgill.ca/brainweb/
23. C.A. Cocosco, V. Kollokian, R.K.-S. Kwan, A.C. Evans, “BrainWeb: Online Interface to a 3D MRI Simulated Brain Database”, NeuroImage 5(4), part 2/4, S425, 1997. Proceedings of the 3rd International Conference on Functional Mapping of the Human Brain, Copenhagen, 1997.