Automatic Contour Detection by Encoding Knowledge into Active Contour Models

Olivier Gérard*, Shérif Makram-Ebeid
Laboratoires d'Electronique Philips S.A.S.
22 avenue Descartes, BP 15, F-94453 Limeil-Brévannes Cedex, France
{gerard,makram}@lep-philips.fr

* Corresponding author, also with LIP6, Paris 6 University.

Abstract

An original method for the automatic detection of contours in difficult images is proposed. The method is based on a tight cooperation between a multi-resolution neural network and a dynamic programming procedure enhanced with a hidden Markov model. This new method is able to overcome the three major drawbacks of "standard" active contours: initialization dependency, exclusive use of local information and sensitivity to occlusion. The driving idea is to introduce high-order a priori information at each step of the system. An application to the automatic detection of the left ventricle in digital X-ray images is presented.

1. Introduction

Edge extraction and automatic contour detection are basic problems in computer vision. The task is particularly difficult for the low-contrast and noisy images common in medical applications. Furthermore, a robust automatic procedure has to cope with the wide intrinsic variability of medical images.

Several authors have modeled the problem of contour recovery as an energy minimization process. An energy function is associated with every candidate contour. These active contours are splines controlled by internal and external forces [1]. The internal forces set the overall shape characteristics (such as smoothness), whereas the external forces attract the contour towards local image features (such as a high gradient value). The minimization can be carried out efficiently by a discrete dynamic programming (DP) procedure. This technique is often referred to as active contours (AC), as described in [1] and [10].

This method has three main drawbacks. The first one is the precision of the initial contour. A problem inherent to all iterative methods is the initialization phase. This is even

more crucial for the active contour method, which requires a good initial contour in order to converge toward the desired edges. Usually, this contour has to be drawn manually by an operator. The second drawback lies in the form of the external force, which is usually derived only from local information; any a priori knowledge that could help is thus hard to introduce. Some authors have tried to solve this problem by putting constraints on the overall shape of the desired contour, restricting the internal force to a small set of admissible forms. For instance, [6] proposed to describe the left ventricular contour as an elliptical Fourier series. The third drawback lies at the core of the AC algorithm, which is usually not robust to occlusion and fails when the pre-processing edge detection step erroneously skips parts of the desired contour.

In order to overcome these difficulties, we introduce a new method based on a tight cooperation between a multi-resolution neural network and an enhanced dynamic programming procedure. Section 2 describes how a neural network (NN) can be used to automatically find an initial contour. This contour is used to define a rubber-band image transformation, as described in Section 3. The NN uses global information derived from a pyramid of filters, and the resulting posterior probabilities are then used as basic states for our dynamic programming method, as described in Section 4.

An application in the medical imaging field is described throughout the paper. It consists in the automatic detection of the boundaries of the left ventricle (LV) in digital X-ray right anterior oblique (RAO) projection ventriculograms. Digital X-ray imaging is commonly used in cardiology applications. A frequent type of heart examination consists in injecting a contrast product into the left ventricle of the patient through a catheter. This allows the visualization of the LV chamber in X-ray images. An image sequence is recorded, and the physician has to outline the LV

boundaries in the end-systolic and end-diastolic frames. Relevant parameters such as LV chamber volumes, ejection fraction and stroke volume can then be computed.

Today, techniques for a semi-automatic segmentation of the LV in the relevant image frames and for the estimation of its volume are available. These techniques are well established, and a study involving post-mortem phantoms even showed that semi-automatically determined contours were more precise than manually drawn ones [3]. However, the state-of-the-art algorithms require a fair degree of interaction with a human operator, at least to pinpoint key points in each of the two image frames. Automating this task is challenging because local image configurations do not provide sufficient information to avoid false key-point identifications, and because the gray-scale ventriculograms have poor contrast and a high level of noise. This noise is due to the scattering of radiation by tissues not related to the ventricle, and to the interference of the ribs and the diaphragm with the LV. Artifacts are also generated by the breathing of the patient during the catheterization procedure. Moreover, the injected contrast medium does not mix uniformly with the blood in the LV, and the apex zone generally does not receive much dye. Section 5 contains results obtained with our fully automatic method for contour detection.

2. Initialization by a Neural Network

As previously stated, a major problem with the active contour framework is that it requires a good initial guess in order to converge to the desired position. This initial contour is usually drawn manually. Since our goal is to build a fully automatic system, we have to provide the procedure with a reasonable initial contour. The approach we have chosen is based on training artificial neural networks (NN) on examples. Instantiating deformable models with a NN has recently received attention, for instance for the handwritten digit recognition task [9].

The proposed system is based on a NN that uses multi-resolution information to estimate the probabilities that a given point belongs to predefined classes. The following sections describe the steps of this procedure. The first one is to define the NN structure, namely its outputs (feature classes), the points that should be selected and the NN input vectors computed for these points. Then the NN training and results are discussed, and a way of building an initial AC contour from the NN output is proposed.

2.1. Definition of Feature Classes

The first step is to define the "desired classes" corresponding to the specific features of the object we want to

detect. For our application, we have followed the recommendations of the American Heart Association [2] and defined anatomic classes accordingly (see Fig. 1).

[Figure 1. Schematic left ventricle with anatomic classes.]

Classes BPn and BAn are the LV boundaries (n = 1 to 4). The three key points are the antero and postero sides of the aortic valve (VA and VP) and the apex (AP). These points are very important for characterizing the LV chamber, and the physician is usually requested to select them manually. Additional non-LV-boundary classes (PA, PP, CA, CI and CE) were defined in order to increase the selectivity of the neural network.

2.2. Point Selection

In order to reduce the computation time, only the points with "maximal gradient" are fed to the neural network. These are the points where the gradient intensity is a local extremum along the gradient direction: points whose gradient magnitude is lower than that of their nearest neighbors along this direction are discarded. A minimal sketch of this selection step is given below.
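The paper does not spell out this non-maximum suppression step; the following NumPy sketch shows one plausible implementation, assuming a simple 8-neighbour quantization of the gradient direction (all names are illustrative, not from the original system):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def select_max_gradient_points(image, sigma=1.0):
    """Keep only the points whose gradient magnitude is a local maximum
    along the gradient direction (non-maximum suppression)."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)   # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)

    # Quantize the gradient direction to the nearest of 8 neighbour offsets.
    angle = np.arctan2(gy, gx)
    k = np.round(angle / (np.pi / 4)).astype(int) % 8
    offsets = [(0, 1), (1, 1), (1, 0), (1, -1),
               (0, -1), (-1, -1), (-1, 0), (-1, 1)]

    keep = np.zeros_like(mag, dtype=bool)
    h, w = mag.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dy, dx = offsets[k[y, x]]
            # Compare with the two neighbours along the gradient direction.
            if mag[y, x] >= mag[y + dy, x + dx] and mag[y, x] >= mag[y - dy, x - dx]:
                keep[y, x] = True
    return keep, mag
```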

2.3. Neural Network Input Vectors

Starting from the original 512×512 image, a set of low-pass filtered versions is generated to yield a multi-resolution representation. In addition to the initial scale L_0, K other scales (L_k with k = 1..K) are computed with half and quarter nominal image size and smoothed with kernels having sizes σ_k (spatial standard deviation) ranging from 3 to 70 pixels; see the "Pyramid of filters" in Fig. 2. Recursive filtering techniques are used to efficiently implement such large kernel sizes. In the implementation, much attention is paid to obtaining nearly isotropic impulse responses. Then the derivatives of the L_k images with respect to x and y are computed. A steerable representation of the filtered output can be derived as a linear combination of the various partial derivatives (see [4]). This steerable filter bank provides an equivalent representation of the following derivatives:

$$ D^l_m L_k = \frac{\partial^l}{\partial x^m \, \partial y^{l-m}}\, L_k . \qquad (1) $$

For the finest scale (k = 0), one stops at order l = 1, hence generating three components. For the other scales (k = 1, ..., 7), derivatives up to the 4th order are computed, leading to 105 feature values. To those 108 features, information about the gradient is added: one entry for its intensity and 16 binary entries coding its direction (with a single entry set to 1, the rest being set to 0). The last feature is the edge curvature as defined in [7], computed for the first half-size filtered image of the pyramid.
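As an illustration of Eq. (1), here is a compact sketch of such a multi-scale Gaussian-derivative feature bank. It uses SciPy's FIR Gaussian filters rather than the recursive filters of the paper, the scale list is only an assumption based on the 3 to 70 pixel range quoted above, and the finest-scale truncation to l = 1 is not shown:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def derivative_features(image, sigmas=(3, 6, 12, 24, 48, 70), max_order=4):
    """Gaussian derivatives D_m^l L_k (cf. Eq. (1)) up to `max_order`
    at each scale.  Orders 0..4 give 15 maps per scale, matching the
    15 components per coarse scale quoted in the text."""
    feats = []
    img = image.astype(float)
    for sigma in sigmas:
        for l in range(max_order + 1):      # total derivative order
            for m in range(l + 1):          # order along x
                # gaussian_filter's `order` is per axis: (y-order, x-order).
                feats.append(gaussian_filter(img, sigma, order=(l - m, m)))
    return np.stack(feats, axis=-1)         # H x W x n_features
```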

[Figure 2. View of the feature vector computations: the original image and its maximum-gradient points feed a pyramid of filters whose outputs form the input of the neural network.]

A feature vector V(x, y) of dimension 126 is thus computed for each selected point at position (x, y). The use of different scales allows us to combine local with contextual information. The neural network, fed with this input vector V(x, y), gives the probabilities of our 15 classes C_i for the current point (x, y). The feature vector computations are illustrated in Fig. 2.

2.4. Multi-resolution NN Training

The neural network architecture is thus defined by 126 input nodes, 50 hidden neurons and 15 output neurons. The network is trained on a learning set of 32,931 examples (pairs of feature vector and desired output) coming from 38 images, and generalization is checked on a test set of 13,745 examples extracted from 16 images. The neural network reaches 93.09% and 79.34% of correct classifications on the training and test sets respectively.
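The paper does not name the training algorithm; as a rough modern stand-in for the 126-50-15 architecture described above, a scikit-learn sketch (with synthetic placeholder data in place of the real feature vectors of Section 2.3) could be:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Stand-in data with the paper's dimensions (real vectors come from Sec. 2.3).
rng = np.random.default_rng(0)
train_vectors = rng.normal(size=(32931, 126))   # 126-dimensional feature vectors
train_labels = rng.integers(0, 15, size=32931)  # 15 anatomic classes

# 126 inputs -> 50 hidden neurons -> 15 outputs, as in the paper.
net = MLPClassifier(hidden_layer_sizes=(50,), max_iter=200)
net.fit(train_vectors, train_labels)
posteriors = net.predict_proba(train_vectors[:10])  # estimates of P(Ci | V(x, y))
```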

2.5. Neural Network Outputs

The output of the neural network is thus an estimate of the posterior probabilities P(C_i | V(x, y)) for the selected point at position (x, y). However, because of the noise and because the neural network is not perfect, we added an algorithm to robustly find the most probable region for the important classes. For each of these classes, a probability image is built whose pixel intensities are the corresponding probabilities estimated by the NN; a null intensity is attributed to non-selected points. This image is then smoothed with a rather small kernel (spatial standard deviation of 3 pixels) in order to find regions of high probability. In each of these probability maps, the maximum pixel intensity is found (actually, the first three maxima separated by at least 50 pixels are computed). This leads to 3^5 = 243 possible polygon lines linking the successive classes VA, BA2, AP, BP3 and VP (see Fig. 1). Simple rules are used to discard incoherent relative positions of the above key points and to sort out the acceptable polygons, for example: "the apex should lie to the right of both aortic valve extremities". The neural network is thus able to provide a polygon line which is a very crude model of the object to detect. This line is computed from non-local information and is consistent with high-order prior knowledge about the "average" object configuration. A sketch of the maxima-extraction step follows.
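A minimal NumPy sketch of this maxima extraction, assuming one posterior map per class and illustrative names throughout, could be:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def key_point_candidates(prob_map, selected_mask, n_max=3, min_dist=50, sigma=3):
    """Smooth a per-class posterior image (null outside the selected points)
    and return the first `n_max` maxima separated by at least `min_dist` pixels."""
    img = gaussian_filter(np.where(selected_mask, prob_map, 0.0), sigma)
    yy, xx = np.ogrid[:img.shape[0], :img.shape[1]]
    work = img.copy()
    candidates = []
    for _ in range(n_max):
        y, x = np.unravel_index(np.argmax(work), work.shape)
        candidates.append((y, x, img[y, x]))
        # Suppress a disk of radius `min_dist` around the maximum just found.
        work[(yy - y) ** 2 + (xx - x) ** 2 < min_dist ** 2] = -np.inf
    return candidates
```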

2.6. NN Results

[Figure 3. Neural network detection and crude LV model.]

Figure 3 shows the results for an image where the diaphragm intersects the left ventricle. The disks are the points provided by the physician and the dashed line is the crude model. The crosses along this line are the detected key points. For this image, the average error over the three key points is 16.9 pixels. Over 58 test images of size 512×512, the detected key points lie at an average distance of 26.7 pixels from the corresponding points defined by the physician (standard deviation of 19.7 pixels). Since our goal is to automatically detect the whole contour, the useful neural network outputs are the polygon line and the probability maps. They will be used to transform the image and to direct the DP procedure.

3. Image Transformation

Because standard DP works in 1-D, a causality dimension has to be determined (i.e. a succession order for the edge points). We have chosen to build an image which follows the line previously found by the NN: the new image is a "rubber band" around this line. In Fig. 4, the image to transform is delineated in black around the spline (in gray) based on the polygon line (see Fig. 3).

[Figure 4. The rubber band in the original image.]

The transformation is based on a minimal-distance paradigm: the rubber-band image is mapped onto a rectangular image, and points in overlapping regions are mapped to the nearest middle line. The corresponding transformed image is shown in Fig. 5. Since the transformation is not bijective, some points in the transformed image remain black, meaning that they have no corresponding point in the original image.

[Figure 5. The transformed image of Fig. 4.]

This rubber-band unfolding transformation is applied to the posterior probability maps computed by the neural network in order to build a 3D probability matrix. The horizontal dimension is the curvilinear coordinate, i.e. the path along the polygon line; the vertical dimension is defined by the algebraic distance to the spline center line and indexes the candidate points; and the third dimension stores the different posterior probabilities for these points. Figure 6 shows a small part of such an artificial 3D probability matrix. A sketch of the unfolding step is given below.
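The following sketch is a much-simplified, nearest-neighbour version of such an unfolding: it samples the image along the normal of each polyline segment and omits the minimal-distance handling of overlapping regions described above (names and the band half-width are illustrative):

```python
import numpy as np

def unfold_rubber_band(image, polyline, half_width=40):
    """Crude unfolding of a band around `polyline` (an N x 2 array of (y, x)
    vertices): each output column samples the image along the normal of one
    segment; rows correspond to the signed distance to the centre line."""
    n_cols = len(polyline) - 1
    out = np.zeros((2 * half_width + 1, n_cols))
    for s in range(n_cols):
        p = polyline[s].astype(float)
        t = polyline[s + 1].astype(float) - p
        t /= np.linalg.norm(t)
        n = np.array([-t[1], t[0]])                 # unit normal to the segment
        for d in range(-half_width, half_width + 1):
            y, x = np.round(p + d * n).astype(int)
            if 0 <= y < image.shape[0] and 0 <= x < image.shape[1]:
                out[d + half_width, s] = image[y, x]
    return out
```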

4. Dynamic Programming

The key to effective contour detection is to use non-iterative optimization methods such as those based on dynamic time warping [8], usually applied to speech processing. [5] suggested an interesting way of using this procedure to link points between end-points related to a selection made by a user. The more general method presented here is based on the idea of performing the Viterbi algorithm in a 2-D form: instead of only linking points, the algorithm links couples (point, class). The cost of such a link between class i of the current point S = (x, y) and class j of a possible predecessor S' is defined as:

$$ C(S'_j, S_i) = \underbrace{C^*(S'_j)}_{\text{accumulated cost}} + \alpha \underbrace{D(S', S)}_{\text{point distance}} + \beta \underbrace{C(S_i)}_{\text{local cost}} + \gamma \underbrace{ct(j, i)}_{j \to i \text{ transition cost}} \qquad (2) $$

where α, β and γ are weighting coefficients. The cost of a link thus depends on the accumulated cost for reaching S'_j, the distance between the two points S' and S (computed in the original image), the cost of going through S_i (C(S_i) = −log P(C_i | V(x, y))) and a transition cost for going from class j to class i (ct(j, i) = −log P(C_i(t) | C_j(t−1))). P(C_i | V(x, y)) is the posterior probability for class C_i at the point S = (x, y), as estimated by the multi-resolution neural network. For the time being, the transition probabilities P(C_i(t) | C_j(t−1)) are fixed beforehand, but they too will be estimated in a future system.

Figure 6 shows a small synthesized sample of the image matrix with the probabilities (the darker, the higher) of the different border classes for some selected points. Note that the number of points per column is not constant and that these valid points are not evenly distributed along a column, owing to the point selection and the image transformation. Another modification of the standard DP procedure used in AC methods is the possibility of linking points which do not lie on adjacent columns of the transformed image.

[Figure 6. A small part of the image matrix: candidate points per class, plotted against the curvilinear coordinate (time) s−3, ..., s.]

These "jumps" guarantee that a link will be found even when the selection phase erroneously missed some intermediate points. Previous points are thus searched for in a sector covering 51 positions. The best cost for arriving at class i of point S is defined as:

$$ C^*(S_i) = \min_{\substack{j \,\in\, \{1..N\} \\ S' \,\in\, \mathrm{Pred}(S)}} C(S'_j, S_i) \qquad (3) $$

where Pred(S) is the set of valid points in the sector area and N the number of predefined classes. Such links are computed for all couples (point, class), going forward (in the causal order) through the image matrix. The most probable contour is then retrieved as the one ending with the lowest cost. A sketch of this 2-D Viterbi pass is given below.
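Eqs. (2) and (3) translate directly into a forward Viterbi pass over the columns of the image matrix. The sketch below is a simplification: predecessors are searched in the previous `max_jump` columns rather than in the 51-position sector of the paper, and the data structures and names are illustrative:

```python
import numpy as np

def viterbi_2d(columns, dist, trans_cost, alpha=1.0, beta=1.0, gamma=1.0,
               max_jump=3):
    """Forward pass of Eqs. (2)-(3).  `columns[s]` is the list of candidate
    points of curvilinear column s; each point is a dict with a "local" array
    of per-class costs -log P(Ci | V(x, y)).  `dist(p, q)` is the distance
    between two points in the original image, and `trans_cost[j, i]` plays
    the role of ct(j, i) = -log P(Ci(t) | Cj(t-1))."""
    n_classes = trans_cost.shape[0]
    best, back = [], []
    for s, col in enumerate(columns):
        best.append([np.full(n_classes, np.inf) for _ in col])
        back.append([[None] * n_classes for _ in col])
        for p, point in enumerate(col):
            for i in range(n_classes):
                if s == 0:
                    best[s][p][i] = beta * point["local"][i]
                    continue
                # "Jumps": predecessors may lie up to `max_jump` columns back,
                # so a path survives even if intermediate columns are empty.
                for s2 in range(max(0, s - max_jump), s):
                    for p2, prev in enumerate(columns[s2]):
                        for j in range(n_classes):
                            c = (best[s2][p2][j]              # accumulated cost
                                 + alpha * dist(prev, point)  # point distance
                                 + beta * point["local"][i]   # local cost
                                 + gamma * trans_cost[j, i])  # transition cost
                            if c < best[s][p][i]:
                                best[s][p][i] = c
                                back[s][p][i] = (s2, p2, j)
    return best, back  # backtrack from the lowest final cost to get the contour
```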

5. Results

[Figure 7. Black line: detected boundary; white line: shown for comparison.]

Figure 7 shows the boundary found by the enhanced dynamic programming procedure (the black line). The white line, provided for comparison, is obtained by using only local information (i.e. without the transition cost, and with only gradient information for the local cost in Eq. 2); the ability to "jump" over some intermediate positions was still used, otherwise the algorithm would not have produced any solution. It can be seen that the use of the posterior probabilities gives more accurate results, particularly in the upper (antero) part of the ventricle. In addition, it makes it possible to find the right position of the detected points even when the first detection (the crosses) is far from the expert physician's localization (the disks); see for instance the first point (the antero side of the aortic valve).

Figure 8 shows the posterior probability values for the different classes along the border found (the black line in Fig. 7). Ideally, each class should have a single "bump" of high posterior probability values, and the classes should follow one another from the deep part to the front part of the graph when reading it from left to right. These desirable features are clearly visible in the figure.

[Figure 8. Probabilities along the boundary found.]

Figure 9 shows the results for another image. As in Fig. 7, the white line is the result of the detection without the posterior probability information, whereas the black line is the final result of the full algorithm. For comparison, a boundary delineated by an expert cardiologist is also shown (thin gray line); note that it is very difficult to distinguish the detected contour from the cardiologist's contour. The contour found without the posterior probabilities (but with the possibility to jump) is incorrect in the apex area, because it is only attracted toward local image features while minimizing its length.

[Figure 9. Black line: detected boundary; gray line: physician's contour; white line: shown for comparison.]

6. Conclusion

Automatic contour extraction is certainly a difficult issue, but it will be used more and more extensively, for instance for object-based coding (as in MPEG-4). The proposed algorithm is based on the key idea of introducing high-order information into the usual locally driven active contours. A multi-resolution neural network is used to compute posterior probabilities for predefined classes. These probabilities are first used to define a causality dimension and an image transformation. Then they are used to guide the dynamic programming procedure, both by lowering the cost of going through points reliably classified as borders and by defining the cost of linking two such points; this linking is encoded into a probabilistic state automaton. The first results on left ventricle detection are promising and demonstrate that a fully automatic contour detection can be achieved even in difficult images. We are currently working on reducing the computation time by selecting the most relevant features in the large input vector. Accuracy will also be increased by using more images for the NN training. The next step will be the automatic tracking of the object contour in image sequences.

Acknowledgments

The authors thank Dr. Florence d'Alché-Buc and Prof. Patrick Gallinari, both with the Computer Science Laboratory of Paris 6 University, for useful discussions.

References


[1] A. Amini, T. Weymouth, and R. Jain. Using dynamic programming for solving variational problems in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(9):855–867, Sept. 1990.

[2] W. G. Austin. A reporting system on patients evaluated for coronary artery disease. Circulation, American Heart Association, 51:7–40, 1975.

[3] J. Beier, T. Joerke, S. Lempert, E. Wellnhofer, H. Oswald, and E. Fleck. A comparison of 7 different volumetry methods of left and right ventricle using post-mortem phantoms. In Computers in Cardiology, pages 33–36, 1993.

[4] W. T. Freeman and E. H. Adelson. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9):891–906, Sept. 1991.

[5] D. Geiger, A. Gupta, L. A. Costa, and J. Vlontzos. Dynamic programming for detecting, tracking and matching deformable contours. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(3):294–302, Mar. 1995.

[6] B. R. Groshong, L. A. Spero, and J. T. Cusma. Maximum a posteriori optimization: a method for calculation of dynamic changes in ventricular contours from angiographic image sequences. In Computers in Cardiology, pages 355–358, 1992.

[7] J. A. Maintz, P. A. van den Elsen, and M. A. Viergever. Evaluation of ridge seeking operators for multimodality medical image matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):353–365, Apr. 1996.

[8] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1):43–49, Feb. 1978.

[9] C. Williams, M. Revow, and G. Hinton. Instantiating deformable models with a neural net. Computer Vision and Image Understanding, 68(1):120–126, Oct. 1997.

[10] H. Yamada, C. Merritt, and T. Kasvand. Recognition of kidney glomerulus by dynamic programming matching method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(5):731–737, Sept. 1988.
