Digital Video Segmentation Using Level Set Theory

Proceedings of the Irish Signals and Systems Conf., Maynooth, June 2001

Patrick Kehoe, Prag Sharma, Richard B. Reilly
Department of Electronic & Electrical Engineering, University College Dublin, Belfield, Dublin 4, Ireland

Abstract

Emerging video coding standards such as MPEG-4 depart radically from previous video coding standards by emphasising content-based functionality and interactivity. Instead of representing video sequences as rectangular arrays of pixels, MPEG-4 represents each scene as a composition of video object planes (VOPs), with each VOP corresponding to one semantically meaningful object. Inspired by recent work on level sets, which give excellent results for image segmentation, this paper looks at applying them to video segmentation. An overview of level set theory is given, and it is shown how level sets can be applied to image and video segmentation. Finally, various methods for reducing the complexity of the proposed segmentation techniques are discussed.

1. INTRODUCTION

The emergence of the MPEG-4 standard has brought video segmentation and contour tracking to the forefront of research [1,2]. In MPEG-4, video sequences are decomposed into Video Object Planes (VOPs), which are coded separately into the video stream (Fig. 1). The use of VOPs improves coding efficiency and flexibility. It permits, for example, variable frame rate coding, with the more perceptually important objects transmitted at higher rates. Separating the video content into objects also gives flexibility in reconstruction by allowing selective decoding based on available bandwidth, which is very attractive for applications such as video telephony. The segmentation methods themselves are not part of the MPEG-4 standard; implementers are left to develop efficient strategies for specific applications.

Figure 1: Segmentation of a video frame into a set of VOPs and their recombination.

To date, segmentation of images and video has been addressed using a variety of approaches. Low-level segmentation techniques model the image sequence simply as a statistical distribution of pixel values in space and time, and employ low-level features such as colour, optical flow, gradient, texture and motion to segment images. Several researchers have developed segmentation methods based on these low-level features [3]. Integrating spatial and motion information has led to more meaningful segmented elements [4]. High-level segmentation, on the other hand, adopts a more intelligent understanding of the video scene. This information can range from the simple rules that govern all physical scenes up to complete a priori knowledge of the number and type of objects to be extracted. By including knowledge about the rules that govern the content of a real-world scene as viewed by the camera, more robust segmentation procedures can be formed. However, relying on such a priori knowledge of the scene increases computation and reduces flexibility. Level sets have been applied to numerous problems in physics, medical imaging, image processing and computer vision. They give excellent results in image segmentation and so are useful for video [5].

2. THE LEVEL SET APPROACH

The Level Set approach is based on the ideas developed by Osher and Sethian [6] for modelling propagating interfaces with curvature-dependent speeds. The interface (front) is a closed, non-intersecting hypersurface flowing along its gradient field with a speed that depends on the curvature. Making the speed depend on the curvature can be shown to stabilise the propagating interface and prevent it from crossing over itself. The interface is moved in time by solving a Hamilton-Jacobi type equation for a function of which the interface is a particular level set:

$$\Phi_t + F\,|\nabla\Phi| = 0 \tag{1}$$

The central idea of the level set approach is to represent the front as the zero level set of a higher-dimensional function Φ. Moving the problem to one higher dimension gives the method great generality and makes it possible to build highly accurate schemes to approximate the equations of motion. In such a setting, topological changes such as the merging or splitting of fronts are handled naturally, and the technique extends directly to three dimensions. The way the propagating front moves over the image depends on the speed term F(κ), where κ represents the curvature. In general, the speed function can be split into two components:

$$F = F_a + F_g(\kappa) \tag{2}$$

The first term, F_a, is the advection term. F_a is a constant, independent of the geometry of the moving front. The front uniformly expands or contracts depending on the sign of F_a. The second term, F_g(κ), depends on the geometry of the front through its local curvature κ. It is usually set equal to −εκ, where ε is a small positive constant. This diffusion term smooths out the high-curvature regions of the front. It has a regularising effect on the front and is necessary for stability, allowing the entropy condition to hold [6]. The Level Set method presented above is relatively straightforward to program. However, it is not particularly fast, nor does it make efficient use of computational resources. The next section looks at a variety of modifications that speed up the algorithm.
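To make equations (1) and (2) concrete, the following is a minimal sketch of one explicit time step on a small grid. It is not the implementation used in this paper: the function names, the unit grid spacing, the first-order upwind difference for the advection term and the central differences for the curvature term are all assumptions chosen for illustration, and boundary cells are simply left unchanged.

```python
import math

def curvature(phi, i, j, h=1.0):
    """Curvature of the level curve of phi through cell (i, j), by central differences."""
    px = (phi[i][j + 1] - phi[i][j - 1]) / (2 * h)
    py = (phi[i + 1][j] - phi[i - 1][j]) / (2 * h)
    pxx = (phi[i][j + 1] - 2 * phi[i][j] + phi[i][j - 1]) / h ** 2
    pyy = (phi[i + 1][j] - 2 * phi[i][j] + phi[i - 1][j]) / h ** 2
    pxy = (phi[i + 1][j + 1] - phi[i + 1][j - 1]
           - phi[i - 1][j + 1] + phi[i - 1][j - 1]) / (4 * h ** 2)
    denom = (px * px + py * py) ** 1.5
    if denom < 1e-12:          # flat spot: curvature undefined, treat as zero
        return 0.0
    return (pxx * py * py - 2 * px * py * pxy + pyy * px * px) / denom

def level_set_step(phi, f_a, eps, dt, h=1.0):
    """One explicit step of phi_t + F |grad phi| = 0 with F = f_a - eps * kappa.

    Hypothetical helper: phi is a list of rows, negative inside the front.
    """
    n, m = len(phi), len(phi[0])
    new = [row[:] for row in phi]            # boundary cells are kept fixed
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            dxm = (phi[i][j] - phi[i][j - 1]) / h   # backward difference in x
            dxp = (phi[i][j + 1] - phi[i][j]) / h   # forward difference in x
            dym = (phi[i][j] - phi[i - 1][j]) / h
            dyp = (phi[i + 1][j] - phi[i][j]) / h
            # Upwind gradient magnitude for the constant advection term f_a.
            if f_a >= 0:
                grad_up = math.sqrt(max(dxm, 0) ** 2 + min(dxp, 0) ** 2
                                    + max(dym, 0) ** 2 + min(dyp, 0) ** 2)
            else:
                grad_up = math.sqrt(min(dxm, 0) ** 2 + max(dxp, 0) ** 2
                                    + min(dym, 0) ** 2 + max(dyp, 0) ** 2)
            # Central gradient magnitude for the curvature (diffusion) term.
            grad_c = math.hypot((phi[i][j + 1] - phi[i][j - 1]) / (2 * h),
                                (phi[i + 1][j] - phi[i - 1][j]) / (2 * h))
            kappa = curvature(phi, i, j, h)
            new[i][j] = phi[i][j] - dt * (f_a * grad_up - eps * kappa * grad_c)
    return new
```

With Φ initialised as the signed distance to a circle (negative inside), a positive F_a makes cells on the front become negative, so the front expands, while the curvature term alone makes them positive, so the circle shrinks.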

3. REDUCTIONS IN COMPLEXITY

3.1 Narrow Band Method

The Level Set approach leads to excellent segmentation results [5] but is computationally very expensive. Performing calculations over an n×n image requires O(n²) operations per time step. An efficient modification is the Narrow Band approach, in which computation is performed only within a close neighbourhood of the zero Level Set. Only pixels within a certain distance d of the zero Level Set are updated at each time step, reducing the complexity from O(n²) to O(kn), where k is the number of cells across the width of the narrow band.
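The band selection can be sketched as follows. This is an illustrative helper only: the name `narrow_band` is hypothetical, and it assumes Φ approximates a signed distance function so that |Φ| ≤ d picks out the cells within distance d of the front.

```python
def narrow_band(phi, d):
    """Return the (row, col) cells whose |phi| is at most d.

    Assumes phi approximates a signed distance to the front, so these are
    the cells within distance d of the zero level set.
    """
    return [(i, j)
            for i, row in enumerate(phi)
            for j, value in enumerate(row)
            if abs(value) <= d]

# Example: a circle of radius 30 on a 101 x 101 grid with a band half-width of 3.
n = 101
phi = [[((i - 50) ** 2 + (j - 50) ** 2) ** 0.5 - 30.0 for j in range(n)]
       for i in range(n)]
band = narrow_band(phi, 3.0)
print(len(band), "of", n * n, "cells need updating")
```

Building the band this way still scans the whole grid once, so the saving comes from reusing the same band over many time steps and rebuilding it only when the front approaches the band edge.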

3.2 Fast Marching Method

An even greater reduction in computational cost can be achieved by converting the Level Set approach to a stationary formulation, i.e. by using the Fast Marching Method. The Fast Marching Method solves the general static Hamilton-Jacobi equation, which applies in the case of a convex, non-negative speed function. Starting from an initial position for the front, the method systematically marches the front outwards one grid point at a time, relying on entropy-satisfying schemes to produce the correct viscosity solution. The main idea is to exploit a fast heapsort technique to locate the proper grid point to update next, so that one need never backtrack over previously evaluated grid points. The resulting technique obtains the arrival time of the front as it propagates through the image at a much lower computational cost.

Consider the special case of a front moving with speed F = F(x,y), F > 0. The front, which advances monotonically, is governed by the Level Set equation (1). Imagine the two-dimensional case in which the interface is a propagating curve, and suppose the evolving zero level set is graphed above the x-y plane. That is, let T(x,y) be the time at which the curve crosses the point (x,y). The surface T(x,y) then satisfies the equation

$$|\nabla T|\,F = 1 \tag{3}$$

This is a form of the well-known Eikonal equation, which converts the problem to a stationary formulation, because the front crosses each grid point only once.
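The marching procedure can be sketched with a binary heap from the Python standard library. This is an illustrative first-order version under stated assumptions, not the authors' code: the name `fast_marching`, the list-of-rows grid layout, the unit grid spacing and the use of "lazy deletion" of stale heap entries (instead of a heap with decrease-key) are all choices made for this sketch.

```python
import heapq
import math

def fast_marching(speed, seeds, h=1.0):
    """Arrival times T solving |grad T| * F = 1 with a first-order upwind scheme.

    speed: grid (list of rows) of positive speeds F; seeds: cells where T = 0.
    """
    n, m = len(speed), len(speed[0])
    T = [[math.inf] * m for _ in range(n)]
    accepted = [[False] * m for _ in range(n)]
    heap = []
    for i, j in seeds:
        T[i][j] = 0.0
        heapq.heappush(heap, (0.0, i, j))

    def known(i, j):
        # Only accepted (final) values take part in an update.
        if 0 <= i < n and 0 <= j < m and accepted[i][j]:
            return T[i][j]
        return math.inf

    def solve(i, j):
        tx = min(known(i, j - 1), known(i, j + 1))
        ty = min(known(i - 1, j), known(i + 1, j))
        a, b = min(tx, ty), max(tx, ty)
        f = h / speed[i][j]
        if b - a >= f:                 # only one axis contributes
            return a + f
        # Both axes contribute: solve (T - a)^2 + (T - b)^2 = f^2.
        return 0.5 * (a + b + math.sqrt(2 * f * f - (b - a) ** 2))

    while heap:
        _, i, j = heapq.heappop(heap)
        if accepted[i][j]:             # stale entry, already finalised
            continue
        accepted[i][j] = True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < m and not accepted[ni][nj]:
                t_new = solve(ni, nj)
                if t_new < T[ni][nj]:
                    T[ni][nj] = t_new
                    heapq.heappush(heap, (t_new, ni, nj))
    return T
```

Each grid point is accepted exactly once, in order of increasing arrival time. With unit speed and a single seed, T along the grid axes equals the distance from the seed, while diagonal values are somewhat overestimated, as expected from a first-order scheme.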

3.3 Parallelism

The Level Set algorithm comes close to being “embarrassingly parallel”. At each time step, each pixel of the n×n image is updated from the neighbouring pixels within a 3×3 bounding box (assuming a first-order scheme for equation (1)). On a parallel processor, the image could therefore be broken into k×k regions (k depending on the parallelism of the processor) and the Level Set values for these regions calculated simultaneously. Each region then holds only (n/k)² pixels, so the time per step falls from O(n²) to O(n²/k²). In theory, if the full grid were held in memory on a parallel processor with one processing element per grid point, the time per time step would fall from O(n²) to O(1).
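The decomposition can be illustrated by computing the tile boundaries. This sketch only partitions the grid; it does not dispatch work to processors. The names `tile_bounds` and `halo_bounds` are hypothetical, and the one-pixel halo reflects the assumption of a 3×3 stencil, which means each tile must read one row and column of its neighbours' values.

```python
def tile_bounds(n, k):
    """Split an n x n grid into k x k tiles.

    Returns half-open bounds (r0, r1, c0, c1) for each tile, row by row.
    """
    edges = [(t * n) // k for t in range(k + 1)]
    return [(edges[a], edges[a + 1], edges[b], edges[b + 1])
            for a in range(k) for b in range(k)]

def halo_bounds(tile, n):
    """Grow a tile by one pixel on each side, clamped to the grid.

    These are the cells a tile must read to update its own pixels with a
    3 x 3 stencil.
    """
    r0, r1, c0, c1 = tile
    return (max(r0 - 1, 0), min(r1 + 1, n), max(c0 - 1, 0), min(c1 + 1, n))

tiles = tile_bounds(10, 3)
print(tiles[0], halo_bounds(tiles[0], 10))
```

Each tile holds roughly (n/k)² pixels, and the tiles are independent within a time step once the halo values from the previous step are available.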

3.4 Sub-Sampling

Another simple yet effective way to reduce the complexity of the algorithm is to sub-sample the original n×n image by a factor k. The image is reduced to size n/k × n/k, so the number of pixels, and with it the cost per time step, falls by a factor of k². The Level Set method is applied to the sub-sampled image, and the resulting boundary is scaled back up onto the full n×n image. The scaled-up zero Level Set then lies very close to the boundary of the object in the n×n image (within about k pixels). A Narrow Band Level Set Method can then be applied to adapt it to the nearby boundary.
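A minimal sketch of the two grid operations involved is given here. The helper names are hypothetical, and plain decimation (keeping every k-th pixel with no smoothing filter) is an assumption, since the sub-sampling filter is not specified.

```python
def subsample(image, k):
    """Keep every k-th pixel in each direction (plain decimation, no filtering)."""
    return [row[::k] for row in image[::k]]

def upscale_contour(points, k):
    """Map (row, col) contour points from the sub-sampled grid to the full grid."""
    return [(i * k, j * k) for i, j in points]

# Example with a 288 x 360 frame and k = 4.
frame = [[0] * 360 for _ in range(288)]
small = subsample(frame, 4)
ratio = (len(frame) * len(frame[0])) // (len(small) * len(small[0]))
print(len(small), len(small[0]), ratio)
```

For a 288×360 frame and k = 4, the sub-sampled grid is 72×90, which is 1/16 of the original pixel count. Each mapped contour point can be off by up to about k pixels, which is why a narrow band refinement on the full-size image follows.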

4. APPLICATIONS TO VIDEO SEGMENTATION

Video processing involves time-varying images stored as rectangular arrays. The main challenge in video segmentation is to identify quickly the object to be segmented in the image and then to track it successfully over the entire sequence of images. Most of the computational cost arises from identifying the object in the first image (frame), since there is little change from frame to frame (assuming a slow-moving object and a frame rate above 30 fps). A variety of parameters can be used to segment an image or a sequence of images. Colour, gradient, motion vectors and texture can all help to identify objects in image sequences and to generate a stopping criterion. In our approach, the gradient information present in the image was used to adapt the evolving Level Set to the object surface.

Edge boundaries are useful image features. By detecting areas of high gradient in an image, a stopping criterion for the propagating Level Set can be determined. What is required is a formulation of the level set propagation in which the zero Level Set settles on the edges. To this end, the speed function F from equation (2) is multiplied by a quantity g:

$$g(x, y) = \frac{1}{1 + |\nabla I(x, y)|} \tag{4}$$

Here |∇I(x, y)| is the magnitude of the gradient of the image I(x, y). The function g(x, y) is close to unity away from boundaries and drops towards zero where the image intensity changes sharply, i.e. where the gradient magnitude is large. Multiplying the speed function by g therefore reduces the interface speed to nearly zero near the edges of an object.
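Equation (4) can be evaluated on a discrete image as in the sketch below. The function name `edge_stopping`, the central-difference gradient and the choice to set g = 1 on the image border are assumptions made for illustration; any pre-smoothing of the image is omitted.

```python
import math

def edge_stopping(image):
    """g = 1 / (1 + |grad I|) using central differences; border cells get g = 1."""
    n, m = len(image), len(image[0])
    g = [[1.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            ix = (image[i][j + 1] - image[i][j - 1]) / 2.0
            iy = (image[i + 1][j] - image[i - 1][j]) / 2.0
            g[i][j] = 1.0 / (1.0 + math.hypot(ix, iy))
    return g

# Example: a vertical step edge between columns 3 and 4 of an 8 x 8 image.
step = [[0.0] * 4 + [100.0] * 4 for _ in range(8)]
g = edge_stopping(step)
print(g[3][1], g[3][4])
```

In the flat region g stays at 1, so the front moves at its full speed F, while next to the step edge g falls to 1/51 and the front nearly stops.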

Figure 2: Sub-sampled segmentation of the Miss America frame. The original image (288×360 pixels) is down-sampled by 4 (72×90 pixels), the Level Set is iterated to find the face boundary using the gradient-based image speed, and the result is up-sampled by 4 onto the original image.

5. RESULTS

Sub-sampling was used in conjunction with the Narrow Band Level Set Method to segment the first frame of the “Miss America” sequence. The test frame (288×360) was down-sampled by a factor of 4, leaving 1/16 of the original number of pixels. This decreases the computation for the level set considerably (by a factor of 16). The Narrow Band Level Set Method was then applied to the sub-sampled image, with the speed function multiplied by the image speed term g(x,y) of equation (4). The resulting zero Level Set was then scaled up to the original image size. Sub-sampling in conjunction with the Narrow Band Method reduces computational complexity by a factor of approximately 100.

6. DISCUSSION & CONCLUSION

The level set approach provides a useful tool for image and video processing applications. While computationally intensive, it yields good results, and a variety of methods can be applied to reduce its complexity. In this paper, sub-sampling is suggested as a way to reduce the complexity of the problem considerably. When it is combined with the Narrow Band Method the reduction is considerable, showing that there may be a case for the use of Level Sets in video segmentation. For further research, it would be interesting to apply the Fast Marching Method to the intermediate stages where the image is sub-sampled. Future work will concentrate on using the Fast Marching Method to locate the initial object in the first frame of a video sequence and then switching to the Narrow Band Level Set Method to track the object from frame to frame. How the Level Set is adapted from frame to frame is also an important problem. Possibilities include using colour information and motion estimation information in conjunction with the gradient to improve performance. Finally, parallel processing can be applied to the level set algorithms to achieve further gains.

REFERENCES

1. F. Pereira, “MPEG-4: Why, what, how and when?”, Signal Processing: Image Communication, Vol. 15, 2000, pp. 271-279.
2. T. Sikora, “The MPEG-4 Video Standard Verification Model”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 7, No. 1, Feb. 1997, pp. 19-31.
3. P. Campadelli, D. Medici, R. Schettini, “Colour image segmentation using Hopfield networks”, Image and Vision Computing, Vol. 15, 1997, pp. 161-166.
4. J. Benois-Pineau, F. Morier, D. Barba, H. Sanson, “Hierarchical Segmentation of Video Sequences for Content Manipulation and Adaptive Coding”, Signal Processing, Special Issue on Video Sequence Segmentation, Vol. 66(2), 1998.
5. S. Osher and J.A. Sethian, “Fronts Propagating with Curvature-Dependent Speed: Algorithms Based on Hamilton-Jacobi Formulations”, Journal of Computational Physics, Vol. 79, 1988, pp. 12-49.
6. J.A. Sethian, “Level Set Methods”, Cambridge University Press, 1996.
7. P. Harper, “Level Set Methods for Video and Image Segmentation”, M.Eng.Sc. Thesis, Dept. of Electronic Engineering, UCD, 2000.