Homogeneous Region Merging Approach for Image Segmentation Preserving Semantic Object Contours

Hyun Sang Park and Jong Beom Ra
Dept. of Electrical Engineering, Korea Advanced Institute of Science and Technology
373-1 Kusongdong Yusonggu, Taejon, Korea
E-mail: [email protected], [email protected]

Abstract - Morphological image segmentation is an essential tool for various image analysis tasks. In this paper, we propose two successive region-merging algorithms that mainly aim to represent homogeneous objects simply and clearly. First, we define a marker cluster and extract markers having similar intensity values from each marker cluster. After initial segmentation using the markers, perceptually insignificant small regions are removed, based on some heuristics, to reduce the number of regions. Then, we classify all regions into three classes according to their homogeneity. Finally, gradient-based region merging is performed for all regions except inhomogeneous regions. Experimental results show that the proposed region-merging scheme can represent homogeneous objects with few regions while preserving most of the semantic object shapes. Hence, the proposed method can be a useful tool for semantic object segmentation.

1. Introduction

Many morphological image segmentation methods have been proposed and applied for various image analysis tasks [1-8]. Among these tasks, the goal of region-based image/video coding [2,3,4] is to improve subjective and objective quality at very low bit rates. To achieve a high compression ratio, the coding schemes have to reduce bits for region contour coding, which may occupy up to 45 % of the total bits [4]. So, they limit the number of regions regardless of their semantic contents, and thereby fail to preserve low-contrast object contours that may be perceptually important. In segmentation for other image analysis tasks, low-contrast edges are usually not preserved, because their discrimination is not essential for these tasks [1,5-8].

For semantic image analysis tasks, top-down approaches [2,3,6] have difficulty in defining perceptually important objects. In other words, it is hard to find a good rule to split coarsely segmented regions semantically. Therefore, bottom-up approaches [7,8] composed of region segmentation and merging are preferred because they begin with precise segmentation preserving most of the perceptually important object contents. Thus, we will consider the bottom-up approach for image segmentation in this paper.

One of the semantic image analysis applications is VOP (Video Object Plane) generation in MPEG-4 [9]. A semantic object consisting of several large distinct regions would be easily extractable by properly merging them. In practice, however, a semantic object is often decomposed into a large number of small regions of arbitrary shapes and inconsistent statistical properties. This makes it very difficult to directly extract and compose semantic objects from a video frame.

As a simple example of semantic object extraction, let us consider an image composed of two semantic objects: a human object and a plain background. Such scenes can be found in typical ‘head and shoulder’ image sequences. Here, the goal is to separate the two semantic objects. However, without prior knowledge about objects, i.e., well-formulated semantics, it is very difficult to compose ‘a human object’ from a number of arbitrarily shaped and randomly distributed regions. That is, a semantic object cannot be composed using signal-processing techniques, but has to be treated on the basis of semantics or proper object models. However, if we first compose an object from segmented homogeneous regions of similar characteristics, the object extraction problem can be simplified. In other words, this object can be regarded as the background and the remaining regions as another object, regardless of its semantic contents. This simplified problem implies that homogeneous objects in an image should be represented as simply as possible to make the semantic segmentation problem easier.

The purpose of this paper is to present an image segmentation method that aims to represent homogeneous or simple objects with few regions while preserving low-contrast object contours. We adopt a bottom-up approach that consists of five successive steps as shown in Fig. 1. First, we will describe initial region segmentation in section 2. The proposed region merging method will be described in section 3. Section 4 will be devoted to presenting
simulation results. Finally, conclusions will be given in section 5.

Fig. 1. Overall procedure: original image → morphological image simplification → marker extraction → boundary decision by watershed algorithm → small region merging → homogeneous region merging.

2. Initial region segmentation

Let $f$ be a gray image and $f_s$ be the morphologically simplified image from $f$. Among a number of flat regions in $f_s$, regions brighter or darker than their surroundings are highly noticeable. Morphological gradient operators are very attractive in defining such regions, since the morphological gradient by dilation (or erosion) can locate a zero gradient value inside locally bright (or dark) components. Therefore, to locate those noticeable regions precisely, we propose an operation $G_m(f_s)$ as follows.

$$G_m(f_s) = \min\{G^{+}(f_s),\ G^{-}(f_s)\} = \min\{\delta_1(f_s) - f_s,\ f_s - \varepsilon_1(f_s)\}, \tag{1}$$

where $G^{+}$, $G^{-}$, $\delta_N$ and $\varepsilon_N$ denote morphological gradient by dilation, morphological gradient by erosion, dilation of size N, and erosion of size N, respectively. Then, those regions as well as large flat regions can be discriminated by examining areas with zero values of $G_m$.

Let us define a marker cluster as a connected region with zero values of $G_m(f_s)$. A marker cluster includes various significant flat regions such as large flat regions, brighter or darker flat regions compared to the surroundings regardless of their sizes. It should be noted that they are all very important for representing object contents. Now we can restrict marker extraction within marker clusters without the loss of significant flat regions. Since markers are not separated by using $G_m(f_s)$, they are to be discriminated by the following procedures.

Step 1: Find a marker cluster $M$ in $G_m(f_s)$.
Step 2: Find the representative intensity value $I_p$ that provides the largest accumulated histogram from $I_p - 1$ to $I_p + 1$ in $M$.
Step 3: Find a connected region $R$ composed of the three intensity values $\{I_p - 1, I_p, I_p + 1\}$, and then update $M$ by excluding $R$ from $M$.
Step 4: If $Area(R) \ge h$, $R$ is registered as a marker and go to step 3.
Step 5: If $Area(M) \ge h$, go to step 2.
Step 6: If there remains any unexamined marker cluster, go to step 1.
Step 7: End.

Here, it is preferable that size criterion $h$ is set to the minimum area of preserved components after morphological filtering. Let $\psi_N$ be a morphological filter by reconstruction that employs a square structuring element whose side is $(2N + 1)$. It will remove all components that do not fit into the structuring element in a binary image. Thus, the proper value of $h$ is $(2N + 1) \times (2N + 1)$. However, it does not hold for a gray image. For a gray image composed of only rectangular components, the smallest region that is preserved after morphological filtering is always larger than (the side length of the structuring element) × 1. Then, the size criterion can be defined as

$$h = 2N + 1. \tag{2}$$

However, this criterion is not valid for a gray image having components of arbitrary shapes. In this case, the simplified image may have very small regions including one-pixel regions. Nevertheless, experimental results show that the criterion given in Eq. (2) works well in most cases. Therefore, we use this value as a size criterion.

After marker extraction, initial region segmentation is performed by applying the modified watershed transform on extracted markers [2].

3. Homogeneous region merging

3.1. Ordered small region merging algorithm

Initial region segmentation contains a number of redundant small regions. In this section, we propose an ordered region-merging algorithm to reduce these small regions. Let $R_i$ be a region considered for merging, and $\xi(R_i)$ be a
region set whose elements are the regions neighboring $R_i$ by 4-connectivity [10]. First, valid merging candidates $\xi_M(R_i)$ have to be selected from $\xi(R_i)$. To find appropriate merging candidates $\xi_M(R_i)$, we have observed some tendencies from several simple region-merging experiments [10], i.e.,

- most of the small regions are likely to be merged to larger ones;
- regions of low variances are often merged to regions of higher variances;
- regions near the image boundary may be components of the background or the simple objects.

The first two observations often happen at the same time. Let $S_i$, $\mu_i$, and $\sigma_i^2$ denote the area, statistical mean, and variance of $R_i$, respectively. Then, a region $R_j$ can be a valid merging candidate if either of the following conditions is satisfied, i.e.,

$$\sigma_i \ge \sigma_j,\ S_i \ge S_j \text{ and } |\mu_i - \mu_j| \le DIFF,$$

or

$$\sigma_i < \sigma_j,\ S_i < S_j \text{ and } |\mu_i - \mu_j| \le DIFF,$$

where DIFF is a threshold value to prevent dissimilar regions from being merged. Additionally, if both $R_i$ and $R_j$ are adjacent to the image boundary and $|\mu_i - \mu_j| \le DIFF + 2$, $R_j$ is also taken as a merging candidate. $\xi_M(R_i)$ can be determined in this way.

In region merging, the merging order is an important factor to consider, because merged results can be quite different according to the merging order even with the same merging criterion. In this paper, we regard the priority of a region as its variance value, and the region of a lower priority will be merged first. This ordering can be implemented by a loop operation that begins from one and proceeds to a sufficiently large number. Then, only the regions having a lower standard deviation than the argument in the loop will be considered for merging. Additionally, in order to promote an opportunity for a region to be merged, we expand $\xi_M(R_i)$ by increasing threshold value DIFF as merging proceeds. The proposed region-merging algorithm is as follows.

Step 1: DIFF = 0, VAR = 1.
Step 2: Find a region $R_i$ such that $\sigma_i \le VAR$, $S_i \le S_{MIN}$, and $\xi_M(R_i) \ne \emptyset$. If there is no such region, go to step 3. Otherwise, find a merging pair $(R_i, R_{j^*})$ that provides the smallest value of variance after merging, and then merge them. Repeat step 2.
Step 3: Increase VAR by one. If VAR > VARMAX, go to step 4. Otherwise, go to step 2.
Step 4: Increase DIFF by one. If DIFF > DIFFMAX, it is terminated. Otherwise, go to step 2.

3.2. Homogeneous region merging

The previous stage only removes redundant regions that do not affect object semantics. The main purpose of this paper is to represent homogeneous objects with few regions. Such homogeneous objects may be extractable more easily than other complex objects, since their components have very similar statistical properties to each other. However, low-contrast boundaries between objects may result in merging objects of different semantics. To avoid non-semantic merging, we perform a ternary classification for segmented regions. We determine the class of a region $R_i$, $C(R_i)$, as follows.

$$C(R_i) = \begin{cases} H, & \text{if } \sigma_i^2 \le VAR\_TH, \\ IAH, & \text{if } \exists R_j \in \xi(R_i),\ C(R_j) = H, \\ IH, & \text{otherwise,} \end{cases} \tag{3}$$

where VAR_TH is the variance of the largest region. Here, H, IAH, and IH are abbreviations of a homogeneous region, inhomogeneous region adjoining a homogeneous region, and inhomogeneous region adjoining only inhomogeneous regions, respectively.

After classification, we examine only regions of class H for merging, and regard regions of class H or IAH as valid merging candidates. That is, we examine $R_i$ and $R_j$ for merging such that $C(R_i) = H$ and $R_j \in \xi(R_i)$, $C(R_j) \ne IH$. This restriction is to prevent two regions of different semantic contents from being merged. Here, $\xi_M(R_i)$ is selected by using a gradient-based criterion.

Pixel $p$ is defined as a boundary pixel between $R_i$ and $R_j$, if there are two pixels, $p_i$ and $p_j$, such that $p_i \in R_i$, $p_j \in R_j$, and $p_j \in N_4(p_i)$, where $N_4(p)$ is a set of pixels neighboring $p$ by 4-connectivity. Then, merging candidates $\xi_M(R_i)$ for a given $R_i$ can be determined by considering the weakness of boundary pixels. If $R_j$ satisfies both conditions, $C(R_j) \ne IH$ and at least half of the boundary pixels between the two regions have gradient values less than 2 VAR_TH, $R_j$ is an element of $\xi_M(R_i)$.

Using these merging candidates, we perform the following algorithm until it terminates automatically.

Step 1: Find $R_i$ such that $C(R_i) = H$ and $\xi_M(R_i) \ne \emptyset$.
Step 2: If there is no such region, the merging procedure is terminated. Otherwise, find a merging pair $(R_i, R_{j^*})$ that provides the smallest value of variance after merging, and then merge them.
Step 3: Classify the merged region to H in order to expand this region continuously by merging.
Step 4: Go to step 1.

4. Simulation results

Experimental image segmentation has been performed on the first frames of typical video sequences: “Miss America” and “Mother and daughter.” The open-close by reconstruction filter of size one [2] is used for image simplification. Markers are extracted with h = 3 as described in section 2. Boundary decision is performed by using the modified watershed algorithm [2]. For merging, we select parameters of DIFFMAX = 1, VARMAX = 50, and $S_{MIN}$ = 200. We should mention that DIFFMAX is set to one in order to keep region homogeneity as high as possible.

The first and second rows in Fig. 2 show segmentation results for “Miss America” and “Mother and daughter,” respectively. In each row, the four figures represent an original image, and results of region segmentation, ordered small region merging, and homogeneous region merging, respectively. The number of regions after the final segmentation is 163 for “Miss America” and 256 for “Mother and daughter.” Even though the number of regions is somewhat large, simulation results demonstrate that extracted homogeneous background contours coincide with natural object boundaries very well.

5. Conclusions

In this paper, we adopt a bottom-up image segmentation approach and propose two region-merging algorithms. Experimental results show that the proposed image segmentation method preserves most of the semantic object contours well while clearly representing homogeneous objects with few regions. Hence, the proposed work is highly suitable for initial image segmentation in various image segmentation applications, including semantic image analysis tasks and region-based image coders. Our further work will be focused on a merging method that aims to reduce the number of regions in complex objects as well as homogeneous objects.

References

[1] F. Meyer and S. Beucher, “Morphological segmentation,” Journal of Visual Comm. and Image Representation, vol. 1, no. 1, pp. 21-46, Sep. 1990.
[2] P. Salembier, “Morphological multiscale segmentation for image coding,” Signal Processing, vol. 38, pp. 359-386, 1994.
[3] P. Salembier and M. Pardas, “Hierarchical morphological segmentation for image sequence coding,” IEEE Trans. on Image Process., vol. 3, no. 5, pp. 639-651, Sep. 1994.
[4] D. Wang, et al., “Segmentation-based motion-compensated video coding using morphological filters,” IEEE Trans. on Circ. and Syst. for Video Tech., vol. 7, no. 3, pp. 549-555, June 1997.
[5] D. Cortez, et al., “Image segmentation towards new image representation methods,” Signal Processing: Image Comm., vol. 6, pp. 485-498, 1995.
[6] J.G. Choi, et al., “Spatio-temporal segmentation using a joint similarity measure,” IEEE Trans. on Circ. and Syst. for Video Tech., vol. 7, no. 2, pp. 279-286, Apr. 1997.
[7] K. Haris, et al., “Hybrid image segmentation using watersheds,” Proc. of VCIP, vol. 2727, pp. 1140-1151, 1996.
[8] L. Shafarenko, et al., “Automatic watershed segmentation of randomly textured color images,” IEEE Trans. on Image Process., vol. 6, no. 11, pp. 1530-1544, Nov. 1997.
[9] “MPEG-4 video verification model V.5.0,” ISO/IEC JTC 1/SC 29/WG 11/N1469, Nov. 1996.
[10] A.K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, New Jersey, 1989.
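
As an illustration of the operator $G_m$ of Eq. (1) and of the marker clusters defined in section 2, the following is a minimal pure-Python sketch, not the authors' implementation. The function names (`g_m`, `marker_clusters`) are hypothetical, the image is a plain list of rows, the structuring window is assumed to be clipped at the image border, and 4-connectivity is assumed for the "connected region" of zero values, which the text does not specify.

```python
def _window_reduce(img, n, reduce_fn):
    """Apply reduce_fn (max or min) over a (2n+1)x(2n+1) window, clipped at the border."""
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [img[y][x]
                      for y in range(max(0, r - n), min(rows, r + n + 1))
                      for x in range(max(0, c - n), min(cols, c + n + 1))]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


def g_m(f_s, n=1):
    """G_m = min(dilation - f_s, f_s - erosion), computed pixel by pixel."""
    dil = _window_reduce(f_s, n, max)  # flat dilation of size n
    ero = _window_reduce(f_s, n, min)  # flat erosion of size n
    return [[min(d - v, v - e) for d, v, e in zip(d_row, v_row, e_row)]
            for d_row, v_row, e_row in zip(dil, f_s, ero)]


def marker_clusters(gm):
    """Return 4-connected components of zero-valued pixels as sets of (row, col)."""
    rows, cols = len(gm), len(gm[0])
    seen, clusters = set(), []
    for r in range(rows):
        for c in range(cols):
            if gm[r][c] != 0 or (r, c) in seen:
                continue
            stack, cluster = [(r, c)], set()
            seen.add((r, c))
            while stack:  # iterative flood fill
                y, x = stack.pop()
                cluster.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and gm[ny][nx] == 0 and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            clusters.append(cluster)
    return clusters


# Toy simplified image: two flat zones joined by a one-pixel ramp (value 4).
f_s = [[2, 2, 2, 4, 6, 6, 6] for _ in range(3)]
gm = g_m(f_s)
clusters = marker_clusters(gm)
```

In this toy image only the ramp pixels have both a brighter and a darker neighbor, so they are the only pixels with a nonzero $G_m$, and the two flat zones come out as two separate marker clusters. Two flat zones that touch directly would fall into one cluster, which is why Steps 2 to 4 of section 2 separate markers by intensity.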
Fig. 2. (a) Original images, (b) initial image segmentation results, (c) small region merging results, and (d) the final results by homogeneous region merging.
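
The ordered small region merging of section 3.1 can be sketched on a region adjacency structure as below. This is a rough illustration under stated assumptions rather than the authors' code: the `Region` class and all function names are hypothetical, each region is summarized by its area, intensity sum, and sum of squares, the mean difference is taken as an absolute value, the first eligible region in sorted key order is examined first (the paper does not fix this choice), and Step 4 returns to Step 2 without resetting VAR, following the steps literally.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    n: int                                   # area S_i (pixel count)
    s: float                                 # sum of intensities
    ss: float                                # sum of squared intensities
    on_border: bool = False                  # touches the image boundary
    nbrs: set = field(default_factory=set)   # ids of 4-connected neighbors

    @property
    def mean(self):
        return self.s / self.n

    @property
    def std(self):
        return max(self.ss / self.n - self.mean ** 2, 0.0) ** 0.5


def is_candidate(ri, rj, diff):
    """Candidate test of section 3.1 (absolute mean difference assumed)."""
    d = abs(ri.mean - rj.mean)
    same_order = ((ri.std >= rj.std and ri.n >= rj.n)
                  or (ri.std < rj.std and ri.n < rj.n))
    if same_order and d <= diff:
        return True
    return ri.on_border and rj.on_border and d <= diff + 2


def merged_var(a, b):
    """Variance of the union of two regions, from their summed statistics."""
    n, s, ss = a.n + b.n, a.s + b.s, a.ss + b.ss
    return max(ss / n - (s / n) ** 2, 0.0)


def merge_into(regions, i, j):
    """Merge region i into region j and rewire the adjacency sets."""
    ri, rj = regions.pop(i), regions[j]
    rj.n, rj.s, rj.ss = rj.n + ri.n, rj.s + ri.s, rj.ss + ri.ss
    rj.on_border = rj.on_border or ri.on_border
    for k in ri.nbrs - {j}:
        regions[k].nbrs.discard(i)
        regions[k].nbrs.add(j)
        rj.nbrs.add(k)
    rj.nbrs.discard(i)


def merge_one(regions, var, diff, s_min):
    """Step 2: merge the first eligible region with its best candidate, if any."""
    for i in sorted(regions):
        ri = regions[i]
        if ri.std > var or ri.n > s_min:
            continue
        cands = [j for j in ri.nbrs if is_candidate(ri, regions[j], diff)]
        if cands:
            best = min(cands, key=lambda j: merged_var(ri, regions[j]))
            merge_into(regions, i, best)
            return True
    return False


def ordered_small_region_merging(regions, s_min, var_max, diff_max):
    diff, var = 0, 1                                  # Step 1
    while True:
        while merge_one(regions, var, diff, s_min):   # Step 2, repeated
            pass
        var += 1                                      # Step 3
        if var <= var_max:
            continue
        diff += 1                                     # Step 4
        if diff > diff_max:
            return regions


def make(n, mean, var, **kw):
    """Build a Region from its area, mean, and variance (toy helper)."""
    return Region(n, n * mean, n * (var + mean ** 2), **kw)


regions = {
    "A": make(1000, 100.0, 4.0, on_border=True, nbrs={"B", "C", "D"}),
    "B": make(10, 100.0, 0.0, nbrs={"A"}),
    "C": make(20, 101.0, 0.0, nbrs={"A"}),
    "D": make(15, 150.0, 0.0, nbrs={"A"}),
}
result = ordered_small_region_merging(regions, s_min=200, var_max=3, diff_max=1)
```

In this toy configuration, region B merges into A while DIFF = 0, region C follows once DIFF is raised to 1, and the dissimilar region D is left untouched, which is the intended effect of the DIFF threshold.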