Pyramid segmentation parameters estimation based on image total variation

Andrej Košir, Jurij F. Tasič
Faculty of Electrical Engineering, Tržaška 25, Ljubljana, Slovenia
email: [email protected]

Abstract— In this paper, a procedure for estimating the input parameters (thresholds) of the pyramid segmentation algorithm, based on image total variation, is proposed. Image segmentation is a crucial part of low and high level digital image analysis. Among other algorithms, the pyramid segmentation algorithm depends on input parameters that have to be provided as a priori known input data. In the case when a single image is segmented, those parameters can be determined interactively. In our work, a database of images had to be segmented under given time constraints, which requires an automatic estimation of the segmentation input parameters. In order to achieve this, the total variation of a digital image is defined and estimation formulas based on it are derived. The proposed parameter estimation formulas are experimentally evaluated.

Keywords— Image processing, segmentation, pyramid, image total variation

I. INTRODUCTION

Image segmentation is a process of partitioning an image into a group of homogeneous regions according to certain (local) image characteristics. Algorithmically, a label is assigned to each image pixel during the segmentation process to indicate to which of the segmented regions it belongs. The output set of labels defines the set of extracted regions. Due to the specific requirements that image segmentation results should meet, certain properties of the segmented regions are required (connectedness, edge properties, etc.).

The need for automatic tuning of an image segmentation algorithm arose as a part of image scene analysis and initiated this work. A large number of images of traffic scenes had to be processed, see [8], and image segmentation is one of the preprocessing techniques. Therefore, interactive determination of the segmentation parameters had to be replaced by automatic parameter estimation. Many segmentation techniques were taken into account to select the most appropriate one, see Section II. Considering preliminary results on a test image database, space and time complexity, and stability, the pyramid segmentation proposed in [7] was chosen. This technique requires two thresholds to be provided as a priori information. Estimation formulas for the pyramid segmentation thresholds are given later in this paper.

The a priori estimation of the pyramid segmentation parameters proposed in this paper is based on the introduced image total variation. Preliminary results showed that statistical moments do not provide the information necessary for efficient parameter estimation.

II. SEGMENTATION TECHNIQUES

The following segmentation techniques were tested in the preliminary phase of our work. The simplest technique is optimized amplitude thresholding, see [10]. It is applicable when there is a clear separation between the pixel values of different components. Edge based methods try to find the edges of connected regions and then fill those regions to obtain homogeneous components, see [14]. A segmentation technique based on a priori known templates is applicable when the shape of the segmented region is known; its high computational complexity prevents efficient use, see [1]. Region growing is a recursive process carried out until there are no neighboring pixels that should belong to the same region, see [7].

Clustering techniques have been utilized in image segmentation tasks in different ways. Each of the specific clustering methods suffers from certain difficulties which eliminate it from the set of methods adequate for the segmentation of a large number of images. Among others, K-means clustering is a technique where the clustering proceeds by computing the affinity of each entity with each of the K clusters that already exist. The entity is assigned to the cluster to which it is closest in the measurement space. The means are updated and the process is repeated for the next entity. The entities are cycled through until no entities are moved between clusters, see [10].

The idea of texture segmentation is to associate with each pixel not just a single pixel value but a vector of texture measurements. The underlying assumption in texture segmentation is that the different regions resulting from the segmentation possess different textural attributes. Statistical methods are usually a part of texture segmentation, see [10].

Image segmentation based on vector field analysis provides impressive results, see [13]. Unfortunately, its computational complexity is too high.

A. Multiscale segmentations

Digital image multiscale processing allows for supervised selection and deletion of details. This is the main reason why multiscale segmentation provides good results in a reasonable number of operations; for reference see [9]. Typically, multiscale segmentation is performed in one of the following ways. In the first approach, image pyramids are computed by applying certain filters in order to obtain the image at different scales (a minimal sketch of this idea is given below). Each of these images (replicas of the same image at different scales) is then segmented and the results are combined to obtain the final segmentation. In the second approach, a quad-tree decomposition is performed: in the initial step the entire image is assumed to be a single region; iteratively, regions are then split into smaller regions and similar adjacent regions are merged into a single region. Modelling of pyramid segmentation by a Markov process is given in [2]. We were unable to utilize this approach for the guided segmentation required by the scene analysis, see [8].
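The following sketch illustrates the image-pyramid idea described above. It is our own minimal example (plain 2×2 block averaging on a grey level image), not the filtering used in the cited works.

```python
import numpy as np

def build_pyramid(image: np.ndarray, levels: int) -> list[np.ndarray]:
    """Return [i, S_1 i, ..., S_levels i]: the image repeatedly halved in resolution."""
    pyramid = [np.asarray(image, dtype=np.float64)]
    for _ in range(levels):
        img = pyramid[-1]
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w]                       # crop to an even size
        # average each non-overlapping 2x2 block -> half resolution
        pyramid.append(img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyramid
```

Each level of such a pyramid can be segmented separately; in this paper only the coarsest level, corresponding to the downsampling operator S_l introduced later, is of interest.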

III. FORMULATION OF SEGMENTATION

A digital image i of resolution n^2 can be considered as an outcome of a random variable (the experiment being the application of the sensors taking the picture) I : G → E_n^2 × IR^s, where E_n = {1, . . . , n} and s = 1 (gray level image) or s = 3 (color image). Clearly, i ∈ E_n^2 × IR^s, that is, it is a function i : E_n^2 → IR^s. Pixel locations are denoted by (x, y) ∈ E_n^2 and pixel values by i(x, y) ∈ IR^s. An image region is a subset R ⊆ E_n^2. A segmentation of a digital image i is a decomposition {R_k}_k of the set E_n^2 such that

E_n^2 = ∪_{k=1}^{M} R_k,    R_k ∩ R_l = ∅ for k ≠ l.

Clearly, a trivial segmentation composed of one-pixel regions R_k = {(x_k, y_k)} can be assigned to every image. The number of segmented regions is denoted by M. The common pixel value of a homogeneous region R of the image i is denoted by i(R), and its area (the number of pixels in R) by m(R).

IV. IMAGE TOTAL VARIATION

During the preliminary study it turned out that spatial and statistical moments do not provide the information necessary to control segmentation results. To extract such information, a new numeric quantity assigned to a digital image, called the image total variation, was introduced.

The motivation for the definition is the total variation of a real function f : [a, b] → IR, given by V(f, [a, b]) = sup_D Σ_i |f(x_{i-1}) − f(x_i)|, where D = (a = x_0 < x_1 < · · · < x_n = b) is a partition of the interval [a, b]; for reference see [5]. The neighborhood of the point x_i in the partition D is {x_{i-1}, x_{i+1}} and is determined by the order of the set of real numbers IR. Since no natural ordering is introduced on the two-dimensional lattice ZZ^2, a neighborhood of the pixel (x, y) ∈ ZZ^2 is not induced by an order relation. It can be defined in many different ways; we chose the neighborhood depicted in Figure 1, see reference [4]. It is denoted by N_v(x, y), where v is the order of the neighborhood. The labels of the neighborhood pixels of the pixel p = (x, y) represent the lowest order of neighborhood they belong to, i.e., label 2 implies that the corresponding pixel belongs to N_2(x, y) and to all higher-order neighborhoods, but not to N_1(x, y). Clearly, for u < v we have N_u(x, y) ⊂ N_v(x, y). Neighborhoods of order up to v = 5 are used; explicitly, N_0(x, y) = {}, N_1(x, y) = {(x + 1, y), (x, y + 1), (x − 1, y), (x, y − 1)}, etc., see Figure 1.

        5   4   3   4   5
        4   2   1   2   4
        3   1   p   1   3
        4   2   1   2   4
        5   4   3   4   5

Fig. 1. The neighborhoods N_v(x, y) of the pixel p; each label gives the order of the neighborhood in which that pixel first appears.
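The neighborhood orders shown in Figure 1 can also be generated programmatically. The sketch below is our own reading of the figure (an assumption, not code from the paper): offsets inside the 5×5 window are ranked by their squared Euclidean distance from p, and the rank gives the neighborhood order.

```python
from itertools import product

def neighborhood_offsets(v: int) -> list[tuple[int, int]]:
    """Offsets (dx, dy) that belong to N_v, for 0 <= v <= 5 (N_0 is empty)."""
    window = [(dx, dy) for dx, dy in product(range(-2, 3), repeat=2) if (dx, dy) != (0, 0)]
    distances = sorted({dx * dx + dy * dy for dx, dy in window})   # [1, 2, 4, 5, 8]
    order = {d: k + 1 for k, d in enumerate(distances)}            # squared distance -> label
    return [(dx, dy) for dx, dy in window if order[dx * dx + dy * dy] <= v]

def neighborhood(x: int, y: int, v: int) -> list[tuple[int, int]]:
    """Pixels of N_v(x, y); clipping to the image domain is left to the caller."""
    return [(x + dx, y + dy) for dx, dy in neighborhood_offsets(v)]
```

For v = 1 this reproduces the four-neighborhood N_1(x, y) listed above, and for v = 5 it covers the whole 5 × 5 window.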

According to the motivation given above, the total variation of order v of an image region R ⊆ E_n^2 of a grey level image i : E_n^2 → IR is defined as

V_v(i, R) = (1/m(R)) Σ_{(x,y)∈R} max_{(x',y')∈N_v(x,y)} |i(x, y) − i(x', y')|.    (1)

The total variation of a color image region R = (R_R, R_G, R_B) is given by V_v(i, R) = 0.3 V_v(i_R, R_R) + 0.59 V_v(i_G, R_G) + 0.11 V_v(i_B, R_B). The total variation of the whole image i is V_v(i, E_n^2), shortened to V_v(i). The order index v is usually known and is therefore omitted from the notation; V_v(i, R) is simplified to V(i, R).

Clearly, for any image i it holds that V(i) ≥ 0, and V(i) = 0 if and only if i = const. For disjoint image regions R_1 and R_2 we have

V(i, R_1 ∪ R_2) = (1/m(R_1 ∪ R_2)) (m(R_1) V(i, R_1) + m(R_2) V(i, R_2)).    (2)

Besides that, for any real constant α ∈ IR, V(αi, R) = |α| V(i, R). It is also clear that the image total variation V(i, R) is translation and rotation invariant.
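A minimal sketch of definition (1), written for clarity rather than speed; it reuses the neighborhood_offsets() helper sketched after Figure 1 and reflects our reading of the definition, not the authors' implementation.

```python
import numpy as np

def total_variation(image: np.ndarray, v: int = 1) -> float:
    """V_v(i): mean over all pixels of the maximal absolute difference to the N_v neighbors."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    offsets = neighborhood_offsets(v)      # helper sketched after Figure 1
    acc = 0.0
    for x in range(h):
        for y in range(w):
            diffs = [abs(img[x, y] - img[x + dx, y + dy])
                     for dx, dy in offsets
                     if 0 <= x + dx < h and 0 <= y + dy < w]
            acc += max(diffs) if diffs else 0.0
    return acc / (h * w)

def total_variation_color(rgb: np.ndarray, v: int = 1) -> float:
    """Weighted sum 0.3 R + 0.59 G + 0.11 B of the per-channel total variations."""
    return sum(wgt * total_variation(rgb[..., c], v)
               for c, wgt in enumerate((0.3, 0.59, 0.11)))
```

Restricting the outer sum to the pixels of a region R and dividing by m(R) gives V(i, R); the additivity property (2) for disjoint regions then follows directly from the definition.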

V. PYRAMID SEGMENTATION

In order to avoid ambiguity, the image segmentation parameter estimation presented in this paper is limited to the pyramid segmentation proposed by Burt in [3]. We believe that no loss of generality of our results is caused by this choice. For testing purposes, an excellent implementation of Burt's pyramid segmentation technique provided by the Open Source Computer Vision Library was used, see the web page [6]. The pyramid segmentation algorithm is illustrated in Figure 2, which shows a pixel (x, y) at level l together with its candidate father at level l + 1 and its son (2x, 2y) at level l − 1.

Fig. 2. Pyramid segmentation

There are two parameters that must be known a priori to apply the Intel implementation of pyramid segmentation, the thresholds T1 and T2; see the library manual available on the web page [6]. Their role is the following. A link between the pixel (x, y) at level l and a candidate father pixel (x', y') on the adjacent level l + 1 is established if the distance between their local image properties satisfies the inequality d(c(x, y, l), c(x', y', l + 1)) < T1. On the other hand, any two pixels (x1, y1) and (x2, y2) at the same level l are grouped into the same cluster if d(c(x1, y1, l), c(x2, y2, l)) < T1. The distance measure d is the Euclidean metric on IR or IR^3. The local image property c is usually the pixel gray level or the pixel color; other properties can be assigned as well. Therefore, the threshold T1 is a lower bound on the difference between the pixel values of any two segmented regions R_k ≠ R_s,

d(i(R_k), i(R_s)) ≥ T1.    (3)

The threshold T2 is the maximal value of the difference between pixel values inside each segmented region at two adjacent pyramid levels. As a consequence, for a pyramid with L levels (0 ≤ l ≤ L), the upper bound on the pixel value differences inside each of the segmented regions is given by

max d(i(x1, y1), i(x2, y2)) ≤ L T2,    (4)

where (x1, y1), (x2, y2) ∈ R_k, for any R_k. Intuitively, increasing the threshold T1 results in a decreased number of segmented regions M, and decreasing the threshold T2 results in an increased number of segmented regions M.
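The following toy check (our own illustration, unrelated to the OpenCV implementation) makes the roles of the two thresholds concrete for a given grey level image and label map: T1 should separate the values of distinct regions as in (3), while L·T2 should bound the value spread inside each region as in (4).

```python
import numpy as np

def check_thresholds(image: np.ndarray, labels: np.ndarray,
                     T1: float, T2: float, L: int) -> tuple[bool, bool]:
    """Return (property_3_holds, property_4_holds)."""
    ids = list(np.unique(labels))
    means = {r: float(image[labels == r].mean()) for r in ids}    # i(R_k), approximated by the mean
    spreads = {r: float(image[labels == r].max() - image[labels == r].min())
               for r in ids}                                      # max - min inside R_k
    prop3 = all(abs(means[a] - means[b]) >= T1
                for a in ids for b in ids if a < b)               # inequality (3)
    prop4 = all(s <= L * T2 for s in spreads.values())            # inequality (4)
    return prop3, prop4
```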

VI. ESTIMATION OF THRESHOLDS

In this section, the thresholds T1 and T2 (see the previous section) are estimated so as to obtain the desired number of segmented regions M. What is the appropriate form of the information to be provided by the user of the segmentation technique? According to our experience, the most appropriate form of information provided to control the segmentation result is the number of segmented regions per area of the input image. Since the area of the image m(E_n^2) is given by the image resolution, only the number of segmented regions has to be provided.

Every digital image i can be partitioned into two (possibly, and usually, disconnected) parts according to the discrete gradient operator (∇i)(x, y) = (i(x + 1, y) − i(x, y), i(x, y + 1) − i(x, y)):

I^1 = {(x, y) ∈ E_n^2 : (∇i)(x, y) ≠ 0},    I^2 = E_n^2 \ I^1.

For obvious reasons, I^1 is called the monotone subregion of the digital image i and I^2 the non-monotone subregion of the image i. The connected components of the regions I^1 and I^2 are denoted by C(I^1, k) and C(I^2, k), respectively. The minmax operator mM is defined for an image region R ⊆ E_n^2 of the image i as

mM(i, R) = max_{(x,y)∈R} i(x, y) − min_{(x,y)∈R} i(x, y).
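A sketch of the quantities just introduced: the monotone set I^1 from the discrete gradient, the connected components C(·, k), and the minmax operator mM. It follows our reading of the definitions (scipy.ndimage.label is used for the connected components); it is not the authors' code.

```python
import numpy as np
from scipy import ndimage

def monotone_mask(image: np.ndarray) -> np.ndarray:
    """Boolean mask of I^1 = {(x, y) : (grad i)(x, y) != 0}; I^2 is its complement."""
    img = np.asarray(image, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:-1, :] = img[1:, :] - img[:-1, :]    # i(x + 1, y) - i(x, y)
    gy[:, :-1] = img[:, 1:] - img[:, :-1]    # i(x, y + 1) - i(x, y)
    return (gx != 0) | (gy != 0)

def connected_components(mask: np.ndarray):
    """Label the connected components C(I^j, k); returns (label_array, count)."""
    return ndimage.label(mask)

def minmax(image: np.ndarray, mask: np.ndarray) -> float:
    """mM(i, R) = max over R of i(x, y) minus min over R of i(x, y)."""
    values = np.asarray(image, dtype=np.float64)[mask]
    return float(values.max() - values.min())
```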

The downsampling operator reducing the image resolution by the factor 2^l is denoted by S_l, that is, S_l i is the image obtained by downsampling the original image i; for details see [9]. Clearly, when an image region of the image i is E_n^2, the corresponding region of the downsampled image S_l i is E_{n/2^l}^2, denoted by S_l E_n^2.

Let R = {R_k}_k be a segmentation of an image i and T > 0 a predefined threshold. The segmented regions can be partitioned into two subsets, R = R^1 ∪ R^2, where

R^1 = {R ∈ R : ∃k such that R ∩ C(I^2, k) ≠ ∅ and mM(i, C(R, k)) > T}

and R^2 = R \ R^1.

According to equation (2), the total variation V(i) can be expressed as

V(i) = (1/m(E_n^2)) ( Σ_{R∈R^1} m(R) V(i, R) + Σ_{R∈R^2} m(R) V(i, R) ).    (5)

A. Estimation of threshold T2

Direct calculation shows that for R_1 ∈ R^1, R_2 ∈ R^2 and a large enough downsampling parameter l (l ≤ log_2 n) it holds that

V(S_l i, R_1) ≈ 2^l V(i, R_1),    V(S_l i, R_2) ≈ 0.    (6)

The symbol ≈ stands for approximate equality. Besides that, each monotone region I_k^1 ⊆ I^1 segmented into M_k regions {R_k1, . . . , R_kM_k} satisfies the inequality

√( ξ(R_kt) / m(R_kt) ) V(i, R_kt) ≤ M_k T,    1 ≤ t ≤ M_k,    (7)

where T is the predefined threshold (see above) and ξ(R_kt) is the eccentricity of the segmented region R_kt, defined as the eccentricity of the ellipse that best fits the region R_kt. The eccentricity depends on the specifics of the segmented image; therefore, at least a typical value of ξ(R) is known from the context of the image segmentation task, usually 1/2 ≤ ξ(R) ≤ 2. Altogether, applying inequality (7) to the downsampled image S_l i, we get the estimate

V(S_l i) ≤ (2^(l/2) / m(S_l E_n^2)) Σ_{R_k∈R^1} √(ξ(S_l R_k)) M_k T.    (8)

Loosely speaking, during the pyramid segmentation non-homogeneous regions are joined to homogeneous ones, and only homogeneous regions contribute significantly to the total variation of the downsampled image. According to equation (4) we choose T = L T2. The above reasoning yields

T2 ≥ (1 / (2^(l/2) L)) √( m(S_l E_n^2) / (M ξ) ) V(S_l i),    (9)

where the downsampling parameter l is large enough (typically 3 ≤ l ≤ 5), L is the height of the segmentation pyramid and M = Σ_k M_k. The value ξ is the average value of the region eccentricities. For practical use, to estimate T2 we choose 1/2 ≤ ξ ≤ 2 (depending on the application) and equality in inequality (9). Note that the above estimate is applicable to gray level and color images. In the gray level case, the distance between image levels is given by d(i(x, y), i(x', y')) = |i(x, y) − i(x', y')|. A color image i is a triple i = (i_R, i_G, i_B) and the pixel level difference is given by d(i(x, y), i(x', y')) = 0.3 |i_R(x, y) − i_R(x', y')| + 0.59 |i_G(x, y) − i_G(x', y')| + 0.11 |i_B(x, y) − i_B(x', y')|.

B. Estimation of threshold T1

Theoretical considerations and experimental results of pyramid segmentation showed that the threshold T1 has a minor influence on the segmentation results, especially on the number of segmented regions. This is due to the fact that, according to inequality (3) which describes the role of the threshold T1, the number of segmented regions does not depend on the average distance between region levels. Direct calculation shows that the total variation of a highly downsampled image does depend on the threshold T1, and the following inequality holds:

T1 ≤ V(S_l i),    (10)

where l is the downsampling parameter, see the previous subsection.
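Putting the two estimates together, with equality in (9) and (10), gives the following sketch; it reuses the build_pyramid() and total_variation() helpers sketched earlier, and the defaults (l = 4, ξ = 1) follow the values used in the experimental section. This is our reading of the formulas, not a reference implementation.

```python
import math
import numpy as np

def estimate_thresholds(image: np.ndarray, M: int, L: int = 4, l: int = 4,
                        xi: float = 1.0, v: int = 1) -> tuple[float, float]:
    """Return (T1_hat, T2_hat) for a grey level image and a desired region count M."""
    coarse = build_pyramid(image, l)[-1]          # S_l i, using the earlier pyramid sketch
    v_coarse = total_variation(coarse, v)         # V(S_l i)
    area = coarse.shape[0] * coarse.shape[1]      # m(S_l E_n^2)
    T1_hat = v_coarse                                                     # equality in (10)
    T2_hat = v_coarse * math.sqrt(area / (M * xi)) / (2 ** (l / 2) * L)   # equality in (9)
    return T1_hat, T2_hat
```

The resulting pair is then passed to the pyramid segmentation routine; Table I in the next section lists threshold values obtained from (9) and (10) for the test images.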

VII. EXPERIMENTAL RESULTS

Typical total variations V(S_l i) (solid line) for different downsampling parameters l are depicted in Figure 3. According to equation (6), it is interesting to compare the total variation of the downsampled image V(S_l i) to the scaled total variation of the original image, 2^l V(i) (dashed line in Figure 3).

Fig. 3. Image total variation: V(S_l i) (solid line) and 2^l V(i) (dashed line) for downsampling parameters l = 2, . . . , 6.

The total variation of the downsampled image V(S_l i) is lower than the scaled total variation of the original image 2^l V(i). This is due to the fact that, according to equation (6), the total variation of the non-monotone regions vanishes as the downsampling parameter l increases.

Both threshold estimation formulas (9) and (10) require the downsampling parameter l. It turned out that it is good enough to choose l ≥ L, where L is the height of the segmentation pyramid. Typically, 3 ≤ L ≤ 5 and l = L.

Several images were analyzed to compare the desired number of segmented regions M with the actual number of segmented regions M̂ obtained when the estimated thresholds T̂1 and T̂2 are used in the segmentation procedure. The estimated thresholds were computed by applying inequalities (9) and (10) with the inequality signs replaced by equality signs. The downsampling parameter l = 4 and the shape eccentricity parameter ξ = 1 were chosen to obtain the presented results. Table I shows the estimation results for two images, Donut (left part of the table) and Lena (right part).

TABLE I
ESTIMATED THRESHOLDS AND ACTUAL NUMBERS OF REGIONS

          Donut                          Lena
   M    T̂1    T̂2    M̂          M    T̂1    T̂2    M̂
   2   41.2  29.2    6          2   47.9  33.8    6
   3   41.2  23.2    7          3   47.9  27.6    6
   5   41.2  18.5    7          5   47.9  21.5   10
   6   41.2  14.6   10          8   47.9  16.9   11
  10   41.2  12.8   11         10   47.9  14.9   12
  15   41.2  10.6   12         15   47.9  12.3   14
  20   41.2   8.9   17         20   47.9  10.3   18

The number of actually segmented regions M̂ fits the desired number of regions reasonably well in most cases, which allows the application of the presented threshold estimation formulas in image database segmentation tasks. Segmentation results for the above mentioned images are shown in Figure 4. The desired numbers of segmented regions were M = 4 (Donut, upper images) and M = 3 (Lena, lower images). Note that the segmented regions are not necessarily connected.

Fig. 4. Input and segmented images

VIII. CONCLUSION

The estimation of the input parameters of pyramid segmentation, based on the introduced image total variation, is presented in the paper. It allows for an automatic calculation of the input thresholds that provide the a priori information required by the pyramid segmentation algorithm. Such an estimation is of crucial importance when larger image databases are segmented. The accuracy of the estimation results depends on the image specifics, but the results can nevertheless be successfully utilized in image segmentation applications. The introduced image total variation possesses interesting properties and can be efficiently implemented as a parallel algorithm. Its properties and applicability will be studied in the future.

REFERENCES

[1] Borgefors, G., Nystrom, I. Efficient shape representation by minimizing the set of centers of maximal discs/spheres. Pattern Recognition Letters, 18, pp. 465-472, 1997.
[2] Bouman, P. J., Hong, T., Rosenfeld, A. Segmentation and Estimation of Image Region Properties Through Cooperative Hierarchical Computation. IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-11, No. 12, pp. 802-809, December 1981.
[3] Burt, P. J., Hong, T. H., Rosenfeld, A. Segmentation and Estimation of Image Region Properties Through Cooperative Hierarchical Computation. IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-11, No. 12, December 1981.
[4] Dubes, R. C., Jain, A. K., Nadabar, S. G., Chen, C. C. MRF Model-Based Algorithms for Image Segmentation. Proceedings of the 10th International Conference on Pattern Recognition, Atlantic City, New Jersey, pp. 808-814, 1990.
[5] Fleming, W. H. Functions of Several Variables. Springer, New York, 1971.
[6] Open Source Computer Vision Library, web page http://www.intel.com/research/mrl/library/index.htm
[7] Jain, A. K. Fundamentals of Digital Image Processing. Prentice Hall, New York, 1989.
[8] Košir, A., Tasič, J. F. Kodiranje oblike digitalne slike v odvisnosti od področja uporabe. Elektrotehniški vestnik, vol. 70, pp. 66, 2003.
[9] Mallat, S. A Wavelet Tour of Signal Processing. Academic Press, 1999.
[10] Pal, N. R., Pal, S. K. A Review on Image Segmentation Techniques. Pattern Recognition, Vol. 26, No. 9, pp. 1277-1294, 1993.
[11] Panjwani, D. K., Healey, G. Markov Random Field Models for Unsupervised Segmentation of Textured Color Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 10, October 1995.
[12] Sethuraman, S., Siegel, M. W., Jordan, A. G. A multiresolutional region based segmentation scheme for stereoscopic image compression. Digital Video Compression: Algorithms and Technologies, Vol. 2419, pp. 265-274, February 1995.
[13] Tabb, M., Ahuja, N. Multiscale Image Segmentation by Integrated Edge and Region Detection. IEEE Transactions on Image Processing, Vol. 6, No. 5, May 1997.
[14] Taubin, G. Estimation of Planar Curves, Surfaces, and Nonplanar Space Curves Defined by Implicit Equations with Applications to Edge and Range Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 11, November 1991.