LBLA: Line Based Label Algorithm

A Novel Line Based Connected Component Labeling Algorithm

Hualong Zhao2

Yebin Fan1, Shengsheng Yu3

National Key Laboratory of Science and Technology on Multi-spectral Information Processing Technologies, Institute for Pattern Recognition and Artificial Intelligence, Huazhong University of Science and Technology, Wuhan, China

Department of Computer Science, Huazhong University of Science and Technology, Wuhan, China

[email protected], [email protected]

Abstract—This paper presents a fast connected component labeling algorithm based on a line description method and an optimized tree Union-Find strategy. The algorithm transforms the pixel-connectivity problem, on which most previously proposed algorithms focus, into a line-connectivity problem. It comprises three phases: line extraction, connected component identification, and label assignment. The line description method converts runs of connected pixels into lines to reduce the scan time, while the new tree Union-Find strategy reduces redundant root comparisons. The algorithm is compared with other well-known optimized component labeling algorithms. It shows outstanding performance in processing time, running 1.1 to 8 times as fast as the other algorithms across the test cases.

Keywords-LBLA; Connected Component Labeling; Union-Find

I. INTRODUCTION

Connected component labeling is an important step in many machine vision systems and image processing applications, and various algorithms have been proposed for it. Suzuki [1] categorized them into four groups: (1) Algorithms [5][6][7] that make repeated forward and backward passes over the data until every pixel has a stable label. Most of these methods can be implemented in place, without additional work space. (2) Two-pass algorithms [2]. The first pass assigns a provisional label to each pixel, and the second pass resolves label conflicts so that each label maps to its final class. (3) Algorithms developed for images represented by hierarchical tree structures. (4) Parallel algorithms.

Most of these algorithms [1][2][3] use some form of Union-Find to resolve label conflicts. In a Union-Find algorithm, the equivalences among connected components are usually represented as rooted trees. Kesheng Wu [3] proposed implementing the rooted trees with an array rather than pointers to reduce the cost of Union-Find. He also optimized Suzuki’s [1] and Fiorio’s [2] algorithms by using a decision tree to reduce the number of neighbors examined during the scanning steps. Ayman AbuBaker [4] proposed a new one-scan algorithm that spreads a new label to all connected pixels at once and, according to his paper, shows outstanding performance.

All of the above algorithms assign or spread a label by examining each pixel and its neighbors. Unlike these approaches, our proposed algorithm, called the Line Based Label Algorithm (LBLA), is based on lines. All connected neighboring pixels in one row are extracted into one line. Obviously, the pixels in the same line should share the same label. Using lines instead of pixels saves much pixel-examination time, dramatically so in some cases. In double-scan algorithms, label spreading and connected component identification are accomplished in a single stage. In our approach, this stage is separated into two independent phases, which removes many redundant Union-Find operations.

This paper is organized as follows: Section II details the labeling process of LBLA. Algorithm optimizations and analysis are presented in Section III. Performance measurements are given in Section IV. Our future work is outlined in Section V.

II. LINE BASED LABEL ALGORITHM

To simplify the discussion, two assumptions are stated in advance. First, only binary images stored as 2D arrays of pixels are considered. Second, because our algorithm is easily applied to both 4-connected and 8-connected situations, we use only 8-connected components of 2D images in our experiments and illustrations. To describe the proposed algorithm, three definitions are given as follows:

Definition 1: A line is defined as a five-element tuple L(LineID, x1, x2, y, LinkID), where y is the row coordinate in the image, x1 is the column coordinate of the start point, x2 is the column coordinate of the end point, LineID is the index of the line, and LinkID is the link field of the rooted tree composed of all the lines in the same connected component.

Definition 2: Given two lines L1(LineIDL1, x1L1, x2L1, yL1, LinkIDL1) and L2(LineIDL2, x1L2, x2L2, yL2, LinkIDL2), L2 is called a connected line of L1 if it meets the following condition:

\[
\begin{cases}
|y_{L1} - y_{L2}| = 1 \\
\neg\big( (x2_{L1} < x1_{L2} - 1) \lor (x1_{L1} > x2_{L2} + 1) \big) \text{ is true}
\end{cases}
\]

Definition 3: If a line’s LinkID is equal to its LineID, the line is called an entrance line.

The proposed algorithm has three phases, as shown in Figure 1. The first phase, called line extraction, extracts lines from the pixels of the original image. All lines are initialized as entrance lines during the line extraction phase.

978-1-4244-5539-3/10/$26.00 ©2010 IEEE
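The line extraction phase and the connected-line test of Definition 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `Line`, `extract_lines`, and `is_connected` are hypothetical, and the sketch assumes that any nonzero pixel is a foreground pixel and that the image is a list of rows.

```python
from typing import List, NamedTuple, Sequence


class Line(NamedTuple):
    """Five-element line record from Definition 1 (field names are assumptions)."""
    line_id: int
    x1: int       # column of the first pixel of the run
    x2: int       # column of the last pixel of the run
    y: int        # row of the run
    link_id: int  # parent pointer; equals line_id for an entrance line


def extract_lines(image: Sequence[Sequence[int]]) -> List[Line]:
    """Collect every maximal horizontal run of nonzero pixels as one line."""
    lines: List[Line] = []
    for y, row in enumerate(image):
        x, width = 0, len(row)
        while x < width:
            if row[x]:
                start = x
                while x < width and row[x]:
                    x += 1
                line_id = len(lines)
                # Every new line starts as an entrance line (link_id == line_id).
                lines.append(Line(line_id, start, x - 1, y, line_id))
            else:
                x += 1
    return lines


def is_connected(a: Line, b: Line) -> bool:
    """8-connectivity test of Definition 2: adjacent rows, overlapping or touching diagonally."""
    return abs(a.y - b.y) == 1 and not (a.x2 < b.x1 - 1 or a.x1 > b.x2 + 1)
```

For the two-row image `[[1, 1, 0, 1], [0, 0, 1, 0]]`, the sketch yields three lines, and the single line in the second row is connected to both lines in the first row through diagonal neighbors.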

Figure 1. LBLA Algorithm

The second phase is called connected component identification. Two lines (La, Lb) that satisfy the condition of Definition 2 are connected lines. In this phase, the lines that belong to the same connected component are organized as a rooted tree based on LinkID, and each line is a node of the tree. LinkID points to the parent node; that is, the LineID of the parent line is assigned to the LinkID of the current line. The final entrance line can therefore be found by searching recursively along the lines’ LinkIDs. The search terminates when the LinkID of a line on the search path is equal to its own LineID.

According to Definition 2, a connection can only occur between two neighboring rows, so the identification process examines two neighboring rows at a time and advances row by row. There are three situations when a line is examined against the lines in the row above: 1) It is disconnected from all lines in the row above. 2) It is connected with only one line in the row above. 3) It is connected with more than one line in the row above.

Situation 1: A new connected component is discovered. Nothing needs to be done.

Situation 2: An existing connected component is extended. The current working line and its connected line in the row above belong to the same connected component. The current line is merged into this component by simply copying the LinkID of the connected line, as illustrated by line K in Figure 2.

Situation 3: A merge operation is performed as follows. First, find the entrance line of the connected component to which the first connected line belongs. Second, assign the LinkID of that entrance line to the LinkID of the current working line (i.e., merge the current working line into the first encountered connected component). Finally, find the entrance lines of all the other connected lines and set the LinkIDs of all lines on each search path to the LineID obtained first. This situation is illustrated by line L in Figure 2.

Obviously, the key to the proposed connected component labeling algorithm is the merging of connected components. This procedure is similar to the merging of equivalent connected components in Fiorio’s [2] algorithm. They can both be treated as forms of Union-Find algorithm. However, Fiorio’s algorithm places more constraints on the comparison of tree roots than the proposed algorithm. In Fiorio’s algorithm, the roots of the two trees are compared first, and the one with the smaller value becomes the root of the merged tree. Then all nodes on the search path from leaf to root in the tree with the larger root value are modified to point to the new root. This procedure can be called a find-compress operation. Thus, to merge N trees, N find-root operations and N-1 find-compress operations are needed. The algorithm proposed in this paper, by contrast, always merges the later tree into the earlier one, so only 1 find-root operation and N-1 find-compress operations are needed.

Figure 2. An example for connected component identification
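The three situations of the identification phase can be sketched as follows. This is a simplified illustration under stated assumptions rather than the authors' code: lines are given as hypothetical `(x1, x2, y)` tuples in row-major order, the array `link` plays the role of LinkID with the array index as LineID, and the neighbors in the row above are found by a plain scan instead of a row index.

```python
def find_root(link, i):
    """Follow LinkID pointers until an entrance line (link[i] == i) is reached."""
    while link[i] != i:
        i = link[i]
    return i


def find_compress(link, i, new_root):
    """Point every node on the path from i to its root (root included) at new_root."""
    while True:
        parent = link[i]
        link[i] = new_root
        if parent == i or parent == new_root:
            break
        i = parent


def identify(runs):
    """Return the LinkID array after connected component identification."""
    link = list(range(len(runs)))      # every line starts as an entrance line
    rows = {}
    for i, (_, _, y) in enumerate(runs):
        rows.setdefault(y, []).append(i)
    for i, (x1, x2, y) in enumerate(runs):
        # Lines in the row above that are 8-connected to the current line.
        ups = [j for j in rows.get(y - 1, ())
               if not (runs[j][1] < x1 - 1 or runs[j][0] > x2 + 1)]
        if not ups:                    # Situation 1: new component, nothing to do
            continue
        if len(ups) == 1:              # Situation 2: copy the neighbor's LinkID
            link[i] = link[ups[0]]
            continue
        root = find_root(link, ups[0])  # Situation 3: one find-root ...
        link[i] = root
        for j in ups[1:]:               # ... and one find-compress per other neighbor
            find_compress(link, j, root)
    return link
```

With `runs = [(0, 1, 0), (3, 4, 0), (1, 3, 1), (6, 6, 1)]`, the third line bridges the two lines of the first row, so the result is `[0, 0, 0, 3]`: two entrance lines, hence two connected components.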

The third phase is called label assignment. It assigns a unique label to each connected component, and all the lines in the same connected component share this label. The number of connected components in the labeled image equals the number of lines whose LinkID is equal to their LineID. The whole labeled image is obtained by the following steps: 1) Fetch lines from the working space sequentially. 2) If the line is an entrance line, check whether it has been assigned a label value. If not, assign a new label value to it. 3) If the line is not an entrance line, find the corresponding entrance line through LinkID, apply step 2 to that entrance line, and assign its label value to the current line. After the three phases described above, all lines in the same connected component have been assigned the same unique label value.

III. ALGORITHM ANALYSIS & OPTIMIZATION

The three main phases of the proposed algorithm are analyzed for the worst case, assuming the image size is N*N. In the line extraction phase, the maximum number of lines is N*N/4. To accelerate the search, a row index is established that points to the LinkID of the first line in each row. Thus, the tuple describing a line, L(LineID, x1, x2, y, LinkID), can be reduced to L(LineID, x1, x2, LinkID). Moreover, LineID, being the index of the line itself, can be represented implicitly as the subscript of the line array. Therefore, if values are stored as int, the size of the row index is N*sizeof(int) and the size of each line is 3*sizeof(int). In total, the supplementary memory for the line extraction phase is N + (N*N/4)*3 = N*(4+3N)/4 ints. The space complexity is O(N^2); that is, the supplementary space is approximately equal to the size of the original image.

In the connected component identification phase, no supplementary memory is needed.

In the label assignment phase, to save storage, LinkID can be used to store the assigned label value once the line has been labeled. If a line’s LinkID is less than 0, the line has been assigned a label value equal to the absolute value of its LinkID field. To reduce the search time for entrance lines, the find-compress procedure is added to this phase. During the search, a temporary stack, whose size is equal to the number of columns, records the current search path in the tree. Once the entrance line is found, its label value is assigned to all line nodes on the search path recorded in the temporary stack, which accelerates subsequent searches.
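The label assignment phase with the two optimizations described here (a negative LinkID doubling as the label, and a temporary stack for find-compress) might look like the sketch below. It is an illustration under assumptions: `assign_labels` is a hypothetical name, the input is a LinkID array such as one produced by the identification phase, labels start at 1 so that a stored label is always negative, and a growable Python list stands in for the fixed-size stack.

```python
def assign_labels(link):
    """Return one positive label per line; lines of the same component share a label."""
    link = list(link)          # work on a copy; a negative entry means "already labeled"
    next_label = 1
    for i in range(len(link)):
        stack = []             # temporary stack recording the current search path
        node = i
        # Climb until a labeled line or an unlabeled entrance line is reached.
        while link[node] >= 0 and link[node] != node:
            stack.append(node)
            node = link[node]
        if link[node] >= 0:    # unlabeled entrance line: give it a new label
            link[node] = -next_label
            next_label += 1
        label = link[node]     # stored as a negative value
        for visited in stack:  # find-compress: label every line on the path
            link[visited] = label
    return [-value for value in link]
```

For the LinkID array `[0, 0, 0, 3]` the sketch returns `[1, 1, 1, 2]`. Because labeled lines are overwritten in place, later searches that reach them stop after a single step.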

IV. TEST & CONCLUSION

The performance tests are based on two categories of images: random 2D binary images and standard application images. All tests were performed on the same computer, an Intel Core 2 Duo E7400 processor at 2.8GHz with 2GB RAM. Random images are generated by setting each pixel in the N*N image array to 1 with the probability given by the random function, and to 0 otherwise. The complexity of the generated images depends on the probability factor of the random function. These test images are generally harder to label than real application images; nevertheless, they are reasonable test images for measuring the performance of connected component labeling algorithms.

A comparison of our algorithm with other connected component labeling algorithms on a series of randomly generated images of different scales is presented in Figure 3. The two algorithms used for comparison are the One Scan algorithm [4], called OS for short, and Wu’s algorithm [3]. Finally, we used the theoretically optimal labeling algorithm, called OL for short, as a reference for the theoretical speed. This optimal labeling algorithm is implemented simply by assigning an arbitrary random number to every effective (foreground) pixel of the image.

Figure 3 illustrates the influence of different probability factors on the performance of all algorithms on images of the same scale. It shows that when the probability factor is less than 0.1, the One Scan (OS) algorithm has the advantage, since effective pixels are scarce. As the scale expands, the advantages of Wu’s algorithm and LBLA gradually emerge. Both of these algorithms are based on the Union-Find strategy. Moreover, LBLA is 10%~100% faster than Wu’s algorithm on all of the test images, since LBLA reduces the scan scale and the unnecessary operations for comparing the roots of two trees. Compared with OS, LBLA is at most about 8 times as fast. The same conclusion is reached with images of a different scale, as the comparison of Figure 3(a) and Figure 3(b) shows.

Next, we use the 9 application pictures shown in Figure 4 to evaluate these algorithms. The results are shown in Table I. In contrast with the random images, LBLA performs better on the standard application images. The reason is that the generation function for random images follows the uniform distribution, which makes the generated images similar to the random noise image in Figure 4(i). In the binary test images, we define pure white as the background value and pure black as the foreground object. The results table shows that the speedup of LBLA over OS grows as the ratio of foreground to background grows, which suggests that the OS algorithm is unsuitable for images with large foreground areas. Relative to Wu’s algorithm, the speedup of LBLA is stable at approximately 2 times, except for Figure 4(h), oracle rubbing, which reaches nearly 4 times (3.71 in Table I). This is mainly because the original image has horizontally connected patterns in its foreground, which is actually the blank space outside the characters. The proposed line extraction saves much of the pixel scan time. Even compared with the OL algorithm, it is only about 50% slower.
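The random test images described in this section could be generated along the following lines. This is only a sketch: the function name `random_binary_image`, the `seed` parameter, and the use of Python's standard `random` module are assumptions, since the paper does not specify the generator it used.

```python
import random


def random_binary_image(n, p, seed=None):
    """Build an n*n binary image in which each pixel is 1 with probability p, else 0."""
    rng = random.Random(seed)  # a seeded generator keeps test images reproducible
    return [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(n)]
```

Sweeping `p` over a range such as 0.1 to 0.9 at a fixed `n` would reproduce the kind of probability-factor series that Figure 3 reports on.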

Figure 3. (a) 5000X5000 random image (b) 10000X10000 random image

TABLE I. TEST RESULT

Image  Size       BlockNum  OL (ms)  Wu's (ms)  OS (ms)  LBLA (ms)  OL/LBLA  Wu's/LBLA  OS/LBLA
(a)    1033X832   11469     1.81     8.67       11.29    4.28       0.42     2.03       2.64
(b)    930X1023   14730     2.14     9.41       9.42     4.05       0.53     2.32       2.33
(c)    708X1162   1415      1.43     8.47       16.92    2.8        0.51     3.03       6.04
(d)    2278X3810  31672     17.78    91.66      177.49   42.53      0.42     2.16       4.17
(e)    512X512    526       0.46     2.69       3.42     0.87       0.53     3.09       3.93
(f)    1200X916   10222     3.53     11.28      14.53    5.84       0.60     1.93       2.49
(g)    4219X5865  104434    58.1     258.75     610.12   115.69     0.50     2.24       5.27
(h)    1266X1545  476       4.68     19.33      8.66     5.21       0.90     3.71       1.66
(i)    578X437    8810      0.57     2.76       2.09     1.44       0.40     1.92       1.45

Figure 4. (a) comics (b) maze (c) backbone (d) moon (e) woman (f) newspaper (g) Taiwan (h) oracle rubbing (i) random noise [3]
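The three ratio columns of Table I are each algorithm's running time divided by the LBLA time, which a short check can confirm. The sketch below copies two rows from the table; the names `timings` and `speedups` are hypothetical and introduced only for illustration.

```python
# (OL, Wu's, OS, LBLA) times taken from Table I for images (a) and (h).
timings = {
    "a": (1.81, 8.67, 11.29, 4.28),
    "h": (4.68, 19.33, 8.66, 5.21),
}


def speedups(ol, wu, one_scan, lbla):
    """Return (OL/LBLA, Wu's/LBLA, OS/LBLA), rounded to two decimals as in the table."""
    return tuple(round(t / lbla, 2) for t in (ol, wu, one_scan))
```

A ratio above 1 means LBLA is faster than the algorithm in that column, and a ratio below 1 (the OL column) means LBLA is slower than the theoretical reference.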

V. FUTURE WORK

The LBLA algorithm proposed in this paper shows outstanding performance in connected component labeling. Its drawback is the supplementary memory required in the line extraction phase, and reducing this requirement needs further work. Because the connected component identification phase is independent of the rest of the algorithm and connections arise only between lines in two neighboring rows, LBLA is inherently amenable to parallel computation. Another part of our future work will therefore focus on parallelizing LBLA and applying it to very large image processing scenarios such as remote sensing.

REFERENCES

[1] K. Suzuki, I. Horiba, and N. Sugie, “Linear-time connected-component labeling based on sequential local operations,” Comput. Vis. Image Underst. 89(1), pp. 1–23, 2003.

[2] C. Fiorio and J. Gustedt, “Two linear time union-find strategies for image processing,” Theor. Comput. Sci. 154(2), pp. 165–181, 1996.

[3] Kesheng Wu, Ekow Otoo, Arie Shoshani, “Optimizing connected component labeling algorithms,” Lawrence Berkeley National Laboratory, Paper LBNL-56864, 2005.

[4] A. AbuBaker, R. Qahwaji, S. Ipson, M. Saleh, “One Scan Connected Component Labeling Technique,” in Proc. IEEE International Conference on Signal Processing and Communications (ICSPC 2007), 2007, pp. 1283–1286.

[5] Tao Jiang, Ming Qiu, Jie Chen, Xue Cao, “LILA: A Connected Components Labeling Algorithm in Grid-Based Clustering,” in Proc. International Workshop on Database Technology and Applications, 2009, pp. 213–216.

[6] Ni Ma, D.G. Bailey, C.T. Johnston, “Optimised Single Pass Connected Components Analysis,” in Proc. International Conference on ICECE Technology (FPT 2008), 2008, pp. 185–192.

[7] D.G. Bailey, C.T. Johnston, Ni Ma, “Connected components analysis of streamed images,” in Proc. International Conference on Field Programmable Logic and Applications (FPL 2008), 2008, pp. 679–682.
