IECON2015-Yokohama November 9-12, 2015
Novel Fast and Scalable Parallel Union-Find ASIC Implementation For Real-Time Digital Image Segmentation
Ehab Salahat, Hani Saleh, Andrzej Sluzek, Mahmoud Al-Qutayri, Baker Mohammad and Mohammad Ismail
Department of Electrical and Computer Engineering, Khalifa University, Abu Dhabi, U.A.E.
[email protected]
Abstract— This paper presents a new fast and scalable Parallel Union-Find algorithm for image segmentation and its System-on-Chip (SoC) implementation in 65nm CMOS technology following the Application-Specific Integrated Circuit (ASIC) design flow. The algorithm labels all foreground and background pixels using the minimum possible number of pixel scans. This contrasts with classical labeling algorithms, which label only the foreground (or only the background) pixels in a single run. The new algorithm uses only two memory blocks: in one block, each image segment is labeled with its seed and, simultaneously, in the second block each segment is labeled with its size. With this parallel labeling, monitoring the image segments becomes very fast and efficient. With a 350 MHz operating frequency, an estimated processing rate of 2100 frames/sec, a total chip area of 15,950.5 μm² (with off-chip memory) and a very low power consumption of 0.3 mW, the SoC is an excellent candidate for mobile devices and real-time applications.
Keywords—Parallel Labeling, Union-Find Algorithm, ASIC, SoC, Segmentation, Low-power, Real-time.
I. INTRODUCTION
COMPUTER VISION is an active area of research and its applications seem limitless. One of the fundamental, primary tasks in image processing is to extract and label the significant segments (i.e. regions) of an image [1] [2] [3]. Labeling is indispensable in almost all image-related applications, such as face identification, fingerprint and character recognition, target recognition, medical image analysis, computer-aided diagnosis and object-oriented classification [4] [5] [6] [7] [8]. The problem of labeling, i.e. identifying the connected components in an image, can be considered as a transformation of the image, which originally consists of pixels belonging either to the foreground set ℱ or to the background set ℬ. The transformed image contains a unique label for each segment of the source image, and all pixels from the same segment are assigned that particular label [9]. The considered pixel adjacency (connectivity) can be 4- or 8-pixel adjacency for ℱ (and, correspondingly, 8- or 4-pixel adjacency for ℬ), depending on the application. Nevertheless, labeling is one of the most time-consuming image operations compared to other fundamental operations (e.g. noise reduction, thresholding, interpolation, etc.), and it is therefore considered one of the major “bottlenecks” of the entire processing chain.
Many algorithms have been proposed to address labeling. Suzuki [10] categorized them into four classes. (1) Multi-scan algorithms: algorithms that scan an image in forward and backward raster directions alternately, propagating label equivalences until no label changes. (2) Two-scan algorithms: algorithms that complete labeling in two scans; during the first scan, provisional labels are assigned to object pixels and the label equivalences are stored in a table array, and during the second scan the label equivalences are resolved using other techniques. (3) Hybrid algorithms: like multi-scan algorithms, these scan an image in forward and backward raster directions alternately, but label equivalences are resolved using the same techniques as in the two-scan algorithms. (4) Contour-tracing algorithms: algorithms that avoid the label-equivalence analysis altogether by tracing the contours of objects.
There are many applications in which labeling algorithms need to be run twice in order to label both the foreground and the background. For example, the Maximally Stable Extremal Regions (MSER) algorithm [11] [12] uses the foreground pixels and the background pixels to detect the so-called bright and dark MSERs. Running labeling algorithms twice is time consuming, as it significantly reduces the processing rate (frames/second), and it is power-inefficient. In this paper, we present a novel solution for the simultaneous labeling of foreground and background pixels: the Parallel Union-Find algorithm. The new algorithm is capable of labeling all the image segments in parallel, i.e. in the same run. It requires only two memory blocks of the same size as the image and assigns two different labels (the seed and the size of each segment) so that the segments can be manipulated easily. The algorithm is implemented as an SoC using 65nm CMOS technology and the ASIC design flow, resulting in a very memory- and power-efficient chip, with a power consumption of 0.3 mW, a small area of 15,950 μm² (with off-chip memory), an operating frequency of 350 MHz and an estimated processing rate of 2100 frames/sec for 256×256 images.
The remaining part of this paper is structured as follows. In Section II, the classical Union-Find algorithm is revisited and the Parallel Union-Find algorithm is introduced. The SoC microarchitecture is analyzed in Section III. Finally, the paper's findings and contributions are summarized in Section IV.
II. THE PARALLEL UNION-FIND ALGORITHM
A. Classical Union-Find Algorithm
The Union-Find algorithm is a segmentation algorithm that has been widely used in numerous applications for processing binary images [13]. The algorithm initially assumes that each pixel of value 1 (foreground pixel) is a singleton (i.e. not connected to any other pixel). This is shown in Fig. 1.
Fig. 1: (a) Binary image, (b) initial labeling, and (c) the singletons.
Subsequently, the algorithm scans all singletons to find the pixels each singleton is connected to (i.e. its connectivity) and builds up the corresponding component trees (sets); the first detected singleton found to be disjoint from the previously detected ones becomes the root of a new tree. Two variants exist when searching for a pixel's connectivity, as shown in Fig. 2.
(a) (b) Fig. 2: Two variants of connectivity (a) 4-pixel, and (b) 8-pixel.
At the end, all pixels are grouped into separate trees based on their connectivity. Figs. 3 and 4 illustrate how the Union-Find algorithm works for the two variants.
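For concreteness, the classical procedure can be sketched in software as follows. This is an illustrative model only (NumPy is assumed, and names such as label_foreground are not from the paper); it labels the foreground pixels exactly as described above: every foreground pixel starts as a singleton and is united with its previously scanned neighbours of value 1.

```python
import numpy as np

def find(parent, x):
    # Follow parent pointers to the root (the tree's representative).
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving keeps the trees shallow
        x = parent[x]
    return x

def label_foreground(img, eight_connected=False):
    M, N = img.shape
    parent = {}                          # every foreground pixel starts as a singleton
    for r in range(M):
        for c in range(N):
            if img[r, c]:
                parent[(r, c)] = (r, c)
    # Only previously scanned neighbours are checked (raster order).
    nbrs = [(-1, 0), (0, -1)]
    if eight_connected:
        nbrs += [(-1, -1), (-1, 1)]
    for r in range(M):
        for c in range(N):
            if not img[r, c]:
                continue
            for dr, dc in nbrs:
                q = (r + dr, c + dc)
                if q in parent:          # neighbour is also foreground: union the two trees
                    ra, rb = find(parent, (r, c)), find(parent, q)
                    if ra != rb:
                        parent[rb] = ra
    # Final pass: every foreground pixel takes its root as its label.
    return {p: find(parent, p) for p in parent}
```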
Fig. 3: Union-Find results using 4-connectivity in (a) and their trees in (b).
Fig. 4: Union-Find results using 8-connectivity in (a) and their trees in (b).
Sample labeling results for the binary image given in Fig. 5(a) are shown in Fig. 5(b) and (c) for the two connectivity variants, where the different colors indicate different labels.
(a) (b) (c) Fig. 5: (a) Binary image, its (b) 4- and (c) 8-connectivity labeling.
B. The Parallel Union-Find Algorithm
The main difference between the classical Union-Find and the Parallel Union-Find algorithm is that the latter labels all pixels (foreground and background) at the same time. This is unlike the classical Union-Find (used, for example, in the MSER detector [11]), which labels only the white pixels and ignores the black ones, as shown in Fig. 3 and Fig. 4 (or the other way around). Nevertheless, all labels used in the image are unique.
The Parallel Union-Find algorithm works as follows. First, it initializes two matrices of size M×N (the size of the image). The first one is called the ℑ matrix and is given as
ℑ = [ 1, M+1, …, (N−1)M+1 ; 2, M+2, …, (N−1)M+2 ; ⋮, ⋮, ⋱, ⋮ ; M, 2M, …, NM ]_{M×N},   (1)
which effectively assigns unique labels to all pixels (of both 0 and 1 values) under the same initial assumption as that of the classical Union-Find, i.e. all pixels are singletons, and the label values are spread from 1 to M×N. The second matrix defines the size of each region (segment); since initially all pixels are considered singletons, this matrix ℛ of region sizes is given as
ℛ = [ 1, …, 1 ; ⋮, ⋱, ⋮ ; 1, …, 1 ]_{M×N}.   (2)
The values stored in the ℑ and ℛ matrices are updated as the algorithm scans for connectivity. Henceforth, 4-pixel connectivity is assumed. For illustration purposes, we will use the binary image in Fig. 6.
Fig. 6: Sample binary image.
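A minimal software sketch of this initialization (NumPy is assumed; the column-major numbering reproduces (1) and the all-ones matrix reproduces (2)):

```python
import numpy as np

def init_parallel_union_find(M, N):
    # ℑ: every pixel gets its own column-major index 1..M*N as its initial label, as in (1).
    I = np.arange(1, M * N + 1, dtype=np.int64).reshape(N, M).T.copy()
    # ℛ: every pixel is initially a singleton region of size 1, as in (2).
    R = np.ones((M, N), dtype=np.int64)
    return I, R
```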
The Parallel Union-Find algorithm, after the aforementioned initialization, consists of the following two main parts (refer to Fig. 7 and Fig. 8 for illustrations):
a) Find: used to find the root (seed) of each region and then the corresponding seed tree. The image is scanned horizontally and then vertically (or vice versa), and each pixel of the binary image is compared to its adjacent pixels. It can be shown that M(N−1) + N(M−1) memory accesses are required, that is, fewer than two complete scans. The first M(N−1) accesses assign initial horizontal labels to the image, which can be thought of as detecting the horizontal lines of the regions, as shown in Fig. 7(a). The remaining N(M−1) accesses spread the horizontal labels vertically and hence detect the vertical lines of the regions, as shown in Fig. 7(b). In both cases, if a pixel P has the same intensity as one of its adjacent pixels that was scanned before, P inherits the value of the seed, call it r, stored in the ℑ matrix at that previously scanned pixel, and the value in the ℛ matrix at location r is incremented by the count value of the seed label at P.
b) Merge: after this stage, ℑ is labeled by the region seed value r, i.e. the root of a specific region becomes its label. Similarly, ℛ is marked by the region size, i.e. the pixel count stored at the seed pixel. This can be achieved in a straightforward manner by using (3) and (4):
ℑ(p) = ℑ(ℑ(p)),   (3)
ℛ(p) = ℛ(ℑ(p)),   (4)
where p denotes a pixel location.
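The two stages can be modeled in software as sketched below. This is an interpretation of the description above, not the RTL: 4-pixel connectivity, a horizontal scan followed by a vertical scan, and a union-by-size rule consistent with the merge datapath shown later in Fig. 12. The helper names (find_root, union, parallel_union_find) are illustrative.

```python
import numpy as np

def find_root(I, p):
    # Chase labels in I until a pixel is its own label (the region seed).
    M = I.shape[0]
    r, c = p
    while I[r, c] != c * M + r + 1:          # label of (r, c) in the column-major scheme of (1)
        lab = int(I[r, c]) - 1
        r, c = lab % M, lab // M             # decode the stored label back to coordinates
    return r, c

def union(I, R, p, q):
    # Union by size: the smaller region's seed is pointed at the larger region's seed.
    M = I.shape[0]
    rp, rq = find_root(I, p), find_root(I, q)
    if rp == rq:
        return
    if R[rp] >= R[rq]:
        rp, rq = rq, rp                      # after the swap, rq is the larger root
    I[rp] = rq[1] * M + rq[0] + 1            # smaller root now points at the larger one
    R[rq] += R[rp]                           # the surviving seed accumulates the region size

def parallel_union_find(img):
    # Labels foreground AND background segments in the same run (4-connectivity assumed).
    M, N = img.shape
    I = np.arange(1, M * N + 1, dtype=np.int64).reshape(N, M).T.copy()   # equation (1)
    R = np.ones((M, N), dtype=np.int64)                                  # equation (2)
    for r in range(M):                       # horizontal stage: compare with the left neighbour
        for c in range(1, N):
            if img[r, c] == img[r, c - 1]:
                union(I, R, (r, c), (r, c - 1))
    for c in range(N):                       # vertical stage: compare with the upper neighbour
        for r in range(1, M):
            if img[r, c] == img[r - 1, c]:
                union(I, R, (r, c), (r - 1, c))
    for r in range(M):                       # merge stage, equations (3) and (4)
        for c in range(N):
            rt = find_root(I, (r, c))
            I[r, c] = rt[1] * M + rt[0] + 1  # every pixel takes its seed label
            R[r, c] = R[rt]                  # and its region's size
    return I, R
```

The final double loop applies (3) and (4) so that every pixel ends up holding its seed label in ℑ and its region size in ℛ.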
The stages of the Parallel Union-Find labeling are illustrated in Fig. 7(a)–(c) and Fig. 8(a)–(c) for the ℑ and ℛ memories, respectively. The ℑ labels are unique, while the ℛ labels are not (two different regions can have the same size; however, each has its own unique root). Following a similar approach, 3-dimensional matrix labeling can be achieved by dividing the matrix into 2-dimensional "slices", as shown in Fig. 9, and carrying out the same procedure as described earlier. Sample Parallel Union-Find labeling results for two different test images, assuming 4-pixel connectivity, are illustrated next.
Fig. 7: ℑ memory block (a) after M(N−1) accesses, (b) after M(N−1) + N(M−1) accesses, and (c) its final labels.
Fig. 8: ℛ memory block (a) after M(N−1) accesses, (b) after M(N−1) + N(M−1) accesses, and (c) its final labels.
The proposed parallel labeling algorithm has many promising applications in which the simultaneous labeling of both black and white binary segments is of importance. For example, the proposed labeling technique was used with the Maximally Stable Extremal Regions (MSER) algorithm [14], as in [15], where it was shown that such a parallel labeling technique can enhance the MSER detector's performance significantly, nearly doubling its processing rate.
Fig. 9: 3-D to 2-D matrix decomposition.
The first test image is shown in Fig. 10(a), with its labeling results illustrated in Fig. 10(b). It can be clearly seen that each segment of the test image is assigned a different, unique color, i.e. a unique label. As a further test, the image in Fig. 11(a) is thresholded with a threshold value of 100, resulting in the binary image shown in Fig. 11(b). The results of running the first, second and final stages of the Parallel Union-Find algorithm on Fig. 11(b) are shown in Figs. 11(c), (d) and (e), respectively, where it can be clearly seen that segments of black as well as white pixels are assigned unique labels (colors).
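A possible way to reproduce this kind of experiment in software, reusing the parallel_union_find sketch from Section II.B (OpenCV/NumPy and the file name butterfly.png are assumptions; only the threshold value of 100 comes from the text):

```python
import cv2
import numpy as np

# Hypothetical file name; any grayscale image will do.
gray = cv2.imread("butterfly.png", cv2.IMREAD_GRAYSCALE)
binary = (gray > 100).astype(np.uint8)        # threshold of 100, as in Fig. 11(b)

I, R = parallel_union_find(binary)            # sketch from Section II.B
num_segments = len(np.unique(I))              # every segment, black or white, has a unique seed label
print(num_segments, "segments labeled in a single run")
```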
III. SYSTEM-ON-CHIP MICROARCHITECTURE
The microarchitecture of the SoC for the Parallel Union-Find algorithm is shown in Fig. 12. The microarchitecture and its processing follow exactly the stages discussed earlier. The SoC was synthesized, placed and routed using 65nm CMOS technology. Fig. 13 shows the final placed-and-routed SoC of the Parallel Union-Find algorithm.
(Flowchart of Fig. 12: a counter steps through the M(N−1) + N(M−1) pixel-pair comparisons against the image memory, while two M×N memories hold ℐ and ℛ. For each pair, labels R1 and R2 are resolved to their roots by following ℐ until R1 = ℐ(R1) and R2 = ℐ(R2); if the roots differ, their sizes N1 = ℛ(R1) and N2 = ℛ(R2) are compared and the smaller region is linked under the larger one, whose ℛ entry becomes N1 + N2.)
Fig. 12: Parallel Union-Find Algorithm SoC microarchitecture.
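Read as software, the decision flow of Fig. 12 amounts to the following union-by-size step (an interpretation of the flowchart, not the RTL; ℐ and ℛ are modeled as flat memories addressed by label):

```python
def merge_step(I, R, r1, r2):
    # I[label] -> parent label, R[label] -> region size (e.g. Python dicts or lists).
    # Root resolution: follow I until both labels point to themselves.
    while I[r1] != r1 or I[r2] != r2:
        r1, r2 = I[r1], I[r2]
    if r1 == r2:
        return                      # already in the same region: nothing to merge
    n1, n2 = R[r1], R[r2]
    if n1 >= n2:                    # the larger region keeps its seed as the label
        I[r2] = r1
        R[r1] = n1 + n2
    else:
        I[r1] = r2
        R[r2] = n1 + n2
```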
Fig. 10: (a) Logo binary test image, and (b) its labeled version.
Fig. 13: Final placed and routed Parallel Union-Find Algorithm.
Fig. 11: (a) Butterfly test image, (b) its binary image (thresholded at 100), and the (c) first, (d) second, and (e) final labeling stage results.
Table I summarizes the SoC specifications and compares them with the closest recently published implementations of image labeling algorithms, mainly in terms of operating frequency, processing rate (frames/second), and testing resolution. From Table I, one can clearly see that our SoC frequency of 350 MHz is much higher than that of any of the cited designs. Similarly, the estimated processing rate of 2100 fps of our SoC is much higher than the processing rates reported by the cited works at their respective testing resolutions. This high processing rate demonstrates the suitability of the proposed SoC for real-time applications. Furthermore, our worst-case estimated memory requirement (in bytes) is given by
Memory Requirement = ⌈log₂(ℛ)⌉ ℛ / 4,   (5)
where ⌈·⌉ is the ceiling function and ℛ denotes the image resolution (in pixels). This is much smaller than the memory requirement reported in [16], which is given by
Memory Requirement_[16] = [ℛ + MaxRuns × 2 log₂(max(M, N)) + 3 log₂(MaxRuns)] / 8,   (6)
where M, N and MaxRuns denote the image width, the image length, and a design parameter, respectively. Finally, our chip area is only 15,950.5 μm² and the required power is only 0.301 mW, values that are low enough for devices with limited power sources.
TABLE I: PARALLEL UNION-FIND SOC VS. OTHER LABELING ALGORITHM IMPLEMENTATIONS – PERFORMANCE COMPARISON.
Performance Metric      | This Work           | [17]    | [16]    | [18]
Parallel Labeling       | Yes                 | No      | No      | No
Maximum Resolution      | Scalable            | 480,000 | 480,000 | N/A
Testing Resolution (ℛ)  | 65,536              | 76,800  | 307,200 | 243,600
Memory Storage (bytes)  | ⌈log₂(ℛ)⌉ℛ/4        | N/A     | See (6) | N/A
Frame Rate (fps)        | 2100 (estimated)    | 1200    | 34      | 575
Operating Frequency     | 350 MHz             | 100 MHz | 43 MHz  | 140 MHz
Total Power             | 0.3011 mW           | N/A     | N/A     | N/A
Total Area              | 15,950.5 μm²        | N/A     | N/A     | N/A
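As a quick numerical check of (5), a two-line sketch (the reading that (5) accounts for the two ⌈log₂(ℛ)⌉-bit label memories is our interpretation):

```python
from math import ceil, log2

def memory_bytes(R_pixels):
    # Equation (5): two memories of R_pixels words, each word ceil(log2(R)) bits wide.
    return ceil(log2(R_pixels)) * R_pixels / 4

print(memory_bytes(256 * 256))  # 262144.0 bytes for the 65,536-pixel testing resolution
```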
IV. CONCLUSION
In this paper, a new fast and scalable Parallel Union-Find algorithm for image segmentation and its SoC implementation were presented. The algorithm labels both foreground and background pixels using the minimum possible number of pixel scans. Thanks to the parallel labeling, monitoring the image segments is very fast and efficient. With the adopted ASIC design flow, the SoC achieves a 350 MHz operating frequency, an estimated processing rate of 2100 fps, a small area of 15,950.5 μm² (with off-chip memory) and a very low power consumption of only 0.3 mW. Hence, the algorithm and its SoC are excellent candidates for mobile devices and real-time applications.
REFERENCES
[1] A. Sluzek, Local Detection and Identification of Visual Data: Selected Techniques and Applications, LAP, Sept. 2013.
[2] D. J. C. Santiago, T. I. Ren, G. D. C. Cavalcant and T. I. Jyh, "Fast Block-Based Algorithms For Connected Components Labeling," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, May 2013.
[3] C. Grana, D. Borghesani and R. Cucchiara, "Fast Block Based Connected Components Labeling," in IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, November 2009.
[4] F. Qin, J. Guo and F. Lang, "Superpixel Segmentation for Polarimetric SAR Imagery Using Local Iterative Clustering," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 1, pp. 13-17, January 2015.
[5] C. Grana, D. Borghesani and R. Cucchiara, "Optimized Block-Based Connected Components Labeling With Decision Trees," IEEE Trans. on Image Processing, vol. 19, no. 6, pp. 1596-1609, June 2010.
[6] L. He, Y. Chao and K. Suzuki, "A Run-Based Two-Scan Labeling Algorithm," IEEE Trans. on Image Processing, vol. 17, no. 5, pp. 749-756, May 2008.
[7] L. He, Y. Chao and K. Suzuki, "A Run-Based One-And-A-Half-Scan Connected-Component Labeling Algorithm," International Journal of Pattern Recog. and Artificial Intellig., vol. 24, no. 4, pp. 557-579, 2010.
[8] L. He, Y. Chao and K. Suzuki, "Two Efficient Label-Equivalence-Based Connected-Component Labeling Algorithms for 3-D Binary Images," IEEE Trans. on Image Processing, vol. 20, no. 8, pp. 2122-2134, August 2011.
[9] N. Ma, D. G. Bailey and C. T. Johnston, "Optimised Single Pass Connected Components Analysis," in International Conference on ICECE Technology, Taipei, December 2008.
[10] K. Suzuki, I. Horiba and N. Sugie, "Linear-Time Connected Component Labeling Based on Sequential Local Operations," Computer Vision and Image Understand., vol. 89, no. 1, pp. 1-23, 2003.
[11] J. Matas, O. Chum, M. Urban and T. Pajdla, "Robust Wide Baseline Stereo From Maximally Stable Extremal Regions," in 13th British Machine Vision Conference, Cardiff, 2002.
[12] E. Salahat, H. Saleh, S. Salahat, A. Sluzek, M. Al-Qutayri, B. Mohammed and M. Ismail, "Extended MSER Detection," in IEEE International Symposium on Industrial Electronics, Rio de Janeiro, Brazil, 3-5 June 2015.
[13] R. Sedgewick, Algorithms, Addison-Wesley, 19 March 2011.
[14] E. Salahat, H. Saleh, A. Sluzek, M. Al-Qutayri, B. Mohammad and M. Ismail, "A Maximally Stable Extremal Regions System-on-Chip For Real-Time Visual Surveillance," in Annual Conference of the IEEE Industrial Electronics Society, Yokohama, Japan, 9-12 November 2015.
[15] E. Salahat, H. Saleh, A. Sluzek, M. Al-Qutayri, B. Mohammad and M. Ismail, "Architecture and Method for Real-Time Parallel Detection and Extraction of Maximally Stable Extremal Regions (MSERs)," U.S. Patent (Pending).
[16] K. Appiah, A. Hunter, P. Dickinson and J. Owens, "A Run-Length Based Connected Component Algorithm for FPGA Implementation," in International Conference on ICECE Technology, Taipei, December 2008.
[17] V. S. Kumar, K. Irick, A. Al-Maashri and N. Vijaykrishnan, "A Scalable Bandwidth Aware Architecture for Connected Component Labeling," in Annual Symposium on VLSI, Kefalonia, July 2010.
[18] H. Flatt, S. Blume, S. Hesselbarth, T. Schunemann and P. Pirsch, "A Parallel Hardware Architecture for Connected Component Labeling Based on Fast Label Merging," in Int. Conf. on Application-Specific Systems, Architectures and Processors, Leuven, July 2008.