Article

GPU-Accelerated Foreground Segmentation and Labeling for Real-Time Video Surveillance

Wei Song 1,2,*, Yifei Tian 1, Simon Fong 3, Kyungeun Cho 4,*, Wei Wang 5 and Weiqiang Zhang 4


1 Department of Digital Media Technology, North China University of Technology, Beijing 100144, China; [email protected]
2 Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, North China University of Technology, Beijing 100144, China
3 Department of Computer and Information Science, University of Macau, Macau, China; [email protected]
4 Department of Multimedia Engineering, Dongguk University, Seoul 04620, Korea; [email protected]
5 Guangdong Electronic Industry Institute, Dongguan 523808, China; [email protected]
* Correspondence: [email protected] (W.S.); [email protected] (K.C.); Tel.: +86-10-8880-1991 (W.S.); +82-10-9070-9342 (K.C.)

Academic Editor: James Park Received: 30 May 2016; Accepted: 22 August 2016; Published: 29 September 2016

Abstract: Real-time and accurate background modeling is an important research topic in the fields of remote monitoring and video surveillance. Meanwhile, effective foreground detection is a preliminary requirement and decision-making basis for sustainable energy management, especially in smart meters. The environment monitoring results provide a decision-making basis for energy-saving strategies. For real-time moving object detection in video, this paper applies parallel computing technology to develop a feedback foreground–background segmentation method and a parallel connected component labeling (PCCL) algorithm. In the background modeling method, pixel-wise color histograms in graphics processing unit (GPU) memory are generated from sequential images. If a pixel color in the current image does not lie near the peaks of its histogram, it is segmented as a foreground pixel. From the foreground segmentation results, a PCCL algorithm is proposed to cluster the foreground pixels into several groups in order to distinguish separate blobs. Because the noisy spots and sparkles in the foreground segmentation results always contain a small quantity of pixels, the small blobs are removed as noise in order to refine the segmentation results. The proposed GPU-based image processing algorithms are implemented using the compute unified device architecture (CUDA) toolkit. The testing results show a significant enhancement in both speed and accuracy.

Keywords: feedback background modeling; connected component labeling; parallel computation; video surveillance; sustainable energy management

1. Introduction

In sustainable energy-saving strategies, smart meters provide effective power-reduction support by controlling electricity with intelligent functions [1]. Figure 1 shows the architecture of a sustainable energy management system consisting of three modules: monitoring, ubiquitous controlling, and power management. The monitoring module aims to sense environmental information, including temperature, humidity, foreground objects, and other datasets. These sensing datasets are converted into several condition variables for evaluating the environment situation, which are then transmitted to smart meters via a low-cost wireless sensor network based on the ZigBee communication protocol [2]. Based on the received condition variables, the ubiquitous

Sustainability 2016, 8, 916; doi:10.3390/su8100916

www.mdpi.com/journal/sustainability


controlling module achieves an auto-controlling approach for energy management using smart meters. Meanwhile, the smart meters enable remote operation of switching and setting values through cloud computing and controlling. In the power management module, the power socket of an appliance is linked with a smart meter, whose controlling signals are able to control the connected appliance, such as changing the temperature of an air conditioner, turning lights on or off, heating water, and other activities.

In such a sustainable energy management system, real-time foreground object detection in the monitoring module is an important task for smart controlling [3]. For example, in a garage, when a moving vehicle appears, the nearby lightings need to be turned on; if no foreground object exists, the lightings should be turned off so as to save electricity. To provide an effective environment monitoring approach, this paper aims to study a real-time and accurate foreground object detection algorithm for sustainable energy management. Meanwhile, fast and accurate foreground segmentation for video surveillance is a challenging task in image processing and computer vision [4]. This approach is also necessary for many multimedia applications, such as virtual reality, human–machine interaction, and mixed reality [5].

Figure 1. The framework of the typical sustainable energy management system.

Precise background subtraction is an important step for accurate foreground segmentation. Traditionally, a background model is initialized from the sequential images captured by stationary cameras [6]. The foreground pixels are subtracted with a pixel-wise difference detection method between the current image and the generated background model. To improve the segmentation accuracy, adaptive background modeling methods are widely researched for situations where illumination is gradually changing, such as the codebook model [7]. Because the image scanning process causes huge computation, the traditional background model, such as the codebook model, is initialized in the training phase but not updated simultaneously with the segmentation process. Although simple spatiotemporal background modeling strategies enable foreground detection without a prior knowledge base, noise and hole regions always exist in the segmentation results [8]. To enhance the foreground segmentation accuracy, some advanced background modeling algorithms are studied, such as adaptive background learning, the Gaussian mixture model (GMM), the self-organizing map, etc. [9]. However, it is difficult to segment foreground moving objects without noise in unconstrained outdoor environments in rainy, cloudy, foggy, and windy situations. Due to these weather issues, illumination variance of an air or object particle causes a short-term


noisy spot. Using the traditional algorithms, it is hard to determine whether the noisy spot belongs to the background or foreground model. The noisy spot is a blob of a small quantity of pixels in the segmented binary foreground result. Based on this definition, an effective solution is to count the pixels in each blob and remove the small blobs as noise. Connectivity-based clustering methods are widely researched to group the foreground pixels into several coherent blobs in the spatial domain [10]. By labeling blobs with distinct labels, we produce distinct objects for further processing steps such as object tracking and analysis [11].

In the rough foreground segmentation process, the computational cost of pixel-wise difference detection is so low that foreground pixels can be subtracted quickly using CPU programming. However, in the refinement process, the iterative scanning of connected component detection, labeling, and noise removal causes huge computational consumption. It is difficult to utilize the traditional CPU computation method for real-time connected component labeling (CCL).

To further improve the processing speed, this paper proposes to utilize a graphics processing unit (GPU) programming method instead of the CPU to speed up the feedback background modeling, foreground segmentation, and CCL algorithms. Using GPU programming, we create a thread for each grid cell in order to analyze its local dataset in parallel [12]. In pixel-wise foreground segmentation algorithms, the image processing at each pixel is independent of the others, so the traditional CPU-based algorithm is easy to implement in parallel [13]. For each pixel, we create a color histogram to record its color-changing information, which combines into a background model. If a pixel color in the current image does not lie near the peaks of its histogram, it is segmented as a foreground pixel. Meanwhile, the current image updates the background model synchronously.
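The per-pixel decision can be sketched on the CPU as follows. The peak-neighborhood test and the values of `peak_ratio` and `radius` are illustrative assumptions, not the paper's exact rule; one 256-bucket histogram per channel is assumed:

```python
import numpy as np

def is_foreground(hist, color, peak_ratio=0.5, radius=2):
    # One 256-bucket appearance-count histogram per pixel and channel.
    # A color counts as background when some bucket within `radius` of it
    # holds at least `peak_ratio` of the histogram's maximum count.
    peak = hist.max()
    if peak == 0:
        return True                     # empty model: treat as foreground
    lo, hi = max(0, color - radius), min(256, color + radius + 1)
    return bool(hist[lo:hi].max() < peak_ratio * peak)

# Toy model: a pixel that has mostly observed color 100, once color 180.
hist = np.zeros(256, dtype=np.int64)
for c in (100, 100, 101, 99, 100, 180):
    hist[c] += 1

print(is_foreground(hist, 100))   # near the histogram peak -> False
print(is_foreground(hist, 180))   # rare color              -> True
```

On the GPU, one thread would evaluate this test for its own pixel, which is why the segmentation step parallelizes so cleanly.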
In CCL algorithms, the image processing of a pixel is affected by its neighboring pixels [14]. When we apply parallel computation to such dependent situations, a thread event on the neighboring grid cells keeps being processed regardless of the results produced by the threads of the neighboring pixels [15]. Because the GPU threads do not proceed at the same speed, the parallel computing results are not necessarily the same for these dependent situations. For example, in a CCL algorithm, we need to label a clique with the minimum value among its members, while all labels update based on their neighboring labels simultaneously. If the thread of a neighboring pixel is faster than the others, its label will be updated earlier within its local clique, causing uncertain labeling results. If, instead, the pixel thread is locked until its neighboring computations are finished, there is no difference between the CPU and GPU processing methods. For CCL acceleration, we develop a novel parallel connected component labeling (PCCL) algorithm to speed up the image processing of the dependent situation. After the foreground pixels are clustered into several blobs, the pixel count of each blob is computed. A noise blob is assumed to have a small quantity of pixels and is removed based on the PCCL results.

The main contributions of the proposed PCCL method consist of accurate foreground segmentation and real-time component labeling. The GPU-based feedback background modeling algorithm is able to update the background model of pixel-wise color histograms in real time. The PCCL algorithm increases the labeling speed for component labeling in high-resolution videos. Our proposed method is compatible with real-time remote monitoring, virtual-physical collaboration, sensor networks, and other multimedia applications.

The remainder of this paper is organized as follows. Section 2 provides an overview of related work. Section 3 introduces the proposed feedback background modeling method and the PCCL algorithm.
Section 4 evaluates the performance of the proposed methods. Section 5 concludes the paper.

2. Related Works

Currently, feedback background modeling algorithms are widely researched for the detection of moving objects. Cuevas et al. [16] presented a nonparametric background modeling strategy with a particle filter for lightweight smart camera applications. In this strategy, the probability density


functions based on Gaussian kernels were defined at each pixel to nonparametrically estimate whether it belonged to the image background or moving foreground regions. Ramasamy et al. [17] and Casares et al. [18] employed a temporal difference detection method to generate adaptive background models in dynamic scenes for surveillance enhancement. By detecting the difference between consecutive frames, the foreground regions of interest were removed. The background model was built from the segmented background image frames. Azmat et al. [19] proposed a low-cost adaptive background modeling method by analyzing temporal scene events, including observation frequency and longevity. Using such pixel-wise threshold and frame difference methods, noise and holes existed in the segmentation results of complex scenes with cluttered objects. Gallego et al. [20] proposed a foreground segmentation system with a tracking algorithm for camera-captured videos. In this system, pixel-wise difference detection between the current image and the background model was implemented in the initialization phase. The background modeling method modified the mean-shift algorithm with a feedback background model, which enhances the foreground detection accuracy. Zamalieva et al. [21] introduced a background subtraction method for complicated scenes captured from dynamic camera viewports. The background geometric structure was constructed, and the foreground pixels were detected from a series of one-direction conversions between the frames. In these methods, the feedback background modeling phase causes huge computational consumption for high-resolution video sequences.

Currently, the traditional data analysis algorithms are enhanced using GPU technologies [22], which enable parallel computation on large-scale datasets.
The compute unified device architecture (CUDA) from Nvidia (Santa Clara, CA, USA) provides GPU programming interfaces, which are widely studied for pixel-wise background modeling and foreground segmentation [23]. To extend the typical CPU-based GMM background modeling method [24], Pham et al. [25] and Boghdady et al. [26] utilized CUDA libraries to implement the background modeling process in parallel. They created a pixel-wise probability distribution of intensity, which was updated by each allocated pixel thread in parallel. To remove shadow effects under intensity changes, Fukui et al. [27] recorded illumination intensity and RGB (red, green, blue) color information in a histogram table as an adaptive background model. The intensity and color thresholds of each pixel were generated from their distributions stored in the corresponding histogram. The histogram buffer was allocated and updated in GPU memory so as to record each pixel's information in parallel. Although the background modeling and foreground segmenting processes could be implemented synchronously using these methods, noise and holes existed in the foreground segmentation results. To realize real-time and accurate foreground–background segmentation, Griesser et al. [28] applied a pixel-shader GPU programming technology to accelerate the iterative image scanning speed. The noise was removed and holes were filled in the rough segmentation results using an iterative Markov random field model. Cheng et al. [29] proposed an online learning framework for foreground segmentation to adapt to spatial and temporal changes of the background scene. Because their proposed algorithms were compatible with the parallel computation formalism, they utilized GPU programming to facilitate real-time foreground segmentation from dynamic scenes over time.
These GPU-based foreground segmentation methods achieved real-time performance, but noise still existed in difficult scenarios such as foggy and maritime environments. Connectivity-based clustering methods are studied for distinguishing and tracking objects so as to segment the foreground without noisy speckles. Wang et al. [30] applied an adaptive connected component analysis method to remove small spots of incorrect foreground pixels from the segmentation results of the visual background extractor (ViBe). Using a single-chip Field Programmable Gate Array (FPGA) for the segmentation of moving objects in a video, Appiah et al. [31] developed a novel feedback background updating method with a CCL algorithm. The moving foreground segmentation and labeling processes were implemented in parallel on the FPGA hardware platform. Using a morphological opening method, the noise was removed and the holes were filled. Jiang et al. [32] developed connectivity-based segmentation in 2D/3D sensor networks for geometric environment understanding. Flooding from the boundary nodes, the whole set of geometric cells was scanned to


construct a Reeb graph. If the neighbor cells were not mutex, they were merged into one component. The iterative scanning for counting the blobs pixel by pixel causes huge computational consumption. To speed up the connectivity detection in binary images, He et al. [33] enhanced the conventional two-scan CCL algorithm. On the first scan, all labels of a connected component were assigned the same temporary equivalent integer. On the second scan, the corresponding labels in a connected component were assigned their unique representative labels. For visual surveillance of moving objects, Hu et al. [34,35] proposed a fast 4-connected component labeling algorithm to reduce the influence of a dynamic environment, such as wave ripples. In each scan process of this algorithm, the adjacent pixels of two rows were merged into an isolated block, which was labeled with the minimum label value in the block. After the foreground pixels were roughly subtracted by an illumination difference detection method, the noise, assumed to consist of small isolated blocks, was removed. These CPU-based component labeling algorithms were able to deal with low-resolution video images, but were hard pressed to scan large foreground blobs in real time.

In contrast to these state-of-the-art sequential CCL methods, parallelism and multithreading strategies accelerated the labeling speed in high-resolution videos. Typically, there were two propagation methods implemented in the CCL kernel, including directional pass and multi-way pass. Using CUDA, Kalentev et al. [36] and Hashmi et al. [37] implemented iterative row–column scanning on 2D binary grids. After initializing all non-zero element labels with their corresponding index, this algorithm propagated the lowest label value along the rows and the columns to label all non-zero neighbors with this lowest value. Hawick et al. [38] implemented and analyzed the performance of two-pass, one-pass, and multi-pass CCL algorithms for speeding up graph component labeling. They allocated a thread for each pixel, which propagated its adjacent pixel list and labeled the pixels with the minimum label value among them. The propagation iterations were stopped and the labeling process was finished when all labels no longer changed. In the directional propagation labeling algorithms, such as one-pass, two-pass, and row–column scanning, a GPU thread scanned a row and a column at least twice. The first scan searched for the minimum label, and the second scan labeled all grids in the row and column with the minimum label. This scanning method was effective for low-resolution images. In contrast to a CPU thread, the processing speed of a GPU thread was reduced when the computational complexity became high. Thus, the row–column scanning algorithm of CCL caused computational overload for high-resolution situations. From their experiments, the multi-pass algorithms provided the best performance. In the description of their multi-pass algorithms, each thread searched and labeled the corresponding local and neighbor pixels synchronously. Figure 2 shows an example of two iterations of their method. To keep the reading and updating of labels synchronous, a GPU thread executed its function and held on until every thread finished searching for the minimum label. Also, a GPU thread only labeled the local pixel with the minimum label, instead of propagating the minimum label to its neighbor pixels. Although this method kept the consistency of the CCL results, a large number of iterations were required for large-scale datasets due to the low bandwidth of the labeling process.
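The two-scan scheme of He et al. [33] can be sketched in plain Python as below. The union-find equivalence table and the label numbering are our own implementation choices for illustration, not the paper's exact data structures:

```python
def two_scan_ccl(img):
    """Two-scan 4-connected labeling of a binary image (list of lists)."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]                          # parent[l] = representative of l

    def find(l):                          # follow the equivalence chain
        while parent[l] != l:
            l = parent[l]
        return l

    next_label = 1
    for y in range(h):                    # first scan: provisional labels
        for x in range(w):
            if not img[y][x]:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            neighbors = [l for l in (up, left) if l]
            if not neighbors:
                parent.append(next_label)
                labels[y][x] = next_label
                next_label += 1
            else:
                m = min(find(l) for l in neighbors)
                labels[y][x] = m
                for l in neighbors:       # record label equivalences
                    parent[find(l)] = m
    for y in range(h):                    # second scan: representative labels
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels

img = [[1, 1, 0, 1],
       [0, 1, 0, 1],
       [1, 0, 0, 1]]
print(two_scan_ccl(img))   # three components: labels 1, 2, and 3
```

Exactly two passes over the image suffice on a CPU; it is the sequential dependence between scans that makes this scheme hard to map onto GPU threads.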

Figure 2. The multi-pass connected component labeling (CCL) algorithm proposed by Hawick et al. [38].



Using the thread synchronization function, the parallel computation speed was reduced, because the threads had to wait for all threads to finish. Meanwhile, the labeling process became slow without propagating the minimum label to neighbor pixels. If the GPU threads processed the neighbor propagation without synchronization, the data-dependent issue in Figure 3 would arise, because the GPU threads executed at different speeds. Taking the first four pixels as an example, the labeling process in thread 4 was slower than that in thread 2, so that the labeling result "1112" was covered by "1133". This asynchronous characteristic of GPU threads caused uncertain iteration times.

Figure 3. issue of the propagation process in theingraphics processing unit (GPU)Figure 3. The Thedata-dependent data-dependent issue of the propagation process the graphics processing unit based CCL algorithm. (GPU)-based CCL algorithm.

The GPU-based CCL algorithms should address these data-dependent issues. Cabaret et al. [39] benchmarked the existing GPU-based CCL algorithms and concluded that they were all multi-pass data-independent approaches, which were not suitable for GPU acceleration. Instead of GPU programming technology, they utilized OpenMP to program a parallel version of CCL on multi-core processors. However, such a multithread architecture of CCL was still inefficient for processing the data-dependent scanning in parallel. To realize the neighbor propagation in parallel, this paper presents a novel PCCL algorithm to increase the labeling speed for the data-dependent situations.

3. GPU-Accelerated Foreground Segmentation and Labeling

This section describes the GPU-accelerated foreground segmentation and labeling methods using a GPU programming technology. We propose a feedback background modeling method for foreground segmentation from video sequences. From the foreground pixels, a PCCL algorithm is developed to cluster and label them into several distinguished blobs.

3.1. The GPU-Accelerated Framework

In contrast to CPU-based image processing methods, a GPU-accelerated framework for foreground CCL in video images is proposed, as shown in Figure 4. The framework contains two modules: foreground segmentation and labeling. Using CUDA software development kits (SDKs), we allocate (W × H − 1)/T + 1 blocks of T threads each in order to create W × H threads that compute the proposed image processing algorithms on each pixel in parallel. Here, W and H are the width and height of the video image, respectively.


In the foreground segmentation module, we allocate the video image and the background model buffers in GPU memory. In the background model, pixel-wise RGB histograms are created for recording the color-changing distribution. The background model keeps updating based on newly registered video images. Meanwhile, the foreground pixels are segmented if their colors do not lie near the peaks of their corresponding histograms.

In the CCL module, the labels of the segmented foreground pixels are initialized with their corresponding indices. After a GPU thread searches for the minimum label among a pixel and its neighboring pixels, these pixels are assigned the minimum label. This propagation process is implemented iteratively until no label changes. In this way, the segmented foreground pixels are clustered into several distinguished blobs.

Figure 4. The GPU-accelerated framework for foreground segmentation and labeling.

3.2. Foreground Segmentation by a Feedback Background Modeling Method


This section proposes a feedback background model using GPU programming technology for foreground segmentation. As shown in Figure 5, this model implements the background modeling and segmentation processes synchronously.

Figure 5. The dataflow of the feedback background modeling and foreground segmentation.

As shown in dataflow Figure 6,ofin GPU memory, we allocate pixel-wise RGB colorsegmentation. histograms and Figure 5.5.The feedback background modeling and Figure The dataflow ofthe the feedback background modeling andforeground foreground segmentation. background colors of the histograms for each pixel. To record the color changing history of red, green, and blue channels, H × W × 3 histograms are updating with the new registered color at the Asshown shownininFigure Figure6,6,ininGPU GPUmemory, memory,we weallocate allocatepixel-wise pixel-wiseRGB RGBcolor colorhistograms histogramsand and As corresponding pixels. Each histogram contains 256 integer buckets to record the color appearance backgroundcolors colorsof of the histograms for each pixel. ToTo record thethe color changing history of red, green, background histograms for each pixel. record color changing history of red, counts. When a color (r, g, b) at a pixel (x, y) is registered to the background model, the corresponding andhistogram blue channels, H ×HWi×is×W 3×histograms areare updating color atatthe the green, and blue 3 histograms updatingwith withthe the new new registered color Mi channels, at the bucket updated as follows:

correspondingpixels. pixels.Each Eachhistogram histogramcontains contains256 256integer integerbuckets bucketstotorecord recordthe thecolor colorappearance appearance corresponding counts.When Whena acolor color(r,(r,g,g,b)b)atata apixel pixel(x, (x,y)y)isisregistered registeredtotothe thebackground backgroundmodel, model,the thecorresponding corresponding counts. histogramMM i at thebucket bucketi is i isupdated updatedasasfollows: follows: histogram the i at

MWy+ x+ HWrM= MWy+M x + HWr +  11 Wy  x  HWr Wy  x  HWr MWy+ x+ HWgM +1 +257HW = MWy + x HWg+257HW MWy  x+ 1 Wy  x  HWg  257 HW  HWg  257 HW MWy+ x+ HWbM = M +2×257HW Wy+ x + HWb+2 ×1257HW + 1 M Wy  x  HWb  2257 HW

(1)

(1)

Wy  x  HWb  2257 HW

A pixel is classified into the background class by determining whether its color appears repeatedly. During the training stage, the background colors are registered around the peaks in the pixel-wise histograms. From the histograms, the background colors corresponding to the peaks are stored in the 256th buffer blocks. During the testing stage, if a pixel color in a newly registered image locates around the peaks of its RGB histograms, it is determined to be a background pixel. Otherwise, this pixel is segmented as a foreground pixel. Using GPU programming, the background training and testing stages are implemented in parallel so as to realize a synchronization approach.
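A minimal single-channel sketch of this peak test follows; the tolerance and minimum-count values are assumed parameters for illustration, not values specified in the paper:

```python
# Single-channel sketch of the peak test described above. The tolerance and
# min_count thresholds are assumptions, not parameters given in the paper.

def is_background(hist, color, tolerance=10, min_count=5):
    """Return True if `color` lies near a repeatedly appearing (peak) color
    in the 256-bucket histogram `hist`."""
    peak = max(range(256), key=lambda c: hist[c])   # dominant bucket
    return hist[peak] >= min_count and abs(color - peak) <= tolerance
```

Per pixel, the test would be applied to each of the R, G, and B histograms; the pixel is segmented as foreground if any channel fails the test.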


Figure 6. The background model of pixel-wise color histograms in GPU memory.

3.3. PCCL Algorithm


This section proposes a PCCL algorithm to cluster the segmented foreground pixels into several separate blobs using GPU programming. To realize a parallel computation approach, a GPU thread is created to process the labeling and propagation processes for each foreground pixel, shown as the red regions in Figure 7a. A thread has an ID equal to the index of the pixel it processes.

A label map L in Figure 7b is initialized from the foreground segmentation result. If a pixel i is classified as a foreground pixel, the label at i is specified as L(i) = i; otherwise, L(i) = null. In the connectivity detection stage, each foreground pixel propagates to its adjacent foreground pixels, which are labeled with the minimum label value among them. In the typical CCL algorithm, a pixel (x, y) propagates to its right and bottom pixels, as shown in Figure 8a. In our proposed PCCL, we enlarge the propagation range to accelerate the convergence of the iterative scanning. As shown in Figure 8b,c, a neighboring clique for a pixel (x, y) is defined as the adjacent and local pixels within a distance of d (d ≥ 1) from the pixel.

We illustrate the propagation in Figure 7c with d = 1. In CPU-based sequential computing methods, the propagation process is implemented pixel by pixel. In contrast, we process all foreground pixels individually in parallel by H × W threads. Because a foreground pixel and its neighboring foreground pixels propagate to their neighboring cliques synchronously, the computation of the local and neighboring labels is not certain to be finished, so the labeling result of a single pass is not always the same. When this parallel propagation process is executed iteratively until all label values no longer change, the connected pixels are labeled with the lowest label value among them after several iterations, as shown in Figure 7d.
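The propagation above can be made concrete with a serial Python simulation of the PCCL iteration for d = 1 (a pixel plus its four adjacent pixels); in CUDA each pixel would be one thread, whereas here the per-pixel bodies simply run in a loop until the label map stops changing:

```python
# Serial simulation of the PCCL propagation (d = 1 clique: a pixel and its
# four adjacent pixels). In the CUDA version each pixel is one thread; here
# the thread bodies are looped until no label changes (convergence).

def pccl(fg, W, H):
    """fg: flat list of booleans (foreground mask). Returns the label map,
    where each connected foreground pixel ends up carrying the minimum pixel
    index of its component and background pixels carry None."""
    L = [i if fg[i] else None for i in range(W * H)]  # initial labels L(i) = i
    changed = True
    while changed:
        changed = False
        for i in range(W * H):                # one "thread" per pixel
            if L[i] is None:
                continue
            x, y = i % W, i // W
            clique = [i]                      # center plus foreground neighbors
            if x > 0 and L[i - 1] is not None: clique.append(i - 1)
            if x < W - 1 and L[i + 1] is not None: clique.append(i + 1)
            if y > 0 and L[i - W] is not None: clique.append(i - W)
            if y < H - 1 and L[i + W] is not None: clique.append(i + W)
            m = min(L[j] for j in clique)
            for j in clique:                  # write m to all clique labels
                if L[j] != m:
                    L[j] = m
                    changed = True
    return L
```

On a mask with two separate blobs, each blob converges to the smallest pixel index it contains, matching the labeling result illustrated in Figure 7d.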


Figure 7. A processing example of the proposed parallel connected component labeling (PCCL) algorithm. (a) The segmented foreground pixels; (b) the initialized label map; (c) the propagation process from any foreground pixel to its neighboring clique; (d) the labeling result.


Figure 8. The propagating neighboring clique. (a) The typical clique; (b) the proposed clique of d = 1; (c) the proposed clique of d = 2.

3.4. Components Extraction

In object tracking applications, distinguished components need to be extracted individually from the labeling result. Also using GPU programming, we propose a parallel components extraction method, as shown in Figure 9.

A component label cl is defined as the value shared by a component's pixel labels, which is the minimum pixel index among them. By traversing all labels using H × W threads, a cl is selected as any label that equals its own index. Then, we extract the pixels belonging to component cl and set their labels to null. This process is implemented iteratively until no component label changes in the label map.

Besides object tracking applications, this method provides a clue for removing noise from the foreground segmentation. The noise is always formed as a blob containing a small quantity of pixels. If the extracted pixel count of a component is small, we determine that this component is noise and remove it from the segmentation results.
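Given the converged PCCL label map, the extraction and noise-removal steps can be sketched serially as follows; the size threshold is an assumed parameter, not a value from the paper:

```python
# Sketch of component extraction and small-blob removal from a PCCL label
# map; min_size is an assumed noise threshold, not a value from the paper.

def extract_components(L, min_size):
    """Group pixel indices by component label, dropping small blobs as noise.
    L: flat label map (None for background). Returns {label: [pixel indices]}."""
    components = {}
    for i, label in enumerate(L):
        if label is not None:
            components.setdefault(label, []).append(i)
    # Blobs with fewer than min_size pixels are treated as noise and removed.
    return {cl: px for cl, px in components.items() if len(px) >= min_size}
```

Each surviving dictionary key is a component label cl (the minimum pixel index of that blob), and isolated specks such as single-pixel sparkle are discarded.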

Figure 9. The dataflow of the components extraction from the labeling result computed by PCCL.

4. Experiments and Analysis

In this section, we analyze the performance of the proposed GPU-accelerated foreground segmentation and labeling method. The experiments were implemented on a 3.20 GHz Intel® Core™ Quad CPU computer (Intel, Santa Clara, CA, USA) with a GeForce GT 770 graphics card (Nvidia, Santa Clara, CA, USA) and 4 GB RAM. The GPU-based parallel computation was implemented using the CUDA toolkit V5.5 (Nvidia, Santa Clara, CA, USA).

4.1. Foreground Segmentation Performance Analysis

Figure 10 shows the experimental results of the proposed methods, which were executed on several typical and challenging scenarios from a crowd dataset (PETS2009) [40], a vehicle dataset (AVSS2007, London, UK) [41], a jug dataset (ICCV2003, Nice, France) [42], and the WaveTree dataset (Microsoft Research, Redmond, WA, USA) [43], from left to right. In these datasets, the camera viewports were stationary. The crowd dataset contained a 768 × 576 AVI video for the evaluation of moving pedestrian detection. The vehicle dataset had 720 × 576 video sequences for the evaluation of vehicle segmentation, and contained parked vehicles. The jug dataset contained a set of 320 × 240 sequential images for evaluating the segmentation of a moving semitransparent jug in maritime environments where wave ripples existed. The WaveTree dataset was a low-resolution dataset of 160 × 120 sequential images, which had a waving tree as a background object. The available buffer size of the applied NVIDIA hardware was 5622 megabytes, which satisfied the GPU memory requirement of the background modeling for our experiments.

After several frames of background model updating, the background images in Figure 10b were generated from the color values located at the peaks of the color histograms. Then, we implemented the proposed PCCL algorithm on the foreground pixels segmented using the generated background model. The connected components were rendered with distinguished colors in the labeling results, shown in Figure 10c. After the components containing a small quantity of pixels were removed as noise, the foreground segmentation results were refined and the foreground objects were extracted as large components, shown in Figure 10d. Finally, the foreground component pixels were extracted in Figure 10e by masking the binary map in Figure 10d onto the original images.


Figure 10. The experimental results of the proposed foreground segmentation method using the PCCL algorithm. (a) The original video images; (b) the background images generated from the histogram peaks in the background model; (c) the labeling results; (d) the foreground component extraction without noise; (e) the foreground pixel extraction results.

In order to analyze the performance of our proposed foreground segmentation and PCCL methods, we conducted the same experiments using a self-adaptive codebook model proposed by Shah et al. [44] and the typical GMM proposed by Stauffer and Grimson [24]. Following the codebook and GMM descriptions in [24,44], we programmed the codebook and the GMM using the OpenCV library (Itseez, San Francisco, CA, USA).

Figures 11 and 12 illustrate the qualitative comparison results of these foreground segmentation methods on the 768 × 576 crowd dataset and the 640 × 480 campus dataset, respectively. Figures 11a and 12a show several sampling frames of the original videos. The foreground segmentation results using the codebook model are shown in Figures 11b and 12b, which contained noise caused by illumination changes and small waving objects. Although the GMM removed the noisy spots caused by varying illumination, the spattered noise of waving objects still existed in the segmentation results, as shown in Figures 11c and 12c. Using the PCCL algorithm, our proposed method removed both the spot and spattered noise of a small quantity of pixels and provided accurate foreground detection results in Figures 11d and 12d.


Figure 11. The qualitative comparison results on the crowd dataset. (a) The original video images; (b) the segmentation results using the codebook model; (c) the segmentation results using the GMM; (d) the segmentation results using our proposed method.


Figure 12. The qualitative comparison results on our campus dataset. (a) The original video images; (b) the segmentation results using the codebook model; (c) the segmentation results using the GMM; (d) the segmentation results using our proposed method.

Figure 13 shows the speed performances of the foreground segmentation and labeling methods compared with the self-adaptive codebook model, the GMM, and the GPU GMM proposed by Pham et al. [25]. We implemented the training and testing stages synchronously. The average processing speeds of the foreground segmentation in the 640 × 480 resolution dataset were 39.3867 frames per second (fps), 6.8981 fps, 29.1489 fps, and 56.3287 fps using the codebook model, the GMM, the GPU GMM, and the proposed feedback model with the PCCL, respectively. For the 768 × 576 resolution dataset, the processing speeds were reduced to 15.8867 fps, 4.6507 fps, 20.6014 fps, and 34.1488 fps. The results reflect that when the video resolution became high, the foreground segmentation using the codebook and GMM models could not be implemented in real time. A GPU thread is fit for computing simple programs in parallel, but becomes slow when processing complex algorithms. The Gaussian probability density function computation was so complex that the GPU computation speed of the GMM was not fast enough for real-time processing of high-resolution videos. Although the processing speed of our method was reduced in the high-resolution situation, it was more than 30 fps and satisfied the requirement of real-time monitoring.

80

Our feedback model with the PCCL

Processing speed (fps)

70

60

50

Codebook 40

GPU GMM

30

20

GMM

10

0

0

100

200

300

400

500

600

700

800

Frames

(a) 60

Processing speed (fps)

50

Our feedback model with the PCCL

40

30

GPU GMM 20

Codebook 10

GMM 0

0

100

200

300

400

500

600

700

800

Frames

Figure 13. The foreground segmentation speed performances of the proposed feedback model with the PCCL, compared with the codebook, the GMM, and the GPU GMM. (a) The dataset of 640 × 480 resolution; (b) the dataset of 768 × 576 resolution.


4.2. PCCL Performance Analysis

The existing CPU-based directional-pass and multi-way-pass CCL methods utilize a single thread to propagate the minimum label to the local and neighboring pixels, pixel by pixel. Their principles are similar, as described by He et al. [33] and Hu et al. [34]. Their different iteration counts and processing speeds were caused by the different label array ranges and testing resolutions. If the label array range was enlarged, the minimum label of the large range was quickly searched and propagated over that range in one execution; however, the processing speed became slower due to the greater computation required in the larger labeling range.

To compare the performances of the PCCL and the typical CCL methods, we programmed the CPU-based CCL proposed by Hu et al. [34] and the GPU-based multi-pass CCL method proposed by Hawick et al. [38], which were implemented on the 768 × 576 resolution dataset. In these methods, the label array of a labeling execution was selected based on the propagating neighboring clique definition in Figure 8b.

Figure 14 shows the relation between the processing speeds and iteration times of these methods. Because the CPU thread scanned the video images pixel by pixel during each iteration, many labeling iterations were required, so the labeling speed was slow when the foreground pixels occupied large regions in high-resolution images. As shown in Figure 14a, the speed of the CPU-based CCL decreased faster as more propagating iterations were executed. The average labeling speed of the CPU-based method was 41.31 fps with 14.40 iterations. Approximately, if more than 29 iterations were executed, the labeling speed would be reduced to below 20 fps, which does not satisfy the real-time requirement.

Different from CPU programming, the GPU-based labeling methods processed all pixels in parallel, so the labeling speeds did not decrease much even as the iterations increased. The GPU-based multi-pass CCL method could be implemented in real time for low-resolution images. However, this method required a large number of propagation iterations for high-resolution images, because a GPU thread only updated the center pixel in a clique with the minimum label; the labeling convergence became slow due to this ineffective propagation process. As shown in Figure 14b, 121.98 iterations were executed using the GPU CCL in our experiment, and the labeling speed was reduced to 27.57 fps on average. Although the ranges of the label array in the GPU CCL and the CPU CCL were the same, the propagation bandwidth of the GPU CCL was smaller, so more iterations were required.

In our PCCL implementation, in contrast, a GPU thread labels its corresponding local and neighboring pixels with the minimum label among its clique. With the same label array range as the GPU CCL, we update five labels per GPU thread instead of only one, so the labeling convergence is accelerated; the iteration count is reduced and the labeling speed increases. As shown in Figure 14c, 64.32 iterations were executed using the PCCL, and the labeling speed was 71.35 fps on average. Although our proposed PCCL executed more iterations than the CPU CCL, the parallel computing mechanism accelerated the labeling speed. Thus, the PCCL was able to provide the best performance for high-resolution images.

Sustainability 2016, 8, 916


Figure 14. The relation between the labeling speed and the iteration times, executed in the dataset of 768 × 576 resolution. (a) The performance of the CPU-based CCL; (b) the performance of the GPU-based multi-pass CCL; (c) the performance of the proposed PCCL.

4.3. Sustainable Energy Management Application

Figure 15 illustrates a video surveillance application for a sustainable energy management system using the proposed GPU-accelerated foreground segmentation and labeling algorithms. In this project, a smart meter developed by Beijing Innolinks Technology Co. Ltd. (Beijing, China) [45] was utilized for ubiquitous control of the appliances, including air conditioning, water heater, and lighting.

Figure 15. The proposed GPU-accelerated foreground segmentation and labeling algorithms applied in a sustainable energy management system with smart meters.

In the monitoring module, the environment information was recorded by multiple sensors and reported to the smart meter via ZigBee communication. In our application, the ambient temperature was sensed by the thermometer mounted on the smart meter, and the foreground objects were detected by our proposed method. Based on these sensed datasets, the smart meter controlled the appliances for electricity saving by changing operation parameters or switching the power supply. Meanwhile, the smart meter enabled remote operation by smart phones through Cloud services for ubiquitous control.

In this project, our contribution was to develop a real-time foreground object detection method that provides the smart meter with a decision-making basis. When the number of people in a workplace decreased, the smart meter turned up the temperature of the air conditioning. When a vehicle moved in a dark garage, the lights around the car were turned on. When a person came into a room, the appliances connected with the smart meters were turned on. Using such smart energy management, the appliances were able to save around 20%–50% of energy.

5. Conclusions

Recently, video surveillance systems, such as traffic analysis, indoor and outdoor security, people detection and tracking, and remote controlling, have been widely studied in intelligent monitoring as an important issue of future sustainable computing. This paper developed real-time foreground detection and labeling methods to segment the foreground pixels in video sequences. In the foreground detection process, we created a background model of pixel-wise color histograms in GPU memory to record the color changes of all pixels. The background model was updated with newly registered frames, while foreground pixels were detected if their colors did not locate around the peaks of the corresponding histograms. To refine the segmentation results, we developed a PCCL algorithm to cluster the roughly segmented foreground pixels into distinguishable connected blobs. A GPU thread labeled a foreground pixel and its neighboring foreground pixels with the minimum label among them. This propagation process was iterated until no label values changed. The noise blobs containing a small quantity of pixels were removed to increase the foreground segmentation accuracy. The experimental results confirmed that the proposed method was able to deal with data-dependent situations in parallel and achieve real-time performance. The real-time and accurate foreground object detection enhanced the auto-controlling efficiency of the smart meter for


realizing sustainable energy management. Using our proposed method, the connected components were clustered, but it was hard to distinguish two objects connected with each other. In the future, we will study an object tracking method for distinguishing objects in crossing situations.

Acknowledgments: This research was supported by Beijing Innolinks Technology Co. Ltd., by the National Natural Science Foundation of China (61503005), by the Science and Technology Project of Beijing Municipal Education Commission (KM2015_10009006), by the young researcher foundation of NCUT (XN070027), by the Advantageous Subject Cultivation Fund of NCUT, and by the SRF for ROCS, SEM.

Author Contributions: All the authors contributed equally to this work. All authors read and approved the final manuscript.

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

GPU     Graphics processing unit
CCL     Connected component labeling
PCCL    Parallel connected component labeling
CUDA    Compute Unified Device Architecture
FPGA    Field Programmable Gate Array
GMM     Gaussian mixture model
fps     Frames per second

References

1. Carroll, J.; Lyons, S.; Denny, E. Reducing household electricity demand through smart metering: The role of improved information about energy saving. Energy Econ. 2014, 45, 234–243. [CrossRef]
2. Saha, A.; Kuzlu, M.; Pipattanasomporn, M.; Rahman, S.; Elma, O.; Selamogullari, U.S.; Uzunoglu, M.; Yagcitekin, B. A Robust Building Energy Management Algorithm Validated in a Smart House Environment. Intell. Ind. Syst. 2015, 1, 163–174. [CrossRef]
3. Marinakis, V.; Karakosta, C.; Doukas, H.; Androulaki, S.; Psarras, J. A building automation and control tool for remote and real time monitoring of energy consumption. Sustain. Cities Soc. 2013, 6, 11–15. [CrossRef]
4. Ahn, H.; Jung, B.K.; Park, J.R. Effect of reagents on optical properties of asbestos and remote spectral sensing. J. Converg. 2014, 5, 15–18.
5. Erkan, B.; Nadia, K.; Adrian, F.C. Augmented reality applications for cultural heritage using Kinect. Hum. Cent. Comput. Inf. Sci. 2015, 5, 1–18.
6. Bayona, A.; SanMiguel, J.C.; Martinez, J.M. Comparative evaluation of stationary foreground object detection algorithms based on background subtraction techniques. In Proceedings of the 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, Genova, Italy, 2–4 September 2009; pp. 25–30.
7. Kim, K.; Chalidabhongse, T.H.; Harwood, D.; Davis, L. Real-time foreground–background segmentation using codebook model. Real Time Imaging 2005, 11, 172–185. [CrossRef]
8. Bouwmans, T.; Porikli, F.; Horferlin, B.; Vacavant, A. Background Modeling and Foreground Detection for Video Surveillance; CRC Press: Boca Raton, FL, USA; Taylor and Francis Group: Boca Raton, FL, USA, 2014.
9. Sobral, A.; Vacavant, A. A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos. Comput. Vis. Image Underst. 2014, 122, 4–21. [CrossRef]
10. Zhong, B.; Chen, Y.; Chen, Y.; Ji, R.; Chen, Y.; Chen, D.; Wang, H. Background subtraction driven seeds selection for moving objects segmentation and matting. Neurocomputing 2013, 103, 132–142. [CrossRef]
11. Das, S.; Konar, A. Automatic image pixel clustering with an improved differential evolution. Appl. Soft Comput. 2009, 9, 226–236. [CrossRef]
12. Song, W.; Cho, K. Real-time terrain reconstruction using 3D flag map for point clouds. Multimed. Tools Appl. 2015, 74, 3459–3475. [CrossRef]
13. Premanand, P.G.; Nilkanth, B.C. Content Based Dynamic Texture Analysis and Synthesis Based on SPIHT with GPU. J. Inf. Process. Syst. 2016, 12, 46–56.
14. Komura, Y.K.; Okabe, Y.T. GPU-based Swendsen–Wang multi-cluster algorithm for the simulation of two-dimensional classical spin systems. Comput. Phys. Commun. 2012, 183, 1155–1161. [CrossRef]
15. Song, W.; Wu, D.; Xi, Y.L.; Park, Y.W.; Cho, K. Motion-based skin region of interest detection with a real-time connected component labeling algorithm. Multimed. Tools Appl. 2016. [CrossRef]
16. Cuevas, C.; Garcia, N. Efficient Moving Object Detection for Lightweight Applications on Smart Cameras. IEEE Trans. Circuit Syst. Video Technol. 2013, 23, 1–14. [CrossRef]
17. Ramasamy, T.; Asirvadam, V.; Sebastian, P. Detecting Background Setting for Dynamic Scene. In Proceedings of the IEEE 7th International Colloquium on Signal Processing and Its Applications, Penang, Malaysia, 4–6 March 2011; pp. 146–150.
18. Casares, M.; Velipasalar, S.; Pinto, A. Light-weight salient foreground detection for embedded smart cameras. Comput. Vis. Image Underst. 2010, 114, 1223–1237. [CrossRef]
19. Azmat, S.; Wills, L.; Wills, S. Temporal Multi-Modal Mean. In Proceedings of the 2012 IEEE Southwest Symposium on Image Analysis and Interpretation, Santa Fe, NM, USA, 22–24 April 2012; pp. 73–76.
20. Gallego, J.; Pardàs, M.; Haro, G. Enhanced foreground segmentation and tracking combining Bayesian background, shadow and foreground modeling. Pattern Recognit. Lett. 2012, 33, 1558–1568. [CrossRef]
21. Zamalieva, D.; Yilmaz, A. Background subtraction for the moving camera: A geometric approach. Comput. Vis. Image Underst. 2014, 127, 73–85. [CrossRef]
22. Jian, L.H.; Wang, C.; Liu, Y.; Liang, S.S.; Yi, W.D.; Shi, Y. Parallel data mining techniques on Graphics Processing Unit with Compute Unified Device Architecture (CUDA). J. Supercomput. 2013, 64, 942–967. [CrossRef]
23. Wilson, B.; Tavakkoli, A. An Efficient Non Parametric Background Modeling Technique with CUDA Heterogeneous Parallel Architecture. In Proceedings of the 11th International Symposium on Visual Computing, Las Vegas, NV, USA, 14–16 December 2015; pp. 210–220.
24. Stauffer, C.; Grimson, W.E.L. Adaptive background mixture models for real-time tracking. In Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, USA, 23–25 June 1999; pp. 252–258.
25. Pham, V.; Vo, P.; Thanh, H.V.; Hoai, B.L. GPU Implementation of Extended Gaussian Mixture Model for Background Subtraction. In Proceedings of the IEEE International Conference on Computing and Telecommunication Technologies, Hanoi, Vietnam, 1–4 November 2010; pp. 1–4.
26. Boghdady, R.; Salama, C.; Wahba, A. GPU-accelerated real-time video background subtraction. In Proceedings of the 10th IEEE International Conference on Computer Engineering & Systems, Cairo, Egypt, 23–24 December 2015; pp. 34–39.
27. Fukui, S.; Iwahori, Y.; Woodham, R. GPU Based Extraction of Moving Objects without Shadows under Intensity Changes. In Proceedings of the IEEE Congress on Evolutionary Computation, Hong Kong, China, 1–6 June 2008; pp. 4165–4172.
28. Griesser, A.; Roeck, S.D.; Neubeck, A. GPU-Based Foreground-Background Segmentation using an Extended Colinearity Criterion. In Proceedings of the 2005 Vision, Modeling, and Visualization, Erlangen, Germany, 16–18 November 2005; pp. 319–326.
29. Cheng, L.; Gong, M. Real Time Background Subtraction from Dynamics Scenes. In Proceedings of the IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009; pp. 2066–2073.
30. Wang, J.; Lu, Y.; Gu, L.; Zhoua, C.; Chai, X. Moving object recognition under simulated prosthetic vision using background-subtraction-based image processing strategies. Inf. Sci. 2014, 277, 512–524. [CrossRef]
31. Appiah, K.; Hunter, A.; Dickinson, P.; Meng, H. Accelerated hardware video object segmentation: From foreground detection to connected components labeling. Comput. Vis. Image Underst. 2010, 114, 1282–1291. [CrossRef]
32. Jiang, H.; Yu, T.; Tian, C.; Tan, G.; Wang, C. CONSEL: Connectivity-based Segmentation in Large-Scale 2D/3D Sensor Networks. In Proceedings of the 31st Annual IEEE International Conference on Computer Communications (INFOCOM 2012), Orlando, FL, USA, 25–30 March 2012; pp. 2086–2094.
33. He, L.F.; Chao, Y.Y.; Suzuki, K.J.; Wu, K.S. Fast connected-component labeling. Pattern Recognit. 2009, 42, 1977–1987. [CrossRef]
34. Hu, W.C.; Yang, C.Y.; Huang, D.Y. Robust real-time ship detection and tracking for visual surveillance of cage aquaculture. J. Vis. Commun. Image Represent. 2011, 22, 543–556. [CrossRef]
35. Hu, W.C.; Chen, C.H.; Chen, T.Y.; Huang, D.Y.; Wu, Z.C. Moving object detection and tracking from video captured by moving camera. J. Vis. Commun. Image Represent. 2015, 30, 164–180. [CrossRef]
36. Kalentev, O.; Rai, A.; Kemnitzb, S.; Schneider, R. Connected component labeling on a 2D grid using CUDA. J. Parallel Distrib. Comput. 2011, 71, 615–620. [CrossRef]
37. Hashmi, M.F.; Pal, R.; Saxena, R.; Keskar, A.G. A new approach for real time object detection and tracking on high resolution and multi-camera surveillance videos using GPU. J. Cent. South Univ. 2016, 23, 130–144. [CrossRef]
38. Hawick, K.A.; Leist, A.; Playne, D.P. Parallel graph component labelling with GPUs and CUDA. Parallel Comput. 2010, 36, 655–678. [CrossRef]
39. Cabaret, L.; Lacassagne, L.; Etiemble, D. Parallel Light Speed Labeling: An efficient connected component algorithm for labeling and analysis on multi-core processors. J. Real Time Image Proc. 2016. [CrossRef]
40. Ferryman, J.; Shahrokni, A. PETS2009: Dataset and challenge. In Proceedings of the Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Snowbird, UT, USA, 7–9 December 2009; pp. 1–6.
41. Boragno, S.; Boghossian, B.; Black, J.; Makris, D.; Velastin, S. A DSP-based system for the detection of vehicles parked in prohibited areas. In Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance, London, UK, 5–7 September 2007; pp. 260–265.
42. Zhong, J.; Sclaroff, S. Segmenting foreground objects from a dynamic textured background via a robust Kalman filter. In Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; pp. 44–50.
43. Toyama, K.; Krumm, J.; Brumitt, B.; Meyers, B. Wallflower: Principles and Practice of Background Maintenance. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; pp. 255–261.
44. Shah, M.; Deng, J.D.; Woodford, B.J. A Self-adaptive CodeBook (SACB) model for real-time background subtraction. Image Vis. Comput. 2014, 38, 52–64. [CrossRef]
45. Beijing Innolinks Technology Co. Ltd. Available online: http://www.innolinks.cn (accessed on 12 March 2014).
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
