Research Article
Coverage problem in camera-based sensor networks using the CUDA platform
International Journal of Distributed Sensor Networks 2017, Vol. 13(12) Ó The Author(s) 2017 DOI: 10.1177/1550147717746353 journals.sagepub.com/home/dsn
Jae-Hyun Seo1, Yourim Yoon2 and Yong-Hyuk Kim3
Abstract Closed-circuit televisions serve as prevention against crime, and many studies for closed-circuit television deployment have been conducted. The closed-circuit television deployment in downtown is similar to solving coverage problem in wireless camera-based sensor networks. The difference between the two problems is various environmental factors such as buildings, roads, camera capability, and movements of pedestrians. We use a genetic algorithm to increase the efficiency of closed-circuit television deployment in two-dimensional topography. In addition, a parallel experiment using general-purpose computing on graphics processing units is added to improve computing speed, which is a disadvantage in genetic algorithms. The target region is 500 m 3 500 m and consists of 50 3 50 grids. The fitness of the evaluation, which refers to a detection rate, is calculated from the corresponding cell when a pedestrian moves to each cell depending on whether the pedestrian is detected. The proposed experiment was superior to the random deployment experiment by approximately 37.5%. There was no significant difference in the detection rate between the CPU experiment and a NVIDIA GeForce GTX 970 experiment in the 95% confidence interval. The efficiency of a CUDA kernel function using the NVIDIA GeForce GTX 970 graphic card was analyzed. Keywords Sensor deployments, coverage, genetic algorithm, parallel computing, CUDA
Date received: 15 September 2017; accepted: 2 November 2017 Handling Editor: Bin Gao
Introduction Boundary coverage1–3 involves the detection of an entity by forming a boundary when an entity passes a region of interest (RoI). This is applicable to various fields such as border surveillance, protection of resources, surveillance patrol, and invasion detection. Many sensor nodes must be deployed to cover the total RoI in order to reach full coverage. However, the number of deployed sensor nodes can be substantially reduced by forming boundary coverage and detecting the moving entities. The efficiency of boundary coverage increases significantly with wider RoI, but problems occur because coverage holes appear. Boundary coverage can be classified into weak barrier coverage and strong barrier coverage methods.
Weak barrier coverage detects the moving entities that pass the RoI in a fixed path. Strong barrier coverage detects moving entities that pass the RoI in a random path.4 1
Department of Computer Science and Engineering, Wonkwang University, Iksan-si, Republic of Korea 2 Department of Computer Engineering, Gachon University, Seongnam-si, Republic of Korea 3 Department of Computer Science and Engineering, Kwangwoon University, Seoul, Republic of Korea Corresponding author: Yong-Hyuk Kim, Department of Computer Science and Engineering, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Republic of Korea. Email:
[email protected]
Creative Commons CC-BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (http://www.uk.sagepub.com/aboutus/ openaccess.htm).
2 Metaheuristic algorithms such as tabu search, gradient ascend, simulated annealing, and evolution operations are used to optimize the boundary coverage. This is a non-deterministic polynomial-time (NP) hardness problem.5 A genetic algorithm,6,7 which is a type of evolution operation, is used in this study to optimize the boundary of closed-circuit television (CCTV) deployment. Adopting parallelism is more convenient in genetic algorithms compared with other metaheuristicbased algorithms because high-quality solutions can be easily gained in a short time during parallel processing. CCTVs are used in crime prevention, criminal tracking, invasion detection, and boundary surveillance patrol. Uses of CCTV equipment are gradually increasing. CCTV’s efficacy is especially important in urban areas that have concentrated populations. It is quite difficult to deploy CCTVs efficiently in wide RoIs and on complex roads using a suitable number of CCTVs. This is an NP-hard problem and it is proper to adopt approaches that apply metaheuristicbased algorithms. Zang8 researched the boundary coverage of camera sensor networks and aimed to gain high-sensing coverage. Regardless of the direction from which an object approaches, the camera boundary is generated with full-view coverage. The camera sensor can effectively capture facial images. The authors aimed to generate a boundary using few camera sensors by analyzing a fullview coverage model and camera selection algorithm. Geometric models are used to achieve full-view coverage in both deterministic deployment and random deployment. Wang and Cao9 claimed that boundary coverage in camera sensor networks with directivity was not effective only with the sum of RoI sensing ranges. The direction of an object and the camera angle was used to measure the sensing quality. When the camera direction and direction in which the object faces are close enough, one object is completely covered from all angles regardless of its facing direction. This was used to generate a camera boundary that could completely cover all points within the RoI. They proposed a new camera deployment method to achieve boundary coverage in deterministic environments, and they reduced a redundancy to optimize the number of required cameras. Tao et al.4 researched a sensing model that had directivity. To provide strong boundary coverage, the researchers used a video camera to make sensors randomly control the direction. First, the researchers proposed to study strong boundary coverage in onedimension (1D) using a polynomial-time algorithm, which can achieve the required coverage using a minimum number of sensors. Moreover, for strong boundary coverage, a geographical relation was found between deployment region boundaries and sensors with directivity. The concept of virtual nodes was
International Journal of Distributed Sensor Networks introduced to reduce the solution space from a continuous domain into a discrete domain. A directivity boundary graph that can provide quick answers if the directions of directional sensors exist, to achieve strong boundary coverage in a given region, was generated. If such a direction exists, it was claimed that efficient solutions that minimize the maximum rotation angle of directional sensors could be found. The efficiency of the solutions was proven through simulation. Navin10 proposed a new distributed genetic algorithm to overcome camera coverage control problem, which has high time complexity. He/She divided one population into k sub-population and repeated genetic process in parallel for each sub-population. We used only one population. Genetic algorithms require a lot of computing time in fitness evaluation. We designed a CUDA kernel function to evaluate all chromosomes at the same time and the attempt led to largely reduce computation time. Simulation results showed that distributed genetic algorithm results near-optimal solution faster than standard genetic algorithm. Shih et al.11 proposed a distributed algorithm, namely CoBRA (Cone-based Barrier coveRage Algorithm), to achieve barrier coverage in wireless camera sensor networks (WCSNs). The basic concept of CoBRA is that each camera sensor can determine the local possible barrier lines according to the geographical relations with their neighbors. Experiment results show that CoBRA can efficiently achieve barrier coverage in WCSNs. Comparing to the ideal results, CoBRA can use fewer nodes to accomplish barrier coverage in random deployment scenarios. Navin et al.12 also showed the ability of genetic algorithm to solve coverage problem in WCSNs. Simulation results of their presented method showed the better coverage ratio compared with the self-orientation algorithm presented in 2008. Ma et al.13 focused on the Minimum Camera Barrier Coverage Problem (MCBCP) in WCSNs in which the camera sensors are deployed randomly in a target field. In addition, they proposed an optimal algorithm for the MCBCP. Simulation results demonstrated that their algorithm outperformed the existing algorithm. Costa and Guedes14 surveyed the state of the art of the coverage problem issue in WCSNs, regarding strategies, algorithms, and general computational solutions. Open research areas are also discussed, envisaging promising investigation considering coverage in WCSNs. This study aimed to optimize the deployment of CCTV in two dimensions. A downtown area was set as a background, and the locations and angles of CCTVs were fixed during deployment. Detection is determined as a random probability when both the pedestrian facing direction and CCTV facing direction overlap. This procedure was optimized using a genetic algorithm, and the movement detection of each pedestrian was
Seo et al.
3
processed in parallel. Our research deals with a barrier coverage problem. Barrier coverage problem is used to successfully detect events for some objects to cross sensing area. As a special case, CCTV deployments also aim to detect pedestrians crossing a region of downtown. Deterministic deployment and random deployment are types of sensor deployment. Deterministic sensor deployment is generally convenient for installation and is used in accessible environments. Random deployment is used in randomly scattered form when direct deployment is difficult. In this study, deterministic sensor deployment that identifies the locations and number of sensors in advance is used for installation, and a genetic algorithm that is a metaheuristic-based algorithm is used for optimization. Genetic algorithms include inherent parallelism differently from other heuristic algorithms such as tabu search15,16 and gradient ascend.17,18 Our research focused on the improvement on computing time. Tabu search enhances the performance of local search by relaxing a basic rule. At each step, worsening moves can be accepted if no improving move is available. Gradient ascend is a first-order iterative optimization algorithm for finding the maximum of a function. These algorithms do not include any parallelism and they use just one solution to find the optimum. The remainder of this article is organized as follows. Section ‘‘CCTV deployments’’ explains CCTV deployments based on CPUs. In section ‘‘Parallel CCTV Deployments,’’ we present CCTV placements using a parallel technique such as CUDA. In section ‘‘Experiments,’’ we explain our experimental environments, plan, and results. This article ends with some concluding remarks in section ‘‘Concluding remarks.’’
Figure 1. Relation between the direction of a moving person and that of a CCTV: (a) moving direction of pedestrian and (b) detectable direction of CCTV.
A downtown was set as the region in the experiment, and virtual topography was used to create Scenarios 1 and 2. Deployment was limited to a 2D plane, and the topography was composed only of buildings and roads. CCTVs can be deployed only on roadsides, and the detection direction and detection distance are different depending on the type of CCTV. The area of the RoI is 500 m 3 500 m and is a grid composed of 50 3 50 cells. The pedestrian moved randomly from a starting point of (x, y) = (0, 0) to a destination point of (x, y) = (49, 49). An A* algorithm19 is used as a pathsearching algorithm. The A* algorithm moves toward the destination point by calculating the shortest path. The weighted value was added to the moving direction of the pedestrian. However, the number of movements toward the shortest path was generally high owing to the properties of the A* algorithm.
CCTV deployments
CCTV types and detection capabilities
CCTV deployment is a difficult issue that must be determined by considering many variables. CCTV deployment was simplified as a problem to detect pedestrians by determining only CCTV locations and angles in a two-dimensional (2D) plane. CCTV types and detection rates were set in reference to actual products, and the detection direction was simplified into eight directions.
Detectable distance and detection rate were set as listed in Table 1 by referring to CCTV products types sold in the market. CCTV1 is based on the DS-2CD2010-I model and has a maximum visible distance of 30 m. CCTV2 is based on the OPS-BMIP720P3 model and has a maximum visible distance of 40 m. CCTV3 is based on the VSTARCAM-100BP model and has a maximum visible distance of 50 m.
Table 1. Distance and the detection rates according to CCTV type. CCTV type
CCTV1 CCTV2 CCTV3
Visibility Distance (m)
Detection rate
Distance (m)
Detection rate
0–20 0–30 0–40
0.95 0.75 0.65
20–30 30–40 40–50
0.50 0.40 0.30
CCTV: closed-circuit television.
4
International Journal of Distributed Sensor Networks
The directions of pedestrians were set in eight types, as shown in Figure 1(a). The detection directions of CCTVs were also set in eight directions. When a CCTV can detect pedestrians, the case is limited to the situation in which the pedestrian facing direction and CCTV facing direction are facing each other, as shown in Figure 1(b).
Calculation of detection ratio in each cell Probability of detection (POD) of CCTV deployment should be calculated according to the movement direction of pedestrians, because CCTV detection has a limited detection direction, unlike the scalar sensor detection, which can detect in all angles. The POD should be calculated again for new movement because the pedestrian and CCTV facing directions are different when the pedestrian moves to a new cell. After adding the number of cases for both detected and undetected cases of a pedestrian in the relevant cell by comparing the calculated PODs and random values, equation (1) is used to calculate the detection rate. Vd is the number of detected cases and Vnd is the number of undetected cases Detection ratioð%Þ = 100 3
Vd Vd + Vnd
ð1Þ
Genetic algorithm A genetic algorithm was used to deploy CCTVs in the target region, which was composed of 50 3 50 grids. The size of the population was 100, the number of generations was 100, tournament selection is used for selective operation, and the k value that indicates a selective force was set as 3. A block-even crossover operation suitable for 2D problems was used, and a variation was created with a 5% probability in a gene-wise variation operation. In a substitution operation, the solution with the lowest quality was substituted with the child
solution if the child solution in the solution population was better than the solution with the lowest quality. Selection, crossover, and variation operations are repeated in every generation as many times as the number of populations. Then, the genetic algorithm with generation type is used for a substitution operation. Table 2 lists the parameters of the CPU-based genetic algorithm. Regarding encoding, integers that represent the eight directions are added to the integer-type x and y location coordinates on the grid. A CCTV deployment simulator is used for a fitness evaluation. The evaluation is performed once after deployment of the initial solution and once before the substitution operation for each generation. A fitness evaluation is also sequentially performed in the CPU experiment, and an evaluation was performed in parallel with all fitness solutions within the population in the graphics processing unit (GPU) experiment. A total of 10 pedestrians passed the RoI in the fitness evaluation, in which the number of fitness evaluations became 100 3 10. Figure 2 shows a flowchart of the proposed genetic algorithm.
Parallel CCTV deployments POD for each cell in CCTV parallelism should be calculated whenever a pedestrian crosses each cell. The CCTV deployment simulator, which is a fitness function, calculates the POD in each CUDA thread, processes the detection of pedestrians, and returns the number of detected and undetected pedestrians. Our research deals with a barrier coverage problem, of which the goal is how well a deployment detects events for some pedestrians to cross sensing area of CCTVs. Hence, it is very natural that the detection rate is considered as our fitness evaluation measure. The detection rate is calculated by applying equation (1) to the returned result. The most important part of the parallelism of the CCTV deployment simulator was a path search using the A* algorithm. The A* algorithm priority queue enabled a hip alignment in space with a fixed size. The
Table 2. GA parameters for CCTV deployment. Parameter
Contents
Fitness function Encoding Population size Number of generations Selection operation Crossover operation Variation operation Substitution operation
Calculation of detection rate according to pedestrian movement Two-dimensional integer coordinates (50, 75, and 200 dimension) + 8 directions 100 100 Selection of solution with the highest quality after randomly selecting solution from population (t = 3) Block-even crossover Gene-wise (5%) If quality of the child solution is better than the solution with the lowest quality in the population, the solution with the lowest quality is substituted with the child solution
CCTV: closed-circuit television; GA: genetic algorithm.
Seo et al.
5
Figure 2. Flowchart of the proposed GA.
weighted value was put in the direction of pedestrian movement when setting the level values to diversify moving paths. Moving paths were attempted to be diversified by generating random numbers when setting the level values, but too much computing time was used owing to the frequent uses of randomnumber generation function. Moreover, the randomnumber generator used in CPU experiments could not be used in CUDA kernel function in the same way. We made a random-number generator based on the method proposed by Park and Miller.20 The generator effectively reduced the computing time in CUDA kernel function. Genetic algorithms have a trade-off between solution quality and computation time. When population size or the number of generations is large, genetic algorithm may converge too slowly. On the contrary, if it is small, the algorithm may take less computing time but produce worse solutions. Genetic algorithms usually adopt 100–200 solutions over 100–500 generations.21,22 Although not shown in this article, we conducted several preliminary experiments varying population size and the number of generations to find better parameter settings. When population size is 100 and the number of generations is 100, we could reach to an appropriate trade-off point.
Calculation of parallel A* algorithm A path search calculation, which is part of the fitness function, required the longest computing time in the
Figure 3. Blocks and threads for detection of moving persons based on NVIDIA CUDA platform.
CPU-based CCTV deployment simulation results. Figure 3 shows the configuration of the NVIDIA CUDA platform23 block used for the parallelism of the path search and thread per block. The number of blocks and thread per block are set as shown in Table 3. A total of 100 blocks are set, 10 threads work on a block, and each thread is used to calculate a path search of pedestrians. Usually, the number of blocks and that of threads per block used in CUDA kernel function are set considering the number of CUDA cores and the amount of shared memory of the graphic card used in the experiments. In addition, various optimization algorithms are used to reduce computation time in GPU.
6
International Journal of Distributed Sensor Networks deployed for all solutions were encoded in a 1D array; x represents the starting address of the x coordinate array, and y represents the starting address of the y coordinate array. dir refers to the starting address of the direction array in the direction the CCTV faces. cmap is the constant memory of the GPU used to transfer the topography information of Scenario 1 or Scenario 2. CCTV location and direction information is saved in the shared memory that has a quick processing speed. This information can be shared between threads that are in one block. This enables an efficient use of memory and an improvement in computing speed. Topography information for the scenarios and CCTV deployment information are saved in the 2D array map. The path of pedestrian movement is searched and saved into the path in advance and then returned. The pedestrian moves according to the path information, and detection is determined by the deployed CCTV. Detection of the pedestrian is judged every time the pedestrian passes each cell. The value of the detected variable is increased by 1 when a pedestrian is detected, and the undetected variable is increased by 1 for accumulation when a pedestrian is not detected. The detection rate is calculated using equation (1).
Experiments Experimental environments Figure 4. Pseudocode of kernel function for POD calculation.
Table 3. Blocks and threads used in detection of moving persons. Classification
Dimension
Number of blocks per grid Number of threads per block
ðx, y, zÞ = ð100, 1, 1Þ ðx, y, zÞ = ð10, 1, 1Þ
We set the number of blocks and that of threads per block considering graphic card specifications given in Table 5, shared memory usage used in CUDA kernel function of Figure 4, and implementation simplicity. We set the number of threads per streaming multiprocessor (SM) to be 10 because the amount of shared memory used in the CUDA kernel function was large.
Implementation of parallel A* algorithm Figure 4 shows a pseudocode to detect pedestrians who pass the RoI. Location information on all CCTVs
The experiment is composed of various combinations with different types and numbers of CCTVs. To measure the detection rate of pedestrians, a boundary is formed by considering buildings and roads in a downtown environment. Optimizations of sensor and CCTV deployment maximize the detection rate of detectable entities and use fewer CCTVs. A fitness evaluation is performed by using the NVIDIA CUDA platform to enhance the computing speed, which is a disadvantage in metaheuristic algorithms.5 A statistical method is used on the experimental results of the CPU and GPU to compare and analyze detection rate values. The computing speeds of NVIDIA GeForce GTX graphic cards are compared. In addition, improvements in the kernel function are presented through an analysis of the kernel function using the NVIDIA Nsight profiler.24,25 The C language and CUDA-C language based on NVIDIA CUDA platform 7.0 were used to develop 2D deployment simulators to establish the experimental environment. A CCTV deployment simulator and genetic algorithm based on CPU were implemented using the C language. A parallel CCTV deployment simulator was implemented using the CUDA-C language.
Seo et al.
7
Table 4. CPU specifications used in experiments.
Table 6. The number and types of CCTVs.
CPU components
Intel Core 2 Duo
Instances
CCTV1
CCTV2
CCTV3
CPU clock (GHz) CPU cores Memory (MB)
3.0 2 4096
Instance 1 Instance 2 Instance 3
15 15 20
25 25 25
30 30 40
CPU: central-processing unit.
CCTV: closed-circuit television.
Table 5. GPU specifications used in experiments. GPU Components
GTX 970
GTX 770
GTX 560
GPU clock (MHz) CUDA cores Memory clock rate (MHz) L2 cache size (kB) Global memory (MB)
1317 1664 3505 1792 4096
1189 1536 3505 512 4096
1620 336 2010 512 2048
GPU: graphics-processing unit; CUDA: Compute Unified Device Architecture.
Table 4 shows information about the clock speed, the number of cores, and the amount of CPU memory used in the experiment. (These are commonly used in all of the experiments.) Table 5 lists information about the three types of graphic cards used in the experiment. The NVIDIA GeForce GTX 970 graphic card is mainly used, and the other two cards are used for comparative experiments. The number of CUDA cores and the L2 cache size are the items that have significant influence on enhancing the computing speed of the GPU specification. The L2 cache is used in shared memories and registers. Using the L2 cache can result in great enhancements in the performance of the memory approach speed. The L2 cache of the NVIDIA GeForce GTX 970 is more than three times larger than that of other GPUs, and the time for memory approach and copying can be greatly shortened if this is used effectively. A significant reduction in computing time can be achieved when there are many tasks that must be processed at the same time and do not require highperformance operation. In addition, the computing speed can be improved by optimizing software aspects such as block and thread configurations that have high efficiency, using a random-number generator in the GPU environment,20 and by avoiding or reducing remainder operations and division operations.
Results The experiment was repeated 30 times, which is the minimum statistically significant number.19 Scenarios 1 and 2 were classified according to topography and were also classified according to the numbers and types of
CCTVs. These are listed in Table 6. An analysis of the experiment results included a statistical analysis of the detection rate using 3s, comparison of computing time, and an analysis of the kernel function using the NVIDIA Nsight profiler. The experiment resulted in an average of the optimum solutions of 30 experiments. Statistical analysis of CCTV deployment. Table 7 lists the results of the experiment in Scenario 1. The averages of the NVIDIA GeForce GTX 970 and CPU were similar to the analysis of the 95% confidence interval in Figure 5(a). The computing speed of the NVIDIA GeForce GTX 970 was approximately 3.7 times faster. Table 8 lists the results of the experiment in Scenario 2. The averages of the NVIDIA GeForce GTX 970 and CPU were similar to the analysis of the 95% confidence interval in Figure 5(b). The computing speed of the NVIDIA GeForce GTX 970 was approximately 4.2 times faster. Figure 6 shows the results of a comparison between computing time according to scenario and instance. The computing time of the NVIDIA GeForce GTX 560 and NVIDIA GeForce GTX 770 was similar to the results listed in Tables 9 and 10. The computing time of NVIDIA GeForce GTX 970 was decreased by approximately half. In addition, the NVIDIA GeForce GTX 970, which has an L2 cache with a capacity that was three times larger, showed relatively better performance. Figure 7 shows the comparative experiment of computing time using three types of NVIDIA GeForce GTX graphic cards on Scenario 1. An overall even difference in computing time is shown for Instances 1 and 2. The time width of the operation in Instance 3 increases, and it is shown that the GPU operation time has significant influence when there are more than 100 deployed CCTVs. Figures 8 and 9 show the results of Instance 3 experiment and examples of CCTV deployment for Scenarios 1 and 2, respectively. The CCTV detection directions of the initial solutions in Figures 8(a) and 9(a) are different, but the cases of CCTV detection direction facing the direction of pedestrian movement increased slightly in the optimum solutions in Figures 8(b) and 9(b). In Figures 8 and 9, CCTV deployment location was concentrated along the path of pedestrian movement.
8
International Journal of Distributed Sensor Networks
Table 7. Test results of CCTV Scenario 1.
GTX 970 CPU
Instance
Average (%)
Standard deviation (%)
GA (s)
Instance (s)
POD (s)
Random (%)
Instance 1 Instance 2 Instance 3 Instance 1 Instance 2 Instance 3
86.2 93.0 96.1 84.9 91.8 96.0
3.1 1.9 1.2 2.4 1.9 1.3
3.694 3.716 3.749 12.688 13.300 13.751
3.651 3.653 3.668 12.648 13.241 13.674
3.268 3.276 3.284 N/A N/A N/A
56.9 68.9 76.4 55.8 67.2 75.0
CCTV: closed-circuit television; GA: genetic algorithm; POD: probability of detection.
Figure 5. 95% confidence interval graph of CCTV deployment: (a) 95% confidence interval graph of Scenario 1 and (b) 95% confidence interval graph of Scenario 2.
CCTV kernel function analysis. An analysis of the kernel function in the three aspects of occupancy, instruction statistics, and issue efficiency were performed using NVIDIA Nsight profiler. A total of 100 blocks were allocated in the implemented kernel function, and 10 threads per block were set. In the occupancy analysis, the number of warps that can be used when there are 10 threads per block is
shown in the ‘‘Varying Block Size,’’ and low values are shown proportionately to the number of blocks and threads per block that were configured. An enhancement in performance is expected by increasing the number of threads per block. A total of 64 registers per thread are used in the ‘‘Varying Register Count.’’ It is considered that to enhance performance, an adjustment procedure is required to increase the number of
Seo et al.
9
Table 8. Test results of CCTV Scenario 2.
GTX 970 CPU
Instance
Average (%)
Standard deviation (%)
GA (s)
Fitness (s)
POD (s)
Random (%)
Instance 1 Instance 2 Instance 3 Instance 1 Instance 2 Instance 3
85.1 91.5 94.9 84.8 90.1 94.2
2.9 1.9 1.5 2.9 2.2 1.6
4.061 4.088 4.116 16.096 16.880 17.041
4.016 4.025 4.035 16.056 16.820 16.966
3.636 3.646 3.651 N/A N/A N/A
51.9 59.3 68.7 49.3 58.7 69.0
CCTV: closed-circuit television; GA: genetic algorithm; POD: probability of detection.
Table 9. Comparison of Test results according to GPU types for CCTV Scenario 1.
GTX 970 GTX 770 GTX 560
Instance
Average (%)
Standard deviation (%)
GA (s)
Instance (s)
POD (s)
Random (%)
Instance 1 Instance 2 Instance 3 Instance 1 Instance 2 Instance 3 Instance 1 Instance 2 Instance 3
86.2 93.0 96.1 86.2 92.6 96.3 87.0 92.7 95.1
3.1 1.9 1.2 2.7 1.8 1.0 2.1 2.1 1.8
3.694 3.716 3.749 7.124 7.055 7.051 6.946 7.000 8.589
3.651 3.653 3.668 7.079 6.990 6.968 6.897 6.935 8.506
3.268 3.276 3.284 6.719 6.628 6.600 6.420 6.453 8.030
56.9 68.9 76.4 57.3 67.5 76.2 56.9 67.4 69.1
CCTV: closed-circuit television; GA: genetic algorithm; POD: probability of detection.
Table 10. Comparison of Test results according to GPU types for CCTV Scenario 2.
GTX 970 GTX 770 GTX 560
Instance
Average (%)
Standard deviation (%)
GA (s)
Fitness (s)
POD (s)
Random (%)
Instance 1 Instance 2 Instance 3 Instance 1 Instance 2 Instance 3 Instance 1 Instance 2 Instance 3
85.1 91.5 94.9 85.7 91.5 94.9 84.4 91.4 95.0
2.9 1.9 1.5 3.1 2.1 1.9 3.2 1.8 1.8
4.061 4.088 4.116 7.939 7.873 7.830 8.483 8.545 8.597
4.016 4.025 4.035 7.894 7.807 7.748 8.437 8.479 8.514
3.636 3.646 3.651 7.533 7.442 7.384 7.972 8.005 8.035
51.9 59.3 68.7 51.0 60.4 68.9 50.6 59.0 69.6
CCTV: closed-circuit television; GA: genetic algorithm; POD: probability of detection.
registers to use per thread. The number of warps when using a shared memory of 1200 bytes per block is shown in the ‘‘Varying Shared Memory Usage.’’ It is considered that an enhancement in performance can be expected by increasing the used shared memory per block. In the analysis of instruction statistics, the average of total SMs in the ‘‘SM Activity’’ is 93.50%, but the overall SM activity is unbalanced in that the exculpation time is different or the number of scheduled blocks per SM is variable. Thus, different activities in each SM, that is a Tail Effect, have an unfavorable influence on the total execution time. The balance level of workload distribution is also not effective in the ‘‘Instructions per Warp.’’ This can be considered as an effect of cases that have
conditional statements depending on the block index, or when there is a high workload only in a few blocks. The overall low performance appears to be a result of the concentrated operations in a few CUDA cores. In addition, the path-searching part of the A* algorithm is processed in order as in the CPU experiment. A considerable enhancement in performance is expected by improving the parallelism level in the path-searching part. The analysis of issue efficiency highlights an efficiency issue of the kernel function in CCTV deployment. The number of warps per SM is considerably small in the ‘‘Warps per SM,’’ and the value of ‘‘No Eligible’’ in the ‘‘Warp Issue Efficiency’’ is very high. This shows that the efficiency of the kernel code is not high. The ‘‘No
10
International Journal of Distributed Sensor Networks
Figure 6. Comparison between CPU and GPU computing time according to scenarios and the number of CCTVs: (a) CPU versus GTX 970 (Scenario 1) and (b) CPU versus GTX 970 (Scenario 2).
Figure 7. Comparison of computing time of GPUs according to the number of CCTVs in Scenario 1.
Seo et al.
11
Figure 8. Optimized CCTV deployment of Scenario 1: (a) initial solution and (b) improved solution.
Figure 9. Optimized CCTV deployment of Scenario 2: (a) initial solution and (b) improved solution.
Eligible’’ part is related to the contents in the ‘‘Issue Stall Reasons,’’ and the memory dependency value in the ‘‘Issue Stall Reasons’’ is high, with a value of 70.25%. The values show a high possibility that the requested resource cannot be used or is being completely used by another process or too many processors have requested the resource. It is considered that this problem can be improved at some point by memory alignment and optimization of the memory approach pattern.
Analysis. A comparison of detection performance, comparison of computation time, and analysis of the kernel function using the NVIDIA GeForce GTX 970 were
analyzed in the experiment. The simulator used in the experiment is classified into CPU-based sequential processing simulators and GPU-based parallel processing simulators. The C language was used to develop the sequential processing simulator, and a CUDA-C language based on the NVIDIA CUDA platform 7.0 was used to develop the parallel processing simulator. It was found in the analysis (on the 95% confidence interval graph using 3s) that there was no significant difference in detection rate between the experiment results from the CPU and NVIDIA GeForce GTX 970. With regard to computing speed, the NVIDIA GeForce GTX 970 experiment was approximately four times faster than the CPU experiment.
12
Concluding remarks A generational genetic algorithm was used to optimize 2D deployment in the boundary coverage of this study. The C language and a CUDA-C language were used to develop the sequential processing simulator and parallel processing simulator for the evaluation of fitness. The CCTV deployment issues in the two downtown environments were used as the case studies, and the results of various experiments using these cases were analyzed. An experimental analysis consisted of a statistical analysis on the detection rate, a comparison between the computing time of the sequential processing experiment and parallel processing experiment, and an analysis of the CUDA kernel function. The detection rates of the CPU experiment and the experiment using the NVIDIA GeForce GTX 970 graphic card were similar and showed a considerable improvement in computing time. Problems with the kernel function were investigated, and an analysis of future improvements was conducted. Like other precedent papers, a direct comparison of values with the results of other research is difficult. However, a new method for the optimization of 2D deployment was presented in from a methodological aspect and is expected to be used in various military, security, and invasion detection fields. It is considered that more realistic experiment results can be obtained if various metaheuristic-based algorithms are used by finding parameter combinations of high efficiency. It is required to find efficient GPU parameter settings to reduce computing time. It is also considered that the efficiency of block settings and memory usage must be increased in order to enhance the parallelism level of the simulators. Declaration of conflicting interests The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Wonkwang University in 2017.
References 1. Mostafaei H. Stochastic barrier coverage in wireless sensor networks based on distributed learning automata. Comput Commun 2015; 55: 51–61. 2. Cardei M and Wu J. Coverage in wireless sensor networks. In: Ilyas M and Mahgoub I (eds) Handbook of sensor networks. Boca Raton, FL: CRC Press, 2004, pp.1–18.
International Journal of Distributed Sensor Networks 3. Wang B. Build intrusion barriers. In: B Wang (ed.) Coverage control in sensor networks. London: Springer, 2010, pp.175–186. 4. Tao D, Tang S, Zhang H, et al. Strong barrier coverage in directional sensor networks. Comput Commun 2012; 35(8): 895–905. 5. Garey M and Johnson DS. Computers and intractability: a guide to the theory of NP-completeness. San Francisco, CA: Freeman, 1979. 6. Holland JH. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. Cambridge, MA: The MIT Press, 1992. 7. Yoon Y and Kim Y-H. An efficient genetic algorithm for maximum coverage deployment in wireless sensor networks. IEEE T Cybernetics 2013; 43(5): 1473–1483. 8. Zhang Q. COMP 4905-Honours Project. Carleton University, 2013. 9. Wang Y and Cao G. Barrier coverage in camera sensor networks. In: Proceedings of the twelfth ACM international symposium on mobile ad hoc networking and computing, Paris, 17–19 May 2011, Article no. 12. New York: ACM. 10. Navin AH. Distributed genetic algorithm to solve overage problem in wireless camera-based sensor networks. Res J Recent Sci 2015; 4(12): 106–109. 11. Shih KP, Chou CM, Liu IH, et al. On barrier coverage in wireless camera sensor networks. In: 2010 24th IEEE international conference on advanced information networking and applications (AINA), Perth, WA, Australia, 20– 23 April 2010, pp.873–879. New York: IEEE. 12. Navin AH, Asadi B, Pour SH, et al. Solving coverage problem in wireless camera-based sensor networks by using genetic algorithm. In: 2010 international conference on computational intelligence and communication networks (CICN), Bhopal, 26–28 November 2010, pp.226–229. New York: IEEE. 13. Ma H, Yang M, Li D, et al. Minimum camera barrier coverage in wireless camera sensor networks. In: 2012 Proceedings IEEE, INFOCOM, Orlando, FL, 25–30 March 2012, pp.217–225. New York: IEEE. 14. Costa DG and Guedes LA. The coverage problem in video-based wireless sensor networks: a survey. Sensors 2010; 10(9): 8215–8247. 15. Tabu search, https://en.wikipedia.org/wiki/Tabu_search 16. Glover F. Future paths for integer programming and links to artificial intelligence. Comput Oper Res 1986; 13(5): 533–549. 17. Gradient decent, https://en.wikipedia.org/wiki/Gradient_ descent 18. Deift P and Zhou X. A steepest descent method for oscillatory Riemann-Hilbert problems. Asymptotics for the MKdV equation. Ann Math 1993; 137(2): 295–368. 19. Hogg RV and Tanis EA. Probability and statistical inference. New York: Macmillan, 2014. 20. Park SK and Miller KW. Random number generators: good ones are hard to find. Commun ACM 1988; 31(10): 1192–1201. 21. Deb K, Pratap A, Agarwal S, et al. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE T Evolut Comput 2002; 6(2): 182–197.
Seo et al. 22. Horn J, Nafpliotis N and Goldberg DE. A niched Pareto genetic algorithm for multiobjective optimization. In: Proceedings of the first IEEE conference on evolutionary computation, 1994. IEEE world congress on computational intelligence, Orlando, FL, 27–29 June 1994, pp.82–87, New York: IEEE. 23. Kro¨mer P, Sna˚sˇ el V, Platosˇ J, et al. Many-threaded implementation of differential evolution for the CUDA
13 platform. In: Proceedings of the 13th annual conference on genetic and evolutionary computation, Dublin, 12–16 July 2011, pp.1595–1602. New York: ACM. 24. NVIDIA Nsight Visual Studio edition 4.5 user guide, http://docs.nvidia.com/nsight-visual-studio-edition/4.5/ Nsight_Visual_Studio_Edition_User_Guide.htm 25. NVIDIA. Compute Unified Device Architecture (CUDA) C programming guide. NVIDIA Corporation, 2014.