The final publication is available at IOS Press through http://dx.doi.org/10.3233/ICA-180579
Motion detection with low cost hardware for PTZ cameras

Jesús Benito-Picazo*, Enrique Domínguez, Esteban J. Palomo, Ezequiel López-Rubio and Juan Miguel Ortiz-de-Lazcano-Lobato

Department of Computer Languages and Computer Science, University of Málaga, Bulevar Louis Pasteur, 35, 29071 Málaga, Spain. E-mail: {jpicazo,enriqued,ejpalomo,ezeqlr,jmortiz}@lcc.uma.es

*Corresponding author. E-mail: [email protected].
Abstract. Pan-tilt-zoom (PTZ) cameras are well suited to motion detection and object tracking due to their mobility. Motion detection approaches based on background difference have been the most widely used with fixed cameras because of the high quality of the achieved segmentation. However, time requirements and high computational costs prevent most of the algorithms proposed in the literature from exploiting the background difference with PTZ cameras in real world applications, such as automatic surveillance. This paper presents a new algorithm to detect moving objects within an area covered by a PTZ camera while it is panning, tilting or zooming in or out. The low computational demands of the algorithm allow for its deployment on a Raspberry Pi microcontroller-based board, which enables the design and implementation of a low-cost monitoring system that is able to perform real-time image processing. First, our system works offline to estimate the parameters of the motion detection model, which are written to the Raspberry Pi memory. Second, motion detection is performed online by the microcontroller. Experimental results using different moving object classifiers (FANN, KNN, and SVM) confirm the good performance of this approach in terms of different classification performance measures (accuracy, F-measure, AUC, and sample processing time).

Keywords: Motion detection, foreground detection, feed forward neural network, background modeling, PTZ cameras, background features
1. Introduction

Video surveillance systems have become an extremely active research area due to increasing levels of social conflict and public awareness about security issues. This has motivated the development of robust and precise automatic video surveillance systems, where real-time operation is essential for their successful deployment [1,2].

Motion detection can be considered as the first process carried out by video surveillance systems. Motion detection consists of detecting a change in the position of an object relative to its surroundings or a change in the surroundings relative to an object. One of the most common motion detection algorithms is to compare the current frame in the video sequence with the previous one, which is regarded as a reference frame. If the pixel color difference is greater than a predefined alarm level or threshold, a motion event alarm is generated. This clearly works under easy conditions of foreground objects, motion speed and frame rate, but it is very sensitive to the threshold. A noisy image could make the algorithm detect motion in many places even though there is no motion at all [3,4]. This task is known to be a significant and difficult research problem in many real environments, where different approaches have been proposed based on background subtraction [5,6], non-panoramic background models [7] or optical flow methods [8,9].

Pan-Tilt-Zoom (PTZ) cameras are known for their suitability to both identify and recognize objects in deep scenes due to their remote directional and zoom
control. Besides the movement options in both axes (pan and tilt), this kind of camera has an adjustable optical lens, which can be controlled to permit both wide area coverage and close-up views (zoom). These capabilities are particularly useful in surveillance applications, where tracking targets or zooming in on biometric details are common tasks of this kind of vision system.

Research on automated computer vision systems based on PTZ cameras has been intense for many years: [10] presents a method based on robust principal component analysis to separate the stationary part of a video (background) from the non-stationary part (foreground); [11] creates a bag of reduced dimensionality features that is given to a support vector machine so that it can learn a background model; in [12] a dictionary is learned from a bag of sub-events (local observations in each video clip) and multi-instance classifiers are used to decide whether an item in the dictionary is abnormal or not; and [13] describes an omnidirectional tracking system that is very sensitive to motion detection because it is part of a perimeter security system which has to detect trespassers with camouflage or access to environmental cover. Other papers also study camera calibration [14] and distributed camera systems [15,16].

Despite the numerous aforementioned surveillance systems and algorithms for motion detection, none of them meets the following three requirements simultaneously:

1. Real time operation. Otherwise, the algorithm will not have a practical application.
2. Motion detection in non-stationary video, due to the own movement of the camera taking the images.
3. Low power consumption. This is a desirable feature because the full computing power of a modern computer with several core processors and a powerful graphics card may be unavailable, thus limiting the scope of application of the algorithms.

Microcontroller boards are economic, small and flexible hardware devices. They are frequently employed in motion detection systems due to their low energy consumption and reduced cost [17,18,19]. Kinetically challenged people can benefit from microcontroller-based input devices specifically designed for them, which measure motion on a plane in real time [20]. Compact versions of diverse evolutionary algorithms have been proposed for typical control robotics
problems [21,22]. A flexible Printed Circuit Board (PCB) prototype which integrates a microcontroller has been proposed to estimate motion and proximity [23]. In this prototype, eight photodiodes are used as light sensors. The efficiency of solar energy plants can be improved by low power systems which estimate cloud motion [24]. The approximation of the cloud motion vectors is carried out by an embedded microcontroller, so that the arrangement of the solar panels can be optimized for maximum electricity output. Energy-saving street lighting for smart cities can be accomplished by low power motion detection systems equipped with low consumption microcontrollers and wireless communication devices [25]. This way, the street lamps are switched on when people are present in their surroundings. Finally, a motion detection algorithm based on Self-Organizing Maps (SOMs) was developed on an Arduino DUE board [26]. The implementation of the SOM algorithm was employed as a motion detector for static cameras in a video surveillance system.

In this paper, a motion detection algorithm which is able to work while the PTZ camera is panning, tilting or zooming in or out, and its real-time implementation on an inexpensive microcontroller, are proposed. This work is an extension of the paper [27], where a feed forward neural network was proposed to detect motion taking into account only the pan movement. In this work, two new movement models, tilt and zoom, have been incorporated and two other classifiers have been proposed in order to compare their performance. The system is able to detect motion by analyzing the output of a moving PTZ camera. For this purpose, three methods which are commonly used in artificial intelligence are proposed. The first method consists of using a feed forward neural network as a classifier, which is a typical application of artificial neural networks [28,29,30], in particular for visual recognition tasks [31]. Deep learning neural networks are heavily used for visual recognition [32,33], in particular convolutional neural networks [34,35]. However, inexpensive low power general purpose microcontrollers do not currently have the computational capabilities required to simulate convolutional neural networks in real time. Therefore, they cannot be employed for our present purposes. The second method used to process the output of the PTZ camera is the K-nearest neighbors (KNN) algorithm. The last method consists of a support vector machine (SVM), which represents another popular and powerful way to build a binary classifier.
The structure of this paper is as follows. Section 2 presents the motion detection algorithm for cameras which pan, tilt or zoom. Section 3 outlines the hardware part of the system, which is based on the Raspberry Pi 3 model B microcontroller-based board, and the employed software architecture. Experimental results on real video footage are reported in Section 4. Finally, Section 5 contains our conclusions.
Fig. 1. Block diagram of the proposed system.

2. Motion detection model
As mentioned before, our goal is to detect the motion of foreground objects while a PTZ camera is moving. Let us consider the image acquired by the camera:

$f : \mathbb{R}^3 \to \mathbb{R}^3$  (1)

$f(x_1, x_2, t) = (y_1, y_2, y_3)$  (2)

where x = (x1, x2) ∈ [−A, A] × [−B, B] are the video frame coordinates in pixels, with (0, 0) at the center of the image and frame size (2A) × (2B) pixels; t is the time instant; and y = (y1, y2, y3) comprises the color tristimulus values [36] at the frame location and instant of interest.

The procedure to be followed in order to detect foreground object motion in the scene depends on a simplified model of the movement that the PTZ camera is carrying out, namely pan, tilt or zoom. Here it is assumed that the camera does not carry out more than one kind of movement at a time. This does not limit the range of reachable camera configurations, but it could slow down its operation slightly.

The overall structure of our system is depicted in Figure 1. As seen, the PTZ camera provides video frames to a desktop computer which processes them offline in order to estimate the parameters of the motion detection model, including the pretrained synaptic weights of a neural network (see Subsection 2.5). Then the computer writes those parameters on the persistent memory of the Raspberry Pi microcontroller. Finally, at deployment time, the microcontroller works standalone without the intervention of the computer, by fetching video frames online from the PTZ camera and computing the motion detections. Optionally, the motion detections can be sent to the desktop computer.
Next, the models for the pan, tilt and zoom movements are detailed in Subsections 2.1, 2.2 and 2.3, respectively. The main novelty in them with respect to previous literature is the proposed model parameter estimation method (Subsection 2.4). After that, our proposed procedure to detect motion is presented in Subsection 2.5, which is completely novel to the best of our knowledge.

2.1. Pan model

For a PTZ camera moving in the horizontal direction (pan) one can write the following approximation of the observed change in the video data:

$(x_1, x_2), (x_1 + \delta, x_2) \in [-A, A] \times [-B, B] \Rightarrow f(x_1, x_2, t) \approx f(x_1 + \delta, x_2, t + \epsilon)$  (3)

where δ is the observed horizontal displacement of the image as ε units of time have elapsed, and the equality does not hold exactly due to optical effects such as lens aberration, and the motion of foreground objects in the scene. The precondition means that the approximation applies to those points in the scene that are visible both at time t and t + ε. In other words, the simplified pan model means that when the camera is panning, the current frame is roughly the same as the previous one, but shifted by δ pixels in the horizontal direction, i.e. left or right depending on the current movement. Then the error in the approximation can be computed as follows:

$Move(t) = pan \Rightarrow E(x_1, x_2, t) = f(x_1, x_2, t) - f(x_1 + \delta, x_2, t + \epsilon)$  (4)
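As an illustration only (the system described later in this paper is implemented in C++), the following Python/NumPy sketch evaluates the pan-model error over the region visible in both frames, using the mean squared error as the norm, and estimates δ by the exhaustive search suggested in Subsection 2.4; the tilt model is the vertical analogue (shift rows instead of columns). The function names and the search range are assumptions made for this example.

```python
import numpy as np

def pan_error(frame_t, frame_t_eps, delta):
    """MSE between the frame at time t and the next frame shifted by `delta`
    columns, restricted to the region visible in both frames (equation (4))."""
    d = abs(int(delta))
    if d == 0:
        a, b = frame_t, frame_t_eps
    elif delta > 0:
        a, b = frame_t[:, :-d], frame_t_eps[:, d:]
    else:
        a, b = frame_t[:, d:], frame_t_eps[:, :-d]
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff ** 2))

def estimate_delta(frame_t, frame_t_eps, max_shift=20):
    """Estimate the pan parameter delta (Subsection 2.4) as the shift that
    minimizes the model error norm, by exhaustive search over candidates."""
    shifts = range(-max_shift, max_shift + 1)
    return min(shifts, key=lambda d: pan_error(frame_t, frame_t_eps, d))
```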
2.2. Tilt model

The tilt movement for a PTZ camera is a vertical displacement. Therefore the following approximation can be employed:

$(x_1, x_2), (x_1, x_2 + \gamma) \in [-A, A] \times [-B, B] \Rightarrow f(x_1, x_2, t) \approx f(x_1, x_2 + \gamma, t + \epsilon)$  (5)

where γ is the observed vertical displacement of the image as ε units of time have elapsed, and again the equality does not hold due to optical effects such as lens aberration, and the motion of foreground objects in the scene. The precondition means that the approximation applies to those points in the scene that are visible both at time t and t + ε. Expressed in simple terms, this simple tilt model assumes that when the camera is tilting, the current frame can be approximated by the previous one, but shifted by γ pixels in the vertical direction, i.e. up or down depending on the current movement. Under these conditions the error in the approximation can be computed as follows:

$Move(t) = tilt \Rightarrow E(x_1, x_2, t) = f(x_1, x_2, t) - f(x_1, x_2 + \gamma, t + \epsilon)$  (6)

2.3. Zoom model

The zoom movement for a PTZ camera increases (zoom out) or decreases (zoom in) the field of view. The center of the video frame (0, 0) can be assumed to be invariant, to a first approximation. Therefore the following model can be used:

$(\rho \cos \theta, \rho \sin \theta), (\alpha \rho \cos \theta, \alpha \rho \sin \theta) \in [-A, A] \times [-B, B] \Rightarrow f(\rho \cos \theta, \rho \sin \theta, t) \approx f(\alpha \rho \cos \theta, \alpha \rho \sin \theta, t + \epsilon)$  (7)

where (ρ, θ) are the polar coordinates of a pixel in a coordinate system such that the coordinate origin is the center of the video frame, α > 0 is the observed change in the field of view as ε units of time have elapsed, and again the equality does not hold due to optical effects such as lens aberration, and the motion of foreground objects in the scene. This means that when the camera is zooming, a circle centered at the video frame center in the previous frame can be approximated by another circle in the current frame. The circle in the current frame is bigger than the circle in the previous frame if the camera is zooming in, and vice versa. Here α is the ratio between the radii of the previous and current circles. The precondition means that the approximation applies to those points in the scene that are visible both at time t and t + ε. Then the error in the approximation can be computed as follows:

$Move(t) = zoom \Rightarrow E(\rho \cos \theta, \rho \sin \theta, t) = f(\rho \cos \theta, \rho \sin \theta, t) - f(\alpha \rho \cos \theta, \alpha \rho \sin \theta, t + \epsilon)$  (8)

2.4. Model parameter estimation

For given values of ε and the camera speed, the value of the model parameter (δ, γ or α) can be estimated experimentally by finding the value (of δ, γ or α) which minimizes $\|E\|$, where $\|\cdot\|$ stands for any suitable norm.

2.5. Motion detection

Now it is important to realize that the error comes from two sources, namely optical effects and the presence of foreground objects:

$E(x_1, x_2, t) = E_{optical}(x_1, x_2, t) + E_{objects}(x_1, x_2, t)$  (9)

where E(x1, x2, t) is given by equations (4), (6) or (8) depending on the kind of movement of the PTZ camera at time instant t. The optical effects can be assumed to be small with respect to the foreground objects effect, if those objects are present:

$Fore(t) \Leftrightarrow \|E_{optical}(x_1, x_2, t)\| \ll \|E_{objects}(x_1, x_2, t)\|$  (10)
where Fore(t) means that there are foreground objects in motion at time t. Moreover, if there are no foreground objects, then the associated error is zero:

$Fore(t) \Leftrightarrow E_{objects}(x_1, x_2, t) \neq 0$  (11)

In other words, the error for background pixels is only due to optical effects, while the error for foreground pixels also includes the color difference between the foreground object and the background. Therefore, the expectation of the error norm should be larger when foreground objects are present:

$E[\|E(x_1, x_2, t)\| \mid Fore(t)] > E[\|E(x_1, x_2, t)\| \mid \neg Fore(t)]$  (12)

Our proposal takes advantage of this by training a classifier in order to estimate the probability that foreground objects are present, P(Fore(t)). The classifier is required to analyze the error information coming from different regions of the current frame and yield an estimated probability that it contains moving objects. To this end, the error norm is summarized by pixel columns, so that the sums of the norms of the differences of the pixels that should correspond to the same locations in the world are computed. Then the error norm sums are added for contiguous columns, so that a reduced set of sums of error norms is obtained. These sums are provided as inputs to the neural network, while the desired output z(t) is 1 whenever Fore(t) holds, and −1 otherwise. The probability is then estimated as follows:

$P(Fore(t)) = \frac{1}{2}(z(t) + 1)$  (13)

Subsequently, a probability threshold φ is applied in order to declare whether foreground object motion has been detected:

$Detection(t) \Leftrightarrow P(Fore(t)) > \varphi$  (14)

The lightweight nature of our proposal lies in the use of pan, tilt and zoom models which ignore optical aberrations, whose compensation would require extensive calculations to map the pixels from one frame to the next. In other words, our model sacrifices some optical exactitude in order to obtain a fast mapping of the pixels from one frame to the next. In the following section, the details of the implementation of our proposal are described.
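As a minimal illustration of equations (13) and (14), the following sketch assumes a hypothetical trained classifier that returns an output z in [−1, 1] for a vector of error-norm sums; it is not the paper's actual code, and the block-grouping granularity is an assumption.

```python
import numpy as np

def error_features(error, n_blocks=10):
    """Subsection 2.5: summarize the per-pixel error by columns, then add the
    column sums over groups of contiguous columns (n_blocks is illustrative)."""
    col_sums = np.abs(error).sum(axis=(0, 2))     # one sum per pixel column
    blocks = np.array_split(col_sums, n_blocks)   # contiguous column groups
    return np.array([b.sum() for b in blocks])

def motion_detected(features, classifier, phi=0.5):
    """Equations (13)-(14): map the classifier output z in [-1, 1] to a
    probability of foreground motion and compare it against the threshold phi."""
    z = float(classifier(features))
    p_fore = 0.5 * (z + 1.0)
    return p_fore > phi
```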
3. System architecture

Hardware choice is an important issue when it comes to microcontroller-powered computer vision applications. In general, projects involving real-time motion detection should consume a minimal amount of computing power, but at the same time they must be affordable and low-energy consuming, insofar as a certain number of them may be required to monitor a medium-sized building or building complex, and the spots where they are going to be placed may not have access to the general power network. All these reasons present Raspberry Pi class microcontroller-based boards as a good choice for our project. Hence, we have chosen a Raspberry Pi 3 model B, running the Linux Raspbian distribution. This device features a Broadcom BCM2837 microcontroller with a quad-core ARMv8 CPU running at 1200 MHz, 1 GB of RAM, and a 32 GB micro-SD data storage card. It can be powered by a 5.1 V power source and its power consumption reaches approximately 1.2 A at maximum operating load.

A software system composed of a preprocessing module that implements the mathematical model explained in Section 2 and three different implementations of a binary classifier has been deployed to the microcontroller. For efficiency reasons, all of these modules have been programmed in the C++ language. The first implementation consists of three multilayer perceptrons, one for each camera movement. Each neural network was trained previously in order to detect object motion in the video frames while the camera moves in the corresponding direction, i.e. panning, tilting or zooming. In real time, frames captured by the camera are fed to the neural network that is responsible for modeling the actual camera movement. Its output determines the presence or absence of motion. The second implementation uses the well-known KNN algorithm for each movement model to detect the aforementioned presence or absence of movement. The last implementation trains an SVM for each movement model to do the same task as the preceding two. This triple implementation of the classifier makes it possible to compare all of them so that the most suitable method can be chosen.

The third component of our system software architecture is a PTZ camera emulator called Virtual PTZ [37]. This software consists of a C++ library that simulates the functionality of an actual Sony SNC-RZ50N PTZ camera from spherical panoramic video footage. In particular, the experiments in this paper employ
sequences obtained by a Point Grey Ladybug 3 Spherical camera (Figure 2).

When it comes to PTZ cameras, the Virtual PTZ software has been proposed as a valid framework for research because of its capability to substitute a real PTZ camera, providing the user with the possibility of moving the virtual camera through an almost-spherical 360° video frame that can be totally controlled and that is not affected by dynamical issues or physical limitations. The most important consequence is that this software makes it possible to perform experiments under totally reproducible conditions. Thus, the obtained results are more controlled and significant. Since the only output our system requires from the PTZ camera is real-time video streaming, the Virtual PTZ software stands as a convenient framework for this project.
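To summarize how the pieces described in this section fit together, the following sketch outlines the online stage on the Raspberry Pi: two consecutive frames are grabbed, compensated according to the current camera movement, reduced to a feature vector, and handed to the classifier trained for that movement. All of the camera, compensation, feature-extraction and classifier objects are hypothetical placeholders passed in by the caller, not part of the original C++ implementation.

```python
def online_loop(camera, compensate, extract_features, classify, phi=0.5):
    """Illustrative online stage: per-movement dispatch of frame compensation
    and classification, yielding one motion decision per processed frame."""
    prev = camera.grab()
    while True:
        curr = camera.grab()
        move = camera.current_movement()              # 'pan', 'tilt' or 'zoom'
        aligned = compensate[move](prev, curr)        # shift/resize curr to match prev
        z = classify[move](extract_features(prev, aligned))  # output in [-1, 1]
        yield 0.5 * (z + 1.0) > phi                   # equations (13)-(14)
        prev = curr
```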
4. Experimental results

As explained in Section 2, our motion detection system can be regarded as an algorithm that is in charge of obtaining samples from a video stream coming from a PTZ camera and supplying them to a classifier that will decide whether there are foreground moving objects in this video stream. The first alternative for our classifier is a feed forward neural network. Because of its speed and ease of use compared to other artificial neural network models, it consists of the popular and well-documented multilayer perceptron implementation named Fast Artificial Neural Network (FANN) by Nissen [38]. The second alternative for this classifier is a KNN algorithm implementation developed by us for this purpose, and the third alternative is based on the support vector machine implementation named SVM Light by Thorsten Joachims of Cornell University [39]. In order to increase our control over the experimentation process, tests have been performed on Point Grey Ladybug Spherical Camera videos obtained from the datasets available in [40], conveniently supplied to the Virtual PTZ camera emulator. The dataset consists of the ground truth, i.e., a set of background frames in which there are no moving objects, and a class-balanced set of samples containing 50% of background frames and 50% of frames where moving objects appear.

Aiming to properly adapt our experiments to the three movement models proposed in Section 2, the object detection process carried out in our experiments consists of a frame acquisition and transformation stage, a comparison stage and a detection stage. The first stage depends on the movement model, and for that reason it will be described separately in the three subsections below. However, the comparison and detection stages are common to the three movement models we are considering. In the frame acquisition and transformation stage, two consecutive frames from the camera are captured and conveniently transformed to compensate the camera movement so that they can be properly compared. Even though pan and tilt movements are modeled similarly, they have to be processed independently. Appropriate values for the parameters δ and γ may not be the same because they depend on the movement speed and the elapsed time ε between frames, which are factors that could differ from the vertical to the horizontal direction.

4.1. Pan model
As aforementioned, the pan movement consists of the horizontal displacement of the image produced by the PTZ camera rotation about the Y axis. For these tests a 16 fps, 640×480 resolution video has been used. In order to obtain the maximum number of samples for training, the camera angular speed has been fixed at a constant rate of 16°/s and the movement direction can be either clockwise or counterclockwise. After performing several tests, the δ value (see Section 2.1) has been estimated as 5 pixels/degree and the Mean Squared Error (MSE) has been considered as the error norm ‖E‖ (see Section 2).

This stage starts with the capture of two consecutive frames, namely n and n+1, from a video stream supplied by a PTZ camera which is performing a panning movement. Next, as Figure 3 shows, if the movement direction is counterclockwise, frame n+1 is shifted δ pixels to the left with respect to frame n to compensate the camera rotation. If the movement direction is clockwise, frame n+1 is shifted δ pixels to the right with respect to frame n. As can also be seen in Figure 3, the blank stripe in frame n+1 resulting from the frame shifting is filled with the same stripe coming from frame n so it will not affect the comparison of both images. Once these transformations are performed, the two frames are ready to be compared in the next stage.

This procedure has one limitation that has to be considered: as frame n+1 is shifted δ pixels to compensate the camera movement, there is a "blank" stripe of δ pixels at the side opposite to the direction in which the movement is being executed that is trimmed from the two images. Because of that, a δ pixel-wide object
that was moving at the same speed as the camera could hide in that stripe without being detected.

Fig. 2. 360° spherical image supplied by the Point Grey Ladybug 3 Spherical camera.

4.2. Tilt model

Analogously, the tilt movement consists of a vertical displacement of the image produced by the PTZ camera rotation about the X axis. For these tests the same video has been used as for the pan model, that is, a 16 fps video with 640×480 resolution. Also, the camera angular speed has been fixed at a constant rate of 16°/s, and in this case the movement direction can be either up or down. After performing several tests, the γ value (see Section 2.2) has been estimated as 5 pixels/degree and the Mean Squared Error (MSE) has been considered as the error norm ‖E‖ (see Section 2).

This stage starts with the capture of two consecutive frames, namely n and n+1, from a video stream coming from a PTZ camera which is performing a tilting movement. Next, as Figure 4 shows, if the movement is in the down direction, frame n+1 is shifted down γ pixels with respect to frame n to compensate the camera rotation. If the movement is in the up direction, frame n+1 is shifted up γ pixels with respect to frame n. As shown in Figure 4, the blank stripe in frame n+1 resulting from the frame shifting is filled with the same stripe coming from frame n so it will not affect the comparison of both images. Once more, after these transformations, the two frames are ready for the comparison phase.

This procedure has the same practical limitation as the one for the pan movement model, that is, as frame n+1 is shifted γ pixels to compensate the camera movement, there is a "blank" stripe of γ pixels at the side opposite to the direction in which the movement is being executed that is trimmed from the two images. Because of that, a γ pixel-wide object that was moving at the same speed as the camera could hide in that stripe without being detected.
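A minimal sketch of the shift-and-fill compensation used by the pan and tilt models in the frame acquisition and transformation stage, assuming 640×480 RGB frames stored as NumPy arrays; the tilt case shifts rows instead of columns, and the original system performs this step in C++.

```python
import numpy as np

def compensate_pan(frame_n, frame_n1, delta, counterclockwise=True):
    """Shift frame n+1 by delta pixels against the camera rotation and fill
    the resulting blank stripe with the same stripe of frame n, so that the
    two frames can be compared pixel by pixel (Sections 4.1 and 4.2)."""
    delta = int(delta)
    shifted = np.empty_like(frame_n1)
    if counterclockwise:                              # shift frame n+1 to the left
        shifted[:, :-delta] = frame_n1[:, delta:]
        shifted[:, -delta:] = frame_n[:, -delta:]     # fill blank stripe from frame n
    else:                                             # clockwise: shift to the right
        shifted[:, delta:] = frame_n1[:, :-delta]
        shifted[:, :delta] = frame_n[:, :delta]
    return shifted                                    # now comparable with frame n
```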
4.3. Zoom model

As aforementioned, the zoom movement consists of varying the field of view (FOV) angle of the camera and is used to present a certain area of the image to the viewer in a more detailed way. In order to keep consistency with the way we operate with the models above, we have selected the same 16 fps, 640×480 resolution video stream as selected for those models. The FOV angle lies within the range [0°, 180°] and changes at a speed of 16°/s, so, consequently, the zoom speed in both zoom-in and zoom-out movements is 16 units per second.

Because of its nature, the zoom movement model requires frames to be processed for their comparison in a different way than the two models above, and this processing is different for zoom-in and zoom-out movements. If the camera is performing a zoom-in movement, frame n is resized by expanding it a number of pixels β that is set empirically. Because of this resize operation, the image obtained is larger than the 640×480 resolution of frame n+1, so in order for both frames to be compared properly, a β pixel-wide picture frame has to be cut from the exterior border of the resized frame n. This process is shown in Figure 5. It is important to remark that, due to the intrinsic features of the zoom movement, β is not a constant value for all zoom values, but grows as the zoom value grows. Initially, a regression model was planned to be used to obtain a function that calculates the β value, but it was computationally heavy and not accurate enough. Instead, a lookup table (LUT) was implemented where the index represents the zoom value at the instant the frame was captured and the value corresponding to that index is the β value. This turned out to be a better method in terms of accuracy and computing power requirements, which is very important when it comes to real-time operation using microcontroller-based hardware platforms.
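A possible sketch of the zoom-in transformation, assuming OpenCV is available and that β is read from a lookup table indexed by the zoom value; the BETA_LUT values below are placeholders (the real ones were determined empirically), and the zoom-out case works in the opposite direction, shrinking frame n and padding its border from frame n+1.

```python
import cv2

# Hypothetical lookup table: beta (pixels) indexed by the integer zoom value at
# which frame n was captured; the real values would be estimated empirically.
BETA_LUT = {z: 2 + z // 4 for z in range(0, 181)}

def zoom_in_transform(frame_n, zoom_value):
    """Enlarge frame n and cut a beta-pixel-wide border so that the result has
    the original size and can be compared with frame n+1 (zoom-in model)."""
    beta = BETA_LUT[zoom_value]
    h, w = frame_n.shape[:2]
    enlarged = cv2.resize(frame_n, (w + 2 * beta, h + 2 * beta),
                          interpolation=cv2.INTER_LINEAR)
    return enlarged[beta:beta + h, beta:beta + w]     # crop back to (h, w)
```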
Fig. 3. Example of how frame n + 1 is shifted to match frame n in Pan movement model.
Fig. 4. Example of how frame n + 1 is shifted to match frame n in Tilt movement model.
Fig. 5. Example of how frame n is transformed to match frame n + 1 in the zoom-in movement model.
If the camera is performing a zoom-out movement, frame n is resized by shrinking it a number of pixels β that is also obtained from a LUT, the same one used for the zoom-in movement. In this case, this transformation yields a picture smaller than 640×480, which is the size of frame n+1, so in order for both frames to be compared, a β pixel-wide picture frame has to be added to the exterior border of the resized frame n. This border is taken from the border of the frame n+1 image so it will not affect the comparison. This process is illustrated in Figure 6.

As could be expected, this procedure has almost the same limitations as the ones for the pan and tilt movement models, that is, as frame n is shrunk or expanded by β pixels to compensate the zoom movement, there is a "blank" piece of frame of β pixels at the border of the image that is trimmed from the two images. Because of that, a β pixel-wide object that was moving at the same speed as the camera zoom could hide in that piece of frame without being detected.
However, this case would be very unlikely, because β is not a constant value, so the object would have to change its size in the same proportion as β does in order to fit into the blank, uncontrolled piece of frame.

Once the two consecutive frames are properly transformed, our method moves to the comparison phase, in which both frames are divided into 64-pixel-wide stripes and the mean squared error is calculated for each stripe of the two frames. As a result, a 30-component vector (10 components for each RGB color channel), plus one number which is 1 if the sample is positive and −1 if the sample is negative, is saved as a sample for classifier training, validation and testing, as illustrated in Figure 7. The detection stage consists of feeding these vectors to the classifier, which for every sample decides whether there are foreground moving objects or not.

In order to evaluate the system performance and accuracy when it comes to detecting movement in video streams supplied by a PTZ camera, several tests have been performed. These tests involve multilayer perceptron general performance values measured for various network topologies, KNN algorithm performance values for different numbers of nearest neighbors, and SVM performance values for different values of its c parameter, which tells the SVM optimization how much one wants to avoid misclassifying each training example. Multilayer perceptrons can be calibrated by modifying several parameters with the objective of achieving better performance rates. However, since the number of parameter combinations would eventually grow exponentially, testing the system by varying every parameter would not be practical. Therefore, for this work, perceptron training and performance comparisons are done just by modifying the number of neurons in its hidden layer, while keeping the rest fixed. Thus, the neural network used for this system has the characteristics listed in Table 1. In the case of the KNN-based classifier, the simplicity of this algorithm makes it easy to tune by just changing the number of neighbors that are considered. Hence, the KNN-based classifier has the parameters shown in Table 2. Finally, when it comes to the SVM-based classifier, because of the binary intrinsic quality of the problem, a linear kernel has been selected. The rest of the parameters of the SVM can be checked in Table 3. As mentioned above, in this case the c parameter is used for tuning the SVM algorithm through the different experiments.
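The comparison stage described above can be sketched as follows, assuming the two already-compensated frames are NumPy arrays; the exact ordering of the 30 components is an assumption, and the original implementation is in C++.

```python
import numpy as np

def stripe_features(frame_a, frame_b, stripe_width=64):
    """Split both compensated frames into 64-pixel-wide vertical stripes and
    compute the per-stripe, per-channel MSE: 10 values per RGB channel, i.e.
    a 30-component vector for 640-pixel-wide frames."""
    diff = frame_a.astype(np.float64) - frame_b.astype(np.float64)
    h, w, c = diff.shape
    feats = []
    for ch in range(c):                               # R, G, B channels
        for x0 in range(0, w, stripe_width):          # vertical stripes
            stripe = diff[:, x0:x0 + stripe_width, ch]
            feats.append(float(np.mean(stripe ** 2)))
    return np.array(feats)                            # length 30 for 640x480 RGB
```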
Table 1. Test parameters for the multilayer perceptron.
Neural network class: Multilayer perceptron
Number of inputs: 30
Number of neurons in hidden layer: 50-600
Number of outputs: 1
Learning algorithm: Backpropagation
Max training epochs: 10000
Learning rate: 0.7

Table 2. Test parameters for the KNN algorithm.
Algorithm class: KNN-based
Number of inputs: 30
Number of outputs: 1
Learning algorithm: K-Nearest Neighbors

Table 3. Test parameters for the SVM-based classifier.
Algorithm class: SVM-based
Number of inputs: 30
Kernel class: Linear
Number of outputs: 1
Learning algorithm: SVM
Max training epochs: 1000
Aiming to guarantee a correct performance evaluation of the different classifiers, a 10-fold cross-validation procedure has been established for our system, so separate sample sets have been used for the training, validation and test phases. 720 class-balanced samples have been used to train and test every network for every movement model: 576 of them have been used for training, 72 for validation and 72 for testing. Once again, in order to make a rigorous analysis of the performance measures obtained, it is mandatory to undertake these tests separately for each one of the three movement models, as explained in the three sections below.

4.4. Pan model performance measures

For this purpose, it is well known that several measures are available.
Fig. 6. Example of how frame n is transformed to match frame n + 1 in the zoom-out movement model.
Fig. 7. Example of how samples are obtained and supplied to the perceptron.
Because of its simplicity, one of the most popular ones is the classification accuracy, which computes the number of correct predictions divided by the total number of test samples [41]. Figure 8 shows the accuracy values for the test sets versus the number of neurons in the hidden layer of the perceptron, the number of nearest neighbors considered for the KNN algorithm, and the value of the c parameter for the SVM. Even though accuracy is widely accepted as a performance measuring criterion, especially in cases like the one presented here, where the number of positive and negative samples is balanced, it is interesting to extend our experimental results with the F-measure (Figure 8), which is considered another valid performance measure that can eventually be even more reliable than accuracy [41].

As both accuracy and F-measure limit their performance measurement to one single threshold value, a measurement that integrates all possible threshold values has been considered necessary in order to complement the results represented in the charts [42]. For this purpose, the area under the ROC curve (AUC) has been calculated for every neural network architecture, KNN number of neighbors, and value of the c parameter for the SVM. The AUC values for the pan model can also be checked in Figure 8.
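For reference, the three measures can be computed for any of the classifiers as in the following sketch, which assumes scikit-learn is available, labels in {−1, +1}, and real-valued classifier scores; it is only an illustration of the evaluation, not the code used in the experiments.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def classification_report(y_true, scores, threshold=0.0):
    """Accuracy and F-measure at a single decision threshold, plus the AUC,
    which integrates the performance over all possible thresholds."""
    y_pred = np.where(np.asarray(scores) > threshold, 1, -1)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f_measure": f1_score(y_true, y_pred, pos_label=1),
        "auc": roc_auc_score(y_true, scores),
    }
```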
Fig. 8. Accuracy, F-measure and AUC charts for the test sets in the pan model for the multilayer perceptron, KNN and SVM methods (higher is better).
When it comes to the multilayer perceptron classifier, the charts illustrate how the model reaches its highest performance levels at about 200 hidden layer neurons. The performance stays roughly the same as the number of hidden layer neurons grows beyond 200. Results also show that, for the test samples in configurations comprising 200 neurons or more, all three performance measures are above 80%. This means that 200 hidden neurons are enough to attain the maximum performance from the neural network classifier. However, the charts also illustrate that both the KNN and SVM classifiers seem to perform significantly better than the perceptron. In the case of the KNN algorithm, it obtains its best results when the number of nearest neighbors considered is 1, which is very convenient in terms of processing speed. In the case of the SVM, it is easy to observe that it obtains its best results as the c parameter gets smaller, with performance falling quickly as c gets larger. Please note that, in order to prevent image data loss, all the results above were obtained by processing the images just as they come from the Virtual PTZ. This means that no further image processing has been done to correct either camera sensor noise or camera lens aberration.
4.5. Tilt model performance measures

The charts in Figure 9 reveal how the perceptron achieves performance levels higher than 80% from 50 hidden layer neurons onwards, and reaches its best for the models having 150 neurons in their hidden layer. Its stability increases as the number of hidden layer neurons grows, and it is also remarkable that the AUC value is over 90% from 100 hidden layer neurons onwards, giving an idea of how well these neural network models fit this particular problem. In this case, the KNN algorithm still outperforms the perceptron, as it achieves higher values, especially when it uses a small number of nearest neighbors. On the other hand, we can see that the SVM performance is not good for this particular case.

4.6. Zoom model performance measures

Compared to the two previous movement models, the zoom movement model is more complex, as it requires more complex transformations to be made to the frames, involving resizing and cropping
Fig. 9. Accuracy, F-measure and AUC charts for the test sets in the tilt model for the multilayer perceptron, KNN and SVM methods (higher is better).
operations. Hence, some differences in its performance measure values could be expected. The values for accuracy, F-measure and AUC can be checked in Figure 10. The charts reveal how the perceptron-based classifier achieves performance levels higher than 80% from 250 hidden layer neurons onwards, and reaches its best for the models having 350 neurons in their hidden layer. Even though performance levels are higher than 80% from 250 hidden layer neurons onwards, a slight performance drop can be observed with respect to the other two models. This could be caused by the interpolation process performed by the system to achieve the resize operations applied to the frames involved. On the other hand, the KNN and SVM-based classifiers achieve their best results here, especially KNN, which is always above 90% in a very stable curve.

Time consumption is an absolutely critical issue when it comes to real-time video processing, and even more so when using microcontrollers to run artificial neural network processes which involve real-time presence detection from a PTZ camera, as the one proposed here does. Therefore, the algorithm not only has to be reliable, but it also has to prove that the training time stays within acceptable limits and that sample classification is fast enough to provide real-time presence detection when deployed on a Raspberry Pi. Once again, it is important to remember that three movement models are involved in this process, so a separate analysis for each one is recommended. Again, as we are testing three different kinds of classifiers, the experiments involving each movement model have been repeated for each of the three classifiers. Thus, in Tables 4 to 12 we can observe the time performance results for the three different classifiers when executing the pan, tilt and zoom movement models.
Fig. 10. Accuracy, F-measure and AUC charts for the test sets in the zoom model for the multilayer perceptron, KNN and SVM methods (higher is better).
In the case of the multilayer perceptron classifier, both the training average time and the single-sample average classifying time versus the number of hidden layer neurons can be observed when executing the algorithm on a Raspberry Pi with the features described in Section 3. In the case of the KNN-based classifier, because of its intrinsic way of operation, no training is performed beyond loading the training patterns into memory. For this reason, the tables involving the KNN classifier only reflect the average testing time versus the number of nearest neighbors in each stage. In the case of the tables involving the SVM classifier, both the average training time and the average test time versus the value of the c parameter are shown. To give a clearer idea of our system performance, these tables also include the average speed (measured in frames per second, fps) at which the system can work when receiving a video stream from the PTZ camera.

Except for the KNN classifier, where no iterative training is needed (the patterns are just loaded into memory), the training average time has been calculated from the values obtained by launching the training process 90 times for each configuration, i.e., for each number of neurons in the hidden layer of the MLP, each number of nearest neighbors for the KNN-based classifier, and each c parameter value in the case of the SVM-based classifier. The single-sample average time and the processing speed in fps have been calculated from the values obtained by passing 72 different samples through the three algorithms. For the multilayer perceptron classifier, as can be seen in Tables 4, 7 and 10, sample processing speeds are approximately 53 frames per second for the pan and tilt movement models, which are excellent frame rates for real-time video processing. The zoom movement sample processing speed reaches approximately 30 frames per second.
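As a minimal sketch of how the single-sample average time and the corresponding frame rate can be measured (the experiments average over 72 samples), assuming any callable classifier:

```python
import time

def average_processing_time(classify, samples):
    """Average seconds per classified sample and the equivalent frames per
    second, measured over a list of feature vectors."""
    start = time.perf_counter()
    for s in samples:
        classify(s)
    avg = (time.perf_counter() - start) / len(samples)
    return avg, 1.0 / avg
```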
Table 4. Training average time and sample processing average time versus the number of neurons in the hidden layer of the MLP for the pan movement model.
# Neurons: 50 | 100 | 200 | 300 | 400 | 500 | 600
Training Avg. Time (s): 39.4977 | 26.4967 | 36.3206 | 53.5561 | 54.0558 | 62.4862 | 93.2795
Processing Avg. Time (s): 0.0195 | 0.0196 | 0.0196 | 0.0197 | 0.0198 | 0.0198 | 0.0199
Fps: 54.0336 | 53.9504 | 53.7848 | 53.6203 | 53.4543 | 53.2944 | 53.1305
Table 5. Sample processing average time versus the number of neighbors in the KNN algorithm for the pan movement model.
# Neighbors: 2 | 4 | 6 | 8 | 10
Processing Avg. Time (s): 0.0273768 | 0.0273848 | 0.0273968 | 0.0273948 | 0.0273898
Fps: 36.5273 | 36.5166 | 36.5006 | 36.5033 | 36.5099
Table 6. Training average time and sample processing average time versus the c parameter value in the SVM algorithm for the pan movement model.
c: 0.1 | 0.025 | 0.00625 | 0.001562 | 0.00039 | 9.7×10⁻⁵ | 2.4×10⁻⁵
Training Avg. Time (s): 0.03396 | 0.03391 | 0.03393 | 0.03559 | 0.08950 | 0.15261 | 0.0998
Processing Avg. Time (s): 0.0182184 | 0.0182184 | 0.0182185 | 0.0182183 | 0.0182182 | 0.018218 | 0.0182182
Fps: 54.8896 | 54.8896 | 54.8893 | 54.8899 | 54.8908 | 54.8902 | 54.8902
Table 7. Training average time and sample processing average time versus the number of neurons in the hidden layer for the tilt movement model.
# Neurons: 50 | 100 | 200 | 300 | 400 | 500 | 600
Training Avg. Time (s): 32.8929 | 43.6579 | 77.9102 | 97.1274 | 109.7185 | 153.3660 | 131.5068
Processing Avg. Time (s): 0.0182 | 0.0182 | 0.0183 | 0.0184 | 0.0184 | 0.0185 | 0.0186
Fps: 54.8161 | 54.7232 | 54.5292 | 54.3366 | 54.1336 | 53.9613 | 53.7525
Table 8. Sample processing average time versus the number of neighbors in the KNN algorithm for the tilt movement model.
# Neighbors: 2 | 4 | 6 | 8 | 10
Processing Avg. Time (s): 0.0275038 | 0.0274658 | 0.0275078 | 0.0275048 | 0.0274978
Fps: 36.3586 | 36.4089 | 36.3533 | 36.3573 | 36.3665
Table 9. Training average time and sample processing average time versus the c parameter value in the SVM algorithm for the tilt movement model.
c: 0.1 | 0.025 | 0.00625 | 0.001562 | 0.00039 | 9.7×10⁻⁵ | 2.4×10⁻⁵
Training Avg. Time (s): 0.02851 | 0.02818 | 0.02883 | 0.0281 | 0.02857 | 0.04238 | 0.1476
Processing Avg. Time (s): 0.0182181 | 0.0182182 | 0.0182183 | 0.0182181 | 0.0182182 | 0.0182182 | 0.0182181
Fps: 54.8905 | 54.8902 | 54.8899 | 54.8905 | 54.8902 | 54.8902 | 54.8905
Table 10. Training average time and sample processing average time versus the number of neurons in the hidden layer for the zoom movement model.
# Neurons: 50 | 100 | 200 | 300 | 400 | 500 | 600
Training Avg. Time (s): 74.0976 | 86.7633 | 110.6237 | 66.1509 | 51.5172 | 47.3983 | 74.3243
Processing Avg. Time (s): 0.03218 | 0.03221 | 0.03227 | 0.03234 | 0.03241 | 0.03247 | 0.03254
Fps: 31.0748 | 31.043 | 30.9824 | 30.9192 | 30.8533 | 30.7954 | 30.731
Table 11. Sample processing average time versus the number of neighbors in the KNN algorithm for the zoom movement model.
# Neighbors: 2 | 4 | 6 | 8 | 10
Processing Avg. Time (s): 0.0411984 | 0.0412814 | 0.0413354 | 0.0412174 | 0.0413064
Fps: 24.2728 | 24.224 | 24.1923 | 24.2616 | 24.2093
Table 12. Training average time and sample processing average time versus the c parameter value in the SVM algorithm for the zoom movement model.
c: 0.1 | 0.025 | 0.00625 | 0.001562 | 0.00039 | 9.7×10⁻⁵ | 2.4×10⁻⁵
Training Avg. Time (s): 0.03203 | 0.0380 | 0.06837 | 0.04282 | 0.01795 | 0.00630 | 0.00404
Processing Avg. Time (s): 0.0321569 | 0.032157 | 0.0321568 | 0.0321567 | 0.032157 | 0.0321568 | 0.0321569
Fps: 31.0975 | 31.0974 | 31.0976 | 31.0977 | 31.0974 | 31.0976 | 31.0975
In this case, even though the sample processing speed is 40-44% slower than in the pan and tilt models, it can still be considered valid for real-time video processing. The drop observed in the zoom movement time performance compared to the other two models is caused by the higher complexity of the operations performed on the n and n+1 frames in the frame acquisition and transformation stage.

In the case of the KNN-based classifier, the results reflected in Tables 5, 8 and 11 yield a sample processing speed of approximately 36 fps for the pan and tilt movement models, falling to 24 fps in the zoom movement model. Once again, this performance drop is produced by the more complex image treatment required in the frame acquisition and transformation stage. Finally, the experimental results with the SVM-based classifier can be observed in Tables 6, 9 and 12.
Those tables reflect a processing speed of approximately 54 fps for the pan and tilt movements and 31 fps for the zoom movement model. Here we can observe again the performance drop that is a consequence of the more complex image treatment needed to obtain the zoom movement patterns. From all these data we can conclude that, in terms of time performance, both the MLP and the SVM obtain approximately the same score, while the KNN is noticeably slower. Overall, our system performance is situated in the range from 24 fps to 54 fps. It is true that 24 fps does not leave a very wide margin when it comes to real-time movement recognition, but it is still suitable for real time. The other conclusion we can draw is that, even though the KNN algorithm reaches the best scores in terms of accuracy and stability, it is also slower than the other two classifiers.
5. Conclusions

A microcontroller-based real-time motion detection system for video surveillance PTZ cameras has been proposed. It features an algorithm that processes a sequence of images streamed from a PTZ camera simulation software by dividing every image into a set of stripes and comparing each one with the equivalent stripe in the next frame, in order to obtain a vector of numbers that is fed as training, validation or test samples to a classifier in charge of pointing out whether there is movement in the video stream. A simplified camera motion model is proposed for each of the three kinds of possible movements, i.e. pan, tilt and zoom. This way, regions which exhibit a large deviation from the appropriate model can be associated with moving foreground objects. In this work it is assumed that the camera only performs one kind of movement at a time. Although taking the three movements into account at the same time would not limit the classification performance, it could affect real-time operation. With the objective of increasing system power efficiency and portability, our proposal has been deployed on a Raspberry Pi type microcontroller-based board.

Aiming to compare different performance levels, three types of classifiers have been used: multilayer perceptron based, KNN based and SVM based. For the multilayer perceptron classifier, tests have been performed by varying the number of neurons in its hidden layer. In the case of the KNN classifier, experiments have been performed by varying the number of nearest neighbors used to classify the samples, and in the case of the SVM classifier, experiments have been performed by varying the value of the c parameter. Results indicate that it is possible to achieve good results according to several well-known classification performance measures, with the KNN algorithm standing out in terms of accuracy measures and the SVM classifier obtaining the best balance between speed and accuracy. Time tests indicate as well that the movement detection system proposed here shows acceptable training times, and when it comes to test-time video processing it always reaches processing speeds higher than 24 fps no matter which classifier is used, confirming our proposal as a valid alternative for real-time movement detection when combined with PTZ cameras.

Regarding future work, two different but complementary lines have been considered to improve our system. The first one includes the generation of new mathematical models combining various camera movements at the same time, instead of analyzing the different movement types separately. The second line will consist of developing a new parallel implementation that takes advantage of the multi-core processor embedded in the Raspberry Pi board.
Acknowledgments This work is partially supported by the Ministry of Economy and Competitiveness of Spain under grant TIN2014-53465-R, project name Video surveillance by active search of anomalous events and grant TIN2014-57341-R, project name metaheuristics, holistic intelligence and smart mobility. It is also partially supported by the Autonomous Government of Andalusia (Spain) under projects TIC-6213, project name Development of Self-Organizing Neural Networks for Information Technologies; and TIC-657, project name Self-organizing systems and robust estimators for video surveillance. Finally, it is partially supported by the Autonomous Government of Extremadura (Spain) under the project IB13113. All of them include funds from the European Regional Development Fund (ERDF). The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the SCBI (Supercomputing and Bioinformatics) center of the University of Málaga. They also gratefully acknowledge the support of NVIDIA Corporation with the donation of two Titan X GPUs used for this research.
References

[1] R.G. Mesquita and C.A.B. Mello, Object recognition using saliency guided searching, Integrated Computer-Aided Engineering 23(4) (2016), 385–400.
[2] B. Lacabex, A. Cuesta-Infante, A.S. Montemayor and J.J. Pantrigo, Lightweight tracking-by-detection system for multiple pedestrian targets, Integrated Computer-Aided Engineering 23(3) (2016), 299–311.
[3] L. Jan Latecki, R. Miezianko and D. Pokrajac, Reliability of Motion Features in Surveillance Videos, Integrated Computer-Aided Engineering 12(3) (2005), 279–290.
[4] C.C. Sun, Y.H. Wang and M.H. Sheu, Fast Motion Object Detection Algorithm Using Complementary Depth Image on an RGB-D Camera, IEEE Sensors Journal 17(17) (2017), 5728–5734.
[5] P. Azzari, L.D. Stefano and A. Bevilacqua, An effective real-time mosaicing algorithm apt to detect motion through background subtraction using a PTZ camera, in: IEEE Conference on Advanced Video and Signal Based Surveillance, 2005, pp. 511–516.
[6] A. Bevilacqua and P. Azzari, High-Quality Real Time Motion Detection Using PTZ Cameras, in: 2006 IEEE International Conference on Video and Signal Based Surveillance, 2006, pp. 23–23.
[7] S.W. Kim, K. Yun, K.M. Yi, S.J. Kim and J.Y. Choi, Detection of moving objects with a moving camera using non-panoramic background model, Machine Vision and Applications 24(5) (2013), 1015–1028.
[8] M.J. Shafiee, P. Siva, P. Fieguth and A. Wong, Real-Time Embedded Motion Detection via Neural Response Mixture Modeling, Journal of Signal Processing Systems (2017).
[9] S.N. Kalitzin, P.R. Bauer, R.J. Lamberts, D.N. Velis, R.D. Thijs and F.H. Lopes Da Silva, Automated Video Detection of Epileptic Convulsion Slowing as a Precursor for Post-Seizure Neuronal Collapse, International Journal of Neural Systems 26(08) (2016), 1650027.
[10] C. Chen, S. Li, H. Qin and A. Hao, Robust salient motion detection in non-stationary videos via novel integrated strategies of spatio-temporal coherency clues and low-rank analysis, Pattern Recognition 52 (2016), 410–432.
[11] H. Sajid, S.-C.S. Cheung and N. Jacobs, Appearance based background subtraction for PTZ cameras, Signal Processing: Image Communication 47 (2016), 417–425.
[12] J. Huo, Y. Gao, W. Yang and H. Yin, Multi-Instance Dictionary Learning for Detecting Abnormal Events in Surveillance Videos, International Journal of Neural Systems 24(03) (2014), 1430010.
[13] T.E. Boult, X. Gao, R. Micheals and M. Eckmann, Omnidirectional visual surveillance, Image and Vision Computing 22(7) (2004), 515–534.
[14] K.-T. Song and J.-C. Tai, Dynamic calibration of pan-tilt-zoom cameras for traffic monitoring, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 36(5) (2006), 1091–1103.
[15] C. Ding, B. Song, A. Morye, J.A. Farrell and A.K. Roy-Chowdhury, Collaborative sensing in a distributed PTZ camera network, IEEE Transactions on Image Processing 21(7) (2012), 3282–3295.
[16] C. Ding, J.H. Bappy, J.A. Farrell and A.K. Roy-Chowdhury, Opportunistic Image Acquisition of Individual and Group Activities in a Distributed Camera Network, IEEE Transactions on Circuits and Systems for Video Technology 27(3) (2017), 664–672.
[17] L. Tong, F. Dai, D. Zhang, D. Wang and Y. Zhang, Encoder Combined Video Moving Object Detection, Neurocomputing 139 (2014), 150–162.
[18] P. Angelov, P. Sadeghi-Tehran and C. Clarke, AURORA: Autonomous Real-time On-board Video Analytics, Neural Computing and Applications 28(5) (2017), 855–865.
[19] A. Dziri, M. Duranton and R. Chapuis, Real-time multiple objects tracking on Raspberry-Pi-based smart embedded camera, Journal of Electronic Imaging 25 (2016), 041005.
[20] K. Papadimitriou, A. Dollas and S.N. Sotiropoulos, Low-Cost Real-Time 2-D Motion Detection Based on Reconfigurable Computing, IEEE Transactions on Instrumentation and Measurement 55(6) (2006), 2234–2243.
[21] F. Neri and E. Mininno, Memetic Compact Differential Evolution for Cartesian Robot Control, IEEE Computational Intelligence Magazine 5(2) (2010), 54–65.
[22] E. Mininno, F. Neri, F. Cupertino and D. Naso, Compact Differential Evolution, IEEE Transactions on Evolutionary Computation 15(1) (2011), 32–54.
[23] M.K. Dobrzynski, R. Pericet-Camara and D. Floreano, Vision Tape - A Flexible Compound Vision Sensor for Motion Detection and Proximity Estimation, IEEE Sensors Journal 12(5) (2012), 1131–1139.
[24] V. Fung, J.L. Bosch, S.W. Roberts and J. Kleissl, Cloud shadow speed sensor, Atmospheric Measurement Techniques 7(6) (2014), 1693–1700.
[25] L.H. Adnan, Y.M. Yussoff, H. Johar and S.R.M.S. Baki, Energy-saving street lighting system based on the waspmote mote, Jurnal Teknologi 76(4) (2015), 55–58.
[26] F. Ortega-Zamorano, M.A. Molina-Cabello, E. López-Rubio and E.J. Palomo, Smart motion detection sensor based on video processing using self-organizing maps, Expert Systems with Applications 64 (2016), 476–489.
[27] J. Benito-Picazo, E. López-Rubio, J.M. Ortiz-de-Lazcano-Lobato, E. Domínguez and E.J. Palomo, Motion detection by microcontroller for panning cameras, Lecture Notes in Computer Science, Vol. 10338, 2017, pp. 279–288.
[28] Y. Zeinali and B.A. Story, Competitive probabilistic neural network, Integrated Computer-Aided Engineering 24(2) (2017), 105–118.
[29] F. Ortega-Zamorano, J.M. Jerez, I. Gómez and L. Franco, Layer multiplexing FPGA implementation for deep back-propagation learning, Integrated Computer-Aided Engineering 24(2) (2017), 171–185.
[30] M.H. Rafiei and H. Adeli, A New Neural Dynamic Classification Algorithm, IEEE Transactions on Neural Networks and Learning Systems 28(12) (2017), 3074–3083.
[31] A. Sánchez, A.B. Moreno, D. Vélez and J.F. Vélez, Analyzing the influence of contrast in large-scale recognition of natural images, Integrated Computer-Aided Engineering 23(3) (2016), 221–235.
[32] M. Koziarski and B. Cyganek, Image Recognition with Deep Neural Networks in Presence of Noise - Dealing with and Taking Advantage of Distortions, Integrated Computer-Aided Engineering 24(4) (2017), 337–349.
[33] Y. Lin, Z. Nie and H. Ma, Structural Damage Detection with Automatic Feature Extraction through Deep Learning, Computer-Aided Civil and Infrastructure Engineering 32(12) (2017), 1025–1046.
[34] M.A. Molina-Cabello, R.M. Luque-Baena, E. López-Rubio and K. Thurnhofer-Hemsi, Vehicle Type Detection by Convolutional Neural Networks, in: Biomedical Applications Based on Natural and Artificial Computing, J.M. Ferrández Vicente, J.R. Álvarez-Sánchez, F. de la Paz López, J. Toledo Moreo and H. Adeli, eds, Springer International Publishing, Heidelberg, 2017, pp. 268–278.
[35] Y. Cha, W. Choi and O. Büyüköztürk, Deep Learning Based Crack Damage Detection Using Convolutional Neural Networks, Computer-Aided Civil and Infrastructure Engineering 32(5) (2017), 361–378.
[36] T. Smith and J. Guild, The C.I.E. colorimetric standards and their use, Transactions of the Optical Society 33(3) (1931), 73.
[37] G. Chen, P. St-Charles, W. Bouachir, G. Bilodeau and R. Bergevin, Reproducible evaluation of Pan-Tilt-Zoom tracking, in: Proceedings - International Conference on Image Processing (ICIP), 2015, pp. 2055–2059.
[38] S. Nissen, Fast Artificial Neural Network, 2016, [Online; accessed 10-January-2017].
[39] T. Joachims, SVM Light, 2008, [Online; accessed 14-February-2018].
[40] P.-L.S. Charles, LITIV, 2018, [Online; accessed 14-February-2018].
[41] C. Parker, An analysis of performance measures for binary classifiers, Proceedings - IEEE International Conference on Data Mining, ICDM (2011), 517–526.
[42] C.X. Ling, J. Huang and H. Zhang, AUC: A statistically consistent and more discriminating measure than accuracy, in: IJCAI International Joint Conference on Artificial Intelligence, 2003, pp. 519–524.