Artur Kujawski
Maritime University of Szczecin
Implementation of chosen methods for detecting and tracking objects in videos in inland navigation

Introduction

Recognition and tracking of objects in video sequences is used to determine the trajectory of an object's movement as well as to detect the motion of moving objects. It is mainly used to automate processes so that they can run autonomously, without human intervention. The wide range of object-processing algorithms applicable in different domains raises the question of which methods to choose and implement in order to process and analyse images quickly. The answer is not clear-cut: the algorithms used in road transport differ from those used in aviation or medicine. To make a good choice we need to define not only the initial conditions, but also what is to be obtained and what problem is to be solved in the final phase. In some circumstances one can build on tests already carried out, for example in road transport. The main aim of image analysis in road transport is to measure the number of passing vehicles and to detect traffic jams or collisions; it is also used to recognise number plates. The specific nature of road traffic, the construction of vehicles and their speed determine the choice of algorithm for image analysis. These algorithms must satisfy both the criterion of reliability and that of real-time operation.

Not all algorithms used in road transport can be carried over to water transport, especially with respect to vessels of inland navigation. The objects moving on water are much bigger than road vehicles, and the differences in the dimensions of the units are also greater. The length of yachts, boats and motor-boats can range from a few to tens of metres, while pusher tugs, motor barges and whole pushed convoys can measure from several to over one hundred metres. Such differences in the sizes and speeds of the vessels determine the choice of algorithm for image processing. Water, on which the vessels move, generates much greater reflection than black pavement; moreover, the colour of the surface depends on the degree of insolation. A feature that makes image analysis on inland waters particularly difficult is that water takes on various shades depending on lighting conditions and is often similar in colour to the sky.

Dynamic technological development and advancing computerisation bring, year by year, new methods of image analysis and improvements to existing algorithms that increase their efficiency and working speed. Four algorithms representing different approaches to image processing were chosen for the tests. Considering the general division of image analysis methods [1], methods belonging to the segmentation group, the object feature analysis group and the classification group were chosen.
Optical flow method

The Lucas-Kanade and Farnebäck methods belong to the so-called optical flow algorithms, which fall into the group of image feature analysis methods. Let $I(x, y, t)$ describe the brightness of a given pixel; it is assumed that $I(x, y, t)$ depends on the coordinates $x, y$ in a particular part of the image and that the brightness of each point of a moving object does not change in time. The pattern is displaced, but the brightness of its points remains constant:

$\frac{dI}{dt} = 0$   (1)

Applying Taylor's series expansion, we get:

$\frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} + \frac{\partial I}{\partial t} = 0$   (2)

Substituting:

$u = \frac{dx}{dt}; \quad v = \frac{dy}{dt}$   (3)

we obtain a linear equation with two unknowns, the so-called optical flow equation:

$I_x u + I_y v + I_t = 0$   (4)
Fig. 1. Displacement of a single pixel of an image
In their works [2] [3] [4], the authors carried out tests of methods used to determine optical flow. On the basis of those results it was concluded that the first step should be to assess the suitability of two first-order gradient methods: the local Lucas-Kanade method and the global Farnebäck method. Both methods start from similar input data (the order of the derivatives), but the calculations leading to the final result differ. The implementation of both methods is described in detail in [5]. For both methods there are parameters that influence the outcome:
• the method of partial differentiation, which affects the gradient properties and the precision of the calculations;
• the initial parameter τ, which influences the size and location of the analysed area;
• the parameter α, which regulates the smoothness of optical velocity changes and thereby the continuity of the optical flow field;
• the number of iterations, which affects the propagation of optical velocity information and allows the detection of higher speeds, i.e. larger displacements.
The way both variants can be invoked in OpenCV is sketched below.
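The sketch assumes OpenCV 2.4, the library version used in the tests described later; the input file name and most parameter values are illustrative assumptions, except for the 20-pixel window and 5 iterations, which follow the test description.

    // Sketch: sparse Lucas-Kanade and dense Farneback flow with OpenCV 2.4.
    #include <opencv2/opencv.hpp>
    #include <vector>

    int main() {
        cv::VideoCapture cap("Video01.avi");           // hypothetical input file
        cv::Mat frame, gray, prevGray;
        if (!cap.read(frame)) return 1;
        cv::cvtColor(frame, prevGray, CV_BGR2GRAY);

        // Corner points for the local (sparse) Lucas-Kanade tracker.
        std::vector<cv::Point2f> prevPts, nextPts;
        cv::goodFeaturesToTrack(prevGray, prevPts, 200, 0.01, 10);

        while (cap.read(frame)) {
            cv::cvtColor(frame, gray, CV_BGR2GRAY);

            if (!prevPts.empty()) {
                std::vector<uchar> status;
                std::vector<float> err;
                cv::calcOpticalFlowPyrLK(prevGray, gray, prevPts, nextPts,
                                         status, err, cv::Size(21, 21), 3);
                prevPts = nextPts;
            }

            // Global (dense) Farneback flow: one displacement vector per pixel.
            cv::Mat flow;
            cv::calcOpticalFlowFarneback(prevGray, gray, flow,
                                         0.5,   // pyramid scale
                                         3,     // pyramid levels
                                         20,    // search window size
                                         5,     // iterations
                                         5, 1.1, 0);
            prevGray = gray.clone();
        }
        return 0;
    }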
CAMShift method

The CAMShift method is a development of the MeanShift method described in [6]. The MeanShift method is more adequate for static distributions of a random variable. In the case of dynamic distributions, such as those present in video sequences, it is necessary to use the CAMShift (Continuously Adaptive Mean Shift) method, which handles dynamic distributions by adapting the size of the pixel search window for each following picture frame. The algorithm is based on colour distribution. It creates a histogram of the tracked object, which determines a probability function. The function is obtained by back-projecting the H component of the HSV model (Hue, Saturation, Value) onto the image surface [7]. With its use one can produce an image in which the only remaining pixels are those for which the value of the probability function is high enough. The common aim of both algorithms is to determine the centre of the searched neighbourhood of the moving object (figure 2).
Fig. 2. Searching for the centre of gravity of the density of occurrence of a given image feature (e.g. colour). The figure shows search windows O1 and O2 with centres So1 and So2, and the centre of gravity Sc1.
The centre So1 of the first determined window, designated O1, does not coincide with the centre of gravity of the searched feature; in this case the centre of gravity is the point Sc1. The whole window is therefore displaced so that So1 and Sc1 coincide, and it is checked whether the two now-coincident centres correspond to the centre of gravity of the features contained in the new window. Usually they do not, so another iteration is needed, determining the next centre of gravity and the new centre of the displaced window O2. In this way the algorithm follows the distribution of the searched object's features as it moves across the screen. A minimal sketch of this loop follows.
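The sketch uses OpenCV 2.4's C++ API (the C++ counterparts of the cvCvtColor() and cvCamShift() calls named later): hue histogram, back-projection and CamShift(). The initial window and the histogram settings are illustrative assumptions.

    // Sketch: hue back-projection + CamShift with OpenCV 2.4.
    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap("Video01.avi");            // hypothetical input file
        cv::Rect window(100, 100, 80, 60);              // assumed initial object window
        cv::Mat frame, hsv, hue, mask, hist, backproj;
        int histSize = 30;
        float hranges[] = {0.f, 180.f};
        const float* ranges = hranges;
        bool first = true;

        while (cap.read(frame)) {
            cv::cvtColor(frame, hsv, CV_BGR2HSV);       // RGB profile -> HSV space
            // Ignore dark or desaturated pixels, where hue is unreliable.
            cv::inRange(hsv, cv::Scalar(0, 60, 32), cv::Scalar(180, 255, 255), mask);
            hue.create(hsv.size(), CV_8U);
            int ch[] = {0, 0};
            cv::mixChannels(&hsv, 1, &hue, 1, ch, 1);   // extract the H component

            if (first) {                                // sample the model histogram once
                cv::Mat roi(hue, window), maskroi(mask, window);
                cv::calcHist(&roi, 1, 0, maskroi, hist, 1, &histSize, &ranges);
                cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);
                first = false;
            }

            // Probability image: back-projection of the hue histogram.
            cv::calcBackProject(&hue, 1, 0, hist, backproj, &ranges);
            backproj &= mask;
            cv::RotatedRect box = cv::CamShift(backproj, window,
                cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));
            cv::ellipse(frame, box, cv::Scalar(0, 0, 255), 2);
        }
        return 0;
    }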
MoG (Mixture of Gaussians) method

The MoG method described in [8] is based on extracting the features of pixels belonging to a moving object that differ from the brightness histogram of pixels belonging to the image background. If every pixel belonged to a uniformly lit surface, a single Gaussian distribution of a random variable would suffice to read out the displacement of each pixel. In practice, however, object surfaces reflect different amounts of light depending on their structure and the angle of incidence of sunlight. It is therefore necessary to use an adaptive method based on a combination of several Gaussian distributions. Each time the parameters of the Gaussians are updated, they are also evaluated with heuristic methods that attribute a given pixel to the most probable component of the image background. The values of pixels that do not match the background are grouped into connected components. The components are then tracked from frame to frame by a multiple-hypothesis tracking algorithm [9]. The value of each pixel represents the luminance measured at the sensor; for moving objects these values differ from those of the background or of objects at rest. The illumination values are the main factor influencing the choice of procedure and its continuous updating throughout the process. The probability of observing the pixel history {X1, ..., Xt} is modelled by a combination of K Gaussian distributions. The current value of a pixel is determined by the following formula:

$P(X_t) = \sum_{i=1}^{K} \omega_{i,t} \cdot \eta(X_t, \mu_{i,t}, \Sigma_{i,t})$   (5)

where: K is the number of Gaussian distributions, $\omega_{i,t}$ is the weight of the i-th component at time t, $\eta$ is the Gaussian probability density function, $\mu_{i,t}$ is the mean value of the i-th component at time t, and $\Sigma_{i,t}$ is the covariance matrix. A minimal usage sketch of the OpenCV implementation used in the tests is given below.
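The sketch assumes the OpenCV 2.4 BackgroundSubtractorMOG2 class named in the test description below; the constructor values are illustrative, not the exact test settings.

    // Sketch: mixture-of-Gaussians background subtraction with OpenCV 2.4.
    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap("Video01.avi");      // hypothetical input file
        // history length, variance threshold, shadow detection on/off
        cv::BackgroundSubtractorMOG2 mog(200, 16.0f, true);

        cv::Mat frame, fgmask;
        while (cap.read(frame)) {
            mog(frame, fgmask);                   // update the model, get foreground mask
            // Foreground pixels are 255, detected shadows 127, background 0;
            // connected groups of foreground pixels form the tracked components.
            cv::Mat moving = (fgmask == 255);
        }
        return 0;
    }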
SURF method

The SURF algorithm was described in 2006 by Herbert Bay in [10] as a speeded-up version of the SIFT (Scale-Invariant Feature Transform) algorithm. In both methods the extraction of object features is resistant to transformations of individual pixels and does not depend on scale or orientation. The individual steps of both algorithms apply to fragments of the image that remain unchanged: the choice of base points and the evaluation of the degree of transformation on the basis of gradients of the chosen area. Each observed point contributes a unique feature that is followed throughout the image analysis. Points are searched for in each subsequent frame of the sequence by comparing image fragments with a reference sample. Not all areas are checked, only the base points, their values, orientation and displacement relative to the previous image (the first image serves as the reference sample). If they are similar, the object is recognised and marked. The first step in obtaining the SURF descriptor is to build a window around the point of interest; the window is composed of the pixels that form the entries of the descriptor vector. The default window size is 20 pixels. The window is divided into 4x4 regular subregions. Using the Haar wavelet, regularly spaced sample points are marked in each subregion, and on this basis the gradients and the local minima and maxima are examined. To gather information for each subregion, the relative and absolute values of dx and dy are used, as shown in figure 3; a minimal OpenCV sketch of SURF detection follows the figure.
Fig. 3. Exemplary ensemble of 4x4 pixel subregions and marking of gradients of salient points using the dx and dy differentials. Own study on the basis of [11] [12] [13]
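The sketch assumes OpenCV 2.4's nonfree module; the Hessian threshold, file names and matcher choice are illustrative assumptions, not the exact test configuration.

    // Sketch: SURF detection and matching with OpenCV 2.4 (nonfree module).
    #include <opencv2/opencv.hpp>
    #include <opencv2/nonfree/features2d.hpp>
    #include <vector>

    int main() {
        cv::Mat sample = cv::imread("superstructure.png", 0); // hypothetical reference image
        cv::Mat frame  = cv::imread("frame.png", 0);          // hypothetical video frame

        cv::SURF surf(400.0);                    // Hessian threshold for key points
        std::vector<cv::KeyPoint> kpSample, kpFrame;
        cv::Mat descSample, descFrame;
        surf(sample, cv::Mat(), kpSample, descSample);        // key points + descriptors
        surf(frame,  cv::Mat(), kpFrame,  descFrame);

        // Match reference descriptors against the frame descriptors.
        cv::BFMatcher matcher(cv::NORM_L2);
        std::vector<cv::DMatch> matches;
        matcher.match(descSample, descFrame, matches);

        // A high share of low-distance matches indicates that the defined
        // object (e.g. a vessel superstructure) is present in the frame.
        return 0;
    }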
Dry run

The comparison was made for three video sequences of 424, 1500 and 1378 image frames respectively. The tests were run on a computer with an Intel i3 processor and 6 GB RAM, in Visual Studio 2012, using the OpenCV library, version 2.4.8. For each video file containing vessel movement, moving objects were searched for in the first run; in the second run a previously defined moving object was looked for. The number of wrongly recognised pixels and the running times of the individual algorithms were measured. The discrepancy with respect to the reference data was calculated as the root mean square deviation (RMSD):
$\varepsilon = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\Delta x_i^2 + \Delta y_i^2\right)}$   (6)
where: N is the number of all image frames, while Δx and Δy are the differences between the corresponding coordinates obtained by the tested method and by the reference method. The efficiency results of the individual algorithms are presented as an average number of frames per second. To compare the results with one another, it was necessary to express them as a percentage of the total number of frames in each video sequence; this stems mainly from the differences in the length of the individual films, and consequently in the total number of frames and in the number of frames projected per second. The individual values and video sequence parameters are shown in table 1. The tested algorithms were configured on the basis of input data available in the literature, such as thresholding, the number of iterations, sensitivity and the size of the search window. The chosen parameter values gave better results for video sequences in which objects move relatively slowly (without fast, dynamic changes), matching the tested situations with vessels in inland navigation.

Tab. 1. Matching of all tested methods and video sequences
All four methods (Optical Flow, CAMShift, MoG, SURF) were applied to each of the three sequences:

File          Res. [px]   T [s]   Frames/s
Video01.avi   1280x720    212     2
Video02.avi   854x480     60      25
Video03.mpg   480x360     46      30
The optical flow algorithm based on the Farnebäck method is realised in OpenCV by the function calcOpticalFlowFarneback(). Apart from the parameters already mentioned, there are two additional ones: the threshold τ and the sensitivity α to changes in the surroundings of the tested points. The size of the search window was set to 20 pixels and the number of iterations to 5, in order to improve the algorithm's efficiency and avoid substantial delays; even so, optical flow obtained the least favourable timing results in comparison with the other methods. Test 1 was carried out to detect characteristic objects in motion. Test 2 detected a previously defined object, e.g. a vessel superstructure. For test 1 the following were measured: the number of frames per second, the number of detected features of the moving object, and the percentage of wrongly recognised points, i.e. points belonging to the image background that were recognised as points in motion.

For the CAMShift algorithm, OpenCV provides the function cvCamShift(). The first step was to transfer the input image from the RGB colour profile to its equivalent representation in the HSV colour space. This phase does not depend on later processing and can be performed in parallel for successive input frames; OpenCV provides the function cvCvtColor() for colour space conversion. In the next step, the data in HSV space are grouped on the basis of the distribution of occurrence of individual colour saturations; this process is called back-projection. The histogram itself has to be initialised by sampling a representative area of one frame. The average number of frames per second and the mean squared error were measured, and the number of correctly recognised characteristic points was counted.

The algorithm based on the mixture of Gaussians (MoG) relies on background subtraction. It was realised in OpenCV with the BackgroundSubtractor class. The BackgroundSubtractorMOG2() constructor has three parameters: the number of historic frames taken into consideration when processing each successive frame, a threshold value from 0 to 255, and a true/false value for the elimination of shadows cast by moving objects on a sunny day.
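As a rough illustration of how such measurements could be collected, the hypothetical harness below (an assumption, not the author's code) computes the average number of processed frames per second with OpenCV's tick counter and the RMSD of equation (6) against reference coordinates.

    // Hypothetical measurement harness: average processed frames per second
    // plus the RMSD of equation (6) against reference coordinates.
    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        cv::VideoCapture cap("Video01.avi");        // hypothetical input file
        std::vector<cv::Point2f> ref, tracked;      // reference vs. tracked positions

        cv::Mat frame;
        int frames = 0;
        int64 start = cv::getTickCount();
        while (cap.read(frame)) {
            // ... run the tested algorithm on `frame` here and append the
            //     tracked object position to `tracked` ...
            ++frames;
        }
        double seconds = (cv::getTickCount() - start) / cv::getTickFrequency();

        double sumSq = 0.0;
        size_t n = std::min(ref.size(), tracked.size());
        for (size_t i = 0; i < n; ++i) {
            double dx = tracked[i].x - ref[i].x;    // Δx for frame i
            double dy = tracked[i].y - ref[i].y;    // Δy for frame i
            sumSq += dx * dx + dy * dy;
        }
        double rmsd = (n > 0) ? std::sqrt(sumSq / n) : 0.0;
        std::printf("avg fps: %.2f, RMSD: %.2f px\n", frames / seconds, rmsd);
        return 0;
    }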
Another tested algorithm was SURF, made available in OpenCV through the cvExtractSURF() function. This function returns a structure based on key points and on so-called descriptors that describe those points. The algorithm is very sensitive in detecting both objects in motion and previously defined image samples. It is able to detect a previously defined object in a moving image, even one that has been morphologically changed, e.g. by image defects, rotation by an arbitrary angle, or a change of object scale. Figure 4 below presents an exemplary detection of characteristic points of a vessel superstructure.
Fig. 4. The SURF method detecting previously defined points in a video image
Results

The tests showed how effective the individual algorithms are and what their practical use on real images can be. The results include the speed of algorithm execution, the number of detected key points moving relative to the static background, and the average error in tracking individual points over the whole length of three different video materials. The tests produced results that form the basis for conclusions on efficiency and working speed. The comparison below presents the results for each video file and image processing method.

Tab. 2. Comparison of the average number of processed frames per second
[Bar charts for Video file_01.avi, Video file_02.avi and Video file_03.mpg: processing speed ("Czas [kl/s]", i.e. time in frames per second) for Test 1 and Test 2, with vertical axes scaled 0–100 and 0–40 frames/s.]
As can be concluded, the most efficient algorithm in the tests was CAMShift: for all files and in both tests it achieved the greatest number of frames per second. The graph below presents how processing speed translates into the effectiveness of each algorithm.
[Bar chart: discrepancy [%] (axis 0–40) of recognised image pixels for the Optical Flow, CAMShift, MoG and SURF methods, with separate bars for Video 01, Video 02 and Video 03.]

Fig. 5. Comparison of the discrepancy of recognised image pixels in relation to the expected value for test no. 1
As can be noticed, the slowest was the optical flow algorithm; however, it achieved very good results as far as the effectiveness of tracking individual characteristic points in the video image is concerned. The MoG method was slightly better in this respect, and given that it was over two times faster during the tests, it appears to offer a good speed-to-effectiveness ratio. The figure above also shows the influence of the source material on the test results. The increased discrepancy for video file no. 2 is thought-provoking; however, analysing the content of the file makes the case clear. The second file shows the passage of a motor barge under a span of the Most Długi bridge in Szczecin, which carries both car and rail traffic. The algorithm recognises movement outside the inland waterway and counts all those moving objects, which contributes to the high level of discrepancy. The next observation is the influence of the moving water surface on the number of recognised moving objects. Both SURF and CAMShift reacted noticeably to the movement of water around ships; in the remaining cases the choice of threshold and sensitivity eliminated these disturbances to a satisfactory degree.
Summary

In the paper, tests of object recognition were run, comparing four methods of tracking objects in motion; the objects were vessels present in an inland navigation area. On the basis of the average number of processed frames per second, the general features and efficiency of the chosen methods were demonstrated and checked. The tests allow conclusions to be formulated concerning the behaviour of the individual algorithms when analysing images presenting different navigational situations, e.g. the passage of a motor barge under a bridge span. The tests show that the algorithms that are very effective in recognising objects in motion require, at the same time, much more computation time and cannot be used for tracking objects in real time. The number of wrongly recognised objects (usually more than expected) derives from the occurrence of unexpected objects in the image, including those that in fact constitute the background but do not remain motionless for the algorithms, e.g. rippling water, leaves moving in the wind or flying birds. The algorithm that proved the most resistant to disturbance while keeping good efficiency is the MoG (Mixture of Gaussians) algorithm. As far as the search for previously defined objects is concerned, the method with the best effectiveness results is SURF: it is resistant to transformations and, most importantly, to changes of the scale of the searched object. In the general test of detecting points in motion, the SURF algorithm proved computationally inefficient; however, its efficiency increases as the searched points are narrowed down.
Abstract

The aim of the article is to review and implement methods of tracking objects in video image sequences. The outcome of the tests is to indicate the method that fulfils the assumptions concerning the number of correctly recognised image pixels and the working speed of the chosen algorithm. The following object tracking methods were tested: optical flow, the CAMShift method (Continuously Adaptive Mean Shift), Mixture of Gaussians, and the SURF method (Speeded-Up Robust Features).
REFERENCES
1. R. Tadeusiewicz and P. Kordocha, Komputerowa analiza i przetwarzanie obrazów. Kraków: Wydawnictwo Fundacji Postępy Telekomunikacji, 1997.
2. N. Nourani-Vatani, P. V. K. Borges, and J. M. Roberts, "A Study of Feature Extraction Algorithms for Optical Flow Tracking," 2006.
3. A. de La Bourdonnaye, R. Doskočil, V. Křivánek, and A. Štefek, "Practical Experience with Distance Measurement Based on Single Visual Camera," Adv. Mil. Technol., vol. 7, no. 2, 2012.
4. J. Bouguet, "Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the Algorithm," vol. 1, no. 2, pp. 1–9.
5. K. Pałczyński, "Segmentacja na podstawie analizy pola ruchu sekwencji obrazów cyfrowych," Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie, 2002.
6. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. Academic Press, 1990.
7. G. R. Bradski, "Computer Vision Face Tracking For Use in a Perceptual User Interface," Intel Technol. J., 1998.
8. T. Bouwmans, F. El Baf, and B. Vachon, "Background Modeling using Mixture of Gaussians for Foreground Detection – A Survey," Recent Patents Comput. Sci., vol. 1, no. 3, pp. 219–237, Nov. 2008.
9. C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. 1999 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 1999, pp. 246–252.
10. H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features," in Lecture Notes in Computer Science, 2006, vol. 3951, pp. 404–417.
11. J. J. Anitha and S. M. Deepa, "Tracking and Recognition of Objects using SURF Descriptor and Harris Corner Detection," Int. J. Curr. Eng. Technol., vol. 4, no. 2, pp. 775–778, 2014.
12. H. Kandil and A. Atwan, "A Comparative Study between SIFT-Particle and SURF-Particle Video Tracking Algorithms," Int. J. Signal Process. Image Process. Pattern Recognit., vol. 5, no. 3, pp. 111–122, 2012.
13. M. Bzdawski, "Śledzenie obiektów w sekwencjach obrazów," Politechnika Warszawska, 2008.