International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 4, Issue 2, February 2014)

Robust Object Tracking under Cluttered Environment
Prashant Kumar1, Rupak Chakraborty2, Arup Sarkar3
Dept. of Computer Science/Information Technology, DIT University, Dehradun

Abstract—In this paper we propose a novel method for detecting and tracking objects in real-time video in the presence of a cluttered background (such as moving tree leaves), under various types of occlusion (object-to-object and object-to-scene) and under scale change of the object. Object detection and tracking are the two main stages of any tracking system. In our approach we first apply filters to remove noise and to suppress minute changes in the scene, then use the frame-differencing method to detect and segment the moving object. A contour-tracking approach is applied to track the object of interest across all consecutive video frames. To evaluate the method we test it on two datasets: the KTH dataset and our own dataset.

Key Terms—Frame differencing, Contour tracking, Average filter, Cluttered background, Object tracking.

I. INTRODUCTION
A. Background
Object detection and tracking in a cluttered environment is a problem of computer vision, and one of its most promising applications, with a very wide range of uses. In the last ten years it has grown strongly in the fields of Human Computer Interaction (HCI) and Human Machine Interaction (HMI). Over recent years much research has been devoted to object tracking under cluttered environments, because in real-world tracking it is normal for a target to be partly or entirely covered by outliers for an uncertain period of time; it is now one of the most active areas of research. Computer vision is the field concerned with processing images and videos in a computer system, trying to make the system's understanding comparable to a human's. In general, computer vision involves:
1) Image acquisition – The first step of any computer vision system: how images are obtained from a physical device (a camera, webcam, etc.) and stored in a computer system for further processing.
2) Image analysis – Extracting meaningful information from digital images by means of digital image processing. It includes all low-level processing (clean-up, edge and other feature detection, etc.). In simple terms, image analysis is the process of segmenting the image into regions corresponding to objects of interest.
3) Image understanding – Interpreting the objects of interest to figure out what is actually happening in the image, including what the objects are and their spatial relationships to each other. This covers all higher-level processing (image classification, image registration and image recognition).
The term "object tracking" refers to faithfully locating a previously specified target object in subsequent video frames, i.e. the process of locating a moving object (or multiple objects) in all consecutive frames of a video. It can also be defined as segmenting an object of interest from a scene and keeping track of its movement, direction and orientation in order to extract useful information. Robust object tracking in a cluttered background means developing a tracking system that does not fail in the presence of illumination changes, occlusion and background clutter, such as movement of tree leaves, changing lighting conditions or complex object shapes.

B. Challenges
Tracking can fail because of clutter, so developing a robust tracking system means addressing the following clutter-related challenges:
B.1 Occlusion
B.2 Illumination changes
B.3 Dynamic background
B.4 Object's shape deformation
B.5 Single or multiple people
B.1 Tracking Under Occlusion: Tracking should remain successful under severe occlusion, i.e. when the object is covered by other things partially or fully. Occlusion can be of two types: object-to-object occlusion and object-to-scene occlusion.
B.1.1 Object to Object Occlusion: When there are two or more objects in a scene, one object can sometimes cover another partially or fully. Figure 1 illustrates an example of partial and complete occlusion.

Fig 1: Object to Object Occlusion

B.1.2 Object to Scene Occlusion: Sometimes a background object can cover the foreground object (the object of interest) partially or fully. When the foreground object is partially occluded by a stationary background object (a pillar, table, chair, etc.), so that only some parts of it remain visible, this is called partial occlusion. If the foreground object is occluded completely, so that no part of it is visible, this is called complete occlusion. Figure 2 shows an example of partial occlusion.

Fig 2: Object to Scene Occlusion [9]

B.2 Tracking Under Illumination Change: Changes of lighting in a scene are called illumination changes, and tracking systems often fail under sudden ones. In practice, the illumination in the scene can change gradually (daytime or weather conditions in an outdoor scene) or suddenly (switching lights on and off in an indoor scene). Figure 3 shows an example of illumination changes in a scene.

Fig 3: Illumination Changes [9]

B.3 Tracking Under Dynamic Background: A dynamic background is one that changes with respect to time, such as movement in the branches of trees, water flickering or waves in the sea. Figure 4 shows a scene with a dynamic background.

Fig 4: Dynamic Background [9]

B.4 Tracking Under Object's Shape Deformation: Shape deformation means the shape of an object changes as it moves; a human shape is complex because its shape and structure change with movement. Tracking under scale change means the tracker should not fail when the size of the object changes (as it moves nearer or farther from the camera), or when several objects of different sizes are present in the scene. In Figure 5 the first image is an example of scale change, where one person appears large and the other small; the second image shows shape deformation.

Fig 5: Scale Change and Object's Shape Deformation [9]

B.5 Tracking Of Single Or Multiple Objects Having Complex Shape: Tracking should be successful whether one or several objects (humans, vehicles, etc.) are present in the scene. Figure 6 shows an example in which several people are walking on the road.

Fig 6: Multiple Persons [10]

II. RELATED WORK
A number of literature surveys exist on object detection, segmentation and tracking in video surveillance. We present here only those works related to our study.
Panda et al. proposed a tracking system for a cluttered environment with sudden background changes such as movement in the leaves of trees and varying illumination. Moving objects are detected by the 3-frame differencing method [2], and fuzzy C-means clustering is used to segment the object of interest from the background [1].
Zarka et al. proposed a real-time system for human detection, tracking and motion analysis. Their adaptive background model for detection and tracking can deal with illumination effects and object occlusion. They use background subtraction to obtain foreground pixels, followed by noise cleaning and object detection for motion analysis [3].
Yilmaz et al. reviewed most of the methods for detecting and tracking objects in video, classified them into categories and identified new trends. They discussed the main challenges in tracking, including abrupt object motion, object-to-object and object-to-scene occlusion, and camera motion, as well as how to choose the best image features and motion models for object tracking [5].
Liyuan Li et al. proposed a method for detecting and segmenting foreground objects from video containing both stationary and moving background elements, using the Bayes decision rule to classify background and foreground. Foreground objects are extracted by fusing the classification results from a complex background including wavering tree branches, flickering screens and water surfaces, opening and closing doors, switching lights and shadows of moving objects [4].
Srinivasan et al. proposed an approach to human body tracking and modelling in monocular surveillance video. They give a detailed study of background modelling, human body modelling, tracking of persons and activity analysis in vision-based systems [7].
Benezeth et al. proposed a method for locating moving objects in a video sequence from a fixed camera. They use background subtraction and evaluate it quantitatively with respect to noise and camera jitter; the main problem with this kind of method is its lack of robustness [6].
Gentile et al. proposed a robust system for segmenting and tracking objects under occlusion, based on a statistical analysis of the correlation between a part and the tracking error, together with a novel segmentation approach. Their method works well under severe occlusion, such as object-to-object and object-to-person occlusion, but not under illumination change or dynamic background, so it can be further improved on those issues [8].

III. PROPOSED METHOD
This section describes our methodology as a sequence of processing blocks: each block processes the video in a particular way, and the output of each block is given as input to the next. Figure 7 shows the steps of the proposed method for object tracking in a cluttered environment in real-time video.

[Flowchart: Preprocessing → Handling Cluttered → Foreground/Background Extraction → Binary Conversion → Morphological Operations → Feature Extraction → Contour Tracking]
Fig 7: Flowchart of the proposed system

A. Preprocessing
Preprocessing covers all low-level processing of the video and consists of the following steps, summarized in Figure 8 at the end of this section.
A.1 Input or Capture Video: Capture the video and give it as input to the OpenCV library for further processing. There are two ways of providing input:
By webcam, using the following OpenCV library function:
CvCapture *fc = cvCaptureFromCAM(0);
By video file (e.g. an .avi, .mpeg or .wmv file), using the following OpenCV library function for a pre-recorded video:
CvCapture *fc = cvCaptureFromAVI("example.avi");
A.2 Process It Frame By Frame: The second step is to extract each frame from the video with the following OpenCV library function; each call extracts the next consecutive frame:
IplImage *imgin = cvQueryFrame(fc);
A.3 Resize The Frames: If the frame size is too large we resize it, which makes the algorithm fast, robust and efficient. Each frame is resized with the following OpenCV library function:
cvResize(imgin, tmpsize, CV_INTER_LINEAR);
A.4 Convert The RGB Image Into Grey Level: The colour (RGB) image is converted to grey level. To reduce processing time, a greyscale image is used for the entire process instead of the colour image, since it is harder to work on a 3-channel (colour) image than on a single-channel (greyscale) one. The OpenCV library function for the conversion is:
cvCvtColor(imgin, imgt1, CV_BGR2GRAY);
where imgin is the source image and imgt1 is the destination image. A complete loop combining these calls is sketched below.
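For concreteness, the calls above can be combined into one capture loop. Below is a minimal sketch of the whole pre-processing stage using the same legacy OpenCV C API as the paper; the 320×240 working size and the window name are our illustrative assumptions (the paper does not fix them), while fc, imgin, tmpsize and imgt1 are the paper's own variable names.

```c
#include <opencv/cv.h>
#include <opencv/highgui.h>

int main(void)
{
    /* Step A.1: capture from the default webcam; use
       cvCaptureFromAVI("example.avi") for a pre-recorded clip. */
    CvCapture *fc = cvCaptureFromCAM(0);
    if (!fc) return 1;

    /* 320x240 is an assumed working size; the paper only says that
       over-large frames are shrunk. */
    IplImage *tmpsize = cvCreateImage(cvSize(320, 240), IPL_DEPTH_8U, 3);
    IplImage *imgt1   = cvCreateImage(cvSize(320, 240), IPL_DEPTH_8U, 1);
    IplImage *imgin;

    cvNamedWindow("preprocessed", CV_WINDOW_AUTOSIZE);
    while ((imgin = cvQueryFrame(fc)) != NULL) {   /* step A.2 */
        cvResize(imgin, tmpsize, CV_INTER_LINEAR); /* step A.3 */
        cvCvtColor(tmpsize, imgt1, CV_BGR2GRAY);   /* step A.4 */
        cvShowImage("preprocessed", imgt1);
        if (cvWaitKey(10) == 27) break;            /* Esc quits */
    }

    cvReleaseImage(&tmpsize);
    cvReleaseImage(&imgt1);
    cvReleaseCapture(&fc);
    return 0;
}
```

Note that frames returned by cvQueryFrame point to an internal buffer and must not be released by the caller.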

Figure 8 summarizes the pre-processing stage:
[Flowchart: Start → Input Video → Extract Frames → Resize the Frames → Grey-scale Conversion → Stop]
Fig 8: Flowchart of video processing

B. Handling Cluttered Background
Noise is sometimes present in video in the form of clutter, so we use different types of filters to handle the different kinds of clutter.
 Tracking is sometimes difficult because of:
1. Noise present in the scene and clutter effects.
2. Small changes in the scene, such as a flag waving in the wind, small movements in the leaves of trees, and changing illumination conditions.


 So, to make the system robust, we need to suppress these changes.
 Filters play a very significant role in suppressing small changes in the environment.
 We tried different types of filters; the best results were obtained with an average filter. Figure 9 shows the 3×3 average filter.

Fig 9: Average Filter

B.1 Shadow and Light Change Detection: The background/foreground classification techniques discussed above are used in surveillance and give good results as long as the environment has no illumination effects. Illumination is their weakness: they are prone both to global illumination changes (such as clouds blocking sunlight) and to local ones (they give unexpected results for videos containing shadows). Motion detection algorithms sometimes break down in segmenting foreground from background, which leads to faulty results. To handle such situations, chromaticity-based techniques play an important role in removing shadows, and stereo information is also helpful for cutting shadows out of images. A colour model is used to solve the problem of shadow detection and light change: colour models that separate brightness from chromaticity properties are used to assign pixels to classes. Unexpected changes in brightness and in the chromatic components between the background and the pixels of the current image give clues for classifying pixels (e.g. background, shadow, selected background, object movement). Gradient values and chromatic components are used to remove the shadows from the images: studies have concluded that in shadow the intensity changes rapidly without much variation in the chromaticity components. Two methods have been proposed for identifying shadows in images and videos: first, the intensity of neighbouring pixels can be examined; second, intensity values can be used to compare shadow and background against foreground. A toy sketch of this brightness/chromaticity test follows.
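The test can be made concrete as a per-pixel classifier. Everything in the sketch below — the normalized-rg chromaticity, both threshold values and the function name is_shadow — is our illustrative assumption; the paper describes the idea but gives no formulas or parameter values.

```c
/* Toy per-pixel shadow test in the spirit of the description above:
 * a pixel that gets noticeably darker than the background model while
 * keeping almost the same chromaticity is labelled shadow, not
 * foreground. T_BRIGHT and T_CHROMA are illustrative values only. */
#define T_BRIGHT 0.7   /* shadow keeps < 70% of background brightness */
#define T_CHROMA 0.05  /* ...while chromaticity shifts only slightly  */

int is_shadow(double Rb, double Gb, double Bb,   /* background pixel */
              double Rc, double Gc, double Bc)   /* current pixel    */
{
    double sb = Rb + Gb + Bb + 1e-6;     /* brightness (background)  */
    double sc = Rc + Gc + Bc + 1e-6;     /* brightness (current)     */
    double dr = Rc / sc - Rb / sb;       /* normalized-r shift       */
    double dg = Gc / sc - Gb / sb;       /* normalized-g shift       */
    return (sc < T_BRIGHT * sb) &&
           (dr * dr + dg * dg < T_CHROMA * T_CHROMA);
}
```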

C. Foreground Extraction
Background subtraction techniques are commonly used to separate foreground moving objects from the background; they identify the objects in the image and extract them. The name "background subtraction" comes from the simple technique of subtracting the background image from the newly arrived image and thresholding the result to obtain the objects of interest. We use a frame-differencing method for background subtraction: objects are detected by subtracting each background image pixel from the corresponding current image pixel, as in equation (1):

| Frame Ic – Frame Ib | > T        (1)

where Frame Ic is the current image, Frame Ib is the background image and T is a threshold value, fixed such that only foreground pixels are extracted from the background: a pixel whose difference between the two images is above T is taken as a foreground pixel. Figure 10 shows the process of extracting and tracking foreground objects in video; a code sketch of steps B and C follows it.

Fig 10: Block Diagram of Frame Differencing Method
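Below is a minimal sketch of steps B and C together in the legacy C API: cvSmooth with CV_BLUR is that API's simple averaging filter (here 3×3, as in Fig 9), and cvAbsDiff plus cvThreshold implement equation (1). The value T = 30 and the helper name extract_foreground are illustrative assumptions; the paper does not fix T for this stage.

```c
#include <opencv/cv.h>

/* Clutter suppression + frame differencing (eq. 1).
 * grey : current frame, 8-bit single channel (pre-processing output)
 * bgnd : background / previous frame, same size and depth
 * fg   : output foreground mask, same size and depth                */
void extract_foreground(IplImage *grey, IplImage *bgnd, IplImage *fg)
{
    IplImage *sm = cvCreateImage(cvGetSize(grey), IPL_DEPTH_8U, 1);

    cvSmooth(grey, sm, CV_BLUR, 3, 3, 0, 0); /* 3x3 average filter    */
    cvAbsDiff(sm, bgnd, fg);                 /* |Frame Ic - Frame Ib| */
    cvThreshold(fg, fg, 30, 255,             /* ... > T (T assumed)   */
                CV_THRESH_BINARY);

    cvReleaseImage(&sm);
}
```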


D. Binary Conversion
Binary images are images whose pixels have only two possible intensity values, normally displayed as black and white; numerically the two values are 0 for black and either 1 or 255 for white. The foreground image obtained in the previous step is a grey-level image, so for further processing we convert it into a binary image; Figure 11 shows the representation of a binary image. The conversion follows these steps (implemented in the sketch after Figure 11):
1. Find the intensity of each pixel in the greyscale image.
2. Compare each intensity value to a threshold value; we used the threshold value 80.
3. If the intensity value is greater than the threshold, change it to 255 or 1 (white).
4. Otherwise change the intensity value to 0 (black).

Fig 11: Block Diagram of Binary Image
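The four steps can be implemented literally as a per-pixel loop. The sketch below assumes an 8-bit single-channel IplImage and uses the paper's threshold of 80; the helper name to_binary is ours, and the single library call cvThreshold(src, dst, 80, 255, CV_THRESH_BINARY) achieves the same result.

```c
#include <opencv/cv.h>

/* Literal implementation of binarization steps 1-4 above, in place. */
void to_binary(IplImage *grey)
{
    int r, c;
    for (r = 0; r < grey->height; r++) {
        unsigned char *row = (unsigned char *)(grey->imageData
                                               + r * grey->widthStep);
        for (c = 0; c < grey->width; c++)      /* step 1: read pixel  */
            row[c] = (row[c] > 80) ? 255 : 0;  /* steps 2-4           */
    }
}
```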

E. Morphological Operations
Morphological operations are used to fill small gaps inside the moving object and to reduce the noise remaining in it. The morphological operators implemented are dilation and erosion: in dilation, each background pixel touching an object pixel is changed into an object pixel; in erosion, each object pixel touching a background pixel is changed into a background pixel. We use the following sequence of morphological operations, sketched in code below:
1. 3-times erosion
2. 4-times dilation
3. 2-times erosion
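A minimal sketch of this schedule with the legacy C API, assuming the mask is the binary image from step D; passing NULL as the structuring element selects the default 3×3 kernel, and the helper name clean_mask is ours.

```c
#include <opencv/cv.h>

/* The erosion/dilation schedule listed above, applied in place. */
void clean_mask(IplImage *bin)
{
    cvErode (bin, bin, NULL, 3);  /* 1. three erosions remove speckle */
    cvDilate(bin, bin, NULL, 4);  /* 2. four dilations close gaps     */
    cvErode (bin, bin, NULL, 2);  /* 3. two erosions restore scale    */
}
```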



F. Feature Extraction
When the input data to an algorithm is too large to be processed, it must be reduced for further processing: the input data is transformed into a reduced representation set of features (also called a feature vector). This transformation is called feature extraction. If the features are chosen carefully, the feature set will extract from the input data the relevant information needed to perform the desired task. Selecting the right features plays a crucial role in object tracking. Some common visual features are:
F.1 Color: An RGB colour space is used to represent colour.
F.2 Edges: Edge detection techniques are used to find object boundaries and other relevant information.
F.3 Texture: Texture is a measure of the intensity variation of a surface.
G. Contour Tracking
The contour tracking approach, which falls under edge detection techniques, is used to track the object in all consecutive video frames. A contour representation defines the boundary of an object and is suitable for tracking complex non-rigid objects. Contours are found by border extraction: we use a 4-neighbour connectivity approach to find the border of the foreground object. The 4-neighbours of a pixel P are P2, P4, P6 and P8, as shown in Figure 12.

Fig 12: 4-Neighbour Connectivity

For each pixel point p we check the intensity values of its 4-neighbours (the shaded pixels in Figure 12). If any of them has value zero, i.e. is a black background pixel, we consider p part of the boundary; otherwise we ignore p and move to the next pixel. For contour tracking we use a simple method in which every pixel of the border-only foreground image is mapped onto the RGB image, in the following steps (sketched in code after section G.1):
1. Split the original colour image into three frames: Red (R), Green (G) and Blue (B).
2. Map each extracted border pixel to the R, G and B frames and change its intensity in each frame to represent tracking.
3. Merge the Red, Green and Blue images to show the overall effect in the colour image.
Finally the tracked object is correctly visible in the original video.
G.1 Boundary Extraction: To find the 8-directional codes for each component, we first find the boundary of the extracted component so as to capture the essence of the object's boundary, by applying an edge extraction function to each component. The idea is to obtain a one-pixel-wide boundary of each component so that, while running the direction-code function, each point being processed has only one unvisited neighbour.
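As referenced above, here is a minimal sketch of the 4-neighbour border test and the colour overlay of section G. It assumes bin is the cleaned 8-bit binary mask (foreground = 255) and rgb is the original BGR frame of the same size; marking the border pure red and the helper name track_contour are our illustrative choices, and the split/merge of steps 1-3 collapses here to direct per-channel writes on the interleaved BGR image.

```c
#include <opencv/cv.h>

/* A foreground pixel p is a boundary pixel if any of its 4-neighbours
 * (P2, P4, P6, P8 in Fig 12) is background (value 0). Boundary pixels
 * are recoloured in the colour frame so the track is visible. */
void track_contour(IplImage *bin, IplImage *rgb)
{
    int r, c;
    for (r = 1; r < bin->height - 1; r++) {
        for (c = 1; c < bin->width - 1; c++) {
            unsigned char *p = (unsigned char *)bin->imageData
                               + r * bin->widthStep + c;
            if (*p == 0) continue;            /* background: skip     */
            if (p[-bin->widthStep] && p[-1] &&
                p[1] && p[bin->widthStep])
                continue;                     /* interior: all 4 set  */
            /* boundary pixel: recolour the matching BGR pixel        */
            unsigned char *q = (unsigned char *)rgb->imageData
                               + r * rgb->widthStep + c * 3;
            q[0] = 0; q[1] = 0; q[2] = 255;   /* B, G, R -> pure red  */
        }
    }
}
```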


IV. EXPERIMENTAL RESULTS
This section demonstrates the effectiveness and robustness of the proposed work; we cannot claim the method is good unless it shows fine results, so its performance is presented on different datasets. Two datasets were used.
A. KTH Dataset [20]
This dataset is freely available for download from the internet. It contains videos of a single person running, jogging or walking: around 100 videos are available for each action, in which roughly 10 different people perform it in different ways. We checked our method on these videos, and it works successfully on every jogging video. Figures 13-15 show the results of our tracking algorithm on the KTH dataset.

Fig 13: Person Moving under Camera Jitter

Fig 14: Person Running during Camera Jitter

Fig 15: Person Running during Illumination Change

B. Own Dataset (Self-Created)
To further evaluate the proposed method we created our own dataset. It contains videos with illumination change, dynamic background and occlusion of a single person. Figures 16-18 show the results of our tracking algorithm on this dataset.

Fig 16: Tracking of Person during Movement

Fig 17: Tracking under Occlusion


Fig 18: Tracking under Dynamic Background

Sometimes the algorithm fails to detect a person who moves from a high-contrast region to a low-contrast region under illumination change. Computational cost is the other problem; it could be addressed by a GPU-based implementation of the method.

V. CONCLUSION
In this paper a robust method for detecting and tracking an object (a human) in video with a cluttered background has been described, based on frame differencing and contour tracking. To assess the method we tried it on the different datasets discussed in the previous section. We conclude that the accuracy of this method is better than optical flow and plain background subtraction, and the algorithm executes successfully under dynamic background change and a cluttered environment. The suggested real-time object detection and tracking system also works well under occlusion, scale change of the object and camera jitter, with room for the further modifications discussed below.

VI. FUTURE SCOPE
In the future this work can be used in security surveillance systems, for activity recognition and for detecting many complex activities; it would also help automated surveillance, since any type of suspicious activity can be added. It can also be used in content-based video retrieval, to organize videos according to their actions.

REFERENCES
[1] Deepak Kumar Panda and Sukadev Meher, "Robust Real-Time Object Tracking Under Background Clutter", IEEE International Conference on Image Information Processing (ICIIP), pp. 1-6, 2011.
[2] Alan J. Lipton, Hironobu Fujiyoshi and Raju S. Patil, "Moving Target Classification and Tracking from Real-Time Video", IEEE Workshop on Applications of Computer Vision, pp. 8-14, 1998.
[3] Nizar Zarka, Ziad Alhalah and Rada Deeb, "Real-Time Human Motion Detection and Tracking", International Conference on Information and Communication Technologies, pp. 1-6, 2008.
[4] Liyuan Li, Weimin Huang, Irene Y.H. Gu and Qi Tian, "Foreground Object Detection from Videos Containing Complex Background", ACM International Conference on Multimedia, pp. 2-10, 2003.
[5] Alper Yilmaz, Omar Javed and Mubarak Shah, "Object Tracking: A Survey", ACM Computing Surveys, Vol. 38, No. 4, Article 13, pp. 1-45, 2006.
[6] Y. Benezeth, P.M. Jodoin, B. Emile, H. Laurent and C. Rosenberger, "Review and Evaluation of Commonly-Implemented Background Subtraction Algorithms", International Conference on Pattern Recognition, pp. 1-4, 2008.
[7] K. Srinivasan, K. Porkumaran and G. Sainarayanan, "Intelligent Human Body Tracking, Modeling, and Activity Analysis of Video Surveillance System: A Survey", International Journal of Convergence in Engineering, Technology and Science, Vol. 1, pp. 635-642, 2009.
[8] Camillo Gentile, Octavia Camps and Mario Sznaier, "Segmentation for Robust Tracking in the Presence of Severe Occlusion", IEEE Transactions on Image Processing, Vol. 13, No. 2, pp. 483-489, 2004.
[9] Computer vision lecture notes, www.cs.illinois.edu/~slazebni/spring13/lec01_intro.pdf.
[10] Monocular vision scene understanding, http://www.computer.org/csdl/trans/tp/2013/04/ttp2013040882abs.html.

