TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214 55/67 pp343-347 Volume 13, Number S1, October 2008
A Target Tracking System for Applications in Hydraulic Engineering

SHEN Qiaonan, AN Xuehui**

Department of Hydraulic and Hydropower Engineering, Tsinghua University, Beijing 100084, China

Abstract: A new type of digital video monitoring system (DVMS), named the user-defined target tracking system (UDTTS), was developed based on digital image processing (DIP) technology and the practical demands of construction site management in hydraulic engineering. Management requires the position, speed, and track of moving targets such as humans and vehicles, which can be calculated from the targets' locations in the images at any time. The proposed algorithm relies on the context-sensitive motion information of the image sequence, which is much richer than what one or two images provide: it compares the properties of the blobs in the current frame with the trajectories of the targets in the previous frames and establishes the correspondence between them. The processing frame rate is about 10 fps for 240-by-120-pixel images. Experimental results show that position, direction, and speed measurements have an accuracy comparable with manual work. The user-defined process makes the UDTTS available to the public whenever appropriate.

Key words: target tracking system; digital image processing; user-defined; consecutive trajectory
Introduction

It is widely recognized that hydraulic construction engineering is an information-intensive and complex industry. Present trends in hydraulic construction engineering have heightened the need for effective and efficient collection, monitoring, and analysis of construction progress data. In recent years, the use of digital video monitoring systems (DVMS) in the surveillance phase of a project has grown rapidly, improving progress control, safety monitoring, and work coordination over the entire project[1]. However, the information within the thousands of digital videos and images stored for a project by the DVMS cannot be obtained automatically. A large number of components and their features need to be inspected on construction sites[2,3]. Many of these features must be assessed against tight tolerances, requiring that inspections be extremely accurate.

Received: 2008-05-30
** To whom correspondence should be addressed. E-mail: [email protected]; Tel: 86-10-62794285
At the same time, inspection resources, such as the time that inspectors can spend on site, are limited. Therefore, inspectors can benefit from emerging technologies that improve the efficiency of data collection while on site, and from visualization technologies that improve the effectiveness and efficiency of inspection tasks using these data. The capability to automatically identify objects in images through many methodologies is a product of technological breakthroughs in the area of digital image processing (DIP)[4,5]. Detection and tracking of targets on a construction site is not only a single-object tracking problem but also a multi-object tracking problem. Numerous approaches[6] for multi-object tracking have been proposed, but it remains a far more difficult and challenging problem. In addition to the normal frame-to-frame following of a salient area, the system must be able to handle appearances, disappearances, crossings, and other complicated events related to multiple moving targets. Features[7-12] such as color, texture, shape, and motion properties are used for tracking. In this study, a new type of DVMS named the user-
defined target tracking system (UDTTS) was proposed and developed based on DIP technology and the practical demands of construction site management in hydraulic engineering, and a new multi-object tracking algorithm was proposed, dependent on blob properties and context-sensitive motion information.
1 System Overview

The system, called UDTTS, includes four parts: the user-defined process, data preprocessing, moving object detection, and tracking. The input data is a video file or a stream of images captured by a stationary digital video camera mounted on a horizontal gantry or on a tripod in a static position at the construction site.

1.1 User-defined process

The system can support many aspects of management through the user-defined process. Users can define an application, such as vehicle flow, human flow, or grinding variables, in four steps. Images containing the targets and the static background must be provided to the UDTTS. First, generate the initial background model from the input background image; second, define a target on a target image captured at the construction site; third, define the controlling conditions that the target must satisfy; finally, define an output format. The definition of the application is then finished.

1.2 Application analysis

Moving targets at a construction site, such as vehicles and humans, have variable colors, sizes, shapes, speeds, and directions. These features can be utilized to detect and track them. As shown in Fig. 1, an application can be worked out from a target's trajectory, which consists of its positions at sequential times. The problem is how to obtain the positions of a target at any time from the stream of color images. In the
Fig. 1 Application analysis
UDTTS, after the user-defined process, the video captured at the construction site is input for processing. The procedure performs several image processing tasks to detect and track moving objects in the scene. The results can be output in the user-defined format.
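As an illustration, the outcome of the four definition steps could be collected in a simple data structure like the one below. Every field name and value here is hypothetical; the sketch only shows the kind of information each step contributes, not the UDTTS file format.

```python
# Hypothetical result of the four user-definition steps (illustrative only).
application = {
    "background_image": "site_background.bmp",      # step 1: initial background model
    "target": {"kind": "vehicle", "area_px": 350},  # step 2: target defined on a sample image
    "conditions": {                                 # step 3: controlling conditions
        "processing_area": (0, 0, 240, 120),        # rectangle in image coordinates
        "min_blob_area_px": 20,
    },
    "output": {                                     # step 4: user-defined output format
        "format": "csv",
        "fields": ["time", "target_id", "x", "y", "speed", "direction"],
    },
}
```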
2 Tracking Method

The purpose of the tracking part is to detect moving objects in the video stream and collect appropriate data on their routes. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem to a particular application. In its simplest form, tracking can be defined as the problem of estimating the trajectory of an object in the image plane as it moves around a scene. Detecting and tracking moving objects in video involves extracting the moving objects (foreground-background separation) and generating the corresponding persistent trajectories. With multiple objects in the scene, the tracking task is equivalent to solving the correspondence problem: at each frame a set of trajectories and a set of measured objects (blobs) are available, and each object is identified by finding the matching trajectory.

2.1 Detection of moving objects

Detection of moving objects in video streams is the first relevant step of information extraction in many computer vision applications. Aside from the intrinsic usefulness of segmenting video streams into moving and background components, detecting moving objects provides a focus of attention for recognition, classification, and activity analysis, making these later steps more efficient. At the hardware level, color images are usually captured, stored, and displayed using elementary R, G, B component images. The color images read from the frame grabber are transformed to gray-scale images, preserving only the luminance information, in order to reduce the computational load and to guarantee an adequate frame rate (around 10 fps) for tracking. Each incoming frame goes through four successive image processing stages in which the raw intensity data are reduced to a compact set of features that can be used for the matching
method. These four stages are gray-scale transformation, background subtraction, threshold segmentation, and connected component labeling, as shown in Fig. 2.
Fig. 2 The digital image processing steps
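The four stages can be sketched as follows. This is a minimal illustration in Python with NumPy and SciPy, not the paper's VC++ implementation; the threshold and area values are placeholder assumptions.

```python
import numpy as np
from scipy import ndimage

def detect_blobs(frame_rgb, background_gray, diff_thresh=30, min_area=20):
    """Reduce a raw frame to candidate moving blobs via the four stages:
    gray-scale transformation, background subtraction, threshold
    segmentation, and connected component labeling. Thresholds are
    illustrative placeholders."""
    # 1. Gray-scale transformation (standard luminance weights).
    gray = (0.299 * frame_rgb[..., 0]
            + 0.587 * frame_rgb[..., 1]
            + 0.114 * frame_rgb[..., 2])
    # 2. Background subtraction: pixel-wise absolute difference.
    diff = np.abs(gray - background_gray)
    # 3. Threshold segmentation: active pixels -> 1, non-active -> 0.
    binary = diff > diff_thresh
    # Morphological closing with a 3-by-3 square kernel merges fragments.
    binary = ndimage.binary_closing(binary, structure=np.ones((3, 3)))
    # 4. Connected component labeling, then discard small regions.
    labels, n = ndimage.label(binary)
    blobs = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if ys.size < min_area:
            continue  # region smaller than the predefined area threshold
        blobs.append({"centroid": (xs.mean(), ys.mean()), "area": int(ys.size)})
    return blobs
```

Each returned blob carries the position (centroid) and area that the local model of the frame records for the later matching step.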
Motion detection starts by computing a pixel-based absolute difference between each incoming frame and the static background frame provided by the user. A pixel is assumed to contain motion if the absolute difference exceeds a predefined threshold. As a result, a binary image is formed in which active pixels are labeled "1" and non-active ones "0". The regions directly extracted from the resulting binary image are typically fragmented into multiple segments. To avoid this, a morphological closing operation with a 3-by-3 square kernel is applied to the image, so that small gaps between isolated segments are erased and the regions are merged. After closing, we apply connected component analysis[13] followed by region area filtering: regions with an area smaller than a predefined threshold are discarded. The position and area of each blob are detected in the local model of the individual frame. After detection, the objects in the local model of a single frame must be integrated into the trajectories in the world model of all frames through the matching method.

2.2 Tracking of moving objects

Tracking is needed to determine object correspondence between frames. In our approach, the main tracked feature is the object trajectory, which is consecutive across frame sequences. Since the speed of the
objects at a construction site is not too high, we assume that a blob in the current frame overlaps its corresponding trajectory in the previous frames. The object centroid and dynamic information are used for tracking. The speed and direction of the object, generated from its previous trajectory, are stored in the world model of all frames; they are also useful features for matching.

In general, frequent visual overlap between objects causes difficulties for a tracking system. Since blob generation is based on connected component analysis, touching objects generate a single merged blob, and pixel classification, i.e., deciding to which original blob individual pixels belong, is hard to resolve. This leads to the problem that in a merged state individual tracks cannot be updated. To overcome this, we propose a technique that generates plausible trajectories for objects in a merged state by matching the objects entering and leaving that state. The matching is based on the kinematic smoothness constraint and is presented in Section 2.3.

In the first frame, each blob generates a trajectory with the following attributes: area, speed, direction, and status. The consecutive judgement described in Section 2.3 is used for matching. The scheme of the tracking algorithm is outlined as follows.

Step 1 If a blob is exactly matched to one existing trajectory, the trajectory properties (area, speed, direction, and status) are updated.

Step 2 If a blob matches two trajectories, crossing happens. Set the status of these trajectories to crossing, and do not process them until splitting happens.

Step 3 If a trajectory matches two blobs, splitting happens. Find the partner trajectory, compare both trajectories to the two blobs, and update the two trajectories' properties.

Step 4 If a non-matched blob is found, a new trajectory is generated.
Step 5 If a non-matched trajectory is detected, either the object is exiting or blob detection has failed. If the trajectory tends to move out of the view, exiting is the likely explanation; otherwise, leave the trajectory to be processed in the next frame.

2.3 Consecutive judgement

Consecutive judgement: as shown in Fig. 3, if a blob (solid line) in the current frame and a trajectory
(dotted line) overlap, we say they are consecutive; otherwise, they are inconsecutive.
Fig. 3 Consecutive and inconsecutive trajectory
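Using a bounding-box overlap test as the consecutive judgement, the five-step scheme of Section 2.2 can be sketched as follows. The data layout, function names, and event labels are illustrative assumptions, not the paper's data structures.

```python
def boxes_overlap(t, b):
    """Illustrative consecutive judgement: trajectory's last bounding box
    and current blob's bounding box overlap."""
    ax0, ay0, ax1, ay1 = t["bbox"]
    bx0, by0, bx1, by1 = b["bbox"]
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def match_step(trajectories, blobs, consecutive):
    """One frame of the five-step matching scheme. `consecutive(t, b)` is
    the consecutive judgement; the (event, blob, trajectories) record
    layout is an illustrative choice."""
    events = []
    for b in blobs:
        ts = [t for t in trajectories if consecutive(t, b)]
        if len(ts) == 1:
            events.append(("update", b, ts))       # Step 1: one-to-one match
        elif len(ts) >= 2:
            events.append(("crossing", b, ts))     # Step 2: blob covers >= 2 trajectories
        else:
            events.append(("new", b, ts))          # Step 4: unmatched blob
    for t in trajectories:
        bs = [b for b in blobs if consecutive(t, b)]
        if len(bs) >= 2:
            events.append(("splitting", bs, [t]))  # Step 3: trajectory covers >= 2 blobs
        elif not bs:
            events.append(("lost", None, [t]))     # Step 5: exiting or missed detection
    return events
```

A trajectory that matches exactly one blob produces only the blob's "update" event, so each correspondence is reported once per frame.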
For an inconsecutive trajectory, three features (maximum distance, limited speed, and correlative direction) are used for matching (conditions shown in Fig. 4).
Fig. 4 Inconsecutive trajectory conditions
If a trajectory has been generated by only one blob, its speed and direction are not yet effective values. The distance d_ij between the current blob centroid and the previous blob centroid should then fulfill the condition

    d_ij ≤ X_d                                                    (1)

where X_d denotes the maximum distance an object can move in one interval, and i, j are frame numbers. If a trajectory has been generated by more than two blobs, speed and direction can also be used for matching. The blob and the trajectory match each other if the current speed and the direction correlation, described by the angle θ, are in the acceptable ranges, i.e.,

    (1 − x_VX)·V_x^(n−1) ≤ V_x^n ≤ (1 + x_VX)·V_x^(n−1)
    (1 − x_VY)·V_y^(n−1) ≤ V_y^n ≤ (1 + x_VY)·V_y^(n−1)           (2)

where V_x is the speed along the X-axis, V_y is the speed along the Y-axis, n is the frame number, and x_VX and x_VY are predefined ratios in (0, 1); and

    cos θ_1 ≤ cos θ ≤ 1                                           (3)

where θ is the angle between the current direction and the previous one and θ_1 is a predefined angle in (−90°, 90°). Otherwise, they do not match.
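Conditions (1)-(3) can be checked as in the sketch below. The parameter values, field names, and the convention that speeds are measured per frame are illustrative assumptions; the paper only requires x_VX, x_VY in (0, 1).

```python
import math

def inconsecutive_match(traj, blob, X_d=40.0, x_VX=0.5, x_VY=0.5, theta1_deg=60.0):
    """Check conditions (1)-(3) for matching a blob to an inconsecutive
    trajectory. All parameter values are illustrative placeholders."""
    px, py = traj["last_centroid"]
    cx, cy = blob["centroid"]
    # Condition (1): distance bounded by the maximum move per interval.
    if math.hypot(cx - px, cy - py) > X_d:
        return False
    if traj["length"] <= 2:
        return True  # speed and direction are not yet effective values
    vx, vy = cx - px, cy - py    # current per-frame speed components
    pvx, pvy = traj["velocity"]  # previous per-frame speed components
    # Condition (2): component speeds within the predefined ratio band
    # (written for non-negative previous components; signs would flip otherwise).
    if not ((1 - x_VX) * pvx <= vx <= (1 + x_VX) * pvx):
        return False
    if not ((1 - x_VY) * pvy <= vy <= (1 + x_VY) * pvy):
        return False
    # Condition (3): cos(theta) >= cos(theta_1), i.e. the direction change
    # between previous and current velocity stays below theta_1.
    dot = vx * pvx + vy * pvy
    norm = math.hypot(vx, vy) * math.hypot(pvx, pvy)
    return norm == 0 or dot / norm >= math.cos(math.radians(theta1_deg))
```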
As described above, when blobs overlap, the observation of a single merged blob does not allow reconstructing the trajectories of the original entering blobs. The merged blob is simply added to all of these trajectories for later consecutive judgement, and the frame number i and the time at which the crossing happens are recorded. When splitting happens at frame k, direction consistency and correlative speed are used to match the blobs and the trajectories based on the kinematic smoothness constraint. In the case of entering or exiting, the blob must be near the boundary of the processing area.
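The pairing at a splitting event can be sketched as below, with any kinematic-smoothness score plugged in. The greedy assignment and all names here are an illustrative simplification, not the paper's exact procedure.

```python
def resolve_split(crossing_trajs, split_blobs, score):
    """At a splitting event, pair each leaving blob with the entering
    trajectory whose motion it continues most smoothly. `score(t, b)` is
    any kinematic-smoothness measure (higher = smoother); the greedy
    assignment is an illustrative simplification."""
    pairs = []
    remaining = list(crossing_trajs)
    for b in split_blobs:
        best = max(remaining, key=lambda t: score(t, b))  # smoothest continuation
        pairs.append((best, b))
        remaining.remove(best)  # each trajectory claims at most one blob
    return pairs
```

A natural score is the negative squared distance between the trajectory's predicted position (extrapolated from its speed before the merge) and the blob centroid.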
3 An Example
The tracking system UDTTS has been applied to two video files captured at the Xiangjiaba dam site to track vehicles. One of the test sequences contains a single object; the other contains multiple objects with entering, exiting, and crossing events. The static background is provided to define the processing area (the rectangle in Fig. 5), and the target's area is obtained before processing. The main parameters of the implementation are: Windows XP, VC++ 6.0, AMD Athlon 2.01 GHz CPU, and 1.00 GB memory. The processing frame rate is about 8 fps for an image size of 240 by 120 pixels. The accuracy and stability of the system depend on the initially predefined parameters.

The second sequence contains 4 vehicles generating 1 entering, 4 exiting, and 3 crossing events. The tracking results, such as the centroid sequence and the trajectory of each vehicle, are shown in Fig. 5. Four frames, at the 1st, 3rd, 7th, and 26th seconds, are listed on the left. A crossing event between vehicles 2 and 3 occurs at T2; vehicles 1 and 3 cross at T3, as do vehicles 1 and 4. Vehicle 3 is moving out of the processing area at T4. Vehicle 1 disappears at T4, and vehicle 2 has moved out of the processing area since T3. Vehicle 4 appears after T3 and finally leaves the processing area. A summary of the observed events is given in Table 1.

Table 1 Critical events processed by tracking method

Items               Entering events   Exiting events   Crossing events
Test results              1                 4                3
Actual situations         1                 4                3
Fig. 5 Tracking results for the video containing multiple moving objects
4 Conclusions and Future Work

We have presented the UDTTS for real-time moving target tracking. A real-time multi-object detection and tracking algorithm was developed using consecutive trajectories and correlative motion information. Experimental results show that position, direction, and speed measurements have an accuracy comparable with manual work. An adaptive background model will be developed, and the efficiency of the algorithm will be improved. More complex scenes should be tested with the UDTTS. Although the UDTTS was developed to meet the needs of construction site management, it is available to the public whenever appropriate.

References

[1] Memon Z A, Abd Majid M Z, Mustaffar M. An automatic project progress monitoring model by integrating AutoCAD and digital photos. In: Proceedings of Computing in Civil Engineering (CCE) ASCE. Cancun, Mexico, 2005.
[2] Soibelman L, Brilakis I. Identification of material from construction site images using content based image retrieval techniques. In: Proceedings of Computing in Civil Engineering ASCE. Cancun, Mexico, 2005.
[3] Teizer J, Caldas C H, Haas C. Real-time three-dimensional occupancy grid modeling for the detection and tracking of construction resources. ASCE Journal of Construction Engineering and Management, 2007, 133(11): 880-888.
[4] Yilmaz A, Javed O, Shah M. Object tracking: A survey. ACM Computing Surveys, 2006, 38(4): 1-45.
[5] Beleznai C, Schlögl T, Wachmann B, et al. Tracking multiple objects in complex scenes. In: Leberl F, Fraundorfer F, eds. Vision with Non-Traditional Sensors. Proceedings of the 26th Workshop of the Austrian Association for Pattern Recognition. Austrian Computer Society, 2002, 160: 175-182.
[6] Yilmaz A, Javed O, Shah M. Object tracking: A survey. ACM Computing Surveys, 2006, 38(4): 1-45.
[7] Takala V, Pietikainen M. Multi-object tracking using color, texture and motion. In: IEEE Conference on Computer Vision and Pattern Recognition. 2007: 1-7.
[8] Veeraraghavan H, Schrater P, Papanikolopoulos N. Robust target detection and tracking through integration of motion, color, and geometry. Computer Vision and Image Understanding, 2006, 103(2): 121-138.
[9] Vermaak J, Godsill S J, Perez P. Monte Carlo filtering for multi-target tracking and data association. IEEE Transactions on Aerospace and Electronic Systems, 2005, 41(1): 309-332.
[10] Wang Shuan, Ai Haizhou, He Kezhong. Difference-image-based multiple motion targets detection and tracking. Journal of Image and Graphics, 1999, 4(6): 470-475. (in Chinese)
[11] Wan Qin, Wang Yaonan. Research and implementation of detecting and tracking multiple moving objects method. Application Research of Computers, 2007, 1: 199-202. (in Chinese)
[12] Ge Jiaqi, Li Bo, Chen Qimei. A region-based vehicle tracking algorithm under occlusion. Journal of Nanjing University (Natural Sciences), 2007, 43(1): 66-72. (in Chinese)
[13] Gao Hongbo, Wang Weixing. New connected component labeling algorithm for binary image. Computer Applications, China, 2007, 27(11): 2776-2777. (in Chinese)