STATISTICAL REGION SELECTION FOR ROBUST IMAGE STABILIZATION USING FEATURE-HISTOGRAM

Jinhee Lee, Younguk Park, Sangkeun Lee, and Joonki Paik
Chung-ang University, Seoul, Korea

ABSTRACT

This paper presents a robust digital image stabilization system built on a practical motion model based on a feature-histogram. The feature-histogram is used to adaptively select feasible motion estimation regions. By estimating the representative motion vector in the optimally selected region, the proposed algorithm robustly removes undesired camera vibration regardless of object movement. Experimental results show that, compared with the existing sum-absolute-difference (SAD) based method, the proposed algorithm improves performance by 7% with four times faster computation.

Index Terms: feature-histogram, motion estimation, sum-absolute-difference (SAD)
1. INTRODUCTION

Recent video acquisition devices follow technical trends such as compactness, high zoom ratios, and digital video signal processing. Compactness and high zoom ratios make an image stabilization function necessary for video cameras, and digital signal processing makes it possible to implement that function without bulky parts such as a gyro sensor and its control unit [1]. A digital image stabilization (DIS) system consists of motion estimation and motion compensation modules, as shown in Fig. 1 [2]. Several approaches have been proposed to estimate the representative motion vector between a pair of consecutive frames. Either block matching (BM) [3] or phase correlation matching (PCM) has widely served as the basic motion estimation element in a local region [4]. Despite its regularity and simplicity of implementation, block matching does not work properly without a salient pattern in the estimation region. Under similar conditions, with a periodic or flat pattern, phase correlation matching may yield ambiguous results with several peaks in the resulting spectrum [5].

In this paper, we present an adaptive method for selecting feasible motion estimation regions to achieve practical and robust video stabilization. The primary contribution of this paper is a feature-histogram model based on the combination of the motion vector and the corresponding SAD ratio, defined in Equation (3), between adjacent frames. In the proposed framework, feasible motion estimation regions are selected by comparing their statistical characteristics, and the optimal representative motion vectors are then estimated.

This research was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MOST) (20090081059), by the Ministry of Knowledge Economy, Korea, under the HNRC (Home Network Research Center) - ITRC (Information Technology Research Center) support program supervised by the Institute of Information Technology Assessment, and by the Ministry of Culture, Sports and Tourism (MCST) and Korea Culture Content Agency (KOCCA) in the Culture Technology (CT) Research and Development Program 2009.

978-1-4244-5654-3/09/$26.00 ©2009 IEEE

2. MOTION ESTIMATION BASED ON TWO-DIMENSIONAL (2D) MOTION MODEL
The role of a motion model is to formally describe the real motion between adjacent frames of a video sequence. Motion is a prominent source of temporal variation in image sequences. To model and compute the motion, we need to understand how images are formed. Thus, camera parameters such as three-dimensional (3D) motion and focal length play an important role in image motion modeling. These parameters are estimated using a motion estimation algorithm, which requires a motion model, an estimation criterion or objective function, and an optimization method [6]. If these parameters are known precisely, the movement of an image relative to the reference image can be compensated. We select the 2D translational motion model for the projective plane induced by the 3D camera motion. Given the motion model, we formulate the displaced frame difference (DFD)-based estimation criterion under the constant-intensity assumption [6]. It aims to minimize the error

$$\varepsilon = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} |C_{ij} - R_{ij}|^2, \qquad (1)$$

where $M \times N$ is the size of the image block, and $C_{ij}$ and $R_{ij}$ are the pixels in the compared and the reference image blocks, respectively.
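As a minimal sketch of how Equation (1) can drive a 2D translational search, the following Python code evaluates the block error for every candidate displacement in a small window and keeps the minimizer. The function names, the list-of-lists frame representation, the exhaustive search, and the sign convention (the block at (top, left) in the reference frame is looked for at (top + dy, left + dx) in the compared frame) are illustrative assumptions, not the implementation used in this paper.

```python
def block_error(cur_block, ref_block):
    """Equation (1): mean of |C_ij - R_ij|^2 over an M x N block."""
    m, n = len(ref_block), len(ref_block[0])
    total = sum(abs(cur_block[i][j] - ref_block[i][j]) ** 2
                for i in range(m) for j in range(n))
    return total / (m * n)


def crop(frame, top, left, m, n):
    """Return the m x n block whose upper-left corner is (top, left)."""
    return [row[left:left + n] for row in frame[top:top + m]]


def estimate_translation(cur_frame, ref_frame, top, left, m, n, search):
    """Exhaustive search (an assumption) for the (dx, dy) minimizing Equation (1).

    Returns (error, dx, dy), or None if no candidate fits inside the frame.
    """
    ref_block = crop(ref_frame, top, left, m, n)
    height, width = len(cur_frame), len(cur_frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            # Skip candidates that would fall outside the compared frame.
            if t < 0 or l < 0 or t + m > height or l + n > width:
                continue
            err = block_error(crop(cur_frame, t, l, m, n), ref_block)
            if best is None or err < best[0]:
                best = (err, dx, dy)
    return best
```

The exhaustive loop is chosen only for clarity; any faster search strategy can reuse `block_error` unchanged as its matching criterion.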
ICIP 2009
Fig. 1. Block diagram of the DIS system. LMV denotes a local motion vector, FMV a frame motion vector, and AMV an accumulated motion vector [2].

3. STATISTICAL REGION SELECTION USING FEATURE-HISTOGRAM

To stabilize an image sequence with undesired camera movements, a proper dynamic system model should be identified using the inter-frame motion vectors estimated at the previous stage. The block diagram of the proposed approach for selecting the optimum region is highlighted in Fig. 1. Four types of critical image patches cause the DIS function to fail. The first type includes multiple objects at different depths, which results in motion parallax. The proposed algorithm excludes such regions by adaptively selecting the local motion vector (LMV) estimation block. The second type contains no salient patterns such as edges. Because this type of image cannot provide any information for matching, it needs to be excluded from the motion estimation blocks. The third type contains periodically textured areas. More specifically, these areas contain pixel patterns that are periodic, quasi-periodic, or random. Because these patterns do not provide a unique motion solution, they are also excluded. The last type includes self-moving objects, whose motion may differ from the actual camera motion. It is excluded by using the angle-based histogram.
Fig. 2. Characteristics of the feature-based histogram: (a) motion vector (dx, dy) between a pair of consecutive frames, (b) the angle ranges of the motion vector, (c) feature-based histogram, and (d) optimal selection region corresponding to the SAD ratio.

3.1. Feature histogram: a definition

The feature histogram is computed in a region where the corresponding LMV is to be estimated. For the proposed algorithm, we define three features, denoted by $F_{\theta}^{Norm}$, $F_{L}^{Norm}$, and $F_{SR}^{Norm}$ (the normalized angle, length, and SAD ratio-based histogram features), based on the combination of the motion vector and the corresponding SAD ratio between temporally adjacent frames. Fig. 2 shows the characteristics of the feature-based histograms. In the proposed method, the consecutive frames $F_{n-1}$ and $F_n$ are each tiled into $m \times m$ estimation region blocks, as shown in Fig. 2(a). Let $(dx, dy)$ denote the motion vector between adjacent frames. The length and angle of the motion vector are respectively computed as

$$L = \sqrt{dx^2 + dy^2} \quad \text{and} \quad \theta = \tan^{-1}\left(\frac{dy}{dx}\right). \qquad (2)$$
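Equation (2) can be evaluated as in the sketch below. It assumes Python's `math.atan2` in place of a plain arctangent of dy/dx, so that the quadrant of the vector is preserved and dx = 0 does not cause a division by zero. The function name and the choice to report the angle in degrees within [0, 360) are hypothetical.

```python
import math


def vector_length_and_angle(dx, dy):
    """Return (L, theta) of Equation (2), with theta in degrees in [0, 360)."""
    length = math.hypot(dx, dy)                      # sqrt(dx^2 + dy^2)
    # atan2 keeps the quadrant; the modulo maps negative angles into [0, 360).
    theta = math.degrees(math.atan2(dy, dx)) % 360.0
    return length, theta
```

For the zero vector, `math.atan2(0, 0)` returns 0, so the angle is reported as 0.0 with length 0.0; a caller that needs to distinguish the zero vector from a purely horizontal one should test the length first.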
Fig. 2(b) shows the angle scale divided into eight discrete intervals, ranging from 0° ($\theta_1$) to 315° ($\theta_8$). The continuous angle of each motion vector is quantized to one of these eight values, and $\theta_0$, shown at the center of the circle in Fig. 2(b), represents the motion vector (0,0). We define two SAD criteria, before and after compensation, as $SAD^{Ori}_{MV_k}$ and $SAD^{CP}_{MV_k}$, respectively, both computed by Equation (1). $MV_k$ denotes the motion vector at the $k$-th region. A region providing a good SAD is constrained up to the motion range on all four sides of the corresponding SAD block, as shown in Fig. 2(a). Larger motions require a larger motion range. The SAD ratio, denoted by $SR_{MV_k}$, is defined as the ratio of these two quantities:

$$SR_{MV_k} = \frac{SAD^{Ori}_{MV_k}}{SAD^{CP}_{MV_k}}. \qquad (3)$$

If the correct motion vector is equal to (0,0), $SR_{MV_k} = 1$. If the region is flat, $SR_{MV_k} > 1 - \alpha$ and $SR_{MV_k} < 1 + \beta$, as shown in Fig. 2(d). Both $\alpha$ and $\beta$ depend on noise and are experimentally set to 0.2. In real images, $SR_{MV_k} = 0$. Fig. 2(d) illustrates the optimal region selection criteria corresponding to the SAD ratio. Area A includes feasible regions, while area B includes flat regions. If $MV_k$ is incorrect and $SAD^{Ori}_{MV_k}$ is bigger than $SAD^{CP}_{MV_k}$, the corresponding block falls in area C. By using the SAD ratio, regions without a salient image pattern are excluded.

3.2. Adaptive selection of the optimal motion estimation region

Fig. 2 and Fig. 3 show the method for selecting the optimal motion estimation region using the feature-based histogram. We discriminate between global and object motions, in the sense of motion estimation error, based on the angle-based feature difference defined as

$$F^{\theta}_{diff} = F^{Norm}_{\theta_{MV_k}}(index^{\theta}_{first}) - F^{Norm}_{\theta_{MV_k}}(index^{\theta}_{second}), \qquad (4)$$

where $index^{\theta}_{first}$ and $index^{\theta}_{second}$ respectively represent the indices of the first and the second maximum weights in the angle-based histogram. $Threshold$ is determined by the ratio of the background to the object. In our experiments, we set $Threshold$ to 0.2. If $F^{\theta}_{diff}$ is smaller than $Threshold$, we cannot discriminate feasible regions. Therefore, we need the mean SAD-based feature difference defined by

$$F^{mSAD}_{diff} = Mean[SAD(index^{\theta}_{first})] - Mean[SAD(index^{\theta}_{second})], \qquad (5)$$

where each term represents the mean value of the compensated SAD at the feasible index. If $F^{mSAD}_{diff}$ is smaller than 0, we select the regions of the first maximum angle weight index, denoted by $R(index^{\theta}_{first})$; otherwise, we select those of the second, $R(index^{\theta}_{second})$. To minimize the difference between the actual and quantized angle values, a length feature is used, because many of the remaining candidate blocks have motion vectors of the same size after the previous procedure. If most of the background regions are flat and feasible regions occupy only a small area, we swap $R(index^{\theta}_{first})$ and $R(index^{\theta}_{second})$.

Fig. 3. Flow chart for extracting an optimal motion estimation region.
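The selection logic of Sections 3.1 and 3.2 can be sketched as follows, under several stated assumptions. The code excludes blocks whose SAD ratio (Equation (3)) lies in the flat band, builds a normalized angle histogram over the remaining blocks, and applies Equations (4) and (5) to choose an angle bin. The function names, the tuple layout of a block, nearest-bin angle quantization, the handling of a zero compensated SAD, and the tie order of equal histogram weights are all hypothetical. The length feature and the swap rule for mostly flat backgrounds are omitted.

```python
import math
from collections import Counter

ALPHA = BETA = 0.2   # flat-band margins, the values reported in the text
THRESHOLD = 0.2      # angle-weight threshold, the value reported in the text


def is_flat(sad_ori, sad_cp, alpha=ALPHA, beta=BETA):
    """True if SR = SAD_ori / SAD_cp lies strictly inside (1 - alpha, 1 + beta)."""
    if sad_cp == 0:                  # assumption: guard against division by zero
        return sad_ori == 0
    return 1 - alpha < sad_ori / sad_cp < 1 + beta


def angle_bin(dx, dy):
    """Bin 0 for the zero vector, otherwise 1..8 for 0, 45, ..., 315 degrees."""
    if dx == 0 and dy == 0:
        return 0
    deg = math.degrees(math.atan2(dy, dx)) % 360.0
    return int(round(deg / 45.0)) % 8 + 1    # nearest-bin rule is an assumption


def select_bin(blocks, threshold=THRESHOLD):
    """blocks: list of (dx, dy, sad_ori, sad_cp). Returns an angle bin or None."""
    kept = [(angle_bin(dx, dy), cp) for dx, dy, ori, cp in blocks
            if not is_flat(ori, cp)]
    if not kept:
        return None
    counts = Counter(b for b, _ in kept)
    ranked = counts.most_common(2)
    first, w_first = ranked[0][0], ranked[0][1] / len(kept)
    if len(ranked) == 1:
        return first
    second, w_second = ranked[1][0], ranked[1][1] / len(kept)
    if w_first - w_second >= threshold:      # Equation (4): dominant angle is clear
        return first

    def mean_sad(target):
        return sum(cp for b, cp in kept if b == target) / counts[target]

    f_msad = mean_sad(first) - mean_sad(second)   # Equation (5)
    return first if f_msad < 0 else second
```

When two angle bins carry nearly equal weight, the sketch falls back to the compensated SAD, so the bin whose blocks are matched more closely after compensation is selected.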
4. EXPERIMENTAL RESULTS

We tested the proposed algorithm with various video sequences acquired by a camera mounted on a vibrating platform. We used six different high-definition (HD) format video sequences (vertical movement with moving objects, vertical movement, horizontal movement, vertical movement, horizontal movement, and diagonal and circular movement), whose experimental results are summarized in Table 1. The performance of the proposed method is compared with that of the traditional method [2] shown in Fig. 1. The motion estimation performance is evaluated in terms of the SAD value. Sequences V2 through V5 have edge patterns suitable for motion estimation, and, as expected, their SAD values are similar for the traditional and the proposed methods. On the other hand, sequences V1 and V6 contain critical image patterns, and their SAD values using the proposed method are 7% smaller than those using the traditional method.

The complexity of each DIS algorithm is the sum of three stages, as shown in Table 1. $T_{edge}$ represents the computation time of a Prewitt edge detector. $T_{ME}$ and $T_{MC}$ respectively represent the computation times of motion estimation and motion compensation. We used the three-step block matching algorithm with a 16×16 search window. $T_{iter}$ represents the computation time of the iteration process in the proposed method, as shown in Fig. 3. Table 1 indicates that the proposed method is four times faster than the traditional method.

We show the simulation results of the proposed algorithm using V1 in Fig. 4. Fig. 4 (i) and (j) show the temporal variation of the SAD values for both the traditional and the proposed methods. The upper curves (blue) represent the SAD values of the traditional method, the middle curves (red) those of the proposed method, and the lower curves (green) the difference between the two. It can be seen that the SAD values of the proposed method are smaller than those of the traditional method.

Table 1. Comparison of the computation times (in seconds) and the SAD mean values between the traditional and the proposed methods using six video sequences (environment: Intel Core 2.3GHz, Windows XP, MATLAB).

Computation time (s)

            Traditional method                    Proposed method
Sequence    T_edge   T_ME     T_MC     Total      T_ME     T_iter   T_MC     Total
V1          2.4124   0.0039   0.3905   2.8068     0.2460   0.0240   0.4942   0.7642
V2          2.0809   0.0045   0.3266   2.4120     0.2248   0.0206   0.4346   0.6800
V3          2.0784   0.0040   0.3280   2.4104     0.2181   0.0234   0.4337   0.6752
V4          2.0754   0.0038   0.3285   2.4077     0.2184   0.0208   0.4359   0.6751
V5          2.0633   0.0061   0.3232   2.3926     0.1695   0.0054   0.3274   0.5023
V6          2.0780   0.0049   0.3271   2.4100     0.1631   0.0038   0.3239   0.4908
Average     2.1314   0.0045   0.3373   2.4732     0.2066   0.0163   0.4082   0.6313

SAD mean value

            Traditional method     Proposed method
Sequence    BM        PCM          BM        PCM
V1          5.0923    4.4999       3.0082    2.7679
V2          2.6987    2.5946       2.6753    2.7596
V3          5.2697    2.3795       5.3953    3.7517
V4          6.0859    5.0974       5.9519    5.1179
V5          6.6828    6.0646       6.4686    6.0695
V6          14.0150   12.9981      13.6016   11.0651
Average     6.6407    5.6057       6.1835    5.2553

Fig. 4. Results of the proposed algorithm: (a) angle histogram, (b) difference image between two consecutive frames in the original and stabilized sequences, (c) SAD histogram, (d) all regions, (e) length histogram, (f) regions selected by applying histograms (a), (c), and (e), (g) SAD rate histogram, (h) region selected by applying histogram (g), (i) SAD comparison between the traditional and the proposed methods with BM, and (j) SAD comparison between the traditional and the proposed methods with PCM.

5. CONCLUSION

A region selection method that removes image patches with critical patterns is presented for improving the performance of a DIS system. The proposed method results in four times faster computation and a performance improvement of up to 7% compared with the traditional method [2]. However, real-time computation remains an issue. Further investigation will focus on improving the a priori estimates of region blocks to skip and on evaluating the effect of the skip cost parameter. Nevertheless, we believe that the proposed algorithm can be applied to extended areas of video processing, such as digital cameras, video panorama generation, and video surveillance.

6. REFERENCES

[1] M. Oshima, “VHS camcorder with electronic image stabilizer,” IEEE Trans. Consumer Electronics, vol. 35, no. 4, pp. 749–758, November 1989.
[2] Y. Park, J. Paik, and D. Kim, “An adaptive motion decision system for digital image stabilizer based on edge pattern matching,” IEEE Trans. Consumer Electronics, vol. 38, no. 3, pp. 607–615, August 1992.
[3] Massimo F. Vella, A. Castorina, and G. Messina, “Digital image stabilization by adaptive block motion vectors filtering,” IEEE Trans. Consumer Electronics, vol. 48, no. 3, pp. 796–801, August 2002.
[4] S. Erturk and T. Dennis, “Image sequence stabilization based on DFT filtering,” IEE Proc. Vision, Image, Signal Processing, vol. 147, no. 2, pp. 95–102, April 2000.
[5] E. De Castro and C. Morandi, “Registration of translated and rotated images using finite Fourier transforms,” IEEE Trans. Pattern Analysis, Machine Intelligence, vol. 9, no. 5, pp. 700–703, September 1987.
[6] C. Stiller and J. Konrad, “Estimating motion in image sequences,” IEEE Signal Processing Magazine, vol. 16, 1999.