IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 3, MARCH 2015
Local Binary Pattern Based Fast Digital Image Stabilization Burcu Kir, Meltem Kurt, and Oğuzhan Urhan, Member, IEEE
Abstract—In this letter, a fast digital image stabilization method based on the local binary pattern (LBP) is presented for real-time applications. The LBP approach utilized in this work enables efficient representation of the image frames at 1-bit depth. A simple Boolean exclusive-OR (XOR) based matching criterion is employed to determine the global motion vector between consecutive image frames. Constant velocity model based Kalman filtering is applied to the global motion vectors to obtain smoothed frame positions. Experiments show that the proposed approach provides comparable or better performance than methods in the same category while requiring lower computational complexity.
Index Terms—Digital image stabilization, global motion estimation, local binary pattern.
I. INTRODUCTION
DIGITAL image stabilization (DIS) is a crucial part of nearly all hand-held video capturing devices such as camcorders, smartphones and tablets. These kinds of devices may undergo unwanted fluctuations due to operator or platform movements. The target of DIS is to remove unwanted camera fluctuations while keeping intentional camera motion. Electronic DIS systems used in such devices typically consist of motion estimation and motion correction parts. Global motion of the device is estimated by the motion estimation (ME) part and stabilization of the motion is performed by the motion correction (MC) part. The MC part generally has a lower computational load, whereas the ME part has considerably higher computational complexity. It is important to note that correct estimation of global motion is vital because any error introduced at this stage also affects the MC part of the DIS system.

Manuscript received May 03, 2014; revised July 29, 2014; accepted September 21, 2014. Date of publication September 23, 2014; date of current version October 01, 2014. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Farzad Kamalabadi.
B. Kır and M. Kurt are with Kocaeli University, Computer Engineering Department, University of Kocaeli, İzmit 41040, Turkey (e-mail: [email protected]; [email protected]).
O. Urhan is with the Electronics and Telecom. Engineering Department, Kocaeli University Laboratory of Embedded and Vision Processing (KULE), İzmit 41040, Turkey (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LSP.2014.2359981

Fig. 1. Different local binary pattern configurations. (a) (b) (c).

In the literature, both two-dimensional (2-D) and three-dimensional (3-D) motion models are utilized to estimate global motion between consecutive image frames. Generally, 2-D ME approaches are used to detect translational global motion due to their relatively lower computational load in low power embedded systems. However, the computational complexity of 2-D full search based block motion estimation in the image domain is still high, particularly for hand-held low power devices. Low bit-depth representation of images is used to decrease the computational load of global ME in [1]–[3] by making use of an XOR based matching criterion which has an efficient hardware implementation as described in [4]. The Gray-coded bit-plane matching (GC-BPM) based approach presented in [1] utilizes sub-images with a single Gray-coded bit-plane in matching, using an XOR based criterion to speed up the global ME process. In [2], image frames are initially converted to 1-bit depth images by filtering the image with a multi band-pass filter and comparing the filtered frame against the original image frame. Image frames are represented by two bit-planes in [3], where the first bit-plane is constructed using a multi band-pass filter similar to [2]. The second bit-plane is used as a constraint mask to discard unreliable pixels at the matching stage. This method utilizes a larger, single, center-located sub-image for matching whereas the methods in [1] and [2] employ four sub-images located at the corners of the image frames. In the latter case, inaccurate interpretation of the motion vectors obtained from the four sub-images is possible.

Phase correlation (PC) based global ME approaches for DIS are presented in [5]–[8]. In [5], square sub-images similar to [1] are utilized to reduce the computational complexity of the PC computation. In [6], rectangular sub-images located at the corners of the images are combined and the resulting square images are used to separately estimate global motion vectors along the horizontal and vertical directions. In [7], a center-oriented single sub-image similar to [3] is used in the PC computation for real-time DIS on a DSP.
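As a rough illustration of the phase correlation idea behind the approaches in [5]–[8], the following sketch estimates an integer translation between two frames from the peak of the inverse transform of the normalized cross-power spectrum. It is not the implementation of any of the cited methods: it works on the full frame, assumes a purely circular shift, and the function name `phase_correlation_shift` is hypothetical.

```python
import numpy as np

def phase_correlation_shift(prev, curr):
    """Estimate (dy, dx) such that curr is approximately np.roll(prev, (dy, dx), axis=(0, 1))."""
    f_prev = np.fft.fft2(prev)
    f_curr = np.fft.fft2(curr)
    # Normalized cross-power spectrum: keep only the phase difference.
    cross = f_curr * np.conj(f_prev)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero
    corr = np.fft.ifft2(cross).real
    # The correlation surface peaks at the (circular) displacement.
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map indices in the upper half of each axis to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))
```

The sub-image and projection variants described in the text reduce the size of the transforms involved; the sketch above omits those steps as well as the windowing mentioned for [8].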
Recently, another real-time DIS implementation on a smartphone was presented in [8], where cropped image frames are projected onto the x- and y-axes and PC is then executed over the resulting 1-dimensional signals after windowing to obtain global motion vectors.

The LBP method was originally proposed for gray-scale and rotation invariant texture classification in [9]. It is a non-parametric approach which summarizes the local spatial structure of images. In this letter, we present an LBP based approach to convert full bit-depth image frames into binary images. The proposed LBP based binarization approach has lower complexity compared to [1], [2]. After the binary images are obtained, a hardware-efficient matching criterion is utilized to compute the interframe translational global motion. Finally, the motion vectors are smoothed to carry out stabilization.

1070-9908 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

II. PROPOSED LBP BASED BINARIZATION APPROACH
LBP based approaches are employed in many image processing applications such as face recognition [10], age estimation [11] and colonoscopy image classification [12]. Their basic advantage is the extraction of useful features for high accuracy classification. In the LBP approach, each pixel in the input image is compared against $P$ equally spaced pixels forming a circle of a certain radius $R$. Then, the LBP value for a given pixel located at position $(x_c, y_c)$ is computed as

$$\mathrm{LBP}_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{1}$$

In (1), $g_p$ and $2^p$ denote the equally spaced pixels around position $(x_c, y_c)$ and a binomial factor, respectively, and $g_c$ is the value of the center pixel at $(x_c, y_c)$. Three example LBP configurations are shown in Fig. 1. In the proposed binarization approach, we compute the binary representation of a given image as

(2)

where $\lfloor x \rfloor$ denotes the largest integer not greater than $x$. Thus, the number of comparisons and additions required to obtain the binary representation of a pixel is $P$. It should be noted that in [2], 25 additions and one floating point division are needed for the binarization of each pixel.

Fig. 2 shows an image frame from the “Walking” sequence with its 1-bit depth representations obtained by Otsu's method [13], the 1BT based method in [2] and the proposed approach. As seen from this figure, both the 1BT based and the proposed approaches are able to extract details from the original image, whereas global thresholding is not able to provide enough details for matching. Moreover, the proposed approach retains more micro-structural information than the method in [2].

Fig. 2. Sample image frame from the “Walking” sequence with its 1-bit depth representations. (a) Original frame. (b) Binary image obtained by Otsu's method. (c) 1-bit depth image frame constructed by the method in [2]. (d) 1-bit depth image frame constructed by the proposed method.

After obtaining the 1-bit depth images, the current image frame is searched within a fixed search range in the previous image frame using the number of non-matching points (NNMP) criterion [2]:

$$\mathrm{NNMP}(m, n) = \sum_{i} \sum_{j} B^{t}(i, j) \oplus B^{t-1}(i + m, j + n) \tag{3}$$

where $(m, n)$ is the candidate displacement for an image block within a search range $s$, $B^{t}$ and $B^{t-1}$ denote the current and previous 1-bit depth image frames, and $\oplus$ represents the Boolean XOR operation. The position giving the lowest NNMP value is assigned as the motion vector of the current frame.

After the global motion vectors are obtained, motion correction is carried out. It might be possible to utilize adaptive approaches as in [16]. However, we prefer to employ a constant velocity motion model based Kalman filter as described in [14]
because of its simplicity. The state transition equation of the constant velocity motion model is given as

$$\begin{bmatrix} x(t+1) \\ \dot{x}(t+1) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x(t) \\ \dot{x}(t) \end{bmatrix} + w(t) \tag{4}$$

where $x(t)$ is the absolute horizontal and vertical frame position state estimate and $\dot{x}(t)$ denotes the corresponding instantaneous velocity. Note that in this equation $w(t)$ represents the process noise. In this case, the observation system is defined as

$$z(t) = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} x(t) \\ \dot{x}(t) \end{bmatrix} + v(t) \tag{5}$$

where $z(t)$ is the observed frame position and $v(t)$ is the measurement noise. Note that the noise parameters have a normal distribution with $w(t) \sim N(0, Q)$ and $v(t) \sim N(0, R)$, and are independent and identically distributed.

The transition and observation models in (4) and (5) are plugged into the generic Kalman filter equations to obtain a real-time estimate of the state variables. The output of the Kalman filter in this motion model directly provides the desired position of the current image frame with respect to the reference. Hence, the difference between the Kalman-filtered frame position and the actual frame position, which is estimated during global motion estimation, is used as the correction vector. The image frame is translated in the opposite direction for stabilization.

TABLE I RMSE VALUES FOR GLOBAL MOTION VECTORS OF TEST SEQUENCES

Note that because of the required translation, a portion of the output image frame might be blank. Thus, it is usually preferred to crop a certain part of the original image frames when displaying stabilized results. Another option is to use an in-painting approach to reconstruct blank regions as described in [15]. However, such methods have quite high computational complexity and cannot be executed in real time. Thus, we prefer to utilize the cropping approach at the display stage.
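The binarization and matching steps of this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the decision rule (a pixel is set to 1 when more than $\lfloor P/2 \rfloor$ of its neighbours are at least as bright as the center) is assumed for illustration and should be checked against (2); neighbours are taken at the nearest pixel instead of being interpolated; borders are replicated; and the whole central block is matched by exhaustive search. The names `lbp_binarize` and `nnmp_motion` are hypothetical.

```python
import numpy as np

def lbp_binarize(img, P=8, R=4):
    """Return a 1-bit image from P neighbours on a circle of radius R.

    Assumed rule: output 1 when more than floor(P / 2) neighbours are
    greater than or equal to the center pixel.
    """
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    padded = np.pad(img, R, mode="edge")  # replicate borders (assumption)
    votes = np.zeros((h, w), dtype=np.int32)
    for p in range(P):
        angle = 2.0 * np.pi * p / P
        # Nearest-pixel offsets instead of interpolated samples (assumption).
        dy = int(round(-R * np.sin(angle)))
        dx = int(round(R * np.cos(angle)))
        neighbour = padded[R + dy:R + dy + h, R + dx:R + dx + w]
        votes += (neighbour >= img)  # one comparison and one addition per neighbour
    return (votes > P // 2).astype(np.uint8)

def nnmp_motion(curr_bits, prev_bits, s=16):
    """Return the displacement (m, n) with the lowest NNMP, as in (3).

    m is the row offset and n the column offset; both range over -s..s.
    """
    h, w = curr_bits.shape
    # Central block of the current frame, so every candidate stays in frame.
    block = curr_bits[s:h - s, s:w - s]
    best_cost, best_mv = None, (0, 0)
    for m in range(-s, s + 1):
        for n in range(-s, s + 1):
            cand = prev_bits[s + m:h - s + m, s + n:w - s + n]
            cost = int(np.count_nonzero(block ^ cand))  # XOR, then count mismatches
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (m, n)
    return best_mv
```

Calling `nnmp_motion(lbp_binarize(curr), lbp_binarize(prev))` for each pair of consecutive frames yields one global motion vector per frame; accumulating these vectors gives the absolute frame positions that are passed to the motion correction stage.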
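The constant velocity Kalman filtering described above can be illustrated for a single axis as follows. This is a hedged sketch, not the filter of [14]: the default values of `q` and `r` and the initialization follow the settings reported in Section III, while applying the scalar process noise covariance as `q` times the identity matrix is an assumption, and the function name `kalman_smooth` is hypothetical.

```python
import numpy as np

def kalman_smooth(positions, q=1e-4, r=0.05):
    """Filter absolute frame positions (one axis) with a constant velocity model."""
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition: position, velocity
    H = np.array([[1.0, 0.0]])              # only the position is observed
    Q = q * np.eye(2)                       # process noise covariance (assumed form)
    x = np.zeros((2, 1))                    # zero initial position and velocity
    P = Q.copy()                            # initial error covariance equals Q
    smoothed = []
    for z in positions:
        # Prediction step.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step with the measured frame position z.
        S = H @ P @ H.T + r
        K = P @ H.T / S
        x = x + K * (z - H @ x)
        P = (np.eye(2) - K @ H) @ P
        smoothed.append(float(x[0, 0]))
    return np.array(smoothed)
```

The correction for each frame is then the difference between the filtered and the measured position, for example `kalman_smooth(raw) - raw`, computed separately for the horizontal and vertical axes.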
III. EXPERIMENTAL RESULTS
In general, the root mean square error (RMSE) criterion with respect to frame matching under the MSE criterion is used for performance evaluation of the ME part in DIS systems. The RMSE criterion can be calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left[ (x_k - \hat{x}_k)^2 + (y_k - \hat{y}_k)^2 \right]} \tag{6}$$

where $(x_k, y_k)$ is the global motion vector determined by frame matching under the MSE criterion and $(\hat{x}_k, \hat{y}_k)$ is the global motion vector obtained from the compared algorithms, for the $k$th of $N$ frames.

We compare the performance of the proposed approach with low bit-depth representation based methods [1]–[3] and phase correlation based methods [5]–[8]. Table I shows RMSE results for eight different image sequences which display different characteristics. Note that, for the results given in Table I, $P$, $R$ and $s$ are set to 8, 4 and 16, respectively. The sizes of the four sub-images for the methods in [1], [2], [5] are as presented in these works. The same single sub-image size is used for the methods in [3], [6], [8] and the proposed approach. As seen from Table I, the MAD (mean absolute difference) criterion using 8-bit depth images provides the best results, as expected. In this table, the best and second best RMSE results after MAD are shown in bold and underlined fonts, respectively.

Among the low bit-depth representation based methods, the GC-BPM [1] and one-bit transform (1BT) [2] based approaches have similar performance. However, the constrained 1BT (C-1BT) based approach in [3] outperforms [1] and [2] since it utilizes a center-located sub-image and two bit-planes for matching. Note that the proposed LBP based approach provides RMSE results similar to those of the C-1BT based approach. The 1BT based approach in [2] is also modified to work with a single sub-image, similar to the C-1BT and LBP based approaches, in order to reveal the effect of binarization on ME performance. The average RMSE in this case becomes 0.1553, which means that the proposed LBP based approach performs better in binarization. Among the PC based approaches, the center sub-image based phase correlation approach (C-PC) in [7] provides better results compared to the rectangular sub-image based PC (R-PC) in [6] and the projected image based PC (P-PC) in [8]. Table II summarizes the computational complexity of the compared methods.
TABLE II COMPUTATIONAL COMPLEXITY OF THE COMPARED METHODS

The results are given for a frame size of 640×360 pixels for the full frame MAD method. The numbers of Boolean (B) and integer (I) operations are shown together in this table whereas the number of floating point (F.P.) operations is shown separately. In order to compute the overall complexity, based on [17], it is assumed that floating point operations have 5 times higher complexity than Boolean and integer operations when implemented on a conventional microprocessor. It is also possible to implement all these approaches on modern SIMD (Single Instruction Multiple Data) supported processors. In this case, low bit-depth based methods have an important advantage because each pixel in these methods occupies only one or two bits in SIMD registers whereas 8 bits per pixel are required for the other methods. Note that the results in Table II are given for the same parallelization level with an SIMD register size of 128 bits. It is also important to note that the low bit-depth based approaches can be implemented very efficiently in hardware as described in [4].

Fig. 3. Performance versus computational complexity.

Fig. 3 shows a plot of performance versus computational complexity where the top-left portion corresponds to the best overall performance. As seen from this figure, the proposed approach has a reasonably good balance between performance and computational complexity. A similar computational gain is achieved by the GC-BPM and 1BT based methods but their performance is significantly worse. The C-1BT based approach provides similar performance. However, its computational gain is significantly lower than that of the proposed method.

After global motion estimation, Kalman filtering is utilized to perform motion correction as described in the previous section. The model is initialized with a zero state condition as explained in [14]. The initial velocity vector is set to 0 whereas the initial error covariance is equal to the process noise covariance. The remaining Kalman filter parameters, i.e., the process noise covariance (Q) and the measurement noise variance (R), are empirically set to 0.0001 and 0.05, respectively. The effect of these parameters on the stabilization performance is discussed in [14]. Fig. 4 shows the original and corrected position vectors for the “Zoom” sequence. As seen from this figure, the Kalman filter is able to remove unwanted camera motion effectively with a relatively small delay. It is possible to improve the stabilization performance by making use of an adaptive approach
as in [16]. In this case, it might be possible to track the original camera movements with a smaller delay and smoother fluctuations without significantly increasing the computational complexity, since this approach can be implemented using simple look-up table operations.

Fig. 4. Motion correction results.

IV. CONCLUSION
In this letter, an LBP based binarization approach is presented with application to digital image stabilization. It is shown that the proposed global ME approach has better performance when compared to methods which have higher computational complexity. The proposed approach can be implemented in hardware in a power-efficient manner as described in [4].
REFERENCES
[1] S. J. Ko, S. H. Lee, S. W. Jeon, and E. S. Kang, “Fast digital image stabilizer based on gray-coded bit-plane matching,” IEEE Trans. Consumer Electron., vol. 45, no. 3, pp. 598–603, Aug. 1999.
[2] A. A. Yeni and S. Ertürk, “Fast digital image stabilization using one bit transform based sub-image motion estimation,” IEEE Trans. Consumer Electron., vol. 51, no. 3, pp. 917–921, Aug. 2005.
[3] O. Urhan and S. Ertürk, “Single sub-image matching based low complexity motion estimation for digital image stabilization using constrained one bit transform,” IEEE Trans. Consumer Electron., vol. 52, no. 4, pp. 1275–1279, Nov. 2006.
[4] A. Çelebi, O. Urhan, I. Hamzaoğlu, and S. Ertürk, “Efficient hardware implementations of low bit depth motion estimation algorithms,” IEEE Signal Process. Lett., vol. 16, no. 6, pp. 513–516, Jun. 2009.
[5] S. Ertürk, “Digital image stabilization with sub-image phase correlation based global motion estimation,” IEEE Trans. Consumer Electron., vol. 49, no. 4, pp. 1320–1325, Nov. 2003.
[6] O. Kwon, J. Shin, and J. Paik, “Video stabilization using Kalman filter phase correlation matching,” Lecture Notes in Computer Science, vol. 3656, pp. 141–148, Sep. 2005.
[7] A. Kucukmanisa, O. Urhan, M. K. Gullu, and S. Erturk, “DSP implementation of phase correlation based real-time video stabilization,” in 20th Signal Processing and Communications Applications Conf., Antalya, Turkey, 2012, pp. 1–4.
[8] S. W. Ha, H. C. Park, and T. D. Han, “Mobile digital image stabilisation using SIMD data path,” Electron. Lett., vol. 48, no. 15, pp. 922–924, 2012.
[9] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Patt. Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002.
[10] T. Ahonen, A. Hadid, and M. Pietikainen, “Face recognition with local binary patterns,” Lecture Notes in Computer Science, vol. 3021, pp. 469–481, 2004.
[11] J. Ylioinas, A. Hadid, X. Hong, and M. Pietikainen, “Age estimation using local binary pattern kernel density estimate,” Lecture Notes in Computer Science, vol. 8156, pp. 141–150, 2013.
[12] S. Manivannan, R. Wang, and E. Trucco, “Extended Gaussian-filtered local binary patterns for colonoscopy image classification,” in Int. Conf. on Computer Vision (ICCV-2013).
[13] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst., Man, Cybern., vol. 9, no. 1, pp. 62–66, 1979.
[14] S. Ertürk, “Real-time digital image stabilization using Kalman filters,” Real-Time Imag., vol. 8, pp. 317–328, 2002.
[15] Y. Matsushita, E. Ofek, W. Ge, X. Tang, and H. Y. Shum, “Full-frame video stabilization with motion inpainting,” IEEE Trans. Patt. Anal. Mach. Intell., vol. 28, no. 7, pp. 1150–1163, Jul. 2006.
[16] M. K. Güllü and S. Ertürk, “Membership function adaptive fuzzy filter for image sequence stabilization,” IEEE Trans. Consumer Electron., vol. 50, no. 1, pp. 1–7, Feb. 2004.
[17] A. Fog, “Software optimization resources (chapter-4) instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs,” [Online]. Available: http://www.agner.org/optimize/instruction_tables.pdf