
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 6, JUNE 2011


On Combining Fractional-Pixel Interpolation and Motion Estimation: A Cost-Effective Approach

Jiyuan Lu, Peizhao Zhang, Hongyang Chao, Member, IEEE, and Paul S. Fisher

Abstract—The additional complexity of adopting fractional-pixel motion compensation arises from two sources: fractional-pixel interpolation (FPI) and fractional-pixel motion estimation (FPME). Unlike current fast algorithms, we exploit the internal link between FPME and FPI and optimize them jointly rather than speeding them up separately. In this paper, we propose a refinement search order for FPME that satisfies the criterion of cost/performance efficiency. We then give several strategies, namely FPME skipping, early termination, and search pattern pruning, that reduce the number of search positions with negligible coding loss. We also propose an FPI algorithm that avoids redundant interpolation and reduces duplicate calculation. Experimental results show that our integrated algorithm significantly improves the overall speed of FPME and FPI. Compared with FFPS+XFPI and CBFPS+XFPI, the proposed algorithm reduces the computation time by 65% and 32%, respectively. Additionally, our FPI algorithm can be combined with any fast FPME algorithm to greatly reduce the computational time of FPI.

Index Terms—Fractional-pixel interpolation, fractional-pixel motion estimation, video coding.

Manuscript received June 25, 2010; revised October 22, 2010; accepted December 13, 2010. Date of publication March 17, 2011; date of current version June 3, 2011. This work was supported in part by a special program to promote the development of technology services in Guangdong, China, under Grant 2010A040307003, in part by the National High Technology Research and Development 863 Program of China, under Grant 2007AA01Z340, and in part by the Guangdong Natural Science Foundation Council, China, under Grant 07003728. This paper was recommended by Associate Editor J. Ridge. J. Lu is with the School of Information Science and Technology, Sun Yat-Sen University, Guangzhou 510275, China, and also with the Guangdong University of Finance, Guangzhou 510532, China. P. Zhang is with the School of Information Science and Technology, Sun Yat-Sen University, Guangzhou 510275, China. H. Chao is with the School of Software, Sun Yat-Sen University, Guangzhou 510275, China. P. S. Fisher is with the Department of Computer Science, Winston-Salem State University, Winston-Salem, NC 27110 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2011.2129830

I. Introduction

Many video compression standards use fractional-pixel motion compensation to obtain extra R-D gain, which improves video quality or reduces the bit rate [1]. However, this improvement in R-D performance comes at a very high price in computational complexity [2]. Fractional-pixel motion compensation reduces the aliasing of the predicted signal [3] and increases coding efficiency. Video encoders that enable this technology have two extra stages: fractional-pixel interpolation (FPI) and fractional-pixel motion estimation (FPME). According to Minoo et al. [4], FPME and FPI together account for a significant portion of the CPU usage of video coding, and the computational burden of FPI is twice that of FPME. If 1/8-pixel motion compensation [5] and an adaptive interpolation filter [6] are used, the computational load of FPI increases significantly because the interpolation becomes more complex. Most current fast algorithms speed up FPME and FPI separately and neglect the internal relationship between the two. Two facts are important here: 1) FPI algorithms are responsible for producing, by interpolation, the fractional-pixels that FPME algorithms use, and 2) the minimal number of fractional-pixels that FPI must calculate depends directly on how many search positions (SPs) the FPME algorithm checks and on which positions they are. According to our study, an efficient algorithm should not only reduce the number of SPs for FPME but also limit the calculation of fractional-pixels. In this paper, we take these concerns into account and propose an integrated algorithm that speeds up both FPME and FPI. To explain our contribution, we first give a brief introduction and review the related literature.

Current FPME algorithms assume that fractional-pixels have already been calculated before they are used. Therefore, fast FPME algorithms have been proposed to reduce the number of SPs. Most of these algorithms have two steps. The first step predicts the optimal motion vector (MV) by exploiting the correlation among neighboring MVs or by modeling the fractional-pixel error surface.
The second step is a refinement search around the predicted MV. In general, this second step is the most time-consuming part of obtaining the proper MV, since it requires both matching operations and fractional-pixel interpolation. Based on the MV prediction technique used, we classify fast FPME algorithms into two categories. The first category consists of neighboring-MV-based algorithms [7]–[9], which exploit the similarity of the MVs of adjacent blocks to determine an initial SP and then refine the search with a small pattern such as the diamond pattern. The idea behind this kind of algorithm is to check the SPs in descending order of their probability of being the optimal fractional MV.

© 2011 IEEE 1051-8215/$26.00

CBFPS is one of the most widely used neighboring-MV-based FPME algorithms [8]. The second category is the model-based approach. These algorithms describe the fractional-pixel error surface with a mathematical model and predict the optimal MV by finding the minimum of the model. Hill et al. [10] and Dikbas et al. [11] used different polynomial models to describe the error surface. Since it is difficult to model all kinds of error surfaces with a single model, Zhang et al. [12] classified error surfaces into two categories, well-conditioned and ill-conditioned, and used a different modeling technique for each. However, a mathematical model always mismatches the actual error surface to some degree, so a refinement search is usually required. Therefore, [13]–[16] proposed different refinement search techniques to rectify the model prediction. But their refinement search considers only the R-D performance of different SPs and overlooks their computational cost. Unlike integer-pixel motion estimation (IPME), the computational cost of each fractional SP includes the costs of both the search and the interpolation operations. An efficient refinement search algorithm for FPME should therefore maximize the R-D performance while minimizing the computational cost. Our previous work [17] outlined how FPI could be integrated into FPME to solve this problem, but many details had to be clarified before a feasible algorithm could be given; providing them is one of the purposes of this paper. Several FPI algorithms have been proposed to compute the fractional-pixels for FPME [4], [10], [11], [18], [19]. The simplest is to interpolate all fractional-pixels in advance [18], which we call the "full FPI" algorithm in this paper. Full FPI introduces no duplicate calculation, because each fractional-pixel is computed only once. However, redundant interpolation is severe, since many fractional-pixels are never used in FPME.
In addition, a large pool of memory is required to store all fractional-pixels of each reference frame. To overcome this problem, Minoo et al. [4] proposed a Reciprocal FPME algorithm that lowers the memory and computational requirements. This algorithm finds the best match by reciprocal interpolation instead of direct interpolation and uses the property of relative motion to compute the fractional MV for the current frame. The method finds the optimal fractional MV correctly only if the motion is purely translational. However, many other types of motion appear in real-world videos. They cause a mismatch between the reciprocal optimal MV and the true optimal MV, and an R-D penalty is incurred. Hill et al. and Dikbas et al. [10], [11] also proposed model-based FPME algorithms that find the optimal fractional MV from their models and eliminate the need for interpolation. However, the performance of these algorithms depends heavily on model accuracy. Moreover, interpolation cannot be avoided completely, because the fractional-pixels at the best matching position are indispensable for calculating the residual coefficients. On the other hand, the interpolation of fractional-pixels in H.264 is standardized [20]. The calculation of 1/4 pixels depends on 1/2 pixels, which must be rounded and clipped to the range [0, 255] after 6-tap filtering. Any H.264-compatible CODEC should use exactly the fractional-pixels defined in the standard to avoid encoder-decoder mismatch errors. A proper on-the-fly 1/4-pixel interpolation algorithm is used by the free H.264 encoder X264 [19]. This FPI algorithm, which we call XFPI in this paper, interpolates all 1/2 pixels in advance and calculates 1/4 pixels as needed. The main drawback of XFPI is that all 1/2 pixels are calculated whether or not they are useful for FPME. Since fast FPME algorithms try to adapt to the characteristics of different videos and check the most likely optimal positions first, calculating fractional-pixels in a fixed order is inefficient. To remedy this inefficiency, a new interpolation algorithm is proposed. According to H.264, the calculation of fractional-pixels includes not only linear filtering but also clipping and rounding operations [20]. A new interpolation method that merely constructs new tap filters by linear transformation and ignores these intermediate clipping and rounding operations will therefore introduce errors. These errors can be large and have two serious consequences: 1) FPME cannot find the true optimal fractional MV, because it matches against wrong fractional-pixels, and 2) the fractional-pixels of the encoder and the decoder drift apart, and this drift induces further error. These two problems are the main obstacles to using an efficient FPI algorithm in H.264. In addition, the simplest FPI method, full FPI, avoids duplicate interpolation at the cost of the highest memory consumption and redundant interpolation. On the other hand, if fractional-pixels are interpolated entirely on the fly, duplicate calculation is severe because of the interpolation dependency in H.264. XFPI therefore interpolates only the 1/2 pixels in advance and calculates the 1/4 pixels on the fly.
This approach in turn causes redundant interpolation at the 1/2-pixel level and duplicate calculation at the 1/4-pixel level, especially when variable block size motion estimation is used. Both redundant interpolation and duplicate calculation should therefore be taken into consideration.

Our paper makes two main contributions. The first is a cost performance search order for the refinement search, which checks fractional SPs so as to maximize the expected R-D gain and minimize the computational cost at the same time. The second is an FPI algorithm. Five new interpolation filters are constructed for this FPI algorithm to meet the needs of different search orders, and all fractional-pixels are interpolated region by region, which achieves an optimized balance between redundant interpolation and duplicate calculation. Furthermore, to remove the errors introduced by ignoring the intermediate clipping and rounding operations, we propose error elimination approaches that keep the calculation consistent with the H.264 standard.

The remainder of this paper is organized as follows. Section II reviews background on FPI in H.264 and on the full fractional-pixel search (FFPS) and XFPI algorithms. Our new FPME and FPI algorithms are proposed in Sections III and IV, respectively. Experimental results are given in Section V, and Section VI concludes the paper.
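The claim that a tap filter built by linear transformation drifts from H.264 once the intermediate rounding and clipping are dropped is easy to check numerically. The Python sketch below is illustrative only. It encodes the standard H.264 luma rules for the horizontal 1/2 pixel b and the 1/4 pixel a between integer pixels G and H of Fig. 1, as we understand them from the standard [20], and compares them with a composed filter (1, −5, 52, 20, −5, 1)/64 of the kind given in (3) in Section IV. Here the composed filter is assumed to be a plain right shift by 6 with no rounding offset and no clipping. All function names are hypothetical.

```python
# Illustrative sketch: H.264 two-stage luma interpolation versus a single
# composed linear filter that skips the intermediate rounding and clipping.


def clip255(x: int) -> int:
    """Clip a value to the 8-bit pixel range [0, 255]."""
    return max(0, min(255, x))


def h264_half_b(E: int, F: int, G: int, H: int, I: int, J: int) -> int:
    """Horizontal 1/2 pixel b: 6-tap filter, +16 rounding, >>5, then clip."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def h264_quarter_a(E: int, F: int, G: int, H: int, I: int, J: int) -> int:
    """1/4 pixel a: rounded average of integer pixel G and the clipped b."""
    b = h264_half_b(E, F, G, H, I, J)
    return (G + b + 1) >> 1


def composed_linear_a(E: int, F: int, G: int, H: int, I: int, J: int) -> int:
    """Hypothetical one-step filter (1, -5, 52, 20, -5, 1) >> 6.

    Algebraically this averages G with the unrounded, unclipped b,
    so it ignores both intermediate operations of the standard.
    """
    return (E - 5 * F + 52 * G + 20 * H - 5 * I + J) >> 6


if __name__ == "__main__":
    rows = {
        "flat": (100, 100, 100, 100, 100, 100),
        "ramp": (10, 10, 10, 11, 11, 11),
        "edge": (0, 0, 255, 255, 0, 0),
    }
    for name, px in rows.items():
        print(name, h264_quarter_a(*px), composed_linear_a(*px))
```

On the flat row both paths give 100. On the gentle ramp the standard path gives 11 and the composed filter gives 10, purely because rounding was dropped. On the edge row the standard path clips b to 255 and returns 255, while the composed filter returns 286, which is outside the 8-bit range.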

LU et al.: ON COMBINING FRACTIONAL-PIXEL INTERPOLATION AND MOTION ESTIMATION: A COST-EFFECTIVE APPROACH


Fig. 1. Integer pixels (shaded blocks with upper-case letters) and fractional pixels (un-shaded blocks with lower-case letters) for fractional pixel interpolation in H.264.

II. Background and Preliminaries

In this section, we provide the basic terminology and notation needed to understand our algorithm and briefly review FPI in H.264 and the FFPS and XFPI algorithms. Our review shows that, since the interpolation order in H.264 is exactly the same as the search order of FFPS, FFPS and XFPI work efficiently together. But most fast FPME algorithms [7]–[9], [13]–[16] do not follow this interpolation order in their refinement search; as a result, the speedup gained by adopting XFPI is limited by redundant 1/2-pixel calculations and unnecessary memory cost.

A. Fractional-Pixel Interpolation in H.264

The fractional-pixel interpolation in H.264 is depicted in Fig. 1. Shaded blocks labeled with upper-case letters are integer pixels, and un-shaded blocks labeled with lower-case letters are fractional-pixels. Among the fractional-pixels, aa, bb, b, j, b1, gg, hh, cc, dd, h, h1, ee, and ff are 1/2 pixels, and the others are 1/4 pixels. H.264 computes the fractional-pixels with several separable 1-D filters; these are low-pass filters that reduce the aliasing components of the video signal [3]. Since there are 15 times as many fractional-pixels as integer pixels in the original video, calculating and storing them requires considerable computational resources. We classify the fractional-pixels into 15 pixel sets according to their filter and interpolation method. In the rest of this paper, we use the abbreviation interpolated pixel set (IPS) to denote these fractional-pixel sets. Fig. 2(a) enlarges the area surrounded by the four integer pixels G, H, M, and N in Fig. 1, and the number in parentheses denotes the IPS of each fractional-pixel. For example, "a" belongs to 01 IPS, "c" belongs to 03 IPS, "d" belongs to 13 IPS, and so on. The 02, 20, and 22 IPSs are the three 1/2-pixel sets, and 00 IPS is the integer pixel set.
The remaining IPSs are 1/4-pixel sets. Fig. 2(b) shows the dependency relationship among the IPSs imposed by H.264. The 02 and 20 IPSs have no predecessor IPSs and can be calculated directly from integer pixels, where a predecessor IPS is an IPS on which the calculation of other IPSs depends. For example, according to Fig. 2(b), the 02 and 20 IPSs are the predecessor IPSs of the 1/4-pixel IPSs 01, 03, 10, and 30.

Fig. 2. Dependency of the interpolation method in H.264. (a) Enlarged version of the center area in Fig. 1 with different IPS denotation for each pixel. (b) Relationship of different IPSs.

B. Full Fractional-Pixel Search (FFPS) and XFPI Algorithm

FPME finds the best matching fractional position within one pixel of the best matching integer position. As Fig. 3 shows, the search area of FPME is a 7 × 7 square surrounded by a dotted line; excluding the center integer SP, there are 48 candidate fractional SPs. At each fractional SP, FPME compares the current block with fractional-pixels that all belong to the same IPS. FFPS is the most commonly used FPME algorithm and achieves the best R-D performance because it checks more SPs. Fig. 3(a) illustrates FFPS: it first checks the eight 1/2-pixel SPs and selects the best matching 1/2-pixel position, and then checks eight more 1/4-pixel positions around that best matching 1/2-pixel position. To compare the search order of FFPS with the interpolation order of H.264, we project the fractional SPs onto their IPSs, as shown in Fig. 3(b). The search order of FFPS is exactly identical to the interpolation order imposed by H.264. The XFPI algorithm [19] exploits this identity: it interpolates all 1/2 pixels of the reference image, i.e., the pixels in the 02, 20, and 22 IPSs, in advance, and then interpolates the 1/4 pixels on the fly. Compared with the full FPI algorithm, XFPI suits FFPS better and saves the computation of some 1/4 pixels. However, other FPME algorithms do not share the search order of FFPS, which causes redundant interpolation when XFPI is applied.
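The IPS dependency discussed above can be sketched as a small graph. Asking which sets must exist before a given IPS can be interpolated shows why an FPME algorithm that visits only a few 1/4-pixel SPs still depends on 1/2-pixel sets. This is our reconstruction from the H.264 interpolation rules [20], not a transcription of Fig. 2(b). We assume that the first digit of an IPS label is the vertical quarter-pixel offset and the second is the horizontal one. The names `H264_PREDECESSORS` and `required_sets` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Direct predecessors of each IPS under the H.264 luma rules (reconstruction).
# "00" is the integer pixel set. Quarter-pixel sets average their two nearest
# integer/half-pixel neighbours. "22" is filtered from unclipped intermediate
# values, so it needs only integer pixels.
H264_PREDECESSORS: Dict[str, Tuple[str, ...]] = {
    "00": (),
    "02": ("00",), "20": ("00",), "22": ("00",),
    "01": ("00", "02"), "03": ("00", "02"),
    "10": ("00", "20"), "30": ("00", "20"),
    "11": ("02", "20"), "13": ("02", "20"),
    "31": ("02", "20"), "33": ("02", "20"),
    "12": ("02", "22"), "32": ("02", "22"),
    "21": ("20", "22"), "23": ("20", "22"),
}


def required_sets(target: str) -> Set[str]:
    """Return every fractional IPS that must be interpolated so that `target`
    can be produced, including `target` itself (integer set 00 excluded)."""
    needed: Set[str] = set()
    stack: List[str] = [target]
    while stack:
        ips = stack.pop()
        if ips == "00" or ips in needed:
            continue
        needed.add(ips)
        stack.extend(H264_PREDECESSORS[ips])
    return needed


if __name__ == "__main__":
    for ips in ("01", "11", "12", "22"):
        print(ips, sorted(required_sets(ips)))
```

The sketch yields {01, 02} for 01, {02, 11, 20} for 11, {02, 12, 22} for 12, and {22} for 22. Under XFPI, the 02, 20, and 22 sets are computed in advance for the whole reference frame, whether or not any SP in those sets is ever checked.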


TABLE I
Sequences for Analyzing the Distribution of the Optimal Fractional MVs
Format | Sequences
QCIF | Carphone, Claire, Grandma, Hall, Miss_am, Mthr_dotr, Salesman, Suzie, Trevor, Akiyo, Bowing
CIF | Highway, Paris, Tempete, Waterfall, Bridge_close, Bridge_far, Coastguard, Container, Deadline, Football(a), Football(b), Foreman, Flower
4CIF | City, Harbour, Crew, Ice
720p | Mobcal, Stockholm, Parkrun, Shields
1080p | In_to_tree, Blue_sky, Life, Rush_hour

Fig. 3. Fractional search positions and their interpolated pixel sets (IPS). (a) FFPS around the best integer position. (b) Relation between fractional SPs and IPSs.

According to the above analysis, since FFPS and XFPI follow the same order in search and interpolation, they work well together and avoid much redundant interpolation. But FFPS always checks 16 SPs per search, as shown in Fig. 3(a), which takes up a large share of the computation in video coding. Many fast FPME algorithms have been proposed to reduce the number of SPs with minimal coding loss. However, because these fast FPME algorithms check fewer SPs than FFPS, much redundant interpolation occurs when XFPI is used. Saving this redundant interpolation in FPI is more important than reducing the number of SPs in FPME. In the following sections, we propose algorithms that speed up FPME and FPI jointly.

III. Cost Performance First FPME Algorithm

In this section, we propose a cost performance first fractional-pixel motion estimation (CPF-FPME) algorithm for the second step of FPME, i.e., the refinement search step. The initial position of the refinement pattern is determined by our previous method [9]. The CPF-FPME algorithm checks the SPs of the fractional refinement pattern in an order determined both by their likelihood of being the optimal position and by the simplicity of their interpolation. In addition, several strategies are given to further reduce the number of SPs for FPME.

A. Cost Performance First Search Order

Neither model-based algorithms nor neighboring-MV-based algorithms can precisely predict the optimal fractional MV, so a refinement search is still required in many cases. In this subsection, we assign each of the 48 SPs to one of six categories with different cost performance priorities. Each category has a different probability of being the optimal fractional MV (performance) and a different computational complexity of interpolation (cost). The search order in a refinement pattern follows the priority of each SP. Most fast FPME algorithms extend the idea of fast IPME algorithms and check SPs only in order of their expected R-D gain; that is, the first checked SP generally has the highest probability of being the optimal position. But the computational cost of a fractional SP differs from that of an integer SP, and the difference is caused by FPI. An efficient FPME algorithm should first check the SPs that have both the highest expected R-D gain and the lowest computational cost. This is the key idea of the CPF-FPME algorithm. To design an algorithm that maximizes performance and minimizes cost at the same time, we must first analyze the probability of each SP being the optimal fractional MV, which we measured experimentally. Fig. 4(a) shows the probability distribution of the optimal MV in the fractional search area. The experiment was performed with FFPS on the 36 sequences with varied motion activity listed in Table I. The center position (0, 0) in Fig. 4(a) is the best matching integer position. The closer an SP is to the center integer position, the higher its probability of being the optimal MV. Since the center is the best matching integer position, it is intuitive that the optimal MVs concentrate around the integer position with the smallest error. It is worth emphasizing that this distribution affects only the search order in a refinement pattern.
Even if the distribution of the optimal MVs is occasionally not center-oriented, our algorithm can still find the optimal MV, at the cost of checking more SPs. According to Fig. 4(a), we categorize the 48 fractional SPs into three ranks (H, M, and S) by their probability of being the optimal fractional MV, as depicted in Fig. 4(b).

Fig. 4. Classification of fractional SPs according to their probabilities of being the best MV. (a) Probability distribution. (b) Classification of fractional SPs.

Having discussed the probable R-D gain of searching different SPs, we now analyze their computational cost. The computational cost of each SP consists of the cost of the matching operation and the cost of the interpolation operation. The cost of the matching operation is the same for every SP, but the cost of the interpolation operation varies with position. Although the interpolation of 1/4 pixels depends on 1/2 pixels in H.264, each fractional-pixel can also be interpolated directly from integer pixels with a linear filter. We call this linear filter an integer pixel based interpolation filter (IPBIF) hereafter. Table II lists the three types of IPBIF: a 6-tap filter, a cross 6-tap filter, and a 6 × 6-tap filter, whose sizes are 6, 12, and 36, respectively. A larger IPBIF implies a higher computational cost for the interpolation operation. Fig. 5 shows the IPBIF of each fractional SP. SPs in the horizontal or vertical direction of the integer position use the simplest IPBIF, the 6-tap filter. SPs diagonally adjacent to integer positions use cross 6-tap filters. All remaining SPs use 6 × 6-tap filters. We use the size of the IPBIF as the measure of the computational cost of each SP.

TABLE II
Classification of IPSs by the Size of IPBIF
IPBIF | Size of IPBIF | Related IPS
6-tap | Small | 01, 02, 03, 10, 20, 30
Cross 6-tap | Middle | 11, 13, 31, 33
6 × 6-tap | Large | 22, 12, 21, 23, 32

Fig. 5. Relation between fractional SPs and their IPBIF.

As mentioned above, Fig. 4(b) shows the expected R-D performance gain of each SP, and Fig. 5 shows its computational cost; both should be considered in the priority ranking. Based on this analysis, we define six priority ranks for fractional SPs in Table III. A higher rank has a higher probability of being the optimal MV together with a smaller IPBIF. For example, the first rank has the highest probability of being the optimal MV and the smallest IPBIF. The priority rank of each SP is shown in Fig. 6.

TABLE III
Priority Ranking for IPSs by Their Cost Performance
Priority | Probability of Being the Optimal Fractional MV | Size of IPBIF | Related IPS
1 | High | Small | 01, 03, 10, 30
2 | Middle | Small | 02, 20
3 | Middle | Middle | 11, 13, 31, 33
4 | Low | Small | 01, 03, 10, 30
5 | Low | Middle | 11, 13, 31, 33
6 | Low | Large | 22, 12, 21, 23, 32

Fig. 6. Priority of each fractional SP.

According to Fig. 6, we propose a cost performance first refinement search order that follows this priority ranking, so that SPs with better cost performance are checked as early as possible. Fig. 7 gives an example on a diamond search pattern; the letter and the number on each fractional SP indicate its search order and its priority, respectively. The predicted SP is always checked first, and the other SPs are then checked according to their priorities. If several SPs have the same priority, we check them in raster order, from top to bottom and left to right, e.g., positions b-1 to c-1 and positions d-4 to e-4 in Fig. 7. Although the proposed search order provides a path to the optimal MV with the least computational cost, it does not by itself reduce the number of SPs of the original fast FPME algorithm. In Section III-B, we give several strategies to speed up the search.

Fig. 7. Our proposed FPME algorithm (CPF-FPME algorithm) performed on diamond search pattern.

B. FPME Skipping, Early Termination and Search Pattern Pruning Strategies

The interpolation process must take place before the matching operation on each fractional-pixel, and the number of matched pixels is proportional to the number of SPs. Thus, an efficient FPME algorithm is a precondition for speeding up FPI. In this subsection, we propose several strategies to further reduce the number of SPs for FPME.

The best way to encode static blocks is to skip both FPI and FPME for them. Our FPME skipping strategy exploits the high similarity of the optimal error among temporally neighboring static blocks to determine whether FPI and FPME should be performed. Most early termination criteria [21]–[23] use the correlation of the optimal error among neighboring blocks. But according to our experiments, the optimal errors of neighboring blocks vary sharply, especially in motion-intensive sequences, and it is very difficult to set a rule for all circumstances. However, the optimal errors of temporally neighboring blocks with static behavior are highly correlated, which makes sense because the residual errors of static sequences are stable. With this in mind, we propose an FPME skipping criterion specifically for static blocks. The current block is very likely to be static if its integer MV is (0, 0) and its optimal integer error is close to the optimal error of the collocated block in the previous frame, which is also static. FPME is skipped for such static blocks.
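The cost performance first search order of Section III-A can be summarized in a short sketch that combines the IPBIF classes of Table II, the priorities of Table III, the rule that the predicted SP goes first, and raster order for ties. This is a hedged illustration, not the authors' code. The assumptions are as follows: SP offsets are (dy, dx) in quarter pixels, with dy growing downward. The High/Middle/Low probability classes of each SP are supplied by the caller and are taken to correspond to the ranks of Fig. 4(b). An integer SP is given priority 0 because it needs no interpolation. All identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

SP = Tuple[int, int]  # (dy, dx) in quarter pixels; dy grows downward (assumed)

# Table II: IPBIF size class of each IPS.
IPBIF_CLASS: Dict[str, str] = {
    **dict.fromkeys(("01", "02", "03", "10", "20", "30"), "Small"),  # 6-tap
    **dict.fromkeys(("11", "13", "31", "33"), "Middle"),            # cross 6-tap
    **dict.fromkeys(("22", "12", "21", "23", "32"), "Large"),       # 6 x 6-tap
}
IPBIF_TAPS = {"Small": 6, "Middle": 12, "Large": 36}

# Table III: (probability class, IPBIF class) -> priority, 1 = checked first.
PRIORITY: Dict[Tuple[str, str], int] = {
    ("High", "Small"): 1, ("Middle", "Small"): 2, ("Middle", "Middle"): 3,
    ("Low", "Small"): 4, ("Low", "Middle"): 5, ("Low", "Large"): 6,
}


def ips_of(sp: SP) -> str:
    """IPS label of an SP: the quarter-pixel phase of each offset (-1 -> 3)."""
    return f"{sp[0] % 4}{sp[1] % 4}"


def priority(sp: SP, prob_class: Dict[SP, str]) -> int:
    """Priority rank of one SP from its probability class and IPBIF size."""
    ips = ips_of(sp)
    if ips == "00":
        return 0  # assumption: integer SPs need no interpolation
    key = (prob_class[sp], IPBIF_CLASS[ips])
    if key not in PRIORITY:
        raise ValueError(f"{key} does not appear in Table III")
    return PRIORITY[key]


def cpf_order(pattern: List[SP], predicted: SP, prob_class: Dict[SP, str]) -> List[SP]:
    """Predicted SP first, then ascending priority, ties in raster order."""
    rest = [sp for sp in pattern if sp != predicted]
    rest.sort(key=lambda sp: (priority(sp, prob_class), sp[0], sp[1]))
    return [predicted] + rest


if __name__ == "__main__":
    predicted = (0, 2)  # a 1/2-pixel SP to the right of the integer position
    diamond = [(0, 2), (-1, 2), (1, 2), (0, 1), (0, 3)]
    # Hypothetical probability classes; real ones would come from Fig. 4(b).
    prob = {(0, 1): "High", (0, 3): "Low", (-1, 2): "Low", (1, 2): "Low"}
    print(cpf_order(diamond, predicted, prob))
    # [(0, 2), (0, 1), (0, 3), (-1, 2), (1, 2)]
```

In this example the 6-tap SP (0, 1) with a high probability is checked right after the predicted SP. The two 6 × 6-tap SPs come last, and the upper one is checked first because of the raster-order tie-break.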
Fig. 8. Example of the error slope for our early termination strategy.

For the FPME skipping criterion, a threshold, Th1, decides whether the current optimal integer error is close enough to the optimal error of the collocated block. The proposed FPME skipping criterion saves considerable FPME computation for static sequences. If the criterion is not satisfied, the refinement search with the diamond pattern is applied, and at least five SPs are checked before the diamond search stops. To terminate the search as early as possible, we use the error slope, i.e., the error difference between the current minimal SP and the current second-minimal SP, normalized by their distance. Fig. 8 is a 1-D example: if the error slope of A is small enough, it is not necessary to check C to confirm that B is the local optimum. The error slope is defined as

$$\mathrm{slope}_{\mathrm{error}} = \frac{\mathrm{error}_{\text{sub-min}} - \mathrm{error}_{\min}}{d_{\text{min-submin}}} \tag{1}$$

where $\mathrm{error}_{\min}$ is the minimal error, $\mathrm{error}_{\text{sub-min}}$ is the second minimal error, and $d_{\text{min-submin}}$ is the distance between the positions with the smallest and the second smallest errors. The refinement search stops when the error slope is smaller than a threshold, Th2. However, the optimal settings of Th1 and Th2 depend heavily on the video characteristics, so both are decided adaptively. Two ratios of missing rate to hitting rate are introduced to adjust Th1 and Th2, respectively:

$$\mathrm{Ratio}_{\mathrm{threshold}} = \frac{\mathrm{MissingRate}}{\mathrm{HittingRate}} \times 100\% \tag{2}$$

The missing rate is the false-negative rate of terminating with the current threshold, and the hitting rate is the rate of successful termination. Because the rate of successful termination cannot be obtained precisely without checking the skipped SPs, we use the terminating rate as an approximation. We measure both the missing rate and the terminating rate of Th1 and Th2 dynamically and use two empirical values, R1 and R2, to decide whether the ratios for Th1 and Th2 are too large. For example, if Ratio_threshold of Th1 is larger than R1, Th1 is considered too large for skipping FPME on static blocks, and it should be decreased.

Even if the above FPME skipping and early termination criteria are not satisfied, a further pruning strategy for the search pattern speeds up FPME. This strategy is based on the hypothesis, used by many FPME algorithms [10], [15], [24], [25], that the error surface over the fractional-pixel search area is unimodal. Most FPME algorithms use symmetrical search patterns, e.g., the diamond search pattern, to refine the search. If the error surface of the fractional-pixel search area is indeed unimodal, then two SPs in opposite directions of a symmetrical pattern cannot both have lower errors than the center SP at the same time. We therefore propose that an SP be skipped when the SP in the opposite direction has a smaller error than the center SP. Fig. 9 shows an example of the proposed pruning strategy for the search pattern. The letter and the number on each SP represent its search order and priority, respectively. SP a is the center of the diamond pattern. The proposed pruning strategy skips the positions opposite SP b-1 and SP c-1, since both of these SPs have smaller errors than SP a. SP c-1 is the optimal position, and our proposed pruning strategy saves two SPs compared with the ordinary diamond search.

The above strategies further reduce the number of SPs in our proposed FPME algorithm. This reduction is the premise for reducing the FPI computation, because the number of matched pixels is a lower bound on the number of interpolated pixels. Our proposed FPME algorithm therefore not only takes the computational cost of interpolation into account but also improves the search speed of FPME.

Fig. 9. Example of our proposed pruning strategy for diamond search pattern.

C. Overall Algorithm of Our Proposed CPF-FPME

The CPF-FPME algorithm has two stages. The first predicts the optimal fractional MV using our previous work [9], although many other prediction schemes [7], [8], [10]–[16] can also be used in this stage. The second refines the search with a selected pattern, such as the diamond pattern. Before each search position is checked, our proposed FPI algorithm, presented in Section IV, interpolates the relevant fractional-pixels. The processing steps of our proposed FPME algorithm are summarized below.

Step 0: Initialization for FPI. The 15 flags of all IPSs for the current macroblock (MB) are set to 0.
Step 1: If the FPME skipping criterion in Section III-B is satisfied, both FPME and FPI are skipped for this MB.
Step 2: Our previous work [9] is used to predict the initial fractional MV, which is the center of the following refinement process.
Step 3: Refinement search. SPs in the refinement pattern around the predicted MV are checked.
Step 3.1: Before each SP is checked, the associated IPS is interpolated by calling the function InterpolationProcess defined in Section IV-C.
Step 3.2: Match the current SP.
Step 3.3: If the early termination criterion is satisfied, terminate the search.
Step 3.4: If the pattern pruning criterion is met, skip the opposite SP.
Step 3.5: Go back to Step 3.1 until all SPs in the current diamond pattern are checked.
Step 4: If the optimal SP is the center of the diamond pattern, the search stops. Otherwise, the diamond pattern moves to the new center and Step 3 is repeated.

Fig. 10. Dependency relationship of our proposed CPF-FPI algorithm.

TABLE IV


Five New Filters Defined in Fig. 10

No.   Source IPS        Target IPS        Tap Filter
1     00                01, 03, 10, 30    (1, −5, 52, 20, −5, 1) >> 6
2     01, 03, 10, 30    02, 20            (2, −1)
3     01, 03, 10, 30    11, 13, 31, 33    (1, 1, −1)
4     02, 20            12, 21, 23, 32    (1, −5, 52, 20, −5, 1) >> 6
5     12, 21, 23, 32    22                (2, −1)
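The refinement loop of Steps 3 and 4 in Section III-C can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: MVs are integer (x, y) offsets in quarter-pixel units, the diamond checking order is fixed rather than prioritized, the `early_stop` threshold stands in for the early termination criterion of Section III-B, Steps 0 to 2 are left to the caller, and the names `refine_diamond` and `ips_of` are hypothetical. The `interpolate` callback marks where InterpolationProcess would be called for the SP's IPS.

```python
from typing import Callable, Dict, Optional, Set, Tuple

MV = Tuple[int, int]  # (x, y) offset in quarter-pixel units

# Diamond neighbors in a fixed checking order (an assumption);
# entries i and (i + 2) % 4 are opposite each other.
DIAMOND = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def ips_of(mv: MV) -> str:
    """Hypothetical IPS label of a quarter-pel position: vertical then horizontal phase."""
    return f"{mv[1] % 4}{mv[0] % 4}"


def refine_diamond(
    center: MV,
    cost: Callable[[MV], int],
    interpolate: Callable[[MV], None],
    early_stop: Optional[int] = None,
    max_iters: int = 8,
) -> Tuple[MV, int, int]:
    """Return (best MV, its cost, number of distinct SPs matched)."""
    seen: Dict[MV, int] = {}

    def check(sp: MV) -> int:
        if sp not in seen:
            interpolate(sp)          # Step 3.1: make the SP's IPS available
            seen[sp] = cost(sp)      # Step 3.2: match the SP
        return seen[sp]

    best, best_cost = center, check(center)
    for _ in range(max_iters):
        center_cost = best_cost
        pruned: Set[int] = set()
        for i, (dx, dy) in enumerate(DIAMOND):
            if i in pruned:
                continue             # Step 3.4: skipped by pattern pruning
            sp = (center[0] + dx, center[1] + dy)
            c = check(sp)
            if early_stop is not None and c < early_stop:
                return sp, c, len(seen)      # Step 3.3: early termination
            if c < center_cost:
                # Unimodal error surface: the opposite SP cannot also beat the center.
                pruned.add((i + 2) % 4)
            if c < best_cost:
                best, best_cost = sp, c
        if best == center:           # Step 4: the center is optimal, stop
            break
        center = best                # Step 4: move the diamond to the new center
    return best, best_cost, len(seen)
```

With a convex cost such as (x − 2)² + (y + 1)² and the search starting at (0, 0), the sketch reaches (2, −1) after matching 9 distinct SPs; in the first diamond, the two SPs opposite the improving ones are pruned, mirroring the saving illustrated in Fig. 9. The IPS labels assume the digit order implied by pixel a lying in IPS 01.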

IV. Cost Performance First FPI Algorithm

To cooperate better with the CPF-FPME algorithm of the previous section, we propose a cost performance first fractional-pixel interpolation (CPF-FPI) algorithm that reduces the computation for FPI.

A. Fractional-Pixel Interpolation Dependency Relationship

Because most fast FPME algorithms do not check SPs in a fixed order, the dependent interpolation defined by H.264 is inefficient. We therefore define a new dependency relationship for interpolation in Fig. 10. The solid lines depict the ordinary H.264 interpolation filters, and the dashed lines indicate our new filters. For example, according to Fig. 2(a), a belongs to the 01 IPS. Folding the H.264 average a = (G + b)/2 into a single six-tap filter (since 32G + 20G = 52G), a can be calculated directly from integer pixels by

$$a = (E - 5F + 52G + 20H - 5I + J)/64 \tag{3}$$

where E, F, G, H, I, and J are the values of the integer pixels in Fig. 1. A series of new filters, corresponding to the new dependency relationship (dashed lines) in Fig. 10, is given in Table IV.

B. Proposed Approaches for Error Elimination

Besides linear filtering, fractional-pixel interpolation in H.264 also includes rounding and clipping operations, which keep the interpolated pixels as integers in the range of 0 to 255. Simply using the filters above therefore incurs mismatch errors, because it ignores both the rounding and the clipping operations of H.264. These errors have two consequences: 1) the accuracy of the optimal MV is reduced, because matching against inaccurate fractional-pixels may stop the search away from the true optimal MV; and 2) if the errors enter the motion compensation phase, the reference frames in the encoder and decoder will differ, and drift accumulates in the decoded video signal. To solve this problem, we next give approaches that eliminate these errors and make our FPI algorithm compatible with the H.264 standard.

Suppose we use the five new filters defined in Table IV to calculate pixels a, b, e, f, and j, respectively. First, to compute a correctly, we eliminate the error introduced by ignoring the clipping operation. In the following examples, let E = 167, F = 69, G = 226, H = 232, I = 56, and J = 3, as shown in Fig. 1. According to (3), a' = 249. In H.264, however, a is the average of b and G, where b = (E − 5F + 20G + 20H − 5I + J)/32 = 272 is then clipped to b = 255. This makes a = (G + b + 1) >> 1 = 241, so a' and a differ by 8. The mismatch arises because our filter omits the clipping of pixel b. To compensate for it, we introduce two intermediate variables:

$$S_{sum} = (E - 5F + 20G + 20H - 5I + J)/32, \qquad P_{sum} = (E - 5F + 52G + 20H - 5I + J)/32.$$

The formula for a is

$$a = (G + b + 1) \gg 1 = (G + \mathrm{Clip}(S_{sum}, 0, 255) + 1) \gg 1 \tag{4}$$

where Clip($S_{sum}$, 0, 255) clips $S_{sum}$ into the range of 0 to 255. To express a in terms of $P_{sum}$, we shift the clipping range by G; since $P_{sum} = S_{sum} + G$, (4) is equivalent to

$$a = (\mathrm{Clip}(G + S_{sum}, G, G + 255) + 1) \gg 1 = (\mathrm{Clip}(P_{sum}, G, G + 255) + 1) \gg 1 \tag{5}$$

where a is now calculated from $P_{sum}$. The idea of this error elimination is to change the clipping range from (0, 255) to (G, G + 255).

Second, to compute b correctly, we eliminate the error introduced by ignoring the rounding operation. The rounding error is much smaller than the clipping error, since the maximum rounding error for each pixel is only 1. The following example shows how this error occurs. In H.264, pixel a is calculated as

$$a = (G + b + 1) \gg 1. \tag{6}$$

Let b = 41 and G = 42; then (6) gives a = 42. However, if we substitute a = 42 and G = 42 back into (6) and solve for b, we obtain b = 42, which contradicts the original setting of b. The contradiction arises because the rounding in (6) maps two values of b to the same a: a = 42 for both b = 41 and b = 42, so b cannot be recovered from a and G alone. To correct this, we store an extra variable that keeps the information lost in rounding, namely the parity of (G + b), which is 1 for b = 41 and 0 for b = 42 in this example. From (4) and (5), G + b = Clip($P_{sum}$, G, G + 255), so this parity is

$$\mathit{flag}_a = \mathrm{Clip}(P_{sum}, G, G + 255) \,\&\, 1 \tag{7}$$

then b can be computed correctly from a, G, and $\mathit{flag}_a$ as

$$b = 2a - G - \mathit{flag}_a. \tag{8}$$

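Third, e can be calculated from a and d by simple substitution. In H.264, e is the rounded average of b and h, where h and $\mathit{flag}_d$ are the vertical counterparts of b and $\mathit{flag}_a$ (so h = 2d − G − $\mathit{flag}_d$ by the analog of (8)). Then

$$e = (b + h + 1) \gg 1 = (2a - G - \mathit{flag}_a + 2d - G - \mathit{flag}_d + 1) \gg 1 = a + d - G - (\mathit{flag}_a \,\&\, \mathit{flag}_d). \tag{9}$$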

Fourth, f is the average of b and j, and its calculation is similar to that of a:

$$F_{sum} = (aa - 5bb + 52b + 20\,b1 - 5gg + hh)/32, \qquad f = (\mathrm{Clip}(F_{sum}, b, b + 255) + 1) \gg 1 \tag{10}$$

and the parity flag of f is stored in order to compute j:

$$\mathit{flag}_f = \mathrm{Clip}(F_{sum}, b, b + 255) \,\&\, 1. \tag{11}$$

Finally, the calculation of j is similar to that of b:

$$j = 2f - b - \mathit{flag}_f. \tag{12}$$
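The error-elimination formulas can be checked numerically with the following Python sketch of (3) to (9). It is a minimal illustration under stated assumptions, not the authors' code: it uses the H.264 rounding offsets (+16 before >> 5, and +32 before >> 6 for the naive filter of (3)) that the text writes as plain division, the function names are ours, and f and j, (10) to (12), are omitted because they follow the same pattern as a and b.

```python
def clip(x: int, lo: int, hi: int) -> int:
    """Clip x into [lo, hi]."""
    return max(lo, min(hi, x))


def h264_b(E: int, F: int, G: int, H: int, I: int, J: int) -> int:
    """Reference half-pel b: six-tap filter, +16 rounding, >> 5, clip to [0, 255]."""
    return clip((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5, 0, 255)


def h264_a(G: int, b: int) -> int:
    """Reference quarter-pel a, eq. (6)."""
    return (G + b + 1) >> 1


def a_unclipped(E: int, F: int, G: int, H: int, I: int, J: int) -> int:
    """Eq. (3) with no intermediate clipping (the naive single filter)."""
    return (E - 5 * F + 52 * G + 20 * H - 5 * I + J + 32) >> 6


def a_from_psum(E: int, F: int, G: int, H: int, I: int, J: int):
    """Eqs. (5) and (7): return (a, flag_a) using the shifted clipping range."""
    p_sum = (E - 5 * F + 52 * G + 20 * H - 5 * I + J + 16) >> 5   # equals S_sum + G
    s = clip(p_sum, G, G + 255)                                   # equals G + b
    return (s + 1) >> 1, s & 1


def b_from_a(a: int, G: int, flag_a: int) -> int:
    """Eq. (8): undo the rounding of eq. (6) using the stored parity."""
    return 2 * a - G - flag_a


def e_from_a_d(a: int, d: int, G: int, flag_a: int, flag_d: int) -> int:
    """Eq. (9): e = (b + h + 1) >> 1 expressed through a and d."""
    return a + d - G - (flag_a & flag_d)


pixels = (167, 69, 226, 232, 56, 3)   # E..J from the Fig. 1 example
a, flag_a = a_from_psum(*pixels)
print(a_unclipped(*pixels), h264_a(226, h264_b(*pixels)), a, b_from_a(a, 226, flag_a))
# prints: 249 241 241 255
```

With the Fig. 1 values, the naive filter of (3) gives 249, H.264 gives 241, the shifted clipping range of (5) also gives 241, and (8) recovers the clipped b = 255 from a and $\mathit{flag}_a$.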

The basic idea of these approaches extends to all the other cases in our algorithm. In the next subsection, we provide a recursive algorithm for interpolation.

C. Implementation of Cost Performance First Fractional-Pixel Interpolation

In this subsection, we give the implementation details of our interpolation algorithm. Three issues are addressed: the storage and computation scheme, multi-mode interpolation, and speedup with single instruction multiple data (SIMD) technology. The CPF-FPI algorithm is given at the end of this subsection.

To reach an optimized balance between redundant interpolation and duplicate calculation, we propose to store fractional-pixels region by region and to calculate them IPS by IPS within each region. Each region surrounds the block at the best matching integer MV and extends it by 1 pixel in each of the four directions (up, down, left, and right). For example, if FPME is applied to the 16 × 16 mode, the region size is 18 × 18. The fractional-pixels in a region are classified into 15 IPSs. Since each SP is associated with a different IPS, we propose to interpolate a whole IPS as needed, and 15 flags denote whether each IPS has already been calculated. Because the calculation of the IPSs is interdependent, a recursive algorithm calculates each IPS on the fly. All the fractional-pixels are discarded after motion compensation finishes.

If multiple modes are applied, interpolating around the best matching integer MV of each block separately would introduce severe duplicate calculation, because the position and size of an interpolation region depend on its best matching integer MV and block size. We therefore use one rectangular area that encompasses all the interpolation regions. Integer pixel motion estimation is first applied to all modes, and FPME is conducted afterward. Fig. 11 shows an example for 8 × 8 and 16 × 16: in Fig. 11(a), the best matching integer MVs for 8 × 8 and 16 × 16 are estimated, and the interpolation region is the smallest rectangle containing all the integer blocks in Fig. 11(b). The fractional-pixels in this interpolation region are calculated IPS by IPS as required.
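The storage scheme above (one rectangular region per MB, 15 IPS flags, recursive on-demand interpolation) can be sketched in Python as follows. This is a minimal illustration under assumptions: the rectangle convention (x, y, width, height), the class and function names, and especially the single predecessor list per IPS are hypothetical simplifications of Fig. 10, which is not reproduced in this text (for instance, the shifted neighbor pixels that IPSs 13, 31, and 33 need are ignored); `compute_ips` stands in for the actual filters of Table IV.

```python
from typing import Callable, Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in integer pixels


def interpolation_region(blocks: List[Rect]) -> Rect:
    """Smallest rectangle covering all best-matching integer blocks,
    each first extended by 1 pixel up, down, left, and right."""
    x0 = min(x for x, _, _, _ in blocks) - 1
    y0 = min(y for _, y, _, _ in blocks) - 1
    x1 = max(x + w for x, _, w, _ in blocks) + 1
    y1 = max(y + h for _, y, _, h in blocks) + 1
    return x0, y0, x1 - x0, y1 - y0


# Hypothetical predecessor lists loosely following Fig. 10 / Table IV.
# An empty list means the IPS is computed directly from integer pixels.
PREDECESSORS: Dict[str, List[str]] = {
    "01": [], "03": [], "10": [], "30": [], "02": [], "20": [],
    "11": ["01", "10"], "13": ["03", "10"], "31": ["01", "30"], "33": ["03", "30"],
    "12": ["02"], "32": ["02"], "21": ["20"], "23": ["20"],
    "22": ["12"],
}


class RegionIPSCache:
    """Lazily interpolates the 15 IPSs of one region, each at most once."""

    def __init__(self, compute_ips: Callable[[str], None]) -> None:
        self.compute_ips = compute_ips                        # fills one whole IPS
        self.flags = {ips: False for ips in PREDECESSORS}     # the 15 IPS flags

    def interpolation_process(self, ips: str) -> None:
        if self.flags[ips]:                  # already interpolated: nothing to do
            return
        for pred in PREDECESSORS[ips]:       # make sure the predecessors exist first
            self.interpolation_process(pred)
        self.compute_ips(ips)                # now interpolate this IPS
        self.flags[ips] = True
```

For a single 16 × 16 block at (0, 0), `interpolation_region` returns (-1, -1, 18, 18), the 18 × 18 region described above. Requesting IPS 22 on a fresh cache computes 02, 12, and 22 in that order, and a repeated request computes nothing.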


TABLE V
Different Groups of Sequences for Testing

Group   Format   Sequences
A       QCIF     News, Bus, Husky, Intros
B       SIF      Mobile, Garden, Tt, Stefan, Tennis
C       CIF      Pamphlet, Sign_irene, Silent, Students
D       4SIF     Galleon, Vtclnw, Washdc
E       720p     Crowd_run, Ducks_take_off, Old_town_cross
F       1080p    Riverbed, Park_joy, Pedestrian_area

Fractional-pixels in the same IPS use the same interpolation filter and error elimination approach, so their calculation can be accelerated with SIMD technology, which accesses or processes several fractional-pixels in parallel with a single instruction. A SIMD processor improves the CPF-FPI algorithm in two ways. First, the fractional-pixels are arranged in blocks, so a number of values can be loaded at once: instead of fetching pixels one by one, a SIMD processor fetches several pixels with a single instruction. Second, SIMD systems typically include instructions that apply one operation to all of the loaded data; for example, if the SIMD system loads eight pixels at once, an add operation is applied to all eight values at the same time. CPF-FPI is well suited to SIMD because it computes fractional-pixels one whole IPS at a time.

Finally, we present the detailed CPF-FPI algorithm. Only fractional-pixels in the 01, 10, 03, 30, 02, and 20 IPSs can be calculated directly; pixels in the other IPSs can be computed once their predecessor IPSs exist. Thus, the CPF-FPI algorithm uses the 15 IPS flags to produce fractional-pixels recursively. The function InterpolationProcess interpolates all fractional-pixels of the required IPS and is described below.

Function InterpolationProcess
Input: the index of the required IPS (00 ∼ 33).
Output: the fractional-pixels of the required IPS.
Step 1: Check the flag of the required IPS. If the required fractional-pixels have already been interpolated, return.
Step 2: If the predecessors of this IPS (indicated in Fig. 10) already exist, interpolate the required fractional-pixels directly, set the flag of the current IPS to 1, and return.
Step 3: Otherwise, recursively call the function InterpolationProcess to interpolate the predecessors of the required IPS.
Step 4: Interpolate the current IPS from the calculated predecessors.

Fig. 11. Extended interpolation region for 8 × 8 and 16 × 16. (a) Best matching integer blocks of 8 × 8 and 16 × 16 modes. (b) Interpolation region is the smallest rectangle including all the best matching integer blocks.

V. Experimental Results

To evaluate our proposed algorithm, Section V-A compares different FPI algorithms, and Section V-B compares the overall performance of our proposed algorithm with different combinations of FPME and FPI algorithms. The results show not only that our proposed algorithms attain higher speed in both the FPME and FPI procedures, but also that our FPI algorithm significantly speeds up FPI even when other fast FPME algorithms are used. We implemented the proposed algorithm in the H.264 reference software JM15.0 [18], with high-precision R-D optimization enabled. Each sequence is coded for 100 frames with only one I frame and no B frames, at QP = 16, 20, and 24. Owing to space limitations, only the results averaged over these three QPs are shown here. The experiments were run on a PC with an Intel Core 2 Duo at 2.4 GHz. The test was performed on the 22 sequences in Table V, which have varied resolutions and different motion activities and are classified into six groups by resolution. Note that the sequences in Table V do not overlap with those in Table I. The computational cost of our algorithm consists of the complexity of both FPME and FPI. The complexity of FPME relates to the number of SPs, and that of FPI relates to the number of interpolated pixels and the size of their filters. Because it is difficult to adopt a unified computational measure for FPME and FPI, we use the computational time required for FPME and FPI as the measure of computational complexity.

A. Comparison with Different FPI Algorithms

Fig. 12 shows the computational performance of full FPI, XFPI, and CPF-FPI when different FPME algorithms, e.g., FFPS, CBFPS [8], S-UMH FPS [18], Hill et al. [10], and Chang et al. [15], are performed. Because CPF-FPI is a lossless FPI algorithm compatible with the H.264 standard, we only compare speed.
We take the full FPI algorithm as the reference benchmark, with its computational time set to 100%. Fig. 12 shows that the CPF-FPI algorithm significantly reduces computational complexity compared with full FPI and XFPI when fast FPME algorithms, such as CBFPS, S-UMH FPS, Hill et al. [10], and Chang et al. [15], are used. Even when FFPS is adopted, both CPF-FPI and XFPI save about 17% of the computational time of full FPI, so in the worst case CPF-FPI has the same computational complexity as XFPI. When fast FPME algorithms are used, CPF-FPI attains a much larger speed improvement over XFPI. In Fig. 12, CPF-FPI achieves the lowest computational complexity with the fast FPME algorithm of Hill et al. [10], who not only proposed a fast FPME method but also provided techniques to skip FPI on some macroblocks; CPF-FPI benefits from these techniques because it interpolates fractional-pixels region by region.

Fig. 12. Average computational complexity of different FPI algorithms with different FPME algorithms.

To explain the source of the computational savings of CPF-FPI, Fig. 13 compares the number of interpolated pixels frame by frame for different FPI algorithms. To compare sequences with different resolutions, the y-axis shows the average number of interpolated pixels per MB. The lowest line in Fig. 13 is the minimal number of fractional-pixels that must be computed, i.e., the number of fractional-pixels used by FPME, calculated as the number of SPs times the size of an MB (256 pixels). Because the interpolation of fractional-pixels is interdependent, no FPI algorithm can reach this minimum without duplicate calculations. The other three lines, from top to bottom, show the number of pixels calculated by full FPI, XFPI, and our proposed CPF-FPI. Although XFPI skips much of the fractional-pixel computation of full FPI, CPF-FPI keeps the number of calculated fractional-pixels much closer to the minimum. Table VI gives the average behavior of the different FPI algorithms on each sequence group; the savings of CPF-FPI in the average number of interpolated pixels per MB are clear for all groups.

Fig. 13. Number of calculated fractional pixels frame by frame on sequence Pedestrian_area (1080p).

Fig. 14. Distribution of the optimal fractional MVs on the Ducks_take_off sequence.

TABLE VI
Number of Interpolation Pixels per Macroblock with Different FPI Algorithms

Algorithm     Group A   Group B   Group C   Group D   Group E   Group F
The Minimum   1156      1233      1233      1011      1402      1777
CPF-FPI       1382      1398      1411      1536      1843      1920
XFPI          2535      2708      2699      2833      2711      2772
Full FPI      3840      3840      3840      3840      3840      3840

B. Overall Performance of Our Proposed Algorithm

Fig. 15 compares the overall computational performance of our proposed algorithm with FFPS+XFPI and CBFPS+XFPI, taking FFPS+XFPI as the reference benchmark with its computational complexity set to 100%. Our proposed algorithm achieves the highest speed, saving about 65% and 32% of the computation compared with FFPS+XFPI and CBFPS+XFPI, respectively. As the resolution decreases, the computational savings of our proposed algorithm become larger. This is because, in low-resolution video, more of the final MVs are located at the 1/4-pixel positions of the first cost/performance priority, so finer fractional MVs are more often required; since our FPI algorithm can interpolate those 1/4-pixels directly, it suits these circumstances well. Conversely, the memory usage of XFPI grows with resolution and demands extra time for memory swapping, so the actual speed of XFPI on high-resolution video will be lower than that shown in Fig. 15. Our algorithm, in contrast, only interpolates around the best integer candidate MB and uses a fixed amount of memory, which reduces memory requirements compared with the other algorithms and also contributes to its computational performance.

Besides video resolution, motion activity is the second most important factor in the computational performance of our proposed algorithm. In Fig. 15, Groups A, B, E, and F contain intensive motion, while the remaining sequences are relatively inactive. The speedup of our proposed algorithm is clear even on active sequences, but more computation is saved on static sequences, owing to the SP-saving strategies in Section III-B. Fig. 15 also gives the computational performance on two high-definition sequences, Ducks_take_off and Riverbed. Because the motion in these sequences is highly irregular, such as flowing water and wave ripples, the distribution of their optimal fractional MVs is not center-oriented, as Fig. 14 shows. Our algorithm therefore loses its FPME advantage in this worst case; however, as Fig. 15 shows, the computational improvement from CPF-FPI remains.

Fig. 15. Computational performance of FFPS+XFPI, CBFPS+XFPI, and the proposed algorithm.

TABLE VII
R-D Performance Comparison of CPF-FPME+CPF-FPI and CBFPS+XFPI Algorithms

Sequences   Average PSNR (dB)         Average Bit Rate (%)
            Proposed   CBFPS+XFPI     Proposed   CBFPS+XFPI
Group A      0.01      −0.03           0.35       0.33
Group B     −0.02      −0.01           0.63       0.31
Group C     −0.01      −0.02           0.40       0.13
Group D     −0.01       0.00           0.32       0.45
Group E     −0.02      −0.01           0.61       0.26
Group F     −0.03      −0.03          −0.02       0.11

Table VII compares the R-D performance of the different algorithms over various video sequences. Because CPF-FPI and XFPI are both lossless FPI algorithms that produce the same result, the R-D performance is affected only by the FPME algorithm. We compute the relative R-D performance of the proposed algorithm and of CBFPS+XFPI with respect to FFPS+XFPI as

$$\Delta \mathrm{PSNR} = \mathrm{PSNR}_x - \mathrm{PSNR}_{\mathrm{FFPS}}, \qquad \Delta \mathrm{Bit\ rate} = \frac{\mathrm{Bit\ rate}_x - \mathrm{Bit\ rate}_{\mathrm{FFPS}}}{\mathrm{Bit\ rate}_{\mathrm{FFPS}}} \tag{13}$$

where PSNR_FFPS and Bit rate_FFPS are the PSNR and bit rate of the FFPS+XFPI algorithm, and PSNR_x and Bit rate_x are the PSNR and bit rate of our proposed algorithm or of CBFPS+XFPI. The results in Table VII show that, on average, the proposed algorithm has almost the same R-D performance as FFPS+XFPI and CBFPS+XFPI. Compared with FFPS+XFPI, the average PSNR degradation and bit rate increase of our proposed algorithm are 0.02 dB and 0.38%, respectively.

VI. Conclusion

In this paper, an integrated algorithm for FPME and FPI in H.264 was proposed. We showed how to coordinate FPI with FPME in order to speed up both. We also resolved the incompatibility with the H.264 standard through error elimination approaches that prevent errors from being introduced and then propagated. Our analysis and experiments show that the proposed algorithm not only improves the speed of FPME but also largely avoids the excess interpolation of unused fractional-pixels. In addition, our algorithm uses only a fixed amount of memory to perform FPI, which saves the considerable time otherwise spent processing and accessing additional memory.

Acknowledgment

The authors would like to thank the anonymous reviewers for their invaluable comments that greatly improved the quality of this paper, as well as Prof. D. Huang, Dr. F. Wu, Prof. J. Feng, Prof. F. Liang, and Prof. L. Lin for their helpful discussions and suggestions.

References

[1] O. Werner, “Drift analysis and drift reduction for multiresolution hybrid video coding,” Signal Process. Image Commun., vol. 8, no. 5, pp. 387–409, 1996.
[2] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, “Video coding with H.264/AVC: Tools, performance, and complexity,” IEEE Circuits Syst. Mag., vol. 4, no. 1, pp. 7–28, Jan. 2004.
[3] T. Wedi and H. G. Musmann, “Motion and aliasing-compensated prediction for hybrid video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 577–586, Jul. 2003.


[4] K. Minoo and N. Truong, “Reciprocal subpixel motion estimation: Video coding with limited hardware resources,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 6, pp. 707–718, Jun. 2007.
[5] J. Ostermann and M. Narroschke, Motion Compensated Prediction with 1/8-PEL Displacement Vector Resolution, ITU-T SG16/Q.6 document VCEG-AD09, Hangzhou, China, 2006.
[6] Y. Vatis and J. Ostermann, “Adaptive interpolation filter for H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 2, pp. 179–192, Feb. 2009.
[7] Y.-J. Wang, C.-C. Cheng, and T.-S. Chang, “A fast algorithm and its VLSI architecture for fractional motion estimation for H.264/MPEG-4 AVC video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 5, pp. 578–583, May 2007.
[8] Z. Chen, J. Xu, Y. He, and J. Zheng, “Fast integer-PEL and fractional-PEL motion estimation for H.264/AVC,” J. Vis. Commun. Image Representation, vol. 17, no. 2, pp. 264–290, 2006.
[9] H. Chao and J. Lu, “A high accurate predictor based fractional pixel search for H.264,” in Proc. IEEE Int. Conf. Image Process., Oct. 2006, pp. 2365–2368.
[10] P. Hill, T. K. Chiew, D. Bull, and N. Canagarajah, “Interpolation free subpixel accuracy motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 12, pp. 1519–1526, Dec. 2006.
[11] S. Dikbas, T. Arici, and Y. Altunbasak, “Fast motion estimation with interpolation-free sub-sample accuracy,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 7, pp. 1047–1051, Jul. 2010.
[12] Q. Zhang, Y. Dai, and C. Kuo, “Direct techniques for optimal subpel motion resolution estimation and position prediction,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 12, pp. 1735–1744, Dec. 2010.
[13] P. R. Hill and D. R. Bull, “Sub-pixel motion estimation using kernel methods,” Signal Process. Image Commun., vol. 25, no. 4, pp. 268–275, 2010.
[14] Y. Lin and Y. C. Wang, “Improved parabolic prediction-based fractional search for H.264/AVC video coding,” IET Image Process., vol. 3, no. 5, pp. 261–271, Oct. 2009.
[15] J. Chang and J. Leou, “A quadratic prediction based fractional-pixel motion estimation algorithm for H.264,” J. Vis. Commun. Image Representation, vol. 17, no. 5, pp. 1074–1089, 2006.
[16] L. Shen, Z. Zhang, Z. Liu, and W. Zhang, “An adaptive and fast fractional pixel search algorithm in H.264,” Signal Process., vol. 87, no. 11, pp. 2629–2639, 2007.
[17] J. Lu, P. Zhang, H. Chao, and P. Fisher, “An integrated algorithm for fractional pixel interpolation and motion estimation of H.264,” in Proc. DCC, Mar. 2010, p. 541.
[18] H.264/AVC Reference Software Version JM15.0. (2009, Jan.) [Online]. Available: http://iphome.hhi.de/suehring/tml/download/15.0.zip
[19] X264: A Free H264/AVC Encoder. (2009, Apr.) [Online]. Available: ftp://ftp.videolan.org/pub/videolan/x264/snapshots/x264-snapshot20090409-2245.tar.bz2
[20] Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264, ITU-T, 2007.
[21] J. Luo, I. Ahmad, Y. Liang, and V. Swaminathan, “Motion estimation for content adaptive video compression,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 7, pp. 900–909, Jul. 2008.
[22] J.-H. Kim and B.-G. Kim, “Fast block mode decision algorithm in H.264/AVC video coding,” J. Vis. Commun. Image Representation, vol. 19, no. 3, pp. 175–183, 2008.
[23] Z. Xie, Y. Liu, J. Liu, and T. Yang, “A general method for detecting all-zero blocks prior to DCT and quantization,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 2, pp. 237–241, Feb. 2007.
[24] J. S. Kim, K. W. Lee, and M. H. Sunwoo, “Novel fractional pixel motion estimation algorithm using motion prediction and fast search pattern,” in Proc. IEEE Int. Conf. Multimedia Expo, Jun. 2008, pp. 821–824.
[25] D. Cheng, H. Yun, and Z. Junli, “PPHPS: A parabolic prediction-based, fast half-pixel search algorithm for very low bit-rate moving-picture coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 6, pp. 514–518, Jun. 2003.

Jiyuan Lu received the M.S.E. degree in software engineering from Sun Yat-Sen University, Guangzhou, China, in 2006. He is currently working toward the Ph.D. degree in computer science at the School of Information Science and Technology, Sun Yat-Sen University. His current research interests include video coding, image processing, and video communication.

Peizhao Zhang received the B.E. degree in software engineering and the M.E. degree in computer applied technology from Sun Yat-Sen University, Guangzhou, China, in 2008 and 2010, respectively. His current research interests include image/video processing, computer graphics, and computer vision.

Hongyang Chao (M’06) received the B.S. and Ph.D. degrees, both in computational mathematics, from Sun Yat-Sen University, Guangzhou, China. From 1988 to 1990, she was an Assistant Professor with the Department of Computer Science, Sun Yat-Sen University. In November 1990, she became an Associate Professor of computer science and the Director of the Institute of Computational Mathematics. From 1994 to 1995, she visited the Department of Computer Science, Stanford University, Stanford, CA, as a Research Scholar under the support of the Lingnan Foundation, New Haven, CT. She visited the Department of Computer Science, University of North Texas, Denton, as a Visiting Professor from June 1995 to April 1998. During this period, she joined Infinop, Denton (later renamed Vianet), where she was a Founding Researcher, and she remained with the company as Chief Scientist until 2004. Afterward, she joined the School of Software, Sun Yat-Sen University, where she was an Administrative Deputy Dean from 2004 to 2008. She is currently an Associate Dean and a Full Professor with the school. She has published extensively in the area of image/video processing and holds three patents. Her current research interests include image and video processing, image and video compression, massive multimedia data analysis and understanding, and content-based image (video) retrieval.

Paul S. Fisher received the B.A. and M.A. degrees in mathematics from the University of Utah, Salt Lake City, and the Ph.D. degree in computer science from Arizona State University, Tempe. He is the R. J. Reynolds Distinguished Professor of computer science with Winston-Salem State University, Winston-Salem, NC. Over his tenure as a Faculty Member, he has managed more than 100 proposal efforts for corporations and the Department of Defense, involving teams of 1 to 15 people. He has consulted with the U.S. Army, Navy, Air Force, and several companies over the years.
In the 1990s, he commercialized a small business innovation research contract from the Navy in a company he founded. The contract specified the development of a wavelet codec for both still and video imagery, and its commercial name was Lightning Strike. He later sold the company to return to academia. His current research interests include wireless communication and network management for sensor networks, image processing, and pattern recognition.
