algorithm is based on merging NxN PU partitions in order to compose larger ones. ... execution time reduction of up to 34% with insignificant rate- distortion losses ... quad-tree based on decisions that where taken inside the current frame. Due to this .... persisted through all the consecutive test models: the quad- tree based ...
2012 IEEE International Conference on Multimedia and Expo
Motion Vectors Merging: Low Complexity Prediction Unit Decision Heuristic for the InterPrediction of HEVC Encoders Felipe Sampaio, Sergio Bampi
Mateus Grellert, Luciano Agostini, Julio Mattos
PPGC - PGMicro - II Federal University of Rio Grande do Sul Porto Alegre/RS, Brazil {fmsampaio, bampi}@inf.ufrgs.br
GACI - PPGC - CDTec Federal University of Pelotas Pelotas/RS, Brazil {mgdsilva, agostini, julius}@inf.ufrgs.br (HEVC) standard is emerging, aiming to double the compression rates when compared to H.264/AVC for the same video quality [3]. The final draft of this standard is expected to be released in January 2013. The HEVC standard is still under development, but its current innovations already bring negative expectations regarding its complexity. The reference software for this standard, called HEVC Model (HM) [4], implements tools that are much more complex than those of H.264/AVC reference software, causing great concern when real-time applications are considered. In addition, this complexity is also a critical drawback for mobile devices, since a great amount of energy will be required to perform video coding/decoding tasks. The inherent data structures defined in the HEVC standard can be highlighted as one of the main causes of its high complexity [5]. Frames are now divided into Treeblocks, which can be subdivided into Coding Units (CU). Furthermore, CUs can be recursively partitioned, forming a quad-tree structure, which is illustrated in Fig. 1. The CU quad-tree decision is originally performed using the RateDistortion Optimization (RDO) decision [6], which evaluates the bitrate and the objective quality (generally expressed by the PSNR, in dB) produced by every possible configuration. In other words, the prediction, residual coding and entropy coding stages are performed for every possible CU partition. During the prediction stage, each CU is once again divided into Prediction Units (PU), introducing a PU decision inside each node of the CU decision tree. Each PU decision also considers the RDO as decision strategy [5].
Abstract— This paper presents the Motion Vectors Merging (MVM) heuristic, which is a method to reduce the HEVC inter-prediction complexity targeting the PU partition size decision. In the HM test model of the emerging HEVC standard, computational complexity is mostly concentrated in the inter-frame prediction step (up to 96% of the total encoder execution time, considering common test conditions). The goal of this work is to avoid several Motion Estimation (ME) calls during the PU inter-prediction decision in order to reduce the execution time in the overall encoding process. The MVM algorithm is based on merging NxN PU partitions in order to compose larger ones. After the best PU partition is decided, ME is called to produce the best possible rate-distortion results for the selected partitions. The proposed method was implemented in the HM test model version 3.4 and provides an execution time reduction of up to 34% with insignificant ratedistortion losses (0.08 dB drop and 1.9% bitrate increase in the worst case). Besides, there is no related work in the literature that proposes PU-level decision optimizations. When compared with works that target CU-level fast decision methods, the MVM shows itself competitive, achieving results as good as those works. Keywords- High-efficiency Video Coding, Inter-Prediction, Low-Complexity Decision, PU-Level Decision
I.
INTRODUCTION
The recent advances in technology enabled many improvements regarding multimedia systems. Along with these advances, the consumer market constantly demands better quality media, such as higher resolution digital videos. When H.264/AVC [1] (the state-of-the-art videocoding standard) was created, only a few specific devices supported 1080p videos. However, the current scenario is different, since 1080p videos are now supported by a plethora of electronic devices and higher resolutions begin to catch the market’s attention. With that in mind, a group composed of video coding experts from ITU-T and ISO/IEC was formed under the name of Joint Collaborative Team on Video Coding (JCT-VC) [2]. The purpose of the JCT-VC was to develop a new video-coding standard with improved compressing tools focusing on high-resolution videos. From this collaboration, the High Efficiency Video Coding 978-0-7695-4711-4/12 $26.00 © 2012 IEEE DOI 10.1109/ICME.2012.37
1st depth 64x64
3rd depth 16x16
4th depth 8x8 3rd depth 16x16
3rd depth 16x16
4th depth 8x8
2nd depth 32x32
3rd depth 16x16 4th depth 8x8
4th depth 8x8
Figure 1. Example of quad-tree CU partitioning
657
Several works targeting complexity solutions for video encoders targeting the H.264/AVC can be found in the literature [7-9]. These works proposed new heuristic modedecision algorithms to reduce the encoder complexity based on encoding information. However, these solutions cannot be directly imported to the HEVC context, due to the new coding structures that provide different data processing procedures when compared to H.264/AVC. There are also some works that propose solutions for the HEVC standard by relying on heuristic decisions in order to reduce the CU encoding time. In [10], encoding execution time is controlled according to a specific target complexity. Then, using historical information of past decisions, the decision core is able to avoid some CU size evaluations, assuming that they would not be chosen in an RDO decision. The work of [11] performs a similar decision: it cuts some CU nodes in the quad-tree based on decisions that where taken inside the current frame. Due to this fact, the author classified its decision as a frame-level decision. The goal of this work is to heuristically accelerate encoding time by acting on a different level in the CU quadtree decision: the PU level, more specifically in the interprediction step. HEVC defines that each PU must be predicted for the three different modes available: skip, intraand inter-prediction. Some analyses point that the inter-frame prediction consumes from 60% up to 96% of the total encoding time. Such complexity is derived from Motion Estimation (ME), the main tool in the inter-frame prediction, which performs exhaustive searches in order to find the best match between the current block of pixels that is being encoded and an area of pixels from one or more reference frames. These results indicate immense degrees of optimizations, since each CU node in the quad-tree must call ME for each possible PU partition. This work presents the Motion Vectors Merging (MVM) heuristic, a fast decision method for the inter-prediction PU decision, which aims to reduce the complexity associated with the encoding process. The MVM decides the PU partition by using heuristic approaches instead of calling the Motion Estimation (ME) procedure for every possible configuration. In doing so, the PU decision complexity can be significantly reduced, therefore reducing the complexity inside each node of the CU quad-tree. The MVM proposed heuristic checks all internal borders among the neighboring NxN partitions. Each border is analyzed and a heuristic defines if the partitions can be merged or not. If any merging occurs, the rate-distortion cost is evaluated for the decided PU partition, producing enough information for the CU-level decision. The MVM fast decision method for inter-prediction PUs was implemented in the HM 3.4 test model [4], and ratedistortion results, as well as speed-up results, were generated. The obtained results show that MVM is capable of achieving a complexity reduction of 34% with insignificant loss in terms of quality (0.08 dB on average) and minor increase in bitrate (1.9% in the worst case). In addition, MVM can be easily integrated with other solutions that reduce encoding complexity targeting different coding structures, such as CUs.
The remainder of this paper is organized as follows: section 2 describes works found in the literature that also target complexity reduction in HEVC encoders; section 3 further explains the complexity issues concerning the coding process, presenting detailed analyses through different perspectives; section 4 is focused on thoroughly explaining the MVM fast PU decision, which is the main contribution of this work; on section 5, PNSR, bitrate and complexity results are shown; and section 6 finally discusses conclusions and future work. II.
RELATED WORK
To the best of our knowledge, there are no published works that present fast decision heuristic-based methods targeting the HEVC PU decision. All related work focus on avoiding the evaluation of every depth in the CU quad-tree decision. These works perform cuts in the quad-tree by keeping the history of decision during the current frame encoding or using the past frames decisions history. The work presented in [10] proposes a complexity control in the HM test model in order to control the encoder execution time. Since the target of [10] is power-constrained applications, the complexity control checks the battery status and decides by running the encoding process on a specific complexity: higher complexity, higher energy consumption and best rate-distortion results, or lower complexity, lower energy consumption and worst rate-distortion results. This work avoids some CU evaluations based on past frames analyses: (1) one frame is full RDO processed and, afterwards, (2) based on the best rate-distortion results the decision core builds a history map of decisions that (3) will direct the future frame decisions using the already built map of past choices. The work presented in [11] proposes a fast coding unit decision algorithm for frame-level or CU-level encoding process acceleration. The acceleration is achieved by skipping some CU evaluations based on past frames coding information and neighbor CU coding results. The PU-level decision is not considered in this work. III.
HEVC QUAD-TREE BASED DECISION
Differently from H.264/AVC encoders, which divide the frame in coding units of fixed 16x16 size blocks (called macroblocks), the first proposals for HEVC encoders already proposed more flexible coding units. Therefore, the fixed macroblocks are now treated as Coding Units (CU), which can assume dimensions larger than 16x16 pixels [5]. Each CU can be recursively sub-divided into blocks of 64x64, 32x32, 16x16 and 8x8 pixels. During prediction, each CU is once again divided into PUs. There are four possible PU partitions, 2Nx2N, 2NxN, Nx2N and NxN, where 2N is equivalent to the corresponding CU dimension. The PU partitions for a 64x64 CU and their corresponding sizes are illustrated in Fig. 2.
658
64
64
32
32
64
64
2NxN
2Nx2N
Nx2N
NxN
Figure 2. PU partitions for a 64x64 CU
LD
All these possible configurations are supported in order to enable patterns more suitable to different picture regions (homogeneous or heterogeneous parts). The HEVC first test model implements a CU size decision technique that persisted through all the consecutive test models: the quadtree based approach [5]. The HM test model implements all possibilities by distributing them in a tree-based data structure. The decision algorithm evaluates all possibilities by performing the depth first search (DFS) in a recursive manner. Inside each tree node, the RD cost is calculated for all possible configurations: (a) the three prediction modes (skip, intra and inter modes) and (b) the four possible partition sizes (2Nx2N, 2NxN, Nx2N and NxN). In fact, the latest HM 3.4 test model [4], which is used in this work, restricts the skip modes to assume only 2Nx2N partition size and the intra modes to be partitioned in 2Nx2N and NxN [5]. Considering predictive (P) or bi-predictive (B) slices, the HM test model analyzes the skip, inter and intra modes for each CU node in the quad-tree. The proposals for the intraprediction in HEVC considerably increase the computational complexity of this process when compared to H.264/AVC encoders (in the worst case, 35 prediction modes must be analyzed, against 9 in the worst case of H.264/AVC intraprediction) [12]. The skip prediction mode defines a CU as a 2Nx2N PU that has no relevant information regarding transform coefficients, and all motion data, i.e., motion vectors, reference frames indices among others, are inferred from the motion merge technique [5]. Skip-predicted CUs bring no relevant complexity overhead. Lastly, the interprediction is still responsible for most of the computational complexity among all the other coding tools.
BQTerrace (1080p)
RA
BasketballDrive
LD
RA Cactus
B. RDO Decisions Experimental Analysis The HM test model works with two possible complexity settings: Low Complexity (LC) or High Efficiency (HE). Additionally, there are other three configuration sets that define the coding/access parameters in the coding process: Intra Only, Random Access and Low Delay [13]. The combination of these two sets of specifications can generate six different test conditions parameters. In this work, since the focus is to propose a fast algorithm for the PU partition size decision in the inter-prediction, the Intra Only parameter set will be not considered in the analysis. In order to evaluate the RDO decisions, four different video sequences were evaluated. Three of them are HD 1080p (1920x1080 pixels) test sequences: BasketballDrive (few details, medium amount of colors per frame, high motion), BQTerrace (medium details, few amount of colors per frame, medium motion) and Cactus (large amount of details and colors per frame, high motion) [13]. Fig. 4 shows some results related to the prediction modes that are chosen by the default RDO decision in the HM test model. Considering the three video sequences, the most used prediction mode is the skip mode. On average, 71% of the samples are predicted using skip mode when the LD configuration is set and almost 80% are predicted as skip mode for the RA parameters. The inter-prediction generates the best results in 26% and 17% of the cases, for LD and RA respectively. The intra-prediction does not surpass 10% in the Basketball Drive. On average, only 3.6% of all CUs are intra-coded. The complexity analysis shows that the inter-prediction spends from 63% up to 96% of the overall HM execution time. On the other hand, considering the best case in Fig. 4, where the inter-prediction is the best option in only 36%, the
Vidyo (720p)
TZ Search
LD
INTRA
For P and B slices, the inter-prediction takes place in each of the CU quad-tree nodes. The HM test model performs an optimization that avoids the inter-prediction for NxN PU partitions in CUs that are not in the maximum tree depth, e.g., the smallest CU size possible (which will be referred to as leaf CUs). This happens due to the fact that a PU of size NxN in a CU of depth d is equivalent to a PU of size 2Nx2N in a CU of depth d+1. In other words, for each CU node in the quad-tree, the inter-prediction calls ME three (in non-leaf CU nodes) to four times (in leaf CUs) during inter-prediction.
ME
Full Search
INTER
Figure 4. RDO-based decision for the three prediction modes in P and B slices: skip, inter and intra choices.
Rest
TZ Search
RA
BQTerrace
A. HM Complexity Analysis Fig. 3 presents some analyses regarding the execution time spent in the ME compared to the remaining encoding tools. The HM execution profile shows that Motion Estimation, the main tool of the inter-prediction, spends more than 96% of the required encoding time when the Full Search algorithm is used. In addition, even when applying TZ Search, a fast algorithm that achieves a speed up of 23 times when compared to Full Search with small quality losses [4], ME takes more than 70% of the total encoding time. 100% 80% 60% 40% 20% 0%
SKIP
100% 80% 60% 40% 20% 0%
32
32
Full Search
Figure 3. HM execution time for FS and TZ block match algorithms.
659
highest complexity module is responsible for not the most part of the final codification modes. Based on these results, it is possible to conclude that heuristic-based approaches for the inter-prediction can significantly reduce the encoder complexity, and consequently the execution time, besides it can provide good results in the rate-distortion optimization.
64
C. Inter-Prediction PU Decisions This analysis aims to statistically map the inter-prediction decision considering each CU size. Considering all CU dimensions (64, 32, 16 and 8), every partition size decision was counted. It is important to have a trace that shows the most used partition size when the HM is running with the Full RDO decision. With that in mind, Fig. 5 shows the PU partition size decision considering three video sequences. It is important to notice that the 2Nx2N partition is the preferred partition choice, specially for smaller CU sizes. On average, 2Nx2N partitions are chosen in 85% of the PU partition decisions. The rectangular partitions, 2NxN and Nx2N, are the best option in 7% and 7.3% respectively. The NxN PU partition is used only for 8x8 CU with an inexpressive part of the total decisions, corresponding only to 0.2%. The rate-distortion decision results show that the 2Nx2N PU partitions for four CUs of depth d+1 take priority when compared to NxN PU decisions for a d-depth CU. This fact is also considered for the proposed low complexity PU decision algorithm, i.e, 2Nx2N PUs take preference as much as possible. Besides, the 2NxN and Nx2N PU partitions have approximately the same occurrence, which is much lower than the 2Nx2N occurrence. This indicates that the proposed algorithm must prioritize the 2Nx2N choice when possible. In addition, 2NxN and Nx2N partitions should have with the same priority, and NxN PUs are used only in 8x8 CUs and should have the lowest priority of all. IV.
2Nx2N
100% 80% 60% 40% 20% 0% 32
16
BQTerrace
8
2NxN
64
32
Nx2N
16
NxN
8
64
BasketballDrive
32
16
8
Cactus
Figure 5. RDO-based for inter-prediction PUs.
Fig. 6 and 7 show the MVM algorithm and the heuristic decision that was employed respectively. For each CU, the ME is initially performed for all four NxN partitions. With the vectors produced from ME, the heuristic function is applied to decide which partitions can be merged. The conditions that drive the merge decision are based on the vector similarity among the neighboring PUs. The PU decision tree is evaluated in a top-down manner. If all four PUs produced the same motion vectors, then the 2Nx2N is considered the best PU size and the ME is performed for that size only. When 2Nx2N does not meet the restrictions, the rectangular sizes are evaluated following the same steps, i.e., if two neighboring PUs produced the same motion vectors, then the corresponding PU size will be used and ME is applied to produce the best motion vector for that PU size. It is important to notice that ME must be applied for the PUs that are selected via heuristic decision in order to achieve better RD results. In doing so, motion vectors that are different from the ones that were used to decide whether PUs are merged can be produced. Borders Merging Heuristic 1. function mergeBorders ( A, B, C, D ): 2. if (A.B) + (B.D) + (A.C) + (C.D) then 3. partSize ĸ 2Nx2N 4. else if (A+D) then 5. partSize ĸ 2NxN 6. else if (B+C) then 7. partSize ĸ Nx2N 8. else 9. partSize ĸ NONE 10. return partSize
MVM FAST PU DECISION METHOD
As previously explained, ME is called three to four times during the inter-prediction inside each CU node in the quadtree. This characteristic is extremely unwanted considering real time and energy constraints. Therefore, solutions that aim to reduce the number of ME calls are very important to make real time HEVC encoding a feasible task. The Motion Vectors Merging algorithm described in this work was designed with such goal in mind. In order to accomplish this, a heuristic is applied to decide all PU sizes above NxN. When certain conditions are met, NxN partitions are merged into 2NxN, Nx2N or 2Nx2N partitions without performing ME operations for each one of the larger candidate PU sizes. If the conditions are not met, then ME takes place and the regular procedure holds for all possible partitions. It is important to emphasize that 2Nx2N PUs must take priority in the event of a tie, since these partitions will produce a lower bitrate. This is explained by the fact that greater PUs requires less motion information per pixel, such as motion vectors and reference frame index.
Figure 6. Border merging algorithm. Proposed Fast Inter PU Decision Algorithm A ĸ motion vector block 0 B ĸ border between blocks 0 and 2 of a CU C ĸ border between blocks 1 and 3 of a CU D ĸ border between blocks 2 and 3 of a CU 1. function checkInterBestPU( cu ): 2. //four motion vectors for each block in NxN PU partition 3. bestMV ĸ motionEstimation(cu, NxN) 4. A ĸ (bestMV[0] == bestMV[1]) 5. B ĸ (bestMV[0] == bestMV[2]) 6. C ĸ (bestMV[1] == bestMV[3]) 7. D ĸ (bestMV[2] == bestMV[3]) 8. bestPartSize ĸ mergeBorders(A, B, C, D) 9. resultsME ĸ motionEstimation(cu, bestPartSize) 10. return calcRDCost (resultsME)
Figure 7. Fast Inter PU decision algorithm
660
TABLE I.
The MVM was integrated in the HM 3.4 test model. All rate-distortion evaluations were performed under the following settings: (a) low complexity configuration, (b) IPPP prediction structure, (c) search window dimension [±64] and (d) two search algorithms (Full Search and TZ Search). Fig. 8 presented the HM test model execution time reduction comparison between the full RDO based decision available in the HM test model and the proposed MVM fast decision method. The results show that the proposed MVM fast inter PU decision method accelerates the encoding time by 35%. This optimization can be even higher when more complexity heuristics are inserted in the inter-prediction step. These results are expected, since most part of the encoding effort is spent on the inter-prediction, more specifically in the ME step, and these are the modules that the proposed algorithm targets to optimize. Besides, this encoding time reduction is achieved with minimum rate-distortion loses. Fig. 9 presents the sets of rate-distortion evaluations for the two video sequences. The QP parameters were assigned to 22, 27, 32 and 37, as specified in the common test conditions document. For all cases, the PNSR drop and the bitrate increase are negligible. The average results when using the Full Search algorithm show a PSNR drop of only 0.08 dB with a bitrate increase of 1.37% on average. When the TZ Search is adopted, similar behavior is detected, by increasing only in 1.91% the bitrate and with 0.06 dB of PSNR drop. The rate-distortion results show that the proposed MVM fast decision method is highly efficient when compared with the full RDO implementation. For all cases, the RD graphic of the MVM is almost equal to the best corner case implementation, as presented in Fig. 9. Due to the fact that the proposed algorithm acts in the PU level, it provides much more implementation flexibility. In other words, it is possible to combine the proposed inter PUlevel decision with any decision that works in the above levels. This flexibility makes the MVM easily applicable in any HEVC encoder solution targeting low complexity. The low complexity that is provided by the use of the MVM is interesting when power-constrained devices are targeted. As in [3], where the complexity is adjusted accordingly with what the target application needs, the proposed PU decision algorithm could be selected when battery life is at a critical level, providing almost the same RD coding efficiency with significant complexity reduction. Tab. 2 presents the comparison with related works. There are no published works that perform fast decision in PU level. All works assume that the full RDO is applied for all PU partition sizes. This way, the comparison will assume the CU-level heuristic algorithms that aggressively avoid some CU depths processing in order to save computation. The work [10] performs several complexity target analyses. The comparison in Tab. 2 considers only the 60% target, which is the nearest possible scenario that can provide a fair comparison.
MVM VS RDO NUMBER OF ME CALLS
MVM Best Worst Case Case 1 2 1 2
Decision Leaf CUs Non-leaf CUs Total
1
2
RDO Best Worst Case Case 3 4 3
4
This process saves ME operations, because when a particular NxN PU is associated with a larger PU size that met the restrictions applied, all other PU size candidates that share that same NxN PU are automatically discarded. The MVM decision has mainly two possible cases. The first case, is when no merging occurs, and NxN is set as best partition for the current CU, without further ME calls. This case is better is terms of complexity, because ME is called one time only, but it’s also bad for bitrate results, since larger partitions result in less information per pixel, such as motion vectors and reference frame indices. The second case is when one larger partition is produced by the MVM decision, requiring one additional ME call in order to produce better RD results. Tab. 1 presents the number of ME calls required by both MVM and RDO decisions. By analyzing the Tab. 1, it is possible to conclude that the MVM decision is faster than the RDO one in every possible case, since the number of ME calls is always lower. Other heuristics with different conditions can be applied in order to achieve better results in terms of complexity or video quality. Heuristics that define more flexible conditions will accept a greater number of merges, which will definitely reduce complexity, but this will most likely bring negative impacts to video quality. The reverse applies for heuristics that establish more strict conditions, which will probably produce better quality results, but lack in terms of complexity decreasing. V.
RESULTS AND DISCUSSIONS
The MVM algorithm was evaluated under some test conditions. The input test sequences were chosen in order to provide the behavioral analysis of the proposed fast decision method in two different video characteristics: high motion and large amount of details is exploited in Cactus, while medium motion and medium amount of details is characteristic of BQTerrace. Proposed Full RDO 0
0.2
0.4
0.6
0.8
1
Execution Time (normalized)
Figure 8. HM test model execution time reduction
661
Cactus
BQTerrace
40
40 38
36
PSNR (dB)
PSNR (dB)
38
34 32
36 34 32
30 HM RDO
28 0
20
40
60 Bitrate (kbps)
80
Proposed 100
RDO Full
30 0
120
10000
20000 Bitrate (kbps)
Proposed 30000
40000
Figure 9. Rate-distortion analyses on the proposed inter PU decision
MVM (FS)
[10] (target 60%)
Leng [11]
PU-level
CU-level
CU-level
As future work, different heuristics will be elaborated intending to find better complexity reduction at the same video quality. In addition, hardware implementations of the MVM will be designed to estimate the resources overhead introduced by this solution in HEVC hardware encoders.
-0.08 dB
-0.01 dB
-0.04 dB
REFERENCES
1.37%
1.26%
-0.01%
-35%
-38%
-44%
TABLE II.
Specs. ¨PSNR (average) ¨Bitrate (average) ¨Executio n Time
RELATED WORKS COMPARISON
[1]
[2]
One of the main important advantages of the MVM proposed method is that its implementation in high efficiency encoders is somewhat easy. Since the NxN partitions are processed, the proposed algorithm just compares all motion vectors among them and, according to a Boolean expression, decides the best inter-prediction partition for the target PU. This way, this simple complexity provides encoding results that are equivalent to the CU-level fast decisions that inserts much more control flow in the encoding process. Besides, MVM PU-level decision for the inter-prediction process can be extended for integration with other decisions that act on different coding structures levels. It is important because the ME is the most expensive step in high efficiency encoders and it is executed many times during the CU treebased RD decision. By optimizing this bottleneck, the proposed decision can achieve an execution time reduction almost equal to the ones achieved by aggressive CU-level decisions. VI.
[3]
[4] [5] [6]
[7]
[8]
[9]
[10]
CONCLUSIONS
This paper proposed the Motion Vectors Merge algorithm, a novel fast heuristic decision for the interprediction PU trees of the emerging HEVC standard. MVM relies on saving the number of ME operations performed in each PU tree by merging NxN PUs into larger ones without applying the RDO approach to make this decision. When PUs are merged through the heuristic decision, ME is applied for the merged PUs in order to achieve better RD costs. Experimental results point an execution time reduction of 34% on average by just acting in the inter-prediction in PU level. This speed-up is achieved with insignificant losses in the rate-distortion encoding results.
[11]
[12]
[13]
662
JVT (T. Wiegand, G. Sullivan, A. Luthra), Draft ITU-T Rec. and final draft int. stand. of joint video spec. (ITU-T Rec.H.264|ISO/IEC 14496-10 AVC), 2003. JCT-VC Group. Available: Last accessed: dec. 2011. ISO/IEC-JTC1/SC29/WG11, "Joint Call for Proposals on Video Compression Technology," ed. Kyoto, Japan: Video Coding Experts Group (VCEG), 39th Meeting, 2010. ISO/IEC-JTC1/SC29/WG11, "HEVC Reference Software Manual," ed. Geneva, Switzerland, 2011. JCT. Working Draft 3 of High-Efficiency Video Coding. JCTVCE603, 2011. G. Sullivan and T. Wiegand, "Rate-Distortion Optimization for Video Compression", IEEE Signal Processing Magazine, v. 15, Nov. 1998, pp.74-90 Z. Liu; L. Shen; Z. Zhang; “An Efficient Intermode Decision Algorithm Based on Motion Homogeneity for H.264/AVC,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 19, pp. 128-132, 2009. J. You, W. Kim, J. Jeong, “16x16 macroblock partition size prediction for H.264 P slices,” IEEE Trans. On Consumer Electronics, vol. 52, pp. 1377-1383, 2006. H. Zeng; C. Cai; K.-K. Ma, “Fast Mode Decision for H.264/AVC Based on Macroblock Motion Activity,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 19, pp. 491-499, 2009. G. Correa. et al, “Complexity control of high efficiency video encoders for power-constrained devices,” IEEE Transactions on Consumer Electronics, v.57, n.4, pp.1866-1874, November 2011 Leng, Jie; Sun, Lei; Ikenaga, T.; Sakaida, S.. "Content Based Hierarchical Fast Coding Unit Decision Algorithm for HEVC," CMSP, v. 1, no., pp.56-59, 14-15 May 2011. T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE TCSVT, vol. 13, pp. 560-576, 2003. ISO/IEC-JCT1/SC29/WG11, "Common test conditions and software reference configurations," ed. Daegu, Korea, 2011.