Combined Frame Memory Architecture for Motion Compensation in Video Decoding

Nelson Yen-Chung Chang and Tian-Sheuan Chang
Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan
Email: {ycchang, tschang}@twins.ee.nctu.edu.tw

Abstract: This paper proposes a combined frame memory architecture which is smaller in size and has the potential to reduce power consumption compared to the most commonly used ping-pong frame memory. The combined frame memory maps both reference frame data and current frame data onto one single frame memory instead of the two used in the ping-pong architecture. Taking advantage of the high percentage of macroblocks (MBs) with zero-valued motion vectors (MVs) and no residual, the combined frame memory architecture is evaluated to reduce not only the memory size but also the average energy consumption and memory access latency for applications such as surveillance, video phone, and video conferencing. According to the statistics and analysis, the memory size of the proposed combined frame memory architecture is only 57% of that of the ping-pong architecture. The proposed architecture can reduce average latency by up to 83% and average power consumption by up to 39% compared to the ping-pong frame memory architecture.
I. INTRODUCTION
Motion compensation plays a major role in video decoding: it reconstructs the current frame from reference frame data, such as a macroblock (MB), predicted by a motion vector. The operation can be regarded as first copying the predicted MB from the reference frame and then adding the residual MB to it to reconstruct the MB in the current frame. This involves an extensive amount of reading from and writing to the frame memory. Consequently, frame memory access dominates the power consumption of a video decoder [1]. In addition, storing the large amount of reference frame data and current frame data requires a frame memory that occupies most of the silicon area in motion compensation. Optimizing the frame memory architecture is therefore of great significance in reducing the cost and power consumption of motion compensation. The most common frame memory architecture is the ping-pong frame memory, which stores the reconstructed current frame and the reference frame in two separate memories. Each
0-7803-8834-8/05/$20.00 ©2005 IEEE.
memory in the ping-pong frame memory has the size of one frame. On completion of each frame’s motion compensation, the roles of the current frame memory and the reference frame memory swap, so the current frame memory of the previous frame becomes the reference frame memory for the current frame. However, the ping-pong frame memory requires a memory size of two frames, which is significant, and accessing such large memories consumes considerable power. To reduce the power consumption due to frame memory access, it is important to reduce both the number of frame memory accesses and the frame memory size. According to statistical analysis of test sequences, some sequences exhibit a high percentage (>70%) of MBs without motion, typically in the still background. Motivated by this characteristic, which is also exploited in [1] to reduce memory accesses in motion estimation, this paper proposes the combined frame memory architecture, which combines the reconstructed current frame memory and the reference frame memory into one single memory. For each MB without motion or residual, no memory access is needed to copy the MB from the reference frame to the current frame, because current frame data and reference frame data reside in the same memory. Thus it is possible to reduce the power consumption due to frame memory access. In addition, the memory size is reduced compared to that of the ping-pong frame memory, so the cost of motion compensation can be lower. For QCIF resolution with a search range of [-16:+15], the total memory size of the combined frame memory architecture is 56.6% of that of the ping-pong frame memory. The rest of this paper is organized as follows. Section II introduces the combined frame memory and evaluates the memory size.
Then Section III presents the statistics demonstrating the characteristics found in various test sequences, and also analyzes the average memory access latency and energy consumption for the combined frame memory. At the end, Section IV gives a brief summary and concludes this work.

[Figure 1. Memory components for QCIF with search range of [-16:+15] in the combined frame memory. Labels: Main Frame Memory (MFM) holding the reconstructed current frame and the reference frame (MB rows 0 to 8), Processing MB, Search Range Strip Buffer (SRSB), Dirty Table (DT).]

[Figure 2. Life time analysis of MBs. Labels: Processing MB n to n+11, Reference MB n, Rec. Curr. MB n, MB Overlapped Life Time (MBOLT n), SRSB must store 12 MBs.]

II. COMBINED FRAME MEMORY
The combined frame memory architecture combines the current frame memory and the reference frame memory. Current frame data and reference frame data are mapped onto one single frame memory with the size of a single frame, unlike the mapping used in the common ping-pong frame memory. The proposed architecture has three major parts: the main frame memory (MFM), the search range strip buffer (SRSB), and the dirty table (DT). These components are illustrated in Figure 1 for QCIF size with a search range of [-16:+15]. The function of each component is as follows.

• Main frame memory (MFM): The main frame memory is where current frame data and reference frame data are stored. Reconstructed current frame data are stored in the upper part of the MFM, whereas reference frame data are stored in the lower part. The MFM is as large as one single frame, i.e. 176x144x1.5 bytes for QCIF.

• Search range strip buffer (SRSB): The search range strip buffer is a rectangular strip of memory which works as an exchange buffer for reference frame data. If a reference MB in the MFM is to be overwritten by a reconstructed current MB, the original reference MB is first copied into the SRSB as a backup for subsequent motion compensation. This prevents the reference frame data from being destroyed by reconstructed current frame data. The size of the SRSB is determined by the height of the search range and the width of a frame, i.e. 16x(176+16)x1.5 bytes for QCIF with the search range of [-16:+15].

• Dirty table (DT): The dirty table records which pixels in the MFM have been updated. If a MB in the MFM is to be updated by the reconstructed current frame, the bits corresponding to the updated pixels in that MB are set. This indicates that the reference pixels of that MB have been stored in the SRSB as a backup, as mentioned above. If subsequent motion compensation requires the reference pixels of this MB, they are fetched from the SRSB instead of the MFM, as indicated by the corresponding dirty bits. The size of the DT follows the size of the SRSB, i.e. 16x(176+16) bits for QCIF with the search range of [-16:+15].
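The component sizes quoted above can be checked with a few lines of arithmetic. The following Python sketch only evaluates the byte counts stated for QCIF with a search range of [-16:+15]; all names are illustrative and not part of the proposed architecture, and the 1.5 bytes-per-pixel factor is assumed to reflect luma plus 4:2:0 subsampled chroma.

```python
# Sizes (in bytes) of the combined frame memory components for QCIF,
# using the expressions quoted in the text. Names are illustrative only.

FRAME_WIDTH = 176      # QCIF luma width in pixels
FRAME_HEIGHT = 144     # QCIF luma height in pixels
STRIP_HEIGHT = 16      # SRSB height in pixels for search range [-16:+15]
STRIP_EXTRA = 16       # extra SRSB width in pixels, as in 16x(176+16)
BYTES_PER_PIXEL = 1.5  # luma plus subsampled chroma (assumed 4:2:0)


def combined_sizes():
    """Return the (MFM, SRSB, DT) sizes in bytes."""
    mfm = FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PIXEL
    strip_pixels = STRIP_HEIGHT * (FRAME_WIDTH + STRIP_EXTRA)
    srsb = strip_pixels * BYTES_PER_PIXEL
    dt = strip_pixels / 8  # 16x(176+16) dirty bits, 8 bits per byte
    return int(mfm), int(srsb), int(dt)


mfm, srsb, dt = combined_sizes()
combined_total = mfm + srsb + dt
ping_pong_total = 2 * mfm  # ping-pong keeps two full frame memories

print(mfm, srsb, dt)                                      # 38016 4608 384
print(combined_total, ping_pong_total)                    # 43008 76032
print(round(100 * combined_total / ping_pong_total, 1))   # 56.6
```

The last line gives the ratio of the combined total to the ping-pong total, which matches the 56.6% quoted in this paper.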
The sharing of one single frame memory is based on a life time analysis of the collocated MBs in the current frame and the reference frame, as illustrated in Figure 2. For each MB, the life times of the current frame data and the reference frame data overlap for part of the processing of one frame. The overlap is determined by the height of the search range: the larger the height, the longer the overlapped life time. This overlapped life time of collocated MBs is referred to as the MB overlapped life time (MBOLT) from here on. The maximum number of MBs whose MBOLTs overlap determines the size of the SRSB and the DT. For QCIF with the search range of [-16:+15], this maximum is 12 MBs. Only one extra SRSB of 12 MBs, a DT of 16x(176+16) bits covering those 12 MBs, and one MFM of QCIF frame size (99 MBs) are needed to store the frame data, instead of two QCIF-size frame memories. The formulas for the memory sizes of the MFM, SRSB, and DT are listed in TABLE I, where the overall memory size is also compared with that of the commonly used ping-pong frame memory. The memory size of the combined frame memory architecture is 56.6% of that of the ping-pong frame memory architecture.

III. STATISTICS AND ANALYSIS
The percentage of MBs with perfect match within a frame determines the power consumption of the frame memory in motion compensation. A MB with perfect match is one which has a zero-valued MV and no residual. Reconstructing such a MB does not require summing the motion compensated (predicted) MB and the residual MB. For instance, a not-coded MB in MPEG-4 [2] has a zero-valued MV and no residual; hence a not-coded MB is a MB with perfect match. With a ping-pong frame memory, the data read from the reference frame memory for a MB with perfect match are identical to the data written to the current frame memory.
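To make the role of perfect-match MBs concrete, the toy model below decodes one QCIF-sized frame twice: once with two separate memories (ping-pong) and once in place in a single memory backed by a 12-MB strip buffer, as in Section II. It is a minimal sketch under strong simplifying assumptions that are not from the paper: each MB is a single integer, motion vectors are whole-MB offsets in {-1, 0, 1}, the dirty table is replaced by an MB-level `owner` tag per buffer slot, and accesses are counted in whole MBs without the 1.5 chroma factor. All names are hypothetical.

```python
import random

MB_COLS, MB_ROWS = 11, 9      # QCIF is 11 x 9 macroblocks
NUM_MBS = MB_COLS * MB_ROWS   # 99 MBs per frame
SRSB_SLOTS = MB_COLS + 1      # 12 MBs, matching the SRSB size in the text


def source_index(n, mv):
    """Index of the reference MB for MB n with a whole-MB motion vector."""
    x, y = n % MB_COLS, n // MB_COLS
    sx = min(max(x + mv[0], 0), MB_COLS - 1)  # clamp to the frame
    sy = min(max(y + mv[1], 0), MB_ROWS - 1)
    return sy * MB_COLS + sx


def decode_ping_pong(ref, mvs, residuals):
    """Two memories: read from ref, write to a separate current frame."""
    return [ref[source_index(n, mvs[n])] + residuals[n] for n in range(NUM_MBS)]


def decode_combined(mfm, mvs, residuals):
    """One memory updated in place; returns the number of MB accesses."""
    srsb = [None] * SRSB_SLOTS    # backed-up reference MBs
    owner = [None] * SRSB_SLOTS   # MB-level stand-in for the dirty table
    accesses = 0
    for n in range(NUM_MBS):
        if mvs[n] == (0, 0) and residuals[n] == 0:
            continue              # perfect match: the MB is already in place
        src = source_index(n, mvs[n])
        slot = src % SRSB_SLOTS
        # Read the predicted MB from SRSB if it was overwritten, else from MFM.
        pred = srsb[slot] if owner[slot] == src else mfm[src]
        # Back up reference MB n, then overwrite it with the reconstructed MB.
        srsb[n % SRSB_SLOTS], owner[n % SRSB_SLOTS] = mfm[n], n
        mfm[n] = pred + residuals[n]
        accesses += 4             # read pred, read + write backup, write MB
    return accesses


random.seed(0)
ref = [random.randrange(256) for _ in range(NUM_MBS)]
mvs, residuals = [], []
for _ in range(NUM_MBS):
    if random.random() < 0.7:     # perfect-match MB
        mvs.append((0, 0))
        residuals.append(0)
    else:                         # MB with motion and/or residual
        mvs.append((random.choice((-1, 0, 1)), random.choice((-1, 0, 1))))
        residuals.append(random.randrange(-8, 9))

expected = decode_ping_pong(ref, mvs, residuals)
mfm = list(ref)
accesses = decode_combined(mfm, mvs, residuals)
print(mfm == expected, accesses, 2 * NUM_MBS)
```

The printed values are whether the in-place result equals the ping-pong result, the number of MB accesses of the combined memory, and the 2 x 99 accesses of ping-pong. In this toy model, 12 slots suffice because a prediction can reach back at most 12 MBs in raster order and is read before the slot is reused.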
However, in the proposed combined frame memory architecture, a MB with perfect match simply stays in the MFM. For a MB without perfect match, the reference MB in the MFM must first be copied into the SRSB as a backup of the reference frame; then the predicted MB is read out from the MFM and summed with the residual to reconstruct the current MB; finally the reconstructed MB is written back to the MFM. Therefore a MB with perfect match requires fewer memory accesses than one without. Since most of the power consumption in motion compensation comes from memory accesses, the higher the percentage of MBs with perfect match, the more likely it is that the combined frame memory architecture reduces power consumption.

The average percentages of MBs with perfect match within one frame when QP=16 are listed in TABLE II. The statistics were gathered by running MPEG-4 VM18 [3] on various test sequences, and results for both QCIF and CIF sequences are listed. The letter in parentheses next to each sequence is its class as classified in [3]. Classes “A” to “C” represent different levels of spatial detail and amount of movement, where class “A” is the lowest and class “C” is the highest. Test sequences with a large portion of still background, such as akiyo, container, mother_daughter, news, and hall, have high average percentages of MBs with perfect match (above 70%). Other test sequences with more motion in the background, such as foreman, stefan, coastguard, and mobile, have less than 30% of MBs with perfect match.

TABLE I. Memory size of components in combined frame memory

Memory | Memory Size Formula (bytes) | Size for QCIF with SR of [-16:+15] (bytes) | Relative to ping-pong
MFM | height_frame x width_frame x 1.5 | 38,016 |
SRSB | floor(height_SR/height_MB) x height_MB x (width_frame + (floor(width_SR/width_MB) x width_MB)) x 1.5 | 4,608 |
DT | floor(height_SR/height_MB) x height_MB x (width_frame + (floor(width_SR/width_MB) x width_MB)) x 0.125 | 384 |
Combined Total | size_of_MFM + size_of_SRSB + size_of_DT | 43,008 | 56.6%
Ping-pong Total | 2 x height_frame x width_frame x 1.5 | 76,032 | 100%

TABLE II. Percentage of MBs with perfect match when QP=16

Test sequence | QCIF (%) | CIF (%)
stefan (C) | 15.71 | 20.90
coastguard (B) | 10.35 | 2.69
foreman (B) | 24.49 | 23.38
mobile | 10.93 | 3.39
container (A) | 91.74 | 88.91
mother_daughter (A) | 81.42 | 77.65
hall (A) | 86.21 | 83.86
news (B) | 82.53 | 83.01
akiyo (A) | 91.32 | 89.09

[Figure 3. Percentage of power reduction for the combined frame memory with k=1, 4, 8 and ping-pong frame memory versus different P0. Horizontal axis: P0 (10% to 100%); vertical axis: percentage of power reduction compared to ping-pong (-100% to 80%).]

The power consumption is evaluated by the total energy consumed in processing a frame. The MFM/SRSB combination could be an external DRAM MFM with an internal SRAM SRSB, or an internal SRAM MFM with an internal SRAM SRSB. Hence the energy consumption is evaluated by modeling the energy of accessing one MB in the MFM and in the SRSB as EMFM and ESRSB, respectively. This assumes that reading from and writing to the same memory consume the same energy. Based on this assumption, the average energy consumptions of processing a QCIF frame
with search range of [-16:+15] are listed in TABLE III. P0 represents the percentage of MBs with perfect match, and k represents the ratio of EMFM to ESRSB. According to the total energy consumption in TABLE III, each k corresponds to a P0 that marks the boundary between reducing energy consumption and not. For example, if k equals 8, P0 must be more than 23% to reduce energy consumption, as illustrated in Figure 3. For the QCIF case with search range of [-16:+15], if on-chip SRAM [4] is used for both MFM and SRSB, the ratio of memory sizes between MFM and SRSB is about 7.6, and power consumption is usually proportional to memory size for on-chip SRAM, so taking k equal to 8 is reasonable. If external memory such as Mobile SRAM [5] is adopted, k might be even larger. Therefore, for k = 8, it is possible to reduce average power consumption by 33.3% ~ 39.1% compared to that of ping-pong frame memory for the QCIF test sequences container, akiyo, news, hall, and mother_daughter. However, for test sequences with low P0, such as foreman, stefan, coastguard, and mobile, the change in average power consumption ranges from a reduction of 1.3% to an increase of 6.7% compared to that of ping-pong frame memory (TABLE IV). This evaluation disregards the impact of memory banking because memory organization is beyond the scope of this work.

As for the memory access latency of the combined frame memory, extra latency is incurred for MBs without perfect match, whereas the memory access latency for MBs with perfect match is eliminated. In the combined frame memory, for each MB without perfect match, the predicted MB is first read from MFM or SRSB; then the content of the current MB, which resides in MFM, is read and written into SRSB as a reference MB backup; finally the reconstructed current MB is written back to MFM. Therefore a memory access latency of 4 MBs is needed for each MB without perfect match. The memory access latencies for both the combined frame memory and the ping-pong frame memory are listed in TABLE III. They are derived under the assumption that reads and writes to either MFM or SRSB all have the same latency, so the memory access latency for one MB is denoted as CMB. A typical scenario for this assumption is an application which adopts internal SRAM for both MFM and SRSB. Since using internal SRAM for both MFM and SRSB might be expensive when the sizes are large, the formulas target QCIF resolution only. According to TABLE III, the memory access latency of the combined frame memory is less than that of the ping-pong frame memory if P0 is more than 50%. As a result, for test sequences with higher P0 (>70%), the memory access latencies in a QCIF frame can be reduced by 62.8% ~ 83.5% compared to that of ping-pong frame memory. However, for test sequences with lower P0, such as foreman, stefan, coastguard, and mobile, the average memory access latencies are increased by 51.0% ~ 79.3%. Even though the average memory access latency for test sequences with lower P0 increases, it may be possible to hide the extra latency behind the computation time of motion compensation. TABLE IV lists the average memory access latencies and energy consumption of different test sequences. The reduced latency and reduced energy percentages are relative to the latency and energy consumption of ping-pong frame memory.

IV. SUMMARY
The statistics and analysis show that the combined frame memory architecture saves power and reduces average memory access latency for applications such as surveillance, video phone, and video conferencing. These applications share the characteristic of a high percentage of MBs with perfect match. For test sequences with a lower percentage of MBs with perfect match, the power consumption showed little or no reduction when k equals 8. However, the ratio of EMFM to ESRSB will be larger when external DRAM is used for the MFM and internal SRAM for the SRSB, which lowers the P0 above which power consumption is reduced. Therefore the proposed combined frame memory not only reduces memory size but also has the potential to save power.

V. REFERENCES

[1] V. G. Moshnyaga, K. Masunaga, and N. Kajiwara, “A data reusing architecture for MPEG video coding,” Proc. Int’l Symp. Circuits and Systems (ISCAS’04), vol. 3, pp. 797-800, May 2004.
[2] ISO/IEC 14496-2, "Information technology - Coding of audio-visual objects," 2nd edition, Switzerland, Dec. 2001.
[3] ISO/IEC JTC1/SC29/WG11 N3908, MPEG-4 Video Verification Model version 18.0, Jan. 2001.
[4] Artisan Components, Inc., “UMC 0.18um Process High-Speed Single Port SRAM Generator User Manual,” Release 4.0, Aug. 2000.
[5] NEC Inc., “16M-bit CMOS Mobile Specified RAM Datasheet,” [online] http://www.necel.com/memory/pdfs/M15085EJ5V0DS00.pdf
TABLE III. Energy consumption of processing one QCIF frame with search range of [-16:+15]

Memory | Average Energy Consumption | Average Memory Access Latency
MFM | 148.5 x (2-P0) x EFM | 148.5 x (1-P0) x 3 x CMB
SRSB | 148.5 x (2-P0) x k^-1 x EFM | 148.5 x (1-P0) x CMB
DT | 99 x k^-1 x EFM x 0.125 (neglected) | 148.5 x 0.125 x CMB (neglected)
Combined Total | 148.5 x (1 + k^-1) x (2-P0) x EFM | 148.5 x (1-P0) x 4 x CMB
Ping-pong Total | 148.5 x 2 x EFM | 148.5 x 2 x CMB

Here k^-1 denotes 1/k.
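The totals in TABLE III are easy to explore numerically. The sketch below transcribes the combined and ping-pong totals into Python and solves for the break-even P0; the function names are illustrative, energies are expressed in units of EFM and latencies in units of CMB, and the neglected DT terms are left out as in the table.

```python
# Per-frame cost model for QCIF, transcribed from the totals in TABLE III.
# Energies are in units of E_FM, latencies in units of C_MB.

MB_UNITS = 148.5  # constant used in TABLE III (numerically 99 MBs x 1.5)


def combined_energy(p0, k):
    """Combined total energy; the DT term is neglected as in the table."""
    return MB_UNITS * (1 + 1 / k) * (2 - p0)


def combined_latency(p0):
    """Combined total latency: 4 MB accesses per MB without perfect match."""
    return MB_UNITS * (1 - p0) * 4


def ping_pong_cost():
    """Ping-pong total, the same number for energy and latency."""
    return MB_UNITS * 2


def energy_break_even(k):
    """P0 above which the combined memory uses less energy than ping-pong.

    Solves (1 + 1/k) * (2 - P0) = 2 for P0.
    """
    return 2 - 2 / (1 + 1 / k)


for k in (1, 4, 8):
    print(k, round(100 * energy_break_even(k), 1))  # 100.0, 40.0, 22.2
print(combined_latency(0.5) == ping_pong_cost())    # True
```

Under this transcription the break-even for k = 8 comes out at about 22.2%, close to the "more than 23%" quoted in the text, and the latency break-even falls at P0 = 50%.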
TABLE IV. Average memory access latencies and energy consumptions for various QCIF test sequences with k=8.

Test sequence (QCIF) | Latency, ping-pong (CMB) | Latency, combined (CMB) | Reduced latency (%) | Energy, ping-pong (EFM) | Energy, combined (EFM) | Reduced energy (%)
stefan (C) | 297 | 500.68 | -68.6 | 297 | 307.88 | -3.7
coastguard (B) | 297 | 532.52 | -79.3 | 297 | 316.83 | -6.7
foreman (B) | 297 | 448.53 | -51.0 | 297 | 293.21 | 1.3
mobile | 297 | 529.08 | -78.1 | 297 | 315.87 | -6.4
container (A) | 297 | 49.06 | 83.5 | 297 | 180.86 | 39.1
mother_daughter (A) | 297 | 110.37 | 62.8 | 297 | 198.10 | 33.3
hall (A) | 297 | 81.91 | 72.4 | 297 | 190.10 | 36.0
news (B) | 297 | 103.77 | 65.1 | 297 | 196.25 | 33.9
akiyo (A) | 297 | 51.56 | 82.6 | 297 | 181.56 | 38.9
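The entries of TABLE IV follow from the QCIF percentages in TABLE II and the totals in TABLE III. As a cross-check, the following sketch recomputes each row for k = 8; the dictionary and function names are illustrative, and the formulas are a direct transcription rather than the authors' own tooling.

```python
# Recompute TABLE IV (k = 8) from the QCIF percentages in TABLE II.

MB_UNITS = 148.5          # constant used in TABLE III
K = 8                     # assumed ratio of MFM to SRSB access energy
PING_PONG = MB_UNITS * 2  # 297 for both latency (C_MB) and energy (E_FM)

# P0 in percent for each QCIF test sequence, copied from TABLE II.
P0_QCIF = {
    "stefan": 15.71,
    "coastguard": 10.35,
    "foreman": 24.49,
    "mobile": 10.93,
    "container": 91.74,
    "mother_daughter": 81.42,
    "hall": 86.21,
    "news": 82.53,
    "akiyo": 91.32,
}


def table_iv_row(p0_percent):
    """Return (latency, reduced latency %, energy, reduced energy %)."""
    p0 = p0_percent / 100
    latency = MB_UNITS * (1 - p0) * 4
    energy = MB_UNITS * (1 + 1 / K) * (2 - p0)
    reduced_latency = 100 * (PING_PONG - latency) / PING_PONG
    reduced_energy = 100 * (PING_PONG - energy) / PING_PONG
    return latency, reduced_latency, energy, reduced_energy


for name, p0 in P0_QCIF.items():
    lat, red_lat, en, red_en = table_iv_row(p0)
    print(f"{name:16s} {lat:7.2f} {red_lat:6.1f} {en:7.2f} {red_en:6.1f}")
```

Negative percentages mean the combined memory is worse than ping-pong for that sequence, which is the case for the low-P0 sequences.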