Minimum Drift Architectures for 3-Layer Scalable DTV Decoding Anthony Vetro, Huifang Sun, Paul DaGraca, Tommy Poon

Abstract | This paper describes new techniques for implementing a low-cost video decoder that can decode an HDTV bitstream and display the signal at lower resolutions. Two key algorithms are discussed: down-conversion and low-resolution motion compensation. In the proposed scheme, incoming blocks are subject to a down-conversion process within the decoding loop; hence, motion compensation is performed from the down-converted images. The filters used for down-conversion are based on the concept of frequency synthesis, and the filters used to perform the low-resolution motion compensation are determined by an optimal least squares solution. Using these algorithms, three different architectures that provide equal quality with varying system level complexity are presented: the first is directly derived from the initial model of a low-resolution decoder, the second attempts to reduce the amount of computation for the IDCT, and the third addresses the amount of interface with an existing decoder structure. All of the above systems are hierarchical with regard to the logic used for filtering and scalable in the amount of memory which is required to reconstruct each output layer. Our simulation results demonstrate that high quality video can be provided at lower resolutions with significant memory savings.

Keywords | Memory efficient decoder, down-conversion, low-resolution motion compensation, minimum drift.

The authors are with the Advanced Television Laboratory, Mitsubishi Electric Information Technology Center America, Inc., New Providence, NJ, 07974, USA. Email: [email protected].

I. Introduction

Digital video broadcasting has had a major impact in both academic and industrial communities. A great deal of effort has been spent to improve the coding efficiency at the transmission side and offer cost-effective implementations in the overall end-to-end system. Along these lines, the notion of format conversion is becoming increasingly popular. On the transmission side, there are a number of different formats which are likely candidates for digital video broadcast. These formats vary in horizontal, vertical, and temporal resolution. Similarly, on the receiving side, there are a variety of display devices which the receiver should account for. In this paper, we are interested in the specific problem of how to receive an HDTV bitstream and display it at a lower spatial resolution.

In the conventional method of obtaining a low-resolution image sequence, the HD bitstream is fully decoded, then it is simply pre-filtered and sub-sampled [1]. The block diagram of this system is shown in Fig. 1(a); it will be referred to as a full-resolution decoder (FRD) with spatial down-conversion. Although the quality is very good, the cost is quite high due to the large memory requirements. As a result, low-resolution decoders (LRDs) have been proposed to reduce some of the costs [2]-[7]. Although the quality of the picture will be compromised, significant reductions in the amount of memory can be realized; the block diagram for this system is shown in Fig. 1(b). Here, incoming blocks are subject to down-conversion filters within the decoding loop. In this way, the down-converted blocks are stored into memory rather than the full-resolution blocks.

Fig. 1. Decoder structures. (a) Block diagram of full-resolution decoder with down-conversion in the spatial domain. The quality of this output will serve as a drift-free reference. (b) Block diagram of low-resolution decoder. Down-conversion is performed within the decoding loop and is a frequency domain process. Motion compensation is performed from a low-resolution reference using motion vectors which were derived from the full-resolution encoder. Motion compensation is a spatial domain process.

To achieve a high quality output with the low-resolution decoder, new algorithms have been developed to perform the down-conversion and the motion compensation (MC). These two processes are of major importance to the decoder as they have significant impact on the final quality. Although a moderate amount of complexity within the decoding loop is added, the reductions in external memory are expected to provide significant cost savings. More importantly, we are able to integrate these additional components into the existing High Level decoder to provide a multitude of lower-resolution display formats at a lower cost. This integrated solution will be referred to as a 3-layer scalable decoder.

As stated above, the filters used to perform the down-conversion are an integral part of the low-resolution decoder. In Fig. 1(b), the down-conversion is shown to take place before the IDCT. Although the filtering is not required to take place in the DCT domain, we do assume that it takes place before the adder. In any case, it is usually more intuitive to derive a down-conversion filter in the frequency domain rather than in the spatial domain; this has been exercised in [2]-[5]. The major drawback of these approaches, however, is that high frequency data is lost or not preserved very well. To overcome this, a method of down-conversion which better preserves high frequency data within the macroblock has been reported in [8],[9]; this method is referred to as frequency synthesis. For completeness, this work is briefly reviewed in the next section.

The main novelty of the proposed system is the filtering which is used to perform the motion compensation from low-resolution anchor frames. In the past, prediction drift has been difficult to avoid. It is partly due to the loss of high frequency data from the down-conversion and partly due to the inability to recover the lost information. Although prediction drift cannot be totally avoided in a low-resolution decoder, our filtering methods can significantly reduce the effects of drift in contrast to simple interpolation methods. The solution is optimal in the least-squares sense and is dependent on the method of down-conversion which is used [10]. In its direct form, the solution cannot be readily applied to a practical decoding scheme. In this paper, it is shown that a realizable implementation can be achieved and that the proposed scheme is not a mere guess based on intuition, but a scheme which results from the optimization procedure for low-resolution MC. To further reduce the complexity, a hierarchical scheme is realized. Not only does this save on computation, but it also allows us to offer a tradeoff between memory savings and quality; hence the notion of a scalable decoder is further justified.

The rest of the paper is organized as follows. In section 2, the concept of frequency synthesis down-conversion is reviewed. In section 3, the optimal low-resolution MC scheme is formulated and the main results are presented. Section 4 provides an overview of the 3-layer scalable decoder and describes how the key algorithms for down-conversion and motion compensation are integrated into a scalable architecture. Finally, some simulation results are presented in section 5, and in section 6, the results and major contributions are summarized.

II. Frequency Synthesis Down-Conversion

The concept of frequency synthesis was first reported in [8] and later expanded upon in [9]. The basic premise is to better preserve the frequency characteristics of a macroblock (MB) in comparison to simpler methods which extract or cut specified frequency components of an 8x8 block. To accomplish this, the four blocks of a MB are subject to a global transformation; this transformation is referred to as frequency synthesis. Essentially, a single frequency domain block can be realized using the information in the entire MB. From this, lower resolution blocks can be achieved by cutting out the low-order frequency components of the synthesized block. This action represents the down-conversion process and is generally represented in the following way:

$$\tilde{A} = XA. \qquad (1)$$

Fig. 2. Concept of frequency synthesis down-conversion. (a) 256-tap filter applied to every frequency component to achieve vertical and horizontal down-conversion by a factor of 2, frame-based filtering; (b) 16-tap filter applied to frequency components in the same row to achieve horizontal down-conversion by 2, picture structure is irrelevant; (c) illustrates that the amount of synthesized frequency components which are retained is arbitrary.
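The horizontal case of Fig. 2(b) can be sketched in one dimension. The code below is a minimal illustration, not the authors' implementation: it builds an 8x16 matrix by synthesizing a 16-point DCT from the DCT coefficients of two adjacent 8-sample segments and keeping the low-order half. The helper names (`dct_matrix`, `synthesis_filter`) and the gain factor `sqrt(keep / 16)` are assumptions chosen here so that the mean level is preserved; the paper's actual coefficients may be scaled differently.

```python
import math

def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix D (coefficients = D * samples)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def synthesis_filter(keep=8):
    """Hypothetical keep x 16 filter: two 8-point DCT segments -> `keep` coefficients."""
    d8t = transpose(dct_matrix(8))          # 8-point IDCT
    d16 = dct_matrix(16)
    # Block-diagonal IDCT: 16 coefficients (left, right segment) -> 16 samples.
    idct_pair = [[0.0] * 16 for _ in range(16)]
    for r in range(8):
        for c in range(8):
            idct_pair[r][c] = d8t[r][c]
            idct_pair[r + 8][c + 8] = d8t[r][c]
    synthesized = matmul(d16, idct_pair)    # 16-point DCT of the joined segments
    gain = math.sqrt(keep / 16.0)           # assumed scaling, keeps the mean level
    return [[gain * v for v in row] for row in synthesized[:keep]]

# Usage: a flat row of value 100 spanning two 8-sample segments.
d8 = dct_matrix(8)
flat = [[100.0] for _ in range(8)]
A = matmul(d8, flat) + matmul(d8, flat)     # stacked DCT coefficients (16 x 1)
X = synthesis_filter()
a_tilde = matmul(transpose(d8), matmul(X, A))  # 8 down-converted spatial samples
print([round(v[0], 6) for v in a_tilde])
```

Each row of `X` has 16 entries, which matches the statement that every output frequency index uses its own 16-tap filter.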

In (1), $A$ denotes the original DCT macroblock, $\tilde{A}$ denotes the down-converted DCT block, and $X$ is a matrix which contains the frequency synthesis coefficients.

The original idea for frequency synthesis down-conversion was to directly extract an 8x8 block from the 16x16 synthesized block in the DCT domain, as shown in Fig. 2(a). The advantage of doing this is that the down-converted DCT block is directly applicable to an 8x8 IDCT (for which fast algorithms exist). The major drawback with regard to computation is that each frequency component in the synthesized block is dependent on all of the frequency components in each of the 8x8 blocks, i.e., each synthesized frequency component is the result of a 256-tap filter. The major drawback with regard to quality is that interlaced video with field-based predictions should not be subject to frame-based filtering [9]. If frame-based filtering is used, it becomes impossible to recover the appropriate field-based data which is required to make field-based predictions. In areas of large motion, severe blocking artifacts will result.

Obviously, the original approach would incur too much computation and quality degradation, so instead, the operations are performed separably and vertical down-conversion is performed on a field basis. In Fig. 2(b), it is shown that a horizontal-only down-conversion can be performed. To perform this operation, a 16-tap filter is ultimately required. In this way, only the relevant row information is applied as the input to the horizontal filtering operation, and the structure of the incoming video has no bearing on the down-conversion process. The reason is that the data in each row of a MB belongs to the same field; hence the format of the output block will be unchanged. It is noteworthy that the set of filter coefficients is dependent on the particular output frequency index. For 1D filtering, this means that the filters used to compute the second output index, for example, are different from those used to compute the fifth output index.

Similar to the horizontal down-conversion, vertical down-conversion can also be applied as a separate process. As reasoned earlier, field-based filtering is necessary for interlaced video with field-based predictions. However, since a macroblock consists of 8 lines for the even field and 8 lines for the odd field, and the vertical block unit is 8, frequency synthesis cannot be applied. Frequency synthesis is a global transformation and is only applicable when one wishes to observe the frequency characteristics over a larger range of data than the basic unit. Therefore, to perform the vertical down-conversion, we can simply cut the low-order frequency components in the vertical direction. This loss that we accept in the vertical direction is justified by the ability to perform accurate low-resolution MC that is free from severe blocking artifacts.

In the above, we have explained how the original idea to extract an 8x8 DCT block is broken down into separable operations. However, since frequency synthesis provides an expression for every frequency component in the new 16x16 block, it makes sense to generalize the down-conversion process so that decimations which are multiples of 1/16 can be performed. In Fig. 2(d), an mxn block is extracted. Although this type of down-conversion filtering may not be appropriate before the IDCT operation and may not be appropriate for a bitstream containing field-based predictions, it may be applicable elsewhere, e.g., as a spatial domain filter somewhere else in the system and/or for progressive material. To obtain a set of spatial domain filters, an appropriate transformation can be applied. In this way, (1) is expressed as:

$$\tilde{a} = xa, \qquad (2)$$

where the lowercase counterparts denote spatial equivalents. The expression which transforms $X$ to $x$ is derived in appendix A.

III. Low-Resolution Motion Compensation

The focus of this section is to provide an expression for the optimal set of low-resolution MC filters given a set of down-conversion filters. The resulting filters are optimal in the least squares sense as they minimize the mean square error between a reference block and a block obtained through low-resolution MC. The results which have been derived in [10] assume that a spatial domain filter, $x$, is applied to incoming macroblocks to achieve the down-conversion.

Fig. 3. Comparison of decoding methods to achieve a low-resolution image sequence. (a) FRD with spatial down-conversion, (b) LRD. The objective is to minimize the MSE between the two outputs by choosing $N_1$, $N_2$, $N_3$, and $N_4$ for a fixed down-conversion.

The scheme shown in Fig. 3(a) illustrates the process by which reference blocks are obtained. First, full-resolution motion compensation is performed on macroblocks $a$, $b$, $c$, and $d$ to yield $h$. To execute this process, the filters $S_a^{(r)}$, $S_b^{(r)}$, $S_c^{(r)}$, and $S_d^{(r)}$ are used. Basically, these filters represent the masking/averaging operations of the motion compensation in a matrix form. More on the composition of these filters can be found in appendix B. Once $h$ is obtained, it is down-converted to $\tilde{h}$ via the spatial filter, $x$:

$$\tilde{h} = xh. \qquad (3)$$

The above block is considered to be the drift-free reference. On the other hand, in the scheme of Fig. 3(b), the blocks $a$, $b$, $c$, and $d$ are first subject to the down-conversion filter, $x$, to yield the down-converted blocks, $\tilde{a}$, $\tilde{b}$, $\tilde{c}$, and $\tilde{d}$, respectively. Using these down-converted blocks as input to the low-resolution motion compensation process, the following expression can be assumed:

$$\hat{\tilde{h}} = \begin{bmatrix} N_1 & N_2 & N_3 & N_4 \end{bmatrix} \begin{bmatrix} \tilde{a} \\ \tilde{b} \\ \tilde{c} \\ \tilde{d} \end{bmatrix}. \qquad (4)$$

In (4), $N_l$, $l \in [1,4]$, are the unknown filters which are assumed to perform the low-resolution motion compensation, and $\hat{\tilde{h}}$ is the low-resolution prediction. As in [10], these filters are solved for by differentiating the following objective function,

$$J\{N_l\} = \|\tilde{h} - \hat{\tilde{h}}\|^2, \qquad (5)$$

with respect to each unknown filter and setting each result equal to zero. It can be verified that the optimal least squares solution for these filters is given by:

$$N_1^{(r)} = x S_a^{(r)} x^+, \quad N_2^{(r)} = x S_b^{(r)} x^+, \quad N_3^{(r)} = x S_c^{(r)} x^+, \quad N_4^{(r)} = x S_d^{(r)} x^+, \qquad (6)$$

where

$$x^+ = x^T (x x^T)^{-1} \qquad (7)$$

is the Moore-Penrose inverse for an mxn matrix with $m \le n$ [12]. In the solution of (6), the superscript $r$ is added to the filters, $N_l$, due to their dependency on the full-resolution motion compensation filters. In using these filters to perform the low-resolution motion compensation, the mean-squared error between $\tilde{h}$ and $\hat{\tilde{h}}$ is minimized. It should be emphasized that eqn. (6) represents a generalized set of MC filters which are applicable to any $x$ which operates on a single macroblock. For the special case of the 4x4 cut, these filters are equivalent to the ones which were determined in [7] to minimize the drift.

In Fig. 4, two equivalent MC schemes are shown. However, for implementation purposes, the optimal MC scheme is realized in a cascade form rather than a direct form. The reason is that the direct form filters are dependent on the matrices which perform full-resolution MC. Although these matrices were very useful in analytically expressing the full-resolution MC process, they require a huge amount of storage due to their dependency on the prediction mode, motion vector, and pel accuracy. Instead, the three linear processes in (6) are separated, so that an up-conversion, full-resolution MC, and down-conversion can be performed. Although one may be able to guess such a scheme, we have proven here that it is an optimal scheme provided the up-conversion filter is the Moore-Penrose inverse of the down-conversion filter.

Fig. 4. Optimal low-resolution MC scheme: direct form versus cascade form. Both forms yield equivalent quality, but vary significantly in the amount of internal memory. The direct form stores the filter coefficients $N_1$, $N_2$, $N_3$, $N_4$ in a large memory; the cascade form applies up-conversion ($x^+$), full-resolution MC ($S_a$, $S_b$, $S_c$, $S_d$), and down-conversion ($x$) using a small memory.

In [10], the optimal MC scheme which employs frequency synthesis has been compared to a non-optimal MC scheme which employs bilinear interpolation, and an optimal MC scheme which employs the 4x4 cut down-conversion. Significant reductions in the amount of drift were realized by both optimal MC schemes over the method which used bilinear interpolation as the method of up-conversion. But more importantly, a 35% reduction in the amount of drift was realized by the optimal MC scheme using frequency synthesis over the optimal MC scheme using the 4x4 cut.
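The equivalence of the direct and cascade forms can be checked numerically on a toy problem. The sketch below is illustrative only and makes several assumptions: it uses a single 4-sample block instead of four macroblocks, a pair-averaging matrix as the down-conversion filter `x` (not the frequency synthesis filter), and a half-pel averaging matrix as the full-resolution MC filter `S`. The helper names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(m):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def right_pinv(x):
    """x+ = x^T (x x^T)^-1, valid when x has full row rank (m <= n)."""
    xt = transpose(x)
    return matmul(xt, inverse(matmul(x, xt)))

# Assumed toy down-conversion: average neighbouring pairs (4 samples -> 2).
x = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
# Assumed toy full-resolution MC: half-pel shift by averaging with the right neighbour.
S = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.0, 0.0, 0.0, 1.0]]

x_plus = right_pinv(x)
N = matmul(matmul(x, S), x_plus)            # direct form: one stored filter per S
a_low = [[10.0], [20.0]]                    # low-resolution anchor data
direct = matmul(N, a_low)
cascade = matmul(x, matmul(S, matmul(x_plus, a_low)))  # up-convert, MC, down-convert
print(direct, cascade)
```

The direct form needs one precomputed `N` for every possible `S`, whereas the cascade form stores only `x` and `x_plus` and reuses the ordinary full-resolution MC, which is the storage argument made above.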

IV. 3-Layer Scalable Decoder

In this section, we show how the key algorithms for down-conversion and motion compensation are integrated into a 3-layer scalable decoder. The central concept of this decoder is that three layers of resolution can be decoded using a decreased amount of memory for the lower resolution layers. Also, regardless of which layer is being decoded, much of the logic can be shared. Three possible decoder configurations are considered: Full-Memory Decoder (FMD), Half-Memory Decoder (HMD), and Quarter-Memory Decoder (QMD). Three possible architectures are discussed that provide equal quality, but vary in system level complexity. The first (ARCH1) is based on the low-resolution decoder modeled in Fig. 1(b), the second (ARCH2) is very similar, but attempts to reduce the IDCT computation, while the third (ARCH3) is concerned with the amount of interface with an existing High Level decoder.

With regard to functionality, all of the architectures share similar characteristics. For one, an efficient implementation is achieved by arranging the logic in a hierarchical manner, i.e., by employing separable processing. In this way, the FMD configuration is the simplest and serves as the logic core on which the other decoder configurations are built. In the HMD configuration, an additional horizontal down-conversion and up-conversion are performed. In the QMD configuration, all of the logic components from the HMD are utilized, such that an additional vertical down-conversion is performed after a horizontal down-conversion, and an additional vertical up-conversion is performed after a horizontal up-conversion. In summary, the logic for the HMD is built on the logic for the FMD, and the logic for the QMD is built on the logic of the HMD. The total system contains a moderate increase in logic, but HD bitstreams may be decoded to a lower resolution with a smaller amount of external memory. By simply removing external memory, lower layers can be achieved at a reduced cost.
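As a rough illustration of how the frame-store size scales across the three configurations, the sketch below computes the bytes needed for one stored frame. The 1920x1080 source size, 8-bit samples, and 4:2:0 chroma format are assumptions made for this example, and the number of frame stores a real decoder keeps is not modeled.

```python
def frame_bytes(width, height, bits_per_sample=8):
    """Bytes for one 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits_per_sample // 8

FULL_W, FULL_H = 1920, 1080  # assumed HD source dimensions

# Stored resolution per configuration: HMD halves the width, QMD halves both.
configs = {
    "FMD": (FULL_W, FULL_H),
    "HMD": (FULL_W // 2, FULL_H),
    "QMD": (FULL_W // 2, FULL_H // 2),
}
sizes = {name: frame_bytes(w, h) for name, (w, h) in configs.items()}
for name, size in sizes.items():
    print(name, size, "bytes per stored frame")
```

Under these assumptions the HMD stores half and the QMD one quarter of the FMD frame size, which is where the configuration names come from.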
The complete block diagram of ARCH1 is shown in Fig. 5(a). The diagram shown here assumes two things: i) the initial system model of a low-resolution decoder from Fig. 1, and ii) that the down-conversions in the incoming branch are performed after the IDCT, to avoid any confusion regarding MB format conversions in the DCT domain. In looking at the resulting system, it is evident that full computation of the IDCT is required, and that two independent down-conversion operations must be performed. The latter is necessary so that low-resolution predictions are added to low-resolution residuals. Overall, the increase in logic for the added feature of memory savings is quite small. ARCH1 is evidently not the most cost-effective implementation, but it represents the foundation of our assumptions and allows us to better analyze the impact of the two modified architectures to follow.

In Fig. 5(b), the block diagram of ARCH2 is shown. In this system, the combined computation for the IDCT and down-conversion is reduced by realizing that the IDCT operation is simply a linear filter by definition. In the FMD, we know that a fast IDCT is applied separately to the rows and columns of an 8x8 block. For the HMD, our goal is to combine the horizontal down-conversion with the horizontal IDCT. In 1D, the horizontal down-conversion can be represented by an 8x16 matrix, and the horizontal IDCT can be represented by an 8x8 matrix. Combining these processes, such that the down-conversion operates on the incoming DCT rows first, results in a combined 8x16 matrix. To complete the transformation, the remaining columns can then be applied to the fast IDCT. In the above description, computational savings are achieved in two places: first, the horizontal IDCT is fully absorbed into the down-conversion computation which must take place anyway, and second, the fast IDCT is utilized for a smaller number of columns. In the case of the QMD, these same principles can be used to combine the vertical down-conversion with the vertical IDCT. In this case, one must be aware of the MB type (field-DCT or frame-DCT) so that an appropriate filter can be applied.

In contrast to the previous two architectures, ARCH3 assumes that the entire front-end processing of the decoder is used; it is shown in Fig. 5(c). In this way, the adder is always a full-resolution adder, whereas in ARCH1 and ARCH2, the adder needs to handle all three layers of resolution. The major benefit of ARCH3 is that it does not require much interface with the existing decoder structure. The memory is really the only place where a new interface needs to be defined. Essentially, down-conversion filtering may be applied before storing the data, and up-conversion filtering may be applied as the data is needed for full-resolution MC.

Fig. 5. Block diagram of various 3-layer scalable decoder architectures; all architectures provide equal quality with varying system complexity. (a) ARCH1, derived directly from block diagram of assumed low-resolution decoder, (b) ARCH2, reduce computation of IDCT by combining down-conversion and IDCT filters, (c) ARCH3, minimize interface with existing HL decoder by moving linear filtering for down-conversion outside of the adder.

V. Experimental Results

In this section, experimental results are provided to evaluate the performance of each decoder configuration: FMD, HMD, and QMD. Two 1920x1080i HDTV sequences, Whale and March, are encoded at 19 Mbps. We note that there are no restrictions on the encoded bitstreams, i.e., all prediction modes and picture types are supported. The objective of this experiment is to provide a 960x540i image sequence. In this way, the FMD must perform a horizontal and vertical decimation by two after decoding, the HMD must perform a vertical decimation by two, and the QMD does not need any additional conversion.

In Fig. 6, the respective PSNRs for the Whale and March sequences are plotted. As expected, the FMD provides the highest quality output for both test sequences. This output serves as our reference as it is the best that can be done (in terms of quality) given the encoded bitstream. With the HMD, we note that a memory savings of 50% causes a small drop in PSNR. This drop is expected; visually, however, there is no significant difference between the two sequences. Image sharpness is maintained at every frame and prediction drift is not observed. For the QMD, a moderate drop in PSNR is observed from the plots; however, the picture quality is surely acceptable. For one, prediction drift is not observed; our optimal MC scheme ensures that it is minimal. Second, the motion compensated predictions provide reasonable quality in areas where one may expect a high loss, i.e., in field-based predictions. This tells us that our down-conversion and up-conversion filters work well to reduce and restore the information that is being sent in the encoded bitstream.

Fig. 6. PSNR comparison for 960x540i output via FMD, HMD, and QMD. (a) Whale sequence, (b) March sequence. Each panel plots PSNR (dB) against frame number for the three configurations.
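For readers who want to reproduce a comparison of this kind, a PSNR helper can be sketched as follows. The paper does not state the exact PSNR definition or which signal each output is measured against, so the 8-bit peak value of 255 and the frame representation (lists of rows) are assumptions of this example.

```python
import math

def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized frames given as lists of rows."""
    total, count = 0.0, 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)

# Usage on a tiny 2x3 frame where every sample is off by one level.
ref = [[100, 110, 120], [130, 140, 150]]
out = [[101, 111, 121], [131, 141, 151]]
print(round(psnr(ref, out), 2), "dB")
```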

DCT and spatial lters, respectively. By de nition, the MxN DCT transform is de ned by: In this paper, we have presented a number of integrated solutions for a scalable decoder. Each decoder is ?1 X ?1 X cabable of decoding directly to a lower resolution using a a(i; j ) (i) (j ); (10) A(k; l) = reduced amount of memory in comparison to the mem=0 =0 ory required by the High Level decoder. To achieve this savings in memory, our earlier work on down-conversion and its inverse, the MxN IDCT by, ltering is relied upon. This method of frequency syn?1 X ?1 X thesis has proven successful in better preserving the high A(k; l) (i) (j ) (11) a ( i; j ) = frequency data within a macroblock. A second key tech=0 =0 nology that we have relied upon is the ltering which is used to perform optimal low-resolution MC. We have where the basis function is given by, shown that a realizable implementation of this algorithm r   can be achiveved, such that the lters for optimal low2 2 i + 1 (12) (i) = M (k) cos 2M k resolution MC are equivalent to an up-conversion, fullresolution MC, and down-conversion. In this scheme, the up-conversion lters are determined by a Moorse- By substituting (8) into the expression for the IDCT Penrose inverse of the down-conversion. The amount of yields, logic required by these processes is kept minimal since ?1 X ?1 they are realized in a hierarchical structure. X a~(i; j ) = (i) (j ) Since the down-conversion and up-conversion processes are linear, the architecture design is exible in " =0 =0 # ?1 X ?1 X that equal quality can be acheieved with varying levX (p; q)A(p; q) els of system complexity. The rst architecture that =0 =0 (13) we examined came from the initial assumptions that ?1 X ?1 X were made on the low-resolution decoder, i.e., a down= A(p; q) conversion is performed before the adder. It was noted " =0 =0 # ?1 X ?1 that a full IDCT compuation was required and that a X down-conversion must be performed in two places. 
As a X (p; q) (i) (j ) : result, a second architecture was presented to reduce the =0 =0 IDCT compuation, and a third was presented to minimize the amount of interface with the existing High Level Substituting the DCT de nition into the above gives the decoder. The major point here is that the advantages of following, # ARCH2 and ARCH3 cannot be realized by a single archi?1 ?1 " ?1 ?1 tecture. The reason is that to IDCT computation is re- a~(i; j ) = X X X X a(s; t) (s) (t) (14) duced by performing a down-conversion in the incoming =0 =0 =0 =0 brach, therefore a down-conversion must be performed ? 1 ? 1 X X  after the full-resolution MC as well. In any case, equal X (p; q)  (i) (j ) : quality is o ered by each architecture and the quality is =0 =0 of commercial grade. Finally, eqn. (9) can be formed with, VI. Concluding Remarks

M

N

i

j

M

N

i

M k

N l

M k

N l

j

M k

M

N

M k

k M

N l

l

N

k;l

p

q

M

N

p

q

M

N

k

M

N l

M k

k;l

l

M

N

N

M p

p

s

q

M

t

N

M k

k;l

k

Appendix A: DCT-to-Spatial Transformation

Our objective in this section is to express the following DCT domain relationship:

A~(k; l) =

?1 NX ?1 X

M

k;l

p=0

as

a~(i; j ) =

[X (p; q)A(p; q)]

q =0

?1 NX ?1 X

M

t=0

N l

l

?1 NX ?1  X

M

i;j

k=0 l=0 M 1N 1

? X ? ? X

M k

(i)

(j )

X (p; q)  k;l

p=0

N l

M p

(s)

N q

# 

(t) ;

q =0

and the transformation is fully de ned. 2 [x (s; t)a(s; t)] i;j

s=0

(8)

x (s; t) =

N q

(9)

Appendix B: Full-Resolution Motion Compensation in Matrix Form

(15)

In 2D, a motion compensated macroblock may have where A~ and a~ are the DCT and spatial output, A and a are the DCT and spatial input, and X and x are the contributions from at most 4 macroblocks per motion

y2 y1

a

To get h2

To get h3

To get h4

: : : :

0a1 1 1 0 0 1

11 00 00 11

+

0 a2 1

+

b1 00 11

1 0 0 1

11 00 00 11

+

a3 1 0 0 1 0 1

+

a 4 00 11

b3 11 00 00 11 00 11

+

Outcome of filtering relevant blocks by M1

a

b

1 0 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0 1 0 1

1111 0000 0000 1111 0000 1111 b 0000 1111 0000 1111 0000 1111 c 0000 1111 0000 1111 0000 1111 d 0000 1111 0000 1111 0000 1111

a3

a4

+ a4

+

Outcome of filtering relevant blocks by M 2

+

1 0 0 1 0 1 0 1 0 1

+

1 0 0 1 0 1 0 1 0 1

+ c2

Outcome of filtering relevant blocks by M 3

6 6 4

0

6

0 0

6 6 4

0 0 0

3

2

S =

Outcome of filtering relevant blocks by M 4

Fig. 7. Relationship between the input and output blocks of the motion compensation process in the FRD.

0

0 0

S (1) = 64 M3 M4 M3 2 0

1

+

0

0

7 S = M02 00 M04 00 75 ; 0 M2 0 3 2 0 (1)

c1

11 00 00 11

+

a4 1 0 0 1 0 1

M1 M2 M3 M4 7 6 0 (1) 3 7 S = 64 0 M01 M01 M M2 5 ; 0 0 M 2 0 31 d

a 2 00 11

3

2

1111 0000 0000 1111 0000 1111 0000 1111 0000 1111 0000 1111 0000 1111 0000 1111 0000 1111 0000 0000 1111 1111 0000 1111

h011 1 0 00 011 1 001 11 000 1 00 11 011 1 001 011 00 0 1 0 1 011 1 0 1 00 11 000 1 0 1 001 11 00 11 0 1 000 00 011 1 011 1 0 1 0 1 0 1 0 1 0 1 0 1

c

To get h1

the motion compensation lters are given by:

b

c

(1)

d

M4

0 0 0 0

0 0 0 0

0 0 0 0

(19)

0 0 77 ; 05 03

0 0 77 : 05 0

In the above equations, the $M_1$, $M_2$, $M_3$, and $M_4$ matrices operate on the relevant 8x8 blocks of a, b, c, and d. Their elements vary according to the amount of overlap, as indicated by $(y_1, y_2)$, and the type of prediction. Prediction may be frame-based or field-based and is performed with half-pel accuracy. As a result, the matrices $S_a^{(r)}$, $S_b^{(r)}$, $S_c^{(r)}$, and $S_d^{(r)}$ are extremely sparse and may only contain non-zero values of 1, 1/2, and 1/4. For different values of $(y_1, y_2)$, the configuration of the above matrices will change: $y_1 \in [0,7]$ and $y_2 \in [8,15]$ implies $r = 2$; $y_1 \in [8,15]$ and $y_2 \in [0,7]$ implies $r = 3$; $y_1, y_2 \in [8,15]$ implies $r = 4$. The resulting matrices can easily be formed using the concepts illustrated in Fig. 7.
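The local reference of (16)-(17), the selection of the case index r, and the r = 1 block layout of (19) can be checked numerically. The Python sketch below is only illustrative and rests on several assumptions that are not in the paper: prediction is full-pel and frame-based, so each of M1 to M4 reduces to a pure shift (the half-pel and field-based filters are omitted); Integer(.) is read as truncation toward zero; motion vectors are in full-pel units; and all helper names (`local_reference`, `select_r`, `shift_filters`, `build_S_r1`, `predict_r1`) are hypothetical. NumPy is assumed to be available.

```python
import numpy as np

B = 8  # 8x8 block size; a macroblock is 2x2 blocks


def local_reference(dx, dy):
    """Local reference (y1, y2) of (16)-(17); full-pel units assumed."""
    def wrap(d):
        q = int(d / 16)  # Integer(.) read as truncation toward zero (assumption)
        # Correction term of (17): 1 only for negative d that is not a multiple of 16.
        g = 1 if (d < 0 and d % 16 != 0) else 0
        return d - 16 * (q - g)
    return wrap(dy), wrap(dx)


def select_r(y1, y2):
    """Case index r in 1..4 from the 8x8 quadrant that holds the origin."""
    return 1 + (y2 >= B) + 2 * (y1 >= B)


def shift_filters(y1, y2):
    """Full-pel M1..M4 (64x64 each) for an origin (y1, y2) with y1, y2 in [0, 7]."""
    top, bottom = np.eye(B, k=y1), np.eye(B, k=y1 - B)   # row selectors
    left, right = np.eye(B, k=y2), np.eye(B, k=y2 - B)   # column selectors
    return (np.kron(top, left), np.kron(top, right),
            np.kron(bottom, left), np.kron(bottom, right))


def build_S_r1(M1, M2, M3, M4):
    """[S_a S_b S_c S_d] for r = 1, following the block layout of (19)."""
    Z = np.zeros_like(M1)
    Sa = np.block([[M1, M2, M3, M4], [Z, M1, Z, M3], [Z, Z, M1, M2], [Z, Z, Z, M1]])
    Sb = np.block([[Z, Z, Z, Z], [M2, Z, M4, Z], [Z, Z, Z, Z], [Z, Z, M2, Z]])
    Sc = np.block([[Z, Z, Z, Z], [Z, Z, Z, Z], [M3, M4, Z, Z], [Z, M3, Z, Z]])
    Sd = np.block([[Z, Z, Z, Z], [Z, Z, Z, Z], [Z, Z, Z, Z], [M4, Z, Z, Z]])
    return np.hstack([Sa, Sb, Sc, Sd])  # 256 x 1024


def mb_to_vec(mb):
    """Raster-scan the four 8x8 sub-blocks of a 16x16 macroblock into a vector."""
    return np.concatenate([mb[i:i + B, j:j + B].ravel()
                           for i in (0, B) for j in (0, B)])


def vec_to_mb(v):
    """Inverse of mb_to_vec."""
    mb = np.empty((2 * B, 2 * B))
    for n, blk in enumerate(v.reshape(4, B, B)):
        i, j = divmod(n, 2)
        mb[i * B:(i + 1) * B, j * B:(j + 1) * B] = blk
    return mb


def predict_r1(area, y1, y2):
    """Apply (18) to a 32x32 reference area holding macroblocks a, b, c, d."""
    a, b = area[:16, :16], area[:16, 16:]
    c, d = area[16:, :16], area[16:, 16:]
    x = np.concatenate([mb_to_vec(m) for m in (a, b, c, d)])
    return vec_to_mb(build_S_r1(*shift_filters(y1, y2)) @ x)


area = np.arange(32 * 32, dtype=float).reshape(32, 32)
y1, y2 = local_reference(dx=5, dy=-13)   # (3, 5), so r = 1
h = predict_r1(area, y1, y2)
print(select_r(y1, y2), np.array_equal(h, area[y1:y1 + 16, y2:y2 + 16]))
```

For full-pel shifts the product simply reproduces the 16x16 window of the reference area whose origin is (y1, y2). That gives a convenient sanity check on the block layout before substituting half-pel or field-based filters, whose entries would be 1/2 and 1/4 rather than 1.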

vector. As noted in Fig. 7, macroblocks a, b, c, and d include four 8x8 blocks each. These sub-blocks are raster-scanned so that each macroblock can be represented as a vector. According to the motion vector, $(d_x, d_y)$, a local reference, $(y_1, y_2)$, is computed to indicate where the origin of the motion compensated block is located; the local reference is determined by:

\[
\begin{aligned}
y_1 &= d_y - 16 \cdot [\mathrm{Integer}(d_y/16) - \gamma(d_y)] \\
y_2 &= d_x - 16 \cdot [\mathrm{Integer}(d_x/16) - \gamma(d_x)]
\end{aligned} \tag{16}
\]

where,

\[
\gamma(d) = \begin{cases} 1, & \text{if } d < 0 \text{ and } d \bmod 16 \neq 0; \\ 0, & \text{otherwise.} \end{cases} \tag{17}
\]

The reference point for this value is the origin of the upper-left-most input macroblock. With this, the motion compensated prediction may be expressed as,

\[
h = \begin{bmatrix} S_a^{(r)} & S_b^{(r)} & S_c^{(r)} & S_d^{(r)} \end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}, \tag{18}
\]

for $r \in [1,4]$. As an example, Fig. 7 considers $y_1, y_2 \in [0,7]$, which implies that $r = 1$; the motion compensation filters for this case are those given in (19).

References

[1] MPEG Test Model 5, ISO/IEC JTC1/SC29/WG11 Document, Apr. 1993.
[2] S. Ng, Thomson Consumer Electronics, "Low resolution HDTV receivers," US Patent 5,262,854, Nov. 16, 1993.
[3] H. Sun, "Hierarchical decoder for MPEG compressed video data," IEEE Trans. on Consumer Electronics, vol. 39, no. 3, pp. 559-562, Aug. 1993.
[4] J. Boyce, J. Henderson, and L. Pearlstein, "An SDTV decoder with HDTV capability: an all-format ATV decoder," SMPTE Fall Conference, New Orleans, 1995.
[5] K.K. Pang, H.G. Lim, S. Dunstan, and J.M. Badcock, "Frequency domain decimation and interpolation techniques," Picture Coding Symposium, Melbourne, Australia, Mar. 1996.
[6] N. Merhav and V. Bhaskaran, "Fast algorithms for DCT-domain image down-sampling and for inverse motion compensation," IEEE Trans. Circuits and Systems for Video Technology, vol. 7, no. 3, pp. 468-476, June 1997.
[7] R. Mokry and D. Anastassiou, "Minimal error drift in frequency scalability for motion-compensated DCT coding," IEEE Trans. Circuits and Systems for Video Technology, vol. 4, no. 4, pp. 392-406, Aug. 1994.
[8] J. Bao, H. Sun, and T. Poon, "HDTV down-conversion decoder," IEEE Trans. on Consumer Electronics, vol. 42, no. 3, pp. 402-410, Aug. 1996.
[9] A. Vetro and H. Sun, "Frequency domain down-conversion using an optimal motion compensation scheme," Journal of Imaging Science and Technology, vol. 9, no. 4, Aug. 1998.
[10] A. Vetro and H. Sun, "On the motion compensation within a down-conversion decoder," Journal of Electronic Imaging, vol. 7, no. 3, July 1998.
[11] K.R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Press, Boston, 1990.
[12] P. Lancaster and M. Tismenetsky, The Theory of Matrices with Applications, Academic Press, Boston, 1985.

Anthony Vetro was born in Staten Island, NY in 1973. He simultaneously received the BS and MS degrees in Electrical Engineering from Polytechnic University, Brooklyn, NY, in June 1996. From 1994 to 1996 he was a Teaching Assistant for the Department of Electrical Engineering at Polytechnic University. Since then he has been a Member of the Technical Staff at the Advanced Television Laboratory of Mitsubishi Electric Information Technology Center America, Inc., New Providence, NJ. His current research interests include digital video coding with emphasis on motion estimation and rate control, compressed-domain processing, motion-based segmentation, and multimedia signal processing.

Huifang Sun received the BS degree in Electrical Engineering from Harbin Engineering Institute, Harbin, China, in 1967, and the PhD degree in Electrical Engineering from the University of Ottawa, Ottawa, Canada, in 1986. From 1982 to 1986, he was with the Electrical Engineering Department at the University of Ottawa. In 1986, he joined Fairleigh Dickinson University, Teaneck, NJ, as an Assistant Professor and was subsequently promoted to Associate Professor in Electrical Engineering. From 1990 to 1995 he was with the Sarnoff Corporation (formerly David Sarnoff Research Center), Princeton, NJ, as a Member of the Technical Staff and was subsequently promoted to Technology Leader of Digital Video Technology, where his activities were MPEG video coding and Grand Alliance HDTV development. He joined the Advanced Television Laboratory, Mitsubishi Electric Information Technology Center America, Inc. in 1995 as a Senior Principal Technical Staff member, where his activity is advanced television development. In 1997, he was promoted to Deputy Director. He holds six U.S. patents, has several pending, and has authored or co-authored more than 60 journal and conference papers.

Paul Da Graca received the BS degree in Electrical Engineering from New Jersey Institute of Technology, Newark, NJ in 1992 and the M.E. degree in Electrical Engineering, specializing in Telecommunications, from Stevens Institute of Technology, Hoboken, NJ in 1997. From 1992 to 1996, he was with GEC Marconi Electronic Systems Corporation, Wayne, NJ as an Intermediate Electrical Engineer. In 1996, he joined the Advanced Television Laboratory of Mitsubishi Electric Information Technology Center America, Inc., as a Principal Technical Staff member, where he was a key member in the development of an MP@HL HDTV Video Decoder IC for ATV receivers. His current research interests in the area of telecommunications include QAM and VSB demodulation techniques, digital signal processing, FPGA, ASIC, hardware and system design.

Tommy Poon received the PhD degree from Columbia University in 1980, and the MS and BS degrees from Rutgers University in 1977 and 1976, respectively. He joined the RCA Laboratory prior to his various assignments at AT&T Bell Laboratories from 1982 to 1995. He was involved in the communications and signal processing fields, with a particular interest in VLSI design and methodology. In February of 1995, Dr. Poon joined Mitsubishi Electric Information Technology Center America, Inc. (ITA) to become the director of its Advanced Television Laboratory. He founded the Digital Broadcasting Business America (DBBA) Group in July 1997. Since then he has served double duty as senior vice president of ITA and DBBA.
