
A Primer To Video Transcoding: Image Transcoding

Dipan Mehta and U. B. Desai
SPANN Laboratory, Dept. of Electrical Engineering
Indian Institute of Technology, Bombay - India 400076

Abstract

In the video transcoding problem, we convert the video from one bit-rate to another, which can help us design better streaming servers. This is achieved by changing the resolution as a trade-off for the bit-rate. In this work, we solve a part of the problem. The individual frames are represented in terms of 8 × 8 DCT blocks. We develop an algorithm for re-sampling the frame in the DCT domain, such that the reduced resolution video also has 8 × 8 DCT blocks. Our algorithm is for color images.

I. Introduction

Adapting to the vast heterogeneity and non-guaranteed nature of present IP-based networks is going to be the next most important problem for media servers that serve many users with different connection characteristics, e.g., bandwidth and delay. A solution to this problem is to use a transcoder in the server, which keeps the best quality video bit-stream and transcodes it to the desired rate at the time of connection. We are at present concerned with MPEG-1 and MPEG-2 bit-streams, and hence focus on the MPEG-to-MPEG transcoder. Amongst many possibilities, we consider the change of resolution as a criterion of quality which can be traded off for the bit-rate. Our objective is to achieve this in the compressed domain.

In MPEG-1 and MPEG-2 video, as also in JPEG and H.263, a typical frame is represented in terms of the DCT of 8 × 8 pixel blocks. This is known as the block-DCT domain. In the sub-sampling problem, given the image in terms of its 8 × 8 block-DCT coefficients, we wish to obtain another down-sized or up-sized version of this frame, also in terms of 8 × 8 block-DCT. This task is usually more difficult than resampling the DCT of the whole image, because we now need to work with the block-DCT.

The straightforward approach of uncompressing, carrying out the down-sampling in the spatial domain and then re-compressing involves a huge overhead of DCT and inverse DCT computation. Instead, we would like to obtain the sub-sampling directly in the compressed domain itself. We discuss all the aspects of spatial resampling of the signal in the block-DCT domain.

II. Operations in the DCT Domain

As applications emerge for compressed domain operation on images and video, techniques for editing and manipulating in the DCT domain become crucial. Our work is based on some earlier algorithms [2],[4]-[6] on spatial down-sampling by working in the DCT domain. The main difference is that these algorithms are constrained to reduce the size of the image by half. Our algorithm is more general in that it can do both up-sampling and down-sampling, and the ratio of re-sampling can be other than 1/2.

A. Masking and Translation of a Block

Let $x(m,n) \leftrightarrow X(k,l)$ and $x'(m,n) \leftrightarrow X'(k,l)$ be DCT pairs of size $N \times N$. Let $x'$ be obtained by masking and translation of a block within $x$. This is achieved through a matrix transformation [6]

$$x' = h_1 \cdot x \cdot h_2 \tag{1}$$

where

$$h_1 = \begin{bmatrix} 0 & 0 \\ I_w & 0 \end{bmatrix}, \qquad h_2 = \begin{bmatrix} 0 & I_w \\ 0 & 0 \end{bmatrix}$$

are the masking operators and $I_w$ is the identity matrix of size $w \times w$. Fig. 1 shows the equivalent operation in the spatial domain. However, we have $X$ at our disposal and we would like to compute $X'$ directly. Since the DCT is a unitary orthogonal transform, it is distributive over matrix multiplication [1]. Thus,

$$X' = \mathrm{DCT}(h_1) \cdot X \cdot \mathrm{DCT}(h_2) \tag{2}$$

Fig. 1. Masking and Rotation Operation (panels: premultiplication with H1; postmultiplication with H2).
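The masking identity of this subsection can be checked numerically. The following is a minimal sketch, assuming an orthonormal DCT-II written as X = C x C^T; the helper names (`dct_matrix`, `matmul`, `dct2`), the test data and the values N = 8, w = 3 are illustrative choices, not taken from the paper. It applies the masking operators in the spatial domain and in the DCT domain and compares the two results.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that DCT(x) = C x C^T."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dct2(x):
    c = dct_matrix(len(x))
    ct = [list(r) for r in zip(*c)]
    return matmul(matmul(c, x), ct)

N, w = 8, 3  # block size and mask size (arbitrary illustrative values)
x = [[float((3 * m + 5 * n) % 11) for n in range(N)] for m in range(N)]

# h1 carries I_w in its lower-left corner, h2 carries I_w in its upper-right
# corner, so h1 * x * h2 moves the top-left w x w block to the bottom-right.
h1 = [[1.0 if c < w and r == N - w + c else 0.0 for c in range(N)]
      for r in range(N)]
h2 = [[1.0 if r < w and c == N - w + r else 0.0 for c in range(N)]
      for r in range(N)]

x_prime = matmul(matmul(h1, x), h2)                    # spatial domain
direct = dct2(x_prime)                                 # DCT of that result
in_dct = matmul(matmul(dct2(h1), dct2(x)), dct2(h2))   # DCT domain only

max_err = max(abs(direct[k][l] - in_dct[k][l])
              for k in range(N) for l in range(N))
print("max |difference| =", max_err)
```

The agreement relies only on C^T C = I, which is why the same operators can be applied to DCT coefficients without an inverse transform.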

B. The Tiling property

In this section we derive an important property of the DCT. Let $x(n)$ and $X(k)$ be an $N$-point DCT pair in 1-D. Let $x'$ and $X'$ form an $MN$-point DCT pair such that $x'$ is the tiled version of $x$, i.e.

$$x'(n + mN) = \begin{cases} x(n) & \text{for even } m \\ x(N-1-n) & \text{for odd } m \end{cases} \qquad 0 \le m < M,\ 0 \le n < N$$

Here $M$ is an integer which indicates the number of folds of the tiling. We need to construct $X'(l)$ from $X(k)$ directly. It can be shown that, for the unitary DCT,

$$X'(l) = \begin{cases} \sqrt{M}\, X(k) & \text{if } l = M \cdot k \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

Thus, if we insert $M-1$ zeros after each DCT coefficient and apply the appropriate scaling, this corresponds to the tiling of the signal in the spatial domain. Similar results can be shown for the 2-D case.

C. Zero padding of DCT for altering resolution

Let

$$\tilde{x}(m,n) = \begin{cases} x(m,n) & 0 \le m < M,\ 0 \le n < N \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Here $X(k,l)$ and $\tilde{X}(k,l)$ are the DCTs of $x(m,n)$ and $\tilde{x}(m,n)$ respectively. The sequence $x$ is of size $M \times N$. The size of $\tilde{x}$, i.e. $K \times L$, is increased by $I$ times compared to $x$ in both directions. Thus $K = I \cdot M$ and $L = I \cdot N$, where $I$ can be any integer. One can show that, for the unitary DCT,

$$\tilde{X}(I \cdot k, I \cdot l) = \frac{1}{I}\, X(k,l) \tag{5}$$

A similar analysis shows that the converse is true; i.e., if the DCT coefficients are zero padded, then the spatial version of the new sequence is an up-sampled version of the old sequence. Hence, padding zeros in one domain implies increasing the resolution in the other. This property is very important for spatial zooming in the DCT domain without any spatial domain operation.

We can extend the above result to achieve spatial sub-sampling of pixels in the DCT domain. In the above case let $\tilde{x}$ be the down-sampled version of $x$; i.e. $M = I \cdot K$ and $N = I \cdot L$. If loss of information is allowed in the higher frequency region, we approximate $X(k,l)$ with $\hat{X}(k,l)$ as

$$\hat{X}(k,l) = \begin{cases} X(k,l) & \text{for } 0 \le k < K,\ 0 \le l < L \\ 0 & \text{otherwise} \end{cases}$$

Then, using the zero-padding property, we can see that the desired $\tilde{X}(k,l)$ is nothing but the $K \times L$ low frequency coefficients scaled appropriately, i.e.

$$\tilde{X}(k,l) = \frac{1}{I}\, \hat{X}(k,l) = \frac{1}{I}\, X(k,l), \qquad 0 \le k < K,\ 0 \le l < L \tag{6}$$

III. The spatial resampling algorithm for block-DCT

We now develop the algorithm for spatial resampling in the block-DCT domain. Here we are given an image in terms of the DCT of 8 × 8 blocks. We want to construct an image at a different resolution, also in terms of the DCT of 8 × 8 blocks. We require that all the operations be carried out in the DCT domain to avoid computational overhead.

Related work can be found in Bhaskaran's paper [2], which was the earliest approach for down-sampling in the compressed domain. This approach primarily exploits the fact that the DCT, being a linear orthogonal transform, is distributive over matrix multiplication [1]. Here we take four adjacent 8 × 8 blocks in the spatial domain, denoted $\{a_{ij}\}_{11}^{22}$. Then the down-sampled 8 × 8 block $a$ can be written as

$$a = \sum_{i,j=1,2} h_{ij}\, a_{ij}\, g_{ij}$$

where $h_{ij}$ and $g_{ij}$ are the down-sampling filters, usually taken as

$$h_{11} = \begin{bmatrix}
0.5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.5 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.5 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.5 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.5 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0 & 0 & 0 & 0
\end{bmatrix}$$

and similarly for the other $h_{ij}$'s and $g_{ij}$'s. Then from the orthogonality property,

$$\mathrm{DCT}(a) = \sum_{i,j=1,2} \mathrm{DCT}(h_{ij})\, \mathrm{DCT}(a_{ij})\, \mathrm{DCT}(g_{ij}), \quad \text{or} \quad A = \sum_{i,j=1,2} H_{ij}\, A_{ij}\, G_{ij} \tag{7}$$

where $A$, $A_{ij}$, $H_{ij}$ and $G_{ij}$ are the DCTs of $a$, $a_{ij}$, $h_{ij}$ and $g_{ij}$ respectively. Using this, one can merge 4 neighboring 8 × 8 DCT blocks into one 8 × 8 DCT block.

Another approach was developed by Dugad and Ahuja [4][5]. The scheme can be viewed as follows: take the 4-point inverse DCT of the 4 × 4 low-pass coefficients in each of the adjacent 8 × 8 DCT blocks. Concatenate these 4 blocks and take the 8 × 8 point DCT. The resultant block is then $A$. It appears that this algorithm involves both the DCT and the IDCT, and hence computational overhead. Nevertheless, they proposed a way in which both these transformations can be combined into one single transformation, so that the computational complexity is significantly reduced.

A. Our Approach

Both the methods described above suffer fundamentally from the fact that reduction can be only by a factor of $2^k$. In our approach it is possible to reduce the image by an arbitrary factor. To our knowledge, no such scheme exists to achieve this without going to the spatial domain. To achieve this, the proposed scheme is as follows. We have an image frame containing 8 × 8 block DCT. The new image also needs to be in terms of 8 × 8 block DCT. If the input image size is $8m_x \times 8m_y$, then the input image should be reducible to any size $8n_x \times 8n_y$. Thus the sampling ratio is $n_x/m_x$ and $n_y/m_y$ in the x and y directions respectively. In the paper, we represent this as a common ratio $n/m$.

Fig. 2. The sub-sampling algorithm. (a) Sub-sampling in the spatial domain (panel labels: 36 8x8 blocks, IDCT, 288x288 image, subsampling, 128x128 image, DCT, 16 8x8 blocks). (b) Sub-sampling in the block DCT domain (36 8x8 blocks, merging of blocks, 4 24x24 blocks, resizing of blocks, 4 16x16 blocks, splitting of blocks, 16 8x8 blocks).

Fig. 2(a) shows the conventional approach, where the 8 × 8 DCT blocks are converted to the spatial domain by inverse DCT. After that, sub-sampling is performed and finally the frame is converted back to 8 × 8 blocks of DCT. Fig. 2(b) shows our approach. The proposed scheme can be broken into three independent parts.

• Merging of blocks
• Resizing of the block
• Splitting into blocks

In the above three steps, we take $m^2$ neighboring blocks of size 8 × 8 and create $n^2$ blocks of size 8 × 8, to achieve the re-sampling ratio of $n/m$. The following sections describe how each of these three steps is carried out.

B. Merging of Blocks

We take the blocks $\{A_{ij}\}_{11}^{mm}$, the DCTs of spatial blocks $\{a_{ij}\}_{11}^{mm}$ of size 8 × 8, and generate a single block called $B_1$ of size $8m \times 8m$. The relation between the array of $A_{ij}$ blocks and the block $B_1$ is that their inverses generate the same spatial pixels. Thus we are merging $m^2$ neighboring blocks into one single block.

We consider another $m^2$ blocks of size $8m \times 8m$, named $\{\tilde{a}_{ij}\}_{11}^{mm}$, which are the zero padded versions of $\{a_{ij}\}_{11}^{mm}$: each $\tilde{a}_{ij}$ holds $a_{ij}$ at its own position $(i,j)$ and zeros elsewhere. We observe that

$$b_1 = \sum_{i=1}^{m} \sum_{j=1}^{m} \tilde{a}_{ij} \tag{8}$$

where $b_1$ is the IDCT of $B_1$. Since the DCT is a linear transform [1],

$$B_1 = \sum_{i=1}^{m} \sum_{j=1}^{m} \tilde{A}_{ij} \tag{9}$$

Hence, if we have the $\tilde{A}_{ij}$'s we can easily compute $B_1$. We have at our disposal the $A_{ij}$'s, from which we should be able to compute the $\tilde{A}_{ij}$'s. Here we make use of the 'tiling' property and the masking operation of the DCT as discussed above. Since we have the $A_{ij}$'s, we can easily compute the DCT of the 'tiled version' of the $a_{ij}$'s. We denote the tiled $a_{ij}$ as $a'_{ij}$ and its DCT as $A'_{ij}$. The $a'_{ij}$ can be converted to $\tilde{a}_{ij}$ by the matrix transformation

$$\tilde{a}_{ij} = h^{(1)}_{ij} \cdot a'_{ij} \cdot h^{(2)}_{ij} \tag{10}$$

where $h^{(1)}_{ij}$ and $h^{(2)}_{ij}$ are the pre- and post-multiplying masking operators of Sec. II-A, which keep one 8 × 8 tile of $a'_{ij}$ and place it at position $(i,j)$. Thus we can apply the masking operation in the DCT domain to obtain $\tilde{A}_{ij}$ from $A'_{ij}$ as

$$\tilde{A}_{ij} = H^{(1)}_{ij} \cdot A'_{ij} \cdot H^{(2)}_{ij} \tag{11}$$

where $H^{(1)}_{ij} = \mathrm{DCT}(h^{(1)}_{ij})$ and $H^{(2)}_{ij} = \mathrm{DCT}(h^{(2)}_{ij})$. We observe that the size of $\tilde{A}_{ij}$ is $8m \times 8m$. Thus the cost of matrix multiplication increases as the value of $m$ increases. However, we can partition the matrices $H_{ij}$ in such a way that the final values are obtained with much less computation. Also, one can exploit the fact that DCT matrices are sparse by nature.
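The merging step can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an orthonormal DCT-II (X = C x C^T), for which the DCT of the tiled block carries m · A_ij(k, l) at position (mk, ml) and zeros elsewhere, and it assumes that the masking operators keep the top-left (unflipped) tile and translate it to position (i, j). The function names `dct_matrix`, `matmul`, `dct2` and `merge`, the test image and the choice m = 3 are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that DCT(x) = C x C^T."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dct2(x):
    c = dct_matrix(len(x))
    ct = [list(r) for r in zip(*c)]
    return matmul(matmul(c, x), ct)

def merge(A, m, blk=8):
    """Merge an m x m grid of blk x blk DCT blocks into one DCT block B1."""
    size = blk * m
    c = dct_matrix(size)
    ct = [list(r) for r in zip(*c)]
    H1, H2 = [], []
    for i in range(m):
        # h1 moves rows 0..blk-1 down to rows blk*i..blk*i+blk-1;
        # its transpose h2 moves columns in the same way.
        h1 = [[1.0 if t < blk and r == blk * i + t else 0.0
               for t in range(size)] for r in range(size)]
        h2 = [list(r) for r in zip(*h1)]
        H1.append(matmul(matmul(c, h1), ct))
        H2.append(matmul(matmul(c, h2), ct))
    B1 = [[0.0] * size for _ in range(size)]
    for i in range(m):
        for j in range(m):
            # Tiling property in 2-D: zeros between coefficients, scale m.
            At = [[m * A[i][j][k // m][l // m]
                   if k % m == 0 and l % m == 0 else 0.0
                   for l in range(size)] for k in range(size)]
            term = matmul(matmul(H1[i], At), H2[j])   # masked, translated
            for r in range(size):
                for col in range(size):
                    B1[r][col] += term[r][col]        # sum over all blocks
    return B1

m = 3
b1 = [[float((7 * r + 3 * c + r * c) % 13) for c in range(8 * m)]
      for r in range(8 * m)]
A = [[dct2([[b1[8 * i + r][8 * j + c] for c in range(8)] for r in range(8)])
      for j in range(m)] for i in range(m)]

B1 = merge(A, m)
ref = dct2(b1)   # reference: DCT of the concatenated spatial blocks
max_err = max(abs(B1[k][l] - ref[k][l])
              for k in range(8 * m) for l in range(8 * m))
print("max |B1 - DCT(b1)| =", max_err)
```

The sketch forms every product in full for clarity; the partitioning and sparsity savings mentioned above are not attempted here.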

C. Resizing of Common Block

In the second step we resize each of the above blocks by taking the low-pass frequency components and applying appropriate scaling. Thus we convert each $8m \times 8m$ sized $B_1$ block into an $8n \times 8n$ sized block $C_1$. Following (6), the scale factor for the unitary DCT is $n/m$. We note here that if $m > n$, this will result in sub-sampling of the image. If $n > m$ the image will be up-sampled. In the first case one needs to discard the high frequency components, whereas in the latter case the blocks should be padded with extra zeros in the high frequency region to change the size. In either case, the IDCT of $C_1$ is a re-sampled version of the IDCT of $B_1$. The way to achieve this follows directly from the results mentioned in Sec. II-C. In the case of down-sampling, as only low frequency components are needed, the merging operation also needs to be done only partially, further reducing the computational requirement.

D. Splitting the Block

In the final step we need to arrange the image in terms of 8 × 8 DCT blocks. Thus we need to split each block $C_1$ of size $8n \times 8n$ into $n^2$ blocks of size 8 × 8, denoted as $\{D_{ij}\}_{11}^{nn}$. The operation of splitting is the reverse of the first step, i.e. merging of the blocks. We follow the same conventions as in Sec. III-B.

In the spatial domain, block $c_1$ is nothing but the concatenation of the blocks $\{d_{ij}\}_{11}^{nn}$. We consider $n^2$ blocks of size $8n \times 8n$, named $\{\tilde{d}_{ij}\}_{11}^{nn}$, which are the zero padded versions of $\{d_{ij}\}_{11}^{nn}$ such that

$$c_1 = \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde{d}_{ij} \tag{12}$$

From the linearity property,

$$C_1 = \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde{D}_{ij} \tag{13}$$

Here, we have to compute $\tilde{D}_{ij}$ from $C_1$. $\tilde{D}_{ij}$ can be obtained by masking the appropriate part of $C_1$ and then rotating it, so that $d_{ij}$ is moved to the top-left corner of the $8n \times 8n$ block. The expression for this is

$$\tilde{D}_{ij} = H^{(1)}_{ij} \cdot C_1 \cdot H^{(2)}_{ij} \tag{14}$$

where $H^{(1)}_{ij}$ and $H^{(2)}_{ij}$ are the DCTs of the pre- and post-multiplying masking operators (Sec. II-A). We note here that the $\tilde{D}_{ij}$'s are DCTs of blocks $\tilde{d}_{ij}$ which are zero padded versions of the $d_{ij}$'s. Truncation in the spatial domain can be achieved by sub-sampling the DCT spectrum and applying an appropriate scale factor. Thus, sub-sampling $\tilde{D}_{ij}$ will result in cropping the block in the spatial domain. Hence, for the unitary DCT, we get $D_{ij}$ as

$$D_{ij}(k,l) = n\, \tilde{D}_{ij}(n k, n l) \qquad \text{for } 0 \le k, l < 8 \tag{15}$$

The cost of matrix multiplication grows as $n$ increases. However, the cost of matrix multiplication can be reduced in the same way as in the case of the merging algorithm. We observe that the computation cost increases with $m + n$. Thus an odd ratio of 21/23 (≈ 0.91) would require much more computation than 9/10 (= 0.9).

E. Computational Complexity

The purpose of conducting the operations solely in the block-DCT domain is to reduce the complexity of computation. Bhaskaran's paper [2] shows how to partition the above-stated matrix, which reduces the computational cost. The comparison shows that the computational cost is reduced to about 50%, as compared to the conventional approach, i.e. converting to the spatial domain, sub-sampling, and then taking the DCT again. We note here that, typically, the cost of sub-sampling in the spatial domain is ignored since the sub-sampling ratio is 1/2. However, if the sub-sampling ratio is awkward, the cost of sub-sampling in the spatial domain will increase in addition to the computation required to perform the DCT and IDCT.

IV. Results

We now show the results of the re-sampling algorithm. We have here an image, shown in Fig. 3(a), which is of standard size for the frame of the PAL format, namely 640 × 480. We down-sample this image to size 128 × 96, which is the standard size for SQCIF, and then up-sample the image back to its original size. The ratio of down-sampling is 1/5. For reference we have done this in two ways. In the first case we use conventional methods: we apply the sub-sampling algorithm in the spatial domain (which is B-spline here) and up-sample with the same algorithm. The result of this process is in Fig. 3(b). In the second case, we have done the same using our algorithm, i.e. we have the image in the block DCT domain, we first sub-sample the image by that ratio and then up-sample it back to the same size. The final result is shown in the spatial domain in Fig. 3(c). We see that, after going through the sub-sampling and up-sampling process in the spatial domain, the image loses sharpness. The image generated through our approach is perceptually much better. The downside is the visibility of blockiness at some places. For an objective comparison we computed the SNR of both resulting images with reference to the original image. It is 14.6 dB in the case of B-spline, and 19 dB in the case of our algorithm.
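As a rough numerical companion to the resizing and splitting steps of Sec. III, the sketch below resizes one merged DCT block and then splits it into 8 × 8 DCT blocks without leaving the DCT domain. It is an illustration under stated assumptions rather than the authors' code: the DCT is taken to be orthonormal (so resizing scales by n/m and the split uses the factor n with the masked block moved to the top-left corner), and the helper names, the test data and the choice m = 3, n = 2 are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: X = C x C^T and x = C^T X C."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(r) for r in zip(*a)]

def dct2(x):
    c = dct_matrix(len(x))
    return matmul(matmul(c, x), transpose(c))

def idct2(X):
    c = dct_matrix(len(X))
    return matmul(matmul(transpose(c), X), c)

def resize(B1, size):
    """Crop or zero-pad the low-frequency corner of B1 and rescale by n/m."""
    old = len(B1)
    s = size / old
    return [[s * B1[k][l] if k < old and l < old else 0.0
             for l in range(size)] for k in range(size)]

def split(C1, n, blk=8):
    """Split a (blk*n) x (blk*n) DCT block into n*n DCT blocks of size blk."""
    size = blk * n
    c = dct_matrix(size)
    ct = transpose(c)
    H1, H2 = [], []
    for i in range(n):
        # h1 moves rows blk*i.. up to rows 0..blk-1; h2 does so for columns.
        h1 = [[1.0 if r < blk and t == blk * i + r else 0.0
               for t in range(size)] for r in range(size)]
        H1.append(matmul(matmul(c, h1), ct))
        H2.append(matmul(matmul(c, transpose(h1)), ct))
    blocks = {}
    for i in range(n):
        for j in range(n):
            Dt = matmul(matmul(H1[i], C1), H2[j])      # mask and translate
            blocks[i, j] = [[n * Dt[n * k][n * l] for l in range(blk)]
                            for k in range(blk)]       # sub-sample spectrum
    return blocks

m, n = 3, 2
b1 = [[float((5 * r + 2 * c + r * c) % 17) for c in range(8 * m)]
      for r in range(8 * m)]
C1 = resize(dct2(b1), 8 * n)
blocks = split(C1, n)

# Reference: split the resized block in the spatial domain instead.
c1 = idct2(C1)
split_err = 0.0
for (i, j), D in blocks.items():
    ref = dct2([[c1[8 * i + r][8 * j + c] for c in range(8)] for r in range(8)])
    split_err = max(split_err, max(abs(D[k][l] - ref[k][l])
                                   for k in range(8) for l in range(8)))

# A constant image should keep its intensity after resizing.
flat = idct2(resize(dct2([[10.0] * (8 * m) for _ in range(8 * m)]), 8 * n))
flat_err = max(abs(v - 10.0) for row in flat for v in row)
print("split error:", split_err, "flat error:", flat_err)
```

The spatial-domain transforms appear only to build the reference values; `resize` and `split` themselves operate on DCT coefficients alone.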

V. Conclusion

We have outlined and verified a new algorithm which, given an image in terms of 8 × 8 block DCT, will generate another up-sized or down-sized image, also in terms of 8 × 8 block DCT. The algorithm operates directly on DCT coefficients and does not require any overhead of computing the DCT or IDCT. We note that even if it is not mandatory to perform operations in the transformed domain, re-sampling with this approach produced better results in our experiment than spatial domain sub-sampling techniques such as B-spline. Compared to other algorithms which operate in the DCT domain itself, our algorithm differs in that it can produce both up-sampling and down-sampling, and for an arbitrary ratio n/m. This is very crucial for achieving many bit-rate levels for transcoding MPEG video.

Fig. 3. Results. (a) Original image. (b) Image sub-sampled and up-sampled with B-spline in the spatial domain (SNR = 14.6 dB). (c) Image sub-sampled and up-sampled using our approach (SNR = 19 dB).

References

[1] Anil K. Jain, "Fundamentals of Digital Image Processing," Prentice Hall, 1989.
[2] N. Merhav and V. Bhaskaran, "Fast Algorithms for DCT-Domain Image Down-Sampling and for Inverse Motion Compensation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 3, June 1997, pp. 468-476.
[3] Qingwen Hu and Sethuraman Panchanathan, "Image/Video Spatial Scalability in Compressed Domain," IEEE Transactions on Industrial Electronics, vol. 45, no. 1, February 1998, pp. 23-31.
[4] Rakesh Dugad and Narendra Ahuja, "A Fast Scheme for Down-sampling and Upsampling in the DCT Domain," International Conference on Image Processing, October 1999.
[5] Rakesh Dugad and Narendra Ahuja, "A Fast Scheme for Altering Resolution in the Compressed Domain," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 213-218, June 1999.
[6] Shih-Fu Chang and David Messerschmitt, "Manipulation and Compositing of MC-DCT Compressed Video," IEEE Journal on Selected Areas in Communications, vol. 13, no. 1, January 1995, pp. 1-11.
