Software Implementation of MPEG2 Decoder on an ASIP JPEG Processor

Naser Mohammadzadeh, Shaahin Hessabi, Maziar Goudarzi
Department of Computer Engineering, Sharif University of Technology, Azadi Ave., Tehran, Iran, PO Box 11365-9517
Tel: +98-21-66164626, Fax: +98-21-66019246
Email: [email protected], {Mohammadzadeh, gudarzi}@mehr.sharif.edu

Abstract- In this paper, we present an MPEG-2 video decoder implemented in our ODYSSEY design methodology. We start with an ASIP tailored to the JPEG decompression algorithm. We extend that ASIP with the required software routines such that the extended ASIP can perform MPEG2 decoding while still benefiting from the hardware units common to JPEG and MPEG2. This demonstrates the ability of our approach to extend an already-manufactured ASIP, tailored to a given application, so that it implements new, yet related, applications. The implementation platform is a Virtex-II Pro FPGA. The hardware part is implemented in VHDL, and the software runs on a PowerPC processor. Experimental results show that our ASIP structure is comparable to other hardware-software implementations, while our approach enables quick and easy extension of an ASIP using our EDA tool-set.

Index Terms- ASIP, ODYSSEY, IDCT, MPEG2.

I. INTRODUCTION

MPEG2 [1] is the dominant digital video standard of today. It is at the heart of the DVB and ATSC digital television systems and of DVD-Video. The widespread adoption of such applications and systems, coupled with considerable investment by broadcasters and distributors, means that MPEG2 will be around for a good while to come, despite the emergence of new, and even better, video compression algorithms. Future consumer digital applications, where enhanced end-product functionality will be found in the convergence of audio, video, and data networking technologies, are certain to need built-in MPEG2 video capability. Due to the ever-increasing complexity of products, designs will require more flexibility and less development time; a programmable MPEG2 decoder therefore reduces design and development time. In the domain of image (de)compression applications, the JPEG [2] algorithm is the basis of many image (de)compression algorithms, including the MPEG2 moving-picture one, so JPEG functions can be reused to implement the MPEG2 decompression algorithm. Our goal is to implement the MPEG2 decoder using our ODYSSEY (Object-oriented Design and sYntheSiS of Embedded sYstems) design methodology [19], which suggests design and reuse of an Application-Specific Instruction-set Processor (ASIP) while the embedded-system design starts


from an Object-Oriented (OO) application. Several implementations of MPEG2 are available in the literature. The majority are commercial, such as those from Fujitsu [3], Conexant [4], ABE [5], rugged.com [6], NEC [7], Optelecom [8], and ORCA [9]. All of them are full-custom ASIC implementations with additional facilities, so we do not consider them further; all of these commercial products are rigid, and lack of programmability is their major weakness. One of the major full-software implementations of MPEG2 has been provided by the MPEG group [10]. This implementation is in the C language and is based on ISO/IEC DIS 13818-2. All software implementations of the MPEG2 decoder differ from our work in some respects; the most important difference is the implementation approach, i.e., our implementation is based on an object-oriented model, while the others are not. Amphion Semiconductor [11] has offered an IP core for the MPEG2 decoder. The Amphion CS6651 intellectual-property core is capable of real-time MPEG-2 video decoding on a single Xilinx Virtex-series FPGA. Amphion cores support industry-standard design flows; the process for integrating the CS6651 and its supporting software into a design is shown in Figure 1. This solution allows decoding of MP@ML (Main Profile at Main Level) MPEG2 video with NTSC or PAL frame rates and resolutions. Their model has been described in C code and is not object-oriented. A similar, but not identical, work has been done by Verderber, Zemva, and Lampret [12]. They propose an optimized real-time MPEG-2 video decoder, implemented in one FPGA device as a HW/SW partitioned system. Based on the achieved results, they decided to implement the IDCT and VLD algorithms in hardware; the remaining parts were realized in software on a 32-bit RISC processor. Their MPEG-2 decoder (RISC processor, IDCT core, VLD core) has been described in Verilog/VHDL and implemented in a Virtex 1600E FPGA. Their work differs from ours in two main respects: the first, and most important, is the goal of the implementation; their goal was an optimized real-time MPEG2 video decoder, whereas our goal is reuse of JPEG functions and adding flexibility to the MPEG2 decoder. The second difference is that our implementation is based on an object-oriented model, while theirs is not.


Figure 1. Design Data Formats Supplied by Amphion [11]

The remainder of this paper is organized as follows. In Section II, we review the ODYSSEY design methodology. In Section III, we briefly introduce the MPEG-2 algorithm. We describe our MPEG2 model in Section IV. Our implementation is described in Section V. Section VI contains the simulation and implementation results. Finally, we conclude in Section VII.

II. ODYSSEY DESIGN METHODOLOGY

Software accounts for 80% of the development cost in today's embedded systems [13], and object-oriented design methodology is a well-established methodology for reuse and complexity management in the software design community. These facts motivated us to advocate and follow top-down design of embedded systems starting from an object-oriented embedded application. The OO methodology is inherently developed to support incremental evolution of applications by adding new features to (or updating) previous ones. Similarly, embedded systems generally follow an incremental-evolution (as opposed to sudden-revolution) paradigm, since this is what customers normally demand. Consequently, we believe that OO methodology is a suitable choice for modeling embedded applications, and hence, we follow this path in ODYSSEY.

The other fundamental choice in ODYSSEY is the implementation style. ODYSSEY advocates a programmable-platform or ASIP-based approach, as opposed to the full-custom or ASIC-based philosophy of design, since the design and manufacturing of full-custom chips in today's 90nm technologies and beyond are so expensive and risky that increasing the production volume is inevitable to reduce the unit cost. Programmability, and hence programmable platforms, is one way to achieve higher volumes by enabling the same chip to be reused in several related applications, or in different generations of the same product. Moreover, programming in software is generally a much easier task compared to designing and debugging working hardware, and hence, programmable platforms not only reduce design risk, but also result in shorter time-to-market [23].

The ODYSSEY synthesis methodology starts from an object-oriented model and provides algorithms to synthesize the model into an ASIP and the software running on it. The synthesized ASIP corresponds to the class library used in the object-oriented model, and hence, can serve other (and future) applications that use the same class library. This is an important advantage over other ASIP-based synthesis approaches [14], since they merely consider a set of given applications and do not directly involve themselves with future ones. One key point in the ODYSSEY ASIP is the choice of the instruction-set: the methods of the class library used in the embedded application constitute the ASIP instruction-set. The other key point is that each instruction can be dispatched either to a hardware unit (as in any traditional processor) or to a software routine; consequently, an ASIP instruction is the quantum of hardware-software partitioning, and moreover, it follows that the ASIP internals consist of a traditional processor core (to execute software routines) along with a number of hardware units (to implement in-hardware instructions; see Figure 2). An OO application consists of a class library, which defines the types of objects and the operations provided by them, along with some object instantiations and the sequence(s) of method calls among them. We implement the methods of that class library as the ASIP instructions, and realize the object instantiations and the sequence of method calls as the software running on the ASIP. A simple internal architecture for such an ASIP is presented in [19] and summarized in Section II.a.

II.a. ASIP ARCHITECTURE

A simple internal architecture of the OO-ASIP is shown in Figure 2. The OO-ASIP shown there corresponds to a library comprising two classes, A and B, where B is derived from A, has overridden its f() and g() methods, and has introduced an h() method. The following C++-like code excerpt demonstrates this. Note that redefinitions of the same method can reside in different partitions (e.g., A::g() is a software method while B::g() is a hardware one).

class A {
public:
    virtual void f();
    virtual void g();
    // ... other member-functions and attributes
};

class B : public A {
public:
    void f() override;   // A::f() is overridden here
    void g() override;   // A::g() is overridden here
    void h();
    // ... other member-functions and attributes
};
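To make the dispatch idea concrete, the following sketch shows how a method-call instruction could be routed either to a functional unit or to a software routine. This is our own illustration, not the OO-ASIP microarchitecture from [19]; the names MethodId, DispatchEntry, dispatch_table, and dispatch, as well as the class/method IDs, the routine address, and the FU number, are all hypothetical.

#include <cstdint>
#include <map>

enum class Partition { Hardware, Software };

struct MethodId {                 // one OO-ASIP instruction = one class-library method
    uint16_t class_id;            // e.g., A = 0, B = 1 (hypothetical encoding)
    uint16_t method_id;           // e.g., f = 0, g = 1, h = 2
    bool operator<(const MethodId& o) const {
        return class_id != o.class_id ? class_id < o.class_id
                                      : method_id < o.method_id;
    }
};

struct DispatchEntry {
    Partition where;              // hardware FU or software routine
    uint32_t  target;             // FU number, or routine address in local memory
};

// Per Figure 2, A::g() is a software routine while B::g() is a hardware FU.
std::map<MethodId, DispatchEntry> dispatch_table = {
    { {0, 1}, {Partition::Software, 0x1000} },   // A::g() -> routine at 0x1000 (assumed)
    { {1, 1}, {Partition::Hardware, 2} },        // B::g() -> functional unit #2 (assumed)
};

void dispatch(MethodId m) {
    const DispatchEntry& e = dispatch_table.at(m);
    if (e.where == Partition::Hardware) {
        // hand the call to functional unit e.target
    } else {
        // jump to the software routine at address e.target on the processor core
    }
}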



Figure 2. Internal architecture of an OO-ASIP corresponding to a class A with f() and g() methods, and a class B derived from A while redefining f() and g() and introducing an h() method.

All objects' data are stored in a central data memory accessible through an Object Management Unit (OMU). In the application corresponding to Figure 2, three objects are defined: oA1, oA2, and oB1. Objects of the same class (e.g., oA1 and oA2) have the same layout and size in memory for their attributes. Objects of a derived class keep the original layout for their inherited attributes (the white part of the memory portion of oB1) and append their newly introduced attributes to it (the gray part of the oB1 box in Figure 2). The class methods assigned to the hardware partition, i.e., the hardware methods, are implemented as Functional Units (FUs); the other class methods, i.e., the software methods, are software routines stored in the local memory of the traditional processor core (the upper-left box inside the OO-ASIP in Figure 2). Design automation tools have been developed supporting the above-summarized ODYSSEY methodology [15].

III. THE JPEG AND MPEG2 ALGORITHMS

MPEG-1 is a lossy video compression standard which enhances still-picture compression, using the Discrete Cosine Transform (DCT) and run-length coding, with motion compensation. Motion compensation exploits temporal redundancy in the video stream and provides much higher compression ratios. An MPEG stream has a hierarchical image data structure consisting of levels organized in the following manner: video sequence, group of pictures (GOP), picture (frame), slice, macroblock, and block (see Figure 3).

Figure 3. MPEG video stream data structure

One of the important features of MPEG is that there are three types of pictures, which are used for reducing temporal redundancy. In the first picture type, called intra (I), all macroblocks in the picture are encoded without motion compensation. I-pictures are independent of other pictures and thus provide points in the MPEG stream where decoding can start. In the second picture type, called predicted (P), in addition to intra macroblocks, some macroblocks are encoded with motion compensation based on a previous I- or P-picture. In the third picture type, called bidirectionally predicted (B),


there are some macroblocks which are encoded with motion compensation based on either previous or next I- or P-pictures. MPEG-2 is an enhanced version of the MPEG-1 standard. Several kinds of image size, including HDTV, and coding schemes, such as spatially and temporally scalable coding, are integrated and determined by profile and level. Main Profile at Main Level (MP@ML) is the most common profile and level, and can be used for a wide range of applications such as digital video disk (DVD) and Broadcast Satellite Service. Figure 4 shows the simplified block diagram of an MPEG-2 decoder.

Figure 4. Block diagram of an MPEG-2 decoder

The decoder stages are:

* Variable-length decoding: This block decodes the variable-length code, which uses short words for the most frequent values.

* Inverse scan: This stage converts the one-dimensional data, QFS[n], into a two-dimensional array of coefficients denoted by QF[v][u], where u and v both lie in the range 0 to 7. Two scan patterns are defined; the scan that shall be used is determined by alternate_scan, which is encoded in the picture coding extension.


* Inverse quantization: The two-dimensional array of coefficients, QF[v][u], is inverse quantized to produce the reconstructed DCT coefficients. This process is essentially a multiplication by the quantizer step size. The quantizer step size is modified by two mechanisms: a weighting matrix is used to modify the step size within a block, and a scale factor is used so that the step size can be modified at the cost of only a few bits (as compared to encoding an entire new weighting matrix).

* Inverse DCT: Once the DCT coefficients, F[v][u], are reconstructed, the inverse DCT transform shall be applied to obtain the inverse transformed values, f[y][x]. The defining equation for the 8x8 2-D inverse DCT is given by:

$$ s(y,x) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, S(v,u)\, \cos\!\left[\frac{(2x+1)u\pi}{16}\right] \cos\!\left[\frac{(2y+1)v\pi}{16}\right], \qquad 0 \le x, y \le 7 $$

where $C(0) = 1/\sqrt{2}$, $C(k) = 1$ for $1 \le k \le 7$, $s(y,x)$ is a sample value, and $S(v,u)$ is a DCT coefficient. (A direct C++ rendering of this equation is sketched at the end of this section.)

* Motion compensation: The motion compensation process forms predictions from previously decoded pictures, which are combined with the coefficient data (from the output of the IDCT) in order to recover the final decoded samples.

The decoding process input is one or more coded video bitstreams (one for each of the layers), and its output is a series of fields or frames that are normally the input of a display process. Figure 5 shows the basic structure of the JPEG decoder (for more information see [2]).

Figure 5. The basic structure of JPEG decoder

As the figure shows, the structure of MPEG2 is very similar to that of JPEG, so MPEG2 can be implemented by adding the missing methods to the JPEG ones.
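To make the IDCT equation above concrete, the following reference implementation computes the direct-form 8x8 2-D inverse DCT exactly as written. It is our own illustrative sketch for clarity, not the pipelined hardware functional unit used in our design (described in Section V).

#include <cmath>

// Reference (non-optimized) 8x8 2-D inverse DCT, directly following the
// defining equation: s(y,x) = 1/4 * sum_u sum_v C(u) C(v) S(v,u)
//                             * cos((2x+1)u*pi/16) * cos((2y+1)v*pi/16)
void idct8x8(const double S[8][8], double s[8][8]) {
    const double pi = 3.14159265358979323846;
    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            double sum = 0.0;
            for (int v = 0; v < 8; ++v) {
                for (int u = 0; u < 8; ++u) {
                    double Cu = (u == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
                    double Cv = (v == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
                    sum += Cu * Cv * S[v][u]
                         * std::cos((2 * x + 1) * u * pi / 16.0)
                         * std::cos((2 * y + 1) * v * pi / 16.0);
                }
            }
            s[y][x] = sum / 4.0;   // the 1/4 scale factor of the 8x8 IDCT
        }
    }
}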

IV. OUR MPEG2 MODEL

Our purpose is to use the JPEG class library to implement the MPEG2 decoder. The JPEG model is described in C++ based on the ISO/IEC 10918-1 standard (ITU-T Recommendation T.81). The JPEG library has two primary classes, JPEG_CLASS and block_CLASS. JPEG_CLASS has four primary methods: variable-length decoder, inverse scan, inverse quantization, and inverse DCT. block_CLASS has one primary method: inverse DCT. Details of the class definitions and methods are presented in Figure 6. Our MPEG2 model is an object-oriented model described in C++ and based on the ISO/IEC 13818-2 standard. Our class library has two primary classes: the first is the layer class, which includes the layer's functions, and the other is the frame class, which includes the frame's functions. Four frame-class methods are similar to the JPEG-class methods; the remaining method (motion_compensation) must be added to the JPEG OO-ASIP. The layer class has four main methods:

* Video_sequence: decodes each stream of data.
* Get_Hdr: decodes the headers of one input stream.
* Decode_Picture: decodes a picture whose header has already been parsed.
* frame_reorder: reorders the reconstructed frames when the coded order differs from the display order. Frame reordering occurs when B-frames are present in a bitstream; there is no frame reordering when decoding low-delay bitstreams. (A minimal sketch of this reordering policy is given after Figure 6 below.)

// JPEG_Class_Library.h
class block_CLASS {
public:
    short data[64];
    void IDCT();
};

class JPEG_CLASS {
public:
    void VLD();                    // Variable-Length Decoder
    void inverse_scan();
    void inverse_quantization();
    void IDCT();
    // ... other member-functions and attributes
};

Figure 6. The JPEG class library
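As an illustration of the frame_reorder policy, the following minimal sketch (our own; the buffering scheme and names are not from the paper) delays reference frames by one slot and passes B-frames straight through, which converts coded order into display order.

#include <optional>
#include <vector>

enum class PicType { I, P, B };
struct Frame { PicType type; /* pixel data omitted */ };

// Decode-order -> display-order reordering. Reference frames (I/P) are
// held back one slot; B-frames, predicted from references on both sides,
// are emitted immediately.
std::vector<Frame> frame_reorder(const std::vector<Frame>& decoded) {
    std::vector<Frame> display;
    std::optional<Frame> pending_ref;          // last I/P frame awaiting display
    for (const Frame& f : decoded) {
        if (f.type == PicType::B) {
            display.push_back(f);              // B-frames display immediately
        } else {
            if (pending_ref) display.push_back(*pending_ref);
            pending_ref = f;                   // hold the new reference back
        }
    }
    if (pending_ref) display.push_back(*pending_ref);   // flush at stream end
    return display;
}

For example, the decode order I0 P3 B1 B2 P6 B4 B5 comes out as the display order I0 B1 B2 P3 B4 B5 P6.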

The MPEG2 classes are summarized in Figure 7. As this figure shows, in our MPEG2 object-oriented model we create a block_CLASS object and use its IDCT function. Our design flow is summarized in Figure 8: the JPEG class library is used to implement the MPEG2 decoding algorithm. The main objective of this work is to demonstrate function reuse in implementing MPEG2, and the IDCT function is chosen as the example. The correctness of our object-oriented model was verified by comparing its results with those of the MPEG2 group's reference decoder.

#include "JPEG_Class_Library.h"

class layer {
public:
    int video_sequence();
    int Get_Hdr();
    void Decode_Picture();
    void frame_reorder();
};

class frame : public JPEG_CLASS {
public:
    void VLD();                    // JPEG::VLD() is overridden
    void inverse_scan();           // JPEG::inverse_scan() is overridden
    void inverse_quantization();   // JPEG::inverse_quantization() is overridden

    // reuse the IDCT already implemented in the JPEG library
    void IDCT() {
        block_CLASS block;         // block_CLASS is a class of the JPEG class library
        for (int i = 0; i < 64; ++i)
            block.data[i] = data[i];   // 'data' is among the attributes elided in Figure 6
        block.IDCT();              // already implemented in JPEG
    }

    // newly added function
    void motion_compensation();
};

Figure 7. The MPEG2 classes
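To show how these classes cooperate, a hypothetical top-level decode loop (our illustration; not one of the paper's figures) could drive them as follows, with the per-picture steps matching the method descriptions above.

// Hypothetical top-level decode loop; method names come from the layer
// and frame classes of Figure 7.
int main() {
    layer seq;                   // the layer object drives the whole stream
    while (seq.Get_Hdr()) {      // parse the next header from the bitstream
        seq.Decode_Picture();    // per picture: VLD, inverse scan, inverse
                                 // quantization, IDCT (reused from JPEG),
                                 // and motion compensation
        seq.frame_reorder();     // emit frames in display order
    }
    return 0;
}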

V. IMPLEMENTATION

To implement our system, we used the EDK 7.1 [16] embedded-system development tool. EDK is a collection of software tools for designing embedded programmable systems, and supports designs of processor sub-systems using the IBM PowerPC hard processor core and the Xilinx MicroBlaze soft processor core. The heart of our system is a PowerPC processor. The PowerPC is a 32-bit RISC processor with a Harvard microarchitecture and a five-stage pipeline. Its caches are a two-way set-associative 16 KB instruction cache and a 16 KB data cache. In this design, the processor frequency is 300 MHz. The software methods of the MPEG2 decoder have been compiled and linked with the GCC tools and placed in the PPC instruction memory. The JPEG IDCT core has been described in VHDL. We have a network to which the IDCT unit can be attached; the main function of this network is sending/receiving data and parameters to/from functional units. Any FU (hardware-implemented method) can be attached to this network; in this paper, only the IDCT is attached. To communicate with this network, an interface has been created and imported into the system on the Processor Local Bus (PLB). The IDCT functional unit (already implemented in the JPEG ASIP) implements the basic 2-D IDCT structure by employing row-column decomposition of the data: it applies the 1-D IDCT to the data row-wise, and then to the row-IDCT results column-wise. The latency of this functional unit for an 8x8 block is 95 clock cycles. Figure 9 shows the designed system. We used ISE 7.1i [17] for circuit synthesis and implementation. The system is implemented on an AFX FPGA-based prototype board. The heart of the board is the Xilinx Virtex-II Pro XC2VP7 (FG456) FPGA (detailed information about the Virtex-II Pro device can be found in [18]). Several peripheral devices and connectors serve as interfaces from the FPGA to the external world.

VI. SIMULATION AND IMPLEMENTATION RESULTS

Table 1 shows the implementation results of the MPEG2 decoder on the Virtex-II Pro XC2VP7 FG456 FPGA. As a case study, we ran a test.m2v file downloaded from the MPEG group site; its frame size was 128x128. The full-software solution provided by the MPEG2 group was first run on the PowerPC with this test case, and then we ran our system on it.

Figure 9. The structure of our system
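The paper does not document the register map of the IDCT interface on the PLB, so the following sketch only illustrates the general pattern of driving such a memory-mapped functional unit from the PowerPC; the base address, register offsets, and handshake are invented for illustration.

#include <cstdint>

// Hypothetical memory-mapped IDCT FU interface on the PLB. All addresses
// and offsets below are assumptions, not the actual design's register map.
volatile uint32_t* const IDCT_BASE = reinterpret_cast<uint32_t*>(0x80000000); // assumed
const int REG_DATA_IN  = 0;  // write: next input coefficient        (assumed offset)
const int REG_START    = 1;  // write: start the 95-cycle transform  (assumed offset)
const int REG_STATUS   = 2;  // read : busy flag in bit 0            (assumed offset)
const int REG_DATA_OUT = 3;  // read : next output sample            (assumed offset)

void idct_fu_8x8(const short in[64], short out[64]) {
    for (int i = 0; i < 64; ++i)
        IDCT_BASE[REG_DATA_IN] = static_cast<uint32_t>(in[i]);   // load the block
    IDCT_BASE[REG_START] = 1;                                    // kick off the FU
    while (IDCT_BASE[REG_STATUS] & 1)                            // poll until done
        ;
    for (int i = 0; i < 64; ++i)
        out[i] = static_cast<short>(IDCT_BASE[REG_DATA_OUT]);    // read the results
}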


Table 1. Implementation results (only partially legible in the source): 9000 slices, 45 MHz, 33 block RAMs; 6221 slices, 300 MHz, 18 block RAMs; 1948 slices, 200 MHz, 33 block RAMs; the target devices include a Virtex-II 3000, a Virtex 1600E, and the Virtex-II Pro XC2VP7.

We did not expect our results to outperform a full-hardware implementation. Our aim is function reuse in the implementation of MPEG2 in order to make the system flexible for changes and bug fixes. Consequently, although, as expected, the speed of our implementation is lower than that of a full-hardware one, the flexibility it offers allows it to be reused for other related applications; moreover, the time-to-design is significantly reduced, since our approach only requires designing the missing functionality in software and adding it to the JPEG ASIP, while a full-hardware implementation requires a very long time to design and develop working hardware.

VII. CONCLUSIONS

In this paper, we presented the implementation of an MPEG2 decoder in the ODYSSEY design style. A major point of this implementation is that the MPEG implementation takes advantage of the already-implemented JPEG ASIP. In other words, we used the fact that JPEG is the basis of many image (de)compression algorithms, and hence, some methods used in the JPEG implementation (the IDCT operation in our example) can be reused when implementing a new application such as MPEG2. In this work, we extended the JPEG object-oriented class library with the necessary methods, now implemented in software, to implement MPEG2 decoding using the same chip that implements JPEG decompression. This reduces time-to-market when developing new applications that are extensions of previous ones. As the experimental results show, our design performs better, with much lower area, than one commercially available hardware-software product (Amphion's), and takes up much lower area than another hardware-software product (Verderber's), although the latter operates faster than ours. This shows that, although performance has not been the focus of our work, our approach does not incur high overheads and is comparable to, or even outperforms, other similar products. Moreover, our approach decreases design time and increases design flexibility, because the programmability of the ASIP allows features to be added to it. The reduction in performance that arises in some cases is compensated by the increase in flexibility and the effective reduction in time-to-market.

ACKNOWLEDGMENT

This work is supported by a research grant from the Department of High-Tech Industries, Ministry of Industries and Mines of the Islamic Republic of Iran. We would also like to thank M. Modarressi and A.M. Gharehbaghi for their insightful comments.

REFERENCES

[1] ISO/IEC JTC1/SC29/WG11 N0702 Rev., "Information Technology - Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262", Draft International Standard, Paris, 25 March 1994.
[2] "Digital Compression and Coding of Continuous-tone Still Images, Part I: Requirements and Guidelines", ISO/IEC JTC1 Draft International Standard 10918-1, Nov. 1991.
[3] Fujitsu MPEG2 encoder/decoder, July 2005, http://www.fme.fujitsu.com
[4] Conexant MPEG2 encoder/decoder, July 2005, www.conexant.com
[5] ABE MPEG2 encoder/decoder, July 2005, www.abe.it
[6] Real-Time MPEG-1 & MPEG-2 Decoder PMC, July 2005, www.rugged.com
[7] NEC MPEG2 decoder, July 2005, www.nec.com
[8] MPEG2 Video Encoder Card, www.optelecom.com
[9] ORCA video MPEG2 decoder, July 2005, www.doremilabs.com
[10] MPEG Group homepage, http://www.mpeg.org/MPEG
[11] Amphion Semiconductor Ltd., 2002, www.amphion.com
[12] M. Verderber, A. Zemva, D. Lampret, "HW/SW Partitioned Optimization and VLSI-FPGA Implementation of the MPEG-2 Video Decoder", Proc. Design, Automation and Test in Europe Conference (DATE), Munich, Germany, March 3-7, 2003.
[13] International Technology Roadmap for Semiconductors (ITRS), 2003, http://public.itrs.net/
[14] L. Benini, G. De Micheli, "Networks on Chips: A New SoC Paradigm", IEEE Computer, Vol. 35, No. 1, Jan. 2002, pp. 70-78.
[15] M. Goudarzi, S. Hessabi, "The ODYSSEY Tool-set for System-level Synthesis of Object-oriented Models", Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), Springer-Verlag LNCS 3553, July 2005, pp. 394-403.
[16] Embedded Development Kit, July 2005, www.xilinx.com
[17] ISE Foundation, http://www.xilinx.com/products
[18] Virtex-II Pro, http://www.xilinx.com/products/silicon_solutions/fpgas/virtex
[19] M. Goudarzi, S. Hessabi, A. Mycroft, "Object-Oriented Embedded System Development Based on Synthesis and Reuse of OO-ASIPs", Journal of Universal Computer Science, Vol. 10, No. 9, pp. 123-135.
[20] M. Goudarzi, S. Hessabi, "Synthesis of Object-Oriented Descriptions Modeled at Functional-Level", WSEAS Transactions on Computers, Issue 1, Volume 3, pp. 65-74, January 2004.
[21] Min Wu, "Video Coding (Part 2)", http://www.ece.umd.edu/class/enee631/
[22] "A Guide to MPEG Fundamentals and Protocol Analysis", www.tektronix.com/video_audio
[23] K. Keutzer, S. Malik, A.R. Newton, "From ASIC to ASIP: The Next Design Discontinuity", Proc. Int'l Conference on Computer Design (ICCD'02), 2002.

