A high-performance architecture of Arithmetic Coder in JPEG2000

Grzegorz Pastuszak
Institute of Radioelectronics, Warsaw University of Technology, Poland

Abstract
This paper presents a high-performance architecture of the arithmetic coder for the embedded block coding algorithm in JPEG2000. A dedicated pipeline architecture, enhanced by the inverse multiple branch selection (IMBS) method, is proposed to code two context-symbol pairs per clock cycle. The overall design was implemented in VHDL and synthesized for FPGA devices. Simulation results show that it can process about 17 million samples per second at a working frequency of 77 MHz.

0-7803-8603-5/04/$20.00 ©2004 IEEE.

1. Introduction
JPEG2000 is the latest image compression standard, and it offers several advantages over its predecessors [1], [2]. On the other hand, its algorithm is several times more compute-intensive than that of JPEG, and a high computation speed is essential in the case of Motion JPEG2000. The bottleneck of a JPEG2000 system is the throughput of the embedded block coding with optimized truncation (EBCOT) module, which performs the entropy coding stage. Within the EBCOT block, the major timing limitation comes from the context-adaptive binary arithmetic coder (CABAC). The main reason for its high computation time lies in control-intensive operations combined with arithmetic carry chains. The bit-plane coder (BPC), which constitutes the other part of the entropy coding module, may also reduce throughput because of bit-level operations and the idle intervals introduced by fractional bit-plane coding in each pass. Various EBCOT architectures have been proposed in the literature [3]-[7]. Most of them focus on optimizing the bit-plane coder and assume that the CABAC can process at most one context-symbol pair per clock cycle.

In this paper, we describe a VLSI implementation of the arithmetic coder that processes two context-symbol pairs per clock cycle. The core adopts a pipelined arrangement optimized to achieve the highest possible clock rate. The Inverse Multiple Branch Selection (IMBS) method is proposed to shorten critical paths at the combinational logic level. A virtual implementation of the CABAC has been carried out by specifying the whole architecture in VHDL and synthesizing it for commercial FPGA technologies. The estimated clock rate is 77 MHz.

The rest of the paper is organized as follows. Section 2 reviews the arithmetic coding algorithm used by EBCOT in JPEG2000. Section 3 discusses the related timing constraints of hardware implementations. The proposed IMBS method is described in Section 4, and the architecture design is presented in Section 5. Performance analysis and comparison are given in Section 6, and Section 7 concludes the paper.

2. Arithmetic coding algorithm
The binary arithmetic coder (CABAC) constitutes the second stage of the EBCOT algorithm. JPEG2000 adopts the MQ coder, whose bit-stuffing technique avoids the implementation difficulties of full carry resolution. The CABAC module reads context-symbol pairs from the bit-plane coder and codes them into sub-bitstreams for each code block. Each context has an associated probability state, which stores the binary value of the most probable symbol (MPS) and an index pointing to the probability estimate of the least probable symbol (LPS). After a symbol has been coded, the corresponding state is updated according to the probability mapping rules. These rules can be expressed as four functions (lookup tables) defined in the JPEG2000 standard, all of which take the index as their argument: the LPS probability estimate (Qe), the next index value after an LPS or an MPS (NLPS, NMPS), and the MPS exchange flag (SWITCH).

The CABAC module contains a set of registers: A, B, C, CT and L. Here, A represents the interval length. For each binary symbol, the interval is subdivided into two subintervals, the bigger one assigned to the MPS and the smaller one to the LPS, and one of them is selected as the new interval length. When an MPS is coded, the lower bound of the interval, held in C, is increased by the LPS probability estimate Qe. The value of A is kept within the decimal range from 0.75 to 1.5. Whenever A falls below 0.75, it is doubled together with the C register, and the down-counter CT is decremented. CT identifies the point (CT = 0) at which the high-order bits of the C register are moved out to the temporary byte buffer B. When this occurs, the previous value of B is released to the output code stream and the byte counter L is incremented. The discrete truncation rates for each coding pass may be estimated from this counter increased by a small number (from 1 to 5) calculated from the internal registers of the CABAC module.
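The register behaviour described above can be modelled in software. The following is a minimal behavioural sketch, assuming the MQ-coder procedures of the JPEG2000 standard (ITU-T T.800, Annex C: CODEMPS, CODELPS, RENORME, BYTEOUT, FLUSH with SETBITS) and its 47-entry state table; the table values were transcribed for illustration and should be checked against the standard before reuse. The class and method names (MQEncoder, encode, flush) are hypothetical, B is modelled as the last element of an output list, and every context starts at index 0 with MPS = 0, whereas the standard initializes a few contexts to other states. It is a software model only, not the hardware architecture proposed in this paper.

```python
# Hypothetical behavioural model of the MQ encoder registers A, B, C, CT and L.
# State table (Qe, NMPS, NLPS, SWITCH) transcribed for illustration; verify it.
QE = [0x5601, 0x3401, 0x1801, 0x0AC1, 0x0521, 0x0221, 0x5601, 0x5401,
      0x4801, 0x3801, 0x3001, 0x2401, 0x1C01, 0x1601, 0x5601, 0x5401,
      0x5101, 0x4801, 0x3801, 0x3401, 0x3001, 0x2801, 0x2401, 0x2201,
      0x1C01, 0x1801, 0x1601, 0x1401, 0x1201, 0x1101, 0x0AC1, 0x09C1,
      0x08A1, 0x0521, 0x0441, 0x02A1, 0x0221, 0x0141, 0x0111, 0x0085,
      0x0049, 0x0025, 0x0015, 0x0009, 0x0005, 0x0001, 0x5601]
NMPS = [1, 2, 3, 4, 5, 38, 7, 8, 9, 10, 11, 12, 13, 29] + list(range(15, 46)) + [45, 46]
NLPS = ([1, 6, 9, 12, 29, 33, 6, 14, 14, 14, 17, 18, 20, 21, 14, 14, 15, 16, 17, 18, 19, 19]
        + list(range(20, 44)) + [46])
SWITCH = [1 if i in (0, 6, 14) else 0 for i in range(47)]


class MQEncoder:
    def __init__(self):
        self.a = 0x8000   # A: interval length, 0x8000 represents 0.75
        self.c = 0        # C: lower bound of the interval
        self.ct = 12      # CT: shifts left before the next byte leaves C
        self.out = [0]    # out[-1] models B; out[0] is a dummy byte never returned
        self.states = {}  # context -> [index, MPS]; all start at index 0, MPS 0

    @property
    def l(self):
        """L: number of bytes already released behind the pending byte B."""
        return max(len(self.out) - 2, 0)

    def encode(self, cx, d):
        """Code one binary symbol d in context cx (CODEMPS / CODELPS)."""
        st = self.states.setdefault(cx, [0, 0])
        idx, mps = st
        qe = QE[idx]
        self.a -= qe
        if d == mps:
            if self.a & 0x8000:       # A still >= 0.75: no renormalization
                self.c += qe
                return
            if self.a < qe:           # conditional exchange: MPS takes Qe
                self.a = qe
            else:
                self.c += qe
            st[0] = NMPS[idx]
        else:
            if self.a < qe:           # conditional exchange: LPS takes A - Qe
                self.c += qe
            else:
                self.a = qe
            if SWITCH[idx]:
                st[1] = 1 - mps
            st[0] = NLPS[idx]
        self._renormalize()

    def _renormalize(self):
        """RENORME: double A and C until A >= 0x8000, releasing bytes at CT = 0."""
        while True:
            self.a <<= 1
            self.c <<= 1
            self.ct -= 1
            if self.ct == 0:
                self._byteout()
            if self.a & 0x8000:
                break

    def _byteout(self):
        """BYTEOUT: after 0xFF the next byte carries 7 new bits (MSB is stuffed)."""
        if self.out[-1] == 0xFF:
            self._emit(20, 0xFFFFF, 7)
        elif self.c < 0x8000000:
            self._emit(19, 0x7FFFF, 8)
        else:                          # carry propagates into the pending byte B
            self.out[-1] += 1
            if self.out[-1] == 0xFF:
                self.c &= 0x7FFFFFF
                self._emit(20, 0xFFFFF, 7)
            else:
                self._emit(19, 0x7FFFF, 8)

    def _emit(self, shift, mask, ct):
        self.out.append((self.c >> shift) & 0xFF)  # old B released, new B starts
        self.c &= mask
        self.ct = ct

    def flush(self):
        """FLUSH (with SETBITS): return the code bytes, dropping a final 0xFF."""
        tempc = self.c + self.a
        self.c |= 0xFFFF
        if self.c >= tempc:
            self.c -= 0x8000
        for _ in range(2):
            self.c <<= self.ct
            self._byteout()
        body = self.out[1:]
        return body[:-1] if body and body[-1] == 0xFF else body
```

For example, calling `encode(0, 0)` many times and then `flush()` yields only a few bytes, because the state index climbs towards Qe = 0x0001 and most MPS symbols then need no renormalization. The conditional exchange inside `encode` is the step whose comparison must follow the subtraction A – Qe, which is the dependency examined in Section 3.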

3. Timing limitations
Provided that the bit-plane coder can generate contexts and symbols continuously, the key limitation of the EBCOT algorithm comes from the arithmetic coder. In a direct-mapping architecture, the most critical path runs through the C register, the associated renormalization circuit, the adder and the Qe lookup table. A pipelined architecture allows these operations to be divided into two or more parts performed in consecutive stages [5]. This technique cannot always be applied, since some branching conditions must test the final result of certain operations. When the interval held in the A register is updated, its new value has to be selected from either Qe or A – Qe: when an LPS is coded, the smaller value is assigned to the A register, and the bigger one otherwise. Therefore, the subtraction must be finished before the comparator can produce its decision, and only then is the selection possible. Likewise, when an MPS is coded, the update of the index in the probability state array depends on the MSB of the subtraction result. The new index determines the next probability estimate for the relevant context and must be available to calculate the interval for the following context-symbol pair. In a pipelined architecture that processes only one symbol per clock cycle, these operations introduce critical-path delays similar to those of the remaining stages. When the CABAC is intended to code two or more symbols per cycle, however, the subtraction A – Qe and its associated logic substantially degrade the working frequency. The main reason is that the A value for the second symbol is known only after the logic for the first symbol has calculated it. Hence, the delay of the most critical path is the sum of the delays contributed by the circuits assigned to each symbol.
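To make this dependency concrete, the following Python sketch models only the A-register update, assuming the 16-bit scaling of the JPEG2000 MQ coder (0x8000 represents 0.75). It is not the IMBS method of Section 4; all function names are hypothetical and the Qe values are a sample taken from the standard's table. It shows two points of the argument above. First, the two decisions that appear to need the finished difference A – Qe (the MSB test and the comparison with Qe) can equivalently be written as A < Qe + 0x8000 and A < 2·Qe, i.e. as comparisons of A against constants that depend only on Qe, so they can be evaluated alongside the subtraction rather than after it. Second, in a two-symbol chain the second update still consumes the first update's output, so the delays accumulate. The Qe of the second symbol is passed in as a known value, which assumes that the first symbol's index update does not affect it (for example, because the two symbols use different contexts).

```python
# Hypothetical illustration of the A-register dependency (not the IMBS method).
# Registers use the MQ coder's 16-bit scaling: 0x8000 represents 0.75.
QE_SAMPLE = [0x5601, 0x3401, 0x1801, 0x0AC1, 0x0521, 0x0221, 0x1101, 0x0001]


def decide_after_subtract(a, qe):
    """Reference order: the comparator waits for the subtractor output A - Qe."""
    diff = a - qe
    return diff < 0x8000, diff < qe      # (MSB of A - Qe is clear, A - Qe < Qe)


def decide_from_constants(a, qe):
    """Same decisions, comparing A with Qe-derived constants instead of A - Qe."""
    return a < qe + 0x8000, a < 2 * qe


def renormalize(a):
    while a < 0x8000:                    # double A until it reaches 0.75 again
        a <<= 1
    return a


def update_a(a, qe, is_mps):
    """New interval length A for one symbol (C and the index are not modelled)."""
    low_msb, exchange = decide_from_constants(a, qe)
    if is_mps:
        if not low_msb:                  # A - Qe >= 0.75: keep it, no renormalization
            return a - qe
        return renormalize(qe if exchange else a - qe)
    return renormalize(a - qe if exchange else qe)


def update_two(a0, qe0, s0, qe1, s1):
    """Two symbols per cycle: the second update consumes the first one's result,
    so in a combinational implementation the two path delays add up."""
    a1 = update_a(a0, qe0, s0)
    return update_a(a1, qe1, s1)
```

For instance, `update_two(0x8000, 0x5601, True, 0x3401, False)` first turns A = 0x8000 into 0xAC02 (an MPS with conditional exchange, followed by one doubling of Qe) and then, for the LPS, selects Qe = 0x3401 and renormalizes it to 0xD004.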

[Figure: index0, index1, Qe0, Qe1, A, 2*Qe]
