Multimed Tools Appl (2007) 35:335–355 DOI 10.1007/s11042-007-0110-2

An efficient implementation of a low-complexity MP3 algorithm with a stream cipher

Chih-Hsu Yen · Yu-Shiang Lin · Bing-Fei Wu

Published online: 6 June 2007 © Springer Science + Business Media, LLC 2007

Abstract For portable devices with an MP3 codec, demand for digital rights management has recently arisen. To provide a secure scheme for most portable devices with an MP3 codec, this work efficiently implements a secure MP3 algorithm on a dual-core system with one DSP and one RISC processor. The secure MP3 algorithm combines a proposed low-complexity MP3 algorithm with a stream cipher. The low-complexity MP3 algorithm runs on the DSP and the stream cipher on the RISC. This separated design allows the type of stream cipher to be updated dynamically for various applications. Moreover, only part of the main data in each MP3 frame, rather than the entire MP3 file, is encrypted. The partially encrypted data have a variable size, determined by the specified security level. The security scheme offers two advantages: first, the encrypting and decrypting structures are identical; second, the scheme easily determines the quality of the encrypted MP3. To save computational power and thus extend the playing time of a portable device, a low-complexity MP3 encoder and decoder are implemented on the ADSP-2181 with 16-bit fixed-point data precision. MP3 encoding requires only 27.2 KB/16.8 KB (data RAM/program RAM), and decoding requires 23.6 KB/20.7 KB. The peak MIPS of the encoder and decoder are 21.05 and 17.67, respectively. This work can be applied to a Digital Rights Management (DRM) system to restrict access to music.

Keywords MP3 · Multimedia security · DRM · DSP · Low complexity

C.-H. Yen (corresponding author) · Y.-S. Lin · B.-F. Wu, Department of Electrical and Control Engineering, National Chiao Tung University, 1001 Ta Hsueh Rd., Hsinchu 300, Taiwan, Republic of China

1 Introduction

The digitization of media has profoundly affected copyright and intellectual property. Online sharing of MPEG Audio Layer III (MP3) files appears to threaten the music industry, and content protection has accordingly become increasingly important in recent years. Access to content can be restricted in several ways, including encryption, watermarking, fingerprinting and access-control mechanisms. Of these, encryption directly limits access to protected music. This work proposes an encryption scheme that embeds a stream cipher into the MP3 algorithm to protect MP3 files. Several studies on MP3 encryption have been published. Torrubia et al. [21] presented perceptual cryptography of MP3 streams based on two primitives: scalefactor encryption and Huffman-codeword substitution. Thorwirth et al. [20] presented a selective encryption algorithm that encrypts the main data of MP3 granules; the encrypted part is determined by mapping the byte index of each Huffman codeword onto exact frequency boundaries. Both schemes can encrypt already encoded MP3 files, but both require extra computation to determine the quality of the encrypted MP3 files accurately. This work presents a simple method for adaptively encrypting the main data in an MP3 frame that yields results similar to those of Torrubia et al. [21] and Thorwirth et al. [20]. The security level can be varied from 0% (lowest security) to 100% (highest security), and any stream cipher can be employed to generate the random bitstream. Encrypting and decrypting only part of each frame accelerates the overall processing. Additionally, the encryption and decryption procedures are identical, so a single security scheme performs both. After encryption, the MP3 frame format remains valid.

Therefore, an MP3 decoder can decompress an encrypted MP3 file directly without decryption, but consumers then hear only low-quality music. Content providers can use this feature to provide free music to consumers. Most playback devices are dual-core systems combining a DSP and a RISC processor, so the proposed approach is implemented and accelerated on such a system: the security phase runs on the RISC and the MP3 phase on the DSP. The security phase parses MP3 frames and performs encryption/decryption with an XOR operation, while the MP3 phase runs the MP3 algorithm itself. The low-complexity MP3 algorithm of [23] is implemented on, and optimized for, the ADSP-2181. The optimization simplifies the nonuniform quantization and dequantization and introduces a dynamic fixed-point data format to improve quality on a 16-bit fixed-point DSP. Hybrid schemes combining a lookup table (LUT) with linear approximation simplify the quantization and dequantization. The quantization is approximated by piecewise linear interpolation; the dequantization is approximated in two steps, piecewise linear interpolation followed by fine approximation. The low-complexity MP3 encoder requires 27.2 KB of data memory, 16.8 KB of program memory and 21.05 MIPS of computation; the decoder requires 23.6 KB of data memory, 20.7 KB of program memory and 17.67 MIPS.

Section 2 describes the proposed secure MP3 scheme and analyzes its security; it also presents the low-complexity MP3 algorithm proposed in [23]. Section 3 presents the realization of the low-complexity MP3 on the ADSP-2181 and compares its performance with other studies. The simulation results are presented and discussed in Section 4. Finally, Section 5 summarizes this work and provides directions for future work.

2 Scheme overview

Figure 1 gives an overview of the proposed scheme, which has two phases: the security phase and the MP3 phase. The security phase parses each MP3 frame into a security part and a normal part, encrypts or decrypts the security part, and joins the normal part to the processed security part to form a valid MP3 frame. The security level s varies from 0 to 100% and is mapped onto the size of the security part. As Fig. 1 indicates, the security phase has three inputs: a key, the security level, and either a normal or an encrypted MP3 frame. The MP3 phase executes the low-complexity MP3 algorithm; it sends ordinary MP3 frames to the security phase and decompresses the decrypted MP3 frames it receives from the security phase. For on-the-fly production, the two-phase scheme is efficient on a dual-core platform, because the two phases can run on two distinct CPUs, executing the low-complexity MP3 algorithm and the secure coding simultaneously. The scheme can also encrypt existing MP3 files directly using only the security phase, which reads the MP3 frames and processes their security parts. This makes the scheme easy to apply in practice.

2.1 The security phase

To encrypt an already encoded MP3 file, Torrubia et al. [21] decompressed the file and performed Huffman encoding with a secure table to substitute codewords; Thorwirth et al. [20] also decompressed the file, determined the exact frequency boundaries of the main data in each MP3 frame, and encrypted the protected part. These tasks are all time-consuming. The proposed scheme uses a simpler encryption procedure that yields similar results. Figure 2 shows how the security phase works.
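As a rough illustration of the per-granule security phase described in this subsection (size the security part from s, XOR the last S bits of the main data with keystream bits, rejoin), the following Python sketch operates on granules represented as lists of bits. It is a minimal sketch under stated assumptions: the function names are hypothetical, SHAKE-128 from Python's `hashlib` merely stands in for a real stream cipher such as SEAL, and one keystream is assumed to be consumed across consecutive granules.

```python
import hashlib
from typing import List, Sequence


def keystream(key: bytes, n_bits: int) -> List[int]:
    """Hypothetical keystream: SHAKE-128 output as a stand-in for a real stream cipher."""
    data = hashlib.shake_128(key).digest((n_bits + 7) // 8)
    return [(data[i // 8] >> (7 - i % 8)) & 1 for i in range(n_bits)]


def security_part_size(main_data_len: int, level_percent: int) -> int:
    """S = s * |main data|_b, with s = level_percent / 100 (two decimal digits)."""
    if not 0 <= level_percent <= 100:
        raise ValueError("security level must be between 0 and 100 %")
    return main_data_len * level_percent // 100


def secure_phase(granules: Sequence[List[int]], key: bytes, level_percent: int) -> List[List[int]]:
    """Encrypt or decrypt (the same operation) the security part of every granule."""
    sizes = [security_part_size(len(g), level_percent) for g in granules]
    ks = keystream(key, sum(sizes))
    out, pos = [], 0
    for bits, size in zip(granules, sizes):
        cut = len(bits) - size
        normal, secure = bits[:cut], bits[cut:]   # security part = last S bits (highest frequencies)
        secure = [b ^ k for b, k in zip(secure, ks[pos:pos + size])]
        pos += size
        out.append(normal + secure)               # rejoin: granule length is unchanged
    return out
```

Because XOR is its own inverse, calling `secure_phase` twice with the same key and level restores the original bits, mirroring the identical encryption and decryption structures of the scheme. A practical deployment would also need a per-file nonce or IV, since reusing one keystream for different files reveals the XOR of their protected bits.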
Each MP3 frame contains two granules per channel, that is, two granules for a single-channel frame and four for a two-channel frame. The security phase processes every granule sequentially in the same way.

Fig. 1 Block diagram of the coding flow: raw data enter the MP3 phase, whose MP3 streams pass to the secure (security) phase; the secure phase also takes the key and the security level (0–100%) as inputs and outputs encrypted MP3 streams.

Fig. 2 Detail of the processing flow of the security phase, illustrated for a security level of 30%: the inputs are an encrypted or normal MP3 frame, the key and the security level; the main data are separated from the other data, the secure part (30% of the main data) is extracted from the normal part and combined with a random sequence produced by a stream cipher, and the output is a normal or encrypted MP3 frame.

The flow of the security phase is as follows:

1. Determine the size of the security part from the given security level s, where $0 \le s \le 1$ and s has at most two digits after the decimal point. The security level is the same for every MP3 granule, but the size of the security part depends on each granule's main data: $S = s \times |\text{main data}|_b$, where $|\cdot|_b$ denotes the length in bits.

2. Obtain a sequence of S successive bits from the output of the stream cipher. Any secure stream cipher can be used; it can generate a random sequence of the appropriate length in advance while the MP3 phase performs compression, which accelerates the entire process.
3. XOR the security part with the sequence.
4. Join the security part to the normal part to form a valid MP3 granule.

Figure 3 depicts the MP3 frame format. The header field contains information such as the sampling frequency, bit rate and audio mode. The CRC is used to detect errors in the header and side-information fields. All parameters related to decoding are in the side-information field, and the main-data field contains the compressed audio data. The security part consists of the S successive bits extracted backward from the last bit (the highest frequency) of the main data. The length of the main data is not fixed, but it can be determined from the header and side information of each frame. Lower-frequency data are generally more important than higher-frequency data, so the security part becomes more important as the security level s increases. Additionally, an encrypted MP3 file has the same size as its unencrypted version, because encryption operates on the final result of MP3 encoding for each granule and the XOR operation preserves the length.

Fig. 3 File format of MP3 [7]: header (32 bits), CRC (0 or 16 bits), side information (136 or 256 bits), main data.

2.2 The MP3 algorithm

Figure 4 presents a block diagram of the general MP3 [6] encoding process. A time-to-frequency mapping converts the audio input into spectral lines frame by frame. In the hybrid transform block, MP3 uses a polyphase filter bank followed by a Modified Discrete Cosine Transform (MDCT) to increase the spectral resolution.

Fig. 4 MPEG/audio encoding process: PCM audio input → hybrid transform for time-to-frequency mapping → rate control for bit allocation and distortion control for noise allocation (iteration loop), driven by the masking threshold from Psychoacoustic Model II (with FFT) → bitstream formatting (with optional ancillary data) → encoded bitstream.

These spectral components are then divided into several scalefactor bands according to the critical-band rate. Simultaneously, the audio input passes through Psychoacoustic Model II (PAM-II), which determines the ratio of the signal energy to the masking threshold for each scalefactor band. The rate controller adjusts the quantizer systematically: it quantizes the spectral values and counts the number of Huffman-code bits required to code the quantized values, until the bit-rate constraint is satisfied. The quantizer in MP3 is nonuniform. To quantize the gth granule of the fth frame, the spectral value $x_{f,g}(i)$ is pre-emphasized and amplified by applying (1) and (2):

$$x'_{f,g}(i) = x_{f,g}(i) \times \sqrt{2}^{\,z_2 (1+z_1) P(b_i)}, \tag{1}$$

$$x''_{f,g}(i) = x'_{f,g}(i) \times \sqrt{2}^{\,(1+z_1) C(b_i)}, \tag{2}$$

where i is the index of the spectral line; $z_2 \in \{0, 1\}$ switches pre-emphasis on or off; $z_1 \in \{0, 1\}$ determines whether the scalefactors are logarithmically quantized with a step size of $\sqrt{2}$ or 2; $b_i$ is the scalefactor band of the ith spectral line; $P(\cdot)$ is the pre-emphasis table defined in [6]; and $C(\cdot)$ gives the scalefactors of all scalefactor bands. The spectral value $x''_{f,g}(i)$ is then quantized by

$$y_{f,g}(i) = \mathrm{nint}\left( \left( \frac{|x''_{f,g}(i)|}{2^{(\delta+q)/4}} \right)^{0.75} - 0.0946 \right), \tag{3}$$

where nint is the rounding function, q is the lower bound of the quantization parameter, and δ is the variable increment of the quantization parameter. Huffman coding, with the tables predefined in [6], is applied as the lossless coding tool. MP3 also uses scalefactors to amplify the spectral band energy when the quantization noise exceeds the masking threshold; the distortion controller determines the scalefactors that control the quality. Finally, the information required by the decoder is packed with the compressed audio data into a valid MP3 stream.

The MP3 decoding process comprises three main parts [6]: bitstream decoding, dequantization and frequency-to-time mapping, as shown in Fig. 5.

Fig. 5 Block diagram of MPEG/Audio Layer 3 decoding: encoded bitstream → bitstream decoding → dequantization → frequency-to-time mapping → PCM audio output.

Bitstream decoding synchronizes to the encoded bitstream and extracts the quantized frequency coefficients and other information for each frame. Dequantization reconstructs the frequency coefficients so that they are perceptually identical to those obtained during encoding. Based on the output of Huffman decoding and the scalefactor information, the dequantization is given by (4) [6]:

$$x_{f,g}(i) = (-1)^{s(i)} \cdot y_{f,g}(i)^{4/3} \cdot \frac{2^{\frac{1}{4}\left(\Delta_{f,g} - 8 s'(w_i)\right)}}{\sqrt{2}^{\,(1+z_1)\left(C(b_i) + z_2 P(b_i)\right)}} \tag{4}$$

where s(i) is the sign bit of the ith quantized value; $\Delta_{f,g} = \delta + q$ is the step size of the nonuniform quantizer; $w_i$ is the short-block window of the ith spectral line; and $s'(w_i)$ is the predefined gain of that short-block window. The final part, frequency-to-time mapping, produces PCM audio output from the dequantized coefficients. It consists of operations that reverse the MDCT and the analysis subband filter bank of the encoder. The alias reduction block adds the alias artifacts back to the dequantized coefficients, so that the data approximate those of the analysis subband filter bank in the encoder. The inverse MDCT then reconstructs the time-domain subband signals from the frequency lines, and frequency inversion compensates for the decimation used in the analysis polyphase filter bank. Finally, the synthesis subband filter bank is applied to the subband signals to yield the PCM audio output. Of these procedures, dequantization, the IMDCT and subband synthesis in particular require numerous arithmetic operations and introduce quantization noise in a fixed-point implementation. This work optimizes these three processes (see Fig. 6).

2.3 Low-complexity algorithm for MP3 encoding

The main modified functions of the low-complexity MP3 encoder presented in [23] are briefly described as follows.

2.3.1 Bandwidth control

A low-pass filter is applied for bandwidth control. The spectral values $x_{f,g}(i)$ are filtered by

$$L(x_{f,g}(i)) = \begin{cases} x_{f,g}(i), & i \le \mathrm{nint}\left(\dfrac{\Omega_c}{f_s/2} \times 576\right), \\[4pt] 0, & i > \mathrm{nint}\left(\dfrac{\Omega_c}{f_s/2} \times 576\right), \end{cases} \tag{5}$$

where the cutoff frequency $\Omega_c$ is the bandwidth coefficient given in Fig. 7 and $f_s$ is the sampling rate.

2.3.2 New rate control loop

This work adopts the new rate control algorithm proposed in [23]. Removing PAM-II and the associated distortion control loop simplifies the iteration loops. The new loop consists of precise initialization, fast search, fast quantization and dynamic bit allocation, each of which is detailed as follows.

Fig. 6 Frequency-to-time mapping: alias reduction → inverse MDCT → frequency inversion → synthesis subband filter bank → PCM audio output.

Fig. 7 Coefficients of bandwidth control: the cutoff frequency $\Omega_c$ (Hz) corresponding to each encoding bit rate (32 to 320 kbps) at a 44.1 kHz sampling rate, obtained from LAME [10].

2.3.3 Precise initialization of the step size $\Delta_{f,g}$

The step size $\Delta_{f,g}$ is updated iteratively, starting from the value of the previous granule and bounded below by $\Delta_l$:

$$\Delta_{f,g}(n) = \max\{\Delta_l,\ \Delta_{f,g}(n-1) + \sigma\}, \tag{6}$$

where $\Delta_l = \frac{16}{3}\log_2\left(\max_i \hat{x}_{f,g}(i)\right) - 69.35$; $\sigma$ is the step-size increment; $\Delta_{f,g}(-1) = \Delta_{f,g-1}$; and $\Delta_{f,0}(-1) = -150$.

2.3.4 Fast search for the optimal quantizer parameter

A fast iterative search for $\Delta_{f,g}$ is proposed in [23]. The first trial performs quantization with the quantization parameter $\Delta_{f,g}(0)$ given by (6); the subroutine $Q_2(\cdot)$ implements (9). If the first trial fails, the iterative search is applied. In the nth iteration, $\Delta_{f,g}(n)$ is updated by (6). After the nonuniform quantization $Q_2(\cdot)$, the number of Huffman-coded bits is calculated, and the bits required with step size $\Delta_{f,g}(n)$ are evaluated for subsequent checking. $\sigma$ is updated in the same way as in the first trial, and then the loop-break conditions are applied. The final trial guarantees that fewer bits are used than are allocated. Unlike the iterative search, the final trial fine-tunes $\Delta_{f,g}$ to prevent a deadlock loop.

2.3.5 Fast quantization

In the ISO MP3 algorithm, the nonuniform quantizer is defined by (3). Because distortion control is not used in this implementation, (1) and (2) no longer apply, i.e., $x''_{f,g}(i) = x_{f,g}(i)$. Additionally, the rounding function nint(·) is not needed in the fixed-point implementation. Equation (3) is then applied in two steps:

$$\hat{x}_{f,g}(i) = |x_{f,g}(i)|^{0.75}, \tag{7}$$

$$y_{f,g}(i) = \hat{x}_{f,g}(i) \times 2^{-\frac{3\Delta_{f,g}}{16}} - 0.0946 \tag{8}$$

$$y_{f,g}(i) = \hat{x}_{f,g}(i) \times 2^{N} \times 2^{Q} - 0.0946, \tag{9}$$

where N is the integer part and Q the fractional part of $-3\Delta_{f,g}/16$.
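The exponent split of (8) and (9) can be illustrated with the following Python sketch. It assumes an integer step size $\Delta_{f,g}$, so that $-3\Delta_{f,g}/16$ is a multiple of 1/16; N is taken as the floor, so $Q = k/16$ with k in 0..15 indexes a 16-entry table of $2^{k/16}$. The Q14 table scaling and the final rounding are our assumptions rather than the paper's fixed-point code, and the function names are hypothetical.

```python
import math
from typing import Tuple

FRAC_BITS = 14  # assumed Q14 scaling: 2^(15/16) * 2^14 still fits a signed 16-bit word
POW2_TABLE = [round(2 ** (k / 16) * (1 << FRAC_BITS)) for k in range(16)]  # 2^(k/16)
OFFSET = round(0.0946 * (1 << FRAC_BITS))  # the 0.0946 offset of (8), also in Q14


def split_exponent(t: int) -> Tuple[int, int]:
    """Split t/16 into N = floor(t/16) and a table index k, so that t/16 = N + k/16."""
    return t // 16, t % 16


def quantize_fast(xhat: int, delta: int) -> int:
    """Sketch of (9): xhat * 2^N * 2^Q - 0.0946, rounded to the nearest integer."""
    n, k = split_exponent(-3 * delta)
    v = xhat * POW2_TABLE[k]                # xhat * 2^Q, scaled by 2^14
    v = v << n if n >= 0 else v >> -n       # 2^N as a shift (right shifts truncate)
    v -= OFFSET
    return max(0, (v + (1 << (FRAC_BITS - 1))) >> FRAC_BITS)


def quantize_reference(xhat: float, delta: int) -> int:
    """Floating-point form of (8) followed by nint()."""
    return max(0, math.floor(xhat * 2 ** (-3 * delta / 16) - 0.0946 + 0.5))
```

The table only ever needs 16 entries because the fractional part of an integer multiple of 1/16 takes 16 values; any other exponent of the form $t/16$ with integer t can be split the same way with `split_exponent`.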

2.3.6 Dynamic bit allocation based on energy distribution

Without PAM-II or distortion control, bits are allocated asymmetrically according to the energy distribution of the granules. In the approach of [23], at a sampling rate of 44.1 kHz, the granule energy score defined in (10) uses only the first $\frac{4000}{44100/2} \times 576 \approx 105$ spectral lines (those below about 4 kHz) of the $|x_{f,g}(i)|^{0.75}$ output:

$$E_{f,g} = \sum_{i=1}^{105} \hat{x}_{f,g}(i), \tag{10}$$

where $\hat{x}_{f,g}(i)$ is given by (7) and $E_{f,g}$ is the energy score of the granule. The maximum number of encoding bits $B_{f,g}$ for each granule is

$$B_{f,g} = b_{f,g} + \frac{E_{f,g}}{\sum_g E_{f,g}} \times B_p, \tag{11}$$

where $b_{f,g}$ is the minimum number of encoding bits, $B_f$ is the total number of bits available in the frame, and $B_p$ is the number of bits distributed among the granules. $B_p$ and $b_{f,g}$ are evaluated as

$$b_{f,g} = \begin{cases} B_f/6, & \text{mono}, \\ B_f/12, & \text{left/right channel}, \\ B_f/9, & \text{mid channel}, \\ B_f/18, & \text{side channel}, \end{cases} \tag{12}$$

$$B_p = B_f - \sum_g b_{f,g}. \tag{13}$$

The number of iterations of this method is at most eight, with a mean of 1.8.
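Equations (10)–(13) can be sketched in Python as follows. The sketch assumes that each entry of `energies` is the score $E_{f,g}$ of one granule/channel unit in the frame, uses integer division for $b_{f,g}$, and truncates each share of $B_p$, so the allocations never exceed $B_f$. These rounding choices, the equal-share fallback for a silent frame, and the function names are ours, not the paper's.

```python
from typing import Dict, List, Sequence

# Divisors of the frame budget B_f that give the minimum bits b_{f,g} in (12).
MIN_BITS_DIVISOR: Dict[str, int] = {"mono": 6, "left": 12, "right": 12, "mid": 9, "side": 18}


def energy_score(xhat: Sequence[float], n_lines: int = 105) -> float:
    """E_{f,g} from (10): sum of the first n_lines values of |x|^0.75."""
    return float(sum(xhat[:n_lines]))


def allocate_bits(frame_bits: int, channel_types: Sequence[str],
                  energies: Sequence[float]) -> List[int]:
    """B_{f,g} from (11)-(13), one entry per granule/channel unit of the frame."""
    minimum = [frame_bits // MIN_BITS_DIVISOR[c] for c in channel_types]  # b_{f,g}
    pool = frame_bits - sum(minimum)                                      # B_p, eq. (13)
    total = sum(energies)
    if total <= 0:
        # Hypothetical fallback (not in the paper): share the pool equally.
        return [m + pool // len(minimum) for m in minimum]
    return [m + int(pool * e / total) for m, e in zip(minimum, energies)]


# Two mono granules, the second carrying three times the energy of the first.
print(allocate_bits(3000, ["mono", "mono"], [1.0, 3.0]))
```

For every channel configuration in (12), the minimum shares add up to one third of $B_f$, so two thirds of the frame budget are always distributed by energy.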

2.4 The low-complexity algorithm of MP3 decoding

2.4.1 The dequantization

The dequantization equation (4) is rewritten as

$$x_{f,g}(i) = (-1)^{s(i)} \cdot y_{f,g}(i)^{1/3} \cdot y_{f,g}(i) \cdot \frac{2^{\frac{1}{4}\left(\Delta_{f,g} - 8 s'(w_i)\right)}}{\sqrt{2}^{\,(1+z_1)\left(C(b_i) + z_2 P(b_i)\right)}}.$$

The main cost is computing $y_{f,g}(i)^{1/3}$, where $y_{f,g}(i)$ is an integer ranging from 0 to 8207. Direct evaluation with mathematical library functions is too time-consuming for real-time implementation.

2.4.2 IMDCT and subband synthesis

The frequency-to-time mapping is another computationally demanding process. Based on the analysis of Lee et al. [13], Lee's fast DCT algorithm [11] is applied here as the fast algorithm for both the IMDCT and the subband synthesis blocks: Lee's 9-point fast IDCT is used in the IMDCT block, and Lee's 64-point fast DCT is used for the matrixing routine of the subband synthesis block.
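For reference, the following Python sketch evaluates the IMDCT directly, in the form commonly given for MP3 decoders, $x_i = \sum_{k=0}^{n/2-1} X_k \cos\left(\frac{\pi}{2n}\left(2i + 1 + \frac{n}{2}\right)(2k+1)\right)$ for $i = 0, \ldots, n-1$, with n = 36 for long blocks. This O(n²) form is what fast algorithms such as Lee's replace; it is not the paper's implementation, the function name is hypothetical, and windowing and overlap-add are omitted.

```python
import math
from typing import List, Sequence


def imdct_direct(coeffs: Sequence[float]) -> List[float]:
    """Direct O(n^2) IMDCT: n/2 spectral lines in, n time samples out."""
    half = len(coeffs)  # n/2 spectral lines (18 for a long block)
    n = 2 * half        # n output samples (36 for a long block)
    out = []
    for i in range(n):
        acc = 0.0
        for k, c in enumerate(coeffs):
            acc += c * math.cos(math.pi / (2 * n) * (2 * i + 1 + half) * (2 * k + 1))
        out.append(acc)
    return out
```

The output is antisymmetric within its first half and symmetric within its second half; this is the structure of the time-domain aliasing that windowed overlap-add of consecutive blocks cancels.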

3 DSP implementation

The low-complexity MP3 encoder and decoder are implemented on the EZ-Kit development board of the ADSP-2181, a DSP that has been used extensively in many applications. The ADSP-2181 is a 16-bit fixed-point DSP that performs 16-bit arithmetic.

3.1 Acceleration of quantization

In the new rate control iteration, $\Delta_{f,g}$ is the only variable that is updated, so $|x_{f,g}(i)|^{0.75}$ can be removed from the iteration: to reduce the computation in the loop, (7) is calculated outside the iteration and (9) inside it. The operation $|x_{f,g}(i)|^{0.75}$ is accelerated for the ADSP-2181. The unsigned 16-bit fixed-point inputs $x_{f,g}(i)$ range from 0 to 65,535 and are divided into two regions. The first region, from 0 to 31, is implemented with a 32-word lookup table to speed up the calculation; the probability distribution of the inputs shows that this region covers over 60% of them. The second region, from 32 to 65,535, is approximated by piecewise linear interpolation over 11 subregions. The segmentation into subregions is also accelerated for the target DSP: because the ADSP-2181 detects leading ones/zeros in hardware, a biased $\log_2(x)$ is obtained in a single instruction cycle. Accordingly, the subregion boundaries are set to powers of 2: 32, 64, 128, ..., 65,536. Figure 8 plots the ratio of the approximation error to the true output, which is about 1%. Table 1 summarizes the number of instruction cycles required in the two regions. In the fixed-point implementation, the multiplication by $2^{N}$ in (9) is easily performed by the hardware barrel shifter, and $2^{Q}$ is read from a 16-word lookup table containing the fixed-point values $2^{0/16}, 2^{1/16}, \ldots, 2^{15/16}$.
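The two-region scheme for $|x_{f,g}(i)|^{0.75}$ can be sketched in Python as follows. As assumptions, the table values are kept in floating point rather than the DSP's fixed-point words, each subregion uses the straight line (chord) through its power-of-two endpoints, and `int.bit_length()` stands in for the hardware leading-zero detector. Eleven (slope, offset) pairs are 22 values, which is consistent with the table size in Table 1, but the paper's actual table layout is not given.

```python
from typing import Dict, List, Tuple

# Region 1: 32-word table for 0 <= x <= 31.
POW075_LUT: List[float] = [x ** 0.75 for x in range(32)]

# Region 2: 11 subregions [2^e, 2^(e+1)) for e = 5..15, one (slope, offset) pair each.
POW075_SEGMENTS: Dict[int, Tuple[float, float]] = {}
for e in range(5, 16):
    a, b = 2 ** e, 2 ** (e + 1)
    slope = (b ** 0.75 - a ** 0.75) / (b - a)
    POW075_SEGMENTS[e] = (slope, a ** 0.75 - slope * a)


def pow075_approx(x: int) -> float:
    """Hybrid LUT / piecewise-linear approximation of x^0.75 for unsigned 16-bit x."""
    if not 0 <= x <= 0xFFFF:
        raise ValueError("input must be an unsigned 16-bit value")
    if x < 32:
        return POW075_LUT[x]
    # bit_length() - 1 plays the role of the DSP's leading-zero detector.
    slope, offset = POW075_SEGMENTS[x.bit_length() - 1]
    return slope * x + offset
```

Because $x^{0.75}$ is concave, each chord lies below the curve, so this version always underestimates, with a peak relative error of about 1.1% in every subregion, in line with the roughly 1% reported for Fig. 8. The errors of both signs in Fig. 8 suggest that the authors' fit differs from this plain chord version.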

Fig. 8 Error of the $|x_{f,g}(i)|^{0.75}$ approximation: relative error $\varepsilon(x_{f,g}(i))$ (vertical axis from −1.5% to 1.5%) versus $x_{f,g}(i)$ (0 to $6 \times 10^4$).

3.2 Acceleration of dequantization

Like the quantization, the power function $y_{f,g}(i)^{1/3}$ is implemented with a hybrid scheme. First, the input range is split into three sections, as shown in Fig. 9. The first section, $0 \le y_{f,g}(i) \le 32$, uses a small lookup table to obtain the exact value directly. Piecewise linear approximation is applied in the other two sections, and the segmentation is again accelerated for the ADSP-2181: the second section is segmented according to the leading zeros of the input to minimize the approximation error. In the section $33 \le y_{f,g}(i) \le 8207$, Newton's method is adopted to refine the approximation. For simplicity, the spectral-line index i is omitted in the rest of this subsection. Let $u = y_{f,g}^{1/3}$; the first-order Newton solution of $u^3 - y_{f,g} = 0$ is

$$\tilde{u}_1 = \tilde{u}_0 - \frac{\tilde{u}_0^3 - y_{f,g}}{3\tilde{u}_0^2} = \frac{2\tilde{u}_0^3 + y_{f,g}}{3\tilde{u}_0^2} = \frac{1}{3}\left(2\tilde{u}_0 + \frac{y_{f,g}}{\tilde{u}_0^2}\right). \tag{14}$$
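A floating-point sketch of the hybrid $y^{1/3}$ computation with the single Newton step of (14) follows. The assumptions are exact table values for $0 \le y \le 32$, chord segments between powers of two (e = 5..13) as the piecewise linear initial value, and floating point instead of the DSP's fixed-point arithmetic; the paper's actual section boundaries and table contents are not reproduced. The hypothetical helper `pow43` applies the factorization $y^{4/3} = y \cdot y^{1/3}$ of Section 2.4.1.

```python
from typing import Dict, List, Tuple

# Section 1: exact table for 0 <= y <= 32.
CBRT_LUT: List[float] = [v ** (1 / 3) for v in range(33)]

# Remaining sections: chord segments on [2^e, 2^(e+1)] for e = 5..13 (covers 33..8207).
CBRT_SEGMENTS: Dict[int, Tuple[float, float]] = {}
for e in range(5, 14):
    a, b = 2 ** e, 2 ** (e + 1)
    slope = (b ** (1 / 3) - a ** (1 / 3)) / (b - a)
    CBRT_SEGMENTS[e] = (slope, a ** (1 / 3) - slope * a)


def cbrt_approx(y: int) -> float:
    """y^(1/3) for a quantized value 0 <= y <= 8207."""
    if not 0 <= y <= 8207:
        raise ValueError("quantized value out of range")
    if y <= 32:
        return CBRT_LUT[y]
    slope, offset = CBRT_SEGMENTS[y.bit_length() - 1]
    u0 = slope * y + offset                   # piecewise linear initial value
    return (2.0 * u0 + y / (u0 * u0)) / 3.0   # one Newton step, eq. (14)


def pow43(y: int) -> float:
    """Dequantization power y^(4/3) computed as y * y^(1/3)."""
    return y * cbrt_approx(y)
```

With a chord initial value that is off by at most about 1.3%, one Newton step leaves a relative error of roughly 0.02%, because the error of this iteration is approximately squared. Fixed-point effects, which this sketch ignores, would add to that figure.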

The initial value $\tilde{u}_0$ is calculated by the piecewise linear approximation described above, and a single iteration yields the desired accuracy. The effect of the fixed-point implementation was analyzed: Figure 10 depicts the ratio of the error to the true output, which is about ±0.08%, and the SNR is about 82 dB.

3.3 Data precision optimization in the low-complexity MP3 encoder

The ADSP-2181 performs 16-bit arithmetic. 32-bit arithmetic processes data more accurately, but it also increases the computational complexity.

Table 1 Number of DSP instruction cycles in the calculation of the two regions

Input range     DSP instruction cycles     Table size
0–31            4                          32 words
32–65,536       9                          22 words

Fig. 9 The implementation of $y_{f,g}^{1/3}(i)$

Figure 11 shows that five instructions are needed for a double-precision multiplication, so its cost is five times that of a 16-bit multiplication. To determine the data precision, the encoding process is divided into the six stages shown in Fig. 12. The PCM samples are always 16-bit, in the format $(1.15)_{16}$, where $(\alpha.\beta)_\gamma$ denotes a $\gamma$-bit fixed-point number whose binary point follows the $\alpha$th most significant bit; clearly, $\alpha + \beta = \gamma$. The subband analysis is divided into two stages: windowing with partial calculation [6] and matrixing [6]. Windowing with partial calculation performs 16-bit multiply-and-accumulate operations and generates a vector Y of 32-bit results [6]. Matrixing performs double-precision multiply-and-accumulate operations, as in Fig. 11, to generate the time-domain subband signals r(t), of which only the 16-bit rounded result $R_h$ is kept. Statistical analysis shows that the subband signals lie within (−2.0, 2.0), so their format is $(2.14)_{16}$.

Fig. 10 Ratio of the error to the true output of the fixed-point approximation of $y_g^{4/3}$, $\varepsilon(y_g) = \left(y_g^{4/3} - \mathrm{pow3fx}(y_g) \cdot y_g\right) / y_g^{4/3}$, where pow3fx is the proposed fixed-point implementation of $y_{f,g}^{1/3}$ (vertical axis from −0.08% to 0.06%; horizontal axis $y_{f,g}$ from 0 to 8000).

Fig. 11 Double-precision multiplication, R (32-bit) = X (32-bit) × Y (16-bit). The 32-bit operand X is held as Xh (mx1) and Xl (mx0), and Y as Yl (my0). The unsigned-by-signed product Xl × Yl is rounded and shifted down by 16 bits, and the signed-by-signed product Xh × Yl is then accumulated, leaving the 32-bit result in Rh (mr1) and Rl (mr0) with sign extension in Rs (mr2). ADSP-21xx instructions: mr = mx0 * my0 (us); mr = mr (rnd); mr0 = mr1; mr1 = mr2; mr = mr + mx1 * my0 (ss)

The subband signals then undergo the MDCT and anti-aliasing stage. A fast MDCT is applied here to reduce the computational complexity while keeping the quantization error caused by fixed-point arithmetic under control. Applying 16-bit multiply-then-accumulate operations to the subband signals generates 32-bit transform coefficients, which are passed to the anti-aliasing block. There the double-precision arithmetic of Fig. 11 is performed again, and the anti-aliased 32-bit transform coefficients $x_{f,g}(i)$ are obtained from $R_h$ and $R_l$. A special format converter after the anti-aliasing block converts the 32-bit data in format $(2.30)_{32}$ to 16-bit data, with the fixed-point format determined at run time from the dynamic range of $x_{f,g}(i)$. First, the maximum transform coefficient in the granule, $X_{f,g} = \max_i |x_{f,g}(i)|$, is determined. The right-shift amount k is then derived from a special function, $\mathrm{exp}_{32}$, as given in (15):

$$k = 16 + \mathrm{exp}_{32}(X_{f,g}), \tag{15}$$

where the exponent detector is functionally equivalent to

$$\mathrm{exp}_{16}(x) \equiv \lfloor \log_2|x| \rfloor - 14 \ \text{(single precision)}, \qquad \mathrm{exp}_{32}(x) \equiv \lfloor \log_2|x| \rfloor - 30 \ \text{(double precision)}. \tag{16}$$

The SHIFTER unit of the ADSP-21xx core has a hardware exponent detector that counts the leading zeros/ones of single-precision data in one instruction cycle and of double-precision data in two. For example, for the 32-bit value $X_{f,g} = (01111111\ 0abcdefg\ abcdefgh\ abcdefgh)_2$ the exponent detector returns −7; adding 16 as in (15) gives k = 9, and the format converter shifts the 32-bit transform coefficients $x_{f,g}(i)$ right by 9 bits to yield the 16-bit shifted coefficients according to (17):

$$\lfloor x_{f,g}(i) \rfloor_{16} = x_{f,g}(i) \times 2^{-k}, \tag{17}$$

where $\lfloor \cdot \rfloor_{16}$ denotes truncation to 16 bits. Meanwhile, the maximum $X_{f,g}$ is also converted into 16-bit data:

$$X_{f,g} = (10abcdef\ gabcdefg)_2. \tag{18}$$

Fig. 12 Precision of data used between each stage in the low-complexity MP3 encoder. $(M.N)_{16}$, determined by the format converter, is the fixed-point format of the transform coefficients.
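The run-time format conversion of (15)–(17) can be sketched in Python as follows, with integers standing for the raw $(2.30)_{32}$ words. `exp32` emulates the exponent detector of (16) with `int.bit_length()`. Mapping zero to −31, shifting left when k is negative, and reading the resulting format as $(M.N)_{16}$ with $N = 30 - k$ and $M = k - 14$ are our assumptions, derived from the definitions above rather than taken from the paper; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def exp32(x: int) -> int:
    """Exponent detector of (16): floor(log2|x|) - 30; zero maps to -31 (assumption)."""
    return abs(x).bit_length() - 31 if x else -31


def format_convert(coeffs: Sequence[int]) -> Tuple[List[int], int, Tuple[int, int]]:
    """Convert (2.30)_32 coefficients to 16 bits using the run-time shift k of (15)."""
    peak = max(abs(c) for c in coeffs)       # X_{f,g}
    k = 16 + exp32(peak)                     # eq. (15)
    if k >= 0:
        out = [c >> k for c in coeffs]       # eq. (17); arithmetic shift truncates
    else:
        out = [c << -k for c in coeffs]      # very small granules are shifted left
    m, n = k - 14, 30 - k                    # our reading of the (M.N)_16 format
    return out, k, (m, n)
```

With (15) and (16) as written, the largest magnitude always lands in $[2^{14}, 2^{15})$, so every shifted coefficient fits a signed 16-bit word with the most significant magnitude bit just below the sign bit.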

The format converter compacts 32-bit data in the form $(2.30)_{32}$ into the 16-bit format $(M.N)_{16}$ to reduce the quantization error due to fixed-point arithmetic. The format $(M.N)_{16}$ varies from granule to granule and is chosen at run time; M is usually negative. The shifting operation of (17) is inverted later. The shifted transform coefficients $\lfloor x_{f,g}(i) \rfloor_{16}$ are then passed to the iteration loops. The new rate control executes the $|x_{f,g}(i)|^{0.75}$ operation before iterative quantization, and (7) is rewritten as (19). The multiplying coefficient $2^4$ converts the format to $(M+4.N-4)_{16}$, because the dynamic range of $|x_{f,g}(i)|^{0.75}$ is 1/16 of that of $x_{f,g}(i)$:

$$\lfloor \hat{x}_{f,g}(i) \rfloor_{16} = \left|\lfloor x_{f,g}(i) \rfloor_{16}\right|^{0.75} \times 2^{4} = \left|x_{f,g}(i) \times 2^{-k}\right|^{0.75} \times 2^{4} = |x_{f,g}(i)|^{0.75} \times 2^{-0.75k+4} = \hat{x}_{f,g}(i) \times 2^{-0.75k+4}. \tag{19}$$

The relationship between $\hat{x}_{f,g}(i)$ and $\lfloor \hat{x}_{f,g}(i) \rfloor_{16}$ can therefore be written as

$$\hat{x}_{f,g}(i) = \lfloor \hat{x}_{f,g}(i) \rfloor_{16} \times 2^{0.75k-4}. \tag{20}$$

The quantizer in (8) is modified accordingly to yield the correct quantized value. As (21) shows, the modification adds an offset to the exponent term:

$$\lfloor y_{f,g}(i) \rfloor_{16} = \hat{x}_{f,g}(i) \times 2^{-\frac{3\Delta_{f,g}}{16}} - 0.0946 = \lfloor \hat{x}_{f,g}(i) \rfloor_{16} \times 2^{0.75k-4} \times 2^{-\frac{3\Delta_{f,g}}{16}} - 0.0946 = \lfloor \hat{x}_{f,g}(i) \rfloor_{16} \times 2^{-\frac{3\Delta_{f,g} - 12k + 64}{16}} - 0.0946. \tag{21}$$

As before, the exponent $-(3\Delta_{f,g} - 12k + 64)/16$ is split into its integer part and its fractional part to accelerate the computation.

Fig. 13 Data precision between each stage in the low-complexity MP3 decoder. $(M_{sb}.N_{sb})_{16}$, determined by the format converter, is the fixed-point format of each subband; $(M_{glb}.N_{glb})_{16}$, equal to the $(M_{sb}.N_{sb})_{16}$ of the subband containing the highest-amplitude coefficient in the granule, is the fixed-point format of the subband signals.

3.4 Maximizing precision of data in the low-complexity MP3 decoder

Jeong et al. [9] showed that no noise is audible in a fixed-point, MAC-based MPEG-2 audio decoder that uses at least a 21-bit multiplier and a 25-bit adder. Lee et al. [13] implemented MPEG audio decoding with double-precision arithmetic throughout the decoding process on a 16-bit fixed-point DSP. As shown in Fig. 13, the quantized transform coefficients are decoded by the Huffman decoder and then dequantized. The reduced-complexity dequantizer produces 32-bit data in format $(2.30)_{32}$. The succeeding stage, which comprises stereo processing, reordering and anti-aliasing, performs double-precision arithmetic as in Fig. 11 and generates 32-bit data in the same format $(2.30)_{32}$. The format converter used in encoding is also applied here in decoding.
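The double-precision multiplication of Fig. 11, used in these 32-bit stages, can be emulated in Python to check its arithmetic. The sketch models integer (not fractional) multiplier mode and plain round-half-up for `(rnd)`; the real ADSP-21xx rounding behavior and fractional-mode shift may differ, and the function names are hypothetical. Under these assumptions the five-instruction sequence computes exactly $\lfloor (X \cdot Y + 2^{15}) / 2^{16} \rfloor$, that is, the 48-bit product rounded to its upper 32 bits.

```python
from typing import Tuple


def split32(x: int) -> Tuple[int, int]:
    """Split a signed 32-bit integer into a signed high half and an unsigned low half."""
    if not -(1 << 31) <= x < (1 << 31):
        raise ValueError("x must be a signed 32-bit value")
    return x >> 16, x & 0xFFFF  # Python's >> is arithmetic, so the high half keeps the sign


def dp_multiply(x: int, y: int) -> int:
    """Emulate R = X (32-bit) * Y (16-bit) following the instruction order of Fig. 11."""
    if not -(1 << 15) <= y < (1 << 15):
        raise ValueError("y must be a signed 16-bit value")
    xh, xl = split32(x)
    partial = xl * y                      # mr = mx0 * my0 (us): unsigned x signed
    partial = (partial + 0x8000) >> 16    # mr = mr (rnd); mr0 = mr1; mr1 = mr2
    return partial + xh * y               # mr = mr + mx1 * my0 (ss): signed x signed
```

Because $X_h \cdot Y$ is an integer, adding it after the rounding shift gives the same result as rounding the full product once, which is why the split costs no accuracy beyond that single rounding.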

Fig. 14 Different formats used between subbands and the modified IMDCT

Unlike in the encoder, the format converter here changes the data format of each subband individually. The maximum transform coefficient of each subband determines that subband's right-shift amount, which serves as the format-converting parameter, and the format $(M_{sb}.N_{sb})_{16}$ is likewise derived from (15). As Fig. 14 shows, the 32 formats $(M_0.N_0)_{16}, (M_1.N_1)_{16}, \ldots, (M_{31}.N_{31})_{16}$ correspond to the 32 subbands. The IMDCT performed in each subband produces 16-bit subband signals in the common format $(M_{glb}.N_{glb})_{16}$, derived as

$$M_{glb} = \max_{sb} M_{sb}, \qquad N_{glb} = \min_{sb} N_{sb}, \qquad sb = 0, 1, \ldots, 31. \tag{22}$$

After the IMDCT, the subband signals are synthesized into time-domain PCM samples through two operations: matrixing and windowing. The matrixing operation is implemented as Lee's 64-point fast DCT, as mentioned in the description of the proposed decoding algorithm; it performs 16-bit arithmetic and produces a 16-bit result vector in format $(2.14)_{16}$. The windowing operation yields PCM samples in format $(1.15)_{16}$.
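Equation (22) and the realignment of subbands to the common format can be sketched as follows, reusing the shift rule of (15) for each subband. The $(M.N)_{16}$ reading with $N = 30 - k$ and $M = k - 14$, the zero handling and the helper names are assumptions, and the sketch further assumes that realignment to $(M_{glb}.N_{glb})_{16}$ is a plain right shift by $N_{sb} - N_{glb}$ bits.

```python
from typing import Sequence, Tuple


def exp32(x: int) -> int:
    """Exponent detector of (16); zero is mapped to -31 (an assumption)."""
    return abs(x).bit_length() - 31 if x else -31


def subband_format(coeffs: Sequence[int]) -> Tuple[int, int]:
    """(M_sb, N_sb) of one subband: k = 16 + exp32(peak), N = 30 - k, M = k - 14."""
    k = 16 + exp32(max(abs(c) for c in coeffs))
    return k - 14, 30 - k


def global_format(formats: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """Eq. (22): M_glb is the largest M_sb and N_glb the smallest N_sb."""
    return max(m for m, _ in formats), min(n for _, n in formats)


def realign(value: int, n_sb: int, n_glb: int) -> int:
    """Move a 16-bit value from N_sb to N_glb fractional bits (N_sb >= N_glb)."""
    return value >> (n_sb - n_glb)
```

Since every per-subband format satisfies $M_{sb} + N_{sb} = 16$, the largest $M_{sb}$ and the smallest $N_{sb}$ always come from the same subband, the one with the highest-amplitude coefficient, as the caption of Fig. 13 states.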

4 Performance and comparisons

4.1 Security analysis

This work does not focus on the theoretical security of stream ciphers, because the proposed scheme places no restriction on the stream cipher other than that it be theoretically secure. Hauser and Wenz [5] indicated that a DRM system cannot easily protect multimedia content, especially audio, on current computing systems, because encrypted content must eventually be rendered in raw form, giving attackers the opportunity to capture it. The security analysis therefore considers only the security part and the protection of MP3 frames. The security part is extracted from the main data of an MP3 granule, so its contents vary with the song. Different MP3 encoders are also very likely to produce different main data for the same song.


Fig. 15 Quality loss following encryption using SEAL stream cipher (ODG versus security level s for the bass, harp and spfg samples)

Because the main data vary with both the song and the encoder, they are hard to predict, which makes them well suited to encryption. Next, consider the size of the alphabet of the security part. For a 5-min song encoded at 128 kbps with a sampling rate of 44.1 kHz, the security part contains at least 22,800 bits at the lowest security level, s = 0.01 (38 frames/s × 60 s × 5 min × 2 granules/frame × 1 bit/granule). A security part of this size is large enough to resist brute-force attacks, since guessing it exhaustively would require up to $2^{22{,}800}$ trials.

In the proposed scheme, the transfer of MP3 frames between the security phase and the MP3 phase is critical: an attacker who can easily capture the decrypted MP3 frames obtains unprotected MP3 files without breaking the stream cipher. If the two phases run on separate processors, a secure channel between them is required. This paper assumes that the RISC and the DSP reside on one chip. In that case, the decrypted MP3 granules should be stored as linked lists so that they occupy non-contiguous memory blocks [5]. This programming technique makes it harder for an attacker to extract the raw data from memory.

4.2 Performance of partial encryption

Quality loss is measured with the Objective Difference Grade (ODG) of PEAQ (Perceptual Evaluation of Audio Quality) [8]. The software-optimized encryption algorithm (SEAL) [16] serves as the stream cipher in the following simulation. The three audio samples, bass, harp and spfg, are taken from EBU SQAM [4]. Figure 15 shows the ODG of the encrypted MP3 files produced by the proposed approach at different security levels. The quality declines monotonically as the security level increases.

Table 2 Results of MP3 codec implementation
Algorithm | Peak DSP MIPS | Program memory (KB) | Data memory (KB) | Total memory (KB)
Our MP3 decoder | 17.67 | 20.7 | 23.6 | 44.3
Decoder by Lee et al. [12] | 20.7 | 12 | 40.8 | 52.8
Decoder by Bang et al. [2] | 13.3 | 6.6 | 21.4 | 28
Our MP3 encoder | 21.05 | 16.8 | 27.2 | 44
Encoder by Wang et al. [22] | 36.07 | ∼64 | ∼64 | ∼128

Table 3 The comparison of peak consumed MIPS in various MP3 encoders
Peak MIPS | Ours | Wang et al. [22] | Oh et al. [15]
Subband analysis | 7.09 | 5.64 | 10.4*
MDCT | 3.99 | 3.74 | *
PAM-II | Removed | 8.96 | Removed
Iteration loops | 4.50 (peak) | 11.87 (peak) | 18.43 (peak)
Huffman encoding and bitstream formatting | 5.47 (peak) | 5.86 (peak) | 2.07 (peak)
Total | 21.05 | 36.07 | 30.9
*A single value covers both subband analysis and MDCT.

4.3 Comparisons of MP3 codec

The low-complexity MP3 encoder and decoder are implemented on the ADSP-2181 using the proposed architecture. Table 2 summarizes the implementation results. All MIPS figures are estimated for 44.1-kHz stereophonic audio input and 128-kbps output MP3 bitstreams. A total program RAM of about 37.5 KB (16.8 KB + 20.7 KB) stores both the encoder and the decoder code, and no more than 27.2 KB of data RAM is required during encoding or decoding. The on-chip RAM of the ADSP-2181, 48 KB for program and 32 KB for data, therefore suffices for the low-complexity MP3 codec. Table 2 also compares these results with those of other MP3 implementations. Although Bang et al. [2] obtained better results than the decoder herein, the latter still has an advantage: Bang et al. designed a dedicated DSP that performs Huffman decoding and frame unpacking in on-chip hardware, whereas the low-complexity MP3 decoder herein runs on the widely used ADSP-2181. Table 3 lists the computational complexity of each part of the low-complexity MP3 encoder and compares it with the encoders of Oh et al. and Wang et al. Oh et al. [15] implemented an MP3 encoder on a specially designed 20-bit fixed-point DSP with fast bit allocation, the removal of PAM-II and window

Table 4 The comparison of peak consumed MIPS in different MP3 decoders
Peak MIPS | Ours | Lee et al. [13] | Bang et al. [2]
Scalefactor and Huffman decoding | 5.95 (peak) | 6.2 (peak)* | N/A*
Synchronization and bitstream unpacking | 0.44 | * | *
Dequantization | 2.38 (peak) | 5.4 (peak) | 4.5 (peak)
IMDCT | 4.45 (peak) | 2.8 | 2.85
Subband synthesis | 4.45 | 6.3 | 5.97
Total | 17.67 | 20.7 | 13.33
*A single entry covers both Huffman decoding and bitstream unpacking.


Table 5 The comparisons between our work and commercial MP3 encoders
Implementation | Processor | MIPS | PM (KB) | DM (KB)
Ours | ADSP-2181 | 21.05 | 16.8 | 27.2
Tensilica [18] | Xtensa HiFi Engine | 65 | 90 | 46.6
ADI, Melody™ chipset [1] | ADSP-218x | 40 | < 48 | < 32
CuTe Solutions [3] | ADSP-218x | 40 | 32 | 16
SpiritDSP [17] | MIPS-based TX49xx | 80 | N/A | N/A
CuTe Solutions [3] | TI C54x | 36 | 22 | 21.8
CuTe Solutions [3] | TI C55x | 72 | 62 | 30.3
CuTe Solutions [3] | TI C64x | 33 | 121 | 46.7
PM is program memory and DM is data memory

switching. Wang et al. [22] implemented a real-time encoder on a 50-MIPS 16-bit fixed-point ADSP-2181 using a new PAM based on the MDCT and fast bit allocation. For signal-dependent blocks, such as the iteration loops and Huffman encoding, Table 3 lists the worst-case values. The removal of PAM-II, the proposed rate control algorithm and the non-uniform quantizer reduce the computational load of the iteration loops in the proposed encoder far below that of the other two encoders. Nevertheless, dynamic bit allocation based on energy distribution and a block floating-point data representation allow the proposed encoder to perform as well as the other two encoders. Table 4 presents the computational complexity of each part of the proposed MP3 decoder and compares it with the decoders of Lee et al. and Bang et al. Lee et al. [13] implemented an MP3 decoder on the Motorola DSP56654, a dual-core processor with a 32-bit RISC MCU and a 16-bit fixed-point DSP. Bang et al. [2] implemented a decoder on a custom VLSI chip with a 20-bit fixed-point DSP core that includes a built-in hardware Huffman decoder. Their DSP supports not only general arithmetic but also special instructions such as UNPACK and HUFFMAN; that is, Huffman decoding and unpacking rely on dedicated hardware. Table 5 compares the proposed encoder with commercial MP3 encoders, and Table 6 compares the proposed decoder with commercial MP3 decoders. These results show that the proposed work is better than most implementations

Table 6 The comparisons between our work and commercial MP3 decoders
Implementation | Processor | MIPS | PM (KB) | DM (KB)
Ours | ADSP-2181 | 17.67 | 20.7 | 23.6
Tensilica [19] | Xtensa HiFi Engine | 18 | 37 | 27.3
CuTe Solutions [3] | ADSP-218x | 20 | 33 | 167.5
Nuntius Systems [14] | ADSP-2185 | 36 | 25 | 23
Nuntius Systems [14] | Proprietary SIMD DSP core | 22 | 24 | 22
SpiritDSP [17] | TI C55x | 12.5 | 20 | 12
CuTe Solutions [3] | TI C54x | 31 | 29.7 | 14.2
CuTe Solutions [3] | TI C64x | 20 | 82 | 33.2
SpiritDSP [17] | ARM7 | 25 | 31 | 24
PM is program memory and DM is data memory

Table 7 Test audio samples
Signal characteristic | Time | Abbreviation
Violin solo in arpeggio | 0:37 | VL
Melodious quartet | 0:28 | QT
German female speech | 0:21 | GF

in terms of code size. ADI, CuTe Solutions and Nuntius Systems have used the ADSP-218x family to implement MP3 encoders or decoders. CuTe Solutions also used the TI C54x (900 MIPS), C55x (900 MIPS) and C64x (1,200–8,000 MIPS) DSPs.

4.4 Quality test

The audio quality of the proposed MP3 encoder and decoder is evaluated subjectively with “double-blind triple-stimulus with hidden reference” listening tests [7], in which the hidden reference is the output of the ISO encoder/ISO decoder. The three audio samples used are listed in Table 7. All samples are stereophonic and sampled at 44.1 kHz. Eleven listeners took part in the experiment. Two measures are reported for each test: the “Diffgrade” and the “number of misidentifications”. The Diffgrade is the subjective rating of the coded test item minus the rating of the hidden reference. The Diffgrade scale is divided into five ranges: “imperceptible (> 0.00)”, “perceptible but not annoying (0.00 ∼ −1.00)”, “slightly annoying (−1.00 ∼ −2.00)”, “annoying (−2.00 ∼ −3.00)” and “very annoying (−3.00 ∼ −4.00)”. The number of misidentifications is the number of listeners who confuse the test item with the hidden reference. Table 8 shows the results of two tests. The first encodes with the proposed encoder and decodes with the ISO decoder; the second encodes with the ISO encoder and decodes with the proposed decoder. Most Diffgrades in the table are much closer to 0 than to −1, and some are positive, that is, within the imperceptible range.
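A short sketch shows how the two reported measures and the five Diffgrade ranges relate. This is a hypothetical Python illustration under several assumptions. Each listener is assumed to give one rating per stimulus on the five-grade scale of ITU-R BS.1116. The reported Diffgrade is assumed to be the mean over listeners. A misidentification is assumed to mean that the coded item received a higher grade than the hidden reference. A Diffgrade of exactly 0.00 is assigned to “perceptible but not annoying”, following the ranges listed above. The ratings are toy values, not data from the experiment.

```python
from statistics import mean

def diffgrades(test_ratings, ref_ratings):
    """Per-listener Diffgrade: coded-item rating minus hidden-reference rating."""
    return [t - r for t, r in zip(test_ratings, ref_ratings)]

def summarize(test_ratings, ref_ratings):
    """Return (mean Diffgrade, number of misidentifications)."""
    dgs = diffgrades(test_ratings, ref_ratings)
    # Assumed criterion: the listener graded the coded item above the reference.
    misidentified = sum(1 for d in dgs if d > 0)
    return mean(dgs), misidentified

def category(dg):
    """Map a Diffgrade to the five ranges listed in the text."""
    if dg > 0.0:
        return "imperceptible"
    if dg >= -1.0:
        return "perceptible but not annoying"
    if dg >= -2.0:
        return "slightly annoying"
    if dg >= -3.0:
        return "annoying"
    return "very annoying"

# Toy ratings from four listeners (not data from the experiment).
test = [5.0, 4.5, 5.0, 4.8]
ref = [4.8, 5.0, 5.0, 5.0]
avg, mi = summarize(test, ref)
```

Under these assumptions, a mean Diffgrade near zero can coexist with a large number of misidentifications, because positive and negative per-listener differences partly cancel.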

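Table 8 The subjective evaluation results
Codec pair | Sample | DG (256 kbps) | MI (256 kbps) | DG (192 kbps) | MI (192 kbps) | DG (128 kbps) | MI (128 kbps)
Proposed encoder/ISO decoder | VL | −0.09 | 5 | −0.04 | 7 | −0.4 | 6
Proposed encoder/ISO decoder | QT | −0.1 | 7 | 0.02 | 7 | −0.3 | 6
Proposed encoder/ISO decoder | GF | 0.09 | 5 | −0.02 | 9 | −0.1 | 8
ISO encoder/Proposed decoder | VL | 0.19 | 7 | −0.02 | 8 | −0.04 | 8
ISO encoder/Proposed decoder | QT | 0.64 | 8 | 0.01 | 9 | −0.04 | 9
ISO encoder/Proposed decoder | GF | 0.36 | 6 | 0.2 | 10 | 0.04 | 9
DG Diffgrade, MI number of misidentifications over 11 listeners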


5 Conclusions

This work proposed a secure MP3 algorithm and efficiently implemented it on a dual-core platform with a StrongARM and an ADSP-2181. It provides a fast and easy means of protecting MP3 files, and a similar security structure can be applied to many other perceptual audio codecs. The implementation could be ported to current playback devices and cellular telephones for DRM applications. It requires only 21.05 MIPS for encoding and 17.67 MIPS for decoding, together with 37.5 KB of program RAM and 27.2 KB of data RAM. The MP3 algorithm is therefore well suited to devices with tight power and memory budgets, such as cellular telephones and Walkmans.

Acknowledgements The work was financially supported by National Science Council under Grant no. NSC 93-2218-E-009-033 and the Program for Promoting University Academic Excellence under Grant no. EX-91-E-FA06-4-4.

References

1. ADI, Melody™ chipset. http://www.futurlec.com/News/Analog/MP3.html
2. Bang KH, Jeong NH, Lim JS, Park YC, Youn DH (2002) Design and VLSI implementation of a digital audio-specific DSP core for MP3/AAC. In: International Conference on Consumer Electronics, Digest of Technical Papers, pp 220–221 (June)
3. CuTe Solutions. http://www.cutesolinc.com
4. European Broadcasting Union, EBU SQAM. http://www.tnt.uni-hannover.de/project/mpeg/audio/sqam/
5. Hauser T, Wenz C (2003) DRM under attack. In: Lecture Notes in Computer Science, vol 2770. Springer, Berlin Heidelberg New York, pp 206–223
6. ISO/IEC JTC1/SC29/WG11 MPEG, International Standard IS 11172-3 (1993) Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s, part 3: audio
7. ITU-R Recommendation BS.1116 (1994) Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems. International Telecommunication Union, Geneva, Switzerland
8. ITU-R Recommendation BS.1387-1 (1998) Method for objective measurements of perceived audio quality (Dec)
9. Jeong MS, Kim S, Sohn J, Kang JY (1996) Finite wordlength effects evaluation of the MPEG-2 audio decoder. In: International Conference on Signal Processing Applications & Technology, pp 351–355 (Jan)
10. Lame Aint an MP3 Encoder (LAME). http://sourceforge.net/projects/lame/
11. Lee BG (1984) A new algorithm to compute the discrete cosine transform. IEEE Trans Acoust Speech Signal Process ASSP-32(6):1243–1245
12. Lee KH, Lee KS, Hwang TH, Park YC, Youn DH (2001) An architecture and implementation of MPEG Audio Layer III decoder using dual-core DSP. IEEE Trans Consum Electron 47(4):928–933 (Nov)
13. Lee KS, Oh HO, Park YC, Youn DH (2001) High quality MPEG-audio Layer III algorithm for a 16-bit DSP. In: Proceedings of IEEE International Symposium on Circuits and Systems, vol II. Sydney, Australia, 6–9 May, pp 205–208
14. Nuntius Systems. http://www.nuntius.com/solutions31.html#mp3
15. Oh HO, Kim JS, Song CJ, Park YC, Youn DH (2001) Low power MPEG/Audio encoders using simplified psychoacoustics model and fast bit allocation. IEEE Trans Consum Electron 47(3):613–621 (Aug)
16. Rogaway P, Coppersmith D (1994) A software-optimized encryption algorithm. In: Fast Software Encryption, Cambridge Security Workshop Proceedings, vol 809. Springer, Berlin Heidelberg New York, pp 56–63
17. SpiritDSP. http://www.spiritdsp.com/audio_processing.html
18. Tensilica encoder. http://www.tensilica.com/html/mp3_encoder.html
19. Tensilica decoder. http://www.tensilica.com/html/mp3_decoder.html
20. Thorwirth NJ, Horvatic P, Weis R, Zha J (2000) Security methods for MP3 music delivery. In: Proceedings of the 23rd Asilomar Conference on IEEE Signals, Systems and Computers, vol 2, pp 1831–1835
21. Torrubia A, Mora F (2002) Perceptual cryptography on MPEG-1 Layer III bit-streams. In: Proceedings of International Conference on IEEE Consumer Electronics, pp 324–325
22. Wang X, Dou W, Hou Z (2002) An improved audio encoding architecture based on 16-bit fixed-point DSP. In: IEEE 2002 International Conference on Communications, Circuits and Systems and West Sino Expositions, vol 2, pp 918–921 (July)
23. Lin YS (2004) MPEG-1 Layer III audio codec optimization and implementation on a DSP chip. Master's thesis, National Chiao-Tung University, Hsinchu, Taiwan
