PERCEPTUAL VIDEO ENCRYPTION USING MULTIPLE 8×8 TRANSFORMS IN H.264 AND MPEG-4

Siu-Kei AU YEUNG, Shuyuan ZHU, and Bing ZENG
Department of Electronic and Computer Engineering
The Hong Kong University of Science and Technology
Clearwater Bay, Kowloon, Hong Kong, China
{jeffay, eezhshy, eezeng}@ust.hk

ABSTRACT: Our earlier works [1, 2] demonstrated that perceptual video encryption can be achieved effectively by using multiple transforms, where a block size of 4×4 was considered. In this paper, we study the extension to transforms of size 8×8. In this case, the flow-graph structure is more complex, which leaves more room for encryption. In addition, we present a technique for controlling the quality of the encrypted video by carefully selecting the number of rotations in the flow-graph structure of an 8×8 transform. The proposed scheme is first evaluated using the high profile of H.264. It is then tested with the MPEG-4 standard, which relies entirely on the 8×8 transform. Both cases show that promising results can be achieved with the proposed scheme.

Keywords - Perceptual video encryption, multiple 8×8 transforms, H.264 and MPEG-4 standards

1. INTRODUCTION

Perceptual video encryption is becoming an important application for video communications over the Internet. Traditional video encryption is usually carried out at the entropy-coding stage or in the bit-stream domain, which often destroys all visual information once the data is encrypted, i.e., the encrypted visual signal is completely useless (no visual content is visible) without the key. Perceptual video encryption, on the other hand, provides a degraded version of the original video signal, which potential customers can use as a preview before deciding whether to subscribe to the service. Applications such as video-on-demand (VoD) and pay-TV are good examples that are very likely to require perceptual encryption. The simplest perceptual video encryption can be achieved on bit-planes in the spatial domain [3] by selectively encrypting some bit-planes according to the required visual degradation.
Although the idea is simple, the computational complexity is very high because every pixel value needs to be processed, and the processing required at the decoder side is also very high. Another approach, known as “transparent scrambling”, was proposed for the MPEG-2 encoder [4]. Here, a different linear transformation is performed on the RGB values. The major advantage of this approach is that only a minimal modification of the compression process is needed, and a rather large quality degradation results when the encryption key is unknown. However, it is found that this algorithm suffers from certain

978-1-4577-0539-7/11/$26.00 ©2011 IEEE

rounding problems. Moreover, a new transformation on the RGB values may decrease the coding efficiency. Recently, Li et al. found that encrypting only the bit-stream portions that involve fixed-length coding (FLC) is often good enough for perceptual encryption [5]. They also found that it is extremely difficult to control the video quality if bit-stream portions involving variable-length coding (VLC) are encrypted.

In our earlier work [1], we proposed for the first time that the encryption be performed at the transformation stage: select one out of multiple unitary transforms according to the encryption key. More efficient transforms and various encryption scenarios are studied in [2] for this framework. The algorithm was evaluated with the 4×4 transforms used in the H.264 standard, and many promising results were achieved.

In this paper, we extend this framework to transforms of size 8×8. Although 4×4 transforms are easier to implement, their simple structure limits the room available for embedding the encryption. Meanwhile, other advanced video coding standards such as MPEG-4 and the high profile of H.264 still rely on 8×8 transforms. Thus, it is important to study how to extend our 4×4-transform-based encryption framework to the 8×8 case.

2. PERCEPTUAL VIDEO ENCRYPTION BASED ON ALTERNATIVE TRANSFORMS

In our work, we are particularly interested in encryption algorithms that are embedded into the video encoder, which typically consists of four functional blocks: prediction, transformation, quantization, and entropy coding. Embedding the encryption at the entropy-coding stage has been quite popular, e.g., shuffling in the bit-stream domain according to the (secret) encryption key [6], using multiple Huffman tables [7], etc. However, such encryption often makes the encrypted bit-stream undecodable if the key is unknown.
Even if it is still decodable, no visual content will be visible. This leads to the so-called complete (or total) encryption, which is important in some applications (such as financial or military ones). In principle, encryption can also be embedded into any of the remaining stages. However, any change at the prediction or quantization stage is likely to produce an uncontrollable degradation of the coding performance. As a result, we believe that the only stage of the video encoder left for introducing encryption is the transformation. Traditionally, the DCT is always chosen at the transformation stage. This is because the DCT proves to be the best transform when a strong inter-pixel


ICASSP 2011



[Figure 2 graphic: four-stage flow graph (Stage 1 to Stage 4) with inputs x0 to x7, intermediate nodes y0 to y7, and outputs X0 to X7; branch multipliers are ±cos π/4, sin π/4, ±cos θi, and ±sin θi, i = 1, ..., 4.]

Fig. 2. Flow graph of the 8-point (1-D) DCT



[Figure 1 graphic: two-stage flow graph (Stage 1 and Stage 2) with inputs x0 to x3 and outputs X0 to X3; branch multipliers are ±cos π/4, sin π/4, ±cos θi, and ±sin θi, i = 1, 2.]

Fig. 1. Flow graph of the 4-point (1-D) DCT: θ1 = π/4, θ2 = 3π/8.

correlation is detected, which is indeed true in most natural pictures. Nevertheless, the DCT used in any H.264-based or MPEG-4-based encoder is always applied to predicted residual signals (intra-prediction in I-frames or motion compensation in P-frames). This gives us a strong motivation to perform the transformation by selecting one of multiple transforms according to a secret key so as to achieve the encryption.

In our earlier work [1], we proposed to encrypt the video signal by selecting different unitary transforms during encoding. All new transforms used there were derived from the DCT's flow-graph structure (see Fig. 1 for the 4-point case) by selecting a different set of rotation angles (θ1, θ2). The major advantage of this algorithm is that no separate scrambling process is needed at either the encoder or the decoder, which greatly reduces the computational complexity, since the transform is a mandatory step whether or not encryption is performed.

We found in another study [2] that transforms with exactly the same coding efficiency as the DCT can be obtained with 4 pairs of extra rotation angles (δ1, δ2) = (0, 0), (π, 0), (0, π), and (π, π) that are added to the DCT's rotation angles (θ1, θ2) = (π/4, 3π/8). Consequently, we obtain 4 transforms, where (δ1, δ2) = (0, 0) corresponds to the DCT itself, and the encryption can be achieved by choosing one of these 4 transforms according to the encryption key for each row and column within each 4×4 block.

3. PERCEPTUAL VIDEO ENCRYPTION BASED ON 8×8 TRANSFORMS

The design principle for new transforms of size 8×8 is similar to the 4×4 case. We start from the flow graph of the 8-point DCT, shown in Fig. 2, and identify all rotation angles at Stage 4 (the last stage). The 4 angles involved there are θ1 = π/4, θ2 = 3π/8, θ3 = 7π/16, and θ4 = 3π/16. Next, we add an extra angle δi = 0 or π (i = 1, 2, 3, 4) to each of them, which yields 2^4 = 16 transforms. Notice that we end up with the original DCT if δi = 0 for all i.
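The 4-point construction of Section 2 can be illustrated with a small sketch. The exact branch wiring of Fig. 1 is not recoverable here, so the code below assumes one standard two-stage factorization (scaled butterflies followed by two rotations) and checks it against the DCT definition; the function names `flowgraph_dct4` and `reference_dct4` are hypothetical. Under this assumption, adding π to θ1 negates the pair (X0, X2) and adding π to θ2 negates the pair (X1, X3).

```python
import math

def flowgraph_dct4(x, delta1=0.0, delta2=0.0):
    """4-point transform from an assumed two-stage flow graph.

    delta1 and delta2 are the extra angles added to theta1 = pi/4 and
    theta2 = 3*pi/8; (0, 0) should reproduce the orthonormal DCT.
    """
    t1 = math.pi / 4 + delta1
    t2 = 3 * math.pi / 8 + delta2
    s = math.cos(math.pi / 4)  # Stage-1 scaling factor

    # Stage 1: scaled butterflies (sums and differences of mirrored inputs)
    b0, b1 = s * (x[0] + x[3]), s * (x[1] + x[2])
    b2, b3 = s * (x[1] - x[2]), s * (x[0] - x[3])

    # Stage 2: one rotation per angle, each producing a pair of outputs
    X0 = math.cos(t1) * b0 + math.sin(t1) * b1
    X2 = math.sin(t1) * b0 - math.cos(t1) * b1
    X1 = math.sin(t2) * b3 + math.cos(t2) * b2
    X3 = math.cos(t2) * b3 - math.sin(t2) * b2
    return [X0, X1, X2, X3]

def reference_dct4(x):
    """Orthonormal 4-point DCT-II computed directly from its definition."""
    out = []
    for k in range(4):
        c = 0.5 if k == 0 else math.sqrt(0.5)
        out.append(c * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 8)
                           for n in range(4)))
    return out

if __name__ == "__main__":
    sample = [10.0, 20.0, 30.0, 50.0]  # arbitrary illustrative input
    print("DCT      ", [round(v, 3) for v in reference_dct4(sample)])
    for d1, d2 in [(0, 0), (math.pi, 0), (0, math.pi), (math.pi, math.pi)]:
        coeffs = flowgraph_dct4(sample, d1, d2)
        print((round(d1, 2), round(d2, 2)), [round(v, 3) for v in coeffs])
```

Because each of the four variants differs from the DCT only by signs, all of them preserve energy, which is consistent with the statement that their coding efficiency matches that of the DCT.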


The clear advantage of using 8×8 transforms is the greatly increased number of possible combinations of transforms that can be employed in each 8×8 block. This increase is attributed to two factors: (1) more transforms can be generated easily from a space consisting of 4 rotation angles, and (2) even when the same number of transforms is used as in the 4×4 case (e.g., 4 transforms in total), selecting one transform randomly for each of 8 rows (instead of 4 rows in the 4×4 case) greatly expands the number of combinations. In the end, this means that our encryption system is much more difficult for attackers to break.

We first evaluate the system with two approaches named Algorithm-1 and Algorithm-2. In Algorithm-1, we select one transform (out of 16) for all rows in the first dimension (horizontal direction). Then, we select one transform (out of 16) for each column in the second dimension (vertical direction). In Algorithm-2, we relax the constraint set in the first dimension by also selecting one transform for each row.

Figure 3 shows some experimental results of the proposed encryption algorithms (employing the 8×8 transform and following the H.264 standard). Both algorithms clearly provide good encryption when the key is missing. When we decode the signal with the secret key, we find that Algorithm-1 performs exactly the same as the original H.264. Since adding an extra rotation angle π to the DCT's angles only causes a sign flip on a pair of node variables, it is easy to see that Algorithm-1 produces a set of coefficients that is a sign-flipped version of the original DCT coefficients. Because of this, and considering that only a few coefficients remain non-zero after a typical quantization, we believe that Algorithm-1 does not provide enough protection: attackers can concentrate on guessing the signs of a few non-zero coefficients.
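The sign-flip property of Algorithm-1, and the way a per-row selection (Algorithm-2) breaks it, can be checked numerically with a minimal sketch. It models each key-selected transform as the floating-point DCT with one pair of outputs negated per extra angle π, as stated above. Which output pair belongs to which angle (`PAIRS`), the 4-bit key convention, and all function names are assumptions made for illustration; quantization, prediction, and the integer transform of H.264 are left out.

```python
import math

N = 8
# Orthonormal 8-point DCT-II matrix, C[k][n]
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]

# Assumption: output pair negated when pi is added to theta1..theta4
PAIRS = [(0, 4), (2, 6), (1, 7), (3, 5)]

def signs(key):
    """Sign pattern for a 4-bit key; bit i set means delta_(i+1) = pi."""
    s = [1] * N
    for i, pair in enumerate(PAIRS):
        if (key >> i) & 1:
            for k in pair:
                s[k] = -1
    return s

def transform_1d(v, key):
    """Key-selected 1-D transform: DCT followed by the key's sign flips."""
    s = signs(key)
    return [s[k] * sum(C[k][n] * v[n] for n in range(N)) for k in range(N)]

def transform_2d(block, row_keys, col_keys):
    """Horizontal pass with one key per row, then vertical pass with one key per column."""
    rows = [transform_1d(block[i], row_keys[i]) for i in range(N)]
    cols = [transform_1d([rows[i][j] for i in range(N)], col_keys[j])
            for j in range(N)]
    return [[cols[j][i] for j in range(N)] for i in range(N)]

def same_magnitudes(a, b, tol=1e-9):
    """True if a and b agree up to the sign of each coefficient."""
    return all(abs(abs(a[i][j]) - abs(b[i][j])) < tol
               for i in range(N) for j in range(N))

# Toy block with positive sample values (illustrative data only)
block = [[16 + 3 * i + 5 * j + (i * j) % 7 for j in range(N)] for i in range(N)]

plain = transform_2d(block, [0] * N, [0] * N)                      # ordinary 2-D DCT
alg1 = transform_2d(block, [5] * N, [3, 9, 0, 15, 6, 12, 1, 10])   # one key for all rows
alg2 = transform_2d(block, [1, 0, 0, 0, 0, 0, 0, 0], [0] * N)      # keys differ across rows

if __name__ == "__main__":
    print("Algorithm-1 is a sign-flipped DCT:", same_magnitudes(alg1, plain))
    print("Algorithm-2 is a sign-flipped DCT:", same_magnitudes(alg2, plain))
```

In this model, a common row key multiplies every sample of a given column by the same sign before the vertical pass, so the sign factors out. With different row keys the signs vary down a column, and the vertical pass mixes them.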
In Algorithm-2, since one transform is selected for each row in the first dimension, the transform coefficients generated in the end differ from any sign-flipped version of the original DCT coefficients. This greatly enhances the encryption power. However, as observed from the R-D curves shown in Fig. 3, Algorithm-2 suffers from a small quality drop (0.3 - 0.5 dB). This is because the use of different transforms in the first dimension generates misalignment in the second dimension, which may create relatively larger high-frequency coefficients.

To get a better balance between encryption power and coding efficiency, we can modify Algorithm-1 by allowing 4 more sign flips when we perform the transform in the second dimension: 2 in the top half of Stage 2 and 2 in the bottom half of Stage 3 in Fig. 2. We name this scheme Algorithm-3. This new algorithm also provides a different set of transform coefficients

[Figure 3 graphic: three R-D panels (Mobile, Foreman, Stefan) plotting PSNR against BPP, with curves for decryption with the key and without the key for Algorithm-1, Algorithm-2, and Algorithm-3.]

Fig. 3. R-D performance for the proposed perceptual encryption system for 8x8 transform (H.264)

compared with the DCT. At the same time, it slightly reduces the quality drop, as can be seen from Fig. 3. Some video clips can be found at http://ihome.ust.hk/~jeffay/ICASSP2011/ for a visual (subjective) evaluation. In terms of encryption capability, Algorithm-1, Algorithm-2, and Algorithm-3 proposed for the 8×8 block size provide search spaces of 2^40, 2^64, and 2^72, respectively. These are much larger than in the 4×4 case reported in [1, 2].

4. QUALITY CONTROL OF PERCEPTUAL VIDEO ENCRYPTION BASED ON 8×8 TRANSFORMS

In any perceptual video encryption, the video is encrypted such that its quality is visually poor if the encryption key is unknown to the client. In practice, different applications have different quality requirements. It is thus important to give the service provider an option to control how bad the encrypted video is when the key is unknown. This quality control issue is discussed below.

A typical way to control the quality of an encrypted video is to control the amount of encrypted data. For example, in our proposed system, one can decide to apply the alternative transform every n frames so that n becomes the control parameter. Clearly, the quality increases as n increases. This method makes use of the motion compensation mechanism: if the (n−1)th frame is decoded without the key, the motion compensation will often be incorrect, leading to poor quality in the nth frame even if it remains unencrypted. However, if n is small, say 2-3, the receiver can decode by ignoring the encrypted frames and reconstructing the video from the unencrypted frames. Although drifting will still occur, the video quality will be much improved.

We have studied the quality control problem for the 4×4 case. There, since only 2 rotation angles can be changed, there is little room to control the quality.
Now, we have more rotation angles to adjust in the 8×8 case, so the quality control can be done over a much wider range. For instance, quality control can be achieved by allowing some rotation angles to change while fixing the others, i.e., instead of letting all θi in Fig. 2 change, we select only some of them to change (by adding an extra angle π according to the encryption key). We assume that the standard IDCT is always used when the encryption key is unknown at the decoder side.

Angles rotated          Foreman    Mobile    Stefan
None (a)                  40.56     39.09     40.33
θ4 (b)                    18.33     13.21     14.04
θ3                        13.17     11.56     11.12
θ3 θ4                     11.57     10.11     12.51
θ2                        15.57     12.47     13.82
θ2 θ4                     13.71     11.34     11.22
θ2 θ3 (c)                 11.99     10.02      9.88
θ2 θ3 θ4                  12.21      9.96      9.45
θ1                        11.18      9.53      9.56
θ1 θ4                     10.03      9.51      9.22
θ1 θ3                     10.87      9.62     10.62
θ1 θ3 θ4                  10.75      9.14     10.06
θ1 θ2                      9.00      9.27     10.48
θ1 θ2 θ4                  10.63      9.06     10.63
θ1 θ2 θ3                   9.38      8.44     10.50
θ1 θ2 θ3 θ4 (d)            8.61      9.49     10.06

Table 1. PSNR performances (in dB) of various selections of θi for encryption.
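The effect of choosing which angles to encrypt can be explored with a toy 1-D sketch. It does not reproduce the experiments behind Table 1: it uses a single made-up row of pixel values, a floating-point DCT, and no prediction or quantization. The mapping from each angle to a coefficient pair (`PAIRS`) and the function names are assumptions. The receiver without the key is modeled as applying the standard IDCT to the sign-flipped coefficients.

```python
import math

N = 8
# Orthonormal 8-point DCT-II matrix, C[k][n]
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]

# Assumption: coefficient pair negated when pi is added to theta_i
PAIRS = {1: (0, 4), 2: (2, 6), 3: (1, 7), 4: (3, 5)}

def dct(v):
    return [sum(C[k][n] * v[n] for n in range(N)) for k in range(N)]

def idct(X):
    return [sum(C[k][n] * X[k] for k in range(N)) for n in range(N)]

def flip(X, angles):
    """Negate the coefficient pairs of the selected angles (delta_i = pi)."""
    out = list(X)
    for a in angles:
        for k in PAIRS[a]:
            out[k] = -out[k]
    return out

def psnr(ref, test, peak=255.0):
    """PSNR in dB for 8-bit samples; infinite when the signals are identical."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# Made-up smooth pixel row (illustrative data only)
row = [100.0, 103.0, 104.0, 110.0, 113.0, 118.0, 119.0, 127.0]

def psnr_without_key(angles):
    """Encrypt the selected angles, then decode with the standard IDCT."""
    return psnr(row, idct(flip(dct(row), angles)))

if __name__ == "__main__":
    for angles in [(4,), (3,), (2,), (1,), (1, 2, 3, 4)]:
        print("angles", angles, "PSNR without key:",
              round(psnr_without_key(angles), 2), "dB")
```

On a row with a large mean, flipping the pair that contains the DC coefficient produces by far the largest error, which matches the explanation that θ1 controls the DC component. Applying `flip` a second time with the same angles restores the DCT coefficients, which corresponds to decoding with the key.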

Table 1 presents the PSNR performance of various selections of θi for encryption. Clearly, a significant quality loss is observed when any of these four rotation angles is encrypted (i.e., an extra angle π is added according to the encryption key). Among the four angles, θ1 is found to have the biggest influence. This is because θ1 controls the DC component of each block. Figure 5 shows the visual results of four representative selections in Table 1. Although a significant PSNR drop results in every case (see Table 1), the visual experiences are quite different, e.g., pictures in Case (b) are visually much more pleasant than those in Cases (c) and (d). From these results, we suggest choosing whether or not to encrypt θ1 so as to achieve a high or low encryption capability, while the three other angles can be used for fine adjustment. In addition, we can introduce a different adjustment to θi (instead of π) so that the overall system can operate over an even larger range of encryption strength.

5. PERCEPTUAL VIDEO ENCRYPTION FOR MPEG-4

So far, we have focused on the H.264-based case. This is because prediction is involved in both I-frames and P-frames in H.264, so using a transform other than the DCT on the residual signal usually does not decrease the coding efficiency. If we apply the same encryption mechanism to another advanced video coding standard, MPEG-4, we need to be careful because the intra-prediction in MPEG-4 is done in the transform domain. To avoid any complicated processing (which is one of our ongoing research issues), we simply assume that all I-frames remain unencrypted. All experiments use Xvid ver. 1.2.2 as the MPEG-4 encoder, and the three encryption algorithms presented in Section 3 have been tested. The R-D results are shown in Fig. 6. Compared with the H.264 case, one

[Figure 5 graphic: decoded snapshots of Foreman, Mobile, and Stefan, each with panels (a), (b), (c), and (d).]

Fig. 5. Snapshots of the proposed encryption system with different rotation angles θ changed (panels (a), (b), (c), and (d) correspond to the cases marked in Table 1).

[Figure 6 graphic: three R-D panels (Mobile, Foreman, Stefan) plotting PSNR against BPP, with curves for decryption with the key and without the key for Algorithm-1, Algorithm-2, and Algorithm-3.]

Fig. 6. R-D performance for the proposed perceptual encryption system for 8x8 transform (MPEG-4)

can observe that a (much) higher PSNR is obtained when the encryption key is unknown (roughly 15-20 dB, up from about 10 dB). Clearly, the major reason is that each I-frame is unencrypted, so it positively influences all subsequent frames (via motion compensation). On the other hand, when we visually inspect the videos decoded without the key (http://ihome.ust.hk/~jeffay/ICASSP2011/), we believe that the visual quality loss is still large enough for the encryption purpose.

6. CONCLUDING REMARKS

In this paper, we extended our perceptual video encryption framework, which was developed for the 4×4 block size, to transforms of size 8×8, and implemented it for both the H.264 and MPEG-4 standards. Experimental results showed that it performs as well as in the 4×4 case. In addition, we proposed a simple way to control the quality of an encrypted video: selecting some of the rotation angles in the last stage of Fig. 2. We explained that quality control is more effective in the 8×8 case because more rotation angles are involved at the last stage of its flow graph. The issues that will be considered in our follow-up work include the following:

• As mentioned before, most existing video encryption algorithms operate after the transform stage, whereas our proposed one operates at the transform stage. It would therefore be very interesting to see whether and how our alternative-transform-based algorithm can be integrated into the existing ones so as to provide more powerful encryption.
• Integer-based transforms are commonly used in H.264 and MPEG-4. We have studied this problem for the 4×4 case in


H.264 [8]. It will be an important task for us to consider this problem for the 8×8 case for both H.264 and MPEG-4.

Acknowledgement: This work has been supported partly by an RGC research grant from the HKSAR government.

REFERENCES

[1] S. K. Au Yeung, S. Zhu, and B. Zeng, “Partial video encryption based on alternative transform,” IEEE Signal Processing Letters, vol. 16, no. 10, pp. 893-896, 2009.
[2] S. K. Au-Yeung, S. Zhu, and B. Zeng, “Design of new unitary transform for perceptual video encryption,” accepted to IEEE Trans. on Circuits and Systems for Video Technology.
[3] S. Lian, Multimedia Content Encryption: Techniques and Applications, Boca Raton: CRC Press, 2009.
[4] M. Pazarci and V. Dipcin, “A MPEG-2-transparent scrambling technique,” IEEE Trans. on Consumer Electronics, vol. 48, no. 2, pp. 345-355, 2002.
[5] S. Li, G. Chen, A. Cheung, B. Bhargava, and K. Lo, “On the design of perceptual MPEG-video encryption algorithms,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 2, pp. 214-223, 2007.
[6] L. Qiao and K. Nahrstedt, “Comparison of MPEG encryption algorithm,” Int. Journal on Computer and Graphic, vol. 22, no. 4, pp. 437-448, 1998.
[7] C. Wu and C. Kuo, “Design of integrated multimedia compression and encryption systems,” IEEE Trans. on Multimedia, vol. 7, no. 5, pp. 828-839, 2005.
[8] S. K. Au-Yeung, S. Zhu, and B. Zeng, “Partial video encryption based on alternative integer transforms,” in IEEE Int. Symp. on Circuits and Systems, Paris, France, May 30 – June 2, 2010.
