This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE Globecom 2010 proceedings.

On Unified Intra/Inter Coding and Signature/Hash Authentication Diversity for Efficient and Secure Wireless Video Transmission

Wei Wang, Member, IEEE, Michael Hempel, Member, IEEE, Dongming Peng, Member, IEEE, Hamid Sharif, Senior Member, IEEE, Honggang Wang, Member, IEEE, and Hsiao-Hwa Chen†, Fellow, IEEE

Abstract—Joint exploration of intra/inter video coding versatility in the signal processing domain and signature/hash diversity in the information security domain has not been well investigated in the literature. In this paper, we show that multimedia stream authentication quality can be improved significantly by exploring these diversities. Specifically, we propose a new rate-distortion framework that considers intra/inter and signature/hash diversity to simultaneously provide video service quality, communication rate efficiency and video content integrity for wireless video streaming. This research offers a unique fusion of inter/intra video coding versatility and signature/hash authentication diversity for video authentication and traffic control. Simulation results demonstrate the effectiveness of the proposed scheme in achieving robust video authentication quality under a limited rate constraint.

Index Terms—Wireless Networks, Multimedia Communications, Stream Authentication, Signature-Hash Diversity

Wei Wang is with South Dakota State University, USA. Michael Hempel, Dongming Peng, and Hamid Sharif are with the University of Nebraska-Lincoln, USA. Honggang Wang is with the University of Massachusetts Dartmouth, USA. Hsiao-Hwa Chen† (the corresponding author) is with the Department of Engineering Science, National Cheng Kung University, Tainan City, Taiwan. This research was supported in part by U.S. National Science Foundation grant No. 0707944 for wireless sensor networks research, Taiwan National Science Council research grant No. 97-2219-E-006-004, faculty start-up support of the University of Massachusetts Dartmouth in ECE, and faculty start-up support of South Dakota State University in EECS. The paper was submitted on March 8, 2010.

I. INTRODUCTION

The proliferation of wireless computer networks and security-sensitive multimedia applications has accelerated the recent development of secure wireless multimedia streaming and enabled new application domains such as content-sensitive wireless business video conferencing, personal finance using PDA or 3G/4G-based mobile devices, wireless robotic video surveillance networks, and ubiquitous healthcare using wireless sensors. In such applications, the service quality of video streams, the authenticity and non-repudiation guarantee, and limited wireless network capacity and channel bandwidth are all critical constraints.

The research in [1] proposed an intra/inter mode switching approach to reduce video distortion in packet loss networks. An algorithm was developed in that approach to estimate the overall distortion expectation at the receiver, and decoder distortion was computed recursively at pixel-level precision. The authors of [2] proposed a dual-frame buffer-based approach to improve video compression error robustness over lossy packet networks. The frame buffers were designed together with intra-mode, inter-short-term mode and inter-long-term mode switching algorithms, and the latency-distortion performance was significantly improved. The research in [3] presented an efficient intra/inter mode selection model for robust video coding. Given an anticipated packet loss rate, the model was designed at the macroblock level to stop error propagation in consecutive video frames by controlling the inter-intra prediction preference of each frame. Related works regarding robust video coding based on intra/inter mode switching can be found in [4]-[6].

On the other hand, stream-level authentication has been proposed recently to provide strong and mathematically verifiable content security for multimedia over networks impeded by packet loss [7], [8]. The research in [8] proposed a content-aware stream authentication approach for robust JPEG2000 image verification. In that approach, more authentication redundancy in terms of crypto-hash tags was allocated to important image packets, and the distortion was minimized at the receiver. The research in [9] proposed a new definition of authenticated video as the video decoded from those packets whose authenticity has been verified. The rate-distortion optimization problem was reformulated as a rate-distortion-authentication optimization problem based on this new definition, where the source coding rate overhead and the authentication bit rate overhead were considered jointly. Other related works regarding multimedia stream authentication can be found in [7] and [10]. In our previous works [11], [13], we proposed an energy-distortion-authentication framework to provide efficient stream authentication in wireless sensor networks. The importance of matching authentication graph design to multimedia decoding dependency was studied in [12], showing the criticality of joint authentication and codec design.

However, all aforementioned works focus either on pure rate-distortion optimization over error-prone networks without considering security issues, or on multimedia authentication without studying intra/inter video coding mode diversity. Due to multidisciplinary challenges, the tradeoffs between source coding rate, stream authentication bit rate, and communication protocol overhead have rarely been studied in the literature. In addition, the joint consideration of intra/inter

978-1-4244-5637-6/10/$26.00 ©2010 IEEE


video coding versatility for communication error resilience and signature-hash diversity for authentication robustness has largely been ignored by the research community. In contrast to the above approaches, we propose a new scheme to improve wireless video authentication quality by jointly designing intra/inter video coding and signature-hash mode selection. The contribution of this research is the unique fusion of intra/inter mode selection in the video coding domain with signature-hash mode selection in the authentication domain, which performs well in achieving efficient, authenticable and robust wireless video transmission.

The rest of the paper is outlined as follows. In Section II, we identify the quality authentication problem for wireless video transmissions. In Section III, we present the analysis of our proposed scheme, which fuses video coding and authentication. In Section IV, we propose the design simplification of the presented scheme. We show simulation results and performance evaluation in Section V. Finally, we present our conclusion in Section VI.

II. QUALITY AUTHENTICATION PROBLEM FORMULATION

The overall problem can be formulated in a cross-layer fashion to approach optimal video quality with integrity and energy efficiency guarantees. The goal of the proposed scheme is to jointly optimize three parameter categories in order to achieve the best-effort video authentication quality: video coding, stream authentication, and network resource allocation. The cost of joint video coding and authentication is defined in Equation (1). The details of video authentication quality will be presented in Section III.

$$\text{Minimize} \; \{ -U[D] + \lambda (R - R_{max}) \} \qquad (1)$$

In this cross-layer problem, the source of bit rate overhead is directly related to three factors illustrated in Figure 1: source coding rate, extra authentication bit rate, and communication protocol overheads. In multimedia wireless networks or wireless multimedia sensor networks, the bandwidth resource allocated to each user is limited and could change from time to time due to scheduling algorithms and the time-varying wireless channel. These three factors thus describe a tradeoff between video authentication quality and bit rate limitations:

• A large source coding rate improves the video quality and reduces video compression distortion;
• A large authentication bit rate increases the robustness of video verification in lossy wireless channels and reduces authentication distortion;
• A large network protocol overhead (e.g., error correction coding and packet retransmission) may increase video transmission robustness and reduce video communication distortion.

However, all three factors also contribute a significant amount of bit rate overhead, which consumes more bandwidth in resource-constrained wireless networks. In this paper we focus on the source coding rate control and authentication bit rate control by jointly adjusting the intra-inter video coding mode selection and signature-hash authentication mode selection. The network protocol overhead control, which can be formulated as a resource allocation problem, will be studied in our future research.

Fig. 1. Resource-constrained authentication-distortion optimality control. (Figure labels: Intra-Inter Coding Versatility; Signature-Hash Authentication Diversity; Source Coding Rate; Authentication Bits; Protocol Overhead; User 0 Bandwidth, User 1 Bandwidth, ..., User N-1 Bandwidth; Laptop; AP; PDA.)

III. JOINT VIDEO CODING-AUTHENTICATION ANALYSIS

The goal of this cross-layer optimization is to maximize the utility $U[D]$, i.e., the expected video authentication quality. In this paper the video authentication quality utility is defined, in a way similar to [11], as the distortion reduction summation of each individual video packet which can be decoded and verified at the receiver, and is

$$U[D] = \sum_{i=0}^{N-1} D_i (1 - \rho_i) \prod_{j \in d_i} (1 - \rho_j) \prod_{k \in v_i} (1 - \rho_k) \qquad (2)$$

where $N$ denotes the total number of video packets, $d_i$ denotes the set of the decoding predecessors of the $i$th video packet, and $v_i$ denotes the set of verification or authentication predecessors of the $i$th video packet excluding those packets in $d_i$. The communication bit rate overhead $R$ is defined as the summation of all component bit overheads of each video packet for source coding, authentication and network protocol, shown as follows:

$$R = \sum_{i \in [0, N-1]} \left| L_{src_i} + L_{aut_i} + L_{pro} \right| \qquad (3)$$

There exists a mutual tradeoff between the source coding bits, authentication bits and protocol overhead. Since the total bit rate allocated to each user is limited, increasing any one of these three bit rates will result in a corresponding reduction in the bit rates of the other two. The packet error rate $\rho_i$ of the $i$th packet can be easily estimated given the channel bit error rate $e$ and the component bit rate overheads:

$$\rho_i = 1 - (1 - e)^{L_{src_i} + L_{aut_i} + L_{pro}} \qquad (4)$$

Assume we have $N$ pictures in the video sequence, and each picture has an option of selecting the video coding mode $\theta_i \in \{0, 1\}$ and authentication mode $\partial_i \in \{0, 1\}$. The goal of the proposed scheme can be simplified as finding the $\theta_i$ and $\partial_i$ for each picture that minimize the total cost as defined in Equation (1), i.e., $[\theta, \partial] = \arg\min \{ -U[D] + \lambda (R - R_{max}) \}$.
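As a rough illustration of how Equations (2) to (4) fit together, the following Python sketch evaluates the packet error rate, the utility and the total bit overhead for a small stream. The packet sizes, distortion values and predecessor sets are hypothetical numbers chosen only for the example; the function names are not from the paper.

```python
from math import prod


def packet_error_rate(e, l_src, l_aut, l_pro):
    """Eq. (4): probability that at least one bit of the packet is corrupted."""
    return 1.0 - (1.0 - e) ** (l_src + l_aut + l_pro)


def utility(D, rho, d, v):
    """Eq. (2): expected distortion reduction over decodable and verifiable packets."""
    total = 0.0
    for i in range(len(D)):
        ok_self = 1.0 - rho[i]                          # packet i itself arrives intact
        ok_decode = prod(1.0 - rho[j] for j in d[i])    # all decoding predecessors arrive
        ok_verify = prod(1.0 - rho[k] for k in v[i])    # all remaining verification predecessors arrive
        total += D[i] * ok_self * ok_decode * ok_verify
    return total


def total_rate(l_src, l_aut, l_pro):
    """Eq. (3): sum of source, authentication and protocol bits over all packets."""
    return sum(s + a + l_pro for s, a in zip(l_src, l_aut))


# Hypothetical three-packet stream: one I-frame with a signature,
# followed by two P-frames carrying hash tags.
e = 1e-5                      # channel bit error rate
l_src = [8000, 1500, 1500]    # assumed source coding bits per packet
l_aut = [1024, 160, 160]      # signature / hash tag sizes in bits
l_pro = 48                    # protocol overhead per packet
D = [30.0, 10.0, 10.0]        # assumed distortion reduction per packet

rho = [packet_error_rate(e, s, a, l_pro) for s, a in zip(l_src, l_aut)]
d = [[], [0], [0, 1]]         # decoding predecessors d_i
v = [[], [], []]              # verification predecessors v_i not already in d_i

U = utility(D, rho, d, v)
R = total_rate(l_src, l_aut, l_pro)
print(f"rho = {[round(r, 4) for r in rho]}, U[D] = {U:.3f}, R = {R} bits")
```

Because the verification chain here coincides with the decoding chain, every $v_i$ is empty; a stream whose hash links point at packets outside $d_i$ would list them in `v`.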


For each video picture, the selection of video coding mode and stream authentication mode results in different rate-distortion performance and error resilience. Let $L_I$ and $L_P$ denote the source coding rate of picture $i$ using the intra-prediction and inter-prediction modes, respectively. Let $L_S$ and $L_H$ denote the stream authentication rate of picture $i$ using the signature and hash modes, respectively. It is clear that the mode selection vector $\{\theta_i, \partial_i\}$ is directly related to both $U[D]$ and $R$, and thus is a factor influencing the total joint rate-distortion cost $-U[D] + \lambda (R - R_{max})$.
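To make the size of this search space concrete, the sketch below enumerates every coding/authentication mode combination for a very short sequence and picks the one with the lowest joint cost. It rests on several assumptions that are not in the paper: decoding and verification chains are simple runs back to the nearest preceding intra or signature packet, the first packet anchors the chain when no such packet exists, no extra hash links are used, and all numeric values (including $\lambda$) are hypothetical.

```python
from itertools import product


def chain_start(bits, i):
    """Nearest index k <= i with bits[k] == 1; packet 0 is used as a fallback anchor."""
    for k in range(i, -1, -1):
        if bits[k] == 1:
            return k
    return 0


def joint_cost(theta, sig, D, L_I, L_P, L_S, L_H, L_pro, e, lam, R_max):
    """Cost -U[D] + lam * (R - R_max) for one pair of mode vectors."""
    n = len(theta)
    bits = [(L_I if theta[i] else L_P) + (L_S if sig[i] else L_H) + L_pro
            for i in range(n)]
    survive = [(1.0 - e) ** b for b in bits]      # 1 - rho_i for each packet
    U = 0.0
    for i in range(n):
        # Packet i counts only if its whole decoding and verification chain arrives.
        start = min(chain_start(theta, i), chain_start(sig, i))
        p = 1.0
        for j in range(start, i + 1):
            p *= survive[j]
        U += D[i] * p
    R = sum(bits)
    return -U + lam * (R - R_max)


N = 4
params = dict(D=[10.0] * N, L_I=3000, L_P=600, L_S=1024, L_H=160,
              L_pro=48, e=1e-5, lam=1e-3, R_max=24000)

# Exhaustive search: 2^N coding vectors times 2^N authentication vectors.
candidates = [(joint_cost(t, s, **params), t, s)
              for t in product((0, 1), repeat=N)
              for s in product((0, 1), repeat=N)]
best_cost, best_theta, best_sig = min(candidates)
print(len(candidates), "candidates; best:", best_theta, best_sig, round(best_cost, 3))
```

Even at $N = 4$ the search already visits 256 combinations, and each extra picture multiplies that count by four.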

IV. DESIGN SIMPLIFICATION

As analyzed in the previous section, there are two possible values for video coding and authentication mode selection, respectively, for each video packet, leading to a combination of four states. Since there are, as per our assumption, a total of $N$ packets and each packet has four states, solving such a global optimization problem is computationally intensive with a complexity of $O(4^N)$, which is infeasible for low-power mobile wireless devices and real-time video streaming. We propose to simplify the mode selection vector $\{\theta_i, \partial_i\}$ to reduce the total cost $-U[D] + \lambda (R - R_{max})$ in the following steps:

A. Unification of Video Coding and Stream Authentication Dependency Matching

In secure wireless video systems, a packet must be successfully decoded and verified before contributing to the total video quality. Packets that are non-decodable or non-verifiable cannot contribute to the total video distortion reduction. The I-frames in a video stream can be decoded independently, whereas decoding of P-frames depends on the decoding of the preceding I-frame. On the other hand, the verification of signature packets is independent of any other packet in the video stream, while the verification of crypto-hash packets depends on the successful verification of the preceding signature packet. Thus it is straightforward to match stream authentication to video coding dependency, by assigning authentication signatures to I-frames and crypto-hash tags to P-frames. The joint coding-authentication design can be simplified with regard to the following constraint: $\theta_i = \partial_i$ for all $i$. This is because signatures in the authentication domain naturally match I-frames in the video coding domain in terms of authentication and decoding independence. Hash packets rely on the verification of the signature in a way similar to decoding-dependent P-frames. So the total computational complexity is reduced to $O(2^N)$ because now there is only one variable that needs to be determined. The utility of authenticated video quality can be simplified as shown in Equation (5) below. In the next subsection, we will discuss how to further simplify such computational complexity using optimal step size identification.

$$U[D] = \sum_{i=0}^{N-1} D_i \prod_{j=k}^{i} (1 - e)^{L_{src_j} + L_{aut_j} + L_{pro}} \qquad (5)$$

where $k$ denotes the index of the packet with the first preceding bit "1" in the vector $\{\theta_i, \partial_i\}$ before picture index $i$. The bit rate component overheads of source coding $L_{src}$ and authentication $L_{aut}$ can be calculated according to the simplified mode selection vector $\{\theta_i, \partial_i\}$ and the incoming redundancy degree $|\pi|$ denoting the incoming authentication crypto-hash links. Specifically,

$$L_{src} = L_I \times \theta + L_P \times (1 - \theta), \quad \theta \in \{0, 1\} \qquad (6)$$

$$L_{aut} = L_S \times \partial + L_H \times (1 - \partial) + L_H \times |\pi|, \quad \partial \in \{0, 1\} \qquad (7)$$

B. Determination of the Mode Selection Step Size

It is clear that, given the simplified mode selection vector $\{\theta_i, \partial_i\}$, $i \in [1, N]$, we can fully determine the distortion reduction utility and bit rate overhead accordingly. To further simplify the mode selection process for wireless video using practical low-cost mobile devices such as PDAs and sensors, a mode shuffling step size $\psi \in [1, N]$ is introduced for reducing the number of independent variables: for a given $\lambda$, $\exists \psi \in [1, N]$ s.t. $\forall \psi_0 \in [1, N]$, $\psi_0 \neq \psi$, we have

$$-U[D](\psi) + \lambda (R(\psi) - R_{max}) \le -U[D](\psi_0) + \lambda (R(\psi_0) - R_{max}) \qquad (8)$$
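A minimal Python sketch of Equations (5) to (7) under the matching constraint $\theta_i = \partial_i$ is given below. It assumes constant $L_I$ and $L_P$ for all packets, no extra incoming hash links ($|\pi| = 0$) unless stated, and interprets $k$ as the nearest intra/signature packet at or before packet $i$; the numeric values are hypothetical.

```python
def source_bits(theta, L_I, L_P):
    """Eq. (6): intra-coded (theta = 1) or inter-coded (theta = 0) source bits."""
    return L_I * theta + L_P * (1 - theta)


def auth_bits(sig, L_S, L_H, n_links=0):
    """Eq. (7): signature or hash tag, plus one hash per incoming link (|pi|)."""
    return L_S * sig + L_H * (1 - sig) + L_H * n_links


def simplified_utility(modes, D, L_I, L_P, L_S, L_H, L_pro, e):
    """Eq. (5) with theta_i == sig_i == modes[i]."""
    bits = [source_bits(m, L_I, L_P) + auth_bits(m, L_S, L_H) + L_pro
            for m in modes]
    U = 0.0
    k = 0                                  # start of the current chain
    for i, m in enumerate(modes):
        if m == 1:
            k = i                          # an I-frame/signature packet starts a new chain
        p = 1.0
        for j in range(k, i + 1):          # the whole chain k..i must arrive intact
            p *= (1.0 - e) ** bits[j]
        U += D[i] * p
    return U


common = dict(D=[10.0] * 6, L_I=3000, L_P=600, L_S=1024, L_H=160,
              L_pro=48, e=1e-5)
U_chain = simplified_utility([1, 0, 0, 1, 0, 0], **common)   # chains of length 3
U_all = simplified_utility([1, 1, 1, 1, 1, 1], **common)     # every packet independent
print(f"U (chains of 3) = {U_chain:.3f}, U (all intra/signature) = {U_all:.3f}")
```

With these assumed sizes the all-intra, all-signature stream yields the higher utility, at the price of a much larger bit rate; that tension is what the step size is meant to balance.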

Given the step size $\psi$ and the number of video pictures $N$, the mode of video coding and authentication for each video packet can be explicitly determined. Typically, the starting packet in the video sequence is coded using the intra-prediction mode, and thus the signature mode is also selected for the starting packet according to the proposed coding-authentication matching. Then we have $\theta_i = 1$ and $\partial_i = 1$ for $i = 0$. Let $\mathrm{mod}(i, \psi)$ denote the modulus of $i$ over $\psi$. For other packets in the video sequence with index $1 \le i \le N - 1$, we have $\theta_i = 1$ and $\partial_i = 1$ for the packets with $\mathrm{mod}(i, \psi) = 0$, and $\theta_i = 0$ and $\partial_i = 0$ for packets with $\mathrm{mod}(i, \psi) \neq 0$. Similar to the mode selection vector, the step size is directly related to both the authentication quality utility and bit rate overhead. A larger step size leads to fewer I-frames and signature packets in the video sequence, resulting in lower bit rate overhead, lower error robustness and lower utility value of video authentication, and vice versa. The minimum step size $\psi$ can be determined by letting $R$ approach the bit rate constraint $R_{max}$, which in turn maximizes the video authentication utility, as shown in Algorithm 1.

Algorithm 1: Determination of the optimal authentication step size.
1: Read inputs: $N$, $e$, $L_S$, $L_H$, $L_{pro}$, $R_{max}$. Read $L_I$ and $L_P$ of each video packet. Initialize the step size $\psi_{opt} = 1$, the recorded rate $R_{opt} = 0$, and the output mode vectors $\theta_i = 0$, $\partial_i = 0$ for $i \in [1, N]$.
2: For $\psi = 1$ to $N/2$, do the following steps 3 - 4 to approximate the optimal step size $\psi_{opt}$.
3: Calculate the video coding and authentication mode vectors according to the given step size.
   Reset $\theta_j = 0$ and $\partial_j = 0$ for $j \in [1, N]$.
   For $j = 1$ to $N$ do
      If ($\mathrm{mod}(j - 1, \psi) == 0$) then $\theta_j = 1$; $\partial_j = 1$; End if;
   End for;
4: Calculate and evaluate the bit rate overhead $R$ according to Equations (3), (6) and (7).
   If ($R > R_{opt}$ and $R < R_{max}$) then record the optimal value of the step size: $R_{opt} = R$; $\psi_{opt} = \psi$; End if;
5: Output $\psi_{opt}$ and the mode vectors $\{\theta_i, \partial_i\}$, $i \in [1, N]$, generated with $\psi_{opt}$.

V. SIMULATION

In this section we perform our simulation study to evaluate the video authentication quality gain of the proposed scheme compared with traditional approaches. Traditional approaches use both a fixed authentication bit count and a fixed authentication step size. In the proposed approach, the authentication step size is adaptive to the communication bit rate constraint. The simulation parameters are listed as follows. At the source coding side, an H.264 codec is selected as the compressor and the standard "akiyo" video sequence is selected as the source input with a configuration of 358×288 pixels for frame width and height. The video coding frame rate is 10 fps. At the communication side, the default channel bit error rate is 1e-5, and the default total communication bit budget is 24000 bits. The network transmission protocol overhead is 48 bits. The authentication signature size is 1024 bits and the crypto-hash tag length is 160 bits.

The baseline performance evaluation of the proposed scheme is demonstrated in Figure 2 and Figure 3. In Figure 2, the output of the proposed authentication step size adaptation is illustrated, given different total bit rate constraints. It is clear that a large step size is selected by the algorithm to meet low bit rate constraints, while a small step size is applied when the bit rate budget is high. Figure 3 shows the actual bit length of each video packet in the video sequence, with the step size varying from 1 to 5 for a video sequence with a total of $N = 10$ pictures. When the step size is small, the video coding and authentication chain is short, leading to better error robustness in an error-prone wireless channel. In the extreme case of $\psi = 1$, the video coding of each packet uses an I-frame mode and the authentication of each packet uses an individual signature. Thus the decoding and verification of each video packet is independent of any other packet in the video sequence. The corresponding cost is the increased bit rate overhead, which results from two major sources: 1) video coding using an I-frame mode relying on robust intra-mode prediction, and 2) packet authentication using signatures rather than cryptographic hashes. When the step size is large, e.g., when $\psi = 5$ in the extreme case, the communication data rate overhead is significantly reduced. The reduction of bit rate overhead has two major sources: 1) video coding in a P-frame mode using efficient inter-prediction, and 2) packet verification using small cryptographic hashes rather than large signatures. Of course, such a reduction in communication bit rate results in high decoding and verification risk in error-prone wireless channels.

It is also worth noting that the bit length of each P-frame is content-dependent, which is determined by the motion and residue difference from the previous pictures. For example, the lengths of the 2nd and 4th packets in sub-figure 2 with $\psi = 2$ are quite different. This is because there is a significant amount of motion between picture 1 and picture 2, while there is very little motion between picture 3 and picture 4. In general, however, the bit length of video packets using a P-frame coding mode and a cryptographic hash authentication mode is much smaller than that of packets using an I-frame coding mode and signature authentication.
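The step size search of Algorithm 1 can be sketched in Python as follows, using the signature, hash and protocol sizes quoted above. The per-packet I-frame and P-frame sizes are hypothetical constants (real P-frame sizes are content-dependent, as just noted), and $|\pi| = 0$ is assumed, so this is only an illustration of the selection logic, not a reproduction of the reported results.

```python
def mode_vector(n, psi):
    """1 marks an I-frame/signature packet, 0 a P-frame/hash packet (0-based index)."""
    return [1 if i % psi == 0 else 0 for i in range(n)]


def total_bits(modes, L_I, L_P, L_S, L_H, L_pro):
    """Eqs. (3), (6), (7) with matched modes and no extra hash links."""
    R = 0
    for i, m in enumerate(modes):
        src = L_I[i] if m else L_P[i]
        aut = L_S if m else L_H
        R += src + aut + L_pro
    return R


def best_step_size(n, R_max, L_I, L_P, L_S, L_H, L_pro):
    """Pick the step size whose rate is closest to, but below, R_max."""
    psi_opt, R_opt = 1, 0                      # defaults, as in step 1 of Algorithm 1
    for psi in range(1, n // 2 + 1):
        R = total_bits(mode_vector(n, psi), L_I, L_P, L_S, L_H, L_pro)
        if R_opt < R < R_max:
            psi_opt, R_opt = psi, R
    return psi_opt, R_opt


N = 10
L_I = [3000] * N          # assumed I-frame size per picture, in bits
L_P = [600] * N           # assumed P-frame size per picture, in bits
L_S, L_H, L_pro = 1024, 160, 48

for budget in (15000, 18000, 24000, 25000, 45000):
    psi, R = best_step_size(N, budget, L_I, L_P, L_S, L_H, L_pro)
    print(f"R_max = {budget:5d} bits -> step size {psi}, R = {R} bits")
```

With these assumed sizes the selected step size grows as the budget shrinks, which is the qualitative trend described for Figure 2. If no step size fits the budget, the sketch falls back to the default $\psi_{opt} = 1$ with $R_{opt} = 0$, mirroring the initialization in Algorithm 1.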

Fig. 2. Adaptive step size with different bit rate constraints.

Fig. 3. Video packet bit rate overhead with variable step sizes.

We performed a comprehensive simulation study to evaluate the video authentication quality gain and bit rate overhead of the proposed approach and compared our results against traditional approaches, as illustrated in Figure 4. The bit error rate is 1e-5 in this simulation. From the results we can see that traditional approaches using a fixed step size achieve constant video authentication quality because the video stream structure does not change with the bit rate budget. The video authentication quality of the proposed approach varies between the minimum and maximum quality as it adapts to the bit rate budget. In contrast, although the minimum-step-size scheme achieves the highest video authentication quality, its actual bit rate far exceeds the bit rate constraint. Conversely, the maximum-step-size scheme achieves an acceptable bit rate overhead while its video authentication quality is the lowest of these three schemes. The video authentication quality under various wireless channel conditions is shown in Figure 5. The video authentication quality performance of the minimum-step-size scheme is not shown in this figure since the corresponding bit rate exceeds the bit rate constraint. It is clear from this figure that


the video authentication quality gain of the proposed adaptive step size scheme is much higher than with a fixed step size scheme, especially in bad channel situations with high bit error rates. This is because the proposed scheme can adapt the length of video coding and stream authentication chains to the bit rate budget, significantly increasing the video coding and authentication error robustness. Thus, the video coding and authentication chain length is shorter than in the maximum-step-size scheme, and the P-frames with cryptographic hash tags have a higher probability of being decoded and verified, leading to higher video authentication quality.

Fig. 4. Video authentication quality and actual bit rate overhead with bit error rate of 1e-5.

Fig. 5. Video authentication quality with various channel conditions under a bit rate constraint of 24000 bits.

VI. CONCLUSION

In this paper we proposed a scheme for improving wireless video authentication quality with limited communication bandwidth. In the scheme, we analyzed the advantages and tradeoffs of video coding and stream authentication design unification. The intra/inter video coding diversity and signature-hash authentication versatility were studied jointly. A design simplification was proposed by identifying the optimal authentication step size. Simulation results demonstrated that the proposed scheme achieves a considerable video quality gain, especially in noisy wireless channel environments.

REFERENCES

[1] R. Zhang, S. Regunathan, K. Rose, "Video coding with optimal inter/intra-mode switching for packet loss resilience," IEEE J. Sel. Areas Commun., vol. 18, no. 6, Jun. 2000.
[2] A. Leontaris, P.-C. Cosman, "Video compression for lossy packet networks with mode switching and a dual-frame buffer," IEEE Trans. Image Process., vol. 13, no. 7, pp. 885-897, Jul. 2004.
[3] S.-K. Im, A.-J. Pearmain, "Pyramid model for macroblock inter/intra mode selection in robust video coding," IEEE Electronics Letters, vol. 43, no. 11, pp. 618-620, May 2007.
[4] C. Kim, C. Kuo, "Feature-based intra-/inter coding mode selection for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 4, pp. 441-453, Apr. 2007.
[5] Z. He, J. Cai, C.-W. Chen, "Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 511-523, Jun. 2002.
[6] P.-J. Lee, Yi. Shih, "Fast inter-frame coding with intra skip strategy in H.264 video coding," IEEE Trans. Consumer Electronics, vol. 55, no. 1, pp. 158-164, Feb. 2009.
[7] Q. Sun, J. Apostolopoulos, C.-W. Chen, S.-F. Chang, "Quality-optimized and secure end-to-end authentication for media delivery," Proc. IEEE, vol. 96, no. 1, pp. 97-110, Jan. 2008.
[8] Z. Zhang, Q. Sun, W. Wong, J. Apostolopoulos, S. Wee, "An optimized content-aware authentication scheme for streaming JPEG-2000 images over lossy networks," IEEE Trans. Multimedia, vol. 9, no. 2, pp. 320-331, Feb. 2007.
[9] Z. Zhang, Q. Sun, W. Wong, J. Apostolopoulos, S. Wee, "Rate-distortion-authentication optimized streaming of authenticated video," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 5, pp. 544-557, May 2007.
[10] Z. Li, Q. Sun, Y. Lian, C.-W. Chen, "Joint source-channel-authentication resource allocation and unequal authenticity protection for multimedia over wireless networks," IEEE Trans. Multimedia, vol. 9, no. 4, pp. 837-850, Jun. 2007.
[11] W. Wang, D. Peng, H. Wang, H. Sharif, H.-H. Chen, "Energy-distortion-authentication optimized resource allocation for secure wireless image streaming," in Proc. IEEE WCNC, pp. 2810-2815, Apr. 2008.
[12] W. Wang, D. Peng, H. Wang, H. Sharif, H.-H. Chen, "Matching stream authentication and resource allocation to multimedia codec dependency with position-value partitioning in wireless multimedia sensor networks," in Proc. IEEE GLOBECOM, 5 pp., Dec. 2009.
[13] W. Wang, D. Peng, H. Wang, H. Sharif, H.-H. Chen, "A multimedia quality-driven network resource management architecture for wireless sensor networks with stream authentication," IEEE Trans. Multimedia, vol. 12, no. 5, pp. 439-447, Aug. 2010.
