COMPRESSED-DOMAIN SPATIAL ADAPTATION RESILIENT PERCEPTUAL ENCRYPTION OF LIVE H.264 VIDEO

Razib Iqbal, Sharmeen Shahabuddin, Shervin Shirmohammadi
Distributed Collaborative Virtual Environments Research Laboratory
School of Information Technology and Engineering, University of Ottawa, Canada
{ riqbal | sshahabuddin | shervin }@discover.uottawa.ca

ABSTRACT

In this paper, we present a slice-based encryption and spatial adaptation technique for H.264 videos. The encrypted video is resilient to spatial adaptation. All the encryption and adaptation steps are performed without any cascaded decoding and re-encoding operation. We use MPEG-21 gBSD as a metadata description of the compressed bitstream. This metadata is used to execute all the necessary steps in the compressed domain. The proposed encryption scheme can be applied to a compressed bitstream before or after adaptation operations. Moreover, the resultant bitstream conforms to the H.264/AVC specification. Proof of the proposed encryption scheme and overall performance results obtained from the implemented system are also presented here.

1. INTRODUCTION

Today's multimedia communication platforms have been greatly shaped by the coexistence of a number of complementary as well as competing access, delivery and consumption technologies. Key developments and trends from the last few years in wireless communications and mobility, standardized multimedia content, the Internet and the World Wide Web (WWW) have set the scene for ubiquitous multimedia consumption. While there is a wealth of multimedia data on the Internet today, in this heterogeneous world the delivery path for multimedia content to a multimedia terminal is not straightforward. To match varying network conditions, user requirements or device constraints, rich media content needs to go through adaptation operations. Additionally, to prevent piracy, the entertainment industry uses Digital Rights Management (DRM) technologies to protect content. However, these content security technologies also limit the content adaptation possibilities. For example, for the transmission of sensitive video content, or to ensure revenue for a copyrighted item, video content can only be adapted at trusted adaptation servers, since it has to be fully decrypted before the necessary adaptation operations are performed. As a result, it is not convenient to adapt such encrypted content on the fly.

In this paper, we detail a scheme for compressed-domain spatial adaptation and adaptation-resilient perceptual encryption of live H.264/AVC videos. Our goal is to partially degrade the visual quality of the video content by encryption in order to restrict the full quality to legitimate users only. To achieve this goal, we use metadata of the compressed bitstream for the necessary adaptation operations. Moreover, to perform the adaptation operations in an intermediary node, we emphasize structured metadata-based adaptation utilizing the MPEG-21 generic Bitstream Syntax Description (gBSD) [1]. gBSD-based adaptation ensures codec independence; therefore, any MPEG-21 compliant host can adapt any video format instantaneously. Hence, an adaptation server need not be aware of the video codec, and the adaptation operations can be performed in the compressed domain without any decoding and re-encoding. Moreover, in our scheme, the syntax of the bitstream conforms to the H.264 specification after adaptation and encryption. Such a scheme is very beneficial for end-to-end video delivery to heterogeneous environments in any video distribution system. The encryption system can be integrated to hide or secure information (e.g. in video surveillance systems) or to reduce the perceptibility of video content (e.g. for a free preview of a pay-per-view movie). The proposed adaptation and encryption system can also be used as a tool/plug-in for Web services.

2. LITERATURE REVIEW

Video adaptation is a promising direction for the Universal Multimedia Access (UMA) concept, in which a user may access media content anywhere, anytime, and in any way s/he wants. To achieve a target video quality for heterogeneous environments, researchers have investigated multiple-proxy based transcoding solutions, e.g. [2].
However, it is more practical to perform adaptation operations at a single adaptation service point, in order to save time and cost, rather than forwarding data packets to different proxies where mostly cascaded decoding-adaptation-re-encoding operations are performed (e.g. [3]). Our compressed-domain adaptation framework [4] is the very first benchmark for adapting video content in any MPEG-21 compliant host, where even ordinary off-the-shelf personal computers can perform the required adaptation in real time. For spatial adaptation, the authors of [5] employ a set of transcoding techniques based on the H.264 standard to crop the region of interest in each frame. In this process, the compressed H.264 bitstream is first decoded to extract the region of interest. The cropped video stream is then re-encoded. Because this scheme applies cascaded operations for spatial adaptation, it is suitable for offline applications only. Our proposed spatial adaptation approach also crops frames, but without any cascaded operations.

In [6], the authors mention several emerging research topics and open issues related to digital content adaptation, including permissible and secure adaptation, where a secure adaptation of possibly encrypted content needs to be performed. We attempt to address this important issue in this paper. A comprehensive and promising work on H.264/AVC video encryption is reported in [7], where the authors use two algorithms: one for encrypting the headers, and a second one for encrypting the slice payload. The objective of the authors in [7] is similar to ours; however, our additional goal is to devise an encryption technique that will survive compressed-domain adaptation operations in any intermediary node. Finally, in our previously published work [4], we detailed compressed-domain temporal adaptation, authentication and macroblock-based encryption, whereas in this paper we report compressed-domain spatial adaptation and slice-based encryption. The encryption scheme detailed here will survive both temporal and spatial adaptation operations. The work presented here is an integral part of our ongoing research on distributed camera networks for video surveillance enabling smart handheld devices [8].

3. PROPOSED METHODOLOGY

In H.264, a video frame can be split into one or several slices, where slice sizes are flexible. Slices are self-contained and can be decoded without using data from other slices. Every slice consists of a group of compressed macroblocks and begins with a slice header. Like any other video codec, the H.264 standard does not provide any security feature itself. For compressed-domain operations, we have analyzed how H.264 classifies the bits that comprise a compressed video sequence. Instead of detailing the features of H.264 video, we abstract the features that we have used to achieve our objective. However, we refer interested readers to [4] and [7] for a detailed description of H.264 features and syntax.

3.1. Video Preparation

The concept of compressed-domain video processing (from video preparation to adapted video generation) can be seen as two parts. Part 1, compressed video and metadata generation, is performed during the encoding phase. Part 2, adaptation, is performed in some intermediary node and can be further logically divided into two subprocesses, namely metadata transformation and adapted video generation. Adaptation in an intermediary node requires the Digital Item (i.e. the video bitstream along with its gBSD) to be available. This Digital Item serves as the original content for the resource server or content provider on the delivery path. Therefore, the generation of the Digital Item is one of the important tasks in the initial stage. We generate the gBSD during the encoding of the uncompressed video by adding a gBSD generation module to the H.264 encoder; specifically, we have modified the encoder [9] to generate gBSD information while encoding a raw video. To facilitate spatial adaptation in the compressed domain, we divide each video frame into slices, where some slices are marked as essential and the rest are tagged as disposable. No macroblock in a disposable slice is used as a reference for any macroblock in the essential slices, since disposable slices are candidates for removal to achieve a target resolution. The gBSD, containing the hierarchical information pertaining to an encoded bitstream (e.g. the starting byte and length of each frame, the slice sizes, and the markers for essential and disposable slices), is written while encoding that video. A sample gBSD file is shown in Figure 1.



Figure 1. Sample gBSD of a compressed H.264 bitstream
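To make the kind of information carried by such a description concrete, the following is a minimal Python sketch of writing frame and slice entries while "encoding". It is an illustration under stated assumptions, not the authors' gBSD generation module: the element and attribute names (`gBSDUnit`, `syntacticalLabel`, `start`, `length`, `marker`) are simplified from the general shape of MPEG-21 gBSD, the function name `build_gbsd` is hypothetical, and all byte offsets are made-up values.

```python
import xml.etree.ElementTree as ET


def build_gbsd(frames):
    """Build a simplified gBSD-like tree.

    frames: one list per frame of (start_byte, length, marker) tuples,
    where marker is "essential" or "disposable". Slices of a frame are
    assumed to be stored contiguously in the bitstream.
    """
    root = ET.Element("gBSD")
    for frame_no, slices in enumerate(frames):
        frame = ET.SubElement(root, "gBSDUnit", {
            "syntacticalLabel": "frame",
            "id": str(frame_no),
            "start": str(slices[0][0]),
            "length": str(sum(length for _, length, _ in slices)),
        })
        for start, length, marker in slices:
            ET.SubElement(frame, "gBSDUnit", {
                "syntacticalLabel": "slice",
                "start": str(start),
                "length": str(length),
                "marker": marker,
            })
    return root


# Hypothetical two-frame stream, three slices per frame (illustrative offsets).
example = build_gbsd([
    [(0, 40, "disposable"), (40, 120, "essential"), (160, 40, "disposable")],
    [(200, 30, "disposable"), (230, 90, "essential"), (320, 30, "disposable")],
])
gbsd_xml = ET.tostring(example, encoding="unicode")
```

Keeping the byte position and length of every slice in the description is what later allows an intermediary node to locate slices without parsing the H.264 syntax itself.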

3.2. Slicing strategy

To facilitate spatial adaptation by means of dropping slices, we have devised our own slicing strategies to be applied during encoding. In this paper, we present the "Wide-Strategy" and the "O-Strategy", as shown in Figure 2. The Wide-Strategy is suitable for video surveillance systems, and the O-Strategy is suitable for videos with a smaller area of interest, e.g. talking heads. In Figure 2, shaded slices represent the disposable slices, i.e. slices that may be dropped to achieve the target resolution, and the rest of the slices (i.e. the white areas) represent the essential slices covering a region of interest.

Figure 2. Slicing Strategies (panels: Wide-Strategy, O-Strategy)

3.3. Adaptation

Adaptation is performed in two steps: first, transform the metadata according to the adaptation goal, and second, generate the adapted bitstream using the transformed metadata. Environmental requirements (e.g. available bandwidth, display resolution etc.) are the input to the transformation decision-making mechanism. The MPEG-21 Usage Environment Description (UED) tool is used to gather this information. To achieve the target bitstream, the adaptation module parses the frame and/or slice data information from the adapted gBSD. The gBSD is transformed into an adapted gBSD by means of XSLT [10]. The transformation rules are predefined in the XSL style sheet. The metadata transformation module receives as input the gBSD of the original compressed video bitstream and a style sheet that transforms the gBSD according to the context information, e.g. the device capabilities. The output of this process is a transformed gBSD which reflects the bitstream segments of the target (i.e. adapted) bitstream. However, the transformed gBSD still refers to the bit/byte positions of the original compressed video bitstream, which needs to be parsed in order to generate the adapted bitstream. The adaptation module discards the disposable slices directly from the compressed video stream. It also modifies the Sequence Parameter Set (SPS) of the video stream before sending it to the requester, to indicate that the video has been adapted so that video players display only the desired region.
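The two steps can be sketched as follows. This is a simplified illustration rather than the implemented system: a plain Python filter stands in for the XSLT style sheet, the gBSD is reduced to the hypothetical `gBSDUnit`/`marker` layout shown in the sample string, parameter sets and other non-slice data are ignored, and the SPS modification is omitted. The function names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def transform_gbsd(gbsd_root):
    """Step 1 (metadata transformation): drop slice units marked disposable.

    Stand-in for the XSLT transformation; the input tree is left untouched.
    """
    adapted = copy.deepcopy(gbsd_root)
    for frame in adapted.findall("gBSDUnit"):
        for unit in list(frame.findall("gBSDUnit")):
            if unit.get("marker") == "disposable":
                frame.remove(unit)
    return adapted


def generate_adapted_bitstream(original, adapted_gbsd):
    """Step 2 (adapted video generation): copy only the listed byte ranges.

    The start/length values still refer to positions in the ORIGINAL
    bitstream, so no decoding of the video data is needed.
    """
    out = bytearray()
    for frame in adapted_gbsd.findall("gBSDUnit"):
        for unit in frame.findall("gBSDUnit"):
            start = int(unit.get("start"))
            length = int(unit.get("length"))
            out += original[start:start + length]
    return bytes(out)


# Toy data: one frame, three "slices" of 3, 4 and 3 bytes.
SAMPLE_GBSD = """<gBSD>
  <gBSDUnit syntacticalLabel="frame" start="0" length="10">
    <gBSDUnit syntacticalLabel="slice" start="0" length="3" marker="disposable"/>
    <gBSDUnit syntacticalLabel="slice" start="3" length="4" marker="essential"/>
    <gBSDUnit syntacticalLabel="slice" start="7" length="3" marker="disposable"/>
  </gBSDUnit>
</gBSD>"""
original_stream = b"AAABBBBCCC"
adapted_gbsd = transform_gbsd(ET.fromstring(SAMPLE_GBSD))
adapted_stream = generate_adapted_bitstream(original_stream, adapted_gbsd)
```

In this toy run only the four bytes of the essential slice survive. The point of the design is that the adaptation node works purely on byte ranges from the metadata and never needs to understand the codec.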

3.4. Processing live video

For gBSD transformation using XSLT, the complete gBSD file needs to be loaded before it can be adapted. In live scenarios, this is a shortcoming of the metadata-based adaptation architecture discussed above. Therefore, to offer live streaming on top of the existing implementation, the live video stream is processed as small clips (usually 5 to 20 seconds long) in a pseudo-live fashion.
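A minimal sketch of this pseudo-live chunking is given below, assuming a constant frame rate and treating every frame as a valid cut point (a real encoder would additionally have to make each clip independently decodable). The function name and parameters are hypothetical.

```python
from typing import Iterable, Iterator, List


def split_into_clips(frames: Iterable[bytes], fps: int,
                     clip_seconds: int) -> Iterator[List[bytes]]:
    """Group an incoming frame stream into clips of clip_seconds each.

    Each yielded clip can then be encoded with its own complete gBSD,
    which is what the XSLT-based transformation requires.
    """
    frames_per_clip = fps * clip_seconds
    if frames_per_clip <= 0:
        raise ValueError("fps and clip_seconds must be positive")
    clip: List[bytes] = []
    for frame in frames:
        clip.append(frame)
        if len(clip) == frames_per_clip:
            yield clip
            clip = []
    if clip:
        yield clip  # final clip may be shorter than the others


# Toy usage: 7 one-byte "frames" at 1 fps, cut into 3-second clips.
toy_frames = (bytes([i]) for i in range(7))
clip_lengths = [len(c) for c in split_into_clips(toy_frames, fps=1, clip_seconds=3)]
```

The clip length is a trade-off: shorter clips reduce the added latency, while longer clips reduce how often a new gBSD has to be generated and transformed.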

3.5. Encryption

To encrypt a bitstream, we first need to select the encryptable plaintext with a known length for each logical unit. In this paper, the slice data payloads in each frame are the logical units. Thanks to our own slicing strategy, we can flexibly encrypt slices in a region of interest (ROI). The encryption operation can be performed either before adaptation on the encoder side or after/during adaptation. In any case, encrypted slices are resilient to spatial adaptation operations. The ROI can be an area marked either by disposable slices or by essential slices. The encryption module considers either the level of encryption requested by the user or a ROI. If the encryption preference is highest, then all the slices in a video frame are encrypted; otherwise, some or all essential/disposable slices within the frames are encrypted to hide information in a ROI. For encryption, frame and slice markers are scanned from the gBSD. Thus, the starting position and corresponding length of each slice are retrieved from the transformed gBSD for those frames which need to be encrypted. To encrypt the slices, an 8-bit encryption key is chosen and an XOR operation is performed. After the XOR operation, the bitstream is processed for H.264 format compliance. It is worth mentioning that each logical unit can be encrypted independently and that any encryption algorithm can be applied to replace the simple XOR operation.

4. EXPERIMENTAL RESULTS

In digital video encryption, the tradeoffs between security and speed, between encryption and compression, and format compliance are of great importance in designing and implementing an encryption framework or algorithm. From our implemented adaptation and encryption system, we have observed that on average it requires an additional 5 seconds to encode an uncompressed CIF (352×288) video consisting of 300 frames. The extra time is needed to apply our slicing strategy and to generate the gBSD. For spatial adaptation, the slice removal time varies from 0.07 milliseconds to 0.27 milliseconds.
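The XOR-based slice encryption of Section 3.5 can be sketched as follows. This is a minimal illustration, assuming the (start, length) pairs of the slice payloads have already been read from the gBSD; the function name `xor_slices` is hypothetical, and the subsequent processing for H.264 format compliance is not shown because its details are not given here.

```python
from typing import Iterable, Tuple


def xor_slices(bitstream: bytes, payload_ranges: Iterable[Tuple[int, int]],
               key: int) -> bytes:
    """XOR every byte of the selected slice payloads with an 8-bit key.

    payload_ranges: (start, length) pairs in byte positions of `bitstream`.
    Bytes outside the selected ranges (e.g. headers, other slices) are
    copied unchanged, so each slice is encrypted independently.
    """
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in 8 bits")
    out = bytearray(bitstream)
    for start, length in payload_ranges:
        for i in range(start, start + length):
            out[i] ^= key
    return bytes(out)


# Toy usage: encrypt bytes 2..4 of a 10-byte stream, then decrypt again.
plain = bytes(range(10))
roi = [(2, 3)]
cipher = xor_slices(plain, roi, key=0x5A)
recovered = xor_slices(cipher, roi, key=0x5A)
```

Because XOR is its own inverse, applying the same function with the same key and ranges restores the original bytes. Replacing the inner loop with a stronger cipher would leave the range-based structure unchanged, in line with the remark that any encryption algorithm can be substituted.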
In all cases, the experiments were run on an Intel Pentium Dual CPU T3200 machine at 2.00 GHz with 2.00 GB of RAM, running Linux 2.6.28.

Table 1. Resulting adapted video (columns: Video Name, Original (CIF), Wide-Strategy, O-Strategy; row: Coastguard (300 frames), sample frame images)

In Table 1, we present sample adapted video frames obtained by applying the Wide-Strategy and the O-Strategy. As we can see from the results, the Wide-Strategy is well suited for surveillance systems, traffic monitoring etc., and the O-Strategy is well suited for a smaller ROI. In any case, the adapted and encrypted video delivered to the consumer is H.264 format compliant, and any standard H.264 player would be able to decode and play the video. However, the encrypted frames would not be perceived clearly because of the encryption. Two frames from the sample video Coastguard are shown in Figure 3. Figure 3A and Figure 3B show two frames from the compressed video before any adaptation and encryption operation. In Figure 3C, we show the sample encrypted output of Figure 3A (without any adaptation), where all the slices are encrypted. In Figure 3D, we show an encrypted version of Figure 3A, where encryption is applied after spatial adaptation using the Wide-Strategy. In this case, a small slice is selected for encryption to hide information in a ROI. Finally, Figure 3E depicts an encrypted version of Figure 3B. Again, encryption is applied after spatial adaptation, but using the O-Strategy.

Figure 3. Sample video output (Coastguard). Panels: A. Original; B. Original; C. Encrypted, no adaptation; D. Encrypted, Wide-Strategy; E. Encrypted, O-Strategy

5. DISCUSSION

The variety of multimedia-enabled devices emerging on a regular basis is inevitable, prompting service providers to look for solutions that increase their revenue by providing mobile video streaming and downloads to these devices. Metadata support for video adaptation seems to be a promising solution because it allows adaptation and encryption of video content in a format-independent way. The joint adaptation and encryption architecture for H.264 video detailed in this paper, which conforms to MPEG-21 DIA, is a feasible solution for commercial deployment, since any MPEG-21 compliant host can adapt the video online. An alternative solution, hierarchical multi-layer encoding of video, divides the video into multiple layers: a base layer with a given quality can be improved by adding more layers, depending on the bandwidth and processing capability of a receiver. However, it requires a rather powerful decoder at the receiver, and not many handheld devices are capable of decoding such streams. Moreover, layered encoding does not enable fine-grained adaptation, since the larger the number of layers, the lower the coding efficiency.

One may be tempted to use the flexible macroblock ordering feature of the H.264 reference software for slice partitioning. However, we have devised our own slicing strategy to gain full control over the compressed bitstream, so that the disposable slices can be discarded by parsing the gBSD only. Importantly, the decryption technique need not be implemented in all the end nodes unless it is obligatory. Since decoding is less complex than encoding, the overhead added to decrypt the encoded frames will not exceed the nominal threshold for presentation, as decryption and decoding will reside as closely coupled tasks.

6. CONCLUSION

In this paper, we presented a spatial adaptation resilient perceptual encryption scheme for H.264 videos. We have shown that structured knowledge of the bitstream syntax and hierarchical structure, in the form of metadata, enables both the adaptation and the encryption processes to be customized and easily deployed in an intermediary node, which can be a proxy, a dedicated adaptation server or an ordinary computer. The experimental results show that our method is suitable for real-time systems. Finally, for live video capturing, adaptation and encryption, a hardware-level implementation capable of generating the compressed bitstream and the gBSD will speed up the process.

REFERENCES

[1] ISO/IEC 21000-7:2004, Information Technology - Multimedia Framework - Part 7: Digital Item Adaptation.
[2] M.S. Hossain, A. Alamri, and A. ElSaddik, "QoS-Aware Service Selection for Multimedia Transcoding," in Proc. of IEEE I2MTC, May 2008.
[3] J. Zhang, A. Perkis, and N. Georganas, "H.264/AVC and Transcoding for Multimedia Adaptation," in Proc. of the 6th COST 276 Workshop, 2004.
[4] R. Iqbal, S. Shirmohammadi, A. ElSaddik, and J. Zhao, "Compressed Domain Video Processing for Adaptation, Encryption, and Authentication," IEEE Multimedia, vol. 15, no. 2, pp. 38-50, April-June 2008.
[5] Y. Wang, X. Fan, H. Li, Z. Liu, and M. Li, "An Attention Based Spatial Adaptation Scheme for H.264 Videos on Mobiles," in Proc. of Multi-Media Modeling Conference, 2006.
[6] A. Vetro and C. Timmerer, "Digital item adaptation: overview of standardization and research activities," IEEE Trans. on Multimedia, vol. 7, no. 3, pp. 418-426, June 2005.
[7] T. Shi, B. King, and P. Salama, "Selective Encryption for H.264/AVC Video," Proc. of the SPIE, vol. 6072, pp. 461-469, 2006.
[8] R. Iqbal, S. Ratti, and S. Shirmohammadi, "A Distributed Camera Network Architecture Supporting Video Adaptation," in Proc. of ACM/IEEE ICDSC, 2009.
[9] [online]. http://www.videolan.org/developers/x264.html
[10] [online]. http://www.w3.org/TR/xslt
