
HARD AUTHENTICATION OF H.264 VIDEO APPLYING MPEG-21 GENERIC BITSTREAM SYNTAX DESCRIPTION (GBSD)

Razib Iqbal, Shervin Shirmohammadi, and Jiying Zhao
Distributed and Collaborative Virtual Environments Research Laboratory (DISCOVER Lab)
School of Information Technology and Engineering (SITE)
University of Ottawa, 800 King Edward Ave., Ottawa, ON, Canada K1N 6N5
[riqbal | shervin | jyzhao] @site.uottawa.ca

ABSTRACT

While considerable research has been conducted in watermarking and authentication of H.264 video in recent years, most techniques require cascaded operations within a video adaptation scenario. In this paper, we propose an authentication scheme for adapted H.264 video content that verifies integrity at the receiver's side without the need for cascaded operations. The proposed scheme utilizes MPEG-21 gBSD for hard authentication of H.264 video in the compressed domain and does not require any cascaded decompression and recompression. The design uses content-based authentication data derived from a hash value. The authentication data is embedded as a fragile watermark, and the marking space is selected by parsing the gBSD during the adaptation process of the H.264 video. The authentication information is thus embedded in already encoded videos during adaptation. A proof of concept and a performance evaluation are also presented.

1. INTRODUCTION

Content authentication is a constant requirement for the transmission of sensitive video content. In digital watermarking, customized data is embedded in digital content such as an image, video, or audio for digital rights management, copy control, or authentication. To embed copyright information, a robust watermark is required, whereas for authentication, fragile or semi-fragile watermarks are sufficient. Generally, fragile watermarking systems, or hard authentication, reject any modification made to the digital content. The idea is to authenticate the digital data by a hash value. The hash value can be protected with a key, and the key can be verified by a trust centre. Moreover, in most video adaptation, delivery, and consumption chains, watermark embedding and detection need to be performed in real time. For a still picture, detection of a robust watermark can take as long as a few seconds, but this delay is unacceptable for motion pictures, especially when the frame rate exceeds a given threshold, say 10 frames per second (fps).

Because of the large size of raw video files, video bitstreams are preferably stored and distributed in a compressed format. The watermark can therefore be embedded in the uncompressed domain, during the compression process, or after the compression process. An easy solution to adaptation plus authentication is to place trusted intermediary nodes between the media resource provider and the consumer to perform multiple decoding and encoding operations such as decompression, adaptation, watermarking, and recompression. While simple, this approach requires a large amount of processing power or processing time, not to mention absolute trust in the intermediary nodes, which can be compromised by a third party. For real-time applications, it could also be impractical to perform these cascaded operations. A system that performs adaptation and watermarking in the compressed domain would therefore be a significant improvement.

In our previous works, we utilized MPEG-21 gBSD to perform temporal adaptation [1] and encryption [2] of the H.264 video stream in the compressed domain. In this paper, we present an authentication technique which utilizes the gBSD to select the marking space from video content in the compressed domain. Because no cascaded decompression-recompression operation is performed, the technique requires low computational effort and low processing time, and it works on top of the adaptation framework [1]. The resulting video conforms to the H.264 bitstream syntax structure and can be decoded by a standard decoder.

2. MPEG-21 DIA & H.264

Part 7 of the MPEG-21 framework [3] specifies the syntax and semantics of tools that may be used to assist the adaptation of a Digital Item (DI). A DI is defined as a bitstream together with all its relevant descriptions.
A generic Bitstream Syntax Schema (gBS Schema) is specified in the MPEG-21 framework to perform adaptation in an intermediary node in a format-independent way. This schema ensures codec independence, semantically meaningful marking of syntactical elements, and hierarchical descriptions of the bitstream. A description conforming to this schema is called a generic Bitstream Syntax Description (gBSD). The gBSD provides an abstract view of the structure of the bitstream, which is particularly useful when the availability of a specific bitstream schema is not ensured. In the Digital Item Adaptation (DIA) engine, for transformations on the gBSD, it is important to include coding-format-specific information in the attributes of the gBSD.

H.264 is the latest video coding and compression standard by ITU-T and ISO/IEC. Its entropy coding design includes Context-Adaptive Binary Arithmetic Coding (CABAC) and Context-Adaptive Variable Length Coding (CAVLC). Since data in H.264 is entropy coded, the encoder pads the bit sequence when necessary to achieve byte alignment. A bitstream data unit (BDU) is defined as a unit of the compressed data which may be decoded independently of other information at the same hierarchical level. A BDU can be, for example, a frame or a slice. In H.264, a frame can be split into one or several slices, and slice sizes are flexible. Slices are self-contained and can be decoded without using data from other slices. In our scheme, we consider one slice in each video frame.

3. LITERATURE REVIEW

Considerable research has been conducted on video content authentication for the H.264 standard in recent years [4-8]. DCT coefficient based embedding systems [4-6] embed binary watermark bits in the DCT domain, derived from different extracted features, for example, a human visual model adapted for a 4×4 DCT block [4] or relations between predicted DCT coefficients and real DCT coefficients [5]. The watermarking method proposed by Qiu et al.
[6] embeds a robust watermark into the DCT domain and a fragile watermark into the motion vectors during H.264 compression. J. Zhang and A.T.S. Ho [7] proposed a scheme that uses the tree-structured motion compensation, motion estimation, and Lagrangian optimization of the H.264 standard. The authentication information is represented by a binary watermark sequence and embedded into video frames. Dima Pröfrock et al. [8] proposed a new transcoder, which analyses the original H.264 bitstream, computes a watermark, embeds the watermark for hard authentication, and generates a new H.264 bitstream. All of these techniques either embed watermarks during the encoding process of the H.264 video [6,7] or employ cascaded decompression and recompression operations [4,5,8] to analyze the H.264 bitstream and embed the watermark.

4. SYSTEM DESIGN

In our framework, we have followed the adaptation approach described in [1], where temporal adaptation of H.264 is performed dynamically and directly on the compressed H.264 bitstream by applying the MPEG-21 gBSD. Here, we make use of the gBSD to identify the segments into which the authentication bits can be embedded. To this end, we install a watermark embedder in the adaptation engine to embed authentication bits during adaptation. The benefits of this approach are: 1) the gBSD is parsed only once while adapting, and 2) authentication bits are computed and embedded in the adapted frame(s).

4.1 Creation of Compressed Video and gBSD

Figure 1. Generation of Digital Item

Adaptation and authentication in a trusted intermediary node require the DI (the H.264 video along with its gBSD) to be available. This DI serves as the original content for the resource server or content provider on the delivery path. Since the MPEG-21 framework does not normatively specify how the gBSD is generated from binary data, we generate the gBSD during the encoding process of the bitstream. As shown in Figure 1, uncompressed video is the input to the encoder. The encoder encodes the raw video into a compressed bitstream conforming to the ITU-T specification and generates the corresponding gBSD. A sample gBSD is shown in Figure 2.


Figure 2. Sample gBSD
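Since the content of the figure is not reproduced in this text, the following minimal sketch illustrates how a description of this kind might be read to locate candidate marking spaces. The XML fragment is a simplified, hypothetical stand-in and not the authors' gBSD: namespaces are omitted, the element and attribute names (gBSDUnit, syntacticalLabel, start, length, marker) are modelled on the gBS Schema, and the label values, offsets, and the function name marking_spaces are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified gBSD fragment (bit addressing assumed).
# Labels such as ":H264:VlcAlign" are invented for this sketch.
GBSD_XML = """
<gBSD addressUnit="bit">
  <gBSDUnit syntacticalLabel=":H264:Frame" marker="frame-0" start="0" length="4096">
    <gBSDUnit syntacticalLabel=":H264:SliceHeader" start="0" length="40">
      <gBSDUnit syntacticalLabel=":H264:VlcAlign" start="35" length="5"/>
    </gBSDUnit>
    <gBSDUnit syntacticalLabel=":H264:SlicePayload" start="40" length="4056"/>
  </gBSDUnit>
  <gBSDUnit syntacticalLabel=":H264:Frame" marker="frame-1" start="4096" length="2048">
    <gBSDUnit syntacticalLabel=":H264:SliceHeader" start="4096" length="48">
      <gBSDUnit syntacticalLabel=":H264:VlcAlign" start="4141" length="3"/>
    </gBSDUnit>
    <gBSDUnit syntacticalLabel=":H264:SlicePayload" start="4144" length="2000"/>
  </gBSDUnit>
</gBSD>
"""


def marking_spaces(gbsd_xml, label=":H264:VlcAlign"):
    """Return start/length of every unit carrying `label`, grouped by frame."""
    root = ET.fromstring(gbsd_xml.strip())
    spaces = []
    for frame in root.findall("gBSDUnit"):       # top-level units are frames
        for unit in frame.iter("gBSDUnit"):      # the frame and all nested units
            if unit.get("syntacticalLabel") == label:
                spaces.append({
                    "frame": frame.get("marker"),
                    "start": int(unit.get("start")),
                    "length": int(unit.get("length")),
                })
    return spaces


for space in marking_spaces(GBSD_XML):
    print(space)
```

Because the description carries the start and length of each segment, the marking space can be found without parsing or decoding the H.264 bitstream itself.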

4.2 Selecting the Marking Space

From the gBSD, the marking space can be selected from the available alternatives, such as frame, slice, macroblock, and block. An application-specific marking space can be selected in a predefined way, and a fixed watermark embedder can be designed for it. Otherwise, if the marking space is selected manually, the watermark embedder should be capable of inserting watermark bits in the selected segment directly in the compressed bitstream. For manual selection of the marking space, the start and length of each segment need to be defined in the gBSD. In any case, selecting a marking space and applying customized modifications must conform to the H.264 bitstream specification to avoid non-compliance with a standard player or decoder. In our implemented system, we have made use of the frame and slice data to compute the authentication bits and embedded these bits in the slice header.

4.3 Adaptation

Inside the adaptation engine, adaptation of the resource (i.e. the H.264 bitstream) and the description (i.e. the gBSD) is performed in two steps. First, the gBSD is transformed via XSLT; then, based on the transformed gBSD, the original bitstream is modified. For the gBSD transformation, an XSL style sheet defines the template rules and describes how the resulting document is produced. Bitstream layer/segment information present in the gBSD is handled by extending the template rules. The XSLT processor parses the gBSD into a tree structure as its input and generates another tree structure, the adapted gBSD, as its output. The next step is the generation of the adapted bitstream using the transformed gBSD. The adaptation module first initializes and parses the adapted gBSD. It then extracts from the video stream the segments described by the parsed gBSD. The adapted H.264 bitstream is finally generated by discarding the bitstream portions that the gBSD associates with the dropped frames.

4.4 Watermark Embedder

The watermark embedder module is designed to be placed in the adaptation engine so that authentication bits are embedded on the fly while adapting. Figure 3 shows the adaptation and watermarking module.

Figure 3. Adaptation and Watermarking module
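The two adaptation steps of Section 4.3, on which this module builds, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: Python's standard library has no XSLT processor, so a plain ElementTree filter stands in for the style sheet; the gBSD fragment, byte addressing, and the "keep every n-th frame" rule are hypothetical; and frame dependencies of a real H.264 stream are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical description of four frames, addressed in bytes.
GBSD_XML = """
<gBSD addressUnit="byte">
  <gBSDUnit syntacticalLabel=":H264:Frame" marker="0" start="0" length="4"/>
  <gBSDUnit syntacticalLabel=":H264:Frame" marker="1" start="4" length="3"/>
  <gBSDUnit syntacticalLabel=":H264:Frame" marker="2" start="7" length="5"/>
  <gBSDUnit syntacticalLabel=":H264:Frame" marker="3" start="12" length="2"/>
</gBSD>
"""

# Toy "bitstream": each letter stands for one byte of the matching frame.
BITSTREAM = b"AAAABBBCCCCCDD"


def transform_gbsd(gbsd_xml, keep_every):
    """Step 1 (stand-in for XSLT): drop frame units from the description."""
    root = ET.fromstring(gbsd_xml.strip())
    for index, unit in enumerate(list(root.findall("gBSDUnit"))):
        if index % keep_every != 0:
            root.remove(unit)
    return root


def adapt_bitstream(bitstream, adapted_root):
    """Step 2: copy only the byte ranges still described by the gBSD."""
    out = bytearray()
    for unit in adapted_root.findall("gBSDUnit"):
        start = int(unit.get("start"))
        length = int(unit.get("length"))
        # Re-address the unit so the adapted gBSD describes the new stream.
        unit.set("start", str(len(out)))
        out += bitstream[start:start + length]
    return bytes(out)


adapted_gbsd = transform_gbsd(GBSD_XML, keep_every=2)
adapted_stream = adapt_bitstream(BITSTREAM, adapted_gbsd)
print(adapted_stream)
```

A watermark embedder placed in the second step sees each retained segment exactly once, which is why the gBSD only needs to be parsed a single time.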

In the adaptation engine, while adapting the video bitstream, we embed the authentication bits in the VLC byte-align bits of the slice header (minimum 1 bit and maximum 7 bits). It is important to mention that this marking space can be further extended to other entities, such as frame and macroblock, based on the gBSD details. The total number of bits in a slice (SN) is the sum of the slice header (SH(N)) bits and the slice payload (SP(N)) bits, denoted as SN = SH(N) + SP(N), where N = total number of bits. From the gBSD, the length of the slice header VLC byte-align bits (vlcn) and the start and length of the frame are parsed. The hash value (FHash) of the frame data, comprising the slice header (except the bits where the authentication bits will be embedded) and the slice payload, is computed. The architecture applies a simple hash function based on PJW Hash [9], which can be replaced by any available advanced hash function. The inputs to the hash function are the frame data, the length of the frame data (in bytes), and a private key (PK). For implementation purposes, we have used a logo/sample image of arbitrary length (LN) as our private key. The authentication bits embedded in the slice header (SH) can be denoted as follows:

SH(N − vlcn + j) = FHash(j) ⊕ PK(i), where 1 ≤ j ≤ vlcn, 1 ≤ i ≤ LN

After embedding the authentication bits, an optional second level of authentication is applied by scrambling the last byte of the slice payload to prevent re-computation of the signature by an intruder. In H.264, blocks and macroblocks are not byte-aligned, so an XOR operation with the private key is applied to the last byte of the slice payload (Slbsp), as is done for the slice header. Even though the private key is needed along with the hash value to re-compute the authentication bits, the modified slice payload adds another layer of assessment for detecting possible attacks. The modification made to the slice payload can be shown as:

Slbsp = Slbsp ⊕ PK(i), where 1 ≤ i ≤ LN

4.5 Watermark Detector

The watermark detector is a four-step process. The first step, parsing the adapted gBSD, extracts the marking space from the adapted gBSD to identify each watermarked segment. The second step, restoring the frame data, consists of XOR-ing the scrambled slice payload bits with the private key to reinstate the slice data for computing the original hash value. The third step, watermark extraction, extracts the authentication bits from the slice header. The final step compares the hash value computed from the slice data with the value extracted from the slice header.

To verify received video content, the user needs, in addition to the video data, the private key and the adapted gBSD. These data can be transmitted in a second file. Secure transmission of the private key and the gBSD was not considered in this research. For video content intended for mass distribution without any priority given to authentication, it is not necessary to modify the decoders of every client, and thus the adapted gBSD need not be transmitted to the receiver.
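The embedding and detection steps of Sections 4.4 and 4.5 can be illustrated with the following minimal sketch. It is a simplified model under stated assumptions, not the authors' implementation: the slice header and payload are treated as separate byte strings, the vlcn align bits are assumed to be the low-order bits of the last header byte, the key is mixed into the hash by simple concatenation, key bit i is taken as bit j of the key (wrapping around), and the payload is scrambled with the first key byte. The source does not specify these details, and all function names are hypothetical.

```python
def pjw_hash(data: bytes) -> int:
    """32-bit PJW hash of a byte string."""
    h = 0
    for byte in data:
        h = ((h << 4) + byte) & 0xFFFFFFFF
        high = h & 0xF0000000
        if high:
            h ^= high >> 24
            h ^= high                    # clear the top four bits
    return h


def _key_bit(key: bytes, i: int) -> int:
    """Bit i of the key, wrapping around (assumed indexing rule)."""
    i %= len(key) * 8
    return (key[i // 8] >> (7 - i % 8)) & 1


def _auth_bits(header: bytes, payload: bytes, vlcn: int, key: bytes):
    """Zero the align bits, hash header + payload, derive vlcn auth bits."""
    mask = (1 << vlcn) - 1
    clean = header[:-1] + bytes([header[-1] & ~mask & 0xFF])
    data = clean + payload
    h = pjw_hash(data + len(data).to_bytes(4, "big") + key)
    bits = 0
    for j in range(vlcn):                # FHash(j) XOR PK(i)
        bits = (bits << 1) | (((h >> j) & 1) ^ _key_bit(key, j))
    return clean, bits


def embed(header: bytes, payload: bytes, vlcn: int, key: bytes):
    """Write auth bits into the align bits, then scramble the last payload byte."""
    clean, bits = _auth_bits(header, payload, vlcn, key)
    marked = clean[:-1] + bytes([clean[-1] | bits])
    scrambled = payload[:-1] + bytes([payload[-1] ^ key[0]])
    return marked, scrambled


def verify(header: bytes, payload: bytes, vlcn: int, key: bytes) -> bool:
    """Restore the payload, recompute the auth bits, compare with the header."""
    restored = payload[:-1] + bytes([payload[-1] ^ key[0]])
    _, expected = _auth_bits(header, restored, vlcn, key)
    return (header[-1] & ((1 << vlcn) - 1)) == expected


KEY = b"logo-image-bytes"                # stands in for the logo/sample image
HEADER, PAYLOAD, VLCN = b"\x65\x88\x80", b"\x12\x34\x56\x78", 5

marked_header, scrambled_payload = embed(HEADER, PAYLOAD, VLCN, KEY)
print(verify(marked_header, scrambled_payload, VLCN, KEY))
```

The detector mirrors the embedder: it undoes the payload scrambling first, so that the hash is computed over the same data that was hashed before embedding. In the usage above, flipping any of the embedded align bits in marked_header makes verify return False.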
In the mass-distribution case, typical H.264 players will be able to play the content without any prior knowledge of the modifications made to it, since the content structure conforms to the H.264 bitstream syntax structure.

5. PERFORMANCE OF THE SYSTEM

Our proposed authentication technique is integrated in an adaptation system which performs temporal adaptation. The quality degradation of the resulting video is therefore due to frame dropping only and not to embedding the authentication bits. For testing purposes, a PC with an Intel P4 3.4 GHz processor, 512 MB RAM, and Windows XP Pro SP2 was used as the media resource/streaming server. The media server held 3 uncompressed (YUV 4:2:0) sample videos (300 frames, 176x144): 'container', 'claire', and 'news'. To generate the DI, we modified the ITU-T reference software implementation JM 9.5 [10]. Table 1 shows the performance of the watermarking module on top of the adaptation system for pre-recorded videos.

Table 1. Performance of the watermarking module

Frame Rate (fps)   Claire (A)   Claire (A+W)   Container (A)   Container (A+W)   News (A)   News (A+W)
5                  297 ms       313 ms         578 ms          609 ms            641 ms     657 ms
10                 594 ms       625 ms         1172 ms         1218 ms           1296 ms    1328 ms
15                 905 ms       922 ms         1765 ms         1813 ms           1953 ms    2015 ms
20                 1203 ms      1234 ms        2375 ms         2422 ms           2609 ms    2703 ms
25                 1515 ms      1546 ms        3015 ms         3047 ms           3266 ms    3375 ms

Original frame rate: 30 fps. Total frames: 300, H.264 Profile: High (100), QP = 28.
* A = Adapted, W = Watermarked, ms = millisecond
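To make the cost of watermarking in Table 1 explicit, the relative overhead of each (A+W) time over the corresponding (A) time can be computed as sketched below. The numbers are copied from Table 1; the dictionary layout and the function name overhead_percent are choices made for this illustration.

```python
# Timings from Table 1 in milliseconds: (adapted, adapted + watermarked).
TABLE_1_MS = {
    5:  {"Claire": (297, 313),   "Container": (578, 609),   "News": (641, 657)},
    10: {"Claire": (594, 625),   "Container": (1172, 1218), "News": (1296, 1328)},
    15: {"Claire": (905, 922),   "Container": (1765, 1813), "News": (1953, 2015)},
    20: {"Claire": (1203, 1234), "Container": (2375, 2422), "News": (2609, 2703)},
    25: {"Claire": (1515, 1546), "Container": (3015, 3047), "News": (3266, 3375)},
}


def overhead_percent(adapted_ms, watermarked_ms):
    """Extra time spent on watermarking, relative to adaptation alone."""
    return 100.0 * (watermarked_ms - adapted_ms) / adapted_ms


overheads = {
    (rate, video): overhead_percent(a, aw)
    for rate, row in TABLE_1_MS.items()
    for video, (a, aw) in row.items()
}

for (rate, video), pct in sorted(overheads.items()):
    print(f"{rate:>2} fps  {video:<9}  +{pct:.2f}%")
```

For the values in Table 1, the overhead ranges from about 1.1% to about 5.4% of the adaptation time.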

From the above table, we can see that the time required for adaptation plus embedding of the authentication bits is only slightly higher than the adaptation time alone. The difference between the two depends on the hash function used. More complicated or robust hash functions will require a longer execution time to compute and embed the authentication bits. Another factor that can affect the execution time is the marking space. To make the system more robust, one can decide to embed authentication bits in the frame, macroblock, and block, which will require a longer watermarking time.

The same architecture can be applied to a live stream. For XSLT, the complete XML description must be loaded before it can be adapted. Live video streams are therefore processed like pre-coded videos, as small clips of 20 seconds at 15 fps with SQCIF (128×96) frame size. The watermark embedding time is negligible compared to the adaptation time for live video processing, because the total number of frames and the frame size in each video clip are small compared to pre-recorded videos.

6. ANALYSIS

In the proposed approach, adaptation and watermarking are performed in the compressed domain without any cascaded operation. The adapted and watermarked video is H.264 compliant, and the bit rate is not changed by the watermarking. For authentication, the original H.264 video is not required; instead, a separate authenticator can verify the validity of the received video data. The authenticator is independent of the decoder, so no lag is added while decoding the video. Lastly, watermark embedding can be done in live video streams. Instead of computing a hash value for every frame, a digital signature can be computed for the whole video content or for a certain number of frames (e.g. the frame rate at which the raw video is being encoded) and then embedded in the selected marking space.

In a Universal Multimedia Access (UMA) environment, a Media Streaming Server (MSS) can be connected to several Proxy Servers (PS) and clients.
To serve directly connected clients, an MSS sends the adapted video content, the adapted gBSD, and the private key accordingly. A PS serving only a specific class of clients (e.g. small handheld devices) can request a specific adapted video (e.g. SQCIF, 15 fps) from the MSS. The MSS will then send the specified adapted video content along with the adapted gBSD (optionally watermarked). Otherwise, the MSS will send the original H.264 video and the gBSD. The PS will adapt and re-compute the

authentication signature before serving a client. As we can see, none of the servers needs to be aware of the codec; each can adapt and watermark the videos on the fly just by looking at the gBSD.

7. CONCLUSION

Most recent commercial video monitoring/surveillance tools and applications use the H.264 format for video due to its flexibility and quality. Video data captured by wireless/dispersed cameras and transmitted to a distant receiver thus requires a signature to be embedded in real time so that the integrity of the received content can be verified in a contained environment. To serve this purpose, we have used the Bitstream Syntax Description to select the marking space directly from the compressed bitstream rather than decoding any part of the media. The system presented here provides a hard authentication technique for adapted H.264 video in the compressed domain and in real time. Beyond the developments presented above, further research is required to embed a robust watermark in the compressed domain for copyright protection purposes.

8. REFERENCES

[1] R. Iqbal, S. Shirmohammadi, and C. Joslin, “MPEG-21 Based Temporal Adaptation of Live H.264 Video”, Proc. IEEE Intl. Symposium on Multimedia (ISM), pp. 457-464, Dec. 2006.
[2] R. Iqbal, S. Shirmohammadi, and A. El Saddik, “Secured MPEG-21 Digital Item Adaptation for H.264 Video”, Proc. IEEE Intl. Conf. on Multimedia and Expo (ICME), pp. 2181-2184, Jul. 2006.
[3] ISO/IEC 21000-7:2004, Information Technology – Multimedia Framework – Part 7: Digital Item Adaptation.
[4] M. Noorkami and R. M. Mersereau, “Towards Robust Compressed-Domain Video Watermarking for H.264”, SPIE Security, Steganography, and Watermarking of Multimedia Contents, Vol. 6072, pp. 489-497, Jan. 2006.
[5] G. Wu, Y. Wang, and W. Hsu, “Robust Watermark Embedding Detection Algorithm for H.264 Video”, Journal of Electronic Imaging, Vol. 14, Jan.-Mar. 2005.
[6] G. Qiu, P. Marziliano, A.T.S. Ho, D. He, and Q. Sun, “A Hybrid Watermarking Scheme for H.264/AVC Video”, Proc. 17th Intl. Conf. on Pattern Recognition, Vol. 4, pp. 865-869, 2004.
[7] J. Zhang and A.T.S. Ho, “Efficient Video Authentication for H.264/AVC”, Proc. 1st Intl. Conf. on Innovative Computing, Information and Control, Vol. 3, pp. 46-49, Aug. 2006.
[8] D. Pröfrock, H. Richter, M. Schlauweg, and E. Müller, “H.264/AVC Video Authentication Using Skipped Macroblocks for an Erasable Watermark”, Proc. SPIE Visual Communications and Image Processing, Vol. 5960, pp. 1480-1489, 2005.
[9] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, pp. 434-438, 1986.
[10] http://ftp3.itu.ch/av-arch/jvt-site/reference_software/
