Mobile Application Security for Video Streaming Authentication and Data Integrity Combining Digital Signature and Watermarking Techniques

Stefano Chessa ∗†, Roberto Di Pietro ∗‡, Erina Ferro ∗, Gaetano Giunta ∗§, Gabriele Oligeri ∗

∗ ISTI-CNR, Area della Ricerca di Pisa, Via G. Moruzzi 1, 56124 Pisa, Italy
† Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo 3, 56127 Pisa, Italy
‡ Dipartimento di Matematica, Università di Roma Tre, L.go S. Murialdo 1, 00146 Rome, Italy
§ Dipartimento di Elettronica Applicata, Università di Roma Tre, Via della Vasca Navale 84, 00146 Rome, Italy
Abstract: Satellite links present peculiar characteristics, such as no packet reordering and a low bit error rate. In this paper we leverage these characteristics, combined with watermarking techniques, to propose a novel authentication algorithm for multicast video streaming. The algorithm combines a single digital signature with a hash chain pre-computed on the transmitter side; the hash chain is embedded in the video stream by means of a watermarking technique. Our proposal has several interesting features: authentication of the received multicast stream is enforced, as well as its integrity; received blocks can be authenticated on the fly; no storage is required on the receiver side beyond the memory needed to hold a single hash; the computational overhead on the receiver amounts to a single hash per block, while the cost of the one digital signature verification is amortized over the whole received stream. Finally, the bandwidth overhead is negligible, since the applied watermarking technique introduces virtually no modifications (at least, none recognizable by humans) to the original video stream pictures.
I. INTRODUCTION

In a secure video-streaming system, sender authentication must be performed to prevent unauthorised use of the shared media [1]. Conventional methods for securely identifying images use hash functions. The digital signature standard, used in cryptosystems to authenticate documents, also involves the use of hash functions. A digital signature is a bit stream that depends on a private key and on the content of the document to be signed. For each document, the digital signature algorithm provides a unique output bit stream [2]. Authentication schemes rely on the assumption that an adversary cannot find two different messages that yield the same signature. In cryptosystems, this type of process is widely used to ensure data integrity, data origin authentication, and non-repudiation. The digital signature is based on a hash function (or a one-way hash function, OWH) and an encryption algorithm [3].

This work has been supported by the CNR/MIUR program "Legge 449/97" (project IS-Manet) and by the European Commission within the 6th Research Framework Programme in the framework of the European Satellite Communications Network of Excellence (SatNEx II, IST-27393) for the FT4 "Video Streaming Authentication in satellite communications (ViStA)" of the JA 2240 "Security".
1550-2252/$25.00 ©2007 IEEE
One-way functions and cryptographic hash functions are cryptographic primitives that are computed without a key. For example, MD5 [4] and SHA-1 [5] are hash functions commonly used in cryptographic processes. Another method for securely identifying images is watermarking. To be effective for images [1], the digital signature should be different if and only if the image contents are different. Digital watermarking [6] is a multidisciplinary field that combines media and signal processing with cryptography, communication theory, coding theory, signal compression, theory of human perception, and quality of service requirements [7]. Techniques based on watermarking are widely used for copyright protection and for fast search of images in databases [6]. In an authentication or copyright watermark, the embedded signature marks the data with the owner's or producer's identification. This is the enabling technology to prove ownership of copyrighted material, to monitor the use of copyrighted multimedia data, or to analyse where the data is in use over networks and servers. The technology builds on the idea of hiding meta-information, such as an owner's signature, in the physical material in which the content is represented, such as the pixels of an image. Authorised recipients can extract the embedded information from the digital form when necessary to show proof of ownership. In general, a watermark embedding function inserts a watermark h into media data B and generates the watermarked data B_w. A detector can check for the existence of the watermark and extract it (h) from the data B_w. The embedding and detector functions are usually public, but they are initialised by keys to achieve security. To the best of our knowledge, no published paper in the existing literature proposes a combined procedure involving digital watermarking together with a public/private key security mechanism.
Digital watermarking techniques should be both robust and fragile [6]. Copyright, fingerprint, and copy-control watermarks should be highly robust, that is, resistant to blind, non-targeted modifications and to common media operations. For manipulation recognition, the integrity watermark must be fragile (it disappears if someone has altered the file) so that altered media can be detected. Secure watermarking implies that embedded information cannot be removed by targeted attacks without these modifications being detected. Moreover, the watermark must be transparent with respect to the properties of the human sensory system: a transparent watermark causes no artifacts or quality loss. Furthermore, either a private verification (by a private-key function) or a public verification (by a public-key algorithm) of the embedded watermark has to be implemented, depending on the particular application. All these characteristics should be achieved at a reasonable level of complexity: the overhead required to embed and retrieve a watermark should be low, especially for real-time applications such as video streaming.

[Fig. 1. Multicast satellite scenario.]

[Fig. 2. Labels recoverable from the diagram: RSA(H(B_1)); blocks B_1, B_2, B_3; H(B_2), H(B_3).]

II. SYSTEM MODEL

Multicast satellite networks present peculiar characteristics, such as a high round-trip time, no packet reordering, and a low bit error rate (BER). For instance, geostationary satellites, placed in equatorial orbit at an altitude of 36,000 km, introduce a transmission delay between transmitter and receiver of about 250 ms, while the satellite link presents a low BER of about 10^-8 in the majority of cases [8]. This low BER is due to design requirements and structural constraints. Leveraging the low BER and the absence of packet reordering, we propose a novel scheme for authenticating multicast video streaming that does not affect the available bandwidth. Our scenario is depicted in Figure 1: the center has to distribute an uncompressed video stream through a multicast satellite network. Transmitter and receiver can perform processing steps, such as MPEG2 coding and decoding, before transmission and after reception, respectively. The use of the MPEG2 [9] algorithm is justified if we assume that our satellite network uses DVB-S [10]. The sender preprocesses the uncompressed video stream, inserting the authentication information into it.
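As a rough sanity check of the link figures quoted in this section, the sketch below recomputes the propagation delay of a geostationary hop and estimates how often a block of a given size would contain at least one bit error at a BER of 10^-8. The 2 Mbit block size is a hypothetical value chosen only for illustration, and bit errors are assumed to be independent; neither assumption is taken from the measurements in [8].

```python
# Back-of-the-envelope check of the satellite link figures.
# Assumptions (not from the paper): satellite directly overhead,
# independent bit errors, and a hypothetical block size in bits.

SPEED_OF_LIGHT_KM_S = 299_792.458
GEO_ALTITUDE_KM = 36_000.0  # altitude quoted in the text


def one_hop_delay_ms(altitude_km=GEO_ALTITUDE_KM):
    """Ground -> satellite -> ground propagation delay in milliseconds."""
    return 2.0 * altitude_km / SPEED_OF_LIGHT_KM_S * 1000.0


def block_error_probability(bits, ber=1e-8):
    """Probability that a block of `bits` bits has at least one bit error."""
    return 1.0 - (1.0 - ber) ** bits


if __name__ == "__main__":
    print(f"one-hop delay: {one_hop_delay_ms():.1f} ms")
    hypothetical_block_bits = 2_000_000  # illustrative value only
    p = block_error_probability(hypothetical_block_bits)
    print(f"P(block corrupted): {p:.4f}")
```

Under these assumptions the up-and-down path takes roughly 240 ms, in line with the quoted figure of about 250 ms, and a 2 Mbit block would contain at least one bit error about 2% of the time.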
The authentication information is embedded with a watermarking technique. The sender then performs MPEG2 compression and transmits the signal to the receiver stations through the satellite network. It is worth noticing that using a watermarking technique to embed the authentication information within the video stream avoids any bandwidth overhead. Indeed, the authentication information introduces only transparent modifications to the video pictures (that is, modifications that do not affect the perceived quality of the video). Each receiver station decompresses the video stream and then authenticates it. This is achieved by combining a digital signature such as RSA [11] with a hash chain. Figure 2 summarises the authentication scheme; the blocks shown there are the compressed versions (GOPs) of the blocks B_x. This approach also amortises the computational overhead of the single asymmetric signature over the whole video stream, trading the cost of digital signatures for the cost of one-way hash functions such as SHA-1 [12], which are characterised by a low computational overhead.

[Fig. 2 caption: Hash chain constructed on three blocks.]

III. THE AUTHENTICATION PROCEDURE

In this section we present a novel authentication procedure for multicast video streaming. We propose a new authentication scheme based on a single digital signature, such as RSA [11], authenticating a one-time signature chain. The hash chain (which could be built over SHA-1 [12]) is hidden in the video stream pictures by means of the watermarking technique described in Section IV. An MPEG2 video stream can be considered as a sequence of GOPs (Groups Of Pictures), as in Figure 3. A GOP is composed of three different types of pictures: I-pictures (Intra), P-pictures (Predictive), and B-pictures (Bidirectionally predictive). Inside a GOP there is only one I-picture and one or more B- and P-pictures. Compression is computed at two different levels: the spatial one and the temporal one [9]. Spatial compression is performed only over the I-picture, while temporal compression is carried out over the B- and P-pictures. These pictures are obtained via difference from the I-picture.
A GOP, hereafter G_i, represents in the MPEG2 domain a well-defined picture sequence (block) of the uncompressed domain. Let H(◦) and Signature(◦) be the functions that implement a SHA-1 hash and an RSA signature, respectively. The function Mpeg2(◦) computes the GOP associated with the specific input block using the MPEG2 algorithm. Finally, W(x, y) inserts the data y into a block x of pictures according to the watermarking algorithm described in Section IV. Table I describes in detail how the sender builds the hash chain.

[Fig. 3. Video stream as a picture sequence: the stream S is a sequence of GOPs G_1, G_2, G_3, G_4, ..., G_{n−1}, G_n, each a picture sequence such as IBBPB...]

The transmitter is provided with an uncompressed video stream S and interprets it as a sequence of blocks of pictures [B_1, ..., B_n], so that each block will afterwards be encoded as a GOP by the MPEG2 algorithm. The last block of the video sequence is compressed first: G_c = Mpeg2(B_n). The construction of the hash chain starts from the block of index n−1. The hash value h_i is calculated over the current GOP G_c as h_i = H(G_c). This value is inserted into the block B_i by means of the watermarking function, B_i^w = W(B_i, h_i). The watermarked block is then encoded as G_c = Mpeg2(B_i^w), obtaining the GOP that will be the argument of the next hash calculation. The procedure is iterated over all the blocks down to the first one (B_1). Finally, the first GOP (G_c) is authenticated through a digital signature, Signature(H(G_c)). Note that this is the only asymmetric operation performed by the sender. The marked block sequence is then encoded with MPEG2 by the function Mpeg2([B_1^w, ..., B_n^w]), yielding the compressed stream S_c. Note that each block belonging to S is encoded into a GOP in S_c. Hence, each block in the uncompressed domain corresponds to a GOP in the compressed domain.
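The backward construction of the chain can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the authors' implementation: `mpeg2`, `unmpeg2`, `W`, and `E` are hypothetical byte-level stand-ins (a real system would use an MPEG2 codec and the watermarking of Section IV), and the RSA signature is replaced by caller-supplied `sign` and `verify` callables, which the test exercises with an HMAC purely so that the sketch runs with the standard library. The mirror-image receiver check of Table II is included so the chain can be tried end to end.

```python
import hashlib
import hmac


def H(data):
    """SHA-1 digest, the hash assumed for H in the text."""
    return hashlib.sha1(data).digest()


# Hypothetical stand-ins: NOT a real codec and NOT a real watermark.
def mpeg2(block):
    return b"GOP" + block      # placeholder for Mpeg2(.)


def unmpeg2(gop):
    return gop[3:]             # placeholder for UnMpeg2(.)


def W(block, h):
    return block + h           # placeholder for embedding h into the block


def E(block):
    return block[-20:]         # placeholder for extraction (SHA-1 is 20 bytes)


def build_chain(blocks, sign):
    """Sender side: walk backwards from the last block, then sign the anchor."""
    marked = list(blocks)      # the last block carries no mark
    gc = mpeg2(blocks[-1])
    for i in range(len(blocks) - 2, -1, -1):
        hi = H(gc)                     # hash of the following GOP
        marked[i] = W(blocks[i], hi)   # embed it into block i
        gc = mpeg2(marked[i])
    sig = sign(H(gc))                  # only asymmetric operation in the scheme
    return sig, [mpeg2(b) for b in marked]


def verify_chain(sig, gops, verify):
    """Receiver side: check the signed anchor, then one hash per block."""
    if not verify(sig, H(gops[0])):
        return False
    for i in range(len(gops) - 1):
        if E(unmpeg2(gops[i])) != H(gops[i + 1]):
            return False
    return True
```

Because each block only needs the hash of the GOP that follows it, the receiver keeps a single extracted hash in memory while the next GOP arrives.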
TABLE I
Construction of the single hash chain

let S = [B_1, ..., B_n] be the uncompressed video stream block sequence;
G_c = Mpeg2(B_n);
for i = n − 1 to 1 {
    h_i = H(G_c);
    B_i^w = W(B_i, h_i);
    G_c = Mpeg2(B_i^w);
}
Sig = Signature(H(G_c));
S_c = Mpeg2([B_1^w, ..., B_n^w]);    // B_n carries no mark, so B_n^w = B_n
Transmit (Sig, S_c);

On the receiver side, the authentication procedure works as follows. Once the receiver has stored (Sig, S_c), it decodes the MPEG2 stream and extracts a GOP sequence from the compressed video stream S_c via ExtGOP(◦), that is: [G_1, ..., G_n] = ExtGOP(S_c). Now the receiver has to authenticate the anchor of the authentication chain. Note that this is the only operation based on asymmetric key cryptography on the receiver side. Hence, once the hash over the first GOP is computed, h = H(G_1), this value is employed to verify the digital signature (Sig) with VerSignature(Sig, h). If the test succeeds, the receiver can subsequently authenticate the video stream by checking the authenticity of each block of the hash chain. The hash value h_i inserted by the sender is extracted from the block B_u through the watermark retrieval function E(B_u), where B_u is obtained in turn from each GOP of the video stream through B_u = UnMpeg2(G_i). The value h_i is then compared with the hash calculated by the receiver over the GOP G_{i+1}, that is, H(G_{i+1}). The execution proceeds to the next block only if the test is successful. If a picture is corrupted (for instance, forged or changed during the channel transmission), its hash changes and the authentication test fails. Note that once a failure is triggered, none of the subsequent pictures can be authenticated. Table II shows in detail the algorithm run by the receiver.

TABLE II
Verification of the authenticity of sequence Sig, S_c

Receive (Sig, S_c);
[G_1, ..., G_n] = ExtGOP(S_c);
h = H(G_1);
if not ( VerSignature(Sig, h) ) { authentication failed; exit; }
for i = 1 to n − 1 {    // the last GOP has no successor to check against
    B_u = UnMpeg2(G_i);
    h_i = E(B_u);
    if h_i ≠ H(G_{i+1}) { authentication failed; exit; }
}

Without loss of generality, we assumed that the sequence (Sig, S_c) is received correctly. While this assumption is acceptable given the low BER of the satellite link, it should be noted that, if message corruption became an issue, it could be tackled by using a redundant hash chain or FEC [13]. For instance, robustness could be obtained by embedding into the same block more than one hash, calculated over several GOPs.

IV. THE EMBEDDING PROCEDURE

A. Embedding information as watermark
In this section we describe the technique used to embed the hash chain into the video stream. In particular, we describe the functions W(◦) and E(◦) used in the previous section. Note that our use of watermarking techniques is not standard: classical watermarking is used to embed a logo (generally a picture) within a video stream. This embedding is hidden in the video and can be extracted to witness the ownership of the video. It is acceptable for that extraction to be lossy, that is, the logo can be extracted with errors, provided it remains recognisable. In our case, instead, we use watermarking to embed the result of a hash computation. The extraction of the hash must not be affected by errors; otherwise the hash chain would break and the receiver could not authenticate the remainder of the video stream. For this reason, in this section we propose an embedding technique that provides error-free hash extraction.

An uncompressed video stream can be considered as a three-dimensional signal whose components are a two-dimensional matrix (the picture) and time. Each picture is formed by a w × h matrix of pixels; each pixel expresses a colour that can be represented by three values, R (red), G (green), and B (blue), or, equivalently, by the Y (luminance) and U, V (chrominance) components. In our algorithm we consider the Y-U-V representation, and we embed a mark (hash) in the luminance component (Y) of each picture of the video stream. We use a frequency-domain method based on the DCT (discrete cosine transform). This method assures robustness with regard to various types of compression methods such as MPEG2 or MPEG4 [14], [15]. The procedure works by modifying the frequency-domain coefficients of the image; it thus has a minimal impact in the spatial domain of the picture.

Let B be a block of pictures and h the hash value to be embedded in each picture belonging to B. Note that, due to specific features of the mark extraction (discussed in Section IV-B), the mark must be represented as a sequence over {−1, 1}. For this reason h (which is a sequence over {0, 1}) is first converted into a sequence over {−1, 1}. Then, for each picture p_i (with 1 ≤ i ≤ m) in B, the mark h is embedded in p_i using the following equations:

$$P_i = \mathrm{DCT}\{p_i(Y)\} \tag{1}$$

$$P_i^{H} = P_i^{HF} + |P_i^{HF}| * \alpha * n_i * h_W \tag{2}$$

Equation 1 computes the DCT over the luminance (Y) component p_i(Y), obtaining P_i. Equation 2 embeds h_W, a repeated and reshaped version of the vector h, in the high frequencies of P_i, denoted P_i^{HF}. To this end, we define the parameters α, an amplification factor, and n_i, a pseudo-noise sequence that is changed for each picture. In particular, n_i is a matrix of binary ({−1, 1}) pseudo-noise sequences. The spread sequence n_i ∗ h is multiplied by the amplification factor α, replicated according to a replication factor, and finally reshaped to a matrix of the same dimensions as the picture. The mark is scaled by the luminance frequency coefficients, so that it is spread proportionally to the DCT coefficients. Finally, the resulting sequence is added to the highest-frequency coefficients P_i^{HF} of the DCT-transformed picture P_i. The pseudo code of the embedding algorithm is shown in Table III.

TABLE III
Function W(◦) embedding a hash value into video stream pictures

let B be a video block belonging to the stream S;
let h be the hash value to be embedded in B;
let p_1, ..., p_m be the pictures belonging to B;
let α be an amplification factor;
let PRSG(◦) be a pseudo random sequence generator algorithm;
h^Rep = [h, h, ..., h]                    // concatenation of the h replications
h^{Rep−M} = Reshape h^Rep to a square matrix
h_W = [ 0  0 ; 0  h^{Rep−M} ]             // place data into high frequency DCT domain
for i = 1 to m {
    n_i = PRSG()
    P_i = DCT{p_i(Y)}
    P_i^H = P_i^{HF} + |P_i^{HF}| ∗ α ∗ n_i ∗ h_W
    p_i(Y) = DCT^{−1}{P_i^H(Y)}
}

B. Retrieval of information

Since the mark has been embedded in each picture of the block, the receiver extracts the mark from each picture. This is necessary because the extraction from a single picture is, in general, affected by errors. Thus, the mark is obtained by combining the information extracted from all the pictures in the block.

TABLE IV
Function E(◦) retrieving the hash value from a block of video pictures

let B be a video block belonging to the stream S;
let p_1, ..., p_m be the pictures belonging to B;
let PRSG(◦) be a pseudo random sequence generator algorithm;
let SIGN{◦} be the mathematical signum function;
x = 0
for i = 1 to m {
    n_i = PRSG()
    P_i = DCT{p_i(Y)} ∗ n_i
    P_i^{HF} = Extract high frequency from P_i
    P_i^{HFV} = Reshape P_i^{HF} to a vector   // consider P_i^{HFV} as [h, ..., h_j, ..., h]
    h_i = Σ_j h_j
    x = x + h_i
}
h = SIGN{x}

Given a picture p_i in an incoming block, the receiver performs a DCT over the luminance component (Y) of the picture, obtaining P_i. Then it extracts the mark from the high frequencies of P_i (which are the frequencies where the mark was embedded). This is achieved using the following equations:
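$$P_i^{H} * n_i = P_i^{HF} * n_i + |P_i^{HF}| * \alpha * n_i * h_W * n_i = P_i^{HF} * n_i + |P_i^{HF}| * \alpha * h_W$$

Here the left-hand side is built from the marked high-frequency coefficients P_i^{H} of Equation 2, and the last step holds because every entry of n_i is ±1, so that n_i ∗ n_i = 1 elementwise.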
Let h_i(j) be the j-th replica of the hash inside picture i, and Red the number of hash replications in each picture. The receiver combines the marks extracted from each picture using the following equation:

$$\sum_{i=1}^{k} \sum_{j=1}^{Red} \left\{ P_i(j) * n_i(j) + |P_i(j)| * \alpha * h_i(j) \right\} \tag{3}$$
where the inner sum accumulates the mark replicas within a given picture, and the outer sum accumulates the marks extracted from all the pictures in the block. The choice of the number of pictures in a block is essential: if the number of pictures is too small, the introduced mark redundancy could be insufficient and the extraction could fail. Block dimensioning is therefore an essential operation that must be done by the sender in the preprocessing phase of the video streaming. Since the first term in Equation 3, P_i(j) ∗ n_i(j), consists of uncorrelated signals, its sum tends to 0; thus, if the number of pictures in a block is sufficiently high, Equation 3 becomes:

$$\sum_{i=1}^{k} \sum_{j=1}^{Red} \left\{ |P_i(j)| * \alpha * h_i(j) \right\}. \tag{4}$$
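Thus, mark extraction is performed over Equation 4 using the signum function as:

$$\mathrm{sign}\left\{ \sum_{i=1}^{k} \sum_{j=1}^{Red} |P_i(j)| * \alpha * h_i(j) \right\} = \mathrm{sign}\left\{ \sum_{i=1}^{k} \sum_{j=1}^{Red} |P_i(j)| * \alpha \right\} * h$$

but

$$\mathrm{sign}\left\{ \sum_{i=1}^{k} \sum_{j=1}^{Red} |P_i(j)| * \alpha \right\} \equiv 1.$$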
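The despreading argument behind Equations 3 and 4 can be tried numerically. The sketch below is an illustration under stated assumptions rather than the embedding of Table III: it skips the DCT and works directly on synthetic high-frequency coefficients drawn from a Gaussian, it ignores MPEG2 compression, and the values of `k`, `red`, `alpha`, and the shared PRSG seed are hypothetical choices. It embeds a 160-bit SHA-1 value as ±1 chips following Equation 2 and recovers it with the sign of the accumulated sum.

```python
import hashlib
import random


def hash_to_pm1(data):
    """SHA-1 of `data` as a list of 160 values in {-1, +1}."""
    bits = []
    for byte in hashlib.sha1(data).digest():
        for shift in range(7, -1, -1):
            bits.append(1 if (byte >> shift) & 1 else -1)
    return bits


def pn_sequence(prsg, length):
    """Pseudo-noise sequence over {-1, +1} drawn from a seeded generator."""
    return [prsg.choice((-1, 1)) for _ in range(length)]


def embed_picture(hf, mark, alpha, noise):
    """Equation 2 on a flat coefficient vector: P + |P| * alpha * n * h_W."""
    h_w = mark * (len(hf) // len(mark))   # replicated mark
    return [p + abs(p) * alpha * n * h for p, n, h in zip(hf, noise, h_w)]


def extract_block(marked_pictures, mark_len, seed):
    """Despread every picture, accumulate replicas, then take the sign."""
    prsg = random.Random(seed)            # receiver regenerates the same n_i
    acc = [0.0] * mark_len
    for hf in marked_pictures:
        noise = pn_sequence(prsg, len(hf))
        for idx, (c, n) in enumerate(zip(hf, noise)):
            acc[idx % mark_len] += c * n
    return [1 if a > 0 else -1 for a in acc]


def demo(k=15, red=64, alpha=0.3, seed=2007):
    """Embed one hash into k synthetic pictures and extract it again."""
    mark = hash_to_pm1(b"bytes of the following GOP")
    content = random.Random(1)            # synthetic picture content
    prsg = random.Random(seed)            # sender-side PRSG
    marked = []
    for _ in range(k):
        hf = [content.gauss(0.0, 10.0) for _ in range(red * len(mark))]
        noise = pn_sequence(prsg, len(hf))
        marked.append(embed_picture(hf, mark, alpha, noise))
    return mark, extract_block(marked, len(mark), seed)


if __name__ == "__main__":
    sent, received = demo()
    print("hash recovered without errors:", sent == received)
```

With these parameters each bit is accumulated over 960 coefficients, so the mark term dominates the residual of the uncorrelated term by a wide margin; reducing `k` or `red` shrinks that margin, which is the block dimensioning issue discussed above.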
The pseudo code of the extraction algorithm is shown in Table IV.

V. CONCLUSIONS

We have presented a novel scheme for authenticating video streaming in a multicast satellite network. The proposed scheme leverages peculiar features of the satellite link, such as a low BER and no packet reordering. In particular, this work shows how it is possible to use watermarking to embed security-related information. The security properties enforced are authentication and integrity of the multicast stream. These features are achieved using just one digital signature, combined with
a pre-computed hash chain. In particular, we embed the hash chain inside the video by means of a watermarking technique. This solution provides several benefits. On the sender side, the computation overhead amounts to one digital signature for the whole stream and one hash per block; as for bandwidth, security-related information does not consume this resource, since the data are inserted in the video stream pictures through negligible modifications (at least, not recognizable by humans). On the receiver side, authentication is performed on a per-block basis; no extra storage is required, except for one hash value; and the computational overhead is limited to one digital signature verification, which is amortized over the whole multicast stream, plus one extra hash computation per block. Future work will aim to extend the proposed scheme to provide resilience on increasingly lossy channels.

REFERENCES

[1] F. Lefèbvre, J. Czyz, and B. M. Macq, “A robust soft hash algorithm for digital image signature,” in ICIP (2), 2003, pp. 495–498.
[2] O. Goldreich, “Two remarks concerning the Goldwasser-Micali-Rivest signature scheme,” in CRYPTO, 1986, pp. 104–110.
[3] Digital Signature Standard (DSS). Washington: National Institute of Standards and Technology, 2000, Federal Information Processing Standard 186-2.
[4] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, 1992.
[5] National Bureau of Standards, “Digital signature standard,” National Bureau of Standards, Tech. Rep. FIPS Publication 186, 1994.
[6] J. Dittmann and F. Nack, “Copyright-copywrong,” IEEE MultiMedia, vol. 07, no. 4, pp. 14–17, 2000.
[7] F. Benedetto, G. Giunta, and A. Neri, “End-to-end quality of service assessment in mobile applications for user-centric multimedia networks by tracing watermarking,” in IEEE 12th Symp. on Communication and Vehicular Technology, Twente, Netherlands, 2005.
[8] A. Annese, P. Barsocchi, N. Celandroni, and E. Ferro, “Performance evaluation of UDP multimedia traffic flows in satellite-WLAN integrated paths,” in Proceedings of the 11th Ka and Broadband Communications Conference, Sept. 2005, pp. 267–275.
[9] D. L. Gall, “MPEG: a video compression standard for multimedia applications,” Commun. ACM, vol. 34, no. 4, pp. 46–58, 1991.
[10] R. de Bruin and J. Smits, Digital video broadcasting: technology, standards, and regulations. Norwood, MA, USA: Artech House, Inc., 1999.
[11] R. L. Rivest, A. Shamir, and L. M. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Tech. Rep. MIT/LCS/TM-82, 1977.
[12] D. Eastlake 3rd and P. Jones, “US Secure Hash Algorithm 1 (SHA1),” RFC 3174, Sept. 2001.
[13] M. Luby, L. Vicisano, J. Gemmell, L. Rizzo, M. Handley, and J. Crowcroft, “Forward error correction (FEC) building block,” United States, 2002.
[14] M. Barni, F. Bartolini, and N. Checcacci, “Watermarking of MPEG-4 video objects,” IEEE Transactions on Multimedia, vol. 7, no. 1, pp. 23–32, 2005.
[15] S. Biswas, S. R. Das, and E. M. Petriu, “An adaptive compressed MPEG2 video watermarking scheme,” IEEE Transactions on Instrumentation and Measurement, vol. 54, no. 5, pp. 1853–1861, 2005.