Watermarking Methods For MPEG Encoded Video: Towards Resolving Rightful Ownership
Lintian Qiao and Klara Nahrstedt
Department of Computer Science, University of Illinois at Urbana-Champaign
l-qiao@cs.uiuc.edu; klara@cs.uiuc.edu
Abstract
Various digital watermarking techniques have been proposed in recent years as methods to protect the copyright of multimedia data. However, as pointed out by the IBM research group, the rightful ownership problem has not been properly solved. Currently, there are two proposals to solve the ownership problem. Unfortunately, one proposal lacks a formal proof and the other can be easily defeated. In this paper, we present watermarking methods that are successful in resolving the rightful ownership of watermarked MPEG video. By introducing specific requirements on the watermark construction, we prove that the proposed scheme is non-invertible. We show the invertibility of schemes that do not use the original in their verification, discuss in depth various issues of the watermark construction requirements, extend the basic idea to create a non-invertible scheme for images, and discuss the usage of different non-invertible schemes.
Keywords: Watermarking of MPEG video, IBM attack on watermarking schemes, Copyright ownership, Non-invertibility.
1 INTRODUCTION
With the growth of multimedia systems in distributed environments, research on multimedia security and multimedia copyright protection has become an important issue. Digital watermarking techniques have been proposed in recent years as methods to protect the copyright of multimedia data. There are various watermarking schemes for images and several methods for audio and video streams. Among them, a large class of watermarking schemes addresses invisible watermarks. Research and scheme design mainly focus on issues such as perceptual effects (the watermark should be invisible and the distortion should be minimized) and robustness (the watermark must be difficult to remove and immune to multimedia data operations such as digital-to-analog and analog-to-digital conversion, requantization, dithering, resampling, rotation, scaling, cropping, etc.). Watermarking schemes should also be robust against multiple-document attacks. However, some questions remain open. One of them, as pointed out by Craver et al. from IBM [CMYY96], is how to resolve the rightful ownership of invisible watermarks. Craver et al. attacked existing watermarking techniques by providing counterfeit watermarking schemes that can be performed on a watermarked image to allow multiple claims of ownership. We refer to their attack as the IBM attack. The rightful ownership problem is either not addressed at all or not addressed properly by existing watermarking techniques. Furthermore, the IBM group defined the concept of "non-invertibility". Informally, non-invertibility means that it is computationally infeasible for an attacker to find a pair consisting of a faked image and a watermark such that the pair results in the same watermarked image created by the real owner. The IBM group also tried to design a non-invertible scheme, a modified version of the watermarking method proposed by Cox et al. Unfortunately, their scheme cannot be proved to be non-invertible.
This work was supported by the ICLASS Grant and National Science Foundation Career Grant CCR-96-23867.
In this paper, we present watermarking methods that are successful against the IBM attack. We introduce requirements on the watermark construction so that the whole watermark has to be created by using a standard encryption function, e.g., DES. Because of these requirements, the non-invertibility of the proposed scheme is easily proved. Although our methods are mainly applied to MPEG encoded video streams, they can be easily extended to many image schemes or other compressed video schemes. We also discuss various implications of the watermark construction requirements and derive a modified version of the watermarking method proposed by Cox et al. This is a watermarking scheme applied to raw images, and its non-invertibility is also easily proved. The invertibility of schemes that do not use the original in the verification process, and the usage of different non-invertible schemes, are also discussed.
The paper is organized as follows: Section 2 presents related work in the area of multimedia watermarking. Section 3 discusses the IBM attack and known attempts against it in detail; it also discusses the invertibility and the properties of schemes that do not use the original in their verification process. In Section 4, we first present our watermarking scheme for MPEG encoded video together with the construction algorithm of the watermark. We then prove that the proposed scheme is non-invertible and discuss in depth the implications of the requirements on the watermark construction and other issues related to the scheme. In addition, we present another non-invertible scheme, which is applied to raw images, and prove its non-invertibility. In Section 5, we discuss the usage of different non-invertible schemes. Section 6 concludes the paper.
2 RELATED WORK
Although our primary focus is on the rightful ownership problem of watermarks, in this section we present related work not only on this problem but also on general watermarking schemes, watermark construction, and watermarking algorithms. All of these are important for understanding how to solve the actual "rightful ownership" problem. During the past few years, a number of digital watermarking methods have been proposed. Among the earliest works, L.F. Turner [Tur89] proposed a digital audio watermarking method that substitutes the least significant bits of randomly selected audio samples with the bits of an identification string (watermark). A similar idea can also be applied to images [vSTO94]. However, because least significant bits are used, the identification code can be easily destroyed. There are many other proposals for watermarking, such as Tanaka's schemes [TNM90], which use the fact that quantization noise is typically imperceptible to users, Brassil's methods [BLMO94] for textual document images, and Caronni's geometric patterns (also called tags) [Car95]. These schemes are easily defeated by filtering, redigitization, geometric distortion, cropping(a), requantization, etc.
(a) Cropping eliminates parts of an original image and also allows the remaining parts of the image to be enlarged to fill the space.
Bender, Gruhl, and Morimoto from MIT [BGM95] discuss two different approaches to watermarking: (1) Patchwork and (2) Texture Block Coding. Patchwork is a statistical approach. It randomly chooses a pair of image points (a_i, b_i), then increases the brightness of a_i by one and decreases the brightness of b_i by one. These two steps are repeated n times. The expected value of the sum of the brightness differences of these n pairs is then 2n. This method is resistant to compression and FIR filters and remarkably resistant to cropping. However, it assumes that all brightness levels are equally likely, which is not a practical assumption. Texture Block Coding is a visual approach. It hides data in continuous random texture patterns. The texture block coding scheme is implemented by copying a region from a random texture pattern found in a picture to an area that has a similar texture. The auto-correlation of the image then recovers the shape of the region. This method is therefore limited to images with a lot of texture.
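The Patchwork procedure described above can be illustrated with a minimal sketch. It is written for this rewrite and is not the implementation from [BGM95]: the image is assumed to be a flat list of brightness values, a seeded `random.Random` stands in for the secret key that selects the pairs, the 2n chosen positions are kept disjoint, and clipping to [0, 255] is ignored. Because detection re-selects the same pairs, the statistic on the marked image exceeds the statistic on the unmarked image by exactly 2n; the unmarked statistic itself is only expected to be near zero.

```python
import random


def patchwork_pairs(size, n, key):
    """Choose n disjoint index pairs (a_i, b_i); the seed plays the role of the key."""
    rng = random.Random(key)
    idx = rng.sample(range(size), 2 * n)  # 2n distinct positions, so pairs never overlap
    return idx[:n], idx[n:]


def patchwork_embed(pixels, n, key):
    """Raise the brightness of every a_i by one and lower every b_i by one."""
    a, b = patchwork_pairs(len(pixels), n, key)
    marked = list(pixels)
    for i in a:
        marked[i] += 1
    for i in b:
        marked[i] -= 1
    return marked  # clipping to [0, 255] is ignored in this sketch


def patchwork_statistic(pixels, n, key):
    """S_n = sum over the keyed pairs of (v[a_i] - v[b_i])."""
    a, b = patchwork_pairs(len(pixels), n, key)
    return sum(pixels[i] - pixels[j] for i, j in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    image = [rng.randint(1, 254) for _ in range(352 * 288)]  # synthetic luminance values
    n, key = 10_000, 42
    marked = patchwork_embed(image, n, key)
    s0 = patchwork_statistic(image, n, key)
    s1 = patchwork_statistic(marked, n, key)
    print(s0, s1, s1 - s0)  # s1 - s0 equals 2n = 20000
```

Keeping the pairs disjoint is a design choice of the sketch: if one pixel served as both an a_i and a b_j, its +1 and -1 would cancel and the shift would no longer be exactly 2n.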
Smith and Comiskey [SC96] theoretically discuss the characteristics of data hiding schemes, such as the amount of information that can be hidden, the perceptibility, and the robustness, which is modeled using the quantities of channel capacity, signal-to-noise ratio, and jamming margin. They introduce new data hiding schemes whose parameters can be adjusted to trade off capacity, imperceptibility, and robustness.
Koch, Rindfrey, and Zhao [EK96] present an image watermarking method. The image is divided into 8x8 blocks, and each 8x8 block is DCT transformed. They randomly choose a subset of these blocks, and for each block in the subset, one of 18 possible triples of frequencies is selected and modified so that their relative strengths encode a 0 or a 1. However, because the choice of the frequency set is not based on any perceptual significance and the variance of the frequency coefficients is small, the method may be sensitive to noise or distortions. It is also vulnerable to multiple-document attacks(b).
Cox, Kilian, Leighton, and Shamoon [CKLS95] argue that a watermark must be placed in perceptually significant components of a signal if it is to be robust to common signal distortions and attacks. To avoid the perceptual degradation caused by watermarking those components, they propose inserting a watermark into the spectral components of the data using techniques analogous to spread spectrum communications, hiding a narrow-band signal in a wide-band channel. In their experiments the watermark consists of 1000 randomly generated numbers, but its length is variable and can be adjusted to suit the characteristics of the data.
Langelaar et al. [GCL97] propose two image watermarking methods. The first extends the existing spatial labeling technique, which adds a positive integer constant k, called the label embedding level, to the brightness of 50% of the pixels in an image. By dividing the image into blocks and searching for an optimal label embedding level k for each block instead of using a fixed level, a larger and more robust label can be embedded in an image. The second method removes high-frequency DCT coefficients in some areas to embed a label. However, this method may remove too many DCT coefficients and therefore cause distortions.
Hartung and Girod [HG97] provide a watermarking scheme specifically for MPEG encoded video. The basic idea is to apply the DCT to each 8x8 block of the watermark and then add the DCT coefficients of the watermark to the corresponding DCT coefficients of the MPEG stream. However, if the new Huffman codeword is longer than the old one, the algorithm restores the old DCT coefficient of the MPEG stream. This guarantees that the length of the watermarked video stream does not increase. The problem is that it may violate the requirement of marking the most significant components. The watermark embedding process is also slow: it can only handle data at a rate of several bytes per second.
Craver, Memon, Yeo, and Yeung [CMYY96] from IBM address the important issue of rightful ownership. They provide, as an attack, counterfeit watermarking schemes that can be performed on a watermarked image to allow multiple claims of rightful ownership. We refer to this attack as the IBM attack and discuss it in detail in the next section.
3 RIGHTFUL OWNERSHIP
The purpose of a watermark is to protect the owner's copyright. But without a careful scheme design and proper requirements on the watermark, an attacker can easily confuse everyone by manipulating the watermarked video (image, audio) and claiming to be the original owner as well. This is called the "rightful ownership" problem. Craver et al. provided the following scenario: given a watermarked video (image), an attacker can watermark it again using any watermarking scheme. The twice-watermarked video (image) carries both the original owner's and the attacker's watermarks. Both the original owner and the attacker can then claim ownership, which defeats the purpose of using a watermark. Using the original video clip (image) in the verification process can prevent the multiple ownership problem in some cases. However, even when the original video clip (image) is available, the rightful ownership problem still exists.
(b) A multiple-document attack uses t watermarked copies $D'_1, \ldots, D'_t$ of the original document $D$ to produce an unwatermarked document.
| Notation | Meaning |
|---|---|
| $V$ | Original video clips (or image) |
| $V_W$ | Watermarked video clips (or image) |
| $W$ | Owner's watermark |
| $W'$ | Extracted (recovered) watermark |
| $V_F$ | Falsified original video clips (or image) |
| $W_F$ | Attacker's watermark |
| $E$ | Watermark inserting procedure: $E(V, W) = V_W$ |
| $D$ | Watermark extracting procedure: $D(V_W, V) = W'$; if $V$ is not needed: $D(V_W) = W'$ |
| $C$ | Similarity testing: given a threshold $\delta$, $X$ and $Y$ are similar iff $C(X, Y, \delta) = 1$, where $C(X, Y, \delta) = 1$ if $c \ge \delta$ and $0$ otherwise; $c = cor(X, Y)$ is the correlation of $X$ and $Y$, and $X$ and $Y$ are either both images or both watermarks |
| $x_i$ | $i$-th element of $X$, where $X$ could be $V$, $V_W$, $W$, etc. |
Table 1: Basic Notation of Variables.
Craver et al. showed that the following scenario is possible:
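Table 1 leaves the correlation function cor(X, Y) unspecified. The sketch below assumes the normalized (Pearson) correlation and implements the similarity test C(X, Y, δ) on flat lists; the choice of Pearson correlation, the function names, and the handling of constant signals are assumptions of this illustration, not part of the scheme.

```python
import math


def cor(x, y):
    """Normalized (Pearson) correlation; an assumed concrete choice for cor(X, Y)."""
    if len(x) != len(y) or not x:
        raise ValueError("X and Y must be non-empty and of equal length")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    den = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if den == 0:
        return 0.0  # a constant signal carries no correlation information
    return sum(a * b for a, b in zip(dx, dy)) / den


def similar(x, y, delta):
    """C(X, Y, delta): 1 if cor(X, Y) >= delta, 0 otherwise."""
    return 1 if cor(x, y) >= delta else 0
```

For example, a ±1 watermark tested against itself with delta = 0.9 yields 1, while testing it against its negation yields 0. Because Pearson correlation ignores offset and positive scaling, a watermark that was uniformly amplified still passes the test.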
Assume that the original video clip (image) is $V$. The owner of the video clip uses watermark $W$ to create a watermarked video clip (image) $V_W$ and publishes $V_W$. We denote this process as
$$V \oplus W \Rightarrow V_W \tag{1}$$
Without knowing $V$, the attacker creates a watermark $W_F$, removes $W_F$ from $V_W$, and thereby creates a counterfeit video clip (image) $V_F$ such that
$$V_F \oplus W_F \Rightarrow V_W \tag{2}$$
In this way, the attacker can use $V_F$ as his "original" and claim ownership of $V_W$. If (2) can be achieved, the scheme is called invertible watermarking; otherwise, it is called non-invertible watermarking. We now formally define the invertibility and non-invertibility properties of a watermarking scheme. Table 1 summarizes the notation used in this paper.
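To make relation (2) concrete, the toy sketch below uses a hypothetical additive scheme E(V, W) = V + W with the non-blind extractor D(V_W, V) = V_W - V. This scheme is an assumption chosen for illustration, not one of the schemes Craver et al. actually attacked. Because E is trivially invertible, the attacker never needs V: he picks any W_F and subtracts it from V_W. Extraction against the counterfeit original then yields the attacker's watermark exactly as it yields the owner's watermark against the true original.

```python
import random


def embed(v, w):
    """Hypothetical additive scheme: E(V, W) = V + W, element by element."""
    return [vi + wi for vi, wi in zip(v, w)]


def extract(v_w, v):
    """Matching non-blind extractor: D(V_W, V) = V_W - V."""
    return [a - b for a, b in zip(v_w, v)]


def counterfeit(v_w, seed):
    """Invert E without knowing V: choose W_F, then set V_F = V_W - W_F."""
    rng = random.Random(seed)
    w_f = [rng.choice((-1, 1)) for _ in v_w]   # attacker's watermark
    v_f = [a - b for a, b in zip(v_w, w_f)]    # falsified "original"
    return v_f, w_f


if __name__ == "__main__":
    owner = random.Random(1)
    v = [owner.randint(16, 235) for _ in range(64)]   # owner's original
    w = [owner.choice((-1, 1)) for _ in v]            # owner's watermark
    v_w = embed(v, w)                                  # published V_W, relation (1)
    v_f, w_f = counterfeit(v_w, seed=99)
    assert embed(v_f, w_f) == v_w       # relation (2) holds exactly
    assert extract(v_w, v_f) == w_f     # the attacker's watermark is "found"
    assert extract(v_w, v) == w         # and so is the owner's
    print("both ownership claims verify")
```

Since every element of V_F differs from V_W by exactly one brightness level, the counterfeit original is also perceptually indistinguishable from the published clip.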
DEFINITION 1 [Weak Non-Invertibility]
A watermarking scheme (E, D, C) is (strongly) invertible if, for any video (image) $V_W$, there exists a mapping $E^{-1}$ such that $E^{-1}(V_W) = (V_F, W_F)$ and $E(V_F, W_F) = V_W$, where the construction of $E^{-1}$ is computationally feasible, $W_F$ belongs to the set of allowable watermarks, $V_W$ and $V_F$ are perceptually similar, and $C(D(V_W, V_F), W_F, \delta) = 1$. Otherwise, (E, D, C) is (weakly) non-invertible. □
Craver et al.'s original definition did not include the qualifiers "strongly" and "weakly".
Definition 1 comes from Craver et al.'s paper [CMYY96]. However, the requirement $E(V_F, W_F) = V_W$ is too restrictive (too strong) for invertibility, because $V_W$ could be altered, and even the true original $V$ and watermark $W$ may not satisfy $E(V, W) = V_W$ in the verification process. Correspondingly, the non-invertibility defined as its negation is too loose (too weak). We replace this requirement with $E(V_F, W_F) = V'_W$, where $C(V_W, V'_W, \delta_V) = 1$.
DEFINITION 2 [Strong Non-Invertibility]
A watermarking scheme (E, D, C) is weakly invertible if, for any video (image) $V_W$, there exists a mapping $E^{-1}$ such that $E^{-1}(V_W) = (V_F, W_F)$ and $E(V_F, W_F) = V'_W$, where the construction of $E^{-1}$ is computationally feasible, $W_F$ belongs to the set of allowable watermarks, $V_W$ and $V_F$ are perceptually similar, i.e., $C(V_W, V_F, \delta_V) = 1$, $V_W$ and $V'_W$ are perceptually similar, i.e., $C(V_W, V'_W, \delta_V) = 1$, and the extracted watermark $D(V'_W, V_F)$ is similar to $W_F$, i.e., $C(D(V'_W, V_F), W_F, \delta_W) = 1$. Otherwise, (E, D, C) is strongly non-invertible. □
We need to point out that none of the schemes discussed in Section 2 can withstand the IBM attack; therefore, they fail to resolve rightful ownership. Currently, there are two proposals to solve the ownership problem. The first solution is presented by Craver et al. They suggested using a one-way function to map 1000 bits of the original image to a bit sequence and then using that bit sequence to select between two different marking functions, i.e., if the bit is 0, use the formula $v_i(1 + s_i)$, else use $v_i(1 - s_i)$, where $v_i$ is the $i$-th element of the original image and $s_i$ is the $i$-th element of the watermark. It is believed (although it has not been proved) that this scheme is secure against any trial-and-error attack that an attacker can attempt in order to construct a falsified original. There is a very important assumption (regulation): $s_i$ must be positive. In general, we need to enforce that the construction of any watermark follows certain rules; otherwise, the multiple ownership problem cannot be solved. Craver et al. gave one such example, which shows that if $s_i$ could be negative in their scheme, then their scheme is invertible. Here we show the importance of the watermark construction requirements with another example, an extreme case.
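The selection rule of Craver et al. described above can be sketched as follows. The one-way function is not specified here, so SHA-256 in counter mode over a byte serialization of the original's elements is an assumed stand-in, and which part of the image feeds the hash is also an assumption (this sketch hashes the whole list it is given). The positivity requirement on s_i is enforced explicitly, since the text stresses that the scheme becomes invertible without it.

```python
import hashlib
import struct


def selection_bits(original, count):
    """Expand a one-way hash of the original into `count` selector bits.

    SHA-256 in counter mode is an assumed stand-in for the unspecified
    one-way function.
    """
    seed = b"".join(struct.pack(">d", float(v)) for v in original)
    bits = []
    counter = 0
    while len(bits) < count:
        digest = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        bits.extend((byte >> k) & 1 for byte in digest for k in range(7, -1, -1))
        counter += 1
    return bits[:count]


def craver_embed(original, s):
    """Use v_i(1 + s_i) if the i-th selector bit is 0, else v_i(1 - s_i)."""
    if any(si <= 0 for si in s):
        raise ValueError("watermark elements s_i must be positive")
    bits = selection_bits(original, len(s))
    return [v * (1 + si) if bit == 0 else v * (1 - si)
            for v, si, bit in zip(original, s, bits)]
```

Because the selector bits depend on the original through a one-way function, an attacker who wants to forge a different original would also change which marking function applies at each position.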
EXAMPLE 1 [An Extreme Case] If there is no restriction on the construction and the format of the watermark, an intruder can simply choose one element $v_{W_i}$ of $V_W$, set $v_{W_i} = v_{W_i} - 1$ to form a falsified original $V_F$, and claim the watermark $(0 \ldots 0\ 1\ 0 \ldots 0)$, where only the $i$-th element is 1. It is easy to see that this simple construction satisfies the invertibility condition of Definition 2. □
There is a class of invisible watermarking schemes that do not use the original image (video clip) in the verification process. We refer to these schemes as the Self-Proof Class. Because the Self-Proof Class is simpler to use than other watermarking schemes, it is interesting to study its non-invertibility. We want to know: "Are there any non-invertible schemes that belong to the Self-Proof Class?" Unfortunately, the answer is no.
THEOREM 1 The Self-Proof Class is inherently invertible. PROOF:
Let (E, D, C) be a scheme from the Self-Proof Class. Because this scheme uses only the watermarked image (video clip) $V_W$ in the verification process, there is no way to verify how the watermark was created or what the original image (video clip) looks like. Because there is no restriction on the structure of the watermark and the original, any watermark and any original are legitimate. Therefore, an attacker can always use EXAMPLE 1 described above to make his/her ownership claim. This means that any scheme of the Self-Proof Class is invertible. □
Of course, one can add watermark construction requirements to Self-Proof Class schemes. But in order to verify the watermark construction, something other than the watermarked image itself has to be presented. This contradicts the concept and the definition of a Self-Proof Class scheme.
Another attempt to resolve the ownership problem is provided by Wolfgang and Delp [WD97]. Their method uses a timestamp to generate the watermark; the owner with the earliest timestamp is then the true owner. However, this scheme can be easily defeated because timestamps can be manipulated. For example, for an event such as "the Berlin Wall came down", everybody knows exactly when it happened. If an attacker uses that time as his/her timestamp when watermarking a video/image taken during that event, who is the original owner? The following example may be more realistic: in some videos, such as news broadcasts and basketball games, real-time clocks are displayed somewhere in the picture, so the timestamps can be accurately determined. Again, if an attacker uses these timestamps, who is the original owner? Based on the above discussion, we conclude that using timestamps is not a good way to solve the watermark ownership problem.
4 NON-INVERTIBLE SCHEMES
One possible approach against the IBM attack is to impose a strict requirement on the construction of the watermark and to bind the watermark to the original video (image) itself. This greatly limits the attacker's choices of the watermark $W_F$ and the falsified "original" $V_F$. Clearly, if it is computationally infeasible for an attacker to find both $W_F$ and $V_F$ that satisfy formula (2), then non-invertibility is achieved. In this section, we first derive such a scheme for MPEG-encoded video streams and then derive another one for raw images or uncompressed video streams. We prove the non-invertibility of both schemes. Of course, the derived schemes must not belong to the Self-Proof Class.
4.1 MPEG Based Non-Invertible Watermarking Scheme
In this subsection, $V$ refers to a single I frame within the I, P, B sequence of an MPEG stream.
4.1.1 Construction of Our Watermark
Here we only describe the watermark construction on a single image $V$ encoded in JPEG format; it can be easily extended to the I frames of MPEG video clips. The method is based on ideas from direct sequence spread spectrum communications. First, we choose a standard encryption function, such as DES, and a key $KEY$. We scan each block of the DCT-transformed image in zigzag order. Let $AC_{b,l}$ denote the value of the $l$-th AC coefficient in block $b$, where $b = 1, \ldots, n_b$, $n_b$ is the number of blocks in $V$, and $l = 1, \ldots, 63$. Let
$$NZ_{b,l} = \begin{cases} 1 & \text{if } AC_{b,l} \neq 0, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$
Let $nAC_l$, $l = 1, \ldots, 63$, denote the total number of non-zero AC coefficients at the $l$-th position of each block, i.e.,
$$nAC_l = \sum_{b=1}^{n_b} NZ_{b,l}. \tag{4}$$
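Equations (3) and (4) amount to a per-position count of non-zero AC coefficients over all blocks. A minimal sketch, assuming the coefficients have already been parsed out of the JPEG/MPEG bitstream into one 64-entry list per 8x8 block in zigzag order (index 0 being the DC coefficient, which is not counted); parsing the bitstream itself is out of scope here:

```python
def count_nonzero_ac(blocks):
    """Return [nAC_1, ..., nAC_63] following Eqs. (3) and (4).

    `blocks` holds one 64-entry list per 8x8 block, already in zigzag order,
    with index 0 being the DC coefficient.
    """
    nac = [0] * 63
    for block in blocks:
        if len(block) != 64:
            raise ValueError("each block must have 64 zigzag-ordered coefficients")
        for l in range(1, 64):               # AC positions l = 1 ... 63
            nz = 1 if block[l] != 0 else 0   # NZ_{b,l}, Eq. (3)
            nac[l - 1] += nz                 # running sum over b, Eq. (4)
    return nac
```

Only whether a coefficient is non-zero matters, not its magnitude or sign, so the counts depend on the frame's coefficient structure rather than on the exact values.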
For example, if $AC_{1,5} = 1$, $AC_{2,5} = -2$, $AC_{1001,5} = 4$, and $AC_{b,5} = 0$ in all other blocks, then $nAC_5 = 3$. Second, we transform the first $m$ ($1 \le m \le 63$) values $nAC_l$ into binary numbers and concatenate them to form a PAD, i.e., $PAD = nAC_1 \ldots nAC_m$. The choice of $m$ depends on the number of bits that can be embedded into the frame $V$. For example, a standard MPEG-1 video with a picture size of 352x288 pixels has 1584 blocks in total. Therefore, 11 bits are required to represent each $nAC_l$, because $2^{10} = 1024 < 1584 < 2048 = 2^{11}$. If we are allowed to embed 101 bits in total, $m$ will be $\lceil 101/11 \rceil = 10$. Third, we apply DES with $KEY$ to PAD and get $EPAD = DES_{KEY}(PAD)$. Fourth, we define the following bit sequences:
- Let $A_j$, where $j = 1, \ldots$ (total number of bits allowed to be embedded in $V$), be a bit sequence such that
$$A_j = \begin{cases} -1 & \text{if } EPAD_j = 0, \\ 1 & \text{otherwise.} \end{cases} \tag{5}$$
- Let $B_i$ be a bit sequence such that
$$B_i = A_j, \quad j \cdot C_r \le i < (j+1) \cdot C_r, \tag{6}$$
where $C_r$ is the chip-rate(e).
- Let $p_i$ be a bit sequence such that
$$p_i = \begin{cases} -1 & \text{if the } i\text{-th bit of } DES_{KEY}(V) \text{ is } 0, \\ 1 & \text{otherwise.} \end{cases} \tag{7}$$
Last, we construct the watermark bit sequence by combining the sequences $B_i$ and $p_i$:
$$w_i = \alpha \cdot B_i \cdot p_i, \quad i = 1, \ldots, (Width \times Height), \tag{8}$$
where $\alpha$ is a scaling number. We discuss how we select $\alpha$ in the next subsection. Figure 1 shows a graphical representation of the whole watermark construction process, applied to a standard MPEG-1 frame of 352 (Width) x 288 (Height) pixels.
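The construction steps above can be traced end to end in a short sketch. It is an illustration under stated assumptions, not the authors' implementation: DES is replaced by HMAC-SHA256 in counter mode (`keyed_bits`, a hypothetical helper) purely as a keyed pseudo-random source, PAD is fed to it as its ASCII "0"/"1" string, EPAD is truncated to the e embeddable bits, and the frame is passed as raw bytes. Eq. (6) does not cover positions at or beyond e x Cr, so the sketch leaves them at 0 (unmarked). With the default 352x288 frame and Cr = 1000, it reproduces e = 101, d = 11, and m = 10 from the text.

```python
import hashlib
import hmac
import math


def keyed_bits(key, data, nbits):
    """Keyed pseudo-random bits standing in for DES_KEY(data).

    HMAC-SHA256 in counter mode, NOT DES: it only plays the role of
    "encrypt with KEY and read off the bits".
    """
    bits, counter = [], 0
    while len(bits) < nbits:
        block = hmac.new(key, data + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        bits.extend((byte >> k) & 1 for byte in block for k in range(7, -1, -1))
        counter += 1
    return bits[:nbits]


def build_watermark(nac, frame, key, height=288, width=352, cr=1000, alpha=1):
    """Return (e, d, m, w) for one frame, following Eqs. (5)-(8).

    nac[0] holds nAC_1, nac[1] holds nAC_2, and so on.
    """
    n_pixels = height * width
    e = n_pixels // cr                          # bits that can be embedded
    d = math.ceil(math.log2(n_pixels // 64))    # bits per nAC_l (n_b = n_pixels / 64)
    m = math.ceil(e / d)                        # number of nAC_l values in PAD
    if any(nac[l] >= 2 ** d for l in range(m)):
        raise ValueError("an nAC_l value does not fit in d bits")
    pad = "".join(format(nac[l], f"0{d}b") for l in range(m))   # nAC_1 ... nAC_m
    epad = keyed_bits(key, pad.encode(), e)     # EPAD, truncated to e bits (assumption)
    a = [-1 if bit == 0 else 1 for bit in epad]                 # Eq. (5)
    # Eq. (6) with 0-based i and j: chip i carries A_j for j = i // cr;
    # positions past e * cr are left at 0 (unmarked), an assumption.
    b = [a[i // cr] if i < e * cr else 0 for i in range(n_pixels)]
    p = [-1 if bit == 0 else 1 for bit in keyed_bits(key, frame, n_pixels)]  # Eq. (7)
    w = [alpha * bi * pi for bi, pi in zip(b, p)]               # Eq. (8)
    return e, d, m, w


if __name__ == "__main__":
    nac = [5, 1584, 0, 7, 300, 2, 9, 11, 1000, 64] + [0] * 53
    e, d, m, w = build_watermark(nac, b"raw I-frame bytes", b"secret key")
    print(e, d, m, len(w))  # 101 11 10 101376
```

The structure mirrors the two dependencies the scheme relies on: the spread bits B_i depend on the frame's coefficient statistics through PAD, and the chip sequence p_i depends on the frame itself, both through the same key.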
4.1.2 Watermarking Procedure
There are several assumptions that need to be stated before we describe our scheme.
A1 We derive a watermarking scheme for MPEG-compressed video that does not fully decode the video to a raw image sequence and then encode it again. We decode the video only up to the DCT coefficients and then embed a watermark into the stream at the DC and AC coefficient level.
A2 We set $\alpha = 1$. Tests on various MPEG clips show that, with chip-rate 1000, if we watermark the DC coefficients, the watermarked video has lower quality and the watermark is visible (see the right picture of Figure 3). One way to solve this problem is to increase the chip-rate, for example to 5000, but this decreases the amount of information that can be embedded by a factor of 5, so it is not a good solution. We therefore choose not to mark any DC coefficients. (Setting $\alpha$ to a small number can also avoid the quality degradation, but then there are many choices of $\alpha$ within one scheme, and different values of $\alpha$ can be chosen in different schemes. This uncertainty gives an attacker the freedom to manipulate the verification process. Therefore, we fix $\alpha = 1$.)
A3 We only watermark I frames because I frames are the most significant frames in a video stream. The watermark of the I frames is carried over to the P and B frames through the dependency between I, P, and B frames. It is true that an attacker could drop all watermarked I frames to get an unwatermarked video stream. However, although P and B frames may contain some I-blocks that can be decoded by themselves, the dependency between the I frame and the P and B frames is strong enough that the quality of the P and B frames will be unacceptably bad if the I frames are dropped. Furthermore, we only watermark the luminance blocks because they are more significant than the chrominance blocks.
A4 The total length of a watermarked stream must be less than or equal to the total length of the original stream, so that the watermarking process does not defeat the compression of MPEG.
A5 In our scheme, the original video stream and a key are needed for both the watermark creation and the verification process. This is justified by THEOREM 1.
(e) Chip-rate: in spread spectrum systems, this is the rate at which the discrete signal is applied.
Figure 1: Construction of the watermark (PAD, encrypted by DES with KEY, gives the 101-bit EPAD, which is mapped to A_j and spread with Cr = 1000 into the 101000-bit sequence B_i; DES with KEY applied to the original video frame V gives the 101000-bit sequence p_i; B_i and p_i together form the resulting watermark W).
With the above assumptions and requirements in mind, we design the algorithm shown in Figure 2, called ALGORITHM 1. The main steps of ALGORITHM 1 are:
M1 A new watermark is created based on encrypted information from the original I frame; we denote this process as $w_i = DES_{KEY}(v_i)$. We then put the sequence $w_i$ into a two-dimensional matrix of size (Height x Width). The result is a watermark in the form of a raw image.
M2 The DCT is applied to the watermark.
M3 An AC coefficient in the original image $V$ is marked only if the length of the resulting VLC (Variable Length Code) does not increase.
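Step M3 can be sketched as a per-coefficient accept/reject test. Real MPEG-1 VLC lengths come from run/level Huffman tables, so `vlc_length` below is a hypothetical toy model whose length grows only with the magnitude of the level; it is not the MPEG table. The block is assumed to be a list of quantized AC levels only (DC coefficients are never marked, per A2), and watermark coefficients are rounded to integers, which is also an assumption. ALGORITHM 1 treats the case where a marked coefficient becomes zero specially; that handling is not reproduced here, and the sketch simply leaves such coefficients unmarked, so the run structure of the block never changes.

```python
def vlc_length(level):
    """Hypothetical stand-in for the MPEG VLC code length of a non-zero level.

    Real MPEG-1 run/level tables also depend on the run; this toy model
    only grows with the magnitude of the level.
    """
    return 2 + 2 * abs(level).bit_length()


def mark_block(ac, wac):
    """Add WAC_l to each non-zero AC_l whose code length would not grow (M3)."""
    out = list(ac)
    for l, (c, w) in enumerate(zip(ac, wac)):
        if c == 0:
            continue                    # only non-zero AC coefficients are marked
        candidate = c + round(w)        # quantized levels stay integers (assumption)
        if candidate == 0:
            continue                    # simplification: zero results are skipped here
        if vlc_length(candidate) <= vlc_length(c):
            out[l] = candidate          # accept: the code does not get longer
    return out
```

For instance, `mark_block([3, 0, -1, 5], [1.0, 2.0, 1.0, -1.0])` keeps 3 (its code would grow), leaves the zero and the coefficient that would become zero untouched, and lowers 5 to 4. Because every accepted change keeps or shortens its code, the total length of the block cannot increase, which is the property A4 asks for.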
Let nAC_l denote the number of non-zero l-th AC coefficients (in zigzag order) in the I frame, where l = 1, ..., 63. Let the chip-rate be Cr; then e = ⌊(Height × Width) / Cr⌋ bits can be embedded in a frame of size (Height x Width). Let each nAC_l be represented by d bits, d = ⌈log2((Height × Width) / (8 × 8))⌉. Let WAC_l denote the l-th AC coefficient of any block in the watermark W. Let PAD = nAC_1 ... nAC_m, where m = ⌈e / d⌉.
For each I frame {
    /* Construct the watermark */
    Choose a KEY and a standard encryption function such as DES;
    EPAD = DES_KEY(PAD);
    Let A_j = -1 if EPAD_j is 0, otherwise 1;
    Let B_i = A_j, (j × Cr) ≤ i < ((j + 1) × Cr);
    Let p_i = -1 if the i-th bit of DES_KEY(I frame) is 0, 1 otherwise;
    The watermark w_i = α × B_i × p_i, i = 1, ..., (Height × Width);
    Put w_i into a Height × Width matrix and divide it into (Height × Width) / (8 × 8) blocks of size 8x8;
    Apply the DCT to each block to get the corresponding AC coefficients WAC_l;
    /* Watermarking process */
    For each block in the I frame
        For each non-zero AC_l {
            add WAC_l to AC_l;
            if (Val(WAC_l + AC_l) == 0)
                if (length(new code of next-non-zero-AC)