Adaptive Steganography

Image

Bitsllipped

Baboon Clock Hats Lena New York Parrot

I~TeaDOt340

RS Steganalyzer reported value 0.0207 0.0249 0.0216 0.0204 0.0205 0.0296 0.0240 0.0246

Table 4: Performance of RS Steganalyzer when messagebits embedded only in active regions of image. s.

CONCLUSION

In this paper we have examined ways in which Alice and Bob can practice adaptive steganography in order to escape detection by known steganalysis techniques. We show that even if the warden is aware of the specific adaptive mechanism being used, Alice and Bob still gain in the sense that they can embed larger messages while escaping detection. We demonstrate the validity of our approach using RS steganalysis as an example and design some experimental ways to escape detection.

6. REFERENCES [1] G. I. Simmons, "Prisoners' Problem and the Subliminal CI1annel", CRYPTO 83- Advances in Cryptology, August 22-24.1984. pp. 51-67. , [2] N. F. Johnson, S. Katzenbeisser, "A Survey of steganographic techniques", in S. Katzenbeisser and F. Petitco1as (Eds.): Infonnation Hiding, pp. 43- 78. Artech House, Norwood, MA, 2000. [3] N. F. Johnson, S. Jajodia, "Steganalysis: The investigation of Hidden Information", IEEE Infonnation Technology Conference, Syracuse, NY, USA, 1-3 September 1998. [4] N. F. Johnson, S. Jajodia, "Steganalysis of Images created using current steganography software". in David Aucsmith (Ed.): Info17nation Hiding. LNCS 1525. pp. 32-47. Springer-Verlag Berlin Heidelberg 1998. [5] A Westfeld, A. Pfitzmann. "Attacks on Steganographic Systems'., in Andreas Pfitzmann (Ed.): Infonnation Hiding. LNCS 1768. pp. 61-76, Springer-Verlag Berlin Heidelberg, 1999. [6] I. Fridrich, R. Du, M. Long, "Steganalysis ofLSB Encoding in Color Images.', Proceedings of ICME2000. New York City. July 31-August 2. New York, USA . [7] I. Fridrich. M. Goljan and R. Du. "Reliable Detection ofLSB Steganography in Color and Grayscale Images". Proc. of the ACM Workshop on Multimedia and Security, Ottawa. CA. October 5.2001, pp. 27-30. [8] I. Avcibas, N. Memon and B. Sankur, "Steganalysis Using Image Quality Metrics... Security and Watennarking of Multimedia Contents, SPIE. San Jose, CA. February 2001. [9] R. Chandramouli and N. Memon, "Analysis of LSB Based Image Steganography Techniques." Proceedings of the International Conference on Image Processing, Thessaloniki, Greece, October 2001. [10] Andreas Westfeld. F5-A

Steganographic Algorithm: High Capacity Despite Better Steganalysis. S. 289-302 in Ira S.

Moskowitz (Hrsg.): Information Hiding. 4th International Workshop. IHOI, Pittsburgh, USA. April 2001, Proceedings. LNCS 2137. Springer- Verlag Berlin Heidelberg 2001. [11] Faisal Alturki and Russell M. Mersereau. Secure blind image steganographic technique using discrete fourier transformation. In Proceedings of the IEEE International Conference on Image Processing, lCIP '01, Thessaloniki, Greece, October 2001. [ 12]

78

http:/ /www

.cl.cam.ac.uk/

Prac. SPIE Val. 4675

-fapp2Jwatennarking/benchmark/image_database.hbnl

i

subjective visual quality of a suspected stego-image.. In this paper we restrict our attention to the passive warden caSeaIJ assumethat no changes are made to the stego-object by the warden Wendy.

Figure 1: Framework for Secret Key PassiveWarden Steganography. Alice embedssecret messagein cover imag (left). Wendy the warden checks if Alice's image is a stego-image (centre). If she cannot determine it to be so, sb passesit on to Bob who retrieves the hidden messagebasedon secret key (right) he shareswith Alice.

It should be noted that the main goal of steganography is .to communicate securely in a completely undetectable manner. Th; is, Wendy should not be able to distinguish in any sensebetween cover-objects (objects not containing any secret messag and stego-objects (objects containing a secret message). In this context, "steganalysis" refers to the body of techniques th~ aid Wendy in distinguishing between cover-objects and stego-objects. It should be noted that Wendy has to make thi distinction without any knowledge of the secret key which Alice and Bob may be sharing and in some cases (pUI steganography) without knowledge of the specific algorithm that they might be using for embedding the secret messag Hence steganalysis is inherently a difficult problem. However, it should also be noted that Wendy does not have to glea anything about the contents of the secret message m. Just determining the existence of a hidden message is enough. This fac makes her job a bit easier.

Given the proliferation of digital images, and given the high degree of redundancy present in a digital representation of a image (despite compression), there has been an increased interest in using digital images as cover-objects for the purpose ( steganography. The simplest of such techniques essentially embed the message in a subset of the LSB (least significant bi plane of the image, possibly after encryption [2]. It is well known that an image is generally not visually affected when i least significant bit plane is changed. Popular steganographic tools based on LSB like embedding vary in their approach fi hiding information. Methods like Steganos and Stools use LSB embedding in the spatial domain, while others like Jstt embed in the frequency. For brevity, we collectively refer to these techniques as LSB steganography techniques. For a goc survey of steganography techniques, the reader is referred to [2]. In this paper we restrict our attention to LSB steganograpl where digital images are being used as cover objects. However, the principles developed here are more general and apply ~ well to ?ther t~s of steg.anograp~ytechniq~es and with other types of cover-data like video and audio., .

The development of techniques for image steganography and the wide-spread availability of tools for the same have led to ~ increased interest in steganalysis techniques for image data. The last two years have seen many new and powerl steganalysis techniques reported in the literature. Perhaps some of the earliest work in this regard was reported by Johns< and Jajodia [3],[4]. They mainly look at palette tables in GIF images and anomalies caused therein by common stego-too] A more principled approach to LSB steganalysis was presented in [5] by Westfeld and Pfitzmann. They identify Pairs , Values (poV's) which consist of pixel values that get mapped to one another on LSB flipping. For example the value 2 ge mapped to 3 on LSB flipping and likewise 3 gets mapped to 2. So (2, 3) forms a PoV. Now LSB embedding causes tI frequency of individual elements of a PoV to flatten out with respect to one another .So for example if an image has 50 pixe that have a value 2 and 100 pixels that have a value 3, then after LSB embedding of the entire LSB plane the expect( frequencies of2 and 3 are 75 and 75 respectively. This of course is when the entire LSB plane is modified. However, as 10 as the embedded message is large enough, there will be a statistically discernible flattening of PoV distributions and this fa

70

Proc. SPIE Val. 4675

However,

this,is the main limitation

of their technique. LSB embedding , ., .

can only

Wendy. lines

but with

,

higher

detection ,

rate even

for short ..~

messages

was proposed

by Fridrich,

color planes. They then show that the ratio of close colors to the total number of unique colors increases image as opposed to when the same message is ". It is this difference that enables them cover-images from steg-images for the case of LSB steganography. A more sophisticated technique that accuracyeven for short messages was presented by Fridrich et. al. in [7]. Since we make for obtaining our experimental results, we give a detailed explanation of it in sub-

Memon and Sankur [8] present a general purpose technique for steganalysis of images that works with a wide of embedding techniques including but not limited to LSB embedding. They demonstrate that steganographic ., , c~ multivariate

regression

on the selected quality , ,

metrics .,

and a training , ,

, ' The steganalyzer set of cover and stego images.. In the ..-

, -J results with the chosen feature set and well-known watermarking and steganographic techniques indicate their , -1 cover images and stego-images for a wide variety of stegoThey report moderately successful results eyen when the stego-technique was unknown to the detector and not

l analysis of LSB steganography and investigate under what conditions can an between stego-images and cover-images. They derive a closed form expression of the probability of , , ., ., ; ... , ' , ~ the steganalysis problem as a multiple hypothesis testing problem and pool results at individual pixels to detection rule. Although this work is interesting from an academic point of view, il;resul~sin loose upper steganographic capacity of LSB encoding in general. Some of the specific techniques described above for LSB , , , , .

2. ANAL YSIS OF ADAPrIVE STEGANOGRAPHY framework described so far Alice and Bob have no knowledge of the steganalyzer that the warden is using to detect stego-images, whereas Wendy can have knowledge of their embedding algorithm (but not their -Alice and Bob are well aware about the capabilities of Wendy and may take explicit : will employ. This awareness could be in the of general knowledge about the current state-of-the-art in steganography. For example, if Alice and Bob are aware , -.1 make sure that the message embedded in 'each image is short and-also . , Having some knowledge about Wendy's detection mechanisms, that Alice and Bob will adapt their embedding algorithm accordingly in order to avoid detection. Intuitively this is what we mean by adaptive steganography. Now, it would violate one of the basic principles of c- .steganography) if we assume that Wendy is ignorant of Alice's and Bob's adaptive embedding strategy. ' steganalyzer, we show in this

Alice and Bob can practice adaptive steganography depending on the amount and type if they have about Wendy's detector and the constraints governing their communication abilities. We can -.-, --,

Prac. SPIE Val. 4675

1

2.

3.

4.

Alice and Bob use a technique that cannot be detected with Wendy's steganalyzer. So if Wendy is using steganalyzer for LSB detection, Alice and Bob can use a technique other than LSB embedding. For example, th' could embed in the next significant bit plane. Or they could embed in the transform domain using a techniq similar to [10] and [11]. However, this situation is really not of interest from a theoretical point of view as Wend who we assume has full knowledge of Alice's and Bob's strategy, can develop anew steganalyzer that will det( the technique and we are back to square one. Alice can restrict the number of bits that they embed such that Wendy is not able to detect the message given t parameters she is using in her steganalyzer. In this case Alice and Bob need not share any extra information otl than the secret key which they use to embed the message. Alice and Bob embed the message only in locations where it is hard for Wendy to detect. For example, Alice a Bob can only embed in pixel locations where the immediate neighborhood has high activity and any statisti measurement that Wendy may attempt to make would be affected by the high degree of "noise" present in 1 neighborhood. Note, that in order to use such a strategy Alice and Bob have to pre-decide the specific mechanis to be employed. Alice can first embed the message and then further process the stego-image in order to mask the hidden message ~ hence avoid detection. The post-processing could be as simple as addition of Gaussian noise or could be somethil more involved that makes explicit use of Alice's knowledge of Wendy's steganalyzer.

As we stated above, the first case presented above is ad-hoc and not of further interest. In the rest of this section we exalI the remaining cases in a mathematical setting and show that indeed Alice and Bob can gain something by using any of three strategies. We defer the analysis of case 3 to the full version of the paper as it really turns out to be very similar to ( 4. In the rest of this section we take a closer look at cases 2 and 4 and show how Alice and Bob could benefit by using s an approach. Let us examine case 2 of adaptive steganography, where we assume that Alice and Bob know Wendy's detection techru and try to escape detection using a smart strategy that does not give sufficient information to Wendy. From statisl detection theory point of view we can state that in order to detect the presence of a hidden message within a cel probability of error, Wendy has to observe at least N features or samples [9]. Therefore, if Alice chooses the numbe features that carry the hidden message to be less than N, Wendy's detection probability will decrease. Of course this o also mean a reduction in the hiding capacity for Alice since the number of message carriers will be less in this case. formulate this problem as follows. For Wendy's detector let Ho and H 1respectively denote that hypothesis that the obse signal does not contain a message and the contrary. Now, Wendyobserves a total 'of Lfeatures in the image presented t< by Alice and performs a statistical hypothesis test on each of these features (or a subset) to determine if a message is pre In general, each of these tests on the individual features could be different in their approach or the parameters used. u denote by aiand individual

Pt,i=1,2,...Lthe

respective false alarm (P(HliHo»and

statistical detectors. For simplicity, in the following

miss probabilities

(P(HoIHl»of

analysis we assume ai = Pt,i = 1,2,...L .After

W

chooses these parameters and performs the corresponding tests, she arrives at L decisions. The question now for Wend how to combine these L decisions for a fmal decision such that a global false alarm and miss probability is attained' immediately see that this problem is combinatorially hard. The reason for this is as follows: Wendy has to design a g fusion detector (function) that takes L binary inputs and gives a binary output (final decision about the presence or absen a hidden message). It is easy to see that there are 22L such global fusion rules! Picking the optimal fusion rule from SUI exponential growing class of functiOOs in L is an enormous task indeed. For example, consider the case when WeJ)dy each pixel in a 256x256 image for the presence of a LSB message. This means, Wendy has to fmd the optimal fusion de1 from 226"36 possibilities! ! Of course, some trivial fusion detection functions can be immediately discarded such as th' that decides no message is hidden when all the features test positive for its presence. But, such trivial functions are sm number compared to the problem at hand. In order to alleviate this problem, Wendy can use heuristics such as majority that agrees with the decision outcomes of the majority of the local statistical tests performed on the features. We observe heuristics mayor may not be optimal. Nevertheless, from adaptive steganography view-point this is not critical. Su Wendy uses the K-out-of-L global detector heuristic that decides the presence/absence of a hidden message if K oul detectors agree with this decision. Then, it is easy to see that the global detection probability (Pd) and the false probability ( Pf ) are given by:

72

Proc. SPIE Val. 4675

(0.1)

I'L)

=

L!

SinceAlice knows the detection strategyof Wendy, it is reasonableto assumethat Alice knows the

i!(L-i)!

of Wendy as a function of K (as shown in Figure 2) and the actual value of her detection. Then, Alice's strategy will be to choose a message length such that Wendy will be unable curve. This essentially leads to the notion of steganography capacity, an [9]. For example, consider the ROC curve in Figure 2. If Wendy chooses to maximize her detection (K=I in Eq. (0.1» at the expense of maximizing her false alarm probability, then she detects a hidden message , This means, theoretically, that Alice's steganography capacity to zero. On the other hand, if Wendy decides to minimize her false detection (therefore minimize her detection K=20 in Eq. (0.1), then, the steganography capacity of Alice is nearly 20 times the number of bits feature. Of course, for the other non-extreme cases,Alice and Wendy play the game of not allowing the other operate in the desired point on the ROC curve. Specifically, Alice's strategy would be to choose a value of K as the of features where she will hide the message (how these features are chosen is an entirely different question) such that a detection probability that. is less than what she aimed for. Assuming, Alice is more concerned , we can reason that the value of K in the ROC curve that Wendy chooses serves as an upper on Alice's steganography capacity. We point out here that the tightness of this bound needs further investigation. we note that since Wendy's global detector deals with discrete random variables, not all possible values-of Pj and by it.

0.9

0.8

0.7

0.6

a.""'0.5

0.4

0.3

O.Z

0.

n. 0

0.2

0.4

0.6

0.8

PI

Figure 2: ROC for the global detector as a function of K when a = 0.45, /3 = 0.25, L = 20. now consider the last adaptive steganography case. There are many possible strategies for Alice to increase the ..~1~'~ ~ For example, after embedding a message,Alice can add zero-mean noise to those samples (or that do not carry the message. This way, when Wendy attempts to search for a message using local statistical in most steganalysis techniques) of the stego signal, she will find it difficult to estimate or detect the This is clear because the additional noise will increase the variance of the local statistical measures being used in mechanism thereby increasing the estimation error and probability of detection error .We study this effect from . Proc. SPIE Vol. 4675

an information theoretic formalism using a cascadedchannel model as shown in Figure 3. In this figure, M stands for th~ secretmessage,X is the stegosignal and y is the stego'signalafter modification by Alice to achieveadaptive steganography The cover signal servesas the first channeland the processingundergoneby X can be thought of as a second channe Clearly, X is obtainedfrom M and y fl:omX~

M

Cover

x

(Channel 1)

Process stego ?;

y

signal ( Channeil:L

Figure 3: A cascadedchannel model for adaptive steganography.

It is reasonable to assume that,

I(M;YIX)=o

where I(.) denotes mutual information. In plain terms, Eq. (1.1) means that output of Channel 2 depends statistically only on its input and this happens to be true for this case of adaptive steganography. Now, we know that [6], I(M;X)=H(M)-H(M I(M;Y) = H(M)-H(M

IX) IY)

where H(.) stands for the unconditional entropy. It is enough to show that I(M;X)

~ I(M;Y)

since this means that the

amount of information Wendy gets about M from y (as a result of adaptive steganography) is less than what she will get from X (non-adaptive steganography). To prove the required inequality we resort to information theory for cascade channels [6]. We start with the following well-known equation [6],

I(M;YX)=

I(M;Y)+I(M;X

IY)

=I(M;X)+I(M;Y

IX

and equate the right hand sides and use Eq. (1.1) to obtain I(M;X)=I(M;Y)+I(M;X Since I(M;X

IY)~Ofrom

Eq. (1.4) we see that I(M;X)~

IY) I(M;Y)

showing that there is something to be gained by

adaptive steganography. Of course, the amount of gain will depend on various factors such the embedding algorithm, the steganalysis technique, the stego signal processing etc.

3. EXPERIMENT

AL RESUL TS

In this section we show practical examples of adaptive steganography. We develop our examples for the RS Steganalysis technique proposed in [7]. However similar examples can be constructed for other techniques too. The reason we chose the RS steganalysis technique to demonstrate the practical feasibility of adaptive steganography is first because to the best of our knowledge the RS steganalyzer is currently the state-of-the-art in LSB steganalysis. Second, the technique is very simple to implement. In the rest of this section we examine the four cases of adaptive steganography identified in section 2 and for each of them give a practical realization within the framework of LSB steganography and RS steganalysis. For our experiments we used the watermarking benchmark image database from [22]. The database contains a variety of images including computer generated images, images with bright colors, images with reduced and dark colors, images with textures and fine

74

Proc. SPIE Vol. 4675

and edges, and well-known images like l:.ena, peppers etc. All the images were converted to grey scale and experiments. -~ subsection we describe in more detail the RS steganalyzer which despite its simplicity provides very accurate of LSB steganography. Given an 8 bit grayscale image (so assumed for ease of exposition) an RS steganalyzer -'I pixel in the group. The pixel group is defined as: G = (XI,

a discrimination

function, f constructed

X2, X3, ...,

Xn}

such that it captures the smoothness of a particular

group of pixels,

is defined

three invertible operations, Fn (x), n = -1,0,1, on pixel values x, are defined as follows: I. F1(x) is defined as the operation that performs flipping of the LSB of an image pixel. It effectively maps pixel values as follows: F; : OH

1, 2 H ~, ..., 254 H 255

2. F_1(x) maps pixel values in the opposite direction as compared to F1. Specifically the mapping F_1is defined as follows: F -I:

3. Identity

function,

-I

~

0, I ~

2~

., 255 H 256

Fo{x) that simply maps a pixel value to itself.

or F -1 are extended to a group of pixels G by using a mask M, which specifies what operation will be which pixel in the group. For example, ifG = (39,38,40,41) and M = (-1,0,-1,0), then FM(G) = (F-1(39), Fo(38), F-

using the operations F1, F-l and the discrimination functionf, three types of pixels groups are defined as follows: Ge R ~

f(F;(G»

> f(G)

Ge S ~

f(r;(G»