Performance Optimization of Close-Color Pair Steganalysis

Paul R Seymer
Department of Computer Science, Hood College, Frederick, MD 21701

Dr. George Dimitoglou
Department of Computer Science, Hood College, Frederick, MD 21701
dimitoglou@hood.edu

Abstract - Close-Color Pair analysis provides a reliable means of determining whether a given image contains a secret message. Counting such pairs for an entire image is computationally intensive, which renders the technique impractical when processing large numbers of sizeable images. This paper outlines a proposed optimization technique that reduces the running time while maintaining a reasonable (and controllable) error.

Keywords: Steganography, Steganalysis, Information Hiding, BMP.

I. INTRODUCTION

Steganography [5] comes from a combination of the Greek words steganos and graphein, and translates to “covered writing”. It is the notion of hiding a message inside a medium, such as an image, text, or audio file. It can be thought of as akin to cryptography [4], in that the intention is to keep secrets secret, and one must know how to extract a hidden message from the medium in the same way that a ciphertext needs to be decrypted. It can also be thought of as the opposite of cryptography: the strength of an encrypted message lies in the difficulty of decrypting the ciphertext, while the strength of steganography lies in the fact that nobody knows the medium contains a hidden message. Typically, messages are hidden by altering the composition of the bits in some way that can later be recovered. The files to be altered are called cover images. Cover images are combined with a secret message to form a steganogram. The success of steganography lies in the notion that an observer cannot differentiate between a steganogram and a clean image (one without hidden information).

II. MOTIVATION

Steganalysis is the technique of detecting steganography. Our work focuses on a particular type of steganalysis, centered on the encoding of the Least Significant Bits (LSB) in 24-bit Bitmap (BMP) image files. BMPs are well structured and lossless, and their 24-bit color depth can represent over 16 million (2^24 = 16,777,216) colors, making them ideal cover images [9]. Image data in BMP files are stored as sets of 8-bit numbers corresponding to the color intensity of red, green, and blue in each image pixel [7,8]. In this paper, the LSBs of these 8-bit numbers are taken to be the 4 rightmost bits. The rightmost bit (bit number 8) is particularly interesting because it can be flipped without greatly changing the pixel’s color, so it lends itself to steganographic tools that hide messages in LSBs. Changing the other bits in the number can produce changes that are easily detected by the human eye, and in doing so defeats the purpose of hiding data [9, 10].

There are many methods for analyzing the statistical characteristics of a steganogram’s LSBs [1, 2, 3]. Some methods focus on the color histogram [3], that is, the frequency of color-intensity occurrences in the steganogram. However, for these methods to detect the presence of a hidden message, specific information must be known about the original cover image, such as its natural noise characteristics, or about the message-hiding process (the steganographic algorithm). For example, by observing the pixels and color channels of a 24-bit BMP, one can see that altering the LSBs produces noticeable noise, which manifests as jaggedness in the curve of a color-channel histogram. We plotted histograms for each color channel (and the composite) of a test image whose LSBs were altered in several different ways; Figure 1 shows a portion of the blue-channel histogram of this image, both for the original image and after 100% of its 2 and 4 least significant bits have been flipped. These histograms illustrate the notion that changing the image LSBs creates “noise” in the image. The color histograms of the original image are relatively smooth; when we modify the image’s color bits, the histogram becomes more jagged.

Figure 1. Blue Channel of test image after modifications to LSB content. (Plot: blue-channel count versus hex value for the original image and after 100% of the 2 and 4 LSBs were changed.)

This effect is comparable to the effect that appears in an electrical signal, such as an audio wave, when noise is introduced. For example, when an audio system produces “noise” in its output, jaggedness (called “jitter”) appears. The noise introduces small changes in parts of the wave, causing variance in its output when compared to the original. Similar noise manifests in these histogram plots and appears as jaggedness around what should be a reasonably smooth curve. Bit-flipping produces small changes in the color characteristics of the image, which show up as this jagged histogram. This smoothness (or jaggedness) characteristic can be used to compare the original image’s color histograms with those of the same image after some of its bits have been flipped. The jaggedness is prominent in Figure 1 after flipping 100 percent of the two least significant bits of the blue-channel values, and even more prominent after flipping 100 percent of the four least significant bits (half of the bits in the color’s 8-bit representation). In this case, the LSB profiles of the cover and hidden images (the original curve) are known. The problem is that the jaggedness cannot be measured unless the original curve is known. Without the ability to measure it across a series of unknown images, such a method may prove unusable for determining which images may have hidden messages. Figure 2 illustrates a practical example, where the noise plots show which pixels have LSB changes; the black pixels in the plot are those that have been altered through LSB-based steganography. The more LSBs are modified, the “noisier” the image becomes, and the more jagged its color histogram would become. The problem, then, is how to measure this noise in a meaningful way, in order to differentiate it from what may naturally be depicted in the contents of an image.

Compare, for example, an image of a homogeneous subject such as a skyscape with the heterogeneous image of a kaleidoscope. The difficulty is further compounded by the need to differentiate the noise produced by the steganographic process from any noise already present in the particular cover image, such as noise introduced by the photographic media or the digitizing process.
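The bit-level claim above can be illustrated with a short Python sketch: flipping only the rightmost bit of an 8-bit channel value always moves it by exactly one intensity level, whereas flipping the two or four rightmost bits can move it by up to 3 or 15 levels. The helper names `flip_lsbs` and `max_change` are hypothetical, chosen for illustration, and are not part of the authors' tool.

```python
# Hypothetical helpers for illustration only (not from the authors' tool).


def flip_lsbs(value: int, k: int) -> int:
    """Flip (XOR) the k rightmost bits of an 8-bit channel value."""
    if not 0 <= value <= 255:
        raise ValueError("channel value must be in 0..255")
    if not 0 <= k <= 8:
        raise ValueError("k must be in 0..8")
    mask = (1 << k) - 1  # k=1 -> 0b1, k=2 -> 0b11, k=4 -> 0b1111
    return value ^ mask


def max_change(k: int) -> int:
    """Largest intensity change that flipping the k rightmost bits can cause."""
    return (1 << k) - 1


if __name__ == "__main__":
    value = 118  # 0b01110110, an arbitrary blue-channel value
    for k in (1, 2, 4):
        flipped = flip_lsbs(value, k)
        print(f"k={k}: {value} -> {flipped} (max possible change {max_change(k)})")
```

With k = 1 the change is always exactly one level, which is why the rightmost bit is the usual target of LSB embedding; with k = 4 the value above moves from 118 to 121, but other values (such as 0) can move by the full 15 levels.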

Figure 2. Sample representations of LSB bit flipping in images.

III. NOISE AND CLOSE COLOR PAIR ANALYSIS

One proposed steganalysis technique involves studying the occurrence of closely similar colors in an image, called close-color pairs [1,2,4]. These are pairs of unique colors whose red, green, and blue values each differ by at most one (a single LSB step). This technique is used to implement a noise-detection process for detecting steganograms. As described in [1, 2], let U be the number of unique colors in an image and let P be the number of close-color pairs in the image; then R is the ratio between the number of close-color pairs and the total number of possible pairs of colors [1]:

$$R = \frac{P}{\binom{U}{2}} = \frac{2P}{U(U-1)} \qquad \text{(Equation 1)}$$

Equation 1 yields a ratio of color pairs that is specific to each image and can be easily measured. The usefulness of this ratio lies neither in its existence nor in comparing it to the ratios of other images, since these ratios are fairly arbitrary and depend on the composition of the image. The power of this ratio lies in observing how it changes after an image is seeded with artificial noise. When information is hidden in an image, the least significant bits are flipped in some way to represent the bits of the secret message. This produces a color that is one unit more or one unit less in intensity than the original color, producing more close-color pairs in the image. Therefore, when noise is artificially introduced into an image, the image is expected to respond differently depending on its original noise content: a clean image gains many new close-color pairs, while an image that already carries a message gains comparatively few. Equation 2 is identical to Equation 1 but applies to the image after it has been artificially seeded with a test message:

$$R' = \frac{P'}{\binom{U'}{2}} = \frac{2P'}{U'(U'-1)} \qquad \text{(Equation 2)}$$

R’, P’, and U’ represent R, P, and U for the modified image. The ratio R’ for an image that already has noise (because it contains a secret message) is expected to be not significantly larger than R (the ratio for the pre-seeded image).
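Equation 1 can be sketched in Python as follows, assuming pixels arrive as (red, green, blue) tuples and using the close-color definition above (every channel differs by at most one). The function names are hypothetical and this is not the authors' Java implementation; Equation 2 is obtained by calling the same function on the seeded pixels. The nested scan over all pairs of unique colors is exactly the O(U²) step that dominates the running time.

```python
from itertools import combinations
from math import comb
from typing import Iterable, Tuple

# Illustrative sketch; function names are hypothetical, not the authors' tool.
Color = Tuple[int, int, int]  # (red, green, blue), each 0..255


def are_close(c1: Color, c2: Color) -> bool:
    """Two distinct colors are close if each channel differs by at most one."""
    return c1 != c2 and all(abs(a - b) <= 1 for a, b in zip(c1, c2))


def close_color_stats(pixels: Iterable[Color]) -> Tuple[int, int, float]:
    """Return (P, U, R) with R = P / C(U, 2), as in Equation 1."""
    unique = sorted(set(pixels))  # the U unique colors of the image
    u = len(unique)
    # Naive pairwise scan: C(U, 2) comparisons, i.e. O(U^2).
    p = sum(1 for c1, c2 in combinations(unique, 2) if are_close(c1, c2))
    total_pairs = comb(u, 2)
    r = p / total_pairs if total_pairs else 0.0
    return p, u, r


if __name__ == "__main__":
    image = [(10, 10, 10), (11, 10, 10), (11, 11, 11), (200, 0, 0), (10, 10, 10)]
    print(close_color_stats(image))  # (3, 4, 0.5)
```

In the toy image, the three colors near (10, 10, 10) form three close pairs out of C(4, 2) = 6 possible pairs, so R = 0.5.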

Figure 4.1 - R’/R for the male image with various seeding sizes and secret-message sizes, showing the effectiveness of the technique. Note that messages that occupy up to 50% of the total number of LSBs are detectable with a threshold T of approximately 1.1 for almost every seeding size. (Plot “R’/R - Comparing Message Size”: ratio versus seeding size from 5% to 50%; series: original R’/R and 10%, 50%, 70%, 90%, and 100% message size.)

IV. VALIDATION OF EXISTING TECHNIQUES

Figure 4.1 illustrates the results of experiments on a test image with varying simulated secret-message sizes and varying degrees of artificial seed noise. When the original cover image is processed by the detection algorithm, the ratio R (Equation 1) changes dramatically after seeding. When the test image has been altered to contain a secret message, the change in this ratio is much smaller and falls in line with the expected threshold of 10% [1, 2, 4]. This implies the existence of an experimentally determined value that can be used as an indicator that a suspect image has a secret message hidden inside it. To confirm this implication and the experimental threshold found in earlier work [1, 2, 4], two experiments were performed:

(a) Experiment 1 - Use Equations 1 and 2 to determine the values of P, U, R, P’, U’, R’, and the ratio R’/R for an image known to be clean (one that does not contain a hidden message). This provides a control (baseline) case.
(b) Experiment 2 - Hide random data, in varying amounts, in a clean image to simulate a steganogram. Process this image with the detection algorithm and produce the same ratios as in Experiment 1.

To perform these experiments, a Java-based tool was developed to (a) parse and analyze 24-bit BMP images, (b) manage the secret-message embedding process, for example controlling the amount of noise introduced into the image and ensuring that the noise is randomly distributed throughout the cover image, and (c) reproduce the experimental detection algorithm described by Fridrich et al. [1,2]. Figure 4.2 shows the process flow of the tool. The clean image (X) is sent through a parsing function that reads the image file and computes the initial values of P, U, and R in Equation 1. The parsed information is then passed, as an array of color values, to a seeding function that introduces artificial noise in order to compute R’.
The program is executed with a threshold parameter (T) and outputs a value of one (1) if the image is clean (no hidden message) and a value of zero (0) if the image is a steganogram. For analysis purposes, the values of P, U, R, P’, U’, R’, and the red, green, and blue (RGB) landscape of each image are recorded.

During Experiment 1, it was confirmed that for our sample cover image (male face), the ratio R’ (Equation 2) increased by as much as 50 percent over the ratio R (Equation 1) of the original image after processing by the detection application. Experiment 2 produced a ratio below the experimentally determined threshold T of 1.1 [1]. In earlier experiments [1], this value was determined to be the point at which the detection algorithm decides whether a test image has hidden data: if R changes by 10% or more (R’/R ≥ 1.1), the image is considered “clean” and free of a hidden message, based on the knowledge that a clean image’s ratio R’/R would be significantly larger than 1.

During this process, it was observed that the time required to count the close-color pairs grew quadratically, so with larger images the processing time grows significantly as the number of image pixels (and hence of unique colors) increases. Since the bit values of each color must be compared with those of every other unique color in the image, this step alone takes O(U²) time, where U is the number of unique colors found in a given image.
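The detection flow described in this section (parse, compute R, seed, compute R’, compare R’/R with T) can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' tool: pixels are (red, green, blue) tuples, “seeding” is assumed to mean flipping the LSB of a randomly chosen fraction of all channel values, and the names `seed_lsbs` and `detect`, as well as the 5% default seeding size, are hypothetical. The 1/0 output and the default T = 1.1 follow the text.

```python
import random
from itertools import combinations
from math import comb
from typing import List, Optional, Sequence, Tuple

# Illustrative sketch; names, seeding rule and 5% default are assumptions.
Color = Tuple[int, int, int]


def close_color_ratio(pixels: Sequence[Color]) -> float:
    """R = P / C(U, 2) over the unique colors (Equation 1); naive O(U^2) scan."""
    unique = set(pixels)
    if len(unique) < 2:
        return 0.0
    p = sum(
        1
        for c1, c2 in combinations(unique, 2)
        if all(abs(a - b) <= 1 for a, b in zip(c1, c2))
    )
    return p / comb(len(unique), 2)


def seed_lsbs(pixels: Sequence[Color], fraction: float, rng: random.Random) -> List[Color]:
    """Assumed seeding rule: flip the LSB of a random `fraction` of all channel values."""
    flat = [v for px in pixels for v in px]
    for idx in rng.sample(range(len(flat)), round(fraction * len(flat))):
        flat[idx] ^= 1
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]


def detect(pixels: Sequence[Color], seed_fraction: float = 0.05,
           threshold: float = 1.1, rng: Optional[random.Random] = None) -> int:
    """Return 1 if the image looks clean (R'/R >= T), 0 if it looks like a steganogram."""
    rng = rng if rng is not None else random.Random()
    r = close_color_ratio(pixels)
    if r == 0:
        raise ValueError("R is zero, so R'/R is undefined for this image")
    r_seeded = close_color_ratio(seed_lsbs(pixels, seed_fraction, rng))
    return 1 if r_seeded / r >= threshold else 0


if __name__ == "__main__":
    # Toy data: seeding every LSB turns (0,0,0) and (3,3,3) into the close
    # pair (1,1,1) and (2,2,2), so R doubles and the image is reported clean.
    clean_like = [(0, 0, 0), (3, 3, 3), (10, 10, 10), (11, 11, 11)]
    print(detect(clean_like, seed_fraction=1.0, rng=random.Random(0)))  # 1
```

The toy input only exercises the decision logic; real images would be parsed from 24-bit BMP files and seeded with much smaller fractions.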

Figure 4.2: The Java application used to reproduce the experimental detection algorithm [1,2].

V. PROBLEM DESCRIPTION

The Close-Color Pair counting technique is effective at detecting hidden messages in cover images. During the execution of the previous experiments, however, it became obvious that most of the time spent computing R went into counting the close-color pairs. Most of the other functions ran in seconds, while this counting process took several minutes. The worst-case running time occurred when a secret message constituted 50% of the maximum number of pixels and the image was seeded with 50% of the total number of LSBs. This technique therefore suffers from a high performance penalty, which makes it undesirable for processing any significant volume of images. We propose a performance optimization to the Close-Color Pair counting technique that reduces the running time of the counting step by a constant factor when it is used to detect the presence of a secret message.

VI. PROPOSED SOLUTION

For the performance of this technique to be improved, the most computationally expensive portion of the algorithm, the close-color pair counting process, is the best candidate for enhancement. The main hypothesis of the proposed solution is that the statistical characteristics of an image are nearly identical to those of a random sample of the same image. As long as the samples are randomly chosen, the probability of encountering the same distribution of color values remains similar, down to a particular percentage at which the sample becomes too small to represent the original image. The solution therefore applies a “thinning” process to the contents of the original image. This process removes a percentage of the image’s pixels, effectively reducing the image size without altering its statistical characteristics. Let X be a given image, i be the set of pixels in X, f() be the “thinning” function, t% be the percentage of pixels to sample, and random() be a function that takes a set of data and a percentage and returns a subset, randomly chosen from the original set’s data points, whose size equals the size of the original set times the percentage:

$$f(X, t\%) = \mathrm{random}(i, t\%) \qquad \text{(Equation 3)}$$
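Equation 3 can be sketched in Python with the standard library's `random.sample`, which draws without replacement, so no pixel is kept twice. The name `thin` and the representation of the pixel set i as a list are assumptions for illustration; the paper does not specify how its `random()` function is implemented.

```python
import random
from typing import List, Optional, Sequence, TypeVar

# Illustrative sketch of Equation 3; `thin` is a hypothetical name.
PixelT = TypeVar("PixelT")


def thin(pixels: Sequence[PixelT], t_percent: float,
         rng: Optional[random.Random] = None) -> List[PixelT]:
    """f(X, t%) = random(i, t%): a uniform random sample of t% of the pixels i."""
    if not 0 < t_percent <= 100:
        raise ValueError("t_percent must be in (0, 100]")
    rng = rng if rng is not None else random.Random()
    k = round(len(pixels) * t_percent / 100)  # size of the thinned pixel set
    return rng.sample(list(pixels), k)        # sampling without replacement


if __name__ == "__main__":
    image = [(i % 256, (3 * i) % 256, (7 * i) % 256) for i in range(1000)]
    sample = thin(image, 10, random.Random(42))
    print(len(sample))  # 100
```

Passing a seeded `random.Random` makes a thinning run reproducible, which helps when comparing detection results across different thinning percentages.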

To test this hypothesis, the following experiment was conducted: (c) Experiment 3 - Select varying “thinning percentages”, compute f(X, thinning percentage), and use the resulting set of image pixels in the detection algorithm. For this experiment, the process described in Figure 4.2 was modified to randomly choose a subset of color values, with the sample percentage passed as a parameter to the application. As illustrated in Figure 6.1, the image data is passed to a thinning function along with a percentage parameter. This thinning function takes a random sample of the original image data in order to reduce the total number of pixels that the remaining processes have to examine. Because the parsed image passes through the thinning process before P is computed, both the number of pixels to be examined and, consequently, the number of color pairs are reduced. Given the low statistical impact of random sampling, the resulting set of image data is much smaller than the original yet contains the pixel and color statistics needed for successful use of the detection algorithm. Let be the threshold of R’/R. Let be the random seeding size. Let be the total number of pixels. Let µ be the “thinning factor” or sample percentage of pixels to use in the algorithm. Let be the algorithm applied to the image, to produce U, U’, P, P’, R, and R’. Let Xc represent a clean image, and Xd represent a dirty image.

Figure 6.1: Proposed solution with “Thinning Function”.
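Placing the thinning step in front of the detection algorithm, as in Figure 6.1, can be sketched in Python as a sweep over several thinning factors µ, in the spirit of Experiment 3. As before, this is an illustrative sketch, not the authors' code: the LSB-flip seeding rule, the function names, and the choice to report `nan` when R is zero for a very small sample are assumptions.

```python
import math
import random
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Tuple

# Illustrative sketch; names, seeding rule and nan convention are assumptions.
Color = Tuple[int, int, int]


def close_color_ratio(pixels: Sequence[Color]) -> float:
    """R = P / C(U, 2) over the unique colors (Equation 1)."""
    unique = set(pixels)
    if len(unique) < 2:
        return 0.0
    p = sum(1 for c1, c2 in combinations(unique, 2)
            if all(abs(a - b) <= 1 for a, b in zip(c1, c2)))
    return p / math.comb(len(unique), 2)


def thin(pixels: Sequence[Color], mu_percent: float, rng: random.Random) -> List[Color]:
    """Equation 3: keep a uniform random mu% sample of the pixels."""
    return rng.sample(list(pixels), round(len(pixels) * mu_percent / 100))


def seed_lsbs(pixels: Sequence[Color], fraction: float, rng: random.Random) -> List[Color]:
    """Assumed seeding rule: flip the LSB of a random fraction of channel values."""
    flat = [v for px in pixels for v in px]
    for idx in rng.sample(range(len(flat)), round(fraction * len(flat))):
        flat[idx] ^= 1
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]


def thinning_sweep(pixels: Sequence[Color], mus: Iterable[float] = (10, 30, 50, 100),
                   seed_fraction: float = 0.05, seed: int = 0) -> Dict[float, float]:
    """For each thinning factor mu: thin first, then compute R'/R on the sample."""
    rng = random.Random(seed)
    results: Dict[float, float] = {}
    for mu in mus:
        sample = thin(pixels, mu, rng)  # thinning happens before P is computed
        r = close_color_ratio(sample)
        r_seeded = close_color_ratio(seed_lsbs(sample, seed_fraction, rng))
        results[mu] = r_seeded / r if r > 0 else math.nan  # undefined when R = 0
    return results


if __name__ == "__main__":
    gen = random.Random(1)
    image = [tuple(gen.randrange(0, 16) for _ in range(3)) for _ in range(500)]
    for mu, ratio in thinning_sweep(image).items():
        print(f"mu={mu}%: R'/R = {ratio:.3f}")
```

Each value would then be compared with T; the claim being tested is that the values for small µ stay close to the value obtained at µ = 100%.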

So, prior to the thinning process, the following is true:

(Equation 4) where ϕ1 is the probability that the algorithm will produce a false negative and

(Equation 5) where ϕ2 is the probability that the algorithm will produce a false positive. After the thinning process, the following is true:

(Equation 6) where ϕ3 is the probability that the algorithm will produce a false negative.

(Equation 7) where ϕ4 is the probability that the algorithm will produce a false positive. Probabilities ϕ1, ϕ2, ϕ3, and ϕ4 may need to be determined experimentally; a general mathematical method for determining them is not known at this time. These equations are intended as a first step toward formalizing the error introduced by µ values under 100 percent.

Prior to the proposed optimization, this function ran in approximately O(n²) time. After optimization, the theoretical running time would be O((n/p)²), where p is the reduction factor applied to thin out the input image. Large gains can be realized when processing larger, more complex images, with only a slight error in the R’/R ratio.

In Experiments 2 and 3, artificial noise was randomly distributed into the LSBs of the cover image (representing the hidden message) using the same seeding functions used in the Java processes shown in Figure 4.2. This allowed the generation of steganograms equivalent to those created by steganographic processes that randomly select and modify LSBs. It is assumed that a random sample of this noise is still random. It would follow that randomly selecting color values in U produces a set that responds to the introduction of noise in the same way as the original set.

Figure 6.3 displays the results of the optimized detection process. It shows that the threshold was not exceeded, even when color sampling was reduced by a factor of 10. The behaviors of P and P’ for various samples of pixels are all tightly bounded around the values obtained when examining 100 percent of the colors. The seed size did not seem to greatly affect this behavior, although there was a clear indication that seeding between 20 and 50 percent would produce the largest change in R. These values can be used as the minimum and maximum bounds, respectively, for an acceptable seeding size. This confirms that the threshold of 1.1 was a reasonable place to set the boundary between clean images and steganograms. We expect this threshold to hold under further experimentation, including a wider variety of images. The threshold could also be made variable, depending on how clean a clean image is required to be.

Figure 6.3: Ratio values for the original male image and various message, seeding, and thinning sizes. Note the tight bound under the threshold T even when only 10% of pixels are examined. (Plot “R’/R - Random Message (Bit Flipping) - Male Image”: R’/R versus seeding % from 5 to 50; series: original image and 50% message, each with 10%, 30%, 50%, and 100% of pixels.)
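As a rough worked estimate of the speedup, assume, as the O((n/p)²) expression does, that thinning by a reduction factor p shrinks the number of unique colors U entering the pairwise count by the same factor. The number of pairs examined in Equation 1 then falls by

$$\frac{\binom{U}{2}}{\binom{U/p}{2}} = \frac{U(U-1)}{\frac{U}{p}\left(\frac{U}{p}-1\right)} = p\,\frac{U-1}{U/p-1} \approx p^{2} \quad \text{for } U \gg p.$$

For example, with U = 100,000 and p = 10, the pair count drops from 4,999,950,000 to 49,995,000, a factor of about 100. The count is still quadratic in the remaining colors, so the gain is a constant factor rather than a change in complexity class; and because many pixels share colors, sampling t% of the pixels may reduce U by less than a factor of 100/t.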

VII. CONCLUSION AND FUTURE WORK

The Close-Color Pair steganalysis technique proposed by Fridrich [1] is a reliable and robust method for detecting hidden messages in a suspect image. This technique, however, does not directly address the performance penalties of processing large images. In time-sensitive implementations, such as network scanners and information-monitoring devices, these penalties may preclude the use of the technique. To enhance the processing speed, we attempted to reduce the total worst-case running time while still maintaining the statistical characteristics and baselines of the image required by the Close-Color Pair technique. By thinning out the total number of pixels that need to be examined in the counting process, we are able to reduce the worst-case running time from O(n²) to O((n/p)²), a constant-factor improvement of roughly p² that leaves the complexity quadratic. Asymptotically this is a modest gain, but in scenarios where clean images are expected, this technique proves useful for increasing the throughput of the detection algorithm.

In terms of future work, there are several areas to be explored further. Resized images, for example, introduce a number of new challenges to the proposed solution. Resizing the input image would corrupt the secret message and introduce noise into the image, reducing the effectiveness of the technique. Resizing algorithms generate close-color pairs as a matter of course in order to approximate what an image would look like at a different size (e.g., interpolation). This ultimately changes the statistical characteristics of the colors in the image, possibly producing erroneous results. Similarly, natural noise in a cover image may have a dramatic impact on the success of the proposed solution. It has been shown [1,4], and we have confirmed experimentally, that high-quality scans and images taken with a digital camera produce files with a large number of colors. These devices introduce a natural level of noise, and it is not clear how re-sampling and interpolation affect the algorithm. Preliminary experimentation indicates that naturally noisy images (those that have been re-sampled) would effectively defeat the algorithm. In addition, some steganographic techniques incorporate the notion of locality by selecting specific areas of the carrier in which to hide data. It is not clear how this notion of locality affects the proposed solution, although very preliminary results are encouraging. The proposed solution performs the analysis on the contents of the entire image rather than by sampling areas. Because any statistical characteristics would average across the image, the total number of close-color pairs is the same even if the same amount of LSB modification were concentrated in a particular area of the image. These early assertions need to be formally and experimentally confirmed.

REFERENCES

[1] Fridrich, J., et al., “Steganalysis of LSB Encoding in Color Images”, Proceedings of the IEEE International Conference on Multimedia and Expo, New York: IEEE Press, 2000.
[2] Fridrich, J., Goljan, M., Du, R., “Reliable Detection of LSB Steganography in Color and Grayscale Images”, Proceedings of the ACM Workshop on Multimedia and Security, 2001.
[3] Provos, N., “Defending Against Statistical Steganalysis”, Proceedings of the 10th USENIX Security Symposium, Washington, DC, 2001.
[4] Westfeld, A., Pfitzmann, A., “Attacks on Steganographic Systems”, Third International Workshop on Information Hiding, IH’99, Springer-Verlag, LNCS 1768, 2000.
[5] Moerland, T., “Steganography and Steganalysis”, Leiden Institute of Advanced Computing Science, www.liacs.nl/home/tmoerl/privtech.pdf
[6] S-Tools for Windows (v.4.0), available at ftp://ftp.funet.fi/pub/crypt/mirrors/idea.sec.dsi.unimi.it/code/
[7] Kirkby, D., “bmp format”, http://atlc.sourceforge.net/bmp.html
[8] Hetzl, S., “The BMP File Format”, http://www.fortunecity.com/skyscraper/windows/364/bmpffrmt.html
[9] Johnson, N. F., Jajodia, S., “Exploring Steganography: Seeing the Unseen”, IEEE Computer, Feb. 1998, pp. 26-34.
[10] Johnson, N. F., Jajodia, S., “Steganalysis: The investigation of hidden information”, Proceedings of the IEEE Information Technology Conference, Syracuse, New York, USA, 1998, pp. 113-116.
[11] Ker, A., “Resampling and the detection of LSB matching in color bitmaps”, in Security, Steganography and Watermarking of Multimedia Contents VII, E. J. Delp III and P. W. Wong, eds., Proceedings of SPIE 5681, SPIE and IS&T, San Jose, California, USA, Jan. 16-20, 2005, pp. 1-15.
[12] Holotyak, T., Fridrich, J., Voloshynovskiy, S., “Blind Statistical Steganalysis of Additive Steganography Using Wavelet Higher Order Statistics”, 9th IFIP TC-6 TC-11 Conference on Communications and Multimedia Security, LNCS vol. 3677, Springer-Verlag, Berlin, 2005, pp. 273-274.
[13] Simmons, G. J., “The prisoners’ problem and the subliminal channel”, in Advances in Cryptology: Proceedings of Crypto 83 (D. Chaum, ed.), Plenum Press, 1984, pp. 51-67.
[14] Ker, A., “Steganalysis of LSB matching in grayscale images”, IEEE Signal Processing Letters, vol. 12, no. 6, Jun. 2005, pp. 441-444.
