A New Perfect Hashing based Approach for Secure Stegnography

Imran Sarwar Bajwa

Rubata Riasat

School of Computer Science University of Birmingham Birmingham, UK [email protected]

Department of Computer Science & IT The Islamia University of Bahawalpur Bahawalpur, Pakistan [email protected]

Abstract- Image steganography is an emerging field of research on secure data hiding for data transmission over the internet, copyright protection, and ownership identification. A number of techniques have been proposed for colour image steganography. However, colour images are more costly to transmit over the internet because of their size. In this paper, we propose a new perfect hashing based approach for steganography in grey-scale images. The proposed approach is efficient and effective, and it provides a more secure way of transmitting data at higher speed. The approach is implemented in a prototype tool coded in VB.NET and supports multiple file formats such as bmp, gif, jpeg, and tiff. A set of sample images was processed with the tool, and the results of the initial experiments indicate the potential of the approach not only for secure steganography but also for fast data transmission over the internet.

Keywords- Grey-scale images; Steganography; Hash-based algorithm; Image; Perfect hashing

I. INTRODUCTION

Since the early days of computers, data security has been one of the premier issues of research. Image steganography [1] is a modern way of hiding information so that unwanted people cannot access it. The carrier used to hide data in steganography can be text or an image. In modern times, image steganography can be helpful in a number of ways, such as hiding secret data [2], data authentication, ensuring the availability of authenticated data for academic usage, monitoring data piracy, labelling electronic data/contents, copyright protection, ownership identification, providing confidentiality, enhancing integrity, and controlling electronic data piracy [3]. The following is an overview of the techniques used for image steganography.

A. Algorithms for Image Steganography
Since the term image steganography was coined in the research community, various approaches, algorithms and techniques have been presented. The major examples are Data Encryption Standard (DES) [2], International Data Encryption Algorithm (IDEA) [2], Secure Hash Algorithm (SHA) [4], [10], Advanced Encryption Standard (AES) [6], Chaos [8], Message Digest 5 (MD5) [7], and Bit Stream Ciphers (BSC) [11]. All these algorithms have their respective pros and cons. Some algorithms use hash functions to hide data in images. Examples of such algorithms are MD5 and SHA, but these algorithms provide weak security [3], [4]. Some algorithms, such as AES, DES, and IDEA, are good for small data, but their performance deteriorates as the size of the data increases [4], [5].

B. Contribution to Knowledge
In this paper, we present a perfect hashing based approach for information hiding in grey-scale images. The presented approach is based on a robust perfect hash-function algorithm. The experiments with the proposed algorithm indicate that it is not only more secure but also more efficient in terms of processing cost, so it provides a more secure way of transmitting data at higher speed. The approach is implemented in a prototype tool coded in VB.NET and supports multiple file formats such as bmp, gif, jpeg, and tiff.

The rest of the paper is structured as follows. Section 2 presents the related work. The hashing-based algorithm used for data hiding is explained in section 3, and the proposed framework for grey-scale image steganography is described in section 4. Section 5 covers the tool support, followed by the implementation details. Section 6 presents the evaluation of the presented tool, and finally the paper is concluded with the future work.

II. RELATED WORK

In the last couple of decades, with the increasing use of the internet for data communication, secure data transmission and data hiding have become a real challenge. Image steganography is one of the premier solutions for secure data hiding [1]. Some algorithms, such as International Data Encryption Algorithm (IDEA) [2], Advanced Encryption Standard (AES) [6], and Data Encryption Standard (DES) [2], are good for small amounts of data but not for massive data sets, as these algorithms involve intensive computation and require very fast processing machines [4], [5]. On the other hand, a few algorithms such as Message Digest 5 (MD5) [7], Bit Stream Ciphers (BSC) [11], and Secure Hash Algorithm (SHA) [4], [10] are based on cryptographic hash functions that use a hash key of 16 bytes. However, these algorithms are vulnerable in terms of security because of inherent flaws caused by the checksum approach they use [3], [4]. Moreover, the issues of concern with the existing steganography approaches are imperceptibility, robustness and file type support. All the approaches discussed above have their particular limitations, and there is still margin to improve them. In this scenario, there is a need for an approach/algorithm that is more secure and more efficient in terms of processing cost and time.

A hashing based approach to hide information in colour images was proposed in [17]. The results in [17] show that the hashing based approach is better than the other available approaches in terms of efficiency and effectiveness. In this research paper, we focus on grey-scale image steganography.

III. PERFECT HASHING BASED APPROACH

The perfect hashing based algorithm originally presented in [17] is explained here with respect to its usage with grey-scale images. An attribute of this approach is that it can be used with various file formats, whereas most algorithms are good for one particular file format. For data transmission on the internet, file formats such as jpg/jpeg and gif are popular due to their small size. The algorithm is as follows:

A. Hiding Information
The following steps of the algorithm were used to hide the target data/information in an image.
i. Input a text (.txt) file containing textual data and an input (.jpg, .gif, .bmp or .tiff) image.
ii. Read the text file, tokenize the text, make chunks of 3 characters each and store each chunk in an array-list (lc). The total count of data chunks is represented as n.
iii. Generate a random number that is used as a hash-key; the hash-key is represented as h.
iv. The hash-function (H)† [12] uses the hash-key (h) and the total number of chunks (n) to generate a pattern, i.e. a sequence of numbers (hash-values) that are the positions of the pixels where data will be stored.
v. The generated pattern (the sequence of numbers) is stored in an array-list (lp).
vi. The first entries of lc and lp are read. The string stored in lc is read and tokenized, and the ASCII value of each token replaces the byte of the pixel at the corresponding position in lp. This process is repeated until the last token of the text is stored.
vii. The output is the image containing the stored data and a hash-key (h) that is used to retrieve the data.

† A typical hash function is a well-defined procedure or mathematical function that maps its input to an integer that may serve as an index into an array; the returned values are called hash codes or hash values.
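The hiding steps above, together with the matching read-back, can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' VB.NET implementation: the image is modelled as a flat sequence of 8-bit grey values, one character is written per selected pixel, a NUL byte marks the end of the message, and a seeded shuffle stands in for the hash-function H (it is not the gperf-based function). The names generate_pattern, hide and retrieve are hypothetical.

```python
import random


def generate_pattern(hash_key: int, num_pixels: int) -> list[int]:
    """Stand-in for H: a key-dependent ordering of all pixel positions.

    A seeded shuffle is a permutation, so no position repeats (no collisions).
    Assumption: this replaces the paper's gperf-based function for illustration.
    """
    positions = list(range(num_pixels))
    random.Random(hash_key).shuffle(positions)
    return positions


def hide(pixels: bytes, text: str, hash_key: int) -> bytearray:
    """Write each character's ASCII value into the grey byte at the next pattern position."""
    data = text.encode("ascii") + b"\x00"  # NUL marks the end of the message
    if len(data) > len(pixels):
        raise ValueError("message does not fit in the image")
    lp = generate_pattern(hash_key, len(pixels))[:len(data)]
    stego = bytearray(pixels)  # work on a copy; the cover image is left untouched
    for position, value in zip(lp, data):
        stego[position] = value
    return stego


def retrieve(stego: bytes, hash_key: int) -> str:
    """Regenerate the same pattern from the key and read bytes until the NUL marker."""
    chars = []
    for position in generate_pattern(hash_key, len(stego)):
        value = stego[position]
        if value == 0:
            break
        chars.append(chr(value))
    return "".join(chars)


if __name__ == "__main__":
    cover = bytes(range(256)) * 4        # hypothetical 1024-pixel grey-scale image
    key = random.randrange(1, 2**32)     # step iii: random hash-key
    stego = hide(cover, "This piece of data is very secure.", key)
    print(retrieve(stego, key))          # prints the original message
```

Overwriting a whole grey byte changes the brightness of that pixel, so the sketch illustrates only the key-driven choice of positions and the round trip, not the image quality achieved by the prototype tool.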

B. Retrieving Information
The following steps of the algorithm were used to retrieve the hidden data/information from the image.
i. Input the (.jpg, .gif, .bmp or .tiff) image that contains the stored information and the hash-key (h) that was used to store the data.
ii. The input hash-key (h) is used with the hash-function (H) to generate a sequence of numbers as an array-list (lp); these numbers are the positions of the pixels where the data was stored. For a given hash-key (h), the hash-function (H) generates exactly the same pattern of numbers that was generated at the time of coding.
iii. Each value in the generated pattern represents the index of a pixel where data is saved. The value of the grey colour byte at lp[i] is read. As each byte contains the ASCII value of a character, the ASCII value is converted to a character, and each character is written to a text file in the sequence in which it is read from the image.
iv. The output is a text file that contains the data retrieved from the image.

C. Perfect Hash Function
The most important part of the proposed algorithm is the hashing technique. We have used the perfect hashing technique as the hash-function (H) [13]. A perfect hash function for a set N maps distinct elements of N to distinct integers, without any collisions. A perfect hash function supports efficient lookups by placing hash-keys from N into a hash-table [14]. A few implementations of perfect hash functions are available. We have used the GNU perfect hash function generator 'gperf' [15], which typically generates perfect hash-functions for a hash-key. A 'gperf' based hash-function locates only one position in a domain using exactly 1 probe. There are a number of advantages of using perfect hashing over other hashing techniques. Some of them are the following:
• Perfect hashing is faster than other techniques as it avoids any hash collision [13], [14]. Hence, there is no need for collision resolution techniques (such as linear probing or quadratic probing), which are an overhead and involve intensive computation.
• Perfect hashing supports very large key sets [16], so it can be used for very bulky data sets and is as effective and efficient for large data sets as for small ones.

IV. GREY-SCALE IMAGE STEGANOGRAPHY

In this section, the approach used for grey-scale image steganography is explained. The presented approach helps users hide their secret textual information in images without losing much of the quality of the original image. Our approach is able to handle various file formats, e.g. bmp, gif, jpeg, and tiff. The major steps involved in hiding textual information in a grey-scale image are the following:
Step 1: Input the target image in any supported file format such as bmp, gif, jpeg, or tiff.
Step 2: Input the target text to be hidden in the image in the form of a text file.
Step 3: Hide the target text in the target image.
Step 4: Retrieve the hidden text back from the target image.
Step 5: Output the retrieved text.
All five input, processing, and output steps are explained in detail below.

A. Input Image
In the presented system, a user interface is provided so that a user can input an image in which to hide his/her personal data for privacy purposes. It is recommended that the input image be medium sized to get better results, as a very large image uses more bandwidth on the internet and a very small image may lose some of its quality when the input text data is bulky.

B. Input Textual Data
The second input is the textual data stored in a text (.txt) file. The input text file is read by our system and divided into chunks of 3 characters (including space characters). One chunk of data is stored in place of one pixel in the 8-bit grey-scale image. For example, the string "This piece of data is very secure." is processed as
[Thi] [s p] [iec] [e o] [f d] [ata] [ is] [ ve] [ry ] [sec] [ure] [.] [\0]
To mark the end of the message, an End-Of-File (EOF) character with ASCII value 00 is stored after the last character of the message.
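The chunking described above can be illustrated with a short sketch. It assumes Python rather than the tool's VB.NET, and the function name make_chunks and its parameters are hypothetical; the terminator is modelled as a final chunk holding the single character with ASCII value 0.

```python
def make_chunks(text: str, size: int = 3, terminator: str = "\x00") -> list[str]:
    """Split text into consecutive chunks of `size` characters (spaces included).

    The last data chunk may be shorter than `size`; a terminator chunk is
    appended to mark the end of the message.
    """
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    chunks.append(terminator)
    return chunks


if __name__ == "__main__":
    lc = make_chunks("This piece of data is very secure.")
    print(len(lc) - 1)  # n, the number of data chunks
    print(lc[:3])       # ['Thi', 's p', 'iec']
```

Joining all chunks except the terminator gives back the original string, which is what the retrieval side relies on.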

C. Hiding Data in Image
For hiding (embedding) textual data in the image, a hash-based algorithm (described in section 3) is used. The hash-based algorithm picks pixels randomly to store chunks of the input data. The algorithm randomly generates a hash-key, which it afterwards uses to generate a pattern of pixels where the data will be stored. The data chunks are stored in the red, blue and green bytes of the selected pixel. The benefit of this approach is that the data is embedded on a new pattern each time, which makes the hiding of data very efficient.

Figure 1. The proposed framework for hiding data in grey-scale image

D. Retrieving Data from Image
To retrieve the textual data hidden in the target image, the same hash-key that was generated during hiding is used. With this hash-key, the algorithm (described in section 3) generates exactly the same pattern that was used at the time of hiding. The pixel values (red, green and blue bytes) at each position are read one by one, and the resulting characters are concatenated to form the complete message.

Figure 2. The proposed framework for retrieving data from grey-scale image

V. TOOL SUPPORT

The proposed algorithm was implemented in VB.NET to code text into an image. A wizard (a sequence of windows) is used to get inputs such as the target image and the target text file. A screen-shot of the prototype tool is shown in figure 3.

Figure 3. Perfect-hash function based processing

The presented tool is a proof of concept. The prototype tool contains a GUI (see figures 3 and 4) to load the input image and the input text file. Once the input image and the target text data are loaded, the user only needs to click 'Encode Text to Image' and the encoding starts. The whole process is fully automatic, and once the coding is completed the user is notified that coding has successfully been completed. The process of hashing based coding of text in an image is shown in figure 3.

To decode textual information from an image, the user has to load the image containing the coded information. Once the input image is loaded, the user clicks 'Decode Image to Text' and the tool checks whether the image contains coded information. If the image does not contain coded information, a message is given to the user. If the image contains coded information, the tool asks for the hash-key that was given at the time of coding. Once the user inputs the hash-key, the tool starts decoding. The decoding process is also fully automatic, and once the decoding is finished, the decoded information is shown in the form. The process of hashing based decoding of text from an image is shown in figure 4.

Figure 4. Perfect-hash function based retrieval of data

VI. EVALUATION

We experimented with different examples to validate the performance of our prototype tool. The smallest data set was 232 words long, while the largest data set used for experimentation was 1052 words long. The average length of the input textual data was 748 words. Figure 4 shows the performance results of the hashing based approach for small (up to 300 words), medium (up to 700 words) and large data sets (more than 700 words, up to 1500 words). The results of the experiments were compared to the other available approaches for grey-scale image steganography. Table I shows that our approach not only provides good security but also provides fast coding/decoding and is at the same time able to handle large data sets. The results of this initial performance evaluation are very encouraging and support both the approach adopted in this paper and the potential of this technology in general.

Table I: Comparison with other Approaches

Approach       Security     Data Size   Bit Distribution   Speed
SHA-2          Good         Medium      Very Good          Average
IDEA           Excellent    Small       Weak               Medium
BSF            Weak         Average     Very Good          Very Good
AES            Excellent    Small       Weak               Medium
Our Approach   Very Good    Large       Good               Very Good

VII. CONCLUSION AND FUTURE WORK

There is no doubt about the significance of steganography for secure data transmission over the internet, but there is a need for a better algorithm that is not only efficient but also secure. To achieve this goal, we present a novel hash-based algorithm that uses a hashing technique to code and decode data in a grey-scale image. Our approach is based on a perfect hash function. We also present a prototype tool that implements the approach and serves as a proof of concept. The designed system is able to hide text in an image without losing much of the quality of the image. The system specifically targets efficient and secure data hiding in images, to make image steganography with large data and its transmission over the internet possible. The proposed approach is fully automated. We have presented the initial experiments with the perfect hashing based approach for grey-scale image steganography. However, the algorithm can be improved to get better and more accurate results.

REFERENCES

[1] F. Shih, Digital Watermarking and Steganography: Fundamentals and Techniques. USA: CRC Press, 2008.
[2] K. Usman, H. Juzoji, I. Nakajima, S. Soegidjoko, M. Ramdhani, T. Hori, and S. Igi, "Medical image encryption based on pixel arrangement and random permutation for transmission security," in Proceedings of IEEE 9th International Conference on e-Health Networking, Application and Services, Taipei, Taiwan, 19-22 June 2007, pp. 244-247.
[3] U. Gopinathan, D.S. Monaghan, T.J. Naughton, J.T. Sheridan, and B. Javidi, "Strengths and weaknesses of optical encryption algorithms," in Proc. 18th Annual Meeting of the IEEE Lasers and Electro-Optics Society, 22-28 Oct. 2005, pp. 951-952.
[4] A. Cheddad, J. Condell, K. Curran, and P. McKevitt, "A hash-based image encryption algorithm," Optics Communications, 283(6)(2010)879-893.
[6] Y. Mao and M. Wu, "A joint signal processing and cryptographic approach to multimedia encryption," IEEE Transactions on Image Processing, 15(7)(2006)2061-2075.
[7] M. Zeghid, M. Machhout, L. Khriji, A. Baganne, and R. Tourki, "A modified AES based algorithm for image encryption," International Journal of Computer Science and Engineering, 1(1)(2006)70-75.
[8] Y. Wang, X. Liao, D. Xiao, and K.W. Wong, "One-way hash function construction based on 2D coupled map lattices," Information Sciences, 178(5)(2008)1391-1406.
[9] V. Patidar, N.K. Pareek, and K.K. Sud, "A new substitution-diffusion based image cipher using chaotic standard and logistic maps," Communications in Nonlinear Science and Numerical Simulation, 14(7)(2009)3056-3075.
[10] D.C. Lou and C.H. Sung, "A steganographic scheme for secure communications based on the chaos and Euler theorem," IEEE Transactions on Multimedia, 6(3)(2004)501-509.
[11] I. Ahmad and A.S. Das, "Hardware implementation analysis of SHA-256 and SHA-512 algorithms on FPGAs," Computers & Electrical Engineering, 31(6)(2005)345-360.
[12] J. Wen, M. Severa, and W. Zeng, "A format-compliant configurable encryption framework for access control of video," IEEE Trans. Circuits Syst. Video Technol., 12(6)(2002)545-557.
[13] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. Section 11.5: Perfect hashing, pp. 245-249.
[14] W.P. Yang and M.W. Du, "A dynamic perfect hash function defined by an extended hash indicator table," in Proceedings of 10th International Conference on Very Large Databases, VLDB'84, Singapore, August 1984, pp. 245-254.
[15] D. Talbot and T. Brants, "Randomized language models via perfect hash functions," in Proceedings of ACL-HLT 2008, Ohio, USA, June 2008, pp. 505-513.
[16] Linux Software Directory, "Gperf - Perfect hash function generator," Available at: http://linux.maruhn.com/sec/gperf.html, Accessed on 15.03.2011.
[17] F.C. Botelho and N. Ziviani, "External perfect hashing for very large key sets," in Proceedings of 16th ACM Conference on Information and Knowledge Management (CIKM07), Lisbon, Portugal, November 2007.
[18] R. Riasat, I.S. Bajwa, and M.Z. Ali, "A hash-based approach for colour image steganography," International Conference on Computer, Networks and Information Technology, ICCNIT 2011, Peshawar, Pakistan, 2011.
