Modified Technique of Insertion Methods for Data Hiding ... - IEEE Xplore

Modified Technique of Insertion Methods for Data Hiding Using DNA Sequences

Subhajit Manna1, Sudipta Roy2, Pabitra Roy3, and Samir K. Bandyopadhyay4

1 Department of Computer Application, 2,4 Department of Computer Science and Engineering, 3 Department of IT
1,2,3 Academy of Technology, Hoogly-712121, West Bengal, India
4 University of Calcutta, 92 A.P.C. Road, Kolkata-700009, India.
Email: [email protected], [email protected], [email protected], [email protected]


Abstract – Biological techniques are increasingly used in cryptographic applications and have become more popular recently. One of the most interesting of these techniques hides data in Deoxyribonucleic Acid (DNA). In this paper we propose a data hiding insertion method based on DNA sequences. The method hides information randomly in a DNA sequence using the following procedures: random key generation, selection of the prime number succeeding the key value, a cumulative XOR operation on the key value, and look up table index mapping.

The concept of DNA computing, combined with the fields of cryptography and steganography, has brought new hope for powerful, or even unbreakable, algorithms [4,5,8].

Keywords: cryptography, DNA, cumulative XOR, look up table.

I. INTRODUCTION

In today’s fast-developing world of technology, information security is of prime importance. Various types of networks are used to transmit messages and information, and with advances in technology and the rapid growth of computer networks, a huge amount of information is being exchanged. Hence, the most critical requirement for a thriving network is the security of its information. Because attackers wait for a chance to gain access to confidential data, communication is often far from secure [1,3]. The word cryptography is derived from two Greek words, “kryptos” (meaning “hidden”) and “graphein” (meaning “to write/draw”). It is the conversion of information from its normal, comprehensible form into an incomprehensible one [2].

A great deal of research has been carried out on DNA-based data hiding schemes. Deoxyribonucleic Acid (DNA) is a long linear polymer found in the nucleus of a cell [9,10]. DNA is made up of nucleotides arranged in a double helix, and it carries genetic information. Each spiral strand has a sugar-phosphate backbone, and its bases are connected to a complementary strand by hydrogen bonds between the paired bases Adenine (A), Thymine (T), Guanine (G) and Cytosine (C). Two hydrogen bonds connect Adenine and Thymine, whereas three hydrogen bonds connect Guanine and Cytosine. DNA cryptography is still at a primitive stage but is already very effective, and DNA computing algorithms are very powerful in the area of network transmission.

Figure 1: DNA structure

In this method every nucleotide is encoded with a binary value, as in [3].

Table 1: Binary representation of nucleotides

Nucleotide    Binary Form
A             00
C             01
G             10
T             11

II. BACKGROUND OF PROPOSED METHOD

The method performs encryption in several stages. First, the information message M is converted to its equivalent ASCII (8-bit binary) value. Next, we take a fake DNA sequence of the same length as the binary message. The fake DNA sequence is generated randomly so that an intruder cannot tell which DNA sequence is used, but it strictly follows the rules of DNA characteristics. We subdivide the total bit stream (information message, DNA reference string and binary key-value) into meaningful packets of dynamic length. The length of a packet is a multiple of 8, that is, one of the values 8, 16, 24, 32, 40 and so on. Each packet contains a binary key-value, an information message segment and a DNA reference segment, each according to its own segment size.
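The conversions described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`text_to_bits`, `dna_to_bits`, `bits_to_dna`, `random_dna`) are hypothetical and not taken from the paper, while the nucleotide mapping is the one given in Table 1.

```python
import random

# Table 1 mapping from the paper: A=00, C=01, G=10, T=11.
NUC_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_NUC = {bits: nuc for nuc, bits in NUC_TO_BITS.items()}


def text_to_bits(message: str) -> str:
    """Concatenate the 8-bit ASCII code of every character."""
    return "".join(format(byte, "08b") for byte in message.encode("ascii"))


def dna_to_bits(dna: str) -> str:
    """Encode each nucleotide as two bits using Table 1."""
    return "".join(NUC_TO_BITS[nuc] for nuc in dna)


def bits_to_dna(bits: str) -> str:
    """Decode a bit string of even length back into nucleotides."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_NUC[bits[i:i + 2]] for i in range(0, len(bits), 2))


def random_dna(length: int, rng: random.Random) -> str:
    """Hypothetical helper: draw a random ("fake") DNA string."""
    return "".join(rng.choice("ACGT") for _ in range(length))
```

For instance, `text_to_bits("java")` yields the 32-bit string used as the running example in this paper, and `bits_to_dna(dna_to_bits(s))` returns `s` for any string over A, C, G, T.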

Figure 2: packet segment

We define a pattern for building each packet. The 1st position must contain the key-value; the 2nd and 3rd positions contain either the information message followed by the DNA reference string, or the DNA reference string followed by the information message, depending on the pattern. The pattern is determined by the key-value: if the key-value is an even number then Pattern 1 is used, and if it is an odd number then Pattern 2 is used.

Table 2: Pattern table

Even/odd    Pattern      Specification
even        Pattern 1    Binary key-value, information message, DNA reference string
odd         Pattern 2    Binary key-value, DNA reference string, information message
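As a minimal sketch of the pattern rule in Table 2, the following Python function assembles one packet from its three parts. The function name `build_packet` and its parameter names are hypothetical; only the even/odd ordering rule comes from the paper.

```python
def build_packet(key: int, key_bits: str, msg_bits: str, dna_bits: str) -> str:
    """Concatenate one packet according to the parity of the key-value."""
    if key % 2 == 0:
        # Pattern 1: key-value, information message, DNA reference string
        return key_bits + msg_bits + dna_bits
    # Pattern 2: key-value, DNA reference string, information message
    return key_bits + dna_bits + msg_bits
```

The first argument is the original key-value, which decides the pattern, while `key_bits` is whatever binary key field is actually stored in the packet.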

We generate a key-value randomly. The key-value is then converted into a sending key-value by a cumulative XOR operation [6]. The key-value is a positive integer, which we convert to its binary equivalent. We then apply XOR to each pair of adjacent bits, iteratively, until a single bit remains, and collect the most significant bit (MSB) of every row. For example, the key-value 10 has the binary equivalent “001010”; the cumulative XOR operation on it gives 15, whose binary equivalent is “001111”.

Figure 3: Cumulative XOR operation on binary equivalent of 10 and MSB collection.

We then add this cumulative XOR value of the key-value to the packet for the encryption procedure. In the decryption section we work in reverse: we take this number and apply the same operation to recover the key-value.
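The cumulative XOR with MSB collection can be sketched as follows. This is one reading of the operation, chosen because it reproduces the values quoted in this paper (10 gives 15, 24 gives 25, 3 gives 2); the function name `cumulative_xor` and the `width` parameter are assumptions, with the width defaulting to the 6-bit key length.

```python
def cumulative_xor(value: int, width: int = 6) -> int:
    """XOR adjacent bits row by row and collect the MSB of every row."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    # Bits of the key-value, most significant bit first.
    row = [(value >> (width - 1 - i)) & 1 for i in range(width)]
    collected = 0
    while row:
        collected = (collected << 1) | row[0]          # keep this row's MSB
        row = [a ^ b for a, b in zip(row, row[1:])]    # next row is one bit shorter
    return collected
```

Under this reading the operation is its own inverse, which is why the receiver can apply the same function to the received value to get the key-value back.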

We generate two values, the R-value and the K-value, to subdivide the binary information message and the binary DNA reference string into segments. Given the key-value, we generate the prime number succeeding it as the R-value, which is the length of the binary information message segment. From the R-value we generate the K-value using the following rule:

P = R + BKL
D = P / (Lowest packet length)
D = D + 1
K = D * (Lowest packet length) – P

where BKL is the Binary Key-value Length and the division is integer division. This ensures that in each case (R + K + BKL) is divisible by the lowest packet length. We consider a key length of 6 bits. In this method we keep the amount of DNA reference string very low because our goal is to send the information message safely; the emphasis is therefore on the information message length (R-value) rather than on the DNA reference string length.

We take 4 consecutive DNA characters at a time; by the binary encoding rule, 4 consecutive DNA characters define 8 binary bits. So there are $2^8 = 256$ combinations of DNA string, as follows.

Table 3: DNA combination chosen randomly for CC value

CC    DNA Combination    CC    DNA Combination    CC     DNA Combination
0     AAAA               9     ATAA               18     CCGC
1     AAAC               10    CAAA               19     CCTC
2     AAAG               11    GAAA               20     CACC
3     AAAT               12    TCCC               ...    ...
4     AACA               13    CCCC               255    TTTT
5     AAGA               14    CCCA
6     AATA               15    CCCG
7     ACAA               16    CCAT
8     AGAA               17    CCAC
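The R-value and K-value rules can be checked with a short Python sketch. The helper names (`is_prime`, `next_prime`, `k_value`) are hypothetical, and the defaults assume the 6-bit key length and a lowest packet length of 8 as used in this paper.

```python
def is_prime(n: int) -> bool:
    """Trial division, sufficient for the small values used here."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def next_prime(n: int) -> int:
    """Return the smallest prime strictly greater than n (the R-value)."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def k_value(r: int, bkl: int = 6, lowest: int = 8) -> int:
    """K pads R + BKL up to the next multiple of the lowest packet length."""
    p = r + bkl
    d = p // lowest + 1        # integer division, then D = D + 1
    return d * lowest - p
```

With these definitions the key-value 24 gives R = 29 and K = 5, so the packet holds 6 + 29 + 5 = 40 bits. K always lies between 1 and 8, so every packet carries at least one DNA reference bit.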

We obtain the index of each matching DNA string by the look up table index method [7]. Finally, we send these DNA string indices to the receiver.

III. PROPOSED METHODS

A. Data Hiding Procedure for sender side:

We take the information message M to be “java”.

Step 1: Convert M into binary format using the binary ASCII value of each character. For “java” the binary value is “01101010011000010111011001100001”.
Step 2: Generate a fake DNA reference string of the same length as the binary form of M; here it is “CCGAAAGGGCTTCGTGGTGGACCCATTTCCAGT”. Then convert it into binary format using the binary coding rule. The binary form of this DNA reference string is “01011000000010101001111101110111010111000010101001111110101001011”.
Step 3: Generate the key-values with a random function, as “(24, 3)”.
Step 4: Generate the succeeding prime number as the R-value, so the R-value is “(29, 3)”; the second value is 3 because only 3 message bits remain after the first segment. Generate the K-value using its rule, so the K-value is “(5, 7)”.
Step 5: Segment the information message binary according to the R-value as “(01101010011000010111011001100, 001)”, and segment the DNA reference string binary according to the K-value as “(01011, 0000000)”.
Step 6: Generate the cumulative XOR of the key-values as “(25, 2)”. Convert these integers to 6-bit binary format as “(011001, 000010)”. We define this as XorKey.
Step 7: Merge the segmented binary values (XorKey, information message binary, DNA reference string binary). Check the key-value: the first key is 24, an even number, so Pattern 1 applies: XorKey + information message binary + DNA reference string binary. The next key is 3, an odd number, so Pattern 2 applies: XorKey + DNA reference string binary + information message binary. Finally we get the total binary stream “01100101101010011000010111011001100010110000100000000001”.
Step 8: Convert this stream into a DNA string using the binary-to-DNA conversion rules. We get the DNA string “CGCCGGGCGACCTCGCGAGTAAGAAAAC”.
Step 9: Convert this DNA string into look up table indices using look up table mapping: take 4 DNA characters at a time, find the matching DNA string in the table and record its integer index, continuing until the DNA string is exhausted. Our indices are “(69, 1, 53, 145, 50, 243, 253)”.
Step 10: Send these index values to the receiver.

B. Data Recovery Procedure for receiver side:

At the receiver side we get the values “(69, 1, 53, 145, 50, 243, 253)”.

Step 1: Take these integer values as indices into the look up table and retrieve the DNA substring for each index. We get the DNA string “CGCCGGGCGACCTCGCGAGTAAGAAAAC”.
Step 2: Convert this DNA string into a binary stream by the binary coding rule. We get “01100101101010011000010111011001100010110000100000000001”.
Step 3: Cut the 6 MSB bits of each packet and convert them into integer values. We get the XOR key-values “(011001, 000010)”, whose integer values are “(25, 2)”.
Step 4: Convert each XOR key-value to a key-value by the cumulative XOR operation. We get the key-values “(24, 3)”.
Step 5: Generate the R-value and K-value from the key-value with the succeeding prime number function and the K-value generator function. The R-value is “(29, 3)” and the K-value is “(5, 7)”.
Step 6: Cut the information message binary stream and the DNA reference binary stream from the binary stream according to the R-value and K-value. The cutting order depends on the key-value. In this case the first key-value is 24, an even number, so Pattern 1 applies: cut the information message binary first, followed by the DNA reference string binary. The next key is 3, an odd number, so Pattern 2 applies: cut the DNA reference string binary first, followed by the information message binary.
Step 7: We now have the information message binary segments according to the R-value, “(01101010011000010111011001100, 001)”, and the DNA reference string binary segments according to the K-value, “(01011, 0000000)”.
Step 8: Convert the information message binary stream into a character stream by the 8-bit ASCII conversion rule. The information message binary stream “01101010011000010111011001100001” gives the character stream “java”, and the DNA reference binary stream “010110000000” gives the DNA reference string “CCGAAA”.
Step 9: We obtain the secret information message “java”.

IV. PROPOSED ALGORITHMS

A. Algorithm for Data Hiding

Input: A secret information message, say M.
Output: An integer number sequence.

Step 1: Convert the message M into a binary sequence, say MB.
Step 2: Generate a fake DNA sequence of the same length as MB, say S. Convert it into a binary sequence, say SB, using the binary coding rule.
Step 3: Generate the number sequence Key1, Key2, Key3, …, Keyt from the random number seed Key.
Step 4: Generate the number sequence R1, R2, R3, …, Rt, where Ri is the prime number succeeding Keyi. Find the smallest integer t for which R1 + R2 + … + Rt covers all of MB; if Rt exceeds the remaining MB bits, take Rt = (remaining MB bits), otherwise keep Rt as it is. Generate the number sequence K1, K2, K3, …, Kt corresponding to R1, R2, R3, …, Rt by the rule:

P = Ri + BKL
D = P / (Lowest packet length)
D = D + 1
Ki = D * (Lowest packet length) – P

where 0 < i ≤ t and BKL is the Binary Key-value string Length. This ensures that in each case (Ri + Ki + BKL) is divisible by the lowest packet length.
Step 5: Sequentially subdivide the binary information MB into segments with lengths R1, R2, R3, …, Rt in order and denote these segments by m1, m2, m3, …, mt. Sequentially subdivide SB into segments with lengths K1, K2, …, Kt-1 in order and truncate the residual part of SB. Denote these segments by s1, s2, …, st-1.
Step 6: Convert the values Key1, Key2, Key3, …, Keyt into X1, X2, X3, …, Xt by the cumulative XOR operation and MSB collection on each value. Convert the values X1, X2, X3, …, Xt into key-value strings O1, O2, O3, …, Ot, where the size of each Oi depends on the KSL (key-value string length) value.
Step 7: Merge the Oi, mi and si values according to the pattern, which depends on the key-value: if Keyi is an even number then Oi + mi + si; if Keyi is an odd number then Oi + si + mi, where 0 < i < t. We now have a total bit stream, say BDNA.
Step 8: Convert the binary stream BDNA into a DNA string, say WB, by the binary coding rule.
Step 9: Sequentially subdivide the DNA string WB into segments of 4 characters. Match each segment against the look up table contents and return the integer indices of the matching entries as I1, I2, I3, …, It by look up table mapping.
Step 10: Send the index values I1, I2, I3, …, It to the receiver.

B. Algorithm for Data Recovery

Input: A set of integer numbers I1, I2, I3, …, It.
Output: The hidden secret message M.

Step 1: Take the integer numbers I1, I2, I3, …, It as indices into the look up table. Select the DNA strings specified by these indices by look up table mapping, and concatenate them into one string, say WB.
Step 2: Convert the DNA string WB into a binary stream, say BDNA, using the binary coding rule.
Step 3: Cut KSL (Key-value String Length) bits from the MSB position of BDNA, say O. Convert this binary string O into an integer, say X.
Step 4: Convert the integer X into the key using the cumulative XOR operation.
Step 5: Derive the R-value, say R, from the key with the succeeding prime number generation function, and derive the K-value, say K, from the R-value by

P = R + BKL
D = P / (Lowest packet length)
D = D + 1
K = D * (Lowest packet length) – P

where BKL is the Binary Key-value string Length. This ensures that in each case (R + K + BKL) is divisible by the lowest packet length. In the situation where (R + K) > (remaining BDNA bits), take R = Key and calculate the corresponding K value.
Step 6: Decide according to the key which is cut 1st and which 2nd, the information message or the DNA reference string. If the key is even, cut the information message, say m, first, followed by the DNA reference string, say s. If the key is odd, cut the DNA reference string, say s, first, followed by the information message, say m. Cut both from the binary stream BDNA according to the R and K values.
Step 7: Concatenate and collect the information message segments m and the DNA reference string segments s. Go back to Step 3 and repeat until BDNA is finished.
Step 8: Convert the information message binary stream m into a character stream, say M, by the 8-bit ASCII conversion rule, and the DNA reference binary stream s into the DNA reference string S.
Step 9: We obtain the secret information message M from the information message binary stream.
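The two algorithms can be sketched end to end in Python. This is a rough illustration under several assumptions that are not in the paper: the look up table is modelled as a seeded shuffle of all 256 four-character DNA strings shared by both sides (`make_table`); random keys are limited to 1..60 so that the R-value still fits in the 6-bit key field; the DNA reference bits are drawn directly at random instead of being sliced from a generated string; and the sender sets the last key to the number of remaining message bits, so the receiver can treat a packet as the last one when the computed packet would reach the end of the stream. All function names are hypothetical.

```python
import random
from itertools import product

BKL, LOWEST = 6, 8      # key-value length in bits, lowest packet length
NUC = "ACGT"            # position i encodes the 2-bit value i (Table 1)


def cumulative_xor(value: int) -> int:
    row = [(value >> (BKL - 1 - i)) & 1 for i in range(BKL)]
    out = 0
    while row:
        out = (out << 1) | row[0]                     # collect the MSB of each row
        row = [a ^ b for a, b in zip(row, row[1:])]
    return out


def next_prime(n: int) -> int:
    c = n + 1
    while c < 2 or any(c % d == 0 for d in range(2, int(c ** 0.5) + 1)):
        c += 1
    return c


def k_value(r: int) -> int:
    p = r + BKL
    return (p // LOWEST + 1) * LOWEST - p


def make_table(table_seed: int) -> list[str]:
    # Assumption: the shared look up table is a seeded shuffle of all 4-mers.
    table = ["".join(t) for t in product(NUC, repeat=4)]
    random.Random(table_seed).shuffle(table)
    return table


def hide(message: str, key_seed: int, table_seed: int) -> list[int]:
    rng = random.Random(key_seed)
    mb = "".join(format(b, "08b") for b in message.encode("ascii"))
    stream, pos = "", 0
    while pos < len(mb):
        remaining = len(mb) - pos
        key = rng.randint(1, 60)                      # assumption: R stays below 64
        r = next_prime(key)
        if r >= remaining:                            # last packet: key = remaining bits
            key = r = remaining
        k = k_value(r)
        m = mb[pos:pos + r]
        s = "".join(rng.choice("01") for _ in range(k))   # fake DNA reference bits
        x = format(cumulative_xor(key), "06b")
        stream += x + (m + s if key % 2 == 0 else s + m)
        pos += r
    dna = "".join(NUC[int(stream[i:i + 2], 2)] for i in range(0, len(stream), 2))
    index = {seg: i for i, seg in enumerate(make_table(table_seed))}
    return [index[dna[i:i + 4]] for i in range(0, len(dna), 4)]


def recover(indices: list[int], table_seed: int) -> str:
    table = make_table(table_seed)
    dna = "".join(table[i] for i in indices)
    stream = "".join(format(NUC.index(n), "02b") for n in dna)
    mb, pos = "", 0
    while pos < len(stream):
        key = cumulative_xor(int(stream[pos:pos + BKL], 2))
        r = next_prime(key)
        if BKL + r + k_value(r) >= len(stream) - pos:     # last packet: R = Key
            r = key
        k = k_value(r)
        body = stream[pos + BKL:pos + BKL + r + k]
        mb += body[:r] if key % 2 == 0 else body[k:]
        pos += BKL + r + k
    return bytes(int(mb[i:i + 8], 2) for i in range(0, len(mb), 8)).decode("ascii")
```

A round trip such as `recover(hide("java", key_seed=7, table_seed=11), table_seed=11)` returns the original message. Only the table seed has to be shared in this sketch, since each key-value travels inside its own packet.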

V. RESULT

In order for an intruder to retrieve the secret message, they must achieve the following.

Firstly, there are roughly 163 million DNA sequences publicly available, but this method uses a fake DNA sequence that may or may not match any of them. Thus, the probability of an attacker’s success is $\frac{1}{1.63 \times 10^{8}}$.

Secondly, it is hard for an attacker to know how many packets the stream was divided into during encryption, and how many information bits and DNA binary bits belong to each packet. Even if we assume that the attacker knows the packet generation procedure, he or she does not know the r and k values. The attacker could try to guess the sizes r and k first. It is known that r + k + q = 8 × n, where r, k, q, n ≥ 1, and there are 8n – 6 possibilities here. The probability of an attacker successfully guessing m and s is $\frac{1}{8n - 6}$. However, this is not enough for the attacker to recover the data.

Thirdly, we use variable-length packets whose length is a multiple of 8, and the random number generator and the two seeds may be required. The problem for the attacker is that they do not know the number sequences generated by the random number seeds r and k, denoted r1, r2, …, rt and k1, k2, …, kt respectively, which are used to break up the secret message and the reference sequence S. The received binary sequence BDNA is handled with size n, a multiple of 8, during the data recovery stage. BDNA is composed of the secret message M, the DNA reference sequence S and the binary key-value, say q. The sizes of M and of the DNA reference S are defined to be r and k, respectively. It is difficult for an attacker to know r, k and q. Here t is the number of packets in the received binary sequence BDNA.

Figure 4: Relationship between all mi corresponding to Ri and si corresponding to Ki.

Notice that it is hard for an attacker to know how many packets the stream is divided into. Thus, they will need to try one packet, two packets, three packets, then four packets and so on. The following cases may then arise:

r1 + k1 + q = n
r1 + r2 + k1 + k2 + 2q = n
r1 + r2 + r3 + k1 + k2 + k3 + 3q = n
...
r1 + r2 + r3 + ... + rt + k1 + k2 + k3 + ... + kt + tq = n

The number of cases above can be summarized as $2^{n-1} - 2$. Thus, the probability of an attacker making a successful guess at this stage is $\frac{1}{2^{n-1} - 2}$.

Fourthly, the intruder has to know the binary coding scheme. The number of binary coding rules is 4! = 24, so the probability of an attacker making a successful guess at this stage is $\frac{1}{24}$.

Fifthly, the intruder has to know the procedure of look up table index mapping and its components (A, C, G, T). As we represent each DNA character with a 2-bit binary value (A-00, C-01, G-10, T-11), we need 8 bits to represent a 4-character DNA sequence (e.g. TACG). So we need $2^8 = 256$ combinations, among which the attacker has to guess at this stage.

Lastly, the information message and the DNA reference string are merged randomly, so the probability of an attacker guessing the merging pattern is near 0.

Finally, the probability of an attacker making a successful guess against this method is the product of the probabilities of the stages above.

VI. CONCLUSION

In this modern era of technology, the main objective of cryptography and steganography is to deliver a very high degree of security for data. The characteristics of DNA have brought several new ideas to the field of data hiding. DNA sequences are potentially important for implementing new data hiding techniques and for transforming previous schemes into new ones. DNA is extremely effective as a storage medium: it is not only compact but also biodegradable, and it consumes very little energy. It is used for the propagation of species, for encoding protein synthesis, and for solving complex computational problems. Owing to the low visibility of DNA sequences, it is quite difficult to track a secret message hidden in DNA. Since a randomly generated DNA sequence is used, it is difficult for an attacker to find out whether the sequence is fake or not. Since there are roughly 163 million DNA sequences publicly available, it is practically impossible for an attacker to try out that many sequences. Even if it is known that the fake DNA sequence contains a secret message, it remains practically impossible to recover the sequence correctly. Hence it can be concluded that this approach meets the corresponding security level and that exhaustive attacks can be successfully resisted.

REFERENCES

[1] Jie Yang, Weiwei Lin, Taojian Lu, "An approach to prove confidentiality of cryptographic protocols with non-atomic keys," IEEE, 2012.
[2] Muhalim Mohamed Amin, Subariah Ibrahim, Mazleena Salleh, "Information Hiding Using Steganography," Universiti Teknologi Malaysia, 2003.
[3] B A Mitrans, A Kh Aboo, "Proposed Steganography Approach Using DNA Properties," Mosul Univ./IRAQ, 2013, Vol. 14 No. 1.
[4] R. J. Lipton, "Using DNA to Solve NP-Complete problems," Science, vol. 268, pp. 542–545, 1995.
[5] Sherif T. Amin, Magdy Saeb, Salah El-Gindi, "A DNA based Implementation of YAEA Encryption Algorithm," IASTED International Conference on Computational Intelligence, 2006.
[6] Sukalyan Som, Moumita Som, "DNA Secret Writing With Laplace Transform," International Journal of Computer Applications (0975 – 8887), Volume 50 – No. 5, July 2012.
[7] Prof. Samir Kumar Bandyopadhyay, S Chakraborty, S Roy, "Image Steganography Using DNA Sequence and Sudoku Solution Matrix," Volume 2, Issue 2, February 2012, ISSN: 2277 128X.
[8] Hayam Mousa, Kamel Moustafa, Waiel Abdel-Wahed, and Mohiy Hadhoud, "Data Hiding Based on Contrast Mapping Using DNA Medium," The International Arab Journal of Information Technology, Vol. 8, No. 2, April 2011.
[9] B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, J.D. Watson, Molecular Biology of the Cell, Garland Publishing, New York & London, (1994).
[10] A.L. Lehninger, D.L. Nelson, M.M. Cox, Principles of Biochemistry, Worth, New York, (2000).
[11] Subhankar Roy, Sunirmal Khatua, Sudipta Roy, Samir K. Bandyopadhyay, "An Efficient Biological Sequence Compression Technique Using Lut and Repeat in the Sequence," IOSR Journal of Computer Engineering (IOSRJCE), Volume 6, Issue 1 (Sep-Oct. 2012), PP 42-50.
