
Formulation of an Encryption Algorithm on the Basis of Molecular Genetics and Image Patterns

M Prashant, R Siddharth and Rajeev Kumar
Department of Computer Science & Information Systems
Birla Institute of Technology & Science
Pilani – 333 031, India
{fd96035, fd96018, rajeevk}@bits-pilani.ac.in

Abstract

In this paper, we introduce a novel approach to cryptography by amalgamating two cryptic concepts – molecular genetics and image patterning. The genetic code for an organism is unique and yet to be cracked. Formed on the basis of just four nucleotide bases, numerous variations of characteristics are produced. At the same time, encoding using images gives rise to millions of patterns. We draw a parallel between a combination of these and cryptography, thereby attempting to achieve a high level of data security. The first phase of our encryption process deals with the concepts from molecular genetics involving meiosis, fertilization, translation and mutation. Our next phase involves the encryption of data in the form of images. We have tested the algorithm to encode data and image files. Results are presented in the form of sample images.

1. Introduction

The basis of intelligent life originates within the cell itself. The reactions of organic molecules resident in the cells give rise to highly efficient, intricate, self-correcting and self-directing processes such as replication, transcription and translation [1]. These processes, along with mutations, form the basis of evolution and thereby the foundation of intelligent life. The same concepts underlie techniques for emulating computational intelligence in many engineering applications.

Security is a complex property that is difficult to design or optimize. The existence of so many methods of attack makes the protection of an information system very complicated. Keeping transmitted information secret through encipherment, authenticating information, verifying the identity of people, preventing the theft of information, and controlling access to both data and software have become vital issues, so data security is essential. In cryptography, the message is transformed so as to be unintelligible even though its existence is apparent [2].

With these concepts in mind, we introduce a novel approach to cryptography by amalgamating two cryptic concepts: molecular genetics and image patterning. The genetic code for an organism is unique and yet to be cracked. Formed on the basis of just four nucleotide bases, numerous variations of characteristics are produced. At the same time, encoding using images gives rise to a multiplicity of patterns. In this work, we draw a parallel between a combination of these and cryptography, thereby attempting to achieve a high level of data security.

One of the main issues involved in cryptography is the existence of a global key [3]. A global key of appropriate length is chosen for the encryption process, and a sub-key is derived from it at each stage. The patterns generated for a particular key are unique, and hence the receiver needs to know the key to decipher the code. We have made sure that, in this process, a completely loss-free image is transferred across the network. For an experienced user who has prior knowledge of the key, the image patterns can themselves convey some meaning (if not in totality) pertaining to the data transferred.

The rest of the paper is organized as follows. In Section 2, we discuss the principles of molecular genetics and their application to cryptography. A brief introduction to image patterning is included in Section 3. We then discuss the implementation of the algorithm in Section 4. Results are included in Section 5, followed by conclusions in Section 6.

2. Principles of Genetic Engineering for Cryptography

2.1. Meiosis (Reduction Division)

Meiosis is a process by which gonadic cells divide, resulting in gametes that carry half the number of chromosomes (haploid). In the male (father) this process gives rise to four sperms, while in the female four ova are formed. We consider only certain vital steps of meiosis, namely crossing over and cellular division. Crossing over is a phenomenon that takes place between two non-sister chromatids of homologous chromosomes of a tetrad. By defining the content of our parental files (cells) as chromatin material, we perform meiosis individually on both parents, thereby producing eight smaller, reduced gametes. Depending on the length of the chromosome, the position of the centromere and the linkage factors, crossing over may take place in different patterns. The greater the distance between two loci, the greater the chance that crossing over occurs between them and the greater the frequency of crossed-over strands. The non-sister chromatids exchange segments with each other, which is followed by chiasma formation, or fusion of segments [4]. We implement such a probabilistic crossover mechanism based on the key pattern: depending on the key, the bit patterns in the parental files exchange fragments, resulting in the enciphered text. Crossing over can take place in many forms, some of which are illustrated in Figure 1.

[Figure 1. Types of double chiasmata and their consequences. Panel labels: Double Chiasmata; Meiotic Products: 2 double crossovers; 1 single and 1 double crossover; 1 double and 2 single crossovers; 4 single crossovers.]

2.2. Fertilization

Fertilization is the process by which the male and the female gametes unite to give rise to a zygote, which contains a diploid number of chromosomes. The sperms compete to reach the egg, and the first to arrive fertilizes it. We have assigned priorities to both the sperm and ova files based on the key. The best-fit sperm merges with the most conducive ovum, the next-best sperm with the next ovum, and so on. Thus four new children (progeny) are formed, each with twice the number of chromosomes.

2.3. Translation

The formation of proteins from genetic information involves a translation from the language of nucleic acids into the language of polypeptides, with an alphabet of some twenty amino acids. For this process, we consider the material in the progeny (child files) as the mRNA (messenger RNA) chain. The mRNA is produced from the DNA by the process of transcription. The nucleotide bases on the mRNA (the bit pattern) are identified as Adenine (A), Cytosine (C), Uracil (U) and Guanine (G). These nucleotide bases, taken in sets of three, are termed codons. The mRNA chain with the codons serves as a template for the synthesis of an amino acid chain. Each amino acid corresponds to a particular codon (or, in some cases, to several). As there are just twenty amino acids but sixty-four codons, more than one codon can code for one amino acid. Thus the coding is degenerate, and a change in a single nucleotide base can change the codon and hence the corresponding amino acid. Table 1 illustrates the degeneracy of the codon sequences. Substitutions of these bases are called site-specific mutations and result in a modified sequence of amino acids [4]. We attempt to simulate this principle by changing base patterns with reference to the key.
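The reading of a bit pattern as nucleotide bases and codons can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper does not state which bit pair stands for which base, so the mapping 00→U, 01→C, 10→A, 11→G (the base order used in Table 1) and the function names are assumptions.

```python
# Assumed bit-pair-to-base mapping (00->U, 01->C, 10->A, 11->G); the paper does not specify one.
BASES = "UCAG"

def bytes_to_bases(data):
    """Read each byte as four 2-bit nucleotides, most significant pair first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))

def bases_to_codons(bases):
    """Group bases into codons of three bases (six bits each)."""
    if len(bases) % 3:
        raise ValueError("number of bases must be a multiple of 3")
    return [bases[i:i + 3] for i in range(0, len(bases), 3)]

def codons_to_bytes(codons):
    """Inverse of the two steps above: codons back to the original bytes."""
    bases = "".join(codons)
    if len(bases) % 4:
        raise ValueError("number of bases must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(bases), 4):
        value = 0
        for base in bases[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)

codons = bases_to_codons(bytes_to_bases(b"DNA"))
print(codons)  # ['CUC', 'UCU', 'GAC', 'UUC']
```

Three bytes carry 24 bits, that is, twelve bases or four codons, so in this sketch the input length is kept a multiple of three bytes; how the original algorithm pads other lengths is not stated in the paper.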

Table 1. The degeneracy of the codons

First base   Second base U      Second base C      Second base A      Second base G      Third base
U            UUU UUC UUA UUG    UCU UCC UCA UCG    UAU UAC UAA UAG    UGU UGC UGA UGG    U C A G
C            CUU CUC CUA CUG    CCU CCC CCA CCG    CAU CAC CAA CAG    CGU CGC CGA CGG    U C A G
A            AUU AUC AUA AUG    ACU ACC ACA ACG    AAU AAC AAA AAG    AGU AGC AGA AGG    U C A G
G            GUU GUC GUA GUG    GCU GCC GCA GCG    GAU GAC GAA GAG    GGU GGC GGA GGG    U C A G

3. Image Patterning

Images are worth millions of words. The ultimate aim of image patterning is to extract some important features from image data, from which a decrypted interpretation or understanding of the data can be provided. In this context, the observations of David Marr [5] with reference to the vision process are: “the processing of gray-level intensity array by the brain proceeds so naturally – so unconsciously, in a sense – that we are seldom aware that it begins with only this: a two-dimensional play of light upon the receptors of either eyes”. So, in this work, we attempt to deliver the encrypted text or message using images. Since much of the data transferred across networks needs to be loss-free, we restrict the image representation to a few gray levels. For example, a 256-gray-level image can be considered as a set of eight 1-bit hierarchical planes. In terms of 8-bit bytes, plane 0 contains all the lowest-order bits of the bytes comprising the pixels in the image, and plane 7 contains all the highest-order bits. Furthermore, error-free compression can be achieved using run-length coding to increase the rate of data transfer [6].
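The bit-plane view and run-length coding described above can be sketched as follows. This is an illustrative sketch only; the function names and the flat list-of-pixels representation are assumptions, and the paper does not say which run-length variant is used.

```python
def bit_planes(pixels):
    """Split 8-bit pixel values into eight 1-bit planes; plane 0 holds the lowest-order bits."""
    return [[(p >> k) & 1 for p in pixels] for k in range(8)]

def merge_planes(planes):
    """Rebuild the pixel values from the eight planes (lossless inverse of bit_planes)."""
    return [sum(bit << k for k, bit in enumerate(bits)) for bits in zip(*planes)]

def run_length_encode(bits):
    """Encode a sequence as [value, run_length] pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs

def run_length_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]

pixels = [0, 255, 128, 129]
planes = bit_planes(pixels)
print(planes[7])                     # [0, 1, 1, 1]  (highest-order bits)
print(run_length_encode(planes[7]))  # [[0, 1], [1, 3]]
```

Both steps are exactly invertible, which is the property the loss-free requirement depends on.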

4. Implementation

The process begins with the selection of a global key of appropriate length. This key and the sub-keys derived from it are the pivots of the entire process. The input file is first garbled using a simple transposition cipher [7] that employs the derived key. At present we use only this cipher, but this step can be strengthened by substituting any other standard cipher.

We take the resultant file and define the parental generation by separating it into two halves. The contents of the files are considered as the genetic material, that is, the chromatin material of the parental cells. We perform meiotic division on these files, beginning with the crossing over of the chromosomes. The derived key is rotated periodically, and we exchange fragments of the chromatids on the basis of specific bit patterns. This procedure is repeated throughout the file on both cells. Based on the same principles, we perform another set of crossovers using the derived key in its totality. The breaking up of the parental cells into gametes (sperms and ova) signifies the end of meiosis. At the end of this process, fertilization takes place and four progeny are produced.

To draw a parallel with the process of translation, we encode the data in the form of codons, which are sequences of three nucleotide bases that code for a particular amino acid. Recognizing each pair of bits as a nucleotide, we extract six-bit codons and represent them by the corresponding nucleotide bases. After performing this operation on each of the cells (files), we amalgamate these files into a common protein pool. We map the bases onto their functional blocks (amino acids) in sets of three, thereby achieving a mapping from the four nucleotide codes to the sixty-four codon codes for amino acids, as listed in Table 1. The functional blocks can individually undergo site-specific mutations, each with a probability of 0.25 (Figure 2).
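A key-driven crossover between the two parental halves might look like the following sketch. The paper does not give the fragment size, the rotation schedule or the bit patterns that trigger an exchange, so the choices here (fixed-size fragments, a swap wherever the current key bit is 1, a one-byte key rotation after each full pass over the key bits) and all names are assumptions made for illustration.

```python
def rotate_left(key, n=1):
    """Rotate the key by n byte positions (assumed form of the periodic key rotation)."""
    n %= len(key)
    return key[n:] + key[:n]

def crossover(a, b, key, fragment=4):
    """Swap same-position fragments of a and b wherever the current key bit is 1."""
    if len(a) != len(b):
        raise ValueError("parental halves must have equal length")
    if not key:
        raise ValueError("key must not be empty")
    out_a, out_b = bytearray(a), bytearray(b)
    key_bits = 8 * len(key)
    n_fragments = (len(a) + fragment - 1) // fragment
    for i in range(n_fragments):
        if i and i % key_bits == 0:
            key = rotate_left(key)  # all key bits used once: rotate before the next pass
        bit_index = i % key_bits
        bit = (key[bit_index // 8] >> (7 - bit_index % 8)) & 1
        if bit:
            s = slice(i * fragment, (i + 1) * fragment)
            out_a[s], out_b[s] = b[s], a[s]
    return bytes(out_a), bytes(out_b)

# Key 0xA0 = 10100000: fragments 0 and 2 are exchanged, fragments 1 and 3 are not.
print(crossover(b"AAAAAAAA", b"BBBBBBBB", b"\xa0", fragment=2))
# (b'BBAABBAA', b'AABBAABB')
```

Because the swap positions depend only on the key, applying the same function again with the same key restores the original halves, which is what makes this step reversible for the receiver.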
Following the above process, we extract the bit patterns from the file and represent them in the form of an image. The image so obtained is the encrypted version of the message to be delivered to the receiver. The receiver has two options: the encrypted output can be viewed as text or as an image before the decryption process begins. An experienced user who has knowledge of the key may get an inkling of what data the image patterns convey. Further, when short messages are involved, it is wise to hide the text amidst some garbage text, so that the experienced user will still be able to decipher the message by looking at the image.

[Figure 2. Representation of site-specific mutation in an mRNA (bit pattern). Labels: 5' end, mRNA, 3' end; amino acids shown: Arginine, Threonine, Glycine, Serine.]

The decryption process is accomplished by following the steps mentioned above in the reverse order. The image pattern is decoded to the protein pool, from which the non-mutated codon sequence is obtained. The individual gametes (sperms and ova) are recovered from the codon sequence and are traced back to their respective parental cells. Decrypting further, the chromatin material (the original file) is obtained.
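For the receiver to recover the non-mutated codon sequence, every mutation must be reproducible from the key alone. The sketch below shows one way this could work; it is not the authors' method. The use of Python's seeded `random.Random` as the key-driven source, the integer key, the shift-based substitution and the function names are all assumptions; only the mutation probability of 0.25 comes from the paper.

```python
import random

BASES = "UCAG"  # assumed base order, as in Table 1

def _mutation_plan(key, n_codons, rate):
    """Derive from the key which codons mutate, at which site, and by what base shift."""
    rng = random.Random(key)
    plan = []
    for _ in range(n_codons):
        hit = rng.random() < rate
        site = rng.randrange(3)      # which of the three bases is substituted
        shift = rng.randrange(1, 4)  # non-zero shift, so a hit always changes the base
        plan.append((hit, site, shift))
    return plan

def mutate(codons, key, rate=0.25, inverse=False):
    """Apply (or, with inverse=True, undo) key-driven site-specific mutations."""
    result = []
    for codon, (hit, site, shift) in zip(codons, _mutation_plan(key, len(codons), rate)):
        if hit:
            step = -shift if inverse else shift
            new_base = BASES[(BASES.index(codon[site]) + step) % 4]
            codon = codon[:site] + new_base + codon[site + 1:]
        result.append(codon)
    return result

original = ["CUC", "UCU", "GAC", "UUC"] * 4
mutated = mutate(original, key=2024)
restored = mutate(mutated, key=2024, inverse=True)
print(restored == original)  # True
```

A general-purpose pseudo-random generator such as this one is used only to keep the example short; it is not suitable as a cryptographic key stream.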

5. Results

We have tested the above algorithm by encoding data files and rendering them as images. As an example, the multi-gray-level image of Figure 3(a) is transformed to the coded image of Figure 3(b). We are currently working on standardizing and strengthening the procedure. Time analysis of our algorithm reveals that the meiotic process consumes a larger percentage of the total encoding and decoding time than the other stages (which is also true biologically).

[Figure 3. (a) Sample gray-level image, and (b) its encrypted version.]

6. Conclusions and Future Work

In this work, we have devised an atypical approach to cryptography by amalgamating the highly arcane processes of molecular genetics and the intriguing concept of image patterning. We have attempted to simulate various cellular processes that form the basis of producing new life. The concepts of crossing over and mutation introduce a highly probabilistic element into our encryption procedure. Images provide a window for the experienced user onto what the message might actually contain, but serve as an iron screen for anyone who has no knowledge of the key. The algorithm has been designed so that any file (text, image, voice or other sound file) can be encrypted and delivered across the web.

We are currently working on further optimization, standardization and strengthening of our process. To reduce the running time of the algorithm, we are trying to identify steps that can run concurrently. We are also exploring the possibility of conveying meaningful information to an experienced user through pattern identification [8].

7. References

[1] Herskowitz, H. I., Basic Principles of Molecular Genetics, Thomas Nelson, England, 1968.
[2] Feistel, H., “Cryptography and computer privacy”, Scientific American, vol. 228, no. 5, p. 15, May 1973.
[3] Davies, D. W. and W. L. Price, Security for Computer Networks, John Wiley, England, 1989.
[4] Mallette, M. F., C. O. Clagett, A. T. Phillips, and R. L. McCarl, Introductory Biochemistry, Williams and Wilkins, Baltimore, 1971.
[5] Marr, D. and H. K. Nishihara, “Visual information processing: artificial intelligence and the sensorium of sight”, Technology Review, vol. 81, no. 1, pp. 2-23, 1978.
[6] Jain, A. K., Fundamentals of Digital Image Processing, Prentice Hall, New Jersey, 1989.
[7] Beker, H. and F. Piper, Cipher Systems: The Protection of Communications, Northwood Books, London, 1982.
[8] Duda, R. O. and P. E. Hart, Pattern Recognition and Scene Analysis, John Wiley, New York, 1973.