J. Appl. Math. Comput. (2017) 55:479–493 DOI 10.1007/s12190-016-1046-3 ORIGINAL RESEARCH
Cyclic DNA codes over F2 + uF2 + v F2 + uv F2 and their applications Shixin Zhu1 · Xiaojing Chen1
Received: 29 March 2016 / Published online: 24 September 2016 © Korean Society for Computational and Applied Mathematics 2016
Abstract In this paper, we study the structure of cyclic DNA codes of arbitrary length over the ring R = F2 + uF2 + vF2 + uvF2 , u 2 = 0, v 2 = v, uv = vu. By defining a Gray map, we establish a relation between R and R12 , where R1 = F2 + uF2 is a ring with four elements. Cyclic codes of arbitrary length over R satisfying the reverse constraint and the reverse-complement constraint are studied in this paper. Furthermore, we introduce reversible codes which provide a rich source for DNA codes. The GC content constraint is also considered. We give some examples to support our study in the last. Keywords Non-chain rings · Cyclic DNA codes · Reversible cyclic codes · Reversible-complement cyclic codes · The GC content
1 Introduction Algebraic coding theory of linear codes has attracted remarkable attention for the last half of the century. Cyclic codes are important families of linear codes because of their rich algebraic structures and practical implementations. Cyclic codes over the ring R1 have been extensively studied by many authors [1,5,6,12]. The focus on constructing
This research is supported by the National Natural Science Foundation of China (No. 61370089), the Anhui Provincial Natural Science Foundation under Grant (No.1508085SQA198, 1508085MA13).
B
Xiaojing Chen
[email protected] Shixin Zhu
[email protected]
1
Department of Mathematics, Hefei University of Technology, Hefei 230009, Anhui, People’s Republic of China
123
480
S. Zhu, X. Chen
codes was mainly over fields, but after the study in [10], finite rings have received a great deal of attention. Most of the studies were concentrated on codes over finite chain rings [7]. However, optimal codes over non-chain rings exist (e.g see [18]). But the case over a non-chain ring structure was more complicated in [4]. In [21], the algebraic structure of cyclic codes over F2 + vF2 , where v 2 = v were studied. Zhu and Wang gave a class of constacyclic codes over F p + vF p in [20]. Recent years, the DNA codes over rings have received more and more attention. Adleman [2] pioneered the studies on DNA computing by solving an instance of NPcomplete problem over DNA molecules. DNA is a nucleic acid containing the genetic instructions used in the development and functioning of all known living organisms. It is formed by strands linked together and twisted in the shape of a double helix. Each strand is a sequence consists of four possible nucleotides, two purines, adenine (A) and guanine (G), and two pyrimidines, thymine (T ) and cytosine (C). The ends of a DNA strand are chemically polar with 5 and 3 ends, which implies that the strands are oriented. DNA has two strands that are governed by the rule called Watson Crick complement (WCC), that is, A pairs with T and G pairs with C. We denote the WCC in this paper as A = T , T = A, G = C and C = G. The pairing is done in the opposite direction and reverse order. For instance, the WCC strand of 3 − T A AGC T C − 5 is the strand 5 − G AGC T T A − 3 . Furthermore, since DNA computing can store more memory than silicon based computing systems, there are many scholars begin to study it. Siap et al. [16] constructed cyclic DNA codes considering the GC content constraint over F2 [u]/(u 2 − 1) and used the deletion distance. Guenda and Gulliver [8] studied cyclic codes over F2 [u]/(u 2 ) satisfying the reverse constraint, the reverse-complement constraint and the GC content constraint, and an infinite family of BCH DNA codes were constructed. Liang and Wang [11] studied the cyclic DNA codes over F2 +uF2 . Yildiz and Siap [19] studied DNA pairs instead of single DNA bases for the first time, where 15 elements of a ring and DNA pairs were matched and the algebraic structure of these DNA codes were studied. Later in [14], DNA pairs were matched with F16 and by introducing some special polynomials DNA codes were constructed. It is also observed that, in some cases reversible codes introduced by Massey over GF(q) were useful for constructing DNA codes in [13]. Besides, Bayram et al. [3] have considered codes over the ring F4 + vF4 . The constacyclic codes and skew constacyclic codes over the ring were studied. And they studied the structure of DNA codes over the ring and presented applications to DNA codes. Furthermore, some work have been done on DNA codes over non-chain ring recently. We do such work over the ring F2 + uF2 + vF2 + uvF2 in the paper. In this paper, we study the structure of cyclic DNA codes of arbitrary length over the ring R. The rest of the paper is organized as follows. Section 2 includes some basic background and some basic results of cyclic codes of arbitrary length over R1 . In Sect. 3, we study cyclic codes satisfying the reverse constraint and reversecomplement constraint over such ring. Furthermore, the existence and the structure of such codes are entirely determined. In Sect. 4, we study the structure of DNA codes over R and present applications to DNA codes where some examples of such codes are optimal. In Sect. 5, we use the Gray image of the minimal generating sets of C to study the GC content of C . Section 6 concludes the paper.
123
Cyclic DNA codes over F2 + uF2 + vF2 + uvF2 ...
481
2 Preliminaries Let F2 be the binary field. Throughout this paper, R denotes the commutative ring F2 + uF2 + vF2 + uvF2 with u 2 = 0, v 2 = v and uv = vu with characteristic 2. Let R1 be the finite chain ring F2 + uF2 with u 2 = 0. The following results come from Srinivasulu and Bhaintwal [17]. R is a semi-local ring with two maximal ideals namely Iu+v and I1+u+v . The quotient rings R/Iu+v and R/I1+u+v are isomorphic to F2 . A direct decomposition of R is R = Iv ⊕ I1+v . We can also see that Iv and I1+v are isomorphic to R1 . So every element c in R can uniquely be written as c = a + bv, a, b ∈ R1 . R is isomorphic to the residue ring R1 [v]/v 2 − v. Note that Iv = {av|a ∈ R1 } and I1+v = {b(1 + v)|b ∈ R1 }. An important property of codes over the ring R is the existence of a mapping ξ called the Gray map which sends linear codes over R to binary linear codes. The Gray map from R to R12 is defined as ξ(a + bv) = (a, a + b). One type of nontrivial automorphisms can be defined over R as follows : σ : F2 + uF2 + vF2 + uvF2 → F2 + uF2 + vF2 + uvF2 , a + bv → a + (1 + v)b, where a, b ∈ F2 + uF2 . Let S D4 = {A, T, C, G} represent the DNA alphabet. We use the same notation for the set
S D16 = {A A, AT, AC, AG, T T, T A, T C, T G, CC, C A, C T, C G, GG, G A, GT, GC},
which was originally presented in [14]. We define a ζ correspondence between the elements of the ring R and DNA double pairs presented explicitly in Table 1. The elements 0, 1, u, 1 + u of R1 are in one-to-one correspondence with the nucleotide DNA bases A, T , C, G, respectively, such that 0 → A, 1 → G, u → T and 1+u → C. Naturally we extend The Watson Crick complement to the elements of S D16 such that A A = T T, · · · , T G = AC. For any a ∈ R, we can define a as the complement of a, where ζ (a) = ζ (a). Obviously, a and a is one-to-one and a = a. For each c = (c0 , c1 , · · · , cn−1 ) ∈ R n , we define the reverse of c as cr = (cn−1 , cn−2 , · · · , c0 ), the complement of c as c = (c0 , c1 , · · · , cn−1 ) and the reversecomplement of c as cr = (cn−1 , cn−2 , · · · , c0 ). An n-tuple c is identified with the polynomial c(x) = c0 + c1 x + · · · + cn−1 x n−1 in the ring R[x]/(x n − 1). Furthermore, for any c(x) = c0 + c1 x + · · · + cn−1 x n−1 , we define the reverse of c(x) as cr (x) = cn−1 + cn−2 x + · · · + c0 x n−1 , the complement of c(x) as c(x) = c0 + c1 x + · · · + cn−1 x n−1 and the reverse-complement of c(x) as cr (x) = cn−1 + cn−2 x + · · · + c0 x n−1 .
123
482
S. Zhu, X. Chen
Table 1 ζ -table for DNA correspondence
Elements a
Gray images
DNA double pairs ζ (a) AA
0
(0, 0)
v
(0, 1)
AG
uv
(0, u)
AT
v + uv
(0, 1 + u)
AC
1
(1, 1)
GG
1+v
(1, 0)
GA
1 + uv
(1, u + 1)
GC
1 + v + uv
(1, u)
GT
u
(u, u)
TT
u+v
(u, 1 + u)
TC
u + uv
(u, 0)
TA
u + v + uv
(u, 1)
TG
1+u
(1 + u, 1 + u)
CC
1+u+v
(1 + u, u)
CT
1 + u + uv
(1 + u, 1)
CG
1 + u + v + uv
(1 + u, 0)
CA
A nonempty subset C of R n will be called an linear code over R of length n if C is an R-submodule of R n . C is said to be cyclic if (cn−1 , c0 , . . . , cn−2 ) ∈ C for all (c0 , c1 , . . . , cn−1 ) ∈ C. Let C be a linear code over R. The following results are presented in [9], let C1 = {x + y ∈ R1n |(x + y)v + x(v + 1) ∈ C , f or some x, y ∈ R1n }, C2 = {x ∈ R1n |(x + y)v + x(v + 1) ∈ C , f or some y ∈ R1n }. Note that C1 and C2 are linear codes over R1 . Consequently, C = vC1 ⊕ (1 + v)C2 . Lemma 2.1 [3] (1) Let C be a linear code over R such that C = vC1 ⊕ (1 + v)C2 . Then, C is a cyclic code if and only if C1 and C2 are both cyclic codes over R1 . (2) If C = vC1 ⊕ (1 + v)C2 is a cyclic code of length n over R, then C = (v f 1 , (1 + v) f 2 ), where f 1 and f 2 are generator polynomials of C1 and C2 , respectively. Remark 1 In this paper, we use f , f to represent f (x) and (x n −1)/ f (x), respectively, if don’t confuse. Recall that the Hamming weight of a codeword c is defined by w H (c) = |{i|ci = 0}|, i.e., the number of the nonzero entries of c. The minimum Hamming weight w H (c) of a code C is the smallest possible weight among all its nonzero codewords. The Hamming distance d(c1 , c2 ) between two codewords c1 and c2 is the Hamming weight of the codeword c1 − c2 . The minimum Hamming distance d(C ) of C is defined as min{d(c1 , c2 )|c1 , c2 ∈ C , c1 = c2 }.
123
Cyclic DNA codes over F2 + uF2 + vF2 + uvF2 ...
483
For some prescribed minimum distance d, a code is called a DNA code if it satisfies some or all of the following conditions: (1) The Hamming constraint For any two different codewords c1 , c2 ∈ C , d(c1 , c2 ) ≥ d. (2) The reverse constraint For any two codewords c1 , c2 ∈ C , d(c1 , cr2 ) ≥ d. (3) The reverse-complement constraint For any two codewords c1 , c2 ∈ C , d(c1 , cr2 ) ≥ d. (4) The fixed GC content constraint For any codeword c ∈ C contains the same number of G and C elements. The purpose of the first three constraints is to avoid undesirable hybridization between different strands. The fixed GC content ensures that all codewords have similar thermodynamic characteristics, which allows parallel operations on DNA sequences. The structure of cyclic codes of arbitrary length n over R1 has been extensively studied in [1], which is Theorem 2.2 Let C be a cyclic code in R1,n = R1 [x]/(x n − 1). Then (1) If n is odd, then R1n is a principal ideal ring and C = (g, ua) = (g + ua), where g, a are binary polynomials with a | g | (x n − 1) mod 2. (2) If n is not odd, then (2.1) C = (g + up), where g | (x n − 1) mod 2 and (g + up) | (x n − 1) in R and g | p g . Or, (2.2) C = (g + up, ua), where g, a and p are binary polynomials with a | g | g and degp < dega. (x n − 1) mod 2, a | p
3 The reverse constraint and reverse-complement constraint codes In this section, we main study the reverse constraint and the reverse-complement constraint codes over R. We begin with the following definition. Definition 3.1 A cyclic code C of length n over R is said to be reversible if cr ∈ C for all c ∈ C , complement if c ∈ C for all c ∈ C and reversible-complement if cr ∈ C for all c ∈ C . For each polynomial f (x) = a0 + a1 x + · · · + as x s with as = 0, we define the reciprocal of f (x) to be the polynomial f ∗ (x) = as + as−1 x + · · · + a0 x s = x deg( f (x)) f (x −1 ). We note that deg f ∗ (x) ≤ deg f (x), and if a0 = 0, then f (x) and f ∗ (x) always have the same degree. f (x) is called self-reciprocal if and only if f (x) = f ∗ (x). There are many properties of polynomial reciprocal, we list two of them in the following which will be used later in the paper. Lemma 3.2 [1] Let f , g be any two polynomials in R with degg ≤ deg f . Then 1. ( f ·g)∗ = f ∗ ·g ∗ ; 2. ( f + g)∗ = f ∗ + x deg f −degg g ∗ .
123
484
S. Zhu, X. Chen
3.1 The reverse constraint codes In this part, we study the reverse constraint codes over the ring R. Now we give some important Lemmas below. Lemma 3.3 [8] Let C = (g, ua) = (g + ua) be a cyclic code of odd length n over R1 . Then C is reversible if and only if g and a are self-reciprocal. Lemma 3.4 [11] Let C = (g + up) be a cyclic code of even length n over R1 . Then C is reversible if and only if 1. g is self-reciprocal; 2. (a) x i p ∗ = p. Or (b) g = x i p ∗ + p, where i = degg − degp. g and Lemma 3.5 [11] Let C = (g + up, ua) with a | g | (x n − 1) mod 2, a | p deg p ≤ dega be a cyclic code of even length n over R1 . Then C is reversible if and only if 1. g and a are self-reciprocal; 2. a | (x i p ∗ + p), where i = degg − degp. We will give one of the main conclusions below. Theorem 3.6 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R. Then C is reversible if and only if C1 and C2 are reversible, respectively, where C1 and C2 are both cyclic codes over R1 . Proof On the one hand, if C1 and C2 are reversible, then for any b ∈ C , b = vb1 + (1 + v)b2 where b1 ∈ C1 and b2 ∈ C2 . We can easy know that br1 ∈ C1 and br2 ∈ C2 , thus br = vbr1 + (1 + v)br2 ∈ C . Hence C is reversible. On the other hand, if C is reversible, then for any b = vb1 + (1 + v)b2 ∈ C , where b1 ∈ C1 and b2 ∈ C2 . we have br = vbr1 +(1+v)br2 ∈ C . Let br = vbr1 +(1+v)br2 = ve1 + (1 + v)e2 , where e1 ∈ C1 and e2 ∈ C2 . Then v(br1 − e1 ) + (1 + v)(br2 − e2 ) = 0, thus br1 = e1 ∈ C1 and br2 = e2 ∈ C2 . Hence C1 and C2 are reversible, respectively. Example 3.7 Let x 8 − 1 = (x + 1)8 = g 8 over F2 . Let C1 = ( f 1 ) = (g1 + up1 ), g1 = g 6 , p1 = x 5 + x, C2 = ( f 2 ) = (g2 + up2 ), g2 = g 4 , p2 = x 3 + x. It is easy to check that g1 and g2 are self-reciprocal, x i p1∗ = p1 and x j p2∗ = p2 , where i = degg1 − degp1 , j = degg2 − degp2 . On the one hand, since C = ( f ) = (v(g1 + up1 ) + (1 + v)(g2 + up2 )), clearly we have f = vx 6 + uvx 5 + x 4 + (u + uv)x 3 + vx 2 + ux + 1 ∈ C , f r = vx + uvx 2 + x 3 + (u + uv)x 4 + vx 5 + ux 6 + x 7 . On the other hand, (vx + (1 + v)x 3 ) f = vx + uvx 2 + x 3 + (u + uv)x 4 + vx 5 + ux 6 + x 7 = f r ∈ C . By Theorem 3.6, C is a reversible code of length 8 over R.
123
Cyclic DNA codes over F2 + uF2 + vF2 + uvF2 ...
485
3.2 The reverse-complement constraint codes In this section, cyclic codes of arbitrary length n satisfying the reverse-complement are examined. We give a useful lemma firstly which can be easily proved. Lemma 3.8 For any c ∈ R, we have c + c = u. Let a, b ∈ R, then a + b = a + b + u. Besides, if c ∈ F2 , then we have u + uc = uc. We will give one of our main conclusions below. Theorem 3.9 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R. Then C is reversible-complement if and only if C is reversible and (0, 0, · · · , 0) ∈ C , where C1 and C2 are both cyclic codes over R1 . Proof On the one hand, suppose C = vC1 ⊕(1+v)C2 , where C1 and C2 are both cyclic codes over R1 . For any c = (c0 , c1 , · · · , cn−1 ) ∈ C , cr = (cn−1 , cn−2 , · · · , c0 ) ∈ C because of C is reversible-complement. Since the zero codeword is in C , its WCC is also in C , i.e., (0, 0, · · · , 0) ∈ C . Whence, cr = (cn−1 , cn−2 , · · · , c0 ) = (cn−1 , cn−2 , · · · , c0 ) + (0, 0, · · · , 0) ∈ C . On the other hand, if C is reversible, then for any c = (c0 , c1 , · · · , cn−1 ) ∈ C , cr = (cn−1 , cn−2 , · · · , c0 ) ∈ C . Since (0, 0, · · · , 0) ∈ C , we get cr = (cn−1 , cn−2 , · · · , c0 ) = (cn−1 , cn−2 , · · · , c0 ) + (0, 0, · · · , 0) ∈ C . Therefore, C is reversible-complement.
Example 3.10 In Example 3.7, since C is reversible, if (0, 0, · · · , 0) ∈ C , we can get C is reversible-complement immediately. Let C be a cyclic code of arbitrary length n over R. Then we can get the conditions that C is reversible or reversible-complement easily by using Lemmas 2.1, 3.3, 3.4, 3.5 and Theorems 2.2, 3.6, 3.9.
4 DNA codes over R In this section, the design of linear DNA codes is presented. We obtain DNA codes over R of arbitrary length that correspond to DNA double pairs. We begin with the following definitions. Definition 4.1 Let C be a code over R of arbitrary length n and c ∈ C be a codeword where c = (c0 , c1 , · · · , cn−1 ), ci ∈ R, then, by Table 1, we define
123
486
S. Zhu, X. Chen 2n (c) : C → S D 4
(a0 + b0 v, a1 + b1 v, · · · , an−1 + bn−1 v) → (a0 , a1 , · · · , an−1 , a0 + b0 , a1 + b1 , · · · , an−1 + bn−1 ). For instance, (c0 , c1 , c2 , c3 ) = (1, v, u, u + v) is mapped to (1, v, u, u + v) = (GATTGGTC). Definition 4.2 Let f 1 and f 2 be polynomials with deg f 1 = t1 , deg f 2 = t2 and both dividing x n − 1 over R1 . Let m = min{n − t1 , n − t2 } and f = v f 1 + (1 + v) f 2 over R. The set L( f ) is called a σ -set and is defined as L( f ) = {E 0 , E 1 , · · · , E m−1 , F0 , F1 , · · · , Fm−1 }, where E i = x i f, Fi = x i σ (h), 0 ≤ i ≤ m − 1, h = vx t2 −t1 f 1 + (1 + v) f 2 if t2 ≥ t1 , h = v f 1 + (1 + v)x t1 −t2 f 2 otherwise. L( f ) generates a linear code C over R denoted by C = f σ . Remark 2 In this paper, the notation L( f ) or f σ denotes the R-module generated by the set L( f ). The notation ( f ) stands for the ideal generated by f . Let f = a0 + a1 x + a2 x 2 + · · · + at x t over R, σ (h) = b0 + b1 x + b2 x 2 + · · · + bs x s and the R-submodule generated by L( f ) can be considered to be generated by the rows of the following matrix ⎞ ⎛ E0 a0 ⎜ F0 ⎟ ⎜ ⎜ ⎟ ⎜ b0 ⎜ ⎟ 0 L( f ) = ⎜ E 1 ⎟ = ⎜ ⎜ F1 ⎟ ⎜ ⎝ ⎠ ⎝0 .. ··· . ⎛
a1 b1 a0 b0 ···
a2 · · · b2 · · · a1 · · · b1 · · · ··· ···
at · · · 0 · · · · · · · · · bs · · · · · · at · · · · · · · · · · · · · · · bs ··· ··· ··· ···
··· ··· ··· ··· ···
⎞ 0 0⎟ ⎟ 0⎟ ⎟. 0⎠ ···
Theorem 4.3 Let f 1 and f 2 be self-reciprocal polynomials dividing x n − 1 over R1 with degrees t1 and t2 , respectively. If f 1 = f 2 , then f = v f 1 + (1 + v) f 2 and |L( f )| = 16m . Besides, C = L( f ) is a linear code over R and (C ) is a reversible DNA code. Proof Most of the claims follow from the algebraic structures that are discussed before. Especially, the reverse of each DNA code given by C = L( f ) over R is shown to fall inside the codes by the following observation r
αi E i + βi Fi = σ (αi )Fm−1−i + σ (βi )E m−1−i , where αi , βi ∈ R and 0 ≤ i ≤ m − 1.
Below we give an example which illustrates the power of Theorem 4.3. Example 4.4 Let f 1 = x + 1 and f 2 = x 6 + x 3 + 1 where both divide x 9 − 1 over F2 . Hence f = v f 1 + (1 + v) f 2 = 1 + vx + (1 + v)x 3 + (1 + v)x 6 , σ (h) =
123
Cyclic DNA codes over F2 + uF2 + vF2 + uvF2 ...
487
v +vx 3 +(1+v)x 5 + x 6 . C = L( f ) is a linear code over R and (C ) is a reversible DNA code. Now we consider the generator matrix of C . ⎞ ⎛ 1 E0 ⎜ F0 ⎟ ⎜v ⎜ ⎟ ⎜ ⎜ E 1 ⎟ ⎜0 ⎜ ⎟=⎜ ⎜ F1 ⎟ ⎜0 ⎜ ⎟ ⎜ ⎝ E 2 ⎠ ⎝0 0 F2 ⎛
v 0 1 v 0 0
0 0 v 0 1 v
1+v v 0 0 v 0
0 0 1+v v 0 0
0 1+v 0 0 1+v v
1+v 1 0 1+v 0 0
0 0 1+v 1 0 1+v
⎞ 0 0 ⎟ ⎟ 0 ⎟ ⎟. 0 ⎟ ⎟ 1 + v⎠ 1
If we take α0 = 0, α1 = 1, α2 = u, β0 = 0, β1 = 1 and β2 = v, then α0 E 0 +α1 E 1 + α2 E 2 +β0 F0 +β1 F1 +β2 F2 = (1+v)x +ux 2 +uvx 3 +x 4 +(u+v+uv)x 5 +(1+v)x 6 + vx 7 +(u +v+uv)x 8 and this corresponds to the codeword c1 = (0, 1+v, u, uv, 1, u + v + uv, 1 + v, v, u + v + uv). Hence (c1 ) = (AGTAGTGATAATTGGAGG). Furthermore, σ (α0 )F2 + σ (α1 )F1 + σ (α2 )F0 + σ (β0 )E 2 + σ (β1 )E 1 + σ (β2 )E 0 = 1 + v + uv + (1 + v)x + vx 2 + (1 + v + uv)x 3 + x 4 + (u + uv)x 5 + ux 6 + vx 7 corresponds to the codeword c2 = (1 + v + uv, 1 + v, v, 1 + v + uv, 1, u + uv, u, v, 0) and thus (c2 ) = (GGAGGTTAATAGTGATGA). Therefore, ((c1 ))r = (c2 ). Corollary 4.5 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R, C1 and C2 are reversible and C = L( f ) be a linear code over R. If (0, 0, · · · , 0) ∈ C , then (C ) gives a reversible-complement DNA code. Proof It follows from Theorems 3.7, 4.3, 3.9 immediately.
Example 4.6 Let f 1 = x + 1 and f 2 = x 6 + x 5 + x 4 + x 3 + x 2 + x + 1 where both divide x 7 − 1 over F2 . Hence C = v f 1 + (1 + v) f 2 σ = 1 + x + (1 + v)x 2 + (1 + v)x 3 + (1 + v)x 4 + (1 + v)x 5 + (1 + v)x 6 σ , is a σ -linear code over R and (C ) is a reversible-complement DNA code since (0, 0, · · · , 0) ∈ C . Corollary 4.7 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R, where C1 , C2 are reversible and C = L( f ) be a linear code over R and (C ) be a reversible DNA code. If (0, 0, · · · , 0) is added to generator set L( f ), then (C ) is a reversible-complement DNA code. Theorem 4.8 Let C1 is reversible and f = v f 1 + (1 + v) f 1 over R. Then C = ( f ) is a reversible cyclic code over R and (C ) is a reversible DNA code. If x − 1 f , then (C ) is a reversible-complement DNA code. Proof Let the dimension of the code C be k. Suppose that the linear cyclic codes C has generator matrix with rows f, x f, · · · , x k−1 f . If we use the σ -set L( f ), then we observe that
r
i k−1−i αi x f = σ (αi )x f , i
i
where α ∈ R and 0 ≤ i ≤ k − 1 since f = v f 1 + (1 + v) f 2 and coefficients of f are solely from R1 which proves the reversibility in DNA. Thus, we can use the
123
488
S. Zhu, X. Chen
generator matrix of a linear cyclic code or the σ -set L( f ) since σ does not effect the coefficients. If x − 1 f , then C contains 1 + x + · · · + x n−1 . Therefore, (C ) gives a reversible-complement DNA code by Corollary 4.7.
5 The GC weight As we all know, a DNA code with the same GC weight (content) in every codeword ensures that the codewords have similar thermodynamic characteristics (i.e. melting temperature and hybridization energy). In this section, we will study the GC weight over R by the Gray image. In order to study the GC weight over R, we give the following lemmas first which can be received from [12] easily.
Lemma 5.1 Let C be a cyclic code over R1 . Then there are unique polynomials g p) and g, a, p in F2 [x], s.t. C = (g + up, ua), where a | g | x n − 1, a| gcd(g, degp < dega.
Lemma 5.2 Let C be a cyclic code over R1 . If (n, 2) = 1, then C = (g, ua) = (g + ua),
(1) If a = g, we have C = (g). It is a free-module with rank of n − degg and a set of basis is {g, xg, · · · , x n−degg−1 g}; (2) If a = g, then C is not a free-module which rank is n − dega. The minimal generating sets is {g, xg, · · · , x n−degg−1 g, ua, uxa, · · · , ux degg−dega−1 a}.
Lemma 5.3 Let C be a cyclic code over R1 . If (n, 2) = 1, then
(1) If a = gcd(g, g p), we have C = (g + up). (a) If g = a, it is a free-module with rank of n − degg and a set of basis is {g + up, x(g + up), · · · , x n−degg−1 (g + up)}; (b) If dega < degg, the minimal generating sets is g+up, x(g+up), · · · , x n−degg−1 (g+up), ua, uxa, · · · , ux degg−dega−1 a ;
(2) If a = gcd(g, g p), then C is not a free-module, where its rank is n − dega. the minimal generating sets is g + up, x(g + up), · · · , x n−degg−1 (g + up), ua, uxa, · · · , ux degg−dega−1 a . Using the lemmas above and the structure of C we have already received, we can get the following Theorems immediately. Theorem 5.4 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R, where C1 and C2 are both cyclic codes over R1 . Then C has a minimal generating sets = v + (1 + v) , where , are the minimal generating sets of C1 and C2 , respectively.
123
Cyclic DNA codes over F2 + uF2 + vF2 + uvF2 ...
489
Now we have already had the minimal generating sets of C , so we can study its Gray image. As we all know, the minimal generating sets of C is = v + (1 + v) and the Gray map from R to R12 is defined as ξ(a + bv) = (a, a + b). Then the Gray image of the minimal generating sets of C is () = x n + , where , are the minimal generating sets of C1 and C2 , respectively. If we can prove ζ is a linear transformation, then the GC weight over R is given by the Hamming weight of u() = u(x n + ). For any x = a1 + b1 v, y = a2 + b2 v ∈ C , ξ(x + y) = (a1 + a2 ) + (a1 + a2 + b1 + b2 )v = ξ(x) + ξ( y). Theorem 5.5 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R, C1 = (g1 + up1 , ua1 ), C2 = (g2 + up2 , ua2 ), where a1 | g1 | x n − 1, a2 | g2 | x n − 1, degp1 < dega1 and degp2 < dega2 . Then the GC weight over R is given by the Hamming weight enumerator = x n g1 , xg1 , · · · , x n−degg1 −1 g1 + g2 , xg2 , · · · , x n−degg2 −1 g2 . Proof The GC content is obtained by multiplying the Gray image of the minimal generating sets of C by u, and from Theorem 5.4 we have u() = ux n g1 , xg1 , · · · , x n−degg1 −1 g1 + u g2 , xg2 , · · · , x n−degg2 −1 g2 . Hence the GC content is given by the Hamming weight enumerator = x n g1 , xg1 , · · · , x n−degg1 −1 g1 + g2 , xg2 , · · · , x n−degg2 −1 g2 .
At the end of this section, we give some examples to illustrate the main work in this paper. Example 5.6 Let x 3 − 1 = (x + 1)(x 2 + x + 1) = g1 g2 ∈ F2 [x]. Let C1 = C2 = (g, ua) be a cyclic code of length 3 over R1 , where g = g2 , a = g2 . The image of C under the Gray map is a DNA code of length 6. This code has 16 codewords which are listed in the Table 2. Table 2 All 16 codewords of C AAAAAA
GGGGGG
TTTTTT
CCCCCC
A A AGGG
GGG A A A
T T T CCC
CCC T T T
A A AT T T
GGGCCC
T T T AAA
CCC GGG
A A ACCC
GGGT T T
T T T GGG
CCC A A A
123
123
859019673
572679782
286352998
859028411
30583
286344260
2290649224
2863307161
3149638314
2863267840
3149607731
2290679807
2576993484
2290640486
2576962901
3149612100
1717991287
2004309333
1718021870
1431695086
859032780
2863315899
3149633945
2863276578
2576949794
2290671069
2577006591
2290631748
2863289685
3149620838
1717978180
2004318071
1145372671
1431686348
286392319
286383581
56797
572701627
572688520
61166
858980388
858989090
8738
286331153
572657937
0
2004353023
1718013132
2004313702
1717982549
1145324612
2576971639
2290636117
2577002222
2863329006
3149598993
2863280947
3149629576
2576980377
2290657962
286379212
65535
859010935
572671044
859024042
572692889
34952
286339891
4369
Table 3 DNA correspondence of C = L( f ) explained in Example 5.7 13107
2004344285
1718026239
2004304964
1431655765
1145333350
2576958532
2290644855
3149660159
2863320268
3149603362
2863272209
2290614272
2576989115
2290653593
286387950
572714734
859002197
572684151
859015304
286366105
43690
286326784
572662306
2004348654
1718017501
1145359564
1431664503
1145328981
2576967270
2863294054
3149651421
2863333375
3149594624
2576945425
2290623010
2576976008
2290662331
859045887
572705996
859006566
572675413
17476
286374843
39321
286335522
572653568
2004339916
1431690717
1145368302
1431651396
1145337719
3149625207
2863285316
3149655790
2863324637
2290666700
2576954163
2290618641
2576984746
2863311530
859037149
572719103
858997828
286348629
26214
286361736
48059
858993459
572666675
1145307136
1431699455
1145363933
1431660134
1717986918
3149616469
2863298423
3149647052
2576997853
2290675438
2576941056
2290627379
3149642683
2863302792
859041518
572710365
52428
286357367
21845
286390474
572697258
858984721
490 S. Zhu, X. Chen
1145311505
1431647027
1145342088
1718000025
2004331178
4008627404
4294967295
3435951991
3722265668
3435965098
3722287513
4294936712
4008588083
4294906129
1145315874
1431638289
2004287488
1718008763
2004326809
4008636142
3722309358
3435943253
3722278775
3435956360
4008614297
4294945450
4008574976
4294914867
Table 3 continued
4294910498
4008579345
3435921408
3722296251
3435960729
3722274406
4008601190
4294958557
4008640511
2004322440
1431673241
1145350826
1431633920
1145320243
4294901760
3722252561
3435930146
3722283144
3435969467
4294932343
4008592452
4294962926
4008631773
3435973836
1431681979
1145346457
1431642658
1717969442
3722261299
3435925777
3722291882
4008618666
4294923605
4008605559
4294954188
3722304989
3435982574
1431668872
1145355195
2004300595
1717960704
3722248192
3435934515
4294949819
4008609928
4294927974
4008596821
3435938884
3722313727
3435978205
1431677610
1718004394
2004291857
1717973811
3722256930
4008583714
4294941081
4008623035
4294919236
3722270037
3435947622
3722300620
3435986943
2004335547
1717995656
2004296226
1717965073
Cyclic DNA codes over F2 + uF2 + vF2 + uvF2 ... 491
123
492
S. Zhu, X. Chen
In the following example, we obtain some optimal codes over R where f 1 = f 2 which satisfying the max Griesmer bound has been given by Shiromoto and Storme [15]. Example 5.7 Let f 1 = 1 + x 2 + x 4 + x 6 = f 2 be a self-reciprocal polynomial where f 1 | x 8 −1 over R. C = L( f ) is a cyclic linear code over R that attains the maximum Griesmer bound on R with parameters [2,4,8]. Also (C ) is a reversible DNA code which is not complement because (x + 1) | f 1 . We assign the DNA bases A, T , G, C to 0, 1, 2 and 3, respectively. A DNA string is converted to quaternary number system and then to the decimal system to save some space in Table 3. For instance, 859024042 represents ACACACACGGGGGGGG.
6 Conclusion Algebraic structure of codes have already acquired over the non-chain ring R with 16 elements. The DNA codes over R are studied which are obtained by using a special auto-morphism and properties of cyclic codes. We introduced these codes correspond to reversible and reversible-complement DNA codes with DNA double pairs by means of a special DNA corresponding table. Finally, the GC weight over R is studied by using the Gray image. Some examples are given to demonstrate our study is significative. Acknowledgements The authors would like to thank the referees for their helpful comments and a very meticulous reading of this manuscript.
References 1. Abualrub, T., Siap, I.: Cyclic codes over the rings Z2 + uZ2 and Z2 + uZ2 + u 2 Z2 . Des. Codes Cryptogr. 42, 273–287 (2007) 2. Adleman, L.: Molecular computation of solutions to combinatorial problem. Science 266, 1021–1024 (1994) 3. Bayram, A., Oztas, E.S., Siap, I.: Codes over F4 + vF4 and some DNA applications. Des. Codes Cryptogr. (2015). doi:10.1007/s10623-015-0100-8 4. Bayram, A., Siap, I.: Cyclic and constacyclic codes over a non-chain ring. J. Algebra Comb. Discret. Struct. Appl. 1, 1–13 (2014) 5. Bonnecaze, A., Udaya, P.: Cyclic codes and self-dual codes over F2 + uF2 . IEEE Trans. Inf. Theory. 45(4), 1250–1255 (1999) 6. Bonnecaze, A., Udaya, P.: Decoding of cyclic codes over F2 + uF2 . IEEE Trans. Inf. Theory 45(6), 2148–2156 (1999) 7. Dinh, H., López-Permouth, S.R.: Cyclic and negacyclic codes over finite chain rings. IEEE Trans. Inf. Theory 50(8), 1728–1744 (2000) 8. Guenda, K., Gulliver, T.A.: Construction of cyclic codes over F2 + uF2 for DNA computing. AAECC 24(6), 445–459 (2013) 9. Gursoy, F., Siap, I., Yildiz, B.: Construction of skew cyclic codes over Fq +vFq . Adv. Math. Commun. 8, 313–322 (2014) 10. Hammons, A.R., Kumar, P.V., Calderbank, A.R., Sloane, N.J.A., Solé, P.: The Z4 -linearity of Kerdock, Preparata, Goethals, and related codes. IEEE Trans. Inf. Theory. 40, 301–319 (1994) 11. Liang, J., Wang, L.Q.: Cyclic DNA codes over F2 + uF2 . J. Appl. Math. Comput. 51(1), 81–91 (2016) 12. Li, P., Zhu, S.X.: Cyclic codes of arbitrary lengths over Fq + uFq . J. Univ. Sci. Technol. China 38(12), 1392–1396 (2008) 13. Massey, J.L.: Reversible codes. Inf. Control 7, 369–380 (1964)
123
Cyclic DNA codes over F2 + uF2 + vF2 + uvF2 ...
493
14. Oztas, E.S., Siap, I.: Lifted polynomials over F16 and their applications to DNA codes. Filomat 27, 459–466 (2013) 15. Shiromoto, K., Storme, L.: A Griesmer bound for linear codes over finite quasi-Frobenius rings. Discret. Appl. Math. 128, 263–274 (2003) 16. Siap, I., Abualrub, T., Ghrayeb, A.: Cyclic DNA codes over the ring F2 [u]/(u 2 − 1) based on the deletion distance. J. Frankl. Inst. 346, 731–740 (2009) 17. Srinivasulu, B., Bhaintwal, M.: On linear codes over a non-chain extension of F2 + uF2 . In: Computer, Communication, Control and Information Technology (C3IT), 2015 Third International Conference on. IEEE, pp. 1–5 (2015) 18. Yildiz, B., Karadeniz, S.: Linear codes over F2 + uF2 + vF2 + uvF2 . Des. Codes Cryptogr. 54, 61–81 (2010) 19. Yildiz, B., Siap, I.: Cyclic codes over F2 [u]/(u 4 − 1) and applications to DNA codes. Comput. Math. Appl. 63, 1169–1176 (2012) 20. Zhu, S.X., Wang, L.Q.: A class of constacyclic codes over F p + vF p and its Gray image. Discret. Math. Theory 311(23), 2677–2682 (2011) 21. Zhu, S.X., Wang, Y., Shi, M.J.: Some result on cyclic codes over F2 + vF2 . IEEE Trans. Inf. Theory 56, 1680–1684 (2010)
123