Cyclic DNA codes over and their applications - Springer Link

J. Appl. Math. Comput. (2017) 55:479–493 DOI 10.1007/s12190-016-1046-3 ORIGINAL RESEARCH

Cyclic DNA codes over F2 + uF2 + v F2 + uv F2 and their applications Shixin Zhu1 · Xiaojing Chen1

Received: 29 March 2016 / Published online: 24 September 2016 © Korean Society for Computational and Applied Mathematics 2016

Abstract In this paper, we study the structure of cyclic DNA codes of arbitrary length over the ring R = F2 + uF2 + vF2 + uvF2 , u 2 = 0, v 2 = v, uv = vu. By defining a Gray map, we establish a relation between R and R12 , where R1 = F2 + uF2 is a ring with four elements. Cyclic codes of arbitrary length over R satisfying the reverse constraint and the reverse-complement constraint are studied in this paper. Furthermore, we introduce reversible codes which provide a rich source for DNA codes. The GC content constraint is also considered. We give some examples to support our study in the last. Keywords Non-chain rings · Cyclic DNA codes · Reversible cyclic codes · Reversible-complement cyclic codes · The GC content

1 Introduction Algebraic coding theory of linear codes has attracted remarkable attention for the last half of the century. Cyclic codes are important families of linear codes because of their rich algebraic structures and practical implementations. Cyclic codes over the ring R1 have been extensively studied by many authors [1,5,6,12]. The focus on constructing

This research is supported by the National Natural Science Foundation of China (No. 61370089), the Anhui Provincial Natural Science Foundation under Grant (No.1508085SQA198, 1508085MA13).

B

Xiaojing Chen [email protected] Shixin Zhu [email protected]

1

Department of Mathematics, Hefei University of Technology, Hefei 230009, Anhui, People’s Republic of China

123

480

S. Zhu, X. Chen

codes was mainly over fields, but after the study in [10], finite rings have received a great deal of attention. Most of the studies were concentrated on codes over finite chain rings [7]. However, optimal codes over non-chain rings exist (e.g see [18]). But the case over a non-chain ring structure was more complicated in [4]. In [21], the algebraic structure of cyclic codes over F2 + vF2 , where v 2 = v were studied. Zhu and Wang gave a class of constacyclic codes over F p + vF p in [20]. Recent years, the DNA codes over rings have received more and more attention. Adleman [2] pioneered the studies on DNA computing by solving an instance of NPcomplete problem over DNA molecules. DNA is a nucleic acid containing the genetic instructions used in the development and functioning of all known living organisms. It is formed by strands linked together and twisted in the shape of a double helix. Each strand is a sequence consists of four possible nucleotides, two purines, adenine (A) and guanine (G), and two pyrimidines, thymine (T ) and cytosine (C). The ends of a DNA strand are chemically polar with 5 and 3 ends, which implies that the strands are oriented. DNA has two strands that are governed by the rule called Watson Crick complement (WCC), that is, A pairs with T and G pairs with C. We denote the WCC in this paper as A = T , T = A, G = C and C = G. The pairing is done in the opposite direction and reverse order. For instance, the WCC strand of 3 − T A AGC T C − 5 is the strand 5 − G AGC T T A − 3 . Furthermore, since DNA computing can store more memory than silicon based computing systems, there are many scholars begin to study it. Siap et al. [16] constructed cyclic DNA codes considering the GC content constraint over F2 [u]/(u 2 − 1) and used the deletion distance. Guenda and Gulliver [8] studied cyclic codes over F2 [u]/(u 2 ) satisfying the reverse constraint, the reverse-complement constraint and the GC content constraint, and an infinite family of BCH DNA codes were constructed. Liang and Wang [11] studied the cyclic DNA codes over F2 +uF2 . Yildiz and Siap [19] studied DNA pairs instead of single DNA bases for the first time, where 15 elements of a ring and DNA pairs were matched and the algebraic structure of these DNA codes were studied. Later in [14], DNA pairs were matched with F16 and by introducing some special polynomials DNA codes were constructed. It is also observed that, in some cases reversible codes introduced by Massey over GF(q) were useful for constructing DNA codes in [13]. Besides, Bayram et al. [3] have considered codes over the ring F4 + vF4 . The constacyclic codes and skew constacyclic codes over the ring were studied. And they studied the structure of DNA codes over the ring and presented applications to DNA codes. Furthermore, some work have been done on DNA codes over non-chain ring recently. We do such work over the ring F2 + uF2 + vF2 + uvF2 in the paper. In this paper, we study the structure of cyclic DNA codes of arbitrary length over the ring R. The rest of the paper is organized as follows. Section 2 includes some basic background and some basic results of cyclic codes of arbitrary length over R1 . In Sect. 3, we study cyclic codes satisfying the reverse constraint and reversecomplement constraint over such ring. Furthermore, the existence and the structure of such codes are entirely determined. In Sect. 4, we study the structure of DNA codes over R and present applications to DNA codes where some examples of such codes are optimal. In Sect. 5, we use the Gray image of the minimal generating sets of C to study the GC content of C . Section 6 concludes the paper.

123

Cyclic DNA codes over F2 + uF2 + vF2 + uvF2 ...

481

2 Preliminaries Let F2 be the binary field. Throughout this paper, R denotes the commutative ring F2 + uF2 + vF2 + uvF2 with u 2 = 0, v 2 = v and uv = vu with characteristic 2. Let R1 be the finite chain ring F2 + uF2 with u 2 = 0. The following results come from Srinivasulu and Bhaintwal [17]. R is a semi-local ring with two maximal ideals namely Iu+v and I1+u+v . The quotient rings R/Iu+v and R/I1+u+v are isomorphic to F2 . A direct decomposition of R is R = Iv ⊕ I1+v . We can also see that Iv and I1+v are isomorphic to R1 . So every element c in R can uniquely be written as c = a + bv, a, b ∈ R1 . R is isomorphic to the residue ring R1 [v]/v 2 − v. Note that Iv = {av|a ∈ R1 } and I1+v = {b(1 + v)|b ∈ R1 }. An important property of codes over the ring R is the existence of a mapping ξ called the Gray map which sends linear codes over R to binary linear codes. The Gray map from R to R12 is defined as ξ(a + bv) = (a, a + b). One type of nontrivial automorphisms can be defined over R as follows : σ : F2 + uF2 + vF2 + uvF2 → F2 + uF2 + vF2 + uvF2 , a + bv → a + (1 + v)b, where a, b ∈ F2 + uF2 . Let S D4 = {A, T, C, G} represent the DNA alphabet. We use the same notation for the set

S D16 = {A A, AT, AC, AG, T T, T A, T C, T G, CC, C A, C T, C G, GG, G A, GT, GC},

which was originally presented in [14]. We define a ζ correspondence between the elements of the ring R and DNA double pairs presented explicitly in Table 1. The elements 0, 1, u, 1 + u of R1 are in one-to-one correspondence with the nucleotide DNA bases A, T , C, G, respectively, such that 0 → A, 1 → G, u → T and 1+u → C. Naturally we extend The Watson Crick complement to the elements of S D16 such that A A = T T, · · · , T G = AC. For any a ∈ R, we can define a as the complement of a, where ζ (a) = ζ (a). Obviously, a and a is one-to-one and a = a. For each c = (c0 , c1 , · · · , cn−1 ) ∈ R n , we define the reverse of c as cr = (cn−1 , cn−2 , · · · , c0 ), the complement of c as c = (c0 , c1 , · · · , cn−1 ) and the reversecomplement of c as cr = (cn−1 , cn−2 , · · · , c0 ). An n-tuple c is identified with the polynomial c(x) = c0 + c1 x + · · · + cn−1 x n−1 in the ring R[x]/(x n − 1). Furthermore, for any c(x) = c0 + c1 x + · · · + cn−1 x n−1 , we define the reverse of c(x) as cr (x) = cn−1 + cn−2 x + · · · + c0 x n−1 , the complement of c(x) as c(x) = c0 + c1 x + · · · + cn−1 x n−1 and the reverse-complement of c(x) as cr (x) = cn−1 + cn−2 x + · · · + c0 x n−1 .

123

482

S. Zhu, X. Chen

Table 1 ζ -table for DNA correspondence

Elements a

Gray images

DNA double pairs ζ (a) AA

0

(0, 0)

v

(0, 1)

AG

uv

(0, u)

AT

v + uv

(0, 1 + u)

AC

1

(1, 1)

GG

1+v

(1, 0)

GA

1 + uv

(1, u + 1)

GC

1 + v + uv

(1, u)

GT

u

(u, u)

TT

u+v

(u, 1 + u)

TC

u + uv

(u, 0)

TA

u + v + uv

(u, 1)

TG

1+u

(1 + u, 1 + u)

CC

1+u+v

(1 + u, u)

CT

1 + u + uv

(1 + u, 1)

CG

1 + u + v + uv

(1 + u, 0)

CA

A nonempty subset C of R n will be called an linear code over R of length n if C is an R-submodule of R n . C is said to be cyclic if (cn−1 , c0 , . . . , cn−2 ) ∈ C for all (c0 , c1 , . . . , cn−1 ) ∈ C. Let C be a linear code over R. The following results are presented in [9], let C1 = {x + y ∈ R1n |(x + y)v + x(v + 1) ∈ C , f or some x, y ∈ R1n }, C2 = {x ∈ R1n |(x + y)v + x(v + 1) ∈ C , f or some y ∈ R1n }. Note that C1 and C2 are linear codes over R1 . Consequently, C = vC1 ⊕ (1 + v)C2 . Lemma 2.1 [3] (1) Let C be a linear code over R such that C = vC1 ⊕ (1 + v)C2 . Then, C is a cyclic code if and only if C1 and C2 are both cyclic codes over R1 . (2) If C = vC1 ⊕ (1 + v)C2 is a cyclic code of length n over R, then C = (v f 1 , (1 + v) f 2 ), where f 1 and f 2 are generator polynomials of C1 and C2 , respectively. Remark 1 In this paper, we use f , f to represent f (x) and (x n −1)/ f (x), respectively, if don’t confuse. Recall that the Hamming weight of a codeword c is defined by w H (c) = |{i|ci = 0}|, i.e., the number of the nonzero entries of c. The minimum Hamming weight w H (c) of a code C is the smallest possible weight among all its nonzero codewords. The Hamming distance d(c1 , c2 ) between two codewords c1 and c2 is the Hamming weight of the codeword c1 − c2 . The minimum Hamming distance d(C ) of C is defined as min{d(c1 , c2 )|c1 , c2 ∈ C , c1 = c2 }.

123


483

For some prescribed minimum distance d, a code is called a DNA code if it satisfies some or all of the following conditions: (1) The Hamming constraint For any two different codewords c1 , c2 ∈ C , d(c1 , c2 ) ≥ d. (2) The reverse constraint For any two codewords c1 , c2 ∈ C , d(c1 , cr2 ) ≥ d. (3) The reverse-complement constraint For any two codewords c1 , c2 ∈ C , d(c1 , cr2 ) ≥ d. (4) The fixed GC content constraint For any codeword c ∈ C contains the same number of G and C elements. The purpose of the first three constraints is to avoid undesirable hybridization between different strands. The fixed GC content ensures that all codewords have similar thermodynamic characteristics, which allows parallel operations on DNA sequences. The structure of cyclic codes of arbitrary length n over R1 has been extensively studied in [1], which is Theorem 2.2 Let C be a cyclic code in R1,n = R1 [x]/(x n − 1). Then (1) If n is odd, then R1n is a principal ideal ring and C = (g, ua) = (g + ua), where g, a are binary polynomials with a | g | (x n − 1) mod 2. (2) If n is not odd, then (2.1) C = (g + up), where g | (x n − 1) mod 2 and (g + up) | (x n − 1) in R and g | p g . Or, (2.2) C = (g + up, ua), where g, a and p are binary polynomials with a | g | g and degp < dega. (x n − 1) mod 2, a | p

3 The reverse constraint and reverse-complement constraint codes In this section, we main study the reverse constraint and the reverse-complement constraint codes over R. We begin with the following definition. Definition 3.1 A cyclic code C of length n over R is said to be reversible if cr ∈ C for all c ∈ C , complement if c ∈ C for all c ∈ C and reversible-complement if cr ∈ C for all c ∈ C . For each polynomial f (x) = a0 + a1 x + · · · + as x s with as = 0, we define the reciprocal of f (x) to be the polynomial f ∗ (x) = as + as−1 x + · · · + a0 x s = x deg( f (x)) f (x −1 ). We note that deg f ∗ (x) ≤ deg f (x), and if a0 = 0, then f (x) and f ∗ (x) always have the same degree. f (x) is called self-reciprocal if and only if f (x) = f ∗ (x). There are many properties of polynomial reciprocal, we list two of them in the following which will be used later in the paper. Lemma 3.2 [1] Let f , g be any two polynomials in R with degg ≤ deg f . Then 1. ( f ·g)∗ = f ∗ ·g ∗ ; 2. ( f + g)∗ = f ∗ + x deg f −degg g ∗ .

123

484

S. Zhu, X. Chen

3.1 The reverse constraint codes In this part, we study the reverse constraint codes over the ring R. Now we give some important Lemmas below. Lemma 3.3 [8] Let C = (g, ua) = (g + ua) be a cyclic code of odd length n over R1 . Then C is reversible if and only if g and a are self-reciprocal. Lemma 3.4 [11] Let C = (g + up) be a cyclic code of even length n over R1 . Then C is reversible if and only if 1. g is self-reciprocal; 2. (a) x i p ∗ = p. Or (b) g = x i p ∗ + p, where i = degg − degp. g and Lemma 3.5 [11] Let C = (g + up, ua) with a | g | (x n − 1) mod 2, a | p deg p ≤ dega be a cyclic code of even length n over R1 . Then C is reversible if and only if 1. g and a are self-reciprocal; 2. a | (x i p ∗ + p), where i = degg − degp. We will give one of the main conclusions below. Theorem 3.6 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R. Then C is reversible if and only if C1 and C2 are reversible, respectively, where C1 and C2 are both cyclic codes over R1 . Proof On the one hand, if C1 and C2 are reversible, then for any b ∈ C , b = vb1 + (1 + v)b2 where b1 ∈ C1 and b2 ∈ C2 . We can easy know that br1 ∈ C1 and br2 ∈ C2 , thus br = vbr1 + (1 + v)br2 ∈ C . Hence C is reversible. On the other hand, if C is reversible, then for any b = vb1 + (1 + v)b2 ∈ C , where b1 ∈ C1 and b2 ∈ C2 . we have br = vbr1 +(1+v)br2 ∈ C . Let br = vbr1 +(1+v)br2 = ve1 + (1 + v)e2 , where e1 ∈ C1 and e2 ∈ C2 . Then v(br1 − e1 ) + (1 + v)(br2 − e2 ) = 0, thus br1 = e1 ∈ C1 and br2 = e2 ∈ C2 . Hence C1 and C2 are reversible, respectively. Example 3.7 Let x 8 − 1 = (x + 1)8 = g 8 over F2 . Let C1 = ( f 1 ) = (g1 + up1 ), g1 = g 6 , p1 = x 5 + x, C2 = ( f 2 ) = (g2 + up2 ), g2 = g 4 , p2 = x 3 + x. It is easy to check that g1 and g2 are self-reciprocal, x i p1∗ = p1 and x j p2∗ = p2 , where i = degg1 − degp1 , j = degg2 − degp2 . On the one hand, since C = ( f ) = (v(g1 + up1 ) + (1 + v)(g2 + up2 )), clearly we have f = vx 6 + uvx 5 + x 4 + (u + uv)x 3 + vx 2 + ux + 1 ∈ C , f r = vx + uvx 2 + x 3 + (u + uv)x 4 + vx 5 + ux 6 + x 7 . On the other hand, (vx + (1 + v)x 3 ) f = vx + uvx 2 + x 3 + (u + uv)x 4 + vx 5 + ux 6 + x 7 = f r ∈ C . By Theorem 3.6, C is a reversible code of length 8 over R.

123


485

3.2 The reverse-complement constraint codes In this section, cyclic codes of arbitrary length n satisfying the reverse-complement are examined. We give a useful lemma firstly which can be easily proved. Lemma 3.8 For any c ∈ R, we have c + c = u. Let a, b ∈ R, then a + b = a + b + u. Besides, if c ∈ F2 , then we have u + uc = uc. We will give one of our main conclusions below. Theorem 3.9 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R. Then C is reversible-complement if and only if C is reversible and (0, 0, · · · , 0) ∈ C , where C1 and C2 are both cyclic codes over R1 . Proof On the one hand, suppose C = vC1 ⊕(1+v)C2 , where C1 and C2 are both cyclic codes over R1 . For any c = (c0 , c1 , · · · , cn−1 ) ∈ C , cr = (cn−1 , cn−2 , · · · , c0 ) ∈ C because of C is reversible-complement. Since the zero codeword is in C , its WCC is also in C , i.e., (0, 0, · · · , 0) ∈ C . Whence, cr = (cn−1 , cn−2 , · · · , c0 ) = (cn−1 , cn−2 , · · · , c0 ) + (0, 0, · · · , 0) ∈ C . On the other hand, if C is reversible, then for any c = (c0 , c1 , · · · , cn−1 ) ∈ C , cr = (cn−1 , cn−2 , · · · , c0 ) ∈ C . Since (0, 0, · · · , 0) ∈ C , we get cr = (cn−1 , cn−2 , · · · , c0 ) = (cn−1 , cn−2 , · · · , c0 ) + (0, 0, · · · , 0) ∈ C . Therefore, C is reversible-complement.

Example 3.10 In Example 3.7, since C is reversible, if (0, 0, · · · , 0) ∈ C , we can get C is reversible-complement immediately. Let C be a cyclic code of arbitrary length n over R. Then we can get the conditions that C is reversible or reversible-complement easily by using Lemmas 2.1, 3.3, 3.4, 3.5 and Theorems 2.2, 3.6, 3.9.

4 DNA codes over R In this section, the design of linear DNA codes is presented. We obtain DNA codes over R of arbitrary length that correspond to DNA double pairs. We begin with the following definitions. Definition 4.1 Let C be a code over R of arbitrary length n and c ∈ C be a codeword where c = (c0 , c1 , · · · , cn−1 ), ci ∈ R, then, by Table 1, we define

123

486

S. Zhu, X. Chen 2n (c) : C → S D 4

(a0 + b0 v, a1 + b1 v, · · · , an−1 + bn−1 v) → (a0 , a1 , · · · , an−1 , a0 + b0 , a1 + b1 , · · · , an−1 + bn−1 ). For instance, (c0 , c1 , c2 , c3 ) = (1, v, u, u + v) is mapped to (1, v, u, u + v) = (GATTGGTC). Definition 4.2 Let f 1 and f 2 be polynomials with deg f 1 = t1 , deg f 2 = t2 and both dividing x n − 1 over R1 . Let m = min{n − t1 , n − t2 } and f = v f 1 + (1 + v) f 2 over R. The set L( f ) is called a σ -set and is defined as L( f ) = {E 0 , E 1 , · · · , E m−1 , F0 , F1 , · · · , Fm−1 }, where E i = x i f, Fi = x i σ (h), 0 ≤ i ≤ m − 1, h = vx t2 −t1 f 1 + (1 + v) f 2 if t2 ≥ t1 , h = v f 1 + (1 + v)x t1 −t2 f 2 otherwise. L( f ) generates a linear code C over R denoted by C = f σ . Remark 2 In this paper, the notation L( f ) or f σ denotes the R-module generated by the set L( f ). The notation ( f ) stands for the ideal generated by f . Let f = a0 + a1 x + a2 x 2 + · · · + at x t over R, σ (h) = b0 + b1 x + b2 x 2 + · · · + bs x s and the R-submodule generated by L( f ) can be considered to be generated by the rows of the following matrix ⎞ ⎛ E0 a0 ⎜ F0 ⎟ ⎜ ⎜ ⎟ ⎜ b0 ⎜ ⎟ 0 L( f ) = ⎜ E 1 ⎟ = ⎜ ⎜ F1 ⎟ ⎜ ⎝ ⎠ ⎝0 .. ··· . ⎛

a1 b1 a0 b0 ···

a2 · · · b2 · · · a1 · · · b1 · · · ··· ···

at · · · 0 · · · · · · · · · bs · · · · · · at · · · · · · · · · · · · · · · bs ··· ··· ··· ···

··· ··· ··· ··· ···

⎞ 0 0⎟ ⎟ 0⎟ ⎟. 0⎠ ···

Theorem 4.3 Let f 1 and f 2 be self-reciprocal polynomials dividing x n − 1 over R1 with degrees t1 and t2 , respectively. If f 1 = f 2 , then f = v f 1 + (1 + v) f 2 and |L( f )| = 16m . Besides, C = L( f ) is a linear code over R and (C ) is a reversible DNA code. Proof Most of the claims follow from the algebraic structures that are discussed before. Especially, the reverse of each DNA code given by C = L( f ) over R is shown to fall inside the codes by the following observation r

αi E i + βi Fi = σ (αi )Fm−1−i + σ (βi )E m−1−i , where αi , βi ∈ R and 0 ≤ i ≤ m − 1.

Below we give an example which illustrates the power of Theorem 4.3. Example 4.4 Let f 1 = x + 1 and f 2 = x 6 + x 3 + 1 where both divide x 9 − 1 over F2 . Hence f = v f 1 + (1 + v) f 2 = 1 + vx + (1 + v)x 3 + (1 + v)x 6 , σ (h) =

123


487

v +vx 3 +(1+v)x 5 + x 6 . C = L( f ) is a linear code over R and (C ) is a reversible DNA code. Now we consider the generator matrix of C . ⎞ ⎛ 1 E0 ⎜ F0 ⎟ ⎜v ⎜ ⎟ ⎜ ⎜ E 1 ⎟ ⎜0 ⎜ ⎟=⎜ ⎜ F1 ⎟ ⎜0 ⎜ ⎟ ⎜ ⎝ E 2 ⎠ ⎝0 0 F2 ⎛

v 0 1 v 0 0

0 0 v 0 1 v

1+v v 0 0 v 0

0 0 1+v v 0 0

0 1+v 0 0 1+v v

1+v 1 0 1+v 0 0

0 0 1+v 1 0 1+v

⎞ 0 0 ⎟ ⎟ 0 ⎟ ⎟. 0 ⎟ ⎟ 1 + v⎠ 1

If we take α0 = 0, α1 = 1, α2 = u, β0 = 0, β1 = 1 and β2 = v, then α0 E 0 +α1 E 1 + α2 E 2 +β0 F0 +β1 F1 +β2 F2 = (1+v)x +ux 2 +uvx 3 +x 4 +(u+v+uv)x 5 +(1+v)x 6 + vx 7 +(u +v+uv)x 8 and this corresponds to the codeword c1 = (0, 1+v, u, uv, 1, u + v + uv, 1 + v, v, u + v + uv). Hence (c1 ) = (AGTAGTGATAATTGGAGG). Furthermore, σ (α0 )F2 + σ (α1 )F1 + σ (α2 )F0 + σ (β0 )E 2 + σ (β1 )E 1 + σ (β2 )E 0 = 1 + v + uv + (1 + v)x + vx 2 + (1 + v + uv)x 3 + x 4 + (u + uv)x 5 + ux 6 + vx 7 corresponds to the codeword c2 = (1 + v + uv, 1 + v, v, 1 + v + uv, 1, u + uv, u, v, 0) and thus (c2 ) = (GGAGGTTAATAGTGATGA). Therefore, ((c1 ))r = (c2 ). Corollary 4.5 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R, C1 and C2 are reversible and C = L( f ) be a linear code over R. If (0, 0, · · · , 0) ∈ C , then (C ) gives a reversible-complement DNA code. Proof It follows from Theorems 3.7, 4.3, 3.9 immediately.

Example 4.6 Let f 1 = x + 1 and f 2 = x 6 + x 5 + x 4 + x 3 + x 2 + x + 1 where both divide x 7 − 1 over F2 . Hence C = v f 1 + (1 + v) f 2 σ = 1 + x + (1 + v)x 2 + (1 + v)x 3 + (1 + v)x 4 + (1 + v)x 5 + (1 + v)x 6 σ , is a σ -linear code over R and (C ) is a reversible-complement DNA code since (0, 0, · · · , 0) ∈ C . Corollary 4.7 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R, where C1 , C2 are reversible and C = L( f ) be a linear code over R and (C ) be a reversible DNA code. If (0, 0, · · · , 0) is added to generator set L( f ), then (C ) is a reversible-complement DNA code. Theorem 4.8 Let C1 is reversible and f = v f 1 + (1 + v) f 1 over R. Then C = ( f ) is a reversible cyclic code over R and (C ) is a reversible DNA code. If x − 1 f , then (C ) is a reversible-complement DNA code. Proof Let the dimension of the code C be k. Suppose that the linear cyclic codes C has generator matrix with rows f, x f, · · · , x k−1 f . If we use the σ -set L( f ), then we observe that

r

i k−1−i αi x f = σ (αi )x f , i

i

where α ∈ R and 0 ≤ i ≤ k − 1 since f = v f 1 + (1 + v) f 2 and coefficients of f are solely from R1 which proves the reversibility in DNA. Thus, we can use the

123

488

S. Zhu, X. Chen

generator matrix of a linear cyclic code or the σ -set L( f ) since σ does not effect the coefficients. If x − 1 f , then C contains 1 + x + · · · + x n−1 . Therefore, (C ) gives a reversible-complement DNA code by Corollary 4.7.

5 The GC weight As we all know, a DNA code with the same GC weight (content) in every codeword ensures that the codewords have similar thermodynamic characteristics (i.e. melting temperature and hybridization energy). In this section, we will study the GC weight over R by the Gray image. In order to study the GC weight over R, we give the following lemmas first which can be received from [12] easily.

Lemma 5.1 Let C be a cyclic code over R1 . Then there are unique polynomials g p) and g, a, p in F2 [x], s.t. C = (g + up, ua), where a | g | x n − 1, a| gcd(g, degp < dega.

Lemma 5.2 Let C be a cyclic code over R1 . If (n, 2) = 1, then C = (g, ua) = (g + ua),

(1) If a = g, we have C = (g). It is a free-module with rank of n − degg and a set of basis is {g, xg, · · · , x n−degg−1 g}; (2) If a = g, then C is not a free-module which rank is n − dega. The minimal generating sets is {g, xg, · · · , x n−degg−1 g, ua, uxa, · · · , ux degg−dega−1 a}.

Lemma 5.3 Let C be a cyclic code over R1 . If (n, 2) = 1, then

(1) If a = gcd(g, g p), we have C = (g + up). (a) If g = a, it is a free-module with rank of n − degg and a set of basis is {g + up, x(g + up), · · · , x n−degg−1 (g + up)}; (b) If dega < degg, the minimal generating sets is g+up, x(g+up), · · · , x n−degg−1 (g+up), ua, uxa, · · · , ux degg−dega−1 a ;

(2) If a = gcd(g, g p), then C is not a free-module, where its rank is n − dega. the minimal generating sets is g + up, x(g + up), · · · , x n−degg−1 (g + up), ua, uxa, · · · , ux degg−dega−1 a . Using the lemmas above and the structure of C we have already received, we can get the following Theorems immediately. Theorem 5.4 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R, where C1 and C2 are both cyclic codes over R1 . Then C has a minimal generating sets = v + (1 + v) , where , are the minimal generating sets of C1 and C2 , respectively.

123


489

Now we have already had the minimal generating sets of C , so we can study its Gray image. As we all know, the minimal generating sets of C is = v + (1 + v) and the Gray map from R to R12 is defined as ξ(a + bv) = (a, a + b). Then the Gray image of the minimal generating sets of C is () = x n + , where , are the minimal generating sets of C1 and C2 , respectively. If we can prove ζ is a linear transformation, then the GC weight over R is given by the Hamming weight of u() = u(x n + ). For any x = a1 + b1 v, y = a2 + b2 v ∈ C , ξ(x + y) = (a1 + a2 ) + (a1 + a2 + b1 + b2 )v = ξ(x) + ξ( y). Theorem 5.5 Let C = vC1 ⊕ (1 + v)C2 be a cyclic code of arbitrary length n over R, C1 = (g1 + up1 , ua1 ), C2 = (g2 + up2 , ua2 ), where a1 | g1 | x n − 1, a2 | g2 | x n − 1, degp1 < dega1 and degp2 < dega2 . Then the GC weight over R is given by the Hamming weight enumerator = x n g1 , xg1 , · · · , x n−degg1 −1 g1 + g2 , xg2 , · · · , x n−degg2 −1 g2 . Proof The GC content is obtained by multiplying the Gray image of the minimal generating sets of C by u, and from Theorem 5.4 we have u() = ux n g1 , xg1 , · · · , x n−degg1 −1 g1 + u g2 , xg2 , · · · , x n−degg2 −1 g2 . Hence the GC content is given by the Hamming weight enumerator = x n g1 , xg1 , · · · , x n−degg1 −1 g1 + g2 , xg2 , · · · , x n−degg2 −1 g2 .

At the end of this section, we give some examples to illustrate the main work in this paper. Example 5.6 Let x 3 − 1 = (x + 1)(x 2 + x + 1) = g1 g2 ∈ F2 [x]. Let C1 = C2 = (g, ua) be a cyclic code of length 3 over R1 , where g = g2 , a = g2 . The image of C under the Gray map is a DNA code of length 6. This code has 16 codewords which are listed in the Table 2. Table 2 All 16 codewords of C AAAAAA

GGGGGG

TTTTTT

CCCCCC

A A AGGG

GGG A A A

T T T CCC

CCC T T T

A A AT T T

GGGCCC

T T T AAA

CCC GGG

A A ACCC

GGGT T T

T T T GGG

CCC A A A

123

123

859019673

572679782

286352998

859028411

30583

286344260

2290649224

2863307161

3149638314

2863267840

3149607731

2290679807

2576993484

2290640486

2576962901

3149612100

1717991287

2004309333

1718021870

1431695086

859032780

2863315899

3149633945

2863276578

2576949794

2290671069

2577006591

2290631748

2863289685

3149620838

1717978180

2004318071

1145372671

1431686348

286392319

286383581

56797

572701627

572688520

61166

858980388

858989090

8738

286331153

572657937

0

2004353023

1718013132

2004313702

1717982549

1145324612

2576971639

2290636117

2577002222

2863329006

3149598993

2863280947

3149629576

2576980377

2290657962

286379212

65535

859010935

572671044

859024042

572692889

34952

286339891

4369

Table 3 DNA correspondence of C = L( f ) explained in Example 5.7 13107

2004344285

1718026239

2004304964

1431655765

1145333350

2576958532

2290644855

3149660159

2863320268

3149603362

2863272209

2290614272

2576989115

2290653593

286387950

572714734

859002197

572684151

859015304

286366105

43690

286326784

572662306

2004348654

1718017501

1145359564

1431664503

1145328981

2576967270

2863294054

3149651421

2863333375

3149594624

2576945425

2290623010

2576976008

2290662331

859045887

572705996

859006566

572675413

17476

286374843

39321

286335522

572653568

2004339916

1431690717

1145368302

1431651396

1145337719

3149625207

2863285316

3149655790

2863324637

2290666700

2576954163

2290618641

2576984746

2863311530

859037149

572719103

858997828

286348629

26214

286361736

48059

858993459

572666675

1145307136

1431699455

1145363933

1431660134

1717986918

3149616469

2863298423

3149647052

2576997853

2290675438

2576941056

2290627379

3149642683

2863302792

859041518

572710365

52428

286357367

21845

286390474

572697258

858984721

490 S. Zhu, X. Chen

1145311505

1431647027

1145342088

1718000025

2004331178

4008627404

4294967295

3435951991

3722265668

3435965098

3722287513

4294936712

4008588083

4294906129

1145315874

1431638289

2004287488

1718008763

2004326809

4008636142

3722309358

3435943253

3722278775

3435956360

4008614297

4294945450

4008574976

4294914867

Table 3 continued

4294910498

4008579345

3435921408

3722296251

3435960729

3722274406

4008601190

4294958557

4008640511

2004322440

1431673241

1145350826

1431633920

1145320243

4294901760

3722252561

3435930146

3722283144

3435969467

4294932343

4008592452

4294962926

4008631773

3435973836

1431681979

1145346457

1431642658

1717969442

3722261299

3435925777

3722291882

4008618666

4294923605

4008605559

4294954188

3722304989

3435982574

1431668872

1145355195

2004300595

1717960704

3722248192

3435934515

4294949819

4008609928

4294927974

4008596821

3435938884

3722313727

3435978205

1431677610

1718004394

2004291857

1717973811

3722256930

4008583714

4294941081

4008623035

4294919236

3722270037

3435947622

3722300620

3435986943

2004335547

1717995656

2004296226

1717965073

Cyclic DNA codes over F2 + uF2 + vF2 + uvF2 ... 491

123

492

S. Zhu, X. Chen

In the following example, we obtain some optimal codes over R where f 1 = f 2 which satisfying the max Griesmer bound has been given by Shiromoto and Storme [15]. Example 5.7 Let f 1 = 1 + x 2 + x 4 + x 6 = f 2 be a self-reciprocal polynomial where f 1 | x 8 −1 over R. C = L( f ) is a cyclic linear code over R that attains the maximum Griesmer bound on R with parameters [2,4,8]. Also (C ) is a reversible DNA code which is not complement because (x + 1) | f 1 . We assign the DNA bases A, T , G, C to 0, 1, 2 and 3, respectively. A DNA string is converted to quaternary number system and then to the decimal system to save some space in Table 3. For instance, 859024042 represents ACACACACGGGGGGGG.

6 Conclusion Algebraic structure of codes have already acquired over the non-chain ring R with 16 elements. The DNA codes over R are studied which are obtained by using a special auto-morphism and properties of cyclic codes. We introduced these codes correspond to reversible and reversible-complement DNA codes with DNA double pairs by means of a special DNA corresponding table. Finally, the GC weight over R is studied by using the Gray image. Some examples are given to demonstrate our study is significative. Acknowledgements The authors would like to thank the referees for their helpful comments and a very meticulous reading of this manuscript.

References 1. Abualrub, T., Siap, I.: Cyclic codes over the rings Z2 + uZ2 and Z2 + uZ2 + u 2 Z2 . Des. Codes Cryptogr. 42, 273–287 (2007) 2. Adleman, L.: Molecular computation of solutions to combinatorial problem. Science 266, 1021–1024 (1994) 3. Bayram, A., Oztas, E.S., Siap, I.: Codes over F4 + vF4 and some DNA applications. Des. Codes Cryptogr. (2015). doi:10.1007/s10623-015-0100-8 4. Bayram, A., Siap, I.: Cyclic and constacyclic codes over a non-chain ring. J. Algebra Comb. Discret. Struct. Appl. 1, 1–13 (2014) 5. Bonnecaze, A., Udaya, P.: Cyclic codes and self-dual codes over F2 + uF2 . IEEE Trans. Inf. Theory. 45(4), 1250–1255 (1999) 6. Bonnecaze, A., Udaya, P.: Decoding of cyclic codes over F2 + uF2 . IEEE Trans. Inf. Theory 45(6), 2148–2156 (1999) 7. Dinh, H., López-Permouth, S.R.: Cyclic and negacyclic codes over finite chain rings. IEEE Trans. Inf. Theory 50(8), 1728–1744 (2000) 8. Guenda, K., Gulliver, T.A.: Construction of cyclic codes over F2 + uF2 for DNA computing. AAECC 24(6), 445–459 (2013) 9. Gursoy, F., Siap, I., Yildiz, B.: Construction of skew cyclic codes over Fq +vFq . Adv. Math. Commun. 8, 313–322 (2014) 10. Hammons, A.R., Kumar, P.V., Calderbank, A.R., Sloane, N.J.A., Solé, P.: The Z4 -linearity of Kerdock, Preparata, Goethals, and related codes. IEEE Trans. Inf. Theory. 40, 301–319 (1994) 11. Liang, J., Wang, L.Q.: Cyclic DNA codes over F2 + uF2 . J. Appl. Math. Comput. 51(1), 81–91 (2016) 12. Li, P., Zhu, S.X.: Cyclic codes of arbitrary lengths over Fq + uFq . J. Univ. Sci. Technol. China 38(12), 1392–1396 (2008) 13. Massey, J.L.: Reversible codes. Inf. Control 7, 369–380 (1964)

123


493

14. Oztas, E.S., Siap, I.: Lifted polynomials over F16 and their applications to DNA codes. Filomat 27, 459–466 (2013) 15. Shiromoto, K., Storme, L.: A Griesmer bound for linear codes over finite quasi-Frobenius rings. Discret. Appl. Math. 128, 263–274 (2003) 16. Siap, I., Abualrub, T., Ghrayeb, A.: Cyclic DNA codes over the ring F2 [u]/(u 2 − 1) based on the deletion distance. J. Frankl. Inst. 346, 731–740 (2009) 17. Srinivasulu, B., Bhaintwal, M.: On linear codes over a non-chain extension of F2 + uF2 . In: Computer, Communication, Control and Information Technology (C3IT), 2015 Third International Conference on. IEEE, pp. 1–5 (2015) 18. Yildiz, B., Karadeniz, S.: Linear codes over F2 + uF2 + vF2 + uvF2 . Des. Codes Cryptogr. 54, 61–81 (2010) 19. Yildiz, B., Siap, I.: Cyclic codes over F2 [u]/(u 4 − 1) and applications to DNA codes. Comput. Math. Appl. 63, 1169–1176 (2012) 20. Zhu, S.X., Wang, L.Q.: A class of constacyclic codes over F p + vF p and its Gray image. Discret. Math. Theory 311(23), 2677–2682 (2011) 21. Zhu, S.X., Wang, Y., Shi, M.J.: Some result on cyclic codes over F2 + vF2 . IEEE Trans. Inf. Theory 56, 1680–1684 (2010)

123