Genome Informatics 11: 313–314 (2000)


Correlation between Sequence Conservation of 5’ UTR and Codon Usage Bias

Hiroaki Sakai 1,2, Yusuke Ohkuma 1,3, Chiaki Imamura 1,4, Akira Shinagawa 5, Piero Carninci 5, Yoshifumi Fukunishi 5, Masayoshi Itoh 5, Hideaki Konno 5, Yoshihide Hayashizaki 5, Kazuhiro Shibata 5, Jun Kawai 5, Masaru Tomita 1,3

1 Laboratory for Bioinformatics
2 Graduate School of Media and Governance
3 Department of Environmental Information
4 Department of Policy Management
Keio University, 5322 Endo, Fujisawa, 252-8500, Japan
5 Genome Exploration Research Group, Genomic Sciences Center (GSC), RIKEN (The Institute of Physical and Chemical Research), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan

Keywords: codon usage bias, Kozak’s consensus sequence, translation efficiency

1 Introduction

Codons encoding the same amino acid are often used with biased frequencies, and such codon usage bias is generally believed to result from selection for efficient translation [1]. We have previously shown that the synonymous codon usage bias correlates with the extent of conservation of the Shine-Dalgarno sequence (SD sequence) in prokaryote genomes [3]. In this study, we conducted a similar analysis for Mus musculus, using the cDNA library data recently generated by RIKEN. Eukaryotes have no SD-like motif, but the sequence pattern around the start codon, GCC(A/G)CCatgG, is well known as Kozak’s consensus sequence in vertebrates [2]. To analyze the correlation between synonymous codon usage bias and the extent of conservation in the 5’ untranslated region (5’UTR), we first systematically computed the ‘codon adaptation index (CAI)’ values of all genes in the Mus musculus cDNA library data, and then analyzed their correlation with the 5’UTR conservation.

2 Methods

The CAI is an index that evaluates the synonymous codon usage bias of a gene relative to the codon usage of highly expressed genes. We computed it following the method of Sharp and Li [5]. To evaluate the extent of 5’UTR conservation, we computed the ‘relative entropy’ value at each nucleotide position of Kozak’s consensus sequence. Relative entropy is an index that quantifies the sequence pattern at a nucleotide position. We computed it following the method of Schneider et al. [4].
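As an illustration of the CAI computation described above, the following is a minimal Python sketch, not the code used in this study. It assumes the standard genetic code and a user-supplied reference set of highly expressed coding sequences. The function names (`relative_adaptiveness`, `cai`) and the `pseudocount` used for codons absent from the reference set are assumptions made for this example. Each codon receives a relative adaptiveness w (its count in the reference set divided by the count of the most frequent synonymous codon), and the CAI of a gene is the geometric mean of w over its codons, skipping Met, Trp and stop codons.

```python
import math
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codons(cds):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_cds, pseudocount=0.5):
    """Compute w for each codon from a reference set of highly expressed genes.

    `pseudocount` replaces a zero count so that w is never 0; the value is an
    assumption of this sketch.
    """
    counts = Counter(c for seq in reference_cds for c in codons(seq)
                     if c in CODON_TABLE)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    w = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:  # skip stop codons, Met and Trp
            continue
        adjusted = {c: counts[c] if counts[c] > 0 else pseudocount for c in family}
        top = max(adjusted.values())
        for c in family:
            w[c] = adjusted[c] / top
    return w


def cai(cds, w):
    """Geometric mean of w over the codons of `cds` that have a w value."""
    logs = [math.log(w[c]) for c in codons(cds) if c in w]
    if not logs:
        raise ValueError("no informative codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy usage with a made-up one-gene reference set.
w = relative_adaptiveness(["ATGGCCGCCGCCGCT"])
score = cai("ATGGCCGCT", w)
```

In practice the reference set would be a curated list of highly expressed genes for the organism; the choice of that set strongly affects the resulting CAI values.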

3 Results

Figure 1 shows the correlation between the synonymous codon usage bias and the relative entropy at three nucleotide positions (−3, −1 and 4 relative to the start codon). At positions −1 and −3, a significant correlation is observed, indicating that genes with higher CAI values have more highly conserved nucleotides than genes with lower CAI values. We also analyzed the base composition at each position (Figure 2) and found that the frequencies of specific nucleotides are higher for genes with higher CAI values: purines at position −3, C at position −1, and G at position 4, which is consistent with the base composition of Kozak’s consensus sequence. These results indicate that there is a clear correlation between synonymous codon usage bias (CAI values) and conservation of Kozak’s consensus sequence (relative entropy values) in the M. musculus genome, presumably because highly expressed genes need both efficient codon usage and efficient translation initiation.
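The analysis behind Figures 1 and 2 can be sketched roughly as follows: rank the genes by CAI, split the ranking into groups, and compute the relative entropy of the bases found at one position in each group. This is only an illustrative Python sketch under stated assumptions. The relative entropy here is the Kullback-Leibler form against a uniform background, without the small-sample correction of Schneider et al. [4]. The grouping scheme (`n_bins`), the input layout (CAI value, cDNA sequence, index of the A of the start codon) and all function names are hypothetical. Positions follow the usual convention in which the A of ATG is +1 and the preceding base is −1.

```python
import math
from collections import Counter


def relative_entropy(column, background=None):
    """Relative entropy (bits) of the bases in `column` against a background.

    With the default uniform background this equals 2 - H, where H is the
    Shannon entropy of the observed base frequencies.
    """
    background = background or {b: 0.25 for b in "ACGT"}
    counts = Counter(b for b in column.upper() if b in background)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid bases in column")
    return sum((n / total) * math.log2((n / total) / background[b])
               for b, n in counts.items())


def base_at(cdna, atg_index, position):
    """Base at `position` relative to the start codon (A of ATG = +1, no position 0)."""
    offset = position - 1 if position > 0 else position
    i = atg_index + offset
    return cdna[i] if 0 <= i < len(cdna) else None


def entropy_by_cai_rank(genes, position, n_bins=3):
    """Relative entropy at `position` for each CAI-ranked group, highest CAI first.

    `genes` is a list of (cai_value, cdna_sequence, atg_index) tuples.
    """
    if not genes:
        raise ValueError("no genes given")
    ranked = sorted(genes, key=lambda g: g[0], reverse=True)
    size = math.ceil(len(ranked) / n_bins)
    result = []
    for start in range(0, len(ranked), size):
        bases = [base_at(seq, atg, position)
                 for _, seq, atg in ranked[start:start + size]]
        result.append(relative_entropy("".join(b for b in bases if b)))
    return result


# Toy usage with made-up sequences (start codon at index 6 in each).
genes = [(0.9, "GCCACCATGG", 6), (0.8, "GCCACCATGG", 6),
         (0.3, "TTTTTTATGA", 6), (0.2, "GGGCGGATGC", 6)]
per_group = entropy_by_cai_rank(genes, position=-3, n_bins=2)
```

A fully conserved position gives 2 bits and a uniformly random one gives 0, so a downward trend across the groups corresponds to weaker conservation in genes with lower CAI values.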

[Figure 1 plots: three panels (Position −3 bp, Position −1 bp, Position 4 bp). Y axis: relative entropy × 10^-3, from 0 to 450. X axis: ranking of genes, from 0 to 30.]

Figure 1: Relative entropy of three nucleotide positions (−3, −1 and 4 relative to the start codon). The X axis represents the ranking of CAI values of genes.

[Figure 2 plots: three panels (Position −3 bp, Position −1 bp, Position 4 bp), each with four series: adenine, thymine, guanine, cytosine. Y axis: frequency × 10^-3, from 0 to 650. X axis: ranking of genes, from 0 to 30.]

Figure 2: Frequencies of four bases at the three nucleotide positions (−3, −1 and 4). The X axis represents the ranking of CAI values of genes.

References

[1] Gouy, M. and Gautier, C., Codon usage in bacteria: correlation with gene expressivity, Nucleic Acids Res., 10:7055–7074, 1982.
[2] Kozak, M., An analysis of 5’-noncoding sequences from 699 vertebrate messenger RNAs, Nucleic Acids Res., 15:8125–8148, 1987.
[3] Sakai, H. et al., Correlation between Shine-Dalgarno sequence conservation and codon usage of bacterial genes, J. Mol. Evol., in press.
[4] Schneider, T. et al., Information content of binding sites on nucleotide sequences, J. Mol. Biol., 188:415–431, 1986.
[5] Sharp, P.M. and Li, W.H., The codon adaptation index – a measure of directional synonymous codon usage bias, and its potential applications, Nucleic Acids Res., 15:1281–1295, 1987.
