Challenges and obstacles related to solving the codon bias riddles

Translation UK 2013

Tamir Tuller*†1
*Department of Biomedical Engineering, Tel Aviv University, Tel Aviv 69978, Israel
†The Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 69978, Israel

Abstract Dozens of papers have been written about the relationship between codon bias, transcript features and gene translation. Although the underlying questions may sound straightforward, many of these studies appear to contradict each other. In the present article, I provide four major non-mutually exclusive explanations for this issue: (i) there are dozens of related relevant variables with unknown causal relationships; (ii) the relevant experimental data contain various biases; (iii) conclusions are drawn from specific examples; and (iv) it is challenging to experimentally modify one biological variable without affecting the system via multiple biological feedback mechanisms. Some of the contradictions can be settled by considering these four points and/or via a multidisciplinary approach. The discussion reported in the present article is also relevant to many other biological and medical questions and fields.

Introduction Codon bias is defined as the non-uniform distribution of codons and codon groups in different genes and organisms. Dozens of studies in recent years have been aimed at understanding the evolution and functionality of codon bias. Earlier studies mainly suggested that codon bias affects translation elongation rates via adaptation to the organism’s tRNA pool (for examples, see [1,2]): codon frequencies usually correlate with tRNA copy numbers, and thus genes with more frequent codons are assumed to be translated more efficiently; hence highly expressed genes are under selection for codons that are more adapted to the tRNA pool and are usually more frequent. However, more recent studies have demonstrated that the relationship between codon bias and the organism’s fitness is much more complicated than previously thought [3–5]. Specifically, the redundancy of the genetic code enables evolution to shape the codon distributions in various parts of the ORF so as to optimize it for the cellular machinery that processes it for expression. Thus it was suggested, among other hypotheses, that there is selection for codons less adapted to the tRNA pool at the beginning of the ORF to improve ribosomal allocation [6], that weak mRNA folding at the beginning of the ORF followed by stronger folding is selected for enhanced ribosomal initiation [7–10] and that there is selection for strong folding of highly expressed genes, possibly to prevent aggregation of mRNA molecules [11]. Furthermore, additional aspects of translation initiation are believed to be encoded in the UTRs or in regions that overlap both the ORFs and the UTRs (for examples, see [12–14]). Recently, it was demonstrated that non-optimality of codons

Key words: biological model, codon bias, gene expression, gene translation, systems biology. Abbreviations: tAI, tRNA adaptation index. 1 email [email protected]

Biochem. Soc. Trans. (2014) 42, 155–159; doi:10.1042/BST20130095

is a mechanism to achieve circadian rhythm in some genes [15,16]. Surprisingly, there remain basic open questions related to codon bias and translation elongation, with contradictory conclusions in different studies. In the present article I mention two central examples of such questions. The first is related to the relationship between codon bias, elongation speed and the tRNA pool; whereas some studies have suggested that codons recognized by more abundant tRNA molecules are translated more efficiently (for examples, see [17,18]), others have suggested that elongation speed and/or ribosomal density is not significantly affected by the tRNA pool (for examples, see [19,20]). The second is related to the rate-limiting step of translation; some studies have suggested that it is only the initiation step, which is encoded in the 5′-end of the transcript (for examples, see [10,21]), whereas others have suggested that it may also be the elongation step, which is affected by codon bias (for examples, see [8,22]). I decided to write the present short article following repeated questions from my colleagues regarding why different papers in the field tend to contradict each other. Thus the aim of the present article is not to support or reject any of the studies mentioned above, but only to supply explanations for the (sometimes only seemingly) contradicting conclusions, emphasize the challenges in the field, suggest some solutions or general approaches, and promote open-mindedness and flexibility of thought in the field.

Four major explanations of the contradictions in transcript features: translation studies In the present article I depict four major possible explanations for the contradicting conclusions in the field. The examples

© The Authors Journal compilation © 2014 Biochemical Society


Biochemical Society Transactions (2014) Volume 42, part 1

Figure 1 Illustrations of the four explanations for the contradictions and inaccurate conclusions in studies of the relationship between transcript features and gene translation
(A) A hypothetical example. If positively charged amino acids slow down translation elongation and tend to be encoded by codons that are recognized by rare tRNA molecules, there should be a correlation between adaptation to the tRNA pool and elongation speed even though the relationship is not causal.
(B) Typical output of an RNA-seq experiment for a single transcript. The Figure shows the number of reads mapped to each position of the transcript. As can be seen, although ‘mathematically’ (if there were no noise or bias) the distribution of reads should be uniform, in practice it is far from that; there are positions with many reads and positions with no reads. The non-uniform distribution is due to various biochemical aspects of the experiment and features of the transcripts.
(C) A hypothetical example where drawing conclusions from a specific example can be misleading. Upper graph: the codons of a GFP protein with weak mRNA structure (efficient translation initiation) at the beginning of the ORF are randomized and demonstrate a significant relationship between adaptation to the tRNA pool (tAI; [2]) and protein levels. Lower graph: the codons of a GFP protein with strong mRNA structure (non-efficient translation initiation) at the beginning of the ORF are randomized; in this case, the folding at the beginning is the rate-limiting step, and altering the codons afterwards does not affect protein levels.
(D) Illustration of the difficulty in inferring causal relationships via biological system modifications. Although in both systems variable a has a positive effect on variable c, deletion of variable a in the system on the left-hand side will eventually increase the value of c, because a down-regulates b, which in turn up-regulates c; in the second system on the right-hand side, without variable b, deletion of variable a will eventually decrease the value of c as expected.

used are deliberately oversimplified aiming at elucidating and demonstrating the ideas.

Dozens of related relevant variables with unknown causal relationship (Figure 1A) Cellular variables and sequence features related to translation efficiency include: cellular abundance of tRNAs, amino acids and aminoacyl-tRNA synthetase, mRNA levels, the number of available ribosomes, initiation and elongation factors, the folding of the mRNA in different parts of the transcript, the codon usage in different parts of the transcript, and the amino acid charge of the translated protein. Naturally, the values of many of these variables are correlated and co-evolve. Thus many correlations reported in a study may be related

to non-direct/non-causal relationships. For example, it was suggested that positively charged amino acids slow down translation elongation [7,23]; as a result, highly expressed genes tend to have fewer positively charged amino acids (a hypothetical relationship). Additionally, highly expressed genes may also have more codons that are recognized by more abundant tRNA molecules [8,24]. The latter could be (again, a hypothetical relationship) due to the fact that positively charged amino acids are usually encoded by codons that are recognized by rare tRNA molecules. Thus one may deduce from an observed correlation between the adaptation of genes to the tRNA pool and their expression levels that the cause of the high expression of these genes is that their codons are adapted to the tRNA pool; however, it may be related to the fact


that there are fewer positively charged amino acids in these genes.
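The hypothetical scenario above can be reproduced in a small simulation. The following is a minimal sketch under stated assumptions, not an analysis of real data: the variable names, the linear relationships and the noise levels are all invented for illustration. Expression depends only on the fraction of positively charged amino acids, yet it correlates with the tAI-like score because both are driven by charge; a partial correlation that controls for charge removes the association.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def partial_corr(x, y, z):
    """Correlation of x and y after controlling for a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5

rng = random.Random(0)
n_genes = 5000

# Hypothetical confounder: fraction of positively charged amino acids per gene.
charge = [rng.random() for _ in range(n_genes)]
# Assumption: charged residues use rare-tRNA codons, so the tAI-like score drops with charge.
tai = [1.0 - c + rng.gauss(0, 0.3) for c in charge]
# Assumption: expression depends on charge only, never directly on tai.
expression = [-2.0 * c + rng.gauss(0, 0.5) for c in charge]

raw_r = pearson(tai, expression)
controlled_r = partial_corr(tai, expression, charge)
print(f"raw correlation tAI vs expression:   {raw_r:.2f}")
print(f"partial correlation given charge:     {controlled_r:.2f}")
```

In this toy setting the raw correlation is clearly positive while the partial correlation is close to zero, which is the pattern expected when the relationship is not causal.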

Possible solutions Often it is possible to statistically control for alternative explanations, for example, by performing conventional statistical analyses such as partial correlations. Other approaches include tailoring the statistical analyses to the data and research question; for example, by bootstrapping the genomic data and generating randomized versions of the genome, which maintain the genomic properties that we want to control (e.g. general codon bias, protein content or general GC content), and by showing that the observed phenomenon is significantly stronger in the real genome than in the randomized ones [e.g. for the example above, stronger correlation between the tAI (tRNA adaptation index) and expression levels in the real genome than in randomized genomes with the same proteins, global codon bias and GC content] [7,8,11]. An additional approach is to design and perform synthetic biology-based experiments (e.g. express synthetic genes without positively charged amino acids and with high/low adaptation to the tRNA pool).
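One way to build such a randomized genome can be sketched as follows. This is an illustrative assumption about how the randomization might be done, not necessarily the procedure used in [7,8,11]: synonymous codons are pooled per amino acid across all genes and shuffled, so every protein sequence and the genome-wide codon counts (and hence GC content) are preserved while the position of each codon is randomized. The codon table is a small, truncated subset of the standard genetic code, and the helper names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Truncated subset of the standard genetic code, enough for the toy genes below.
CODON_TO_AA = {
    "AAA": "K", "AAG": "K",
    "GAA": "E", "GAG": "E",
    "TTT": "F", "TTC": "F",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
}

def translate(gene):
    """Translate a list of codons into a one-letter amino acid string."""
    return "".join(CODON_TO_AA[codon] for codon in gene)

def randomize_genome(genes, rng):
    """Shuffle synonymous codons across the genome.

    Keeps every protein sequence and the genome-wide codon counts unchanged.
    """
    pools = defaultdict(list)
    for gene in genes:
        for codon in gene:
            pools[CODON_TO_AA[codon]].append(codon)
    for pool in pools.values():
        rng.shuffle(pool)
    return [[pools[CODON_TO_AA[codon]].pop() for codon in gene] for gene in genes]

def empirical_p_value(observed, null_values):
    """Fraction of randomized genomes with a statistic at least as large as observed."""
    hits = sum(1 for value in null_values if value >= observed)
    return (hits + 1) / (len(null_values) + 1)

# Toy genome (hypothetical sequences).
genes = [["AAA", "GAA", "TTT", "GGT"], ["AAG", "GAG", "TTC", "GGG", "AAG"]]
shuffled = randomize_genome(genes, random.Random(0))
print([translate(g) for g in shuffled])
```

In use, a statistic of interest (for example, the correlation between tAI and expression levels) would be computed on the real genome and on many randomized genomes, and `empirical_p_value` would then report how often the randomized genomes reach the observed value.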

Various biases in the relevant experimental data (Figure 1B) Many studies in the field aim at associating the experimental measurements of gene expression with various features of the transcript. However, these experimental approaches, although tremendously useful for life science research, usually include various sources of noise and bias. These biases may be related directly or indirectly to the transcript features. For example, high variability and disagreements between technical replicates generated by RNA-seq have been reported [25]. For this and other reasons, it was suggested that the most widely used approach for studying ribosomal densities and movements may also be biased [18]. Thus, in some cases, a correlation, or the absence of a correlation, between gene expression measurements and a transcript feature may be due to such a bias and not to the biophysical aspects of translation. Specifically, biases may cancel significant correlations between variables and/or create new, artificial correlations.

Possible solutions One solution is to analyse and compare the output of more than one experimental protocol; for example, in the case of mRNA level estimation, use both DNA-chip and RNA-seq data. Another approach is to develop statistical methodologies to control for such biases; for example, filter sets of genes and/or parts of genes with features that are believed to be erroneous or biased, or filter noisy/biased aspects of the measurements. Finally, it is possible to decrease the probability of erroneous conclusions by employing tools from more than one discipline for studying a research question. For example, it is clear that both computational prediction of mRNA folding [26] and experimental measurement of mRNA folding [27] include

various biases and inaccuracies; however, the fact that the same conclusion can be derived from both approaches (for an example, see [11]) strongly indicates that the hypothesis and reported result are not due to bias.

Drawing conclusions from specific examples (Figure 1C) As mentioned above, a model of gene translation should include at least dozens of variables, which may induce many ‘biophysical regimes’. Thus the relationship between a pair of variables is affected by the state of all other variables. For example, it was suggested that both adaptation of codons to the tRNA pool and folding of the mRNA (specifically at the 5′-end of the ORF) may affect the translation rate [8–10,22]. Thus performing heterologous gene expression experiments to study the relationship between codon bias and protein levels (for examples, see [10,28]) in gene variants with very strong mRNA folding of the 5′-end may ‘blur’ the possible positive relationship between the adaptation of codons to the tRNA pool and protein levels; on the other hand, studying this relationship in a library with weak ORF 5′-end folding energy may suggest a much higher correlation between adaptation of codons to the tRNA pool and protein levels. The output of these two hypothetical experiments may seem contradictory only if the relationship between adaptation of codons to the tRNA pool and protein levels is studied without considering mRNA folding.

Possible solutions First, conclusions derived from specific experiments should be phrased in an accurate and cautious manner (preferably also in the abstract), stating the exact conditions of the experiment, the features of the transcript and the organism analysed, while also mentioning that the result may change under different conditions. Secondly, as mentioned above, employing tools from more than one discipline for studying a research question can be helpful; in the example above, a biophysical/mathematical model that links mRNA folding, adaptation to the tRNA pool and protein levels should teach us that a lack of correlation between adaptation to the tRNA pool and protein levels may be due to strong mRNA folding. Finally, when possible, the experiment should be designed so that the studied phenomenon is not masked; in the example above, this means studying heterologous genes with weak mRNA folding.
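The kind of biophysical/mathematical model mentioned above can be illustrated with a deliberately crude sketch. Everything here is an assumption made for illustration: translation is reduced to two serial steps, the elongation rate stands in for adaptation to the tRNA pool, the initiation rate stands in for (the inverse of) 5′-end folding strength, and the numerical ranges are arbitrary. The same formula gives a strong tAI–protein correlation in a weak-folding library and almost none in a strong-folding library.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def protein_rate(k_init, k_elong):
    """Two serial steps: the mean time per protein is the sum of both waiting times."""
    return 1.0 / (1.0 / k_init + 1.0 / k_elong)

def library_correlation(init_range, rng, n_variants=2000):
    """Correlation between the tAI-like elongation rate and protein level in one library."""
    tai, protein = [], []
    for _ in range(n_variants):
        k_elong = rng.uniform(0.2, 1.0)    # hypothetical stand-in for tAI
        k_init = rng.uniform(*init_range)  # hypothetical stand-in for 5'-end folding
        tai.append(k_elong)
        protein.append(protein_rate(k_init, k_elong))
    return pearson(tai, protein)

rng = random.Random(1)
r_weak_folding = library_correlation((5.0, 20.0), rng)     # initiation is fast
r_strong_folding = library_correlation((0.01, 0.05), rng)  # initiation is rate limiting
print(f"weak 5'-end folding:   r = {r_weak_folding:.2f}")
print(f"strong 5'-end folding: r = {r_strong_folding:.2f}")
```

The two libraries would look contradictory if only the correlation were reported, although they come from one and the same model.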

Challenges in experimentally modifying one biological variable without affecting the system via multiple biological feedback mechanisms (Figure 1D) When analysing simple systems with a small number of known variables and no feedback, it is usually much easier to isolate the effect of one variable on another; however, analysing complex biological systems, which usually include many intricate feedback mechanisms between the cellular components, is much more challenging. For

example, suppose someone attempts to study the effect of tRNA concentrations upon the translation of a certain gene via deletion of a tRNA gene(s); since tRNA molecules are very central cellular components related to translation, such an experiment may eventually significantly change the expression levels of many coding and non-coding genes in unexpected ways such that it will not be possible to estimate the direct contribution of the tRNA gene(s) deletion to the analysed gene.
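The same difficulty appears in the minimal network of Figure 1D, which can be written down as a toy steady-state calculation. The weights below are arbitrary assumptions chosen only to make the effect visible: a has a direct positive effect on c in both systems, but in the system that also contains b, a down-regulates b and b up-regulates c.

```python
def steady_state_c(a, with_b):
    """Steady-state level of c in the toy networks of Figure 1D (hypothetical weights)."""
    direct_effect = 1.0 * a            # a up-regulates c directly
    if not with_b:
        return direct_effect
    b = max(0.0, 5.0 - 4.0 * a)        # a down-regulates b
    return direct_effect + 2.0 * b     # b up-regulates c

for with_b in (True, False):
    wild_type = steady_state_c(1.0, with_b)
    deletion = steady_state_c(0.0, with_b)
    label = "with b" if with_b else "without b"
    print(f"{label}: c = {wild_type} (wild type) -> {deletion} (a deleted)")
```

With these weights, deleting a raises c from 3.0 to 10.0 in the system with b, and lowers it from 1.0 to 0.0 in the system without b, although the direct effect of a on c is positive in both.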

Possible solutions This is not a trivial challenge to deal with since many intracellular regulatory aspects and feedback mechanisms are currently unknown. One possible solution includes ‘experiments’ on in silico models. For example, a computational model that includes various aspects of the translation process [7,18,29,30] can be used to simulate translation and specifically perform the experiments mentioned above in silico, eliminating undesired feedback concerns; a different approach can include in vitro experimental analysis with all important components of the translation machinery (for an example, see [31]); in some such systems, it is possible to introduce changes in various cellular components without undesired feedback. Finally, there are experimental biological approaches that are expected to cause less perturbation to the biological system; when possible, such approaches should be used. For example, for understanding the effect of a tRNA on translation elongation speed, one can knock out the tRNA gene(s) or try to evaluate it indirectly by estimating the correlation between tRNA levels and ribosomal densities on the codons they recognize on the basis of a ribosomal profiling approach [18–20,32]; the second approach is less direct, but (in my opinion) is expected to introduce fewer perturbations to the biological system.
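An in silico experiment of this kind can be sketched with a small flow model of ribosomes along a transcript. The code below is a simplified illustration written in the spirit of the ribosome flow model [30], not a reproduction of the published software [29]: the transcript is split into sites, each site has an occupancy between 0 and 1, and ribosomes move to the next site at a rate proportional to the free space there. The number of sites, the rates, the Euler integration step and the function name are all assumptions.

```python
def rfm_production_rate(init_rate, elong_rates, dt=0.01, t_end=200.0):
    """Integrate a simple ribosome-flow-style model and return the protein production rate.

    init_rate   -- rate at which ribosomes enter the first site
    elong_rates -- per-site rates of moving forward (the last one is the exit rate)
    """
    n_sites = len(elong_rates)
    occupancy = [0.0] * n_sites  # start from an empty transcript
    for _ in range(int(t_end / dt)):
        # flows[0] enters site 0; flows[i + 1] leaves site i.
        flows = [init_rate * (1.0 - occupancy[0])]
        for i in range(n_sites - 1):
            flows.append(elong_rates[i] * occupancy[i] * (1.0 - occupancy[i + 1]))
        flows.append(elong_rates[-1] * occupancy[-1])
        occupancy = [occupancy[i] + dt * (flows[i] - flows[i + 1]) for i in range(n_sites)]
    return elong_rates[-1] * occupancy[-1]

# 'Wild-type' transcript with uniform elongation rates (arbitrary units).
uniform_rates = [1.0] * 10
baseline = rfm_production_rate(1.0, uniform_rates)

# In silico 'tRNA depletion': slow down a single site, leaving everything else untouched.
slowed_rates = list(uniform_rates)
slowed_rates[4] = 0.2
perturbed = rfm_production_rate(1.0, slowed_rates)

print(f"baseline production rate:  {baseline:.3f}")
print(f"perturbed production rate: {perturbed:.3f}")
```

Because only one rate is changed, any difference in the output is attributable to that change alone, which is exactly what is hard to guarantee when a tRNA gene is deleted in a living cell.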

Discussion In summary, in the present article I have reviewed some major reasons that make the study of the relationship between codon bias, transcript features and gene translation challenging and less trivial than it may initially seem. Another aim of the present article is to explain some of the seemingly contradicting conclusions among different studies in the field. I believe that an open-minded multidisciplinary approach may solve at least some of the issues I reviewed in the present article. The ideas presented are clearly also relevant to many other biological research questions related to gene expression study, or more generally to understanding the biophysics and evolution of complex biological systems. Finally, I recommend further reading of previous relevant opinions generally related to the topic of this article [33–35].

Acknowledgements I thank Ms Hadas Zur and Mr Alon Diament for helpful comments.


Funding This research is partially supported by a Minerva ARCHES award.

References
1 Ikemura, T. (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34
2 dos Reis, M., Savva, R. and Wernisch, L. (2004) Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 32, 5036–5044
3 Chamary, J.V., Parmley, J.L. and Hurst, L.D. (2006) Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat. Rev. Genet. 7, 98–108
4 Plotkin, J.B. and Kudla, G. (2010) Synonymous but not the same: the causes and consequences of codon bias. Nat. Rev. Genet. 12, 32–42
5 Sauna, Z.E. and Kimchi-Sarfaty, C. (2011) Understanding the contribution of synonymous mutations to human disease. Nat. Rev. Genet. 12, 683–691
6 Tuller, T., Carmi, A., Vestsigian, K., Navon, S., Dorfan, Y., Zaborske, J., Pan, T., Dahan, O., Furman, I. and Pilpel, Y. (2010) An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 141, 344–354
7 Tuller, T., Veksler-Lublinsky, I., Gazit, N., Kupiec, M., Ruppin, E. and Ziv-Ukelson, M. (2011) Composite effects of gene determinants on the translation speed and density of ribosomes. Genome Biol. 12, R110
8 Tuller, T., Waldman, Y.Y., Kupiec, M. and Ruppin, E. (2010) Translation efficiency is determined by both codon bias and folding energy. Proc. Natl. Acad. Sci. U.S.A. 107, 3645–3650
9 Gu, W., Zhou, T. and Wilke, C.O. (2010) A universal trend of reduced mRNA stability near the translation-initiation site in prokaryotes and eukaryotes. PLoS Comput. Biol. 6, 1–8
10 Kudla, G., Murray, A.W., Tollervey, D. and Plotkin, J.B. (2009) Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258
11 Zur, H. and Tuller, T. (2012) Strong association between mRNA folding strength and protein abundance in S. cerevisiae. EMBO Rep. 13, 272–277
12 Zur, H. and Tuller, T. (2013) New universal rules of eukaryotic translation initiation fidelity. PLoS Comput. Biol. 9, e1003136
13 Nakagawa, S., Niimura, Y., Gojobori, T., Tanaka, H. and Miura, K. (2008) Diversity of preferred nucleotide sequences around the translation initiation codon in eukaryote genomes. Nucleic Acids Res. 36, 861–871
14 Kozak, M. (1984) Point mutations close to the AUG initiator codon affect the efficiency of translation of rat preproinsulin in vivo. Nature 308, 241–246
15 Xu, Y., Ma, P., Shah, P., Rokas, A., Liu, Y. and Johnson, C.H. (2013) Non-optimal codon usage is a mechanism to achieve circadian clock conditionality. Nature 495, 116–120
16 Zhou, M., Guo, J., Cha, J., Chae, M., Chen, S., Barral, J.M., Sachs, M.S. and Liu, Y. (2013) Non-optimal codon usage affects expression, structure and function of clock protein FRQ. Nature 495, 111–115
17 Gustafsson, C., Govindarajan, S. and Minshull, J. (2004) Codon bias and heterologous protein expression. Trends Biotechnol. 22, 346–353
18 Dana, A. and Tuller, T. (2012) Determinants of translation elongation speed and ribosomal profiling biases in mouse embryonic stem cells. PLoS Comput. Biol. 8, e1002755
19 Ingolia, N.T., Lareau, L.F. and Weissman, J.S. (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802
20 Qian, W., Yang, J.R., Pearson, N.M., Maclean, C. and Zhang, J. (2012) Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet. 8, e1002603
21 Jacques, N. and Dreyfus, M. (1990) Translation initiation in Escherichia coli: old and new questions. Mol. Microbiol. 4, 1063–1067
22 Supek, F. and Smuc, T. (2010) On relevance of codon usage to expression of synthetic and natural genes in Escherichia coli. Genetics 185, 1129–1134
23 Lu, J. and Deutsch, C. (2008) Electrostatics in the ribosomal tunnel modulate chain elongation rates. J. Mol. Biol. 384, 73–86
24 Tuller, T., Kupiec, M. and Ruppin, E. (2007) Determinants of protein abundance and translation efficiency in S. cerevisiae. PLoS Comput. Biol. 3, 2510–2519
25 McIntyre, L.M., Lopiano, K.K., Morse, A.M., Amin, V., Oberg, A.L., Young, L.J. and Nuzhdin, S.V. (2011) RNA-seq: technical variability and sampling. BMC Genomics 12, 293
26 Zuker, M. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415
27 Kertesz, M., Wan, Y., Mazor, E., Rinn, J.L., Nutter, R.C., Chang, H.Y. and Segal, E. (2010) Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107
28 Welch, M., Govindarajan, S., Ness, J.E., Villalobos, A., Gurney, A., Minshull, J. and Gustafsson, C. (2009) Design parameters to control synthetic gene expression in Escherichia coli. PLoS ONE 4, 1–10
29 Zur, H. and Tuller, T. (2012) RFMapp: ribosome flow model application. Bioinformatics 28, 1663–1664
30 Reuveni, S., Meilijson, I., Kupiec, M., Ruppin, E. and Tuller, T. (2011) Genome-scale analysis of translation elongation with a ribosome flow model. PLoS Comput. Biol. 7, e1002127
31 Uemura, S., Aitken, C.E., Korlach, J., Flusberg, B.A., Turner, S.W. and Puglisi, J.D. (2010) Real-time tRNA transit on single translating ribosomes at codon resolution. Nature 464, 1012–1017
32 Li, G.W., Oh, E. and Weissman, J.S. (2012) The anti-Shine–Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature 484, 538–541
33 Lazebnik, Y. (2002) Can a biologist fix a radio? Or, what I learned while studying apoptosis. Cancer Cell 2, 179–182
34 Rosenfeld, S. (2011) Mathematical descriptions of biochemical networks: stability, stochasticity, evolution. Prog. Biophys. Mol. Biol. 106, 400–409
35 Calvert, J. and Fujimura, J.H. (2011) Calculating life? Duelling discourses in interdisciplinary systems biology. Stud. Hist. Philos. Biol. Biomed. Sci. 42, 155–163

Received 12 June 2013 doi:10.1042/BST20130095
