Recurrent Adaptation in RNA Interference Genes Across the Drosophila Phylogeny Bryan Kolaczkowski,*,1 Daniel N. Hupalo,1 and Andrew D. Kern1
Abstract RNA interference (RNAi) is quickly emerging as a vital component of genome organization, gene regulation, and immunity in Drosophila and other species. Previous studies have suggested that, as a whole, genes involved in RNAi are under intense positive selection in Drosophila melanogaster. Here, we characterize the extent and patterns of adaptive evolution in 23 known Drosophila RNAi genes, both within D. melanogaster and across the Drosophila phylogeny. We find strong evidence for recurrent protein-coding adaptation at a large number of RNAi genes, particularly those involved in antiviral immunity and defense against transposable elements. We identify specific functional domains involved in direct protein–RNA interactions as particular hotspots of recurrent adaptation in multiple RNAi genes, suggesting that targeted coadaptive arms races may be a general feature of RNAi evolution. Our observations suggest a predictive model of how selective pressures generated by evolutionary arms race scenarios may affect multiple genes across protein interaction networks and other biochemical pathways. Key words: Drosophila, RNA interference, RNAi, adaptation, positive selection.
Introduction RNA interference (RNAi) is a class of biochemical processes that use short RNA molecules to recognize complementary nucleotide sequences and regulate their activities in the cell. Discovered only ∼2 decades ago (Fire et al. 1998; Matranga and Zamore 2007), RNAi is now known to be vitally important in many eukaryotes, including plants and animals. In Drosophila, RNAi has been shown to play critical roles in embryo development (Deshpande et al. 2005), heterochromatin formation (Fagegaltier et al. 2009), posttranscriptional gene regulation (Lai 2002; Siomi H and Siomi MC 2009), control of transposable elements (TEs) (Obbard et al. 2009; Lu and Clark 2010), and antiviral immunity (van Rij et al. 2006; Flynt et al. 2009; Obbard et al. 2009; Saleh et al. 2009). Although the molecular details remain unclear, a broad picture of how RNAi works is beginning to emerge (see Obbard et al. 2009). RNAi begins with the processing of specific double-stranded RNAs (dsRNAs)—typically by some form of RNaseIII—into short (∼20–30 nt) fragments. The dsRNA fragment is converted to a single-stranded form and loaded onto an Argonaute protein, a component of a multiprotein complex called the RNA-induced silencing complex (RISC). Single-stranded RNA complementary to the Argonaute-associated RNA fragment is targeted by the RISC for downstream processing, which can include complete degradation. Although there is some evidence that specific genes are involved in processing different types of RNAs (i.e., RNAs involved in transcriptional regulation vs. control of TEs vs. antiviral immunity), recent observations suggest considerable functional overlap among these pathways (Czech et al. 2008; Zhou et al. 2008). Many RNAi genes exhibit strong evidence for adaptive evolution in Drosophila melanogaster. A recent screen for
adaptive protein-coding changes in immune system genes found that, as a whole, RNAi genes have the highest rate of adaptive protein-coding changes among any class of immune system genes (Obbard et al. 2009). These results suggest that RNAi may be a “hotspot” of adaptive evolution in D. melanogaster, although not all RNAi genes show evidence of adaptation (Obbard et al. 2009). So far, studies of adaptive evolution in Drosophila RNAi genes have been largely confined to D. melanogaster and comparisons with its sibling species, D. simulans. As such, we know very little about how adaptive processes may have shaped RNAi genes in other Drosophila species. Here, we utilize the rich comparative genomic resources available for 12 Drosophila species (Drosophila 12 Genomes Consortium et al. 2007) as well as recently released population-genomic data from D. melanogaster (DPGP.org) to examine the extent and patterns of adaptive protein-coding evolution in RNAi genes across the Drosophila phylogeny. Our analysis reveals subtle differences in when and where adaptation occurs for different types of RNAi genes and suggests that RNAi genes involved in the processing of viral RNA and TEs are specific hotspots of recurrent adaptation.
Methods Recent Selective Sweeps in D. melanogaster We identified recent selective sweeps in D. melanogaster using whole-genome sequence data from the Drosophila Population Genomics Project (DPGP.org) and the SweepFinder software of Nielsen et al. (2005). For each RNAi gene, we excised ±50 kb of sequence data from the DPGP Q30 assembly of African and North American melanogaster genomes. We polarized individual bases into ancestral/derived classes using maximum likelihood (ML)
© The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail:
[email protected]
Mol. Biol. Evol. 28(2):1033–1042. 2011
doi:10.1093/molbev/msq284
Advance Access publication October 22, 2010
Research article
Department of Biological Sciences, Dartmouth College *Corresponding author: E-mail:
[email protected]. Associate editor: Matthew Hahn
ancestral sequence reconstruction, provided by PAML v4.3 (Yang 2007). We reconstructed the ancestral melanogaster genome sequence using the 15-way MULTIZ alignment available from the UCSC genome browser (Blanchette et al. 2004; Hinrichs et al. 2006), assuming the reference Drosophila phylogeny (Drosophila 12 Genomes Consortium et al. 2007; Tribolium Genome Sequencing Consortium et al. 2008), the Hasegawa–Kishino–Yano nucleotide substitution model (Hasegawa et al. 1985), and gamma-distributed among-site rate variation (Yang 1996). Per-base confidence scores for the ML ancestral reconstruction were estimated using the empirical Bayesian approach described by Yang et al. (1995). The posterior probability of ancestral base $b_i$, given observed data $x_j$ at alignment position $j$, is

$$P(b_i \mid x_j) = \frac{P(x_j \mid b_i)\,P(b_i)}{\sum_{k=1}^{4} P(x_j \mid b_k)\,P(b_k)},$$

where $P(x_j \mid b_i)$ is the probability of observing data $x_j$ given base $b_i$ in the ancestral sequence and $P(b_i)$ is the frequency of base $b_i$ in the data set. Positions with posterior probability < 0.9 were excluded as potentially unreliable. We also excluded any positions for which more than two of D. simulans, D. sechellia, D. yakuba, and D. erecta had missing data or for which any two of these species had different aligned bases. Positions with 1. The null hypothesis constrains $\omega_2 = 1$. For each branch on the phylogeny, we tested the hypothesis of adaptive protein-coding substitutions ($\omega_2 > 1$) versus the null hypothesis of neutral evolution ($\omega_2 = 1$) using a likelihood ratio test. P values were calculated from the $\chi^2_1$ distribution as suggested by Zhang et al. (2005). We corrected for testing multiple lineages using the Bonferroni correction (Anisimova and Yang 2007) as well as controlling the false-discovery rate using the method of Storey (2002), which gave equivalent results.
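As an illustration of the branch-wise test described above, the following sketch computes the likelihood ratio statistic, its P value under a chi-square distribution with one degree of freedom, and a Bonferroni adjustment across lineages. It is a minimal sketch rather than the PAML workflow itself: the function names and log-likelihood values are hypothetical, and the log-likelihoods are assumed to come from separately fitted null and alternative models for each branch.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Return (statistic, P value) for a likelihood ratio test with 1 df."""
    # Test statistic: twice the log-likelihood gain of the alternative model.
    # Clamp at zero in case optimization noise makes the alternative
    # fit slightly worse than the nested null.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def bonferroni(pvalues):
    """Bonferroni-adjust a dict of {branch: P value}, capping at 1."""
    m = len(pvalues)
    return {branch: min(1.0, p * m) for branch, p in pvalues.items()}


# Hypothetical (null, alternative) log-likelihoods for three branches.
fits = {
    "branch_a": (-10234.8, -10226.1),
    "branch_b": (-10234.8, -10234.2),
    "branch_c": (-10234.8, -10231.0),
}
raw = {b: lrt_pvalue(null, alt)[1] for b, (null, alt) in fits.items()}
adjusted = bonferroni(raw)
significant = sorted(b for b, p in adjusted.items() if p < 0.05)
```

The closed-form survival function avoids a dependency on a statistics library; it is valid only for one degree of freedom.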
Individual codons were classified as adaptive/nonadaptive in each lineage using Bayes empirical Bayes (BEB) posterior probabilities calculated by PAML (Yang et al. 2005; Zhang et al. 2005). Briefly, BEB posterior probabilities are calculated for each codon ($x_i$) by comparing the probability that the codon evolved with $\omega_2 > 1$ in a given lineage to the sum of the probabilities of that codon evolving under $0 < \omega_0 < 1$, $\omega_1 = 1$, or $\omega_2 > 1$, each weighted by the estimated proportion of sites evolving under that $\omega$ category:

$$P(\omega_2 \mid x_i) = \frac{P(x_i \mid \omega_2)\,p_2}{P(x_i \mid \omega_0)\,p_0 + P(x_i \mid \omega_1)\,p_1 + P(x_i \mid \omega_2)\,p_2}.$$
Uncertainty in parameter estimates is incorporated by integrating over diffuse prior distributions (Yang et al. 2005). Sites with BEB posterior probability >0.95 were considered adaptive protein-coding changes. We verified that sites with high BEB posterior probability on each branch had amino acid substitutions along that branch using ancestral protein-sequence reconstruction (Yang et al. 1995). We reconstructed the ML protein sequence at each node in the phylogeny to determine the branch on which each amino acid substitution occurred. Not surprisingly, we found that all sites with BEB posterior probability >0.95 on each lineage also had an inferred amino acid substitution along that branch, and the inferred residues always agreed between the codon- and protein-based analyses (not shown).
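The posterior calculation for a single codon can be sketched as follows. This is a simplified illustration that plugs in fixed point estimates for the class proportions; it omits the integration over prior distributions that distinguishes BEB as implemented in PAML. The function name and the numerical inputs are hypothetical.

```python
def site_class_posterior(likelihoods, proportions, target=2):
    """Posterior probability that a codon belongs to site class `target`.

    likelihoods -- P(x_i | omega_c) for each site class c
    proportions -- estimated proportion p_c of sites in each class
    """
    if len(likelihoods) != len(proportions):
        raise ValueError("one likelihood and one proportion per site class")
    # Weight each class likelihood by the proportion of sites in that class.
    weighted = [lik * p for lik, p in zip(likelihoods, proportions)]
    total = sum(weighted)
    if total <= 0.0:
        raise ValueError("codon has zero likelihood under every site class")
    return weighted[target] / total


# Hypothetical values for classes omega_0 < 1, omega_1 = 1, omega_2 > 1.
posterior = site_class_posterior(
    likelihoods=[0.001, 0.002, 0.05],
    proportions=[0.70, 0.25, 0.05],
)
is_adaptive = posterior > 0.95  # cutoff used in the text
```

In this toy case the codon is far more likely under the positive-selection class, yet the small proportion of sites assigned to that class keeps the posterior below the 0.95 cutoff.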
cant adaptive protein-coding changes within a window of size k , varying k from a minimum of one amino acid to a maximum of the gene’s protein sequence length. We calculate the maximum proportion of adaptive lineages across all possible windows for each sequence alignment and plot these values against increasing k for each gene. We used a posterior probability cutoff of 0.95 to identify significant adaptive sites in each lineage (see above). We normalized k by dividing by each gene’s total sequence length in order to directly compare results across genes. This analysis was limited to those genes with at least five adaptive lineages. Genes were categorized into similarity groups based on their scale of recurrent adaptation. For each gene, we calculated the window size that produced 90% adaptive lineages and used complete-linkage hierarchical clustering (Hastie et al. 2009) to group genes into similarity clusters. We used a clustering height cutoff of 0.1 to define similarity clusters. We also examined the pattern of recurrent adaptive targets across each gene’s protein sequence using a sliding window approach. We used a window of 20 amino acids, slid one position at a time across the sequence alignment. For each window, we calculated the proportion of adaptive lineages with significant adaptive sites within the window. We used posterior probability cutoffs of 0.9 and 0.95 to identify significant adaptive sites in each lineage.
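The window calculations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the analysis: the function names are hypothetical, positions are 0-based, and the input is assumed to map each adaptive lineage to the set of alignment positions whose posterior probability exceeds the chosen cutoff.

```python
def window_proportions(adaptive_sites, seq_len, k):
    """Proportion of adaptive lineages with >= 1 adaptive site per window.

    adaptive_sites -- {lineage: set of 0-based positions above the cutoff}
    Windows of k residues are slid one position at a time.
    """
    n = len(adaptive_sites)
    if n == 0:
        raise ValueError("need at least one adaptive lineage")
    props = []
    for start in range(seq_len - k + 1):
        end = start + k
        hits = sum(
            1
            for sites in adaptive_sites.values()
            if any(start <= s < end for s in sites)
        )
        props.append(hits / n)
    return props


def max_proportion_curve(adaptive_sites, seq_len):
    """List of (k / seq_len, max proportion over all windows of size k)."""
    return [
        (k / seq_len, max(window_proportions(adaptive_sites, seq_len, k)))
        for k in range(1, seq_len + 1)
    ]


def smallest_window_reaching(curve, threshold=0.9):
    """Smallest normalized window size whose maximum reaches `threshold`."""
    for norm_k, prop in curve:
        if prop >= threshold:
            return norm_k
    return None


# Hypothetical toy gene: 12 residues, three adaptive lineages.
toy_sites = {"lineage_a": {2}, "lineage_b": {3}, "lineage_c": {10}}
curve = max_proportion_curve(toy_sites, seq_len=12)
scale = smallest_window_reaching(curve)
```

The per-gene value returned by `smallest_window_reaching` is the kind of summary that could then be passed to a hierarchical clustering routine; that step is not shown here.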
Domain Architecture and Structural Homology Modeling Protein domain architecture was inferred via a sequence search of the Pfam database (Coggill et al. 2008; Finn et al. 2010) using the D. melanogaster protein sequence. Homology to solved 3D protein structures was inferred via a sequence search of the Protein Data Bank (Berman et al. 2000), with an e-value cutoff of $10^{-5}$. Structural homology models were built using MODELLER 9v7 (Eswar et al. 2008). Sequences were aligned to the structural template using MAFFT. Five structural models were constructed and evaluated using the MODELLER objective function as well as DOPE and GA341 assessment scores (Eramian et al. 2008). Results are shown for the overall best model. Sequence not alignable to the structural template was excluded. We constructed structural models for both extant D. melanogaster protein sequences and reconstructed ancestral sequences. Ancestral sequences were reconstructed using PAML (Yang et al. 1995; Yang 2007). We selected the best-fit evolutionary model for each protein alignment using jModelTest (Posada 2008) and excluded any positions for which the ancestral reconstruction had posterior probability 0.99, red indicates posterior probability >0.95, and blue indicates posterior probability >0.90. Gray indicates posterior probability 0.90. The proportion of adaptive lineages at each position in the sequence is shown for posterior probability cutoffs of 0.90 (blue) and 0.95 (red).
gion as well as in the hinge region separating its two dsrm domains. Finally, VIG had an amazingly strong signature of recurrent positive selection across the entire length of its N-terminal domain, which is highly divergent across species. Our results suggest that many of the adaptive protein-coding changes in RNAi genes occur outside conserved functional domains, consistent with recent findings that protein-coding adaptation is likely to occur in clusters, at the tips of proteins and in disordered loop regions (Ridout et al. 2010).
Recurrent Adaptation in DCR-2 and SPN-E Two of our strongest candidates for recurrent protein-coding adaptation are DCR-2 and SPN-E, both of which have multiple adaptive substitutions occurring throughout the Drosophila phylogeny (fig. 1, supplementary fig. S2, Supplementary Material online). Both genes exhibit pervasive adaptive substitutions that span their entire sequences and affect multiple functional domains (fig. 4), suggesting that the functions of these genes have been shaped by multiple rounds of strong, recurrent, and variable selection. Both genes also have strong signatures of selective sweeps in D. melanogaster, with DCR-2 having by far the strongest signature of any gene we examined (fig. 1, supplementary fig. S1, Supplementary Material online). Interestingly, DCR-2 and SPN-E share the same N-terminal domain architecture: a DEAD domain followed by a Helicase C domain, both of which are involved in manipulating dsRNA. Our analysis reveals specific targets of recurrent adaptation in these N-terminal domains for both DCR-2 and SPN-E (fig. 4),
FIG. 5. Structural modeling of DCR-2 and SPN-E suggests functional importance of melanogaster-specific amino acid substitutions. (a) Inferred 3D structures of ancestral (blue) and derived (red) melanogaster dsrm domains of DCR-2. Substitutions occurring along the melanogaster lineage are indicated. (b) Ancestral (blue) and derived (red) structures of the N-terminal region of melanogaster SPN-E. Amino acid substitutions are shown as spheres to highlight their locations in the protein structure.
suggesting that these genes may have modulated the way they interact with dsRNA in response to strong directional selection. We examined the potential structural consequences of adaptive protein-coding changes in DCR-2 and SPN-E by comparing 3D structural models of both ancestral and extant D. melanogaster proteins (see Methods for details). We used the solved structure of the mouse Dicer protein (Du et al. 2008) as a template from which to construct a model of the C-terminal region of DCR-2—spanning its last Ribonuclease III domain as well as its dsrm domain (fig. 5a). Although this region of DCR-2 is not a particularly strong target of recurrent adaptation (see fig. 4), there were a number of adaptive melanogaster-specific substitutions present. All but two of these substitutions occurred within the dsrm domain, a critical domain for RNA binding (Gan et al. 2006), and one of the remaining substitutions occurred at the end of the linker region connecting the dsrm and Ribonuclease III domains (fig. 5a). The bulk of melanogaster-specific adaptive substitutions in the DCR-2 dsrm domain occur in regions known to be directly involved in protein–RNA interactions (fig. 5a). For example, the α1 helix is critical for RNA recognition and
binding, forming four hydrogen bonds with RNA O2′ hydroxyl groups in the bacterial homolog (Gan et al. 2006). There is an I → V substitution in the center of this helix and a P → H substitution at the residue connecting α1 to the linker region. Interestingly, mutations at the corresponding α1-linker position in Escherichia coli have been shown to obliterate dsRNA binding (Inada and Nakamura 1995). The loop region between β strands 1 and 2 is also important for RNA binding, fitting into the minor groove of bound dsRNA and forming one hydrogen bond (Gan et al. 2006). We observed a D → E substitution in the middle of this loop region as well as an L → V substitution at the residue connecting this loop to β2. These results strongly suggest that adaptive protein-coding changes in melanogaster DCR-2 are important for modulating interactions with dsRNA molecules, probably of viral origin. The N-terminal region of SPN-E (spanning its DEAD, Helicase C, and HA2 domains) was modeled using the yeast Prp43p protein (He et al. 2010). This region contains a number of strong spikes of recurrent adaptation across the phylogeny (fig. 4). Structural modeling revealed that the bulk of adaptive substitutions along the melanogaster lineage occurred on the protein surface away from the catalytic core, suggesting possible roles in modulating intermolecular interactions rather than radically altering the protein’s core function (fig. 5b). The exception to this was a cluster of internal substitutions occurring throughout the RecA-2 domain as well as at the interface between RecA-2 and 5′HP (fig. 5b). It has been suggested that these domains may play critical roles in RNA binding and that repositioning of 5′HP in ADP- versus ATP-bound proteins may be particularly important for protein function (He et al. 2010).
Overall, our comparative structural analysis highlights the prevalence of adaptive amino acid substitutions in regions of DCR-2 and SPN-E likely to be involved in direct interactions with RNA molecules, suggesting that these changes may be functionally important for modulating these interactions over evolutionary time.
Discussion Genomic scans for targets of adaptive evolution are becoming increasingly commonplace, generating large lists of adaptive candidate genes. In contrast to the relative ease with which candidate genes can be identified, functional characterization of changes in the genome typically involves difficult and time-consuming bench work, limiting the ability of researchers to follow up on interesting leads. Thus, although genome-wide characterization is an important effort, focused attention on well-described pathways or functions can play a complementary role in our characterization of the adaptive process. Here, we have shown how detailed evolutionary and computational analyses can be used to dissect the patterns of protein-coding adaptation in Drosophila RNAi genes and infer the functional consequences of these patterns, informing downstream biochemical analyses. Although our
understanding of RNAi pathways is still in its infancy (Czech et al. 2008; Zhou et al. 2008), our analyses do suggest several testable hypotheses about the targets of natural selection in Drosophila RNAi. The strongest signatures of recurrent adaptation we observed occurred across RNAi genes involved in antiviral immunity (see fig. 1 and supplementary fig. S1–S3, Supplementary Material online). The most extreme outlier was DCR-2, which is known to directly interact with and dice viral dsRNA into siRNAs (Galiana-Arnoux et al. 2006; Aliyari et al. 2008). DCR-2-produced siRNAs are loaded specifically onto AGO2 (van Rij et al. 2006; Aliyari et al. 2008), which also has a very strong signature of adaptive evolution. Other genes known to directly interact with DCR-2 also had strong adaptive signatures in our analysis, including R2D2, whose dsrm domains bind directly to DCR-2 (Liu et al. 2003); it is interesting to note that the N-terminal dsrm domain of R2D2 contains a number of adaptive protein-coding changes (see supplementary fig. S3, Supplementary Material online). Although not a target of recurrent protein-coding adaptation across the Drosophila phylogeny, PIMET—a methyltransferase that methylates siRNAs on AGO2 (Horwich et al. 2007)—had a moderately strong signature of a recent selective sweep in D. melanogaster, suggesting that it may also be a target of positive selection. As a whole, these results suggest that coevolutionary arms races driven by viral components may dominate Drosophila RNAi evolution. On the virus’ side, the most likely molecular candidates driving this coevolutionary arms race are viral suppressors of RNAi (VSRs). VSRs are viral-encoded factors that interfere with the host’s RNAi machinery and are typically required for successful viral infection (Li et al. 2002; Andersson et al. 2005; Voinnet 2005; Aliyari et al. 2008). 
Although the precise mechanisms by which many VSRs act are unknown, our results suggest that VSRs may primarily interfere with the processing of viral dsRNA by DCR-2 in Drosophila. An alternative is that DCR-2 may be responding, in part, to changes in the viral dsRNA itself. Previous studies have shown that changes in dsRNA structure and sequence, particularly at the 3′ overhang, can affect Dicer efficiency and specificity (Zhang et al. 2004; Vermeulen et al. 2005). Another outlier in our analysis, SPN-E (see figs. 1, 4 and supplementary figs. S1, S2, Supplementary Material online), has been shown to be a major player in protection against TEs in the Drosophila germline (Vagin et al. 2006; Lim et al. 2009; Malone et al. 2009). Other genes involved in defense against TEs (e.g., AUB, KRIMP, SQU, and ZUC; Pane et al. 2007) also show evidence for adaptive evolution. Limited data prevent us from drawing strong conclusions about recurrent adaptation in AUB, KRIMP, and ZUC (see supplementary fig. S2, Supplementary Material online), although we did observe evidence for recent selective sweeps at D. melanogaster AUB and ZUC (supplementary fig. S1, Supplementary Material online). There was less evidence for protein-coding adaptation in downstream genes involved in TE defense (e.g., PIWI). Interestingly, a recent study concluded that VSRs may affect both antiviral and anti-TE RNAi pathways (Berry et al. 2009), suggesting the possibility that
the adaptation we observed in the anti-TE pathway may be a byproduct of VSRs’ effects on antiviral RNAi. Finally, AGO1, DCR-1, DROSHA, and LOQS have all been shown to be involved in posttranscriptional gene regulation (Saito et al. 2005). We uncovered evidence that all these genes may have experienced a burst of adaptive protein-coding changes during early Drosophila divergence, with little evidence for continued adaptation in more derived lineages (see supplementary fig. S2, Supplementary Material online), except for the strong signature of a recent selective sweep in melanogaster LOQS (fig. 1, supplementary fig. S1, Supplementary Material online). This finding suggests that these genes may have been important for defining Drosophila-specific gene regulatory patterns. Several studies have suggested that a gene’s specific position within a larger biochemical pathway or protein interaction network might correlate with the amount of adaptive evolution observed at that gene, with results varying as to whether upstream or downstream genes are more likely to be adaptive targets (Rausher et al. 1999; Alvarez-Ponce et al. 2009; Ramsay et al. 2009). Our results are largely consistent with these models. We observed the greatest amount of recurrent protein-coding adaptation at genes directly responsible for dicing dsRNA targets of RNAi (DCR-2 and SPN-E). Weaker, more patchy distributions of protein-coding adaptation were observed for genes that directly interact with these genes (e.g., LOQS, R2D2), with even less adaptation observed in more downstream RNAi genes (e.g., AGO1, AGO2). In the case of Drosophila RNAi, recurrent adaptation is most likely generated through an evolutionary arms race between viral-encoded factors and primary host molecules (DCR-2 and SPN-E). This would cause rapid and systemic changes in the primary host molecules, which could disrupt interactions with other proteins in the pathway.
Disruption of these interactions would generate some selection pressure on the interacting partners to change in order to maintain the interaction, creating a chain reaction that ripples through the entire interaction network (cf. Mustonen and Lassig 2009). Given that cells typically interact with their external environment through receptor molecules that detect specific extrinsic signals and initiate response cascades, this model may provide a general framework for understanding how adaptive arms races may affect protein interaction networks and other biochemical pathways.
Supplementary Material Supplementary figures S1–S3 and supplementary table S1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments Oralia Kolaczkowski assisted with data analysis and figure preparation. Matthew Hahn and three anonymous reviewers provided helpful comments. Research funding was supplied by Dartmouth College and the Neukom Institute.
References
Aliyari R, Wu Q, Li HW, Wang XH, Li F, Green LD, Han CS, Li WX, Ding SW. 2008. Mechanism of induction and suppression of antiviral immunity directed by virus-derived small RNAs in Drosophila. Cell Host Microbe. 4:387–397.
Alvarez-Ponce D, Aguadé M, Rozas J. 2009. Network-level molecular evolutionary analysis of the insulin/TOR signal transduction pathway across 12 Drosophila genomes. Genome Res. 19:234–242.
Andersson MG, Haasnoot PC, Xu N, Berenjian S, Berkhout B, Akusjärvi G. 2005. Suppression of RNA interference by adenovirus virus-associated RNA. J Virol. 79:9556–9565.
Andolfatto P, Przeworski M. 2001. Regions of lower crossing over harbor more rare variants in African populations of Drosophila melanogaster. Genetics 158:657–665.
Anisimova M, Yang Z. 2007. Multiple hypothesis testing to detect lineages under positive selection that affects only a few sites. Mol Biol Evol. 24:1219–1228.
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. 2000. The protein data bank. Nucleic Acids Res. 28:235–242.
Berry B, Deddouche S, Kirschner D, Imler JL, Antoniewski C. 2009. Viral suppressors of RNA silencing hinder exogenous and endogenous small RNA pathways in Drosophila. PLoS One. 4(6):e5866.
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D, Miller W. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14:708–715.
Caruthers JM, Johnson ER, McKay DB. 2000. Crystal structure of yeast initiation factor 4a, a dead-box RNA helicase. Proc Natl Acad Sci U S A. 97:13080–13085.
Coggill P, Finn RD, Bateman A. 2008. Identifying protein domains with the Pfam database. Curr Protoc Bioinformatics. Chapter 2, 23:2.5.1–2.5.17.
Czech B, Malone CD, Zhou R, Stark A, Schlingeheyde C, Dus M, Perrimon N, Kellis M, Wohlschlegel JA, Sachidanandam R, Hannon GJ, Brennecke J. 2008. An endogenous small interfering RNA pathway in Drosophila. Nature 453:798–802.
Deshpande G, Calhoun G, Schedl P. 2005. Drosophila argonaute-2 is required early in embryogenesis for the assembly of centric/centromeric heterochromatin, nuclear division, nuclear migration, and germ-cell formation. Genes Dev. 19:1680–1685.
Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, et al. (404 co-authors). 2007. Evolution of genes and genomes on the Drosophila phylogeny. Nature 450:203–218.
Du Z, Lee JK, Tjhen R, Stroud RM, James TL. 2008. Structural and biochemical insights into the dicing mechanism of mouse dicer: a conserved lysine is critical for dsRNA cleavage. Proc Natl Acad Sci U S A. 105:2391–2396.
Eramian D, Eswar N, Shen MY, Sali A. 2008. How well can the accuracy of comparative protein structure models be predicted? Protein Sci. 17:1881–1893.
Eswar N, Eramian D, Webb B, Shen MY, Sali A. 2008. Protein structure modeling with modeller. Methods Mol Biol. 426:145–159.
Fagegaltier D, Bougé AL, Berry B, Poisot E, Sismeiro O, Coppée JY, Théodore L, Voinnet O, Antoniewski C. 2009. The endogenous siRNA pathway is involved in heterochromatin formation in Drosophila. Proc Natl Acad Sci U S A. 106:21258–21263.
Finn RD, Mistry J, Tate J, et al. (14 co-authors). 2010. The Pfam protein families database. Nucleic Acids Res. 38:211–222.
Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE, Mello CC. 1998. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391:806–811.
Flynt A, Liu N, Martin R, Lai EC. 2009. Dicing of viral replication intermediates during silencing of latent Drosophila viruses. Proc Natl Acad Sci U S A. 106:5270–5275.
Galiana-Arnoux D, Dostert C, Schneemann A, Hoffmann JA, Imler JL. 2006. Essential function in vivo for dicer-2 in host defense against RNA viruses in Drosophila. Nat Immunol. 7:590–597.
Gan J, Tropea JE, Austin BP, Court DL, Waugh DS, Ji X. 2006. Structural insight into the mechanism of double-stranded RNA processing by ribonuclease III. Cell 124:355–366.
Hasegawa M, Kishino H, Yano T. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22:160–174.
Hastie T, Tibshirani R, Friedman JH. 2009. The elements of statistical learning: data mining, inference, and prediction. Springer Series in Statistics. 2nd ed. New York: Springer.
He Y, Andersen GR, Nielsen KH. 2010. Structural basis for the function of DEAH helicases. EMBO Rep. 11:180–186.
Hinrichs AS, Karolchik D, Baertsch R, et al. (27 co-authors). 2006. The UCSC genome browser database: update 2006. Nucleic Acids Res. 34:D590–D598.
Horwich DM, Li C, Matranga C, Vagin V, Farley G, Wang P, Zamore PD. 2007. The Drosophila RNA methyltransferase, DmHen1, modifies germline piRNAs and single-stranded siRNAs in RISC. Curr Biol. 17:1265–1272.
Inada T, Nakamura Y. 1995. Lethal double-stranded RNA processing activity of ribonuclease III in the absence of suhB protein of Escherichia coli. Biochimie 77:294–302.
Katoh K, Toh H. 2008. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 9:286–298.
Kern AD, Haussler D. 2010. A population genetic hidden Markov model for detecting genomic regions under selection. Mol Biol Evol. 27:1673–1685.
Kim Y, Stephan W. 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160:765–777.
Lai EC. 2002. Micro RNAs are complementary to 3′UTR sequence motifs that mediate negative post-transcriptional regulation. Nat Genet. 30:363–364.
Li H, Li WX, Ding SW. 2002. Induction and suppression of RNA silencing by an animal virus. Science 296:1319–1321.
Lim AK, Tao L, Kai T. 2009. piRNAs mediate posttranscriptional retroelement silencing and localization to pi-bodies in the Drosophila germline. J Cell Biol. 186:333–342.
Lingel A, Simon B, Izaurralde E, Sattler M. 2003. Structure and nucleic-acid binding of the Drosophila Argonaute 2 PAZ domain. Nature 426:465–469.
Liu Q, Rand TA, Kalidas S, Du F, Kim HE, Smith DP, Wang X. 2003. R2d2, a bridge between the initiation and effector steps of the Drosophila RNAi pathway. Science 301:1921–1925.
Lu J, Clark AG. 2010. Population dynamics of PIWI-interacting RNAs (piRNAs) and their targets in Drosophila. Genome Res. 20:212–227.
Malone CD, Brennecke J, Dus M, Stark A, McCombie WR, Sachidanandam R, Hannon GJ. 2009. Specialized piRNA pathways act in germline and somatic tissues of the Drosophila ovary. Cell 137:522–535.
Matranga C, Zamore PD. 2007. Small silencing RNAs. Curr Biol. 17:789–793.
McDonald JH, Kreitman M. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.
Moreno-Hagelsieb G, Latimer K. 2008. Choosing Blast options for better detection of orthologs as reciprocal best hits. Bioinformatics 24:319–324.
Mustonen V, Lassig M. 2009. From fitness landscapes to seascapes: non-equilibrium dynamics of selection and adaptation. Trends Genet. 25:111–119.
Nielsen R, Williamson S, Kim Y, Hubisz MJ, Clark AG, Bustamante C. 2005. Genomic scans for selective sweeps using SNP data. Genome Res. 15:1566–1575.
Obbard DJ, Gordon KH, Buck AH, Jiggins FM. 2009. The evolution of RNAi as a defence against viruses and transposable elements. Philos Trans R Soc Lond B Biol Sci. 364:99–115.
Obbard DJ, Welch JJ, Kim KW, Jiggins FM. 2009. Quantifying adaptive evolution in the Drosophila immune system. PLoS Genet. 5(10):e1000698.
Pane A, Wehr K, Schüpbach T. 2007. Zucchini and squash encode two putative nucleases required for rasiRNA production in the Drosophila germline. Dev Cell 12:851–862.
Posada D. 2008. jModelTest: phylogenetic model averaging. Mol Biol Evol. 25:1253–1256.
Ramsay H, Rieseberg LH, Ritland K. 2009. The correlation of evolutionary rate with pathway position in plant terpenoid biosynthesis. Mol Biol Evol. 26:1045–1053.
Rausher MD, Miller RE, Tiffin P. 1999. Patterns of evolutionary rate variation among genes of the anthocyanin biosynthetic pathway. Mol Biol Evol. 16:266–274.
Ridout K, Dixon C, Filatov D. 2010. Positive selection differs between protein secondary structure elements in Drosophila. Genome Biol Evol. 2:166–179.
Saito K, Ishizuka A, Siomi H, Siomi MC. 2005. Processing of pre-microRNAs by the dicer-1-loquacious complex in Drosophila cells. PLoS Biol. 3(7):e235.
Saleh MC, Tassetto M, van Rij RP, Goic B, Gausson V, Berry B, Jacquier C, Antoniewski C, Andino R. 2009. Antiviral immunity in Drosophila requires systemic RNA interference spread. Nature 458:346–350.
Singh ND, Arndt PF, Petrov DA. 2005. Genomic heterogeneity of background substitutional patterns in Drosophila melanogaster. Genetics 169:709–722.
Siomi H, Siomi MC. 2009. On the road to reading the RNA-interference code. Nature 457:396–404.
Smith NG, Eyre-Walker A. 2002. Adaptive protein evolution in Drosophila. Nature 415:1022–1024.
Song JJ, Liu J, Tolia NH, Schneiderman J, Smith SK, Martienssen RA, Hannon GJ, Joshua-Tor L. 2003. The crystal structure of the argonaute2 paz domain reveals an RNA binding motif in RNAi effector complexes. Nat Struct Biol. 10:1026–1032.
Storey J. 2002. A direct approach to false discovery rates. J R Stat Soc Ser B Stat Methodol. 64:479–498.
Thornton K, Andolfatto P. 2006. Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster. Genetics 172:1607–1619.
Tribolium Genome Sequencing Consortium, Richards S, Gibbs RA, et al. (240 co-authors). 2008. The genome of the model beetle and pest Tribolium castaneum. Nature 452:949–955.
Tweedie S, Ashburner M, Falls K, et al. (12 co-authors). 2009. Flybase: enhancing Drosophila gene ontology annotations. Nucleic Acids Res. 37:555–559.
Vagin VV, Sigova A, Li C, Seitz H, Gvozdev V, Zamore PD. 2006. A distinct small RNA pathway silences selfish genetic elements in the germline. Science 313:320–324.
van Rij RP, Saleh MC, Berry B, Foo C, Houk A, Antoniewski C, Andino R. 2006. The RNA silencing endonuclease argonaute 2 mediates specific antiviral immunity in Drosophila melanogaster. Genes Dev. 20:2985–2995.
Vermeulen A, Behlen L, Reynolds A, Wolfson A, Marshall WS, Karpilow J, Khvorova A. 2005. The contributions of dsRNA structure to dicer specificity and efficiency. RNA 11:674–682.
Voinnet O. 2005. Induction and suppression of RNA silencing: insights from viral infections. Nat Rev Genet. 6:206–220.
Williamson SH, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD, Nielsen R. 2007. Localizing recent adaptive evolution in the human genome. PLoS Genet. 3:e90.
Yan KS, Yan S, Farooq A, Han A, Zeng L, Zhou MM. 2003. Structure and conserved RNA binding of the PAZ domain. Nature 426:468–474.
Yang Z. 1996. Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol Evol. 11:367–372.
Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24:1586–1591.
Yang Z, Kumar S, Nei M. 1995. A new method of inference of ancestral nucleotide and amino acid sequences. Genetics 141:1641–1650.
Yang Z, Wong WS, Nielsen R. 2005. Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol. 22:1107–1118.
Zhang H, Kolb FA, Jaskiewicz L, Westhof E, Filipowicz W. 2004. Single processing center models for human dicer and bacterial RNAse III. Cell 118:57–68.
Zhang J, Nielsen R, Yang Z. 2005. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 22:2472–2479.
Zhou R, Hotta I, Denli AM, Hong P, Perrimon N, Hannon GJ. 2008. Comparative analysis of argonaute-dependent small RNA pathways in Drosophila. Mol Cell 32:592–599.