Review

The RNA structurome: transcriptome-wide structure probing with next-generation sequencing

Chun Kit Kwok (1,3,*), Yin Tang (2,3,4), Sarah M. Assmann (2,3,4,5), and Philip C. Bevilacqua (1,3,5)

1 Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA
2 Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
3 Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
4 Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA
5 Plant Biology Graduate Program, Pennsylvania State University, University Park, PA 16802, USA

RNA folds into intricate structures that enable its pivotal roles in biology, ranging from regulation of gene expression to ligand sensing and enzymatic functions. Elucidating RNA structure can therefore provide profound insights into living systems. A recent marriage between in vivo RNA structure probing and next-generation sequencing (NGS) has revolutionized the RNA field by enabling transcriptome-wide structure determination in vivo. To date, this approach has been applied to human cells, yeast cells, and Arabidopsis seedlings. Analysis of the resultant in vivo 'RNA structuromes' provides new and important information on myriad cellular processes, including control of translation, alternative splicing, alternative polyadenylation, energy-dependent unfolding of mRNA, and effects of proteins on RNA structure. An emerging view suggests potential links between RNA structure and stress and disease physiology across the tree of life. As we discuss here, these exciting findings open new frontiers in RNA biology, genome biology, and beyond.

Keywords: next-generation sequencing (NGS); RNA structure; RNA structurome; structure prediction; structure probing; transcriptome.

Corresponding authors: Kwok, C.K.; Assmann, S.M.; Bevilacqua, P.C.
* Current address: Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
0968-0004/© 2015 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tibs.2015.02.005

Importance of RNA structure in biology

RNA has a significant role in nearly every process in living cells, including transcription, RNA processing, and translation. Moreover, RNA can sense biomolecules (e.g., proteins, RNAs, or DNAs), ligands, temperature, and mutations, all of which can modulate RNA structure (Figure 1) [1–3]. The single-stranded (ss) nature of RNA provides the plasticity it needs to fold into diverse secondary structures (e.g., hairpins or three-way junctions) and tertiary structures (e.g., pseudoknots and G-quadruplexes), and these structures govern its functional roles. Currently, our knowledge of RNA sequence far exceeds our understanding of RNA structure. Moreover, most known RNA structures were determined in vitro, and RNA structure often differs dramatically between in vivo and in vitro conditions [4–6]. Additionally, structures determined by the high-resolution in vitro methods of nuclear magnetic resonance (NMR) and X-ray crystallography sometimes cannot reveal the functional state of RNA in the cellular milieu [7,8]. Knowledge of in vivo RNA structures helps to uncover how RNAs work in cells and facilitates the controlled manipulation of gene expression. It will also enable the design and development of molecular tools for bio- and nanotechnological applications.

In this review, we first describe the classical methods used to predict and probe RNA structure and how recent efforts have improved the throughput of those methods. We then highlight several NGS (see Glossary) approaches that provide RNA structural data on a transcriptome-wide scale. Next, we focus on experimental and computational efforts to construct the in vivo RNA structurome in various organisms and describe some of the novel insights revealed to date. Finally, we present perspectives on potential developments and future challenges on the way to a comprehensive structural and mechanistic understanding of the roles of RNA in vivo.

RNA structure prediction and probing of RNAs one at a time

RNA adopts complex structures to perform its functional roles in cells. Ribozymes and riboswitches exemplify this structure–function relation [9]. Given the key role of RNA in gene regulation, the ability to predict and probe RNA structure is essential to better understand the function of RNA in biology. Secondary structures of individual RNAs can be accurately determined by phylogenetic analysis, which provided the initial structural models for diverse RNAs, including tRNAs, introns, and rRNAs [10–12]. Later X-ray crystal structures confirmed these models [13].
However, phylogenetic analysis requires extensive sequence information. Moreover, covariation does not speak directly to folding intermediates or to alternative structures into which an RNA might switch. RNA structure can also be predicted by free energy minimization [14,15]. Such thermodynamics-based methods have been the dominant approach for RNA secondary structure prediction [16–18]. However, this approach correctly predicts only approximately 70% of the known base pairs in the lowest free-energy structure [19,20]. Recent advances have strengthened this approach by coupling it with phylogenetic analysis [21,22] or with in vitro RNA structure-probing data from nucleases [18,23] or chemicals [20]. In some cases, including chemical modification data has substantially improved prediction accuracy [20].

Trends in Biochemical Sciences, April 2015, Vol. 40, No. 4

Glossary

2–8% normalization: data are first ranked by probe reactivity, and the top 2% are treated as outliers. The average reactivity of the next 8% is then calculated, and every datum (including the top 2%) is divided by this average. The resulting scale typically spans from 0 to approximately 2.

Cochran–Mantel–Haenszel (CMH) test: a test of independence between two variables that is stratified across categories, such as biological replicates. When an experiment produces one 2 × 2 table per biological replicate, each measuring the relationship between the same two variables, the CMH test assesses whether the two variables are independent across the replicates.

Covariation: two or more putatively interacting bases that show correlated patterns of variation in an alignment of sequences, regardless of the type of base pair.

Ensemble approach: converts structure-probing data into pseudo-free energy terms and incorporates them into energy-based structure prediction algorithms through a partition function. The partition function describes the ensemble of RNA structures, and the base-pairing probability of each nucleotide can be inferred from it. The ensemble approach estimates the thermodynamic parameters iteratively and takes the noise in the data into account.

Next-generation sequencing (NGS): a category of DNA-sequencing methodologies that allows high-throughput sequencing at low per-base cost. NGS enables massively parallel DNA sequencing and retrieves information on tens of millions of DNA sequences simultaneously. The previous method, dye-terminator sequencing (commonly known as Sanger sequencing), uses gel or capillary electrophoresis and provides information on only a limited amount of sequence at a time.

Phylogenetic analysis of RNA: a means of determining RNA secondary structure from evolutionary relations rather than from physical principles such as free energy. A query sequence is used to identify homologous sequences in diverse organisms. The sequences are then aligned and examined for base pair-correlated variation ('covariation'). This method of RNA secondary structure prediction is considered highly reliable.

RiboSNitch: a SNP that changes RNA structure and thereby potentially alters regulatory function.

RNA structure probing: a technique used to interrogate the structure of RNA, often in the presence of interacting partners such as proteins and ligands. A ribonuclease or chemical probe targets a specific structural feature of RNA and either cleaves the backbone or modifies a nucleotide. For example, RNase V1 cleaves ds regions, and DMS modifies (methylates) ss adenines and cytosines. In most NGS methods, RNA cleavage and modification are detected through the stops they cause during reverse transcription.

RNA structurome: the RNA structural information for the transcriptome of an organism. RNA structuromes make it possible to uncover correlated and emergent biological and chemical properties of RNA structure and function.

Sample and select approach: this approach first uses Sfold to generate a large number of candidate structures. Sfold is a software package that employs a statistical algorithm to sample RNA secondary structures from the Boltzmann ensemble. The approach then selects the candidate structure that best matches the experimental data from RNA structure-probing methods.

Selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE): a chemical probing technique in which flexible RNA nucleotides are marked by covalent modification. A SHAPE reagent, such as NAI or 1M7, is sensitive to the local conformation of RNA nucleotides and acylates the 2′-OH group of unconstrained nucleotides. The resulting 2′-O adduct induces a reverse transcriptase stop, and the measured chemical reactivity can be used to infer RNA structure.
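To make the CMH test in the Glossary more concrete, the following is a minimal Python sketch. It computes the Mantel–Haenszel chi-square statistic over one 2 × 2 table per biological replicate and converts it to a p-value using the chi-square distribution with one degree of freedom. The table layout in the comments ((+)/(–) DMS library versus RT stops at a site or elsewhere) is a hypothetical example and is not necessarily how Mod-seq builds its tables. The function name cmh_test is hypothetical, and the continuity correction is an optional choice made here.

```python
import math
from typing import Sequence, Tuple

# One 2 x 2 table per stratum (e.g., biological replicate), given as (a, b, c, d):
#                      RT stops at site   RT stops elsewhere
#   (+) DMS library          a                  b
#   (-) DMS library          c                  d
# This layout is a hypothetical example, not necessarily the one used by Mod-seq.
Table = Tuple[int, int, int, int]


def cmh_test(tables: Sequence[Table], continuity: bool = True) -> Tuple[float, float]:
    """Return (Mantel-Haenszel chi-square statistic, p-value with 1 degree of freedom)."""
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two counts adds no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed += a
        expected += row1 * col1 / n  # E[a] under independence within this stratum
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))  # hypergeometric Var[a]
    if variance == 0:
        raise ValueError("tables have no variance; the test is undefined")
    diff = abs(observed - expected)
    if continuity:
        diff = max(diff - 0.5, 0.0)  # continuity correction
    stat = diff ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    replicates = [(50, 950, 10, 990)] * 3  # toy data: enrichment in (+) in every replicate
    stat, p = cmh_test(replicates)
    print(f"CMH statistic = {stat:.2f}, p = {p:.2e}")
```

The test pools evidence across replicates before it computes the statistic. A site therefore scores as significant only when the (+) library enrichment is consistent across replicates, and a single noisy replicate is not enough.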

Structure probing has been applied to individual RNAs for decades [24,25]. Ribonucleases (RNases) recognize specific ss (e.g., loop) or double-stranded (ds; e.g., stem) regions of RNA and cleave the RNA backbone at those sites. RNase probing has identified novel structural features of many RNAs [26,27] and agrees closely with the phylogenetic structures of rRNAs [28]. Nevertheless, the large physical size of RNases sometimes restricts their ability to detect fine structural features of RNA, such as mismatches and bulges [24], and because these enzymes cannot cross membranes, their use has been limited to in vitro applications. Because they are smaller, chemical probes offer higher-resolution assessment of RNA secondary and tertiary structure than RNases [2,25]. Different chemical probes target distinct positions on nucleotides. For example, dimethyl sulfate (DMS) alkylates the N1 position of adenine (N1A) and the N3 position of cytosine (N3C) when these positions are unprotected and not base-paired, and it also alkylates the N7 position of guanine (N7G) [29]. The selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) reagents 2-methylnicotinic acid imidazolide (NAI) and 1-methyl-7-nitroisatoic anhydride (1M7) acylate the flexible 2′-hydroxyl (2′-OH) common to all four nucleotides [30,32]. Table 1 lists the chemical probes used to date for transcriptome-wide RNA structure probing (discussed below) and their specificities. One key advantage of chemical probes such as DMS, NAI, or 1M7 over RNases is that they can readily be applied in vivo [4,5,31,32]. This allows researchers to interrogate RNA structure in vivo and to follow how it changes under various conditions. RNA structure-probing data, whether from nucleases or chemicals, report the structural state of each nucleotide as a probe reactivity. These reactivities can then be converted into structural restraints and input into computational algorithms to facilitate RNA structure prediction.
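Probe reactivities can be turned into restraints in two ways: as hard ss/ds calls at a user-chosen threshold, or as a soft pseudo-free energy term. The following is a minimal Python sketch of both. The threshold, the slope m and intercept b, and the function names are illustrative assumptions. The logarithmic form m·ln(reactivity + 1) + b is a commonly used form for SHAPE-type data and is assumed here; it is not taken from this review. Real folding programs apply such a term inside their energy model. This sketch only computes the per-nucleotide values.

```python
import math
from typing import List, Optional, Sequence

Reactivity = Optional[float]  # None marks a nucleotide without data


def hard_restraints(reactivities: Sequence[Reactivity], threshold: float = 0.7) -> List[Optional[str]]:
    """Force each nucleotide to 'ss' or 'ds' by a user-chosen reactivity threshold.

    Nucleotides without data (None) are left unconstrained (None).
    The default threshold is an arbitrary illustrative value.
    """
    calls: List[Optional[str]] = []
    for r in reactivities:
        if r is None:
            calls.append(None)
        elif r >= threshold:
            calls.append("ss")  # reactive: treated as single-stranded
        else:
            calls.append("ds")  # unreactive: treated as double-stranded
    return calls


def pseudo_free_energy(r: Reactivity, m: float = 2.6, b: float = -0.8) -> float:
    """Soft restraint in kcal/mol, assuming the form m * ln(r + 1) + b.

    m and b are assumed example parameters. With b < 0, unreactive nucleotides
    receive a small bonus for pairing, and highly reactive ones a penalty.
    Negative reactivities are clipped to 0; missing data give no adjustment.
    """
    if r is None:
        return 0.0
    return m * math.log(max(r, 0.0) + 1.0) + b


if __name__ == "__main__":
    data = [0.05, 1.4, None, 0.7, 0.2]
    print(hard_restraints(data))
    print([round(pseudo_free_energy(r), 2) for r in data])
```

The sketch shows why soft restraints tend to be more forgiving than hard ones. A nucleotide just below the threshold is not forced to pair; it only shifts the energy balance slightly, so the thermodynamic model can still override noisy data.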
The simplest approach is to use these data as so-called 'hard' restraints, in which each nucleotide is forced to be either ss or ds according to a user-assigned threshold on probe reactivity. Another approach uses the data as 'soft' restraints: a pseudo-free energy term based on probe reactivity adjusts the purely thermodynamic free energy of individual nucleotides [33–35]. In vitro structures of diverse classes of RNA predicted with pseudo-free energy (soft) restraints agree with the available X-ray crystal and NMR solution structures [33–36]. For example, hard restraints improved the prediction accuracy for Escherichia coli 16S rRNA from approximately 50% to approximately 70%, and pseudo-free energy terms improved it further to 85–97% [33]. Other computational methods for probing-assisted RNA structure prediction include a sample and select approach and an ensemble approach [37,38], which offer alternative ways to analyze structure-probing data [39]. Together, these computational algorithms and experimental approaches have improved the prediction of single RNA structures and set the stage for higher-throughput and transcriptome-wide structural analyses.

[Figure 1 image: an RNA hairpin whose structure responds to ligand (riboswitch), temperature (RNA thermometer), protein (RNP), and mutation (riboSNitch).]

Figure 1. RNA folding and the effects of various factors. RNA can adopt diverse folds, including secondary structures, such as the hairpin fold shown here, and tertiary structures. RNA structure is dynamic and can be sensitive to various cellular factors. The type of factor to which an RNA structure is sensitive places it in a particular class: riboswitches (sensitive to ligands), ribonucleoproteins (RNPs, sensitive to proteins), RNA thermometers (sensitive to temperature), and riboSNitches (sensitive to mutation).

Table 1. Chemicals used to date for transcriptome-wide RNA structure probing

Probe | Specificity | Mechanism of readout | Modifies in vivo? | Refs
DMS | G N7, A N1, C N3 | RNA cleavage and/or RT (G N7, C N3); RT (A N1) | Yes | [4,29,31]
CMCT | U N3, G N1 | RT | Yes | [72]
1M7 | 2′-OH | RT | Yes | [5,6]

Toward high-throughput RNA structural analysis

The classical method for reading out RNA structure-probing data is polyacrylamide gel electrophoresis (PAGE). Modifications induced by DMS, 1-cyclohexyl-3-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMCT), and SHAPE reagents, as well as cleavages, are detected by reverse transcription (RT) from a radio- or fluorescently labeled primer. Labeled cDNAs accumulate at the RT stops (or cleavages) and are fractionated by PAGE. This technique provides restraints for structure prediction, but it is low throughput and generally reports on only approximately 150 nucleotides (nt) of a given transcript in each lane. Capillary electrophoresis (CE) has somewhat higher throughput and has extended the length amenable to analysis to approximately 500 nt [40]. However, given the plethora of RNAs in a transcriptome, the low- and medium-throughput nature of PAGE and CE,
respectively, prevents the determination of RNA structures transcriptome-wide.

Transcriptome-wide RNA structural probing in vitro

Establishing the RNA structurome [41] requires high-throughput, transcriptome-wide RNA structural probing. The initial approaches were conducted in vitro and include parallel analysis of RNA structure (PARS) [42], fragmentation sequencing (Frag-seq) [43], and ss/ds RNA sequencing (ss/dsRNA-seq) [44–46]. These approaches couple RNase probing with NGS, enable the analysis of thousands of in vitro RNA structures in a single experiment, and differ mainly in the choice of RNase probe (Table 2) [47]. PARS has also been applied to in vitro RNA folding events upon temperature elevation (PARTE; Table 2), demonstrating RNA refolding on a transcriptome scale [48]. Coupling SHAPE chemical probing with NGS (SHAPE-seq; Table 2) has enabled probing of in vitro RNA structure at single-nucleotide resolution [49–51]. Several methods combine DMS, 1M7, CMCT, or hydroxyl radical probing with NGS: multiplexed accessibility probing sequencing (MAP-seq), hydroxyl radical footprinting sequencing (HRF-seq), chemical modification sequencing (ChemMod-seq), and chemical inference of RNA structures sequencing (CIRS-seq). Their results agree well with conventional data and ribosome structures (Table 2) [52–55]. In addition, PARS and CIRS-seq have been applied to deproteinized RNA secondary structures [55,56]. More recently, SHAPE mutational profiling (SHAPE-MaP) [57] and RNA interacting groups mutational profiling (RING-MaP) [58] were reported (Table 2). Both use special buffer conditions under which reverse transcriptase reads through sites of SHAPE or DMS modification and incorporates noncomplementary nucleotides there. Subsequent NGS and mutational profiling analysis reveal the sites of chemical modification. Structure prediction assisted by SHAPE-MaP is consistent with known RNA structures and has provided an updated structural map of the HIV-1 RNA genome that includes modeling of RNA pseudoknots [57]. In RING-MaP, DMS probing identified RNA interacting groups in the thiamine pyrophosphate (TPP) riboswitch, the P456 group I intron domain, and RNase P, and these groups were used as restraints to model RNA 3D structure [58]. However, to truly understand how RNA structure relates to biological function, RNA structures must be probed in vivo. No in vitro condition can fully reconstitute the cellular environment, and in vitro and in vivo structures often differ dramatically [4,5]. Although the methods above were conducted in vitro, some of them, particularly SHAPE-seq 2.0 [51], HRF-seq [53], ChemMod-seq [54], CIRS-seq [55], SHAPE-MaP [57], and RING-MaP [58], could in principle be applied in vivo.

Table 2. NGS methods of interrogating RNA structures

Method | Application to date | System | Probe | Refs
PARS, PARTE | In vitro | Saccharomyces cerevisiae, human (lymphoblastoid cells) | RNase S1 (ssRNA), RNase V1 (dsRNA) | [42,48,56]
Frag-seq | In vitro | Mus musculus (KH2 embryonic stem cells) | RNase P1 (ssRNA) | [43]
ss/dsRNA-seq | In vitro | Arabidopsis thaliana, Drosophila melanogaster, Caenorhabditis elegans | RNase I (ssRNA), RNase V1 (dsRNA) | [44–46]
SHAPE-seq 1.0, 2.0 | In vitro | Synthetic | 1M7 | [49–51]
MAP-seq | In vitro | Synthetic | 1M7, DMS, CMCT | [52]
HRF-seq | In vitro | Escherichia coli | Hydroxyl radical | [53]
ChemMod-seq | In vitro | S. cerevisiae | DMS, 1M7 | [54]
SHAPE-MaP | In vitro | E. coli, HIV-1 | 1M7 | [57]
RING-MaP | In vitro | Synthetic | DMS | [58]
CIRS-seq | In vitro | M. musculus (E14 embryonic stem cells) | DMS, CMCT | [55]
Structure-seq | In vivo | A. thaliana | DMS | [59]
DMS-seq | In vitro, in vivo | S. cerevisiae, human (K562/fibroblast cells) | DMS | [60]
Mod-seq | In vivo | S. cerevisiae | DMS | [61]

Transcriptome-wide RNA structural probing in vivo

Three methods that use DMS to interrogate in vivo RNA structures transcriptome-wide have been reported so far: Structure-seq [59], DMS sequencing (DMS-seq) [60], and modification sequencing (Mod-seq) [61]. To date, they have been applied to Arabidopsis seedlings, yeast, and human cells (Table 2). Figure 2 outlines the experimental workflows of the three methods. They share several steps, namely in vivo DMS treatment, RT, and ligation to enable PCR amplification before NGS (Figure 2). Although the three methods are conceptually similar, they differ in important ways, including the number of chemical modifications per molecule, RT priming, ligation methodology, and RNA population bias.

Comparison of experimental procedures in Structure-seq, DMS-seq, and Mod-seq

In any RNA structural probing method, the RNA should be modified only about once per approximately 200 nt to achieve so-called 'single-hit kinetics' [47] (Box 1, Figure 2). Structure-seq, DMS-seq, and Mod-seq differ in which subpopulations of RNA they examine. After in vivo DMS treatment (Figure 2), RNA is poly(A) selected for Structure-seq and DMS-seq (Figure 2A,B). Mod-seq instead uses total RNA, which is dominated by rRNAs (Figure 2C). The subsequent steps differ significantly, because each workflow processes the RNA in its own way, starting with the RNA random fragmentation step in Figure 2. To generate a cDNA library for sequencing, the DMS-modified RNAs must be converted to cDNA and fused with NGS adapters on each side. In Structure-seq, random hexamer (N6) primed RT generates first-strand cDNAs of varying lengths that carry part of an NGS adapter on one side (Figure 2A). In DMS-seq and Mod-seq, the RNA sample is first fragmented by random hydrolysis (Figure 2B,C). The resulting fragments bear 5′ hydroxyl and 2′,3′ cyclic phosphate termini and are then 3′ dephosphorylated; in Mod-seq, the RNA is also 5′ phosphorylated.
The resultant RNA fragments are ligated to a 3′ RNA adapter, which provides a known common sequence for primer annealing to initiate RT (Figure 2B,C). In Mod-seq, a 5′ RNA adapter is also ligated (Figure 2C), and a biotinylated oligonucleotide is then used to select the successfully ligated RNA products. The 5′ RNA adapter ligation in Mod-seq is also required for the 5′ adapter subtractive hybridization step, which is described later. Overall, Structure-seq has significantly fewer processing steps at the RNA level than DMS-seq and Mod-seq (Figure 2), which reduces the chance of undesired RNA degradation. In DMS-seq and Mod-seq, the RNA ligases used in the RNA ligation steps may introduce sequence bias [62,63], although a recent study has shown that this bias can be remediated [64]. All three methods use CircLigase for cDNA ligation. CircLigase has also been used in in vitro RNA structurome studies and other transcriptome-wide studies [49,51–54,65,66].

[Figure 2 image: workflow steps for (A) Structure-seq, (B) DMS-seq, and (C) Mod-seq: in vivo DMS treatment (single-hit kinetics for Structure-seq and Mod-seq); poly(A) RNA selection; RNA random fragmentation; 3′ RNA adapter ligation; 5′ RNA adapter ligation; 5′ RNA adapter selection; N6 or 3′ adapter-specific RT; intermolecular or intramolecular DNA ligation; 5′ adapter subtractive hybridization; PCR and NGS.]

Figure 2. Experimental workflow for Structure-seq, dimethyl sulfate sequencing (DMS-seq), and modification sequencing (Mod-seq). The key steps for Structure-seq (red) [59], DMS-seq (cyan) [60], and Mod-seq (purple) [61] are shown. Steps that are not performed in a particular method are marked 'X'. DMS modifications are marked with black ovals; each one causes a DMS-induced reverse transcriptase stop. The 'An' in the poly(A) RNA selection step depicts the poly(A) tail of the RNA. The 'N' in the DNA ligation step denotes a degenerate base. (A) For Structure-seq (red), RNA is treated with DMS in vivo and poly(A) RNA is selected. The RNA is subjected to random hexamer (N6) reverse transcription (RT) to generate cDNA. Intermolecular single-stranded DNA (ssDNA) ligation then joins the cDNA (green–black) to an ssDNA linker with NNN at its 5′ end (gray). The ligated cDNA is amplified by PCR and submitted for next-generation sequencing (NGS). (B) For DMS-seq (cyan), RNA is treated with DMS in vivo and poly(A) RNA is selected. The RNA is randomly fragmented by Zn2+-mediated hydrolysis. A 3′ RNA ligation joins each RNA fragment to a 3′ RNA adapter (black). A 3′ adapter-specific RT on the ligated RNA generates cDNA. Intramolecular circular DNA ligation is then performed, followed by PCR and NGS. (C) For Mod-seq (purple), RNA is treated with DMS in vivo, and total RNA is used for library preparation. The RNA is randomly fragmented by Zn2+-mediated hydrolysis. A 3′ RNA ligation joins each RNA fragment to a 3′ RNA adapter (black). A 5′ RNA ligation then joins a 5′ RNA adapter (orange) to the product of the previous step (RNA fragment plus 3′ RNA adapter). Successfully ligated RNAs are selected with a biotinylated DNA oligonucleotide antisense to the 5′ adapter. A 3′ adapter-specific RT on the ligated RNA generates cDNA. A 5′ adapter subtractive hybridization, using a biotinylated DNA oligonucleotide with the same sequence as the 5′ adapter, removes cDNAs generated from unmodified RNA. Intramolecular circular DNA ligation is then performed, followed by PCR and NGS.

Box 1. Single-hit kinetics
Whether nuclease or chemical probes are used, RNA structure-probing experiments aim for single-hit kinetics, that is, one modification and/or cleavage every approximately 200 RNA nucleotides. There are two reasons. First, modification and/or cleavage of one site can induce conformational changes that lead to modification and/or cleavage of non-native sites and, thus, to erroneous conclusions. Second, when two nearby sites are modified and/or cleaved, reads at the more 5′ site are artificially low, producing a 3′ bias.
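To get a rough quantitative feel for Box 1's target of about one modification per ~200 nt, the following Python sketch uses a simplified model. It assumes that modifications occur independently and uniformly, which gives a Poisson model; this review does not itself invoke that model. Under this assumption, the number of hits on a stretch of L nucleotides has mean λ = L/200, and the fraction of unmodified molecules is e^(−λ). The helper names are hypothetical.

```python
import math
from typing import Dict


def hit_distribution(length_nt: float, nt_per_hit: float = 200.0) -> Dict[str, float]:
    """Poisson probabilities of 0, 1, and >= 2 modifications on a stretch of RNA.

    Assumes independent, uniformly distributed modification events with a mean
    of one event per `nt_per_hit` nucleotides.
    """
    lam = length_nt / nt_per_hit  # expected number of hits on the stretch
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return {"unmodified": p0, "single_hit": p1, "multi_hit": 1.0 - p0 - p1}


def length_for_unmodified_fraction(fraction: float, nt_per_hit: float = 200.0) -> float:
    """Stretch length at which `fraction` of molecules carry no modification (same model)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return -math.log(fraction) * nt_per_hit


if __name__ == "__main__":
    for length in (50, 200, 500):
        probs = hit_distribution(length)
        print(length, {k: round(v, 3) for k, v in probs.items()})
    print(round(length_for_unmodified_fraction(0.8), 1))
```

Under this model, a 200-nt stretch gives λ = 1: about 37% of molecules are unmodified, 37% carry exactly one hit, and 26% carry two or more. The multi-hit fraction grows quickly for longer stretches, which is the situation Box 1 warns against.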

Structure-seq uses intermolecular linear DNA ligation, whereas DMS-seq and Mod-seq use intramolecular circular DNA ligation (Figure 2). Like T4 RNA ligase, CircLigase is known to have sequence bias [67]. Recent efforts have alleviated this issue by improving the reaction conditions for CircLigase [51,67] or by developing hybridization-based ligation or ligation-free approaches [57,67]. Under single-hit kinetic conditions, approximately 80% of RNA sequences are unmodified and therefore yield no structural information. Mod-seq increases coverage of modified RNAs with a 5′ adapter subtractive hybridization step, which depletes cDNAs containing the reverse complement of the ligated 5′ adapter (Figure 2C). Reverse transcriptase copies the 5′ adapter only when it reads through the entire fragment without stopping, so such cDNAs must come from RNA unmodified by DMS. In all methods, the processed cDNAs are subjected to PCR and NGS, and the raw sequencing reads are used for computational analysis (Figure 2).

Comparison of computational procedures in Structure-seq, DMS-seq, and Mod-seq

Computational analysis extracts transcriptome-wide RNA structural information from the NGS output. Figure 3 outlines a computational workflow for the three in vivo methods. All three treat the raw reads similarly in the initial steps: reads mapping (aligning the sequencing reads to the corresponding transcriptome) and RT stop counting (counting the DMS-induced RT stops at each nucleotide). They differ, however, in how they normalize RT stop counts and derive DMS reactivities for each nucleotide, which indicate the probability of DMS modification.

[Figure 3 image: flowchart (key: raw data, process, result). Shared steps: reads mapping; count of RT stops mapped to each nucleotide. (A) Structure-seq: natural log of RT stop counts; normalization by transcript abundance and length; subtraction of (–) from (+) library; 2–8% normalization. (B) DMS-seq: local count normalization over windows. (C) Mod-seq: Cochran–Mantel–Haenszel test; normalization by total number of RT stops in library; fold enrichment between (+) and (–) libraries.]

Figure 3. Computational analysis for Structure-seq (red), dimethyl sulfate sequencing (DMS-seq; cyan), and modification sequencing (Mod-seq; purple). All three methods treat raw reads similarly in the initial steps, including reads mapping and calculation of reverse transcription (RT) stop counts on each nucleotide (gray boxes). (A) Structure-seq (red) [59] increments the RT stop count on each nucleotide by one, takes the natural logarithm of the RT stop counts, and normalizes by transcript abundance and length. Raw DMS reactivity is obtained by subtracting the RT stop counts on each nucleotide in the (–) DMS library from those in the (+) DMS library. Final DMS reactivities are derived by applying 2–8% normalization to the raw structural reactivities, that is, to reactivities already normalized for transcript abundance and length and corrected for background. (B) DMS-seq (cyan) [60] derives structural reactivities by scaling the RT stop counts within a window of given size on each transcript relative to the most highly reactive base in that window. (C) Mod-seq (purple) [61] performs Cochran–Mantel–Haenszel tests on the RT stop count file to identify significantly enriched sites. The RT stop counts on these sites are then normalized by the total number of RT stops in each library [(–) DMS and (+) DMS]. The final DMS signals are the fold enrichment of RT stop counts between the (+) DMS and (–) DMS libraries.

In Structure-seq (Figure 3A), the number of RT stops mapped to each position is incremented by 1 and natural log transformed. It is then divided by the average of the log-transformed RT stop counts per position over the transcript, which normalizes for transcript abundance and length. DMS reactivity is derived by subtracting the normalized number of RT stops at each position in the (–) DMS library from that in the (+) DMS library, followed by 2–8% normalization [68]. Structure-seq takes the in vivo abundance of transcripts into account, and the log transform reduces the skew in the distribution of raw RT stop counts. Structure-seq is, however, somewhat vulnerable to noise, especially on low-coverage RNAs. In such RNAs, background reverse transcriptase drop-off caused by RNA structure or native RNA modifications can dominate the DMS-induced stop signal [69].

DMS-seq uses a different computational approach to normalize the data and derive DMS reactivities. In DMS-seq [60] (Figure 3B), the raw number of RT stops mapped to each position in an mRNA is normalized locally within a window of chosen size (50–200 nt), relative to the base with the highest raw DMS signal in that window.
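The following Python sketch illustrates the Structure-seq and DMS-seq normalizations described above for one transcript's per-nucleotide RT stop counts. It is a simplified illustration, not the published pipelines, and it rests on several assumptions. The function names are hypothetical. Negative (+) minus (–) differences are clipped to zero, because the text does not say how they are handled. The 2% and 8% cut-offs are rounded down. DMS-seq windows are treated as non-overlapping tiles starting at the first nucleotide.

```python
import math
from typing import List, Sequence


def log_normalized_stops(rt_stops: Sequence[int]) -> List[float]:
    """Structure-seq-style step: ln(n + 1) divided by the transcript mean of ln(n + 1)."""
    logs = [math.log(n + 1) for n in rt_stops]
    mean_log = sum(logs) / len(logs)
    if mean_log == 0:
        return [0.0] * len(logs)  # no RT stops anywhere on the transcript
    return [x / mean_log for x in logs]


def two_eight_normalize(values: Sequence[float]) -> List[float]:
    """Divide every value by the mean of the 8% of values ranked just below the top 2%."""
    n = len(values)
    ranked = sorted(values, reverse=True)
    skip = (2 * n) // 100               # top 2% treated as outliers (rounded down)
    take = max(1, (8 * n) // 100)       # next 8% define the scale (at least one value)
    top = ranked[skip:skip + take]
    scale = sum(top) / len(top)
    if scale <= 0:
        return list(values)  # nothing reactive: leave values unscaled
    return [v / scale for v in values]


def structure_seq_reactivity(plus: Sequence[int], minus: Sequence[int]) -> List[float]:
    """(+) DMS minus (-) DMS after log normalization, followed by 2-8% normalization."""
    p = log_normalized_stops(plus)
    m = log_normalized_stops(minus)
    raw = [max(a - b, 0.0) for a, b in zip(p, m)]  # assumption: clip negatives to 0
    return two_eight_normalize(raw)


def dms_seq_window_normalize(rt_stops: Sequence[int], window: int = 100) -> List[float]:
    """DMS-seq-style step: scale counts in each window by that window's maximum count."""
    out: List[float] = []
    for start in range(0, len(rt_stops), window):
        chunk = rt_stops[start:start + window]
        peak = max(chunk)
        out.extend(c / peak if peak > 0 else 0.0 for c in chunk)
    return out


if __name__ == "__main__":
    plus = [3, 40, 2, 1, 25, 0, 4, 60, 1, 2]   # toy (+) DMS RT stop counts
    minus = [2, 3, 2, 1, 2, 0, 3, 2, 1, 1]     # toy (-) DMS RT stop counts
    print([round(r, 2) for r in structure_seq_reactivity(plus, minus)])
    print(dms_seq_window_normalize(plus, window=5))
```

The two designs behave differently. In the Structure-seq version, the log transform compresses very large counts, so a single abundant stop cannot dominate the profile. In the window version, the largest count in each window sets the scale for all its neighbors, which is the source of the local bias noted for DMS-seq.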
A size window is chosen to alleviate 3′ end enrichment caused by RNA degradation before poly(A) selection and, thus, to avoid bias between different regions of a transcript. However, a single highly reactive nucleotide within a window can strongly depress the calculated reactivities of the other nucleotides in that window, introducing local bias. In addition, local normalization treats the windows on a transcript as independent, which is not always true. Mod-seq uses yet another normalization and reactivity-derivation approach, based on enrichment of DMS modification: it identifies significantly enriched sites of DMS modification by Cochran–Mantel–Haenszel (CMH) tests on each transcript [61]. Mod-seq takes experimental noise into account and is therefore more resistant to noise than the other two methods, because sites identified as significant by the CMH test show enrichment of RT stops in the (+) DMS library relative to the (–) DMS library consistently across all biological replicates. The number of RT stops mapped to each position is normalized by the total number of RT stops in the (–) DMS or (+) DMS library. However, because transcript abundance varies in living cells, it is preferable to normalize the RT stops mapped to each position of a transcript by the abundance of that individual transcript in the (–) DMS and (+) DMS libraries rather than by the total number of RT stops in each library. If reads are not normalized by transcript abundance, reactivities on different transcripts are not directly comparable, which prevents global analyses of RNA structure aimed at uncovering meta-properties. In addition, Mod-seq uses fold enrichment as the measure of DMS reactivity, derived for each nucleotide by dividing the normalized number of RT stops in the (+) DMS library by that in the (–) DMS library. As a consequence, Mod-seq cannot calculate the fold enrichment (i.e., the reactivity) for a nucleotide with no RT stops in the (–) DMS library. To validate the experimental and computational approaches of the three methods, the authors in each case showed high A and C nucleotide specificity of the DMS treatment and good agreement between the single-strandedness of rRNAs (as inferred from DMS reactivities) and their phylogenetic or X-ray crystal structures. Although the three methods use different experimental and computational procedures, these validations support the robustness of each method.

Trends in Biochemical Sciences April 2015, Vol. 40, No. 4

Novel findings revealed from RNA structurome data

Exploration of RNA structurome data has begun to provide intriguing insights into the roles of RNA structure in cellular processes, including translation, splicing, polyadenylation, miRNA-mediated regulation, transcript stability, and transcript localization [2,3]. Triplet periodicity in probe reactivity (i.e., reactivity cycling regularly every three nucleotides) has been observed experimentally within the coding sequences (CDS), but not the untranslated regions (UTRs), of mRNAs in yeast, mouse, and human in vitro [42,55,56] and in Arabidopsis in vivo [59]. Computational predictions on human and mouse mRNAs also revealed a three-nucleotide periodic pattern of secondary structure [70], suggesting a general feature of mRNA coding sequences that warrants further examination.
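Triplet periodicity can be checked, for example, by averaging reactivity at each codon position, or by asking how much of a profile's variation sits at a period of three nucleotides. The minimal sketch below does both. It assumes a per-nucleotide reactivity list for a CDS that starts at the first base of the start codon; the function names and the Fourier-based score are illustrative choices, not the procedure used in [42,55,56,59].

```python
import cmath
from typing import Sequence, Tuple


def codon_position_means(cds_reactivity: Sequence[float]) -> Tuple[float, ...]:
    """Mean reactivity at codon positions 1, 2 and 3 (profile assumed to start in frame)."""
    sums = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for i, r in enumerate(cds_reactivity):
        sums[i % 3] += r
        counts[i % 3] += 1
    return tuple(s / c if c else 0.0 for s, c in zip(sums, counts))


def period3_score(profile: Sequence[float]) -> float:
    """Spectral power at period 3 divided by the mean power over frequencies k = 1..n//2.
    Values well above 1 indicate a three-nucleotide rhythm; values near 0 indicate none."""
    n = len(profile)
    if n < 6:
        raise ValueError("profile too short to test for a period-3 signal")
    mu = sum(profile) / n
    x = [v - mu for v in profile]  # remove the mean so that k = 0 carries no power

    def power(k: int) -> float:
        # |X_k|^2 for the discrete Fourier transform of x at frequency k / n
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        return abs(coeff) ** 2

    powers = [power(k) for k in range(1, n // 2 + 1)]
    mean_power = sum(powers) / len(powers)
    if mean_power == 0:  # flat profile
        return 0.0
    return power(round(n / 3)) / mean_power
```

Comparing the score for a CDS profile with that for the same transcript's UTRs gives a simple within-transcript baseline.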
RNA structure is dynamic and well known to be sensitive to cellular conditions. One interesting finding from Structure-seq [59] and DMS-seq [60] was that certain mRNAs tend to be less structured and/or more dynamic in vivo. Structure-seq [59] revealed that the Arabidopsis mRNAs showing the greatest differences between their in silico and in vivo-restrained structures were enriched in annotations related to abiotic stress responses. Moreover, these stress-related transcripts had greater single-strandedness than mRNAs whose in vivo and in silico predicted structures were similar; the latter were enriched in functions related to basic metabolism and housekeeping [59] (Figure 4A). One possibility is that stress-response mRNAs are more structurally dynamic and may therefore adopt multiple conformations that serve regulatory purposes in cells. A related observation was reported using PARS, in which approximately 4% of bases (located in 9.7% of the reported human mRNA transcripts) had both strong RNase V1 (dsRNA-specific) and strong RNase S1 (ssRNA-specific) reads, suggesting the presence of multiple RNA conformations [56]. Further in vivo studies under stress conditions could provide clues about the structural characteristics of these dynamic mRNAs and their regulatory roles.
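PARS compares reads from RNase V1 (dsRNA-specific) and RNase S1 (ssRNA-specific) digestions at each base; in the original PARS work [42], the score is the log2 ratio of V1 to S1 reads, so positive values indicate pairing. A minimal sketch follows. It assumes counts scaled to reads per million of their own library, a pseudocount of 1 (an assumption, to avoid division by zero), and a hypothetical read cutoff for flagging bases with both strong V1 and strong S1 signals of the kind discussed above; the function names are illustrative only.

```python
import math
from typing import List, Sequence


def depth_normalize(counts: Sequence[float], library_total: float) -> List[float]:
    """Scale raw read counts to reads per million of their own library."""
    return [c * 1e6 / library_total for c in counts]


def pars_scores(v1: Sequence[float], s1: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """log2((V1 + pc) / (S1 + pc)) per base; > 0 leans paired, < 0 leans unpaired."""
    return [math.log2((v + pseudocount) / (s + pseudocount)) for v, s in zip(v1, s1)]


def dual_signal_positions(v1: Sequence[float], s1: Sequence[float], min_reads: float) -> List[int]:
    """0-based positions where both V1 and S1 signals reach a (hypothetical) cutoff,
    i.e. bases that may be paired in some molecules and unpaired in others."""
    return [i for i, (v, s) in enumerate(zip(v1, s1)) if v >= min_reads and s >= min_reads]
```

A base with a near-zero PARS score can arise either from weak signal in both libraries or from strong signal in both, which is why the dual-signal check is kept separate from the score itself.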


[Figure 4 panels. (A) At3g05880: in silico and in vivo structures, colored from flexible to structured. (B) RPL33A: DMS signal versus position, in vivo and in vitro at 30, 45, 60, 75, and 95°C. (C) GLN1: DMS signal versus position for in vivo ATP-depleted, in vitro, in vivo wild-type, and denatured samples. (D) L26 binding region colored by fold enrichment (1.5–2, 2–3, 3–4, 4–5, >5). (E) MRPS21: father and mother PARS scores versus position relative to the SNP (–15 to +15).]

Figure 4. Novel findings revealed by RNA structurome data. (A) Structure-seq revealed that the Arabidopsis mRNAs showing the greatest differences between their in silico and in vivo-restrained structures are enriched in annotations related to abiotic stress responses. A representative stress-related mRNA, At3g05880, is shown. The in vivo and in silico structures were generated with and without experimental restraints derived from Structure-seq dimethyl sulfate (DMS) reactivities, respectively. RNAstructure [14] was used to visualize the RNA structures. (B) Comparing the in vivo DMS signal of yeast mRNAs (tall bars = flexible, short bars = structured) with the in vitro DMS-seq signal at different temperatures showed that the in vivo signal resembled the in vitro signal at higher temperatures, suggesting that the mRNAs were less structured in vivo. A representative mRNA, RPL33A, is shown. (C) To investigate which cellular processes might reduce RNA structure in vivo, DMS-seq was performed in vivo under ATP depletion, which decreases the activity of ATP-dependent helicases. Under ATP depletion, the mRNAs were more structured (lower DMS signal). A representative mRNA, GLN1, is shown. (D) Mod-seq revealed the effect of a protein on RNA structure. Nucleotides within or close to the L26 protein binding site have higher in vivo DMS modification when L26 is absent. The color scale represents fold enrichment. L26 is shown as a transparent surface rendering. (E) Parallel analysis of RNA structure (PARS) scores, indicative of the extent of RNA structure (positive scores signify more structure), differed substantially between paternal and maternal alleles that differ by a SNP. Position 0 on the x-axis marks the SNP. A representative example, MRPS21, is shown. Adapted, with permission, from [59] (A), [60] (B,C), [61] (D), and [56] (E).


Using DMS-seq [60], the in vivo DMS signals of yeast mRNAs were compared with in vitro DMS signals at different temperatures. The comparison revealed that the in vivo DMS signal of regions of many mRNAs resembled the higher-temperature in vitro signals, suggesting that these mRNAs were less structured or more dynamic in vivo [60] (Figure 4B). To relate this observation to energy-dependent processes, such as unwinding of RNA by ATP-dependent RNA helicases in vivo, the authors performed DMS-seq under ATP depletion conditions in yeast. The mRNAs then became more structured in vivo (Figure 4C), implying that RNA helicases have pervasive effects on mRNA structure, with the important corollary that RNA thermodynamic rules alone do not dictate the RNA structure that prevails in vivo [60]. However, ATP depletion also affects other cellular processes, such as translation, transcription, and ion homeostasis [71], which can in turn affect RNA structure. To provide a mechanistic basis for the effect of ATP depletion on the RNA structurome, future experiments may focus on identifying which proteins and other cellular factors unfold mRNAs in living cells, and how. Along the same lines, RNA-binding proteins of interest can be knocked out in cells to investigate their effect on RNA structure. Mod-seq was used to compare DMS reactivities in yeast with and without ribosomal protein L26, identifying 58 nucleotides that were more reactive to DMS in the L26 deletion strain. Most of these nucleotides clustered at the 5.8S–25S rRNA interface, consistent with the L26 binding site revealed by the ribosome structure (Figure 4D) [61]. This strategy can be applied in the future to study other RNA-binding proteins in vivo. Another notable finding is the prevalence of riboSNitches (SNP-induced RNA structure switches) in a human parent–offspring trio [56].
For some SNPs that differ between the paternal and maternal alleles, PARS scores (higher scores signifying more structure) differed substantially between the two alleles, indicating SNP-induced RNA structural changes (Figure 4E). Altogether, 1907 of 12 223 single nucleotide variants (SNVs; 15.6%) were found to produce large differences in PARS scores indicative of structure switches. Given that some SNPs are linked to gene regulation and disease, the ability of certain SNPs to switch RNA conformation raises the possibility that RNA structure at riboSNitches is causally linked to gene regulation and disease. However, these studies were conducted in vitro under deproteinized conditions [56]. A future direction is to demonstrate how riboSNitches operate in vivo to regulate gene expression, and to assess the genotype–phenotype relationship of RNA structural differences arising from riboSNitches. Now that in vivo, transcriptome-wide structure probing tools are available, future efforts can also interrogate the prevalence of structural archetypes (e.g., riboswitches and RNA thermometers, Figure 1) in vivo, to decipher their working mechanisms and demonstrate their functional roles in diverse organisms.

Concluding remarks

Development of transcriptome-wide structure probing of RNA has led to entirely new insights into RNA structure and function. One future challenge is obtaining more detailed structural information on RNA, including tertiary structure. Besides DMS, other chemicals (e.g., CMCT, hydroxyl radicals, and SHAPE reagents) also modify RNA in vivo [4–6,32,72,73]. Because these chemicals report on different features of the nucleotide (DMS methylates the Watson–Crick face of A and C, CMCT modifies that of U and G, SHAPE reagents acylate the ribose 2′-OH of flexible nucleotides, and hydroxyl radicals cleave the backbone at solvent-accessible sites), they complement one another and should enable more comprehensive RNA structure prediction. It will also be exciting to see new chemical probes and strategies implemented to elucidate tertiary structures in vivo. Recent benchmark tests suggest that differential SHAPE reactivities using 1M6 and NMIA can identify noncanonical and tertiary interactions in vitro and improve structure prediction accuracy [57,74,75]. Advances in prediction algorithms [35,76] also provide opportunities to explore complex RNA motifs, including pseudoknots and G-quadruplexes. An ultimate goal is to accurately model 3D RNA structures from structure-probing data, or even from sequence alone.

One current limitation of in vivo RNA structurome studies is that low chemical reactivity at a given nucleotide can result either from RNA structure (base pairing) or from protection by cellular components such as proteins, other RNAs, or DNA. One way to address this problem is to use protein-centric methods, such as individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) [77], to identify in vivo protein-binding sites on RNAs transcriptome-wide and at nucleotide resolution. Such data, obtained one protein at a time, will help determine whether protein binding accounts for low chemical reactivity at a given site. A further challenge is to develop chemical probes that can be applied in vivo to specifically modify and/or cleave dsRNA regions. Such probes would complement existing chemical probes, such as DMS and SHAPE reagents, which target ss nucleotides (Table 1).
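Whatever the probe, reactivities usually enter secondary structure prediction as the 'soft' or 'hard' restraints listed in Table 3. A widely used soft restraint is the pseudo-free energy of Deigan et al. [33], ΔG(i) = m ln[reactivity(i) + 1] + b, added for each nucleotide i in a stacked base pair; with the SHAPE parameters of that study (m = 2.6 and b = –0.8 kcal/mol), low reactivities favor pairing and high reactivities penalize it. The minimal sketch below computes these terms and a hard single-strand list. The function names and the hard-restraint threshold are hypothetical, and DMS data would need separately calibrated parameters.

```python
import math
from typing import List, Optional, Sequence


def soft_pseudo_energies(reactivity: Sequence[Optional[float]],
                         m: float = 2.6, b: float = -0.8) -> List[float]:
    """Deigan-style pseudo-free energy (kcal/mol) per nucleotide: m * ln(r + 1) + b.
    The term applies when the nucleotide is in a stacked pair. Missing data (None)
    contributes 0; clamping negative reactivities to 0 is an assumption of this sketch."""
    out: List[float] = []
    for r in reactivity:
        if r is None:
            out.append(0.0)
        else:
            out.append(m * math.log(max(r, 0.0) + 1.0) + b)
    return out


def hard_unpaired_positions(reactivity: Sequence[Optional[float]], threshold: float) -> List[int]:
    """1-based positions to force single stranded ('hard' restraint) when the
    reactivity exceeds a hypothetical threshold."""
    return [i + 1 for i, r in enumerate(reactivity) if r is not None and r > threshold]
```

With these parameters the pseudo-energy crosses zero at a reactivity of about 0.36, so weakly reactive nucleotides receive a pairing bonus and strongly reactive ones a penalty, without being forbidden from pairing as a hard restraint would do.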
Other NGS techniques are also available to dissect in vivo RNA–DNA and RNA–RNA interactions [78–80]. At present, in vivo RNA structurome studies capture a population-averaged structure in the context of dynamic cellular processes. Deconvoluting these ensemble measurements into the structural states of individual molecules will require extensive experimental and computational effort. An initial in vitro approach has been reported in RING-MaP, in which spectral clustering can, in principle, be used to generate structural profiles of individual RNA conformations [58]. In addition, structure probing at multiple in vivo time points, followed by selective isolation of RNAs of interest (for example, via biotinylated oligonucleotides or antibody pull-down) and NGS, could help elucidate RNA dynamics and folding events. An in vitro demonstration was reported using ChemMod-seq, in which DMS/1M7 probing was performed on affinity-purified pre-rRNA at different maturation stages to study yeast ribosome assembly [54]. Likewise, reagents that halt cellular processes, such as rifampicin for transcription and cycloheximide for translation, can be applied in vivo to arrest distinct steps of dynamic RNA processes (e.g., splicing, polyadenylation, RNA degradation, or ribonucleoprotein assembly) before in vivo structure probing. For example, rifampicin was recently applied in conjunction with 1M7 to study bacterial rRNA assembly in vivo [6].

RNA structurome research is evolving rapidly, and the statistical methods used to analyze large RNA-probing data sets merit further discussion and development. For the field to apply transcriptome-wide techniques broadly, computational tools that are both statistically rigorous and user friendly are essential. Several approaches have been or are being developed (Table 3) [57,81,82] (Y. Tang et al., unpublished). A concerted effort is needed to systematically analyze available and forthcoming large structure-probing data sets and converge on an optimal NGS-based structure analysis workflow. Initial efforts have highlighted the factors to be considered in transcriptome-wide RNA structure-probing studies: a rational and robust statistical model is required to derive structural reactivity from RNA structure profiling data in a way that fully accounts for background noise, local signal bias, and RNA–protein and RNA–RNA interactions in vivo [69]. Two further challenges in the computational analysis of NGS-based structure profiling data concern the extent of coverage required and the method used to derive structural reactivity. First, higher transcriptome coverage gives greater confidence in the derived structural information but is more expensive, and at present there is no consensus on a quantitative coverage criterion. Second, the three in vivo methods [59–61] use different ways to calculate structural reactivities from raw sequencing data. The challenge remains to develop an approach that makes the fullest possible use of the experimental data while remaining robust to noise and uneven read depth [69]. Moreover, although the derived reactivities will ultimately be used to predict RNA structures, validating these predicted structures remains difficult because the structures of most mRNAs and noncoding RNAs are unknown.

Table 3. Computational programs to predict RNA structures using NGS data

Program | Types of restraint(a) | Data used to date | Backend software/algorithm | Refs
SeqFold | Soft and hard | PARS, SHAPE-seq, Frag-seq | SeqFold | [82]
SAVoR | Hard | PARS, Frag-seq, ds/ssRNA-seq | RNAfold | [81]
ShapeMapper/SuperFold | Soft | SHAPE-MaP | RNAstructure | [57]
StructureFold | Soft and hard | Structure-seq | RNAstructure, ViennaRNA package | (Tang et al., unpublished)

(a) 'Soft' restraints add an experimentally derived pseudo-free energy term to the thermodynamic free energy of individual nucleotides. 'Hard' restraints force a nucleotide to be either single stranded or double stranded.

It has been just 5 years since the first in vitro integration of RNA structure probing with NGS. The study of RNA structure has entered a transcriptomic era, in which the RNA structural data generated by NGS already far exceed the data accumulated over past decades with electrophoresis-based approaches and extend across many branches of the tree of life. Nevertheless, our understanding of in vivo RNA structure is still rudimentary, and much remains to be done. Over the next 5 years, we foresee the development of new suites of innovative methodologies and resultant novel biological discoveries (Box 2). Collectively, new molecular probes and experimental strategies, exploitation of in vivo RNA structural information, and advances in computational tools will provide a giant leap toward the construction of a complete RNA ontology, proposed almost a decade ago [83], whose goals are to create and integrate comprehensive RNA sequence and structural databases and to design advanced, user-friendly computational tools for wet-lab scientists. We anticipate that NGS-based RNA structure probing will bring us closer to achieving this utopian goal.

Box 2. Outstanding questions
- What additional chemical probes and strategies can be designed to investigate in vivo RNA tertiary structure and noncanonical base pairs?
- What are the best approaches to designing molecular probes and strategies that can specifically probe ds regions of RNA and decipher RNA–protein, RNA–RNA, and RNA–DNA interactions in vivo?
- Is there an optimal interface for combining molecular dynamics-based structure prediction methods with probing-assisted, thermodynamics-based methods to generate 3D models of RNAs on a transcriptomic scale?
- What are the best strategies to deconvolute ensemble measurements on a population of RNA molecules of the same sequence into the multiple structures of each transcript that may prevail simultaneously?
- How does RNA structure change in different cellular environments, such as those imposed by stress, protein binding, ligand binding, and mutation, to regulate gene expression?
- How can we best uncover the mechanistic links between RNA structure and genetic and infectious diseases?
- To analyze large structure-probing data sets for improved structure prediction, are there optimal computational parameters, such as those for read-count normalization and structural reactivity derivation, and will the field converge upon one universal approach?

Acknowledgments
We would like to acknowledge the support of NSF IOS1339282 to S.M.A. and P.C.B., and the Croucher Foundation for fellowship support to C.K.K. We would like to thank Jamie Bingaman for help with Table 1.
We thank Howard Chang, Joel McManus, and Jonathan Weissman for permission to reproduce portions of their work. We apologize to colleagues whose work was not cited due to space limitations.

References
1 Sharp, P.A. (2009) The centrality of RNA. Cell 136, 577–580
2 Wan, Y. et al. (2011) Understanding the transcriptome through RNA structure. Nat. Rev. Genet. 12, 641–655
3 Mortimer, S.A. et al. (2014) Insights into RNA structure and function from genome-wide studies. Nat. Rev. Genet. 15, 469–479
4 Kwok, C.K. et al. (2013) Determination of in vivo RNA structure in low-abundance transcripts. Nat. Commun. 4, 2971
5 Tyrrell, J. et al. (2013) The cellular environment stabilizes adenine riboswitch RNA structure. Biochemistry 52, 8777–8785
6 McGinnis, J.L. and Weeks, K.M. (2014) Ribosome RNA assembly intermediates visualized in living cells. Biochemistry 53, 3237–3247
7 Blount, K.F. and Uhlenbeck, O.C. (2005) The structure-function dilemma of the hammerhead ribozyme. Annu. Rev. Biophys. Biomol. Struct. 34, 415–440
8 Yajima, R. et al. (2007) A conformationally restricted guanosine analog reveals the catalytic relevance of three structures of an RNA enzyme. Chem. Biol. 14, 23–30
9 Serganov, A. and Patel, D.J. (2007) Ribozymes, riboswitches and beyond: regulation of gene expression without proteins. Nat. Rev. Genet. 8, 776–790
10 Gutell, R.R. (1993) Comparative studies of RNA: inferring higher-order structure from patterns of sequence variation. Curr. Opin. Struct. Biol. 3, 313–322
11 Gutell, R.R. et al. (1994) Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perspective. Microbiol. Rev. 58, 10–26
12 Cannone, J. et al. (2002) The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinform. 3, 2
13 Gutell, R.R. et al. (2002) The accuracy of ribosomal RNA comparative structure models. Curr. Opin. Struct. Biol. 12, 301–310
14 Reuter, J. and Mathews, D. (2010) RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinform. 11, 129
15 Turner, D.H. and Mathews, D.H. (2010) NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res. 38, D280–D282
16 Freier, S.M. et al. (1986) Improved free-energy parameters for predictions of RNA duplex stability. Proc. Natl. Acad. Sci. U.S.A. 83, 9373–9377
17 Jaeger, J.A. et al. (1989) Improved predictions of secondary structures for RNA. Proc. Natl. Acad. Sci. U.S.A. 86, 7706–7710
18 Mathews, D.H. et al. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911–940
19 Mathews, D.H. (2004) Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA 10, 1178–1190
20 Mathews, D.H. et al. (2004) Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl. Acad. Sci. U.S.A. 101, 7287–7292
21 Xu, Z. and Mathews, D.H. (2011) Multilign: an algorithm to predict secondary structures conserved in multiple RNA sequences. Bioinformatics 27, 626–632
22 Harmanci, A. et al. (2011) TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences. BMC Bioinform. 12, 108
23 Zuker, M. and Stiegler, P. (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9, 133–148
24 Ehresmann, C. et al. (1987) Probing the structure of RNAs in solution. Nucleic Acids Res. 15, 9109–9128
25 Weeks, K.M. (2010) Advances in RNA structure analysis by chemical probing. Curr. Opin. Struct. Biol. 20, 295–304
26 Aultman, K.S. and Chang, S.H. (1982) Partial P1 nuclease digestion as a probe of tRNA structure. Eur. J. Biochem. 124, 471–476
27 Guerrier-Takada, C. and Altman, S. (1984) Structure in solution of M1 RNA, the catalytic subunit of ribonuclease P from Escherichia coli. Biochemistry 23, 6327–6334
28 Woese, C.R. et al. (1980) Secondary structure model for bacterial 16S ribosomal RNA: phylogenetic, enzymatic and chemical evidence. Nucleic Acids Res. 8, 2275–2293
29 Wells, S.E. et al. (2000) Use of dimethyl sulfate to probe RNA structure in vivo. Methods Enzymol. 318, 479–493
30 Wilkinson, K.A. et al. (2006) Selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nat. Protoc. 1, 1610–1616
31 Zaug, A.J. and Cech, T.R. (1995) Analysis of the structure of Tetrahymena nuclear RNAs in vivo: telomerase RNA, the self-splicing rRNA intron, and U2 snRNA. RNA 1, 363–374
32 Spitale, R.C. et al. (2013) RNA SHAPE analysis in living cells. Nat. Chem. Biol. 9, 18–20
33 Deigan, K.E. et al. (2009) Accurate SHAPE-directed RNA structure determination. Proc. Natl. Acad. Sci. U.S.A. 106, 97–102
34 Cordero, P. et al. (2012) Quantitative dimethyl sulfate mapping for automated RNA secondary structure inference. Biochemistry 51, 7037–7039
35 Hajdin, C.E. et al. (2013) Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc. Natl. Acad. Sci. U.S.A. 110, 5498–5503
36 Zarringhalam, K. et al. (2012) Integrating chemical footprinting data into RNA secondary structure prediction. PLoS ONE 7, e45160
37 Quarrier, S. et al. (2010) Evaluation of the information content of RNA structure mapping data for secondary structure prediction. RNA 16, 1108–1117
38 Washietl, S. et al. (2012) RNA folding with soft constraints: reconciliation of probing data and thermodynamic secondary structure prediction. Nucleic Acids Res. 40, 4261–4272
39 Eddy, S.R. (2014) Computational analysis of conserved RNA secondary structure in transcriptomes and genomes. Annu. Rev. Biophys. 43, 433–456
40 Vasa, S.M. et al. (2008) ShapeFinder: a software system for high-throughput quantitative analysis of nucleic acid reactivity information resolved by capillary electrophoresis. RNA 14, 1979–1990
41 Westhof, E. and Romby, P. (2010) The RNA structurome: high-throughput probing. Nat. Meth. 7, 965–967
42 Kertesz, M. et al. (2010) Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107
43 Underwood, J.G. et al. (2010) FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing. Nat. Meth. 7, 995–1001
44 Zheng, Q. et al. (2010) Genome-wide double-stranded RNA sequencing reveals the functional significance of base-paired RNAs in Arabidopsis. PLoS Genet. 6, e1001141
45 Li, F. et al. (2012) Global analysis of RNA secondary structure in two metazoans. Cell Rep. 1, 69–82
46 Li, F. et al. (2012) Regulatory impact of RNA secondary structure across the Arabidopsis transcriptome. Plant Cell 24, 4346–4359
47 Wan, Y. et al. (2013) Genome-wide mapping of RNA structure using nuclease digestion and high-throughput sequencing. Nat. Protoc. 8, 849–869
48 Wan, Y. et al. (2012) Genome-wide measurement of RNA folding energies. Mol. Cell 48, 169–181
49 Lucks, J.B. et al. (2011) Multiplexed RNA structure characterization with selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). Proc. Natl. Acad. Sci. U.S.A. 108, 11063–11068
50 Aviran, S. et al. (2011) Modeling and automation of sequencing-based characterization of RNA structure. Proc. Natl. Acad. Sci. U.S.A. 108, 11069–11074
51 Loughrey, D. et al. (2014) SHAPE-Seq 2.0: systematic optimization and extension of high-throughput chemical probing of RNA secondary structure with next generation sequencing. Nucleic Acids Res. 42, e165
52 Seetin, M.G. et al. (2014) Massively parallel RNA chemical mapping with a reduced bias MAP-seq protocol. Methods Mol. Biol. 1086, 95–117
53 Kielpinski, L.J. and Vinther, J. (2014) Massive parallel-sequencing-based hydroxyl radical probing of RNA accessibility. Nucleic Acids Res. 42, e70
54 Hector, R.D. et al. (2014) Snapshots of pre-rRNA structural flexibility reveal eukaryotic 40S assembly dynamics at nucleotide resolution. Nucleic Acids Res. 42, 12138–12154
55 Incarnato, D. et al. (2014) Genome-wide profiling of mouse RNA secondary structures reveals key features of the mammalian transcriptome. Genome Biol. 15, 491
56 Wan, Y. et al. (2014) Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709
57 Siegfried, N.A. et al. (2014) RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Meth. 11, 959–965
58 Homan, P.J. et al. (2014) Single-molecule correlated chemical probing of RNA. Proc. Natl. Acad. Sci. U.S.A. 111, 13858–13863
59 Ding, Y. et al. (2014) In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700
60 Rouskin, S. et al. (2014) Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705
61 Talkish, J. et al. (2014) Mod-seq: high-throughput sequencing for chemical probing of RNA structure. RNA 20, 713–720
62 Walker, G.C. et al. (1975) T4-induced RNA ligase joins single-stranded oligoribonucleotides. Proc. Natl. Acad. Sci. U.S.A. 72, 122–126
63 England, T.E. and Uhlenbeck, O.C. (1978) Enzymic oligoribonucleotide synthesis with T4 RNA ligase. Biochemistry 17, 2069–2076
64 Jayaprakash, A.D. et al. (2011) Identification and remediation of biases in the activity of RNA ligases in small-RNA deep sequencing. Nucleic Acids Res. 39, e141
65 Ingolia, N.T. et al. (2012) The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat. Protoc. 7, 1534–1550
66 Gansauge, M.T. and Meyer, M. (2013) Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat. Protoc. 8, 737–748
67 Kwok, C.K. et al. (2013) A hybridization-based approach for quantitative and low-bias single-stranded DNA ligation. Anal. Biochem. 435, 181–186
68 Low, J.T. and Weeks, K.M. (2010) SHAPE-directed RNA secondary structure prediction. Methods 52, 150–158
69 Aviran, S. and Pachter, L. (2014) Rational experiment design for sequencing-based RNA structure mapping. RNA 20, 1864–1877
70 Shabalina, S.A. et al. (2006) A periodic pattern of mRNA secondary structure created by the genetic code. Nucleic Acids Res. 34, 2428–2437
71 Jennings, M.L. and Cui, J. (2008) Chloride homeostasis in Saccharomyces cerevisiae: high affinity influx, V-ATPase-dependent sequestration, and identification of a candidate Cl− sensor. J. Gen. Physiol. 131, 379–391
72 Harris, K.A. et al. (1995) In-vivo structural-analysis of spliced leader RNAs in Trypanosoma brucei and Leptomonas collosoma: a flexible structure that is independent of Cap4 methylations. RNA 1, 351–362
73 Adilakshmi, T. et al. (2009) Structural analysis of RNA in living cells by in vivo synchrotron X-ray footprinting. Methods Enzymol. 468, 239–258
74 Steen, K.A. et al. (2012) Fingerprinting noncanonical and tertiary RNA structures by differential SHAPE reactivity. J. Am. Chem. Soc. 134, 13160–13163
75 Rice, G.M. et al. (2014) RNA secondary structure modeling at consistent high accuracy using differential SHAPE. RNA 20, 846–854
76 Lorenz, R. et al. (2012) RNA folding algorithms with G-Quadruplexes. Adv. Bioinform. Comput. Biol. 7409, 49–60
77 Huppertz, I. et al. (2014) iCLIP: Protein-RNA interactions at nucleotide resolution. Methods 65, 274–287
78 Helwak, A. et al. (2013) Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell 153, 654–665
79 Quinn, J.J. et al. (2014) Revealing long noncoding RNA architecture and functions using domain-specific chromatin isolation by RNA purification. Nat. Biotechnol. 32, 933–940
80 Engreitz, J.M. et al. (2014) RNA-RNA interactions enable specific targeting of noncoding RNAs to nascent Pre-mRNAs and chromatin sites. Cell 159, 188–199
81 Li, F. et al. (2012) SAVoR: a server for sequencing annotation and visualization of RNA structures. Nucleic Acids Res. 40, W59–W64
82 Ouyang, Z. et al. (2013) SeqFold: genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data. Genome Res. 23, 377–387
83 Leontis, N.B. et al. (2006) The RNA Ontology Consortium: an open invitation to the RNA community. RNA 12, 533–541
