Published online 10 October 2014
Nucleic Acids Research, 2014, Vol. 42, No. 21 e165 doi: 10.1093/nar/gku909
SHAPE-Seq 2.0: systematic optimization and extension of high-throughput chemical probing of RNA secondary structure with next generation sequencing David Loughrey† , Kyle E. Watters† , Alexander H. Settle and Julius B. Lucks* School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14850, USA Received July 22, 2014; Revised August 29, 2014; Accepted September 21, 2014
ABSTRACT RNA structure is a primary determinant of its function, and methods that merge chemical probing with next generation sequencing have created breakthroughs in the throughput and scale of RNA structure characterization. However, little work has been done to examine the effects of library preparation and sequencing on the measured chemical probe reactivities that encode RNA structural information. Here, we present the first analysis and optimization of these effects for selective 2 -hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq). We first optimize SHAPE-Seq, and show that it provides highly reproducible reactivity data over a wide range of RNA structural contexts with no apparent biases. As part of this optimization, we present SHAPESeq v2.0, a ‘universal’ method that can obtain reactivity information for every nucleotide of an RNA without having to use or introduce a specific reverse transcriptase priming site within the RNA. We show that SHAPE-Seq v2.0 is highly reproducible, with reactivity data that can be used as constraints in RNA folding algorithms to predict structures on par with those generated using data from other SHAPE methods. We anticipate SHAPE-Seq v2.0 to be broadly applicable to understanding the RNA sequence–structure relationship at the heart of some of life’s most fundamental processes. INTRODUCTION RNAs play diverse functional roles in many natural cellular processes (1), and are being increasingly engineered to control these processes in many synthetic biology and biotechnology applications (2). This diverse function of * To †
RNA is intimately connected to its ability to fold into intricate structures. Recently, high-throughput techniques that combine nuclease digestion (3–5) or chemical probing (6–9) with next-generation sequencing have started to shed new light on the sequence/structure relationship of RNA. Because of the inherent multiplexing and enormous throughput offered by sequencing-based approaches, these techniques are providing some of the first ‘genome-wide’ snapshots of RNA structure (9,10)––effectively bringing RNA structural biology into the ‘omics’ era (11). Of these techniques, those that favor chemical probing over nuclease digests show the most promise because of the inherent versatility (12), higher resolution and in vivo accessibility (9–10,13–15) of many chemical probes. Several such techniques have been developed (selective 2 -hydroxyl acylation analyzed by primer extension sequencing (SHAPESeq) (6,7), DMS-Seq (9–10,13), MAP-Seq (8)) that each follow the same general protocol consisting of: (i) structuredependent modification of the RNA in vitro or in vivo; (ii) reverse transcription (RT) of the modified RNA into a cDNA pool whose length distribution reflects the location of modifications; (iii) sequencing library construction, involving the addition of platform-specific adapter sequences to the cDNA pool, optional amplification with polymerase chain reaction (PCR) and quality control assessment steps; (iv) sequencing of the library and (v) bioinformatic processing of sequencing reads and calculation of reactivity spectra for each RNA (Figure 1). While proving to be powerful, these sequencing-based techniques are complex and involve many more steps, including ligation and PCR, than approaches that use electrophoretic analysis (Supplementary Figure S1) (16–18). Very little work has been done to evaluate the impact of these extra steps. In this work, we systematically analyze and optimize SHAPE-Seq, and present a new version, SHAPE-Seq v2.0, that can obtain reactivity information for every nucleotide of an RNA without requiring an internal RT priming site. We start by analyzing SHAPE-Seq in the context of a panel
whom correspondence should be addressed. Tel: +1 607 255 3601; Fax: +1 607 255 9166; Email:
[email protected]
The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
C The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
e165 Nucleic Acids Research, 2014, Vol. 42, No. 21
PAGE 2 OF 10
quiring a specific RT priming site to be present on the target RNAs. SHAPE-Seq v2.0 significantly expands the capability of SHAPE-Seq by allowing it to be performed through a ‘kit’-like protocol independent of the RNAs studied. We also show that SHAPE-Seq v2.0 reactivity data can be readily incorporated into RNA structure prediction algorithms to give experimentally constrained predicted folds that are highly accurate, and on par with traditional SHAPE constrained folding (20).
Legend 5′
RNA DNA Illumina Adapters: Illumina RD1 (+) Illumina RD1 (-)
3′
Input RNA Treated (+)
= SHAPE Reagent Illumina RD2 Illumina P7 Illumina P5
Untreated (-) (DMSO)
RNA Modification 5′
3′
5′ P-
3′-C3
Reverse Transcription
RNA Hydrolysis & Adapter Ligation
Output (+) ssDNA 3′
3′ 3′
5′ P-
5′
3′-C3
5′ 5′
RNA preparation 3′ 3′
Output (-) dsDNA 3′ 5′
3′
Illumina Sequencing & Bioinformatic Analysis (Spats) 0.2
Theta
MATERIALS AND METHODS
3′
PCR
Output (+) dsDNA 5′
3′
Output (-) ssDNA
5′
5′ 5′
5′
0.1
0
Nucleotide Position (5′ to 3′)
Single Nucleotide Resolution Reactivity Data
Figure 1. The basic SHAPE-Seq protocol. In SHAPE-Seq (6,7), RNAs are modified with chemical probes, such as 1M7 (24), or any probe that covalently modifies the RNA in a structure-dependent fashion (12). RT of the RNAs creates a pool of cDNAs, whose length distribution reflects the distribution of modification positions. Control reactions are performed to account for RT fall-off at unmodified positions. RT primer tails contain a portion of one of the required Illumina sequencing adapters, while the other is added to the 3 end of each cDNA through a single-stranded DNA ligation. A limited number of PCR cycles are used to both amplify the library and add the rest of the required adapters prior to sequencing. A freely available bioinformatic pipeline Spats (7,26–27) is then used to align sequencing reads, correct for biases due to RT-based signal decay (26,27) and calculate reactivity spectra for each RNA. See Supplementary Figures S2–S4 for protocol details.
of RNAs used in previous benchmarking of chemical probing techniques (19–21). Specifically, we systematically investigate steps of the SHAPE-Seq protocol that differ from more traditional methods that could affect measured reactivity data, including sequence context effects of adapter ligation, adapter ligation conditions, RT primer modifications and PCR. During this process, we optimize several of these steps, and show that they do not appear to be a source of differences between SHAPE-Seq and electrophoresis-based SHAPE analysis. We also show that SHAPE-Seq is highly reproducible, and report replicate reactivity spectra for each RNA in the panel. Finally, we present SHAPE-Seq v2.0 and show that it recapitulates SHAPE-Seq reactivity spectra, but without re-
For SHAPE-Seq v1.0, RNAs were generated through in vitro transcription reactions with T7 RNA polymerase. DNA templates consisted of a preceding 17-nucleotide T7 promoter, an optional 14-nucleotide 5 structure cassette sequence (22), the desired RNA coding sequence and an optional 43-nucleotide 3 structure cassette sequence (7,22) (Supplementary Table S1). DNA templates were generated by PCR [1 ml; containing 20 mM Tris (pH 8.4), 50 mM KCl, 2.5 mM MgCl2 , 200 M each dNTP, 500 nM each forward and reverse primer, 5 pM template and 0.025 U/l Taq polymerase; denaturation at 94◦ C, 45 s; annealing 55◦ C, 30 s and elongation 72◦ C, 1 min; 34 cycles]. The PCR product was recovered by ethanol precipitation and resuspended in 150 l of TE [10 mM Tris (pH 8.0), 1 mM ethylenediaminetetraacetic acid (EDTA)]. Transcription reactions (1.0 ml, 37◦ C, 12–14 h) contained 40 mM Tris (pH 8.0), 20 mM MgCl2 , 10 mM DTT, 2 mM spermidine, 0.01% (vol/vol) Triton X-100, 5 mM each NTP, 50 l of PCRgenerated template, 0.04 U/l SuperaseIN RNase Inhibitor (Ambion) and 0.1 mg/ml of T7 RNA polymerase. The RNA products were purified by denaturing polyacrylamide gel electrophoresis (8% polyacrylamide, 7 M urea, 35 W, 3 h), excised from the gel using an appropriately placed sacrifice lane for ultraviolet shadowing and recovered by passive elution and ethanol precipitation. The purified RNA was resuspended in 50 l TE, and concentrations were measured with the Qubit fluorimeter. All of the RNAs with the flanking structure cassettes contained a unique four-nucleotide bar code to multiplex SHAPE-Seq v1.0 experiments as described previously (7) (Supplementary Table S1). In general, QuSHAPE and SHAPE-Seq v1.0 experiments were performed on RNAs containing the optional structure cassettes, while SHAPE-Seq v2.0 experiments were performed on RNAs without these cassettes (Supplementary Table S1). For SHAPE-Seq v2.0 RNAs, each RNA sequence was altered to begin with GG, and cloned between the T7 RNA promoter and Hepatitis ␦ (HepD) antigenomic ribozyme and PCR amplified. The resulting templates were transcribed in vitro using T7 RNA polymerase as above, and purified by standard gel excision methods (23). For the HepC IRES and cyclic-di-GMP Riboswitch, standard runoff transcription was performed without the ribozyme to obtain higher yields. A table of all RNA sequences used in this study can be found in Supplementary Table S1.
PAGE 3 OF 10
RNA modification For the initial benchmarking and optimization studies, all RNAs were folded and modified individually with 1methyl-7-nitroisatoic anhydride (1M7) (6.5 mM, final) in batches (24). The RNAs (50 pmol in 60 l) were denatured by heating at 95◦ C for 2 min, and snap-cooled on ice for 60 s before the addition of 30 l of folding buffer. The sample was refolded in the 1× folding buffer (10 mM MgCl2 , 100 mM NaCl and 100 mM HEPES (pH 8.0)), in a total volume of 90 l for 20 min at 37◦ C. These reaction volumes were split and added to 5 l 1M7 (10×, 65 mM in dry dimethyl sulfoxide (DMSO)) and 5 l dry DMSO to form the (+) and (–) reactions, respectively. Reactions were complete after 70 s at 37◦ C. RNAs were recovered by the addition of 50 l of water, 10 l of 3M NaOAc, 1 l of 20 mg/ml glycogen and 300 l of 100% ethanol, followed by incubation at –80◦ C for 30 min and centrifugation (15k rpm) at 4◦ C for 30 min. Multiple batches of the same RNAs were combined into an overall stock of 700 pmol RNA (350 pmol unmodified (+), 350 pmol modified (–), each in 70 l TE), and stored at – 20◦ C until use (