bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
The transcriptome-wide landscape and modalities of EJC binding in adult Drosophila

Ales Obrdlik1,*, Gen Lin1, Nejc Haberman2, Jernej Ule2,3, and Anne Ephrussi1,*

1 European Molecular Biology Laboratory, Heidelberg, 69117, Germany
2 Department for Neuromuscular Diseases, UCL Institute of Neurology, London WC1N 3BG, United Kingdom
3 The Francis Crick Institute, London NW1 1AT, United Kingdom

* corresponding authors: [email protected], [email protected]

KEYWORDS: Exon Junction Complex, Drosophila EJC, ipaRt, organismal interactome, DSP crosslinking, EJC-ipaRt, cytoplasm, EJC assembly regulation
ABSTRACT
Splicing-dependent assembly of the exon junction complex (EJC) at canonical sites -20 to -24 nucleotides upstream of exon-exon junctions in mRNAs occurs in all higher eukaryotes and affects most major regulatory events in the life of a transcript. In the cytoplasm of mammalian cells, the EJC is essential for efficient RNA surveillance, while in Drosophila the most essential cytoplasmic EJC function is in localization of oskar mRNA. Here we developed a method for isolation of protein complexes and associated RNA-targets (ipaRt), which provides a transcriptome-wide view of RNA binding sites of the fully assembled EJC in adult Drosophila. We find that the EJC binds at canonical positions, with highest occupancy on mRNAs from genes comprising multiple splice sites and long introns. Moreover, the occupancy is highest at junctions adjacent to strong splice sites, CG-rich hexamers and RNA structures. These modalities have not been identified by previous studies in mammals, where more binding was seen at non-canonical positions. The most highly occupied transcripts in Drosophila have an increased tendency to be maternally localized, and are more likely to derive from genes involved in differentiation or development. Taken together, we identify the RNA modalities that specify EJC assembly in Drosophila on a biologically coherent set of transcripts.
INTRODUCTION
The exon junction complex (EJC) consists of a hetero-tetramer core composed of eIF4AIII, Mago, Y14 and Barentsz (Btz) (1,2), and auxiliary factors that form the EJC periphery (3). The complex assembles on mRNAs during splicing, -20 to -24 nucleotides (nt) upstream of exon-exon junctions (4). EJC assembly is a multi-step process that begins with the CWC22-mediated deposition of the DEAD-box helicase eIF4AIII on nascent pre-mRNAs (5-7) and is followed by the recruitment of Mago and Y14, forming a pre-EJC intermediate. The pre-EJC is stably bound to RNA due to the ATPase-inhibiting activity of the (non-RNA-binding) Mago-Y14 heterodimer, which “locks” the eIF4AIII helicase in its RNA-bound state (1,2,8-10). Once formed, the pre-EJC is completed by recruitment of Barentsz (Btz), forming mature EJCs (1,3,11). The roles of the EJC in post-transcriptional control of gene expression are manifold. In the nucleus, EJC subunits have been shown to play a role in splicing (12-15), mRNA export (16), and nuclear retention of intron-containing RNAs (17). In the cytoplasm, the EJC is reported to play a role in translation (18,19), nonsense-mediated decay (NMD) (7,20-26) and RNA localization (23,27-30). While most of the EJC’s functions appear conserved, in Drosophila the EJC is not crucial for NMD (31), but it is essential for oskar mRNA transport and localization within the developing oocyte (23,27-30,32,33).

To better understand the engagement of the EJC in the fly we set out to define the EJC mRNA interactome in adult Drosophila melanogaster using a novel strategy that stabilizes mRNA binding proteins (mRBPs) associated with their RNA templates within multi-protein mRNP assemblies. Through the use of the crosslinking agent dithio(bis-)succinimidylpropionate (DSP), this method captures stable and transient protein interactions in close proximity (34,35) and allows definition of the binding sites of specific protein (holo-) complexes associated with their RNA templates (ipaRt). Indeed, our analysis of EJC protected sites defined by ipaRt reveals that, in Drosophila, EJC binding occurs at canonical deposition sites (4), with a median coordinate ~22 nt upstream of exon-exon junctions. While in mammals EJC-mediated protection outside canonical sites has been reported (36,37), we find that in Drosophila the degree of non-canonical EJC-mediated RNA protection is minimal. We show in Drosophila that RNA polymerase II transcripts primarily protected by the EJC derive from genes involved in differentiation or development, while mRNAs primarily protected by mRBPs derive from genes with homeostatic functions. Our analysis suggests that, in Drosophila, the EJC’s bias for transcripts is a consequence of several modalities in the genes’ architecture, particularly the number of splice sites and intron length. Moreover, EJC binding is enhanced by adjacent RNA secondary structures and CUG-rich hexamer sequences located 3’ to the EJC binding site. These modalities have not been identified in previous studies of mammalian EJC binding sites (36-38), which could reflect either the greater specificity of our method for the fully assembled EJC or differences in EJC binding between flies and humans. Our study provides a first comprehensive transcriptome-wide view of EJC-RNA interactions in a whole organism, which unravels RNA modalities that contribute to the unforeseen biological coherence of the bound transcripts.
RESULTS
Stabilization of the Exon Junction Complex on mRNAs by DSP
The EJC is maintained in its RNA-bound state through the direct interaction of the Mago-Y14 heterodimer with the otherwise dynamically binding RNA helicase eIF4AIII (1-3,9,22,39). EJC binding to RNA is labile, as under stringent washing conditions (~ 1M salt concentrations) the interaction of eIF4AIII and Mago-Y14 is abolished and the RNA is released from the complex (36). We therefore hypothesized that the EJC might be stabilized on its RNA targets by introducing covalent bonds between the Mago-Y14 heterodimer and eIF4AIII, which would render the protein-RNA complex resistant to the high salt concentrations commonly used in iCLIP studies. Furthermore, a stabilized EJC complex would enable us to “pull” on EJC subunits other than the RNA-binding eIF4AIII, ensuring isolation of the complex under stringent conditions. To test this we made use of the bivalent crosslinking agent dithio(bis-)succinimidylpropionate (DSP), which reversibly crosslinks primary amino groups of polypeptides in close proximity (34,35). We isolated poly(A) containing mRNPs on an oligo d(T)25 resin (40,41) from cytoplasm of adult Drosophila either kept untreated or treated with UV, DSP, or UV plus DSP (Figure 1). SDS-PAGE silver staining and western analyses of mRNA-RNP precipitates (Figure 1A,B) revealed that irradiation of cytoplasmic lysates by UV ex vivo only marginally increased co-precipitation of proteins with poly(A) containing RNAs (Figure 1A, lane 7 and 8). Upon UV irradiation, only faint signals of known RBPs such as eIF4AIII and cytoplasmic poly(A) binding protein (PABP) were detected in the poly(A) RNA precipitates. Non-RNA-binding EJC subunits such as Y14 were not detected (Figure 1B, lane 7 and 8, Figure 1C), in agreement with previous observations (41). In contrast, treatment of cytoplasmic lysates with DSP led to strong protein co-precipitation with mRNAs (Figure 1A, compare lanes 7-10). Western analysis of precipitates from DSP and UV-DSP treated cytoplasmic lysates revealed strong signals not only for direct mRNA binding proteins such as eIF4AIII and PABP, but also for mRNP components not directly bound to RNA, such as Y14 (compare Figure 1A-B, lanes 7-10, Figure 1C). In none of the precipitates were cytoplasmic proteins such as kinesin heavy chain (Khc) or the small ribosomal subunit protein RpS6 observed (Figure 1B, lanes 7-10, Figure 1C), confirming the stringency of the assay. Furthermore, precipitates from the “beads” only control were free of all proteins tested (Figure 1A, lane 6), indicating that artifacts due to DSP or UV treatment are unlikely. These observations suggest that, for EJC stabilization on RNA, DSP-mediated covalent bond formation between individual EJC subunits is superior to UV-crosslinking and support the use of this agent when studying other mRNP assemblies (Figure 1D).
ipaRt: a novel approach for high quality isolation of EJC complexes associated to RNA templates
Of all the tagged EJC subunits we tested in the fly, GFP-Mago showed the highest degree of incorporation into endogenous EJCs (Figure S1B). Furthermore, the eIF4AIII subunit was found to co-sediment with polysome fractions in sucrose density gradients, independently of Mago-Y14 (Figure S1C), suggesting that the DEAD-box helicase might have yet unknown EJC independent functions in the fly (see Supplementary Results). Therefore, we carried out EJC specific RNA immunoprecipitation (RIP) from DSP treated cytoplasmic extracts prepared from GFP-Mago and GFP-tag expressing flies. By titrating salt and detergent concentrations, we identified stringent washing conditions (see Materials and Methods) that yielded high quality RNA profiles from GFP-Mago RIPs and only scant RNA profiles from GFP control RIPs, compared with standard IP washing conditions (Figure 2A; compare lanes 2-5). To test whether the presence of RNA in GFP-Mago precipitates was a consequence of its incorporation into the EJC rather than of transient interactions of Mago with other RBPs, we treated the immunoprecipitates from DSP treated or untreated lysates with RNaseI. Western analysis revealed the presence of all tested EJC subunits in the GFP-Mago precipitates (Figure 2B, lanes 7-8), but no protein other than GFP itself in the GFP only controls (Figure 2B, lanes 3-4). We detected no signal for the proteins probed in the “beads only” control precipitates (Figure 2B, lanes 2 and 6) and observed signals for the RNA non-binding Khc only in lysate inputs (Figure 2B, compare lanes 1, 5; lanes 2-4, 6-8), indicating high stringency of the assay.
The stabilizing effect of DSP on the EJC is evident from the enhanced eIF4AIII signals in GFP-Mago precipitates when cytoplasm was treated with DSP prior to immunoprecipitation (Figure 2B, lanes 5, 7-8). Conversely, the PABP signal in the GFP-Mago precipitates disappeared upon incubation with RNase of both the DSP treated and the untreated samples (Figure 2B, lanes 1, 5; lanes 2-4, 6-8). This shows that even when exposed to DSP, proteins whose association with the EJC is bridged by RNA can be removed by RNA fragmentation.
To confirm the “cleansing effect” of RNaseI, we performed IPs from DSP treated cytoplasm with or without an RNA fragmentation step and analyzed the precipitates by tandem mass spectrometry (MS). Expression set analysis of MS signals obtained in GFP-Mago and GFP control precipitates identified 45 versus 35 significantly enriched proteins in the RNase untreated and treated samples respectively (Figure 2C, Figure S3). While all EJC subunits were enriched independently of RNA integrity (Figure 2C), only upon RNA fragmentation was Btz enriched to a similar degree as Mago, Y14, and eIF4AIII. Except for the poly(A) binding protein Nab2 (42), no EJC-unrelated RBPs showed significant enrichment upon RNA fragmentation (Figure 2C), showing that RNA fragmentation by RNaseI increases the sensitivity and specificity of EJC IPs.
Based on these findings, we conclude that DSP is a suitable tool for stabilization and isolation of EJC RNA complexes from animal tissues and that our protocol provides a reliable approach for isolation of proteins (or protein complexes) associated to their RNA targets. Consequently we termed our experimental strategy “ipaRt”.
EJC binding in Drosophila cytoplasm maps to canonical deposition sites
We sought to determine which sites in the Drosophila transcriptome are protected by the EJC as opposed to other mRNA binding proteins (mRBPs). To this end, we performed EJC ipaRt and oligo(dT) mediated mRNP capture in parallel, both followed by an RNase digestion step (see Materials and Methods), and constructed cDNA libraries of the protein-protected RNA fragments as previously described for iCLIP (43). Analysis of the sequencing results revealed that more than 92% of all reads aligned uniquely to the Drosophila genome (Figure S3B). 85% of EJC ipaRt reads mapped to exons, as opposed to 34% of the mapped reads in mRBP footprinting, indicating specificity of the ipaRt library (Figure 3A).
To define the median binding coordinates of the protected sites, we determined the sequence coverage +/- 50 nt of exon-exon junctions in EJC ipaRt and mRNP footprinting (Figure 3B) and averaged the coverage profile over all junctions. The mean coverage profile in mRBP footprinting appeared evenly distributed, indicating an absence of protection bias (Figure 3B). In contrast, the protected sites in EJC ipaRt were located almost exclusively in upstream exons, with an EJC coverage median -21.7 nt 5’ to the exon-exon junction (Figure 3B), consistent with previous studies (4,16,44). Similarly, we estimated a 13 nt long region of saturated RNA protection in the EJC RNA footprints, from -27 to -15 nt 5’ of the exon-exon junction (Figure S3D) (4,8,36).
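The coverage analysis described above can be illustrated with a minimal Python sketch. It assumes that each protected fragment has already been expressed as a (start, end) offset relative to its exon-exon junction, with negative offsets lying in the upstream exon. The function names and the toy fragments are hypothetical and are not taken from the ipaRt analysis pipeline.

```python
def junction_profile(fragments, window=50):
    """Pooled per-position coverage from -window to +window around junctions.

    fragments: iterable of (start, end) offsets relative to the junction,
    both inclusive; negative offsets lie in the upstream exon (assumption).
    """
    counts = {p: 0 for p in range(-window, window + 1)}
    for start, end in fragments:
        # Clip each fragment to the analysed window before counting.
        for p in range(max(start, -window), min(end, window) + 1):
            counts[p] += 1
    return counts


def coverage_median(counts):
    """Position at which cumulative coverage first reaches half of the total."""
    total = sum(counts.values())
    if total == 0:
        return None  # no coverage, no median
    running = 0
    for p in sorted(counts):
        running += counts[p]
        if 2 * running >= total:
            return p


# Toy fragments only, chosen to resemble a footprint upstream of a junction.
toy_fragments = [(-27, -15), (-26, -16), (-28, -17)]
print(coverage_median(junction_profile(toy_fragments)))
```

Dividing each count by the number of junctions would give the mean profile; the median position is unaffected by that scaling.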
Protection sites remote from exon-exon junctions in EJC ipaRt are not of EJC origin
Studies of mammalian EJCs have reported a high frequency of EJC-mediated protection outside of canonical binding sites (non-canonical EJC deposition sites) (36,37). To test whether this non-canonical distribution is representative of EJC protection across the Drosophila transcriptome, we determined coverage maxima for every exon-exon junction protected by EJC. The majority (~ 95.5%) of protection peaks in EJC ipaRt libraries mapped to sites of canonical EJC binding, proximal to exon-exon junctions (Figure 3C,D). The remaining ipaRt protection coverage peaks were located remotely (>50 nt) from canonical EJC binding regions (Figure 3D,G and S4A).
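As an illustration of the proximal versus remote distinction, the following Python sketch classifies coverage peaks by their distance from the canonical window. The window of -24 to -20 nt and the 50 nt cutoff come from the text; measuring the distance from the nearest edge of that window is an assumption made here for illustration, and the function names are hypothetical.

```python
def classify_peak(offset, canonical=(-24, -20), max_dist=50):
    """Label a peak as 'proximal' or 'remote'.

    offset: peak position relative to the exon-exon junction (negative = upstream).
    Distance is measured to the nearest edge of the canonical window (assumption).
    """
    lo, hi = canonical
    if offset < lo:
        dist = lo - offset
    elif offset > hi:
        dist = offset - hi
    else:
        dist = 0
    return "remote" if dist > max_dist else "proximal"


def proximal_fraction(offsets):
    """Fraction of peaks classified as proximal."""
    labels = [classify_peak(o) for o in offsets]
    return labels.count("proximal") / len(labels)


# Toy peak offsets, not real data.
print(proximal_fraction([-22, -21, -25, -200]))
```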
Our analysis shows that proximal peaks map mainly to internal exons (79%), to first exons (20.6%), and only minimally to terminal exons (0.4%), as expected given the splicing-dependent deposition of EJCs upstream of splice junctions. Remote protection site peaks mostly mapped to internal exons (67%) and last exons (25%), and only minimally (8%) to first exons of the bound mRNAs (Figure 3F).
Three main features characterized the remote peaks in our EJC ipaRt libraries: low abundance, low sequencing coverage (Figure 3E,G) and relatively low expression compared with proximal peaks (Figure S4C,D). Further analysis using AME (45) revealed that the remote peaks in EJC ipaRt are significantly enriched in RNA binding motifs corresponding to known Drosophila splicing regulators (Figure S4E), whose binding might be a consequence of DSP crosslinking and co-purification due to direct interaction with the EJC. Our analysis shows that, in contrast to EJC in mammals (36,37), the proportion of remote EJC binding sites is negligible in Drosophila.
EJCs mark multi-intron genes important for differentiation and development
We determined which RNAs are bound preferentially by EJC versus other RBPs using DESeq (46). This revealed a bias in EJC binding towards mRNAs of protein coding genes comprising more than one exon (Figure S5D) and identified 3332 enriched and 4436 depleted genes (FDR < 0.05 and log2 fold change > log2(1.5)). Gene ontology (GO) term analysis of EJC enriched genes (Figure 4A) revealed significant association with genes involved in development or specialized cellular functions such as cell polarity, differentiation and cell signaling. In contrast, genes with homeostatic functions, involved in cytoskeletal and chromatin organization, in transcription or translation, and metabolic processes are biased for protection by other RBPs (Figure 4B, compare left and right plot). The bias of EJC binding to mRNAs from genes with functions in development and cell polarity suggested that transcripts under spatial or temporal control might also show a preference for EJC binding. Consistent with this hypothesis, analysis of EJC and RBP protection sites on Drosophila transcripts annotated in the FlyFISH RNA localization database (47,48) revealed that localized maternal mRNAs are more likely to be bound by EJC than non-localizing transcripts (Figure 4C).
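The enrichment thresholds quoted above can be applied to a table of per-gene statistics as in the Python sketch below. It assumes that log2 fold changes and FDR values have already been computed (for example by DESeq) and that depletion uses the mirrored cutoff, log2 fold change < -log2(1.5); the text states only the positive cutoff, so the symmetric treatment is an assumption. Gene names and values are invented for illustration.

```python
import math


def classify_gene(log2fc, fdr, fc_cutoff=1.5, fdr_cutoff=0.05):
    """Return 'enriched', 'depleted' or 'unchanged' for one gene.

    The negative cutoff for depletion is assumed to mirror the positive one.
    """
    threshold = math.log2(fc_cutoff)
    if fdr < fdr_cutoff and log2fc > threshold:
        return "enriched"
    if fdr < fdr_cutoff and log2fc < -threshold:
        return "depleted"
    return "unchanged"


# Hypothetical per-gene results: name -> (log2 fold change, FDR).
toy_results = {
    "geneA": (1.2, 0.001),
    "geneB": (-0.9, 0.01),
    "geneC": (0.3, 0.001),
    "geneD": (2.0, 0.2),
}
for name, (lfc, fdr) in toy_results.items():
    print(name, classify_gene(lfc, fdr))
```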
It was reported that in mammalian cells EJCs are enriched at mRNAs from alternatively spliced genes (38). In the fly, the EJC has been reported to promote correct splicing of long intron-containing genes (12,14). This relationship between EJCs and gene architecture led us to ask which features could explain the gene-to-gene variation in EJC deposition. We used a multiple regression model (see Materials and Methods) to assess how five features (number of introns, maximum intron length, transcript abundance, transcript length and the degree of alternative splicing) influence EJC deposition on a gene. We checked that the effects estimated from the model hold given the underlying correlations between the features. For example, after accounting for intron number, which has a strong effect on EJC binding (Figure S6A), we determined that intron length has a strong effect on EJC deposition (Figure S6B), while alternative splicing has only a minimal effect on EJC binding (Figure S6C). We assessed the relative importance of each feature from the multivariate regression analysis (Figure 4D). Two features dominated our model of EJC deposition: the number of introns per gene and the maximum intron length of the gene, both of which are positively associated with EJC deposition (Figure 4D). The enhancement of EJC deposition with intron length agrees with previous studies showing that splicing of long intron-containing genes is sensitive to EJC activity (12,14).
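One common way to compare the influence of several correlated features is to fit an ordinary least-squares regression on standardized variables and compare the resulting coefficients. The Python sketch below does this with the standard library only. It is not the model used in this study (see Materials and Methods for that); the use of standardized coefficients as a measure of relative importance, the function names and the toy values are all assumptions made for illustration.

```python
def standardize(values):
    """Centre to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(n):
            if r != i:
                factor = m[r][i] / m[i][i]
                m[r] = [x - factor * y for x, y in zip(m[r], m[i])]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardized_betas(features, response):
    """features: dict of name -> list of values; response: list of values.

    Returns the least-squares coefficient of each standardized feature.
    """
    names = list(features)
    z = [standardize(features[name]) for name in names]
    y = standardize(response)
    xtx = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
    xty = [sum(p * q for p, q in zip(zi, y)) for zi in z]
    return dict(zip(names, solve(xtx, xty)))


# Toy data: the response depends on the first feature only.
toy_features = {
    "n_introns": [1, 2, 3, 4, 5, 6],
    "alt_splicing": [1, 0, 1, 0, 1, 0],
}
toy_response = [2 * v for v in toy_features["n_introns"]]
print(standardized_betas(toy_features, toy_response))
```

Because all variables are on the same scale after standardization, the coefficients can be compared directly, although strongly correlated features still make such comparisons harder to interpret.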
Splice site strength and hexamer composition influence EJC deposition
We estimated enrichment of EJC at the junction level using reads within +/- 50 nt of the splice site and observed a strong dependency on the gene EJC estimate (Pearson correlation = 0.66, p < 2e-16). This suggests that the junction’s EJC profile is primarily determined by its parent gene architecture. We next tested whether any exon-exon junction deviates significantly from its parent gene EJC binding. ~31% of detected junctions have an enrichment that deviates significantly from the gene level (Figure 5A). Furthermore, we observed that EJC reads cover only a subset of junctions within a gene and show a higher read-coverage coefficient of variation than reads protected by other RBPs (Figure S6D and S6E).
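The two summary statistics used in this paragraph, the Pearson correlation and the coefficient of variation, are sketched below in plain Python for reference. The inputs are invented numbers, and the use of the sample standard deviation in the coefficient of variation is an assumption, since the text does not specify it.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sd / mean


# Hypothetical junction-level and gene-level enrichment estimates.
junction_lfc = [0.2, 1.1, 1.9, 0.7]
gene_lfc = [0.1, 1.0, 1.5, 1.2]
print(pearson(junction_lfc, gene_lfc))

# Hypothetical read coverage across the junctions of one gene.
print(coefficient_of_variation([12, 40, 3, 25]))
```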
We asked if a known sequence context related to splicing might be responsible for the variability in EJC binding to exon-exon junctions within a gene. We found that strong 5’ and 3’ splice site signals and the presence of an intronic 5’ ISE (intronic splicing enhancer) correlate with increased EJC deposition, while the presence of an exonic 5’ ESS (exonic splicing silencer) correlates with reduced EJC deposition (Table S1, Figure 7A), consistent with the fact that EJC assembly is dependent on splicing. Surprisingly, 5’ and 3’ ESEs (exonic splicing enhancers) and intronic ISEs at 3’ splice sites have no effect. Next we tested the effects of unannotated hexamers, while accounting for ESS, ISE and splice strength. We detected 63 hexamers that are associated with a change in EJC deposition. Clustering the hexamers based on sequence similarity revealed two major groups, with either a negative or a positive effect on EJC assembly (Figure 5B). Strikingly, 5 out of the 16 hexamers associated with increased EJC deposition contain the trinucleotide CUG, and 28 out of the 47 hexamers associated with decreased EJC deposition contain the trinucleotide UUU (Figure 5B). The strongest effect of these CUG or UUU trinucleotides is observed when they are present around the region downstream of EJC binding (approximately -16 to -18 nt). This indicates that the sequence composition of this region is a strong determinant of EJC binding (Figure 5C).
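The hexamer bookkeeping behind this analysis can be illustrated with a short Python sketch: it enumerates overlapping hexamers in a sequence window and reports which of them contain a given trinucleotide. The function names and the toy sequence are hypothetical, and the statistical test for association with EJC deposition is not reproduced here.

```python
from collections import Counter


def hexamer_counts(seq, k=6):
    """Count overlapping k-mers in a sequence, reported in the RNA alphabet."""
    rna = seq.upper().replace("T", "U")
    return Counter(rna[i:i + k] for i in range(len(rna) - k + 1))


def with_trinucleotide(hexamers, motif):
    """Sorted list of hexamers that contain the given trinucleotide."""
    return sorted(h for h in hexamers if motif in h)


# Toy window sequence, not taken from the data.
counts = hexamer_counts("ACTGCTGA")
print(with_trinucleotide(counts, "CUG"))
print(with_trinucleotide(counts, "UUU"))
```

Applied to the numbers in the text, 5 of 16 positive hexamers contain CUG (about 31%) and 28 of 47 negative hexamers contain UUU (about 60%).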
RNA structure modulates the degree and position of EJC binding in Drosophila
Deposition of an EJC at the first exon-exon junction and presence of a structured element next to the deposition site are required for localization of oskar mRNA at the posterior pole of the Drosophila oocyte (30,49). Interestingly, we observed a 2.38-fold stronger enrichment of the first oskar exon-exon junction than anticipated from oskar mRNA enrichment in our data, indicating that structures near EJC binding sites might affect EJC assembly.
To test whether RNA structures might have an impact on EJC binding, we estimated for every junction the probability of base pairing for each nucleotide from -37 to +28 nt relative to the splice site. We observed three distinct average base pairing probability (bpp) profiles for exon-exon junctions with unaffected, positively correlated, or negatively correlated EJC binding (Figure 6A). Two regions showed significantly different bpps for junctions with a positive versus a negative effect on EJC binding (Figure 6B). The first region, located in the canonical EJC binding site, showed a decreased bpp for junctions with a positive EJC binding effect (Figure 6A,B) and an increased bpp for junctions with a negative EJC binding effect (Figure 6A,B). Surprisingly, the second region, located directly downstream of the canonical deposition site (Figure 6A,B), showed an elevated bpp near junctions with positive EJC binding, but a decreased bpp at junctions with negative EJC binding (Figure 6A). This result indicates that while EJC binding in Drosophila occurs on single stranded RNA (ssRNA), in agreement with previous reports (1,9), EJC binding to RNA may be enhanced by RNA secondary structures proximal to the EJC binding site.
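A per-nucleotide base pairing probability can be derived from pairwise probabilities by summing, for each position, the probabilities of all pairs it takes part in. The Python sketch below shows this step and the averaging of profiles over junctions. It assumes the pair probabilities were produced beforehand by an RNA folding tool, which is not specified here; the input format, function names and numbers are hypothetical.

```python
def per_nt_bpp(pair_probs, length):
    """Per-nucleotide pairing probability from pairwise probabilities.

    pair_probs: dict mapping (i, j) index pairs (0-based) to the probability
    that positions i and j are paired (assumed output of a folding tool).
    """
    bpp = [0.0] * length
    for (i, j), p in pair_probs.items():
        bpp[i] += p
        bpp[j] += p
    return bpp


def mean_profile(profiles):
    """Position-wise average of equally long bpp profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


# Two toy junction windows of six nucleotides each.
window_a = per_nt_bpp({(0, 5): 0.8, (1, 4): 0.5}, 6)
window_b = per_nt_bpp({(2, 5): 0.4}, 6)
print(mean_profile([window_a, window_b]))
```

Grouping junctions by their EJC binding effect before averaging would yield one mean profile per group, as in the comparison described above.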
Given the redundant information between bpp of each nt pair, we performed dimension reduction on bpp profiles within the -24 to -11 region (Materials and Methods, Figure 6B) using a Gaussian mixture model, to facilitate subsequent analysis of EJC binding. We obtained 4 folding categories (Figure 6C) and observed an association between significant positive EJC binding (log2 fold change > log2(1.5)) and junctions harboring folding categories 2 and 3, which contain a bpp elevation downstream of, or surrounding, EJC binding sites (Figure 6C, D). Junctions with an unstructured profile (folding category 1) show no such bias, and junctions with a bpp elevation in the EJC binding site (category 4) are associated with a negative EJC binding effect. This suggests that, when located near EJC deposition sites, RNA structures may positively affect EJC binding in Drosophila (Figure 6C, 6D lanes 2 and 3) and could explain the enhanced binding of EJC to the first exon-exon junction in oskar mRNA.
Previous in vitro experiments have shown that RNA secondary structures affect EJC assembly site coordinates (50). We asked whether predicted stable RNA stem structures within or flanking a canonical EJC binding site at exon-exon junctions have any impact on the precise coordinates of EJC assembly in the fly. For this we estimated potential dsRNA content in region A, spanning from -30 to -21, and in region B, spanning from -23 to -14, within exon-exon junctions (see sketch in Figure 6E). We focused on junctions with a positive EJC binding effect to allow robust estimates based on junctions with high coverage. The analysis revealed a strong downstream shift of EJC assembly coordinates when dsRNA content was high in region A and low in region B, and only a minor shift when the dsRNA content was low in region A and high in region B (Figure 6F). Taken together, these observations confirm that the structural context of exon-exon junctions not only affects EJC binding efficiency, but also directs the site of EJC assembly.
Comparison of human and Drosophila datasets reveals common factors that influence EJC enrichment
Previous studies of the EJC in Drosophila and human have reported differences between these two species in terms of EJC protein components and cellular function. We asked whether any of the factors we identified in Drosophila would also influence EJC deposition in human cell lines. Using the tools and models described in previous sections, we analyzed CLIP enrichment of the EJC component BTZ in HeLa cells (51). Results from the multiple regression analysis showed that the number of introns is a major determinant of the gene-to-gene variation in EJC deposition (Figure S8A). This agrees with our finding in Drosophila and suggests that this mechanism is conserved between Drosophila and humans. In contrast to our observations in the fly, we found that in mammals the extent of alternative splicing in genes has a small but positive effect on EJC deposition (Figure S8A), in agreement with previous findings that alternatively spliced genes are over-represented among EJC enriched genes (37,38). However, in contrast to Drosophila, in humans intron length does not facilitate but rather antagonizes EJC mRNA binding (Figure S8A).
Next we investigated the factors that determine variability in EJC
377
deposition within genes (Table S2, Figure S8D). Similar to our Drosophila
378
findings, high 5’ and 3’ splice strength and presence of an ISE at the 5’ junction
379
enhances EJC deposition within a junction. In this analysis, presence of ESEs
380
in the upstream, and of ESS in the downstream 50nt strongly affects EJC
381
deposition, something we did not observe in Drosophila (Figure 7A). A possible
382
explanation for this is that ESE/ESS/ISE sequences are annotated based on
383
experiments in human cell lines and may not exert a similar effect in Drosophila.
384
Among the 238 ESEs, we observed 28 hexamers containing AGAA which is
385
similar to a motif (GAAGA) found in a previous EJC clip study (37,38). We
386
separated ESEs according to presence of AGAA, and indeed found that the
387
ESEs containing AGAA are associated with stronger EJC deposition. We asked
388
whether bpp-profiles differ between junctions with positive and negative EJC
389
binding. We found that an overall negative bpp (-24 to -18) around the EJC
390
deposition site favors EJC deposition (Figure S8B,C). Unlike Drosophila, we
391
did not find other regions whose bpp are associated with increased EJC
392
deposition in mammalian cells. Taken together, these results indicate that
393
across Drosophila and human, not only do conserved regulatory mechanisms
16
bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
394
such as intron counts, splice strength, or structural hindrance within EJC
395
deposition sites influence EJC deposition, but also divergent regulatory factors
396
such as ESEs in mammals or RNA folding in proximity of EJC deposition sites
397
in Drosophila can affect EJC deposition.
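
The separation of ESE hexamers by presence of the AGAA motif can be sketched as
follows. This is a minimal illustration and not the pipeline used in this study:
the function name `split_by_motif` and the example hexamers are hypothetical
placeholders, and real input would be the annotated set of 238 ESEs.

```python
def split_by_motif(hexamers, motif="AGAA"):
    """Partition hexamers into those that contain `motif` and those that do not."""
    with_motif = [h for h in hexamers if motif in h.upper()]
    without_motif = [h for h in hexamers if motif not in h.upper()]
    return with_motif, without_motif


# Hypothetical example hexamers (placeholders, not taken from the ESE annotation).
example_hexamers = ["GAAGAA", "AGAAGA", "CTGCTG", "TTTGGA"]
with_agaa, without_agaa = split_by_motif(example_hexamers)
print(with_agaa)     # hexamers containing AGAA
print(without_agaa)  # hexamers lacking AGAA
```

EJC deposition at junctions carrying hexamers from each group could then be
compared separately.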

Features that inform on EJC binding may predict mRNA localization

The bias of EJC binding to mRNAs from genes with functions in development and
cell polarity suggested that EJC binding might be indicative of transcripts
under spatial or temporal control. Consistent with this hypothesis, analysis of
EJC and RBP protection sites on Drosophila transcripts annotated in the FlyFISH
RNA localization database (47,48) revealed that localized maternal mRNAs are
bound by the EJC to a greater degree than non-localizing transcripts (Figure
4C). We postulated that the modalities underlying EJC enrichment at the gene and
junction level can inform us about the localization of a transcript, and applied
decision tree learning to RNA localization using the R package rpart (52). Given
the imbalance between the localized and non-localized datasets, we used Cohen’s
kappa coefficient to assess the predictive value of the model using different
data groups. The use of gene features and features of the most enriched EJC
junction is sufficient to achieve predictive accuracy comparable to that of a
model incorporating all possible features (Figure 8A). Using the gene and
highest EJC-covered junction information as a working model, we asked which
variables are important in this model. To prevent bias in the estimate, the data
were bootstrapped 1000 times. We observed that transcript length, maximum intron
size in the gene, and 5’ splice strength and folding of the most enriched
junction are more useful predictors than the other variables (Figure 8B). This
suggests that features of gene architecture that orchestrate EJC deposition can
distinguish localizing from non-localizing mRNAs and might be important for
mRNA localization.
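
To illustrate why Cohen’s kappa is preferable to raw accuracy for imbalanced
classes, the sketch below computes kappa from two label lists. It is a generic
textbook implementation and not the code used for the rpart models; the function
name `cohens_kappa` and the toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(observed, predicted):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(observed)
    # Observed agreement: fraction of items on which both labelings match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: product of the marginal label counts, summed over labels.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[k] * pred_counts[k] for k in obs_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # convention chosen here: a single shared label is uninformative
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy data: 2 localized ("L") and 8 non-localized ("N") transcripts.
truth = ["L", "L"] + ["N"] * 8
always_n = ["N"] * 10  # trivial classifier that never predicts "L"
accuracy = sum(t == p for t, p in zip(truth, always_n)) / len(truth)
print(accuracy)                       # high despite no "L" being recovered
print(cohens_kappa(truth, always_n))  # no agreement beyond chance
```

In this toy case the trivial classifier reaches 80% accuracy while its kappa is
zero, which is the behavior that motivates using kappa on imbalanced data.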

DISCUSSION

ipaRt: a method for high-confidence identification of protein binding sites on RNAs in vivo

We have profiled the landscape of EJC binding across the transcriptome of a
whole animal, Drosophila melanogaster, and determined the parameters that
influence the distribution of the complex on RNAs in the organism. Previous
knowledge of EJC-RNA interactions was based on UV-crosslinking experiments in
specific cell types grown as homogeneous cultures for the individual studies
(37,38,43,53-60). While UV-crosslinking remains a method of choice for
identification of protein binding sites on nucleic acids, the inefficient
penetration of UV light into tissues and organisms makes the method most useful
when applied to cells in culture. In contrast, our analysis of EJC distribution
in the tissues of whole Drosophila flies was made possible by ipaRt, which
employs the crosslinking agent DSP to freeze protein-protein interactions within
otherwise dynamic RNP complexes, such as the EJC.

We have demonstrated that DSP-mediated covalent bond formation between the RNA
helicase eIF4AIII and the Mago-Y14 heterodimer preserves EJCs in their “locked”
state on mRNAs (1,2,8,9) and that efficient recovery of the bound RNAs does not
require their crosslinking to eIF4AIII using UV light. Our “ipaRt” approach,
like CLIP and iCLIP (53,56), enables highly stringent washing of the samples. In
support of the robustness and reliability of our DSP-based assays, we
demonstrated high reproducibility not only among technical but also among
biological replicates of EJC ipaRt, as well as of mRBP footprinting sequencing
results (Figure S5A).

Furthermore, ipaRt enables use of non-RNA-binding subunits of the EJC, such as
Mago, as immunoprecipitation baits. This is highly relevant in the context of
studies of the EJC, as we and others have shown that its RNA-binding subunit,
the RNA helicase eIF4AIII, may have other, EJC-independent functions in the cell
(Figure S1C) (61,62). ipaRt afforded us the option of using Mago as our EJC
bait, and indeed this is a main reason for the high-quality definition of the
EJC binding landscape in the fly cytoplasm that we have achieved. The protection
site reads we obtained from EJC ipaRt map almost exclusively to canonical EJC
deposition sites (4), with a median protection site ~22 nt upstream of the
upstream exon’s 3’ end. In contrast to mammalian EJC CLIP and RIP studies, in
which eIF4AIII served as the immunoprecipitation bait (36,37), EJC ipaRt
sequencing reads mapping to regions distant from canonical deposition sites are
of low abundance and sequencing coverage. While this discrepancy could reflect
differences in EJC engagement in humans and Drosophila, it more likely reflects
the choice of bait or the cell compartment in which the analysis was executed.
Indeed, a recent study in human cell lines revealed that when the cytoplasmic
EJC component Btz was chosen as the immunoprecipitation bait rather than
eIF4AIII, the proportion of non-canonical EJC deposition sites was negligible
(38).
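
The distinction between canonical and non-canonical protection sites can be
illustrated with a short sketch that measures how far a site lies upstream of
the 3’ end of the upstream exon. It is not the analysis code of this study: it
assumes transcript coordinates, takes the canonical window to be 20 to 24 nt
upstream of the junction as stated in the Abstract, and uses hypothetical
function names and example positions.

```python
from statistics import median


def distance_upstream(site_center, exon_3prime_end):
    """Distance (nt) of a protection-site center upstream of the exon's 3' end."""
    return exon_3prime_end - site_center


def is_canonical(site_center, exon_3prime_end, window=(20, 24)):
    """True if the site center lies within the canonical window upstream of the junction."""
    low, high = window
    return low <= distance_upstream(site_center, exon_3prime_end) <= high


# Hypothetical site centers on an exon whose 3' end is at transcript position 500.
junction = 500
centers = [476, 478, 479, 430]
distances = [distance_upstream(c, junction) for c in centers]
canonical_flags = [is_canonical(c, junction) for c in centers]
print(distances)          # per-site distance from the junction
print(median(distances))  # median distance across sites
print(canonical_flags)    # which sites fall in the canonical window
```

The proportion of sites flagged as non-canonical is the quantity that differs
between the eIF4AIII-based and the Mago- or Btz-based datasets discussed above.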

Finally, in ipaRt the DSP crosslinker is applied ex vivo during tissue
disruption, and does not require inhibition of translation in vivo. We therefore
consider ipaRt a method of choice for functional investigations of protein-RNA
complexes in fully developed organisms and organ tissues.

Regulation of EJC assembly in Drosophila

Through our analysis, we defined factors that contribute to or inhibit EJC
assembly on mRNAs and at individual exon-exon junctions in Drosophila. From this
we deduce that the landscape of EJC binding to RNAs is sculpted through
regulation of EJC assembly at two levels in the fly (Figure 7B).

At the upstream regulatory level, the degree to which EJCs are assembled on an
mRNA is dictated by the complexity of the gene’s architecture: mRNAs produced
from genes of simple architecture are marked by fewer EJCs, while mRNAs from
genes of complex architecture, comprising multiple splice sites and long
introns, are EJC-bound to a higher degree. Because the EJC is assembled
co-transcriptionally during splicing (4-7), it is not surprising that mRNAs from
genes containing a larger number of introns are more likely to be bound by the
EJC. However, enhanced EJC assembly on mRNAs of genes with large introns (Figure
4D) has not been reported previously. Although the EJC has been implicated in
exon definition in the case of long-intron-containing genes in Drosophila
(12,14), the fact that we observe a preference of EJC binding to the mRNAs, but
not to the pre-mRNA splice junctions, of long-intron-containing genes (Figure
S6F) rules out this simple explanation. We therefore propose that it is the
number and the resting time of co-transcriptionally assembled spliceosomes at
active RNA pol II sites that determine the likelihood of CWC22-dependent
eIF4AIII recruitment and deposition, and thereby the degree of EJC assembly on
mRNAs (Figure 7B).

At the downstream regulatory level, after EJC assembly rates at transcripts are
defined, deposition of EJCs along mRNA exon-exon junctions is modulated by the
structural and sequence context of the splice sites (Figure 7A,B). dsRNA stem
structures in exon-exon junctions of Drosophila mRNAs either antagonize EJC
assembly when present within canonical EJC deposition sites, or enhance EJC
assembly when located in the vicinity of the EJC deposition site (Figure 6D,
7A). Absence of dsRNA within the EJC binding moiety is in agreement with the
reported preference of EJCs for ssRNA (1,9,50); yet it remains to be elucidated
how and why EJC binding is positively affected when RNA stem structures are
found in its direct proximity on the bound template.

While it is likely that the structural context of exon-exon junctions in
Drosophila directly influences the degree of EJC assembly, sequence-composition-
derived effects on EJC binding to mRNA are the consequence of the assigned roles
of these sequences during pre-mRNA splicing. We have demonstrated that exon-exon
junctions with strong 5’ and strong 3’ splice sites (ss) are biased towards
junctions with enhanced EJC binding (Table S1, Figure 7A). Importantly, for the
regulation of weak 5’ and 3’ ss, which commonly occur at alternatively spliced
junctions, cis-acting splicing regulatory elements (SREs) were shown to be of
importance (63-65). In light of the negative impact of alternative splicing at
the level of EJC mRNA binding (Figure 4D), it is not surprising that
conventional ESEs and ESSs hardly affect EJC binding at the level of individual
exon-exon junctions. However, whether the position-dependent bias mediated by
the UUU-triplet and CUG-triplet containing hexamers towards inhibited or
enhanced EJC binding, which we have discovered in our Drosophila data set
(Figure 5B,C), is due to a direct or indirect influence of these hexamers on
splicing remains to be addressed. UUU-triplet containing hexamers, which are
strongly biased towards inhibition of EJC binding, could potentially function as
yet undefined 5’ ESSs in Drosophila. Interestingly, CUG-triplet containing
hexamers, which are strongly biased towards enhanced EJC binding, share some
sequence similarity with a previously predicted CUG-containing 5’ ESE of short
intron splice sites (64). It therefore appears likely that the CUG-triplet and
UUU-triplet hexamers exert their effect on EJC binding as a yet undefined and
novel class of SREs.

RNA modalities influencing EJC binding in mammals and fly

In agreement with previous reports in mammals (36-38,66), we find that the
extent of EJC occupancy varies between mRNAs and exon-exon junctions also in
Drosophila. The splice site score next to a junction correlates with increased
EJC deposition in the fly, and this relationship between splicing efficiency and
EJC deposition has also been proposed in previous mammalian studies (67,68).
Analysis of published mammalian Btz iCLIP data (38) revealed several modalities
that correlate with increased EJC binding to mRNAs in mammals as well as in
Drosophila, including a high number of introns, high transcript abundance and
the sequence context of individual exon-exon junctions (Figure S8).
Interestingly, the presence of long introns has a slightly negative effect, and
the amount of alternative splicing a slightly positive effect, on EJC occupancy
in mammals (Figure 4D, S8); the latter is in agreement with previous
observations (36-38). Previous studies on mammalian tissue cell cultures have
reported that EJC-enriched junctions contain a relatively high proportion of
so-called non-canonical protection sites, which were enriched for RBP consensus
sequences of the SR protein family (36,37). Our analysis of Btz iCLIP data from
mammals (38) confirms that the presence of ESEs in upstream exons and 5’ ISEs in
introns correlates with enhanced EJC binding (Table S2). Moreover, we have
identified a group of junctions in mammals containing AGAA hexamers that are
biased for enhanced EJC binding (Figure S7A), but their effects are not
especially strong near the canonical EJC deposition site (Figure S7B). These
hexamers match the AGAA-encompassing consensus sequence of the mammalian SR
protein SRSF10, are known to function as splicing enhancers (69,70), and have
been found previously in EJC-bound exon-exon junctions (36-38). Not only do our
in silico results agree with these reports and support the proposed cooperative
binding of EJC with SR proteins (36-38), they also partially explain the EJC’s
preference in mammals for alternatively spliced mRNAs.

One observation deriving from the analysis of published mammalian Btz iCLIP
data sets is surprising, however. While we have observed junctions in Drosophila
to be enhanced or inhibited in EJC binding by specific base pairing probability
(bpp)-profiles, and thus by specific RNA folding categories (Figure 6A-D), we
could not detect any striking differences between overall bpp-profiles of
exon-exon junctions with enhanced or inhibited EJC binding in mammals (Figure
S8B). Indeed, the only aspect of RNA structure shared by mammals and Drosophila
is the negative effect of dsRNA when directly overlapping with the canonical EJC
deposition site (Figure S8C) (50). In Drosophila, however, the presence of dsRNA
close to canonical deposition sites enhances EJC binding, an effect that is not
observed in mammalian cells.

Insights into evolution and divergence of EJC functions

Our findings regarding the differences in the RNA modalities enriched at highly
occupied mammalian and Drosophila EJC sites provide insight into the expansion
of EJC functions during eukaryotic evolution. Spliceosome-catalyzed splicing
reactions are bidirectional, and efficient formation of exon-exon junctions
during RNA maturation is achieved by Prp22-induced release of spliceosomes from
mRNAs (71-73). Importantly, the EJC is absent in organisms with low rates of RNA
splicing such as Saccharomyces cerevisiae, but present in organisms with high
splicing rates such as Schizosaccharomyces pombe (74-76). This suggests that
with the increased demand for splicing accuracy in higher eukaryotes, the EJC
evolved to function as an exon-exon junction “lock” inhibiting spliceosome
reassembly. Besides ensuring splicing reaction directionality, the EJC further
evolved to become a central component of the NMD pathway (7,20-26) in mammals,
where more than 95% of all genes are alternatively spliced (77). This may
provide an explanation as to why EJCs in mammals are enriched on alternatively
spliced transcripts. In Drosophila, where only 30% of all genes appear to be
alternatively spliced (78), the EJC is not a component of the main NMD pathway
(31). We therefore propose that although the EJC-NMD pathway evolved before the
segregation of the proto- and deuterostome clades, it gained importance to
complement the faux 3’UTR-NMD pathway during the evolution of vertebrates (79),
where RNA surveillance and spatio-temporal control of gene expression are
essential.

Similarly, the recruitment of EJC and interacting proteins upon splicing of
mRNA to facilitate the localization of the mRNA so far seems exclusive to
Drosophila. Two Drosophila-specific features that modulate EJC binding, namely
the presence of a large intron and secondary structure near the junction, are
also predictive of mRNA localization. Although the precise strength of
association between these features and mRNA localization remains to be verified
with larger and more quantitative datasets, previous studies with the SOLE in
oskar RNA have shown that RNA structure and EJC binding are indeed crucial for
oskar mRNA localization (30).

ACKNOWLEDGMENTS

We thank Marco Blanchette, Matthias Hentze, Isabel Palacios, and Jean-Yves
Roignant for their gifts of antibodies and fly stocks. We thank Sandra Müller,
Alessandra Reversi and Anna Cyrklaff for technical assistance. We thank Vladimir
Benes and the EMBL Gene Core Facility for their advice and service. We thank
Mandy Rettel, Frank Stein and the EMBL Proteomics Core Facility for their
service and assistance with mass spectrometry analysis. We are grateful to
members of the Ephrussi lab for discussions during the course of the work.

AUTHOR CONTRIBUTIONS

Conceived and designed wet lab experiments: AO. Performed the wet lab
experiments: AO. Contributed reagents/materials/analysis tools: AO, LG, NH.
Conceived and designed the in silico analyses: LG, NH, AO. Analyzed and
interpreted the results: AO, LG, NH, AE. Contributed to the writing of the
manuscript: AO, LG, NH, JU, AE.

FUNDING

AO was supported by postdoctoral fellowships from the Swedish Vetenskapsrådet
(Reg. No. 2010-6728) and from Marie Curie Actions (FP7-PEOPLE-IEF No. 2763207),
and by a Deutsche Forschungsgemeinschaft Network Grant (DFG FOR 2333). This
study was funded by the DFG (FOR 2333) and the EMBL.

CONFLICT OF INTEREST

The authors declare that they have no conflicting or competing interests.

REFERENCES

1. Bono, F., Ebert, J., Lorentzen, E. and Conti, E. (2006) The crystal structure of the exon junction complex reveals how it maintains a stable grip on mRNA. Cell, 126, 713-725.
2. Stroupe, M.E., Tange, T.O., Thomas, D.R., Moore, M.J. and Grigorieff, N. (2006) The three-dimensional architecture of the EJC core. Journal of molecular biology, 360, 743-749.
3. Tange, T.O., Shibuya, T., Jurica, M.S. and Moore, M.J. (2005) Biochemical analysis of the EJC reveals two new factors and a stable tetrameric protein core. RNA, 11, 1869-1883.
4. Le Hir, H., Izaurralde, E., Maquat, L.E. and Moore, M.J. (2000) The spliceosome deposits multiple proteins 20-24 nucleotides upstream of mRNA exon-exon junctions. The EMBO journal, 19, 6860-6869.
5. Steckelberg, A.L., Altmueller, J., Dieterich, C. and Gehring, N.H. (2015) CWC22-dependent pre-mRNA splicing and eIF4A3 binding enables global deposition of exon junction complexes. Nucleic acids research, 43, 4687-4700.
6. Barbosa, I., Haque, N., Fiorini, F., Barrandon, C., Tomasetto, C., Blanchette, M. and Le Hir, H. (2012) Human CWC22 escorts the helicase eIF4AIII to spliceosomes and promotes exon junction complex assembly. Nature structural & molecular biology, 19, 983-990.
7. Alexandrov, A., Colognori, D., Shu, M.D. and Steitz, J.A. (2012) Human spliceosomal protein CWC22 plays a role in coupling splicing to exon junction complex deposition and nonsense-mediated decay. Proceedings of the National Academy of Sciences of the United States of America, 109, 21313-21318.
8. Ballut, L., Marchadier, B., Baguet, A., Tomasetto, C., Seraphin, B. and Le Hir, H. (2005) The exon junction core complex is locked onto RNA by inhibition of eIF4AIII ATPase activity. Nature structural & molecular biology, 12, 861-869.
9. Andersen, C.B.F., Ballut, L., Johansen, J.S., Chamieh, H., Nielsen, K.H., Oliveira, C.L.P., Pedersen, J.S., Séraphin, B., Le Hir, H. and Andersen, G.R. (2006) Structure of the Exon Junction Core Complex with a Trapped DEAD-Box ATPase Bound to RNA. Science, 313, 1968-1972.
10. Andersen, C.B., Ballut, L., Johansen, J.S., Chamieh, H., Nielsen, K.H., Oliveira, C.L., Pedersen, J.S., Seraphin, B., Le Hir, H. and Andersen, G.R. (2006) Structure of the exon junction core complex with a trapped DEAD-box ATPase bound to RNA. Science, 313, 1968-1972.
11. Bono, F. and Gehring, N.H. (2011) Assembly, disassembly and recycling: the dynamics of exon junction complexes. RNA biology, 8, 24-29.
12. Roignant, J.Y. and Treisman, J.E. (2010) Exon junction complex subunits are required to splice Drosophila MAP kinase, a large heterochromatic gene. Cell, 143, 238-250.
13. Lloret-Llinares, M., Mapendano, C.K., Martlev, L.H., Lykke-Andersen, S. and Jensen, T.H. (2016) Relationships between PROMPT and gene expression. RNA biology, 13, 6-14.
14. Ashton-Beaucage, D., Udell, C.M., Lavoie, H., Baril, C., Lefrancois, M., Chagnon, P., Gendron, P., Caron-Lizotte, O., Bonneil, E., Thibault, P. et al. (2010) The exon junction complex controls the splicing of MAPK and other long intron-containing transcripts in Drosophila. Cell, 143, 251-262.
15. Ashton-Beaucage, D. and Therrien, M. (2011) The exon junction complex: a splicing factor for long intron containing transcripts? Fly, 5, 224-233.
16. Gatfield, D., Le Hir, H., Schmitt, C., Braun, I.C., Kocher, T., Wilm, M. and Izaurralde, E. (2001) The DExH/D box protein HEL/UAP56 is essential for mRNA nuclear export in Drosophila. Current biology : CB, 11, 1716-1721.
17. Shiimori, M., Inoue, K. and Sakamoto, H. (2013) A specific set of exon junction complex subunits is required for the nuclear retention of unspliced RNAs in Caenorhabditis elegans. Molecular and cellular biology, 33, 444-456.
18. Nott, A., Le Hir, H. and Moore, M.J. (2004) Splicing enhances translation in mammalian cells: an additional function of the exon junction complex. Genes & development, 18, 210-222.
19. Chazal, P.E., Daguenet, E., Wendling, C., Ulryck, N., Tomasetto, C., Sargueil, B. and Le Hir, H. (2013) EJC core component MLN51 interacts with eIF3 and activates translation. Proceedings of the National Academy of Sciences of the United States of America, 110, 5903-5908.
20. Gehring, N.H., Kunz, J.B., Neu-Yilik, G., Breit, S., Viegas, M.H., Hentze, M.W. and Kulozik, A.E. (2005) Exon-Junction Complex Components Specify Distinct Routes of Nonsense-Mediated mRNA Decay with Differential Cofactor Requirements. Molecular cell, 20, 65-75.
21. Singh, G., Jakob, S., Kleedehn, M.G. and Lykke-Andersen, J. (2007) Communication with the exon-junction complex and activation of nonsense-mediated decay by human Upf proteins occur in the cytoplasm. Molecular cell, 27, 780-792.
22. Shibuya, T., Tange, T.O., Stroupe, M.E. and Moore, M.J. (2006) Mutational analysis of human eIF4AIII identifies regions necessary for exon junction complex formation and nonsense-mediated mRNA decay. RNA, 12, 360-374.
23. Palacios, I.M., Gatfield, D., St Johnston, D. and Izaurralde, E. (2004) An eIF4AIII-containing complex required for mRNA localization and nonsense-mediated mRNA decay. Nature, 427, 753-757.
24. Okada-Katsuhata, Y., Yamashita, A., Kutsuzawa, K., Izumi, N., Hirahara, F. and Ohno, S. (2012) N- and C-terminal Upf1 phosphorylations create binding platforms for SMG-6 and SMG-5:SMG-7 during NMD. Nucleic acids research, 40, 1251-1266.
25. Melero, R., Buchwald, G., Castano, R., Raabe, M., Gil, D., Lazaro, M., Urlaub, H., Conti, E. and Llorca, O. (2012) The cryo-EM structure of the UPF-EJC complex shows UPF1 poised toward the RNA 3' end. Nature structural & molecular biology, 19, 498-505, S491-492.
26. Buchwald, G., Ebert, J., Basquin, C., Sauliere, J., Jayachandran, U., Bono, F., Le Hir, H. and Conti, E. (2010) Insights into the recruitment of the NMD machinery from the crystal structure of a core EJC-UPF3b complex. Proceedings of the National Academy of Sciences of the United States of America, 107, 10050-10055.
27. van Eeden, F.J., Palacios, I.M., Petronczki, M., Weston, M.J. and St Johnston, D. (2001) Barentsz is essential for the posterior localization of oskar mRNA and colocalizes with it to the posterior pole. The Journal of cell biology, 154, 511-523.
28. Hachet, O. and Ephrussi, A. (2004) Splicing of oskar RNA in the nucleus is coupled to its cytoplasmic localization. Nature, 428, 959-963.
29. Hachet, O. and Ephrussi, A. (2001) Drosophila Y14 shuttles to the posterior of the oocyte and is required for oskar mRNA transport. Current biology : CB, 11, 1666-1674.
30. Ghosh, S., Marchand, V., Gaspar, I. and Ephrussi, A. (2012) Control of RNP motility and localization by a splicing-dependent structure in oskar mRNA. Nature structural & molecular biology, 19, 441-449.
31. Behm-Ansmant, I., Gatfield, D., Rehwinkel, J., Hilgers, V. and Izaurralde, E. (2007) A conserved role for cytoplasmic poly(A)-binding protein 1 (PABPC1) in nonsense-mediated mRNA decay. The EMBO journal, 26, 1591-1601.
32. Zimyanin, V.L., Belaya, K., Pecreaux, J., Gilchrist, M.J., Clark, A., Davis, I. and St Johnston, D. (2008) In vivo imaging of oskar mRNA transport reveals the mechanism of posterior localization. Cell, 134, 843-853.
33. Ghosh, S., Obrdlik, A., Marchand, V. and Ephrussi, A. (2014) The EJC binding and dissociating activity of PYM is regulated in Drosophila. PLoS genetics, 10, e1004455.
34. Lomant, A.J. and Fairbanks, G. (1976) Chemical probes of extended biological structures: synthesis and properties of the cleavable protein cross-linking reagent [35S]dithiobis(succinimidyl propionate). Journal of molecular biology, 104, 243-261.
31
bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
761
35. Schweizer, E., Angst, W. and Lutz, H.U. (1982) Glycoprotein topology on intact human red blood cells reevaluated by cross-linking following amino group supplementation. Biochemistry, 21, 6807-6818.
36. Singh, G., Kucukural, A., Cenik, C., Leszyk, J.D., Shaffer, S.A., Weng, Z. and Moore, M.J. (2012) The cellular EJC interactome reveals higher-order mRNP structure and an EJC-SR protein nexus. Cell, 151, 750-764.
37. Sauliere, J., Murigneux, V., Wang, Z., Marquenet, E., Barbosa, I., Le Tonqueze, O., Audic, Y., Paillard, L., Roest Crollius, H. and Le Hir, H. (2012) CLIP-seq of eIF4AIII reveals transcriptome-wide mapping of the human exon junction complex. Nature structural & molecular biology, 19, 1124-1131.
38. Hauer, C., Sieber, J., Schwarzl, T., Hollerer, I., Curk, T., Alleaume, A.M., Hentze, M.W. and Kulozik, A.E. (2016) Exon Junction Complexes Show a Distributional Bias toward Alternatively Spliced mRNAs and against mRNAs Coding for Ribosomal Proteins. Cell reports, 16, 1588-1603.
39. Nielsen, K.H., Chamieh, H., Andersen, C.B., Fredslund, F., Hamborg, K., Le Hir, H. and Andersen, G.R. (2009) Mechanism of ATP turnover inhibition in the EJC. RNA, 15, 67-75.
40. Castello, A., Horos, R., Strein, C., Fischer, B., Eichelbaum, K., Steinmetz, L.M., Krijgsveld, J. and Hentze, M.W. (2013) System-wide identification of RNA-binding proteins by interactome capture. Nature protocols, 8, 491-500.
41. Castello, A., Fischer, B., Eichelbaum, K., Horos, R., Beckmann, B.M., Strein, C., Davey, N.E., Humphreys, D.T., Preiss, T., Steinmetz, L.M. et al. (2012) Insights into RNA biology from an atlas of mammalian mRNA-binding proteins. Cell, 149, 1393-1406.
42. Bienkowski, R.S., Banerjee, A., Rounds, J.C., Rha, J., Omotade, O.F., Gross, C., Morris, K.J., Leung, S.W., Pak, C., Jones, S.K. et al. (2017) The Conserved, Disease-Associated RNA Binding Protein dNab2 Interacts with the Fragile X Protein Ortholog in Drosophila Neurons. Cell reports, 20, 1372-1384.
43. Konig, J., Zarnack, K., Rot, G., Curk, T., Kayikci, M., Zupan, B., Turner, D.J., Luscombe, N.M. and Ule, J. (2011) iCLIP--transcriptome-wide mapping of protein-RNA interactions with individual nucleotide resolution. Journal of visualized experiments : JoVE.
44. Kataoka, N., Diem, M.D., Kim, V.N., Yong, J. and Dreyfuss, G. (2001) Magoh, a human homolog of Drosophila mago nashi protein, is a component of the splicing-dependent exon-exon junction complex. The EMBO journal, 20, 6424-6433.
45. Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W. and Noble, W.S. (2009) MEME Suite: tools for motif discovery and searching. Nucleic acids research, 37, W202-W208.
46. Love, M.I., Huber, W. and Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology, 15, 550.
47. Wilk, R., Hu, J., Blotsky, D. and Krause, H.M. (2016) Diverse and pervasive subcellular distributions for both coding and long noncoding RNAs. Genes & development, 30, 594-609.
48. Lecuyer, E., Yoshida, H., Parthasarathy, N., Alm, C., Babak, T., Cerovina, T., Hughes, T.R., Tomancak, P. and Krause, H.M. (2007) Global analysis of mRNA localization reveals a prominent role in organizing cellular architecture and function. Cell, 131, 174-187.
49. Simon, B., Masiewicz, P., Ephrussi, A. and Carlomagno, T. (2015) The structure of the SOLE element of oskar mRNA. RNA, 21, 1444-1453.
50. Mishler, D.M., Christ, A.B. and Steitz, J.A. (2008) Flexibility in the site of exon junction complex deposition revealed by functional group and RNA secondary structure alterations in the splicing substrate. RNA, 14, 2657-2670.
51. Brooks, A.N., Duff, M.O., May, G., Yang, L., Bolisetty, M., Landolin, J., Wan, K., Sandler, J., Booth, B.W., Celniker, S.E. et al. (2015) Regulation of alternative splicing in Drosophila by 56 RNA binding proteins. Genome research, 25, 1771-1780.
52. Therneau, T. and Atkinson, B. (2018) rpart: Recursive Partitioning and Regression Trees, 4.1-13 ed.
53. Ule, J., Jensen, K.B., Ruggiu, M., Mele, A., Ule, A. and Darnell, R.B. (2003) CLIP identifies Nova-regulated RNA networks in the brain. Science, 302, 1212-1215.
54. Licatalosi, D.D., Mele, A., Fak, J.J., Ule, J., Kayikci, M., Chi, S.W., Clark, T.A., Schweitzer, A.C., Blume, J.E., Wang, X. et al. (2008) HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature, 456, 464-469.
55. Ule, J. (2009) High-throughput sequencing methods to study neuronal RNA-protein interactions. Biochemical Society transactions, 37, 1278-1280.
56. Konig, J., Zarnack, K., Rot, G., Curk, T., Kayikci, M., Zupan, B., Turner, D.J., Luscombe, N.M. and Ule, J. (2010) iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nature structural & molecular biology, 17, 909-915.
57. Wang, Z., Kayikci, M., Briese, M., Zarnack, K., Luscombe, N.M., Rot, G., Zupan, B., Curk, T. and Ule, J. (2010) iCLIP predicts the dual splicing effects of TIA-RNA interactions. PLoS biology, 8, e1000530.
58. Tollervey, J.R., Curk, T., Rogelj, B., Briese, M., Cereda, M., Kayikci, M., Konig, J., Hortobagyi, T., Nishimura, A.L., Zupunski, V. et al. (2011) Characterizing the RNA targets and position-dependent splicing regulation by TDP-43. Nature neuroscience, 14, 452-458.
59. Ince-Dunn, G., Okano, H.J., Jensen, K.B., Park, W.Y., Zhong, R., Ule, J., Mele, A., Fak, J.J., Yang, C., Zhang, C. et al. (2012) Neuronal Elav-like (Hu) proteins regulate RNA splicing and abundance to control glutamate levels and neuronal excitability. Neuron, 75, 1067-1080.
60. Modic, M., Ule, J. and Sibley, C.R. (2013) CLIPing the brain: studies of protein-RNA interactions important for neurodegenerative disorders. Molecular and cellular neurosciences, 56, 429-435.
61. Alexandrov, A., Colognori, D. and Steitz, J.A. (2011) Human eIF4AIII interacts with an eIF4G-like partner, NOM1, revealing an evolutionarily conserved function outside the exon junction complex. Genes & development, 25, 1078-1090.
62. Choudhury, S.R., Singh, A.K., McLeod, T., Blanchette, M., Jang, B., Badenhorst, P., Kanhere, A. and Brogna, S. (2016) Exon junction complex proteins bind nascent transcripts independently of pre-mRNA splicing in Drosophila melanogaster. eLife, 5.
63. Shepard, P.J., Choi, E.-A., Busch, A. and Hertel, K.J. (2011) Efficient internal exon recognition depends on near equal contributions from the 3′ and 5′ splice sites. Nucleic acids research, 39, 8928-8937.
64. Brooks, A.N., Aspden, J.L., Podgornaia, A.I., Rio, D.C. and Brenner, S.E. (2011) Identification and experimental validation of splicing regulatory elements in Drosophila melanogaster reveals functionally conserved splicing enhancers in metazoans. RNA, 17, 1884-1894.
65. Koren, E., Lev-Maor, G. and Ast, G. (2007) The Emergence of Alternative 3′ and 5′ Splice Site Exons from Constitutive Exons. PLoS computational biology, 3, e95.
66. Sauliere, J., Haque, N., Harms, S., Barbosa, I., Blanchette, M. and Le Hir, H. (2010) The exon junction complex differentially marks spliced junctions. Nature structural & molecular biology, 17, 1269-1271.
67. Gudikote, J.P., Imam, J.S., Garcia, R.F. and Wilkinson, M.F. (2005) RNA splicing promotes translation and RNA surveillance. Nature structural & molecular biology, 12, 801-809.
68. Custódio, N., Carvalho, C., Condado, I., Antoniou, M., Blencowe, B.J. and Carmo-Fonseca, M. (2004) In vivo recruitment of exon junction complex proteins to transcription sites in mammalian cell nuclei. RNA (New York, N.Y.), 10, 622-633.
69. Clery, A., Jayne, S., Benderska, N., Dominguez, C., Stamm, S. and Allain, F.H. (2011) Molecular basis of purine-rich RNA recognition by the human SR-like protein Tra2-beta1. Nature structural & molecular biology, 18, 443-450.
70. Tsuda, K., Someya, T., Kuwasako, K., Takahashi, M., He, F., Unzai, S., Inoue, M., Harada, T., Watanabe, S., Terada, T. et al. (2011) Structural basis for the dual RNA-recognition modes of human Tra2-beta RRM. Nucleic acids research, 39, 1538-1553.
71. Hoskins, A.A. and Moore, M.J. (2012) The spliceosome: a flexible, reversible macromolecular machine. Trends in biochemical sciences, 37, 179-188.
72. Smith, D.J. and Konarska, M.M. (2008) Mechanistic insights from reversible splicing catalysis. RNA (New York, N.Y.), 14, 1975-1978.
73. Tseng, C.K. and Cheng, S.C. (2008) Both catalytic steps of nuclear pre-mRNA splicing are reversible. Science, 320, 1782-1784.
74. Wood, V., Gwilliam, R., Rajandream, M.A., Lyne, M., Lyne, R., Stewart, A., Sgouros, J., Peat, N., Hayles, J., Baker, S. et al. (2002) The genome sequence of Schizosaccharomyces pombe. Nature, 415, 871-880.
75. Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M. et al. (1996) Life with 6000 Genes. Science, 274, 546.
76. Wen, J. and Brogna, S. (2010) Splicing-dependent NMD does not require the EJC in Schizosaccharomyces pombe. The EMBO journal, 29, 1537.
77. Gromadzka, A.M., Steckelberg, A.L., Singh, K.K., Hofmann, K. and Gehring, N.H. (2016) A short conserved motif in ALYREF directs cap- and EJC-dependent assembly of export complexes on spliced mRNAs. Nucleic acids research, 44, 2348-2361.
78. Gibilisco, L., Zhou, Q., Mahajan, S. and Bachtrog, D. (2016) Alternative Splicing within and between Drosophila Species, Sexes, Tissues, and Developmental Stages. PLoS genetics, 12, e1006464.
79. Eberle, A.B., Stalder, L., Mathys, H., Orozco, R.Z. and Mühlemann, O. (2008) Posttranscriptional Gene Regulation by Spatial Rearrangement of the 3′ Untranslated Region. PLoS biology, 6, e92.
80. Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W. and Smyth, G.K. (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic acids research, 43, e47.
FIGURE LEGENDS

Figure 1. Dithio(bi-)succinimidyl-propionate (DSP) stabilizes EJC in its mRNA-bound state.
(A) SDS-PAGE and silver staining of proteins co-precipitated with oligo-d(T)25 bound mRNAs from cytoplasm. Left to right: Input cytoplasmic samples (0.01% of total input; lanes 1-4); protein MW standards (lane 5); “beads-only” control precipitate from “all-condition-mixture” (3.3%, lane 6); individual oligo-d(T)25 precipitates (3.3%; lanes 7-10). Treatment conditions, “untreated”, “UV irradiated”, “DSP supplemented” and “UV irradiated-DSP supplemented”, are indicated (+ or -) above the image. Note that for samples treated with DSP ex vivo, UV-exposure does not increase the amount of recovered proteins.
(B) Western blot analysis of oligo-d(T)25 precipitates. Gel loading order same as in (A); 0.03% of the input cytoplasm and 33% of each precipitate were resolved. Blot was probed with antibodies directed against the proteins indicated at the right of the panel. Khc: kinesin heavy chain; PABP: cytoplasmic poly A binding protein; RpS6: small ribosomal subunit protein S6.
(C) Quantification of proteins detected on the western blot. Crosslinking conditions are indicated in upper panel. Signals were quantified by densitometry measurement using the Fiji image analysis package. Relative protein abundance is the fraction of the signal in the precipitate relative to the cytoplasm. Note that in contrast to the RNA-binding eIF4AIII, the non-RNA-binding EJC subunit Y14 is only detected when the cytoplasm was treated with DSP.
(D) Schematic of the net effect of crosslinking with UV versus DSP. Exposure of the cytoplasm to UV leads primarily to stabilization of direct protein-RNA interactions. DSP treatment results in efficient retention of proteins associated with RNA by stabilization of polypeptide interactions either within an individual RBP or between an RBP and other moieties within a complex.
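The relative protein abundance described for Figure 1C can be illustrated with a minimal sketch. It assumes that densitometry signals are first scaled by the fraction of each sample loaded on the gel (0.03% of the input and 33% of each precipitate, as stated for panel B); whether the original quantification applied this scaling is an assumption, and the function name and the example readings are hypothetical.

```python
def relative_abundance(precip_signal, input_signal,
                       precip_loaded=0.33, input_loaded=0.0003):
    """Fraction of a protein recovered in the precipitate relative to the input.

    The loading fractions default to the values stated for panel (B); scaling
    by them is an assumption of this sketch, not a documented step.
    """
    if input_signal <= 0:
        raise ValueError("input signal must be positive")
    total_precip = precip_signal / precip_loaded  # extrapolate to the whole precipitate
    total_input = input_signal / input_loaded     # extrapolate to the whole input
    return total_precip / total_input


# Hypothetical densitometry readings (arbitrary units), not measured data.
example = relative_abundance(precip_signal=1100.0, input_signal=50.0)
print(f"{example:.2%}")
```

Dividing by the loaded fraction puts both lanes on the same scale before the ratio is taken, so a faint input band loaded at 0.03% is not mistaken for low abundance.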
Figure 2. DSP crosslinking stabilizes EJC for stringent immunoprecipitation and RNA fragmentation
(A) Comparison of IP washing conditions for RNA isolation. Anti-GFP RNA IP assays on DSP-treated cytoplasm of GFP-Mago and GFP tag expressing flies. 0.02% of RNA isolated from input lysate (lane 1) and 20% from GFP tag (lanes 2 and 3) and GFP-Mago (lanes 4 and 5) RIP precipitates were resolved by capillary gel electrophoresis on an RNA 6000 Pico Chip Bioanalyzer. Washing conditions are indicated at bottom. SW: standard stringency washing conditions; HSW: high stringency washing conditions (see Materials and Methods).
(B) DSP stabilized EJC core is resistant to RNaseI treatment. Effects of RNaseI treatment on GFP-Mago co-immunoprecipitations in DSP crosslinked and untreated cytoplasm. Western blots of anti-GFP IPs from DSP treated and from untreated cytoplasm of GFP-Mago (lanes 5-8) and GFP tag (lanes 1-4) expressing flies processed under HSW conditions. Primary antibodies utilized are indicated on the right. Inputs (0.01%) are shown in lanes 1 and 5. Bead-only control precipitates are shown in lanes 2 and 6. IP precipitates (20%) from DSP treated (lanes 3 and 7) or untreated cytoplasm (lanes 4 and 8). Details of samples and experimental conditions are indicated at top of panel.
(C) RNA fragmentation depletes EJC unrelated proteins from GFP-Mago co-precipitates. Anti-GFP IPs (HSW condition) from DSP treated cytoplasm of GFP-Mago and GFP tag expressing flies. Precipitates were subjected to isobaric labeling and peptide content was defined by tandem mass spectrometry. Presented bar plots of protein-enrichment are defined by Limma (80). GFP-Mago specific protein enrichments in “intact RNA” and “fragmented RNA” conditions are highlighted in left and right plot, respectively. Y-axis shows individual proteins detected. X-axis shows scale of enrichment (Log2 fold change). Dashed line indicates average enrichment of all significant proteins in each condition. Protein-enrichment >2x or 0.05).
Figure 3. EJC binding to mRNA in Drosophila cytoplasm occurs within exons at canonical deposition sites.
(A) EJC ipaRt library reads map to exonic sites in the Drosophila genome. Summary of genomic features detected in mRBP footprinting and EJC ipaRt sequencing results. Y-axis indicates proportion (percent) of all uniquely aligning read-counts. Color-code of genomic features is highlighted in the legend on the right side of the plot. Note that the majority of EJC protected sites map to exons and UTR-exons.
(B) EJC protection median is on upstream exons approximately -22 nt from the 3’ end. Read coverage profiles from EJC ipaRt and mRBP footprinting cDNA libraries. Coordinates of metagene covering ±50 nt of exon-exon junctions are indicated on x-axis. Note position “0” defines the last nucleotide of upstream exons. Scale of mean sequencing read coverage for all normalized biological replicates is indicated on y-axis. Profiles from RBP protected and EJC protected junctions are presented in grey and red, respectively. Note that mRBP footprinting sequencing reads show a homogeneous RBP protection profile over the entire metagene body. In contrast, sequencing reads from EJC ipaRt indicate a modal distribution of EJC protections with a median at -21.5 nt upstream of the exon-exon ligation point. Region of maximal EJC protection (protection core) spans from -27 nt to -15 nt and is highlighted in orange.
(C) Genomic coverage profile visualized by IGB viewer. Coverage-profiles of 3 individual sequencing replicates from EJC ipaRt and mRBP footprinting along the oskar and RpL32 genes. Size and position of gene regions are indicated. Exons and introns are indicated below the coverage profiles as boxes or lines, respectively. Coverage depth (normalized reads) is indicated on the y-axis. Note that cDNA reads from EJC protected fragments map upstream of and in direct proximity to splice sites, while reads from RBP protected fragments distribute over whole exon bodies. Sequencing coverage depth of EJC protected sites at exon-exon junctions is variable.
(D) Protection site peaks (modes) in EJC ipaRt libraries cluster primarily within 50 nt around splice junctions. Density plot highlighting distribution of coverage peaks in EJC ipaRt with respect to the splice site distance. Y-axis defines estimated density of peaks at defined splice site distances. X-axis indicates log10-transformed distance to splice site. Vertical dashed line highlights border between proximal and remote protection peak estimates. Note all peaks in ipaRt were defined within exons at a 20 nt frame resolution. Peaks chosen for the analysis had 2x higher coverage in ipaRt than in mRBP footprinting and a coverage > 30 reads.
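The peak selection and the proximal versus remote split described for Figure 3D can be sketched as follows. This is an illustration under stated assumptions: the `Peak` record, the function names and the example values are hypothetical, the 2x criterion is read as "at least twice the mRBP coverage", and the 30-read threshold is applied to the ipaRt coverage.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    splice_site_distance: int  # nt between the peak and the nearest splice site
    ipart_coverage: float      # reads at the peak in the EJC ipaRt library
    mrbp_coverage: float       # reads at the same position in mRBP footprinting


def passes_filter(peak, min_ratio=2.0, min_coverage=30.0):
    """True if ipaRt coverage is at least min_ratio times the mRBP coverage
    and exceeds min_coverage reads (thresholds taken from the legend)."""
    return (peak.ipart_coverage >= min_ratio * peak.mrbp_coverage
            and peak.ipart_coverage > min_coverage)


def classify(peak, proximal_cutoff=50):
    """Label a peak as proximal (<= cutoff nt from a splice site) or remote."""
    return "proximal" if peak.splice_site_distance <= proximal_cutoff else "remote"


# Hypothetical peaks for illustration only.
peaks = [Peak(22, 120.0, 40.0), Peak(310, 45.0, 10.0), Peak(18, 25.0, 5.0)]
selected = [(p.splice_site_distance, classify(p)) for p in peaks if passes_filter(p)]
print(selected)
```

The third example peak is close to a splice site but is dropped because its coverage does not exceed 30 reads, which shows that filtering and classification are independent steps.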
(E) Proximal (canonical) protection peaks in EJC ipaRt libraries have higher coverage than remote protection peaks. Density plot showing the distribution of estimated coverage of splice junction-proximal (blue) and remote (grey) protection peaks. Y-axis defines estimated densities and x-axis the peak coverage. Note that peak coverage stands for the sum of all reads within a +/-10 nt window surrounding a protection site peak.
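The peak coverage defined for Figure 3E, a sum over a +/-10 nt window, can be written as a short sketch. It assumes coverage is available as a list of per-nucleotide read counts along an exon; that representation, the function name and the example values are hypothetical.

```python
import math


def peak_coverage(per_nt_reads, peak_index, half_window=10):
    """Sum of reads in a +/-half_window nt window centred on the peak position.

    The window is truncated at the ends of the supplied coverage vector.
    """
    start = max(0, peak_index - half_window)
    end = min(len(per_nt_reads), peak_index + half_window + 1)
    return sum(per_nt_reads[start:end])


# Hypothetical flat coverage of 1 read per nucleotide over a 50 nt exon.
coverage = [1] * 50
window_sum = peak_coverage(coverage, peak_index=25)
print(window_sum, math.log10(window_sum))  # the plot uses a log10 axis
```

A peak near the start of the vector simply gets a shorter window, so its summed coverage is lower than that of an interior peak with the same per-nucleotide depth.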
(F) Peaks within canonical EJC binding sites map to first and internal exons. Bar plot showing proportion of peaks mapping to first, internal and last exons. Peaks mapping within or in direct proximity to canonical EJC binding regions (proximal peaks) are highlighted in dark blue. Peaks remote from EJC binding regions are highlighted in dark grey. Estimated peak proportions within exon classes are indicated in the plot. Y-axis indicates proportion in percent; X-axis indicates the tested exon classes of a metagene. Note that proximal peaks are nearly exclusively found in first and internal exons while remote peaks are detected in all three classes of exons.
(G) Proportion of remote protection peaks in EJC ipaRt libraries decreases with sequencing coverage. Plot highlighting the proportion of remote peaks among all EJC ipaRt sequencing coverage peaks, and among sequencing coverage subsets of the best 75%, best 50% and best 25% covered protection peaks. Y-axis defines proportion (in %) of remote peaks. X-axis highlights peak coverage subsets. Note that the proportion of remote EJC protection peaks decreases significantly when higher coverage cutoffs are chosen.
Figure 4. Preferential recruitment of EJC to mRNAs of genes with specialized cellular functions is determined by gene architecture.
(A) MA-plot of DESeq results from EJC ipaRt and mRBP footprinting. Genes that are either enriched or depleted for EJC (RBP enriched) are indicated in red or grey, respectively. Genes that are not significantly different (p.adjusted >0.05) between the EJC ipaRt and mRBP libraries are transparent. Relative enrichment (log2FoldChange) is indicated on the y-axis. Base Mean of signal is highlighted on the x-axis. Dashed line defines log2FoldChange = 0.
(B) GO term analysis of EJC enriched and depleted transcripts (|log2FoldChange| >1). GO terms for biological processes of EJC enriched transcripts (left plot) and of EJC depleted transcripts (right plot) are presented to the left and the right of the plots, respectively. Y-axis lists GO categories of the biological processes most highly represented. Gene counts in individual categories versus overall count of analyzed genes (Gene ratios) are shown on the x-axis. The legend in the center indicates the number of genes falling into a particular GO term (size of circles) and the significance of its association (color of circles).
(C) Scatter-Boxplot showing DESeq enrichment estimates of Drosophila mRNAs with known localization patterns in the early embryo. Gene products in DESeq analysis were subset to maternally expressed mRNAs, which localize or do not localize to specific foci in early embryos (48). Relative enrichment (log2FoldChange) is indicated on y-axis. Localization categories are indicated on x-axis. Non-localizing and localizing gene products are highlighted in green and blue, respectively. Horizontal solid line in corresponding box plots indicates median log2FoldChange for each of the categories. Dashed line highlights log2FoldChange = 0.
(D) Result of multiple regression model for EJC enrichment highlighting parameters which contribute most to preferential binding of EJC to mRNA. Plots showing the relative contribution and effect of each factor (listed on y-axis) to EJC enrichment, as estimated from the full regression model. For the plot showing relative importance (left), estimates were obtained by bootstrapping the data; the dot indicates the median and whiskers indicate the 95% confidence interval. For the plot showing the effect of each factor (right), the T-statistic of each factor was calculated from the estimate and the standard error of each coefficient obtained after fitting the full model.
Figure 5. EJC assembly reveals a sequence bias
(A) EJC enrichment: observed versus expected. Scatterplot showing EJC enrichment defined by DESeq for individual exon-exon junctions relative to DESeq estimates of corresponding templates. Enrichment for each exon-exon junction (log2FoldChange exon-exon junction) was calculated from read counts falling within 50bp upstream and 50bp downstream of the exon-exon junction. The EJC enrichment for the exon-exon junction was compared to its associated gene and a Z score and p-value were calculated. Exon-exon junctions whose EJC enrichment was not significantly different from that of their associated gene (p.adjusted >0.05) are shown in grey. Exon-exon junctions with significantly higher EJC enrichment than their associated genes are highlighted in red, and those with significantly lower EJC enrichment are highlighted in blue. The exon-exon junction EJC enrichment is highly correlated with the corresponding gene EJC enrichment (Spearman correlation = 0.664), and nearly twice as many exon-exon junctions show significantly lower rather than higher EJC enrichment relative to their corresponding gene product (4334 vs. 2275, out of a total of 15186 junctions tested).
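The legend of Figure 5A does not spell out how the Z score and p-value were obtained. One common construction is sketched below as an assumption, not as the authors' procedure: the difference between the junction and gene log2 fold changes is divided by the combined standard error (treating the two estimates as independent), and a two-sided p-value is taken from the standard normal distribution. The Benjamini-Hochberg adjustment is likewise assumed as the source of "p.adjusted"; all function names and numbers are hypothetical.

```python
import math


def junction_vs_gene(lfc_junction, se_junction, lfc_gene, se_gene):
    """Z score and two-sided normal p-value for junction minus gene enrichment."""
    z = (lfc_junction - lfc_gene) / math.sqrt(se_junction ** 2 + se_gene ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail of the standard normal
    return z, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical estimates for three junctions: (lfc_j, se_j, lfc_gene, se_gene).
tests = [(1.0, 0.3, 0.0, 0.4), (0.2, 0.3, 0.1, 0.4), (-1.5, 0.3, 0.0, 0.4)]
results = [junction_vs_gene(*t) for t in tests]
padj = benjamini_hochberg([p for _, p in results])
for (z, _), q in zip(results, padj):
    label = "grey" if q > 0.05 else ("red" if z > 0 else "blue")
    print(round(z, 2), label)
```

The colour labels mirror the legend: grey for junctions that do not differ from their gene, red for higher and blue for lower EJC enrichment.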
(B) Hexamers containing CUG and UUU contribute most to deviation between observed and expected EJC enrichment. Plot showing the effect of each hexamer on EJC enrichment for each exon-exon junction, after accounting for splice strength and presence of an ESS. For each hexamer, the deviation (of the exon-exon junction EJC enrichment from its gene EJC enrichment) was regressed against (1) the presence / absence of hexamer, (2) splice strength, and (3) ESS counts. The x-axis shows the associated T-statistic obtained for the hexamer effect after fitting such a model for each individual hexamer. Dendrogram of hexamers was generated using hierarchical clustering with Ward’s method, based on pairwise Damerau–Levenshtein distances between all hexamers. CUG containing hexamers show a concordant positive effect and UUU containing hexamers show a concordant negative effect on EJC enrichment.
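The pairwise hexamer distances underlying the dendrogram in Figure 5B can be illustrated with a small sketch. It implements the optimal string alignment variant of the Damerau-Levenshtein distance (insertions, deletions, substitutions and adjacent transpositions); which variant the authors used is not stated, so this choice is an assumption, and the function names and the example hexamers are hypothetical.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def distance_matrix(hexamers):
    """Symmetric matrix of pairwise distances between all hexamers."""
    return [[osa_distance(x, y) for y in hexamers] for x in hexamers]


# Hypothetical hexamers for illustration only.
hexamers = ["CUGCUG", "UCGCUG", "CUGCUU", "UUUUUU"]
for row in distance_matrix(hexamers):
    print(row)
```

A matrix of this kind is the input a hierarchical clustering routine needs; the legend names Ward's method for that step.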
(C) The effect of CUG and UUU on EJC binding is position biased. Plot showing the positional effect of the tri-nucleotides CUG and UUU. Similar to the analysis performed for hexamers, for each possible position of the tri-nucleotide, the deviation (of the exon-exon junction EJC enrichment from its gene EJC enrichment) was regressed against (1) the presence / absence of CUG / UUU at the position, (2) splice strength, and (3) ESS counts. Y-axis shows the associated T-statistic obtained for the position effect after fitting such a model for each individual position upstream of the exon-exon junction. X-axis defines the nt coordinates in the upstream exon. Both CUG and UUU show a highly positive or negative T-statistic around -18 bp upstream of the exon-exon junction, indicating a strong position-specific effect.
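The per-hexamer and per-position regressions described for Figure 5B and 5C report a T-statistic for the presence / absence term. A minimal sketch of that calculation is given below using ordinary least squares solved through the normal equations in plain Python. The function names and the data are hypothetical, and the original analysis may have used a different fitting routine.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_statistics(X, y):
    """Fit y ~ intercept + X by least squares; return (coefficients, t-statistics)."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)       # residual variance
    t = [coef[a] / math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return coef, t


# Hypothetical junctions: (motif present at position, splice strength, ESS count).
X = [(1, 8.0, 0), (1, 6.0, 1), (1, 9.0, 2), (1, 7.0, 1),
     (0, 8.0, 1), (0, 6.0, 0), (0, 9.0, 1), (0, 7.0, 2)]
deviation = [1.9, 1.4, 2.2, 1.8, 0.7, 0.5, 1.0, 0.6]  # junction minus gene enrichment
coef, t = ols_t_statistics(X, deviation)
print("presence effect:", coef[1], "T-statistic:", t[1])
```

Repeating the fit once per hexamer, or once per upstream position of a tri-nucleotide, and collecting `t[1]` each time gives one T-statistic per hexamer or position, which is the quantity the legend places on the plot axis.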
Figure 6. mRNA secondary structures modulate EJC binding
(A) Pairing probability profiles of EJC bound exon-exon junctions. Predicted base pairing probability (bpp) profiles, computed by RNAplfold from the Vienna RNA-package, of junctions with enhanced, inhibited and unaffected EJC binding. Black dashed, red and grey solid lines highlight average bpp profiles of junctions that are unaffected, enhanced and inhibited for EJC binding, respectively. Note: The region utilized for RNA-Fold analysis covers the last 37nt of the upstream exon and the first 28nt of the downstream exon. Y-axis shows predicted bpp. X-axis highlights nucleotide positions relative to exon-exon junction. Position 0 represents last RNA nucleotide of upstream exons. Dashed vertical line highlights the coordinate (-21.7) of the average EJC protection median.
(B) Definition of bpp regions distinct between exon-exon junctions with enhanced and inhibited EJC binding. To define a cut-off for significantly distinct regions in the exon-exon junctions with enhanced and inhibited EJC binding, the bpp significance for each nucleotide in the two junction groups was tested by ANOVA, then subjected to permutation testing. Note that at alpha of 0.05 the regions -11 to -16 and -20 to -24 are significantly distinct in bpp between junctions enhanced and depleted in EJC binding. Y-axis defines -log10 transformed ANOVA p-values. Horizontal red dashed line indicates cutoff at alpha = 0.05 of the permutation test. X-axis highlights nucleotide position relative to exon-exon junction.
1156
(C) Folding categories of analyzed junctions. To define folding categories,
exon-exon junctions were clustered based on bpp estimates within the region
-24 to -11 (see B). Clustering was performed with MClust using a Gaussian
mixture model assuming 4 clusters, and for each resulting category the mean
bpp profile of all cluster members is shown as an individual bpp profile plot.
The shaded region around solid lines indicates the 95% confidence interval
estimated using the normal approximation to the binomial distribution. Vertical
dashed lines indicate the borders of the region selected for bpp profile
clustering. The y-axis shows predicted bpp estimates; the x-axis shows
nucleotide positions relative to the exon-exon junction.
(D) Impact of RNA structures on EJC binding. Segmented bar plot showing the
proportion of significantly affected exon-exon junctions that are enhanced for
EJC binding. The horizontal dashed line indicates the overall proportion of
junctions with enhanced EJC binding. Segments representing the proportion of
junctions with enhanced or inhibited EJC binding are highlighted in red or grey,
respectively. The y-axis shows the proportion (in %) of all junctions
significantly affected in EJC binding. The x-axis indicates the plotted folding
categories (shown in panel C). Percent estimates for junctions with enhanced
EJC binding are given within the bar segments. Note that folding categories 2
and 3 have a positive impact on EJC binding, while folding category 4, which
has an increased bpp within EJC deposition sites, has a negative impact.
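The 95% confidence interval mentioned for the bpp profiles in (C) uses the normal approximation to the binomial distribution. A minimal sketch of that interval is given below. It assumes that the mean bpp at a position is treated as a proportion estimated from n cluster members; the function name and the choice of n are illustrative assumptions.

```python
import math


def normal_approx_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion.

    p_hat: estimated proportion (e.g. mean bpp at one position), in [0, 1]
    n:     number of observations (assumed here: junctions in the cluster)
    z:     standard normal quantile; 1.96 corresponds to a 95% interval
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Clip to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)
```

For example, a mean bpp of 0.5 across 100 junctions gives a half-width of 1.96 × 0.05 = 0.098, so the interval is approximately 0.402 to 0.598.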
(E-F) RNA stem structures impact EJC binding coordinates. Plots showing
distribution densities of EJC protection site medians. Double-stranded (ds) RNA
content in Regions A and B was identified using the mean of rounded bpp
estimates [0 and 1] in the corresponding sequence stretches. High and low
dsRNA content are defined as >0.7 and 50 nt
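The dsRNA content measure named in this legend (mean of bpp estimates rounded to 0 or 1) can be sketched as follows. The sketch assumes that rounding means thresholding at 0.5, which is a labeled assumption, and it implements only the "high" cutoff of >0.7 stated in the legend. Function and parameter names are illustrative.

```python
def dsrna_content(bpp_values, pair_cutoff=0.5):
    """Mean of bpp estimates rounded to 0 (unpaired) or 1 (paired).

    pair_cutoff is an assumption: values >= 0.5 are rounded up to 1.
    """
    if not bpp_values:
        raise ValueError("bpp_values must not be empty")
    rounded = [1 if p >= pair_cutoff else 0 for p in bpp_values]
    return sum(rounded) / len(rounded)


def has_high_dsrna_content(bpp_values, high_cutoff=0.7):
    """True if the dsRNA content of the stretch exceeds the >0.7 cutoff."""
    return dsrna_content(bpp_values) > high_cutoff
```

An explicit threshold is used instead of Python's built-in `round`, which rounds exact halves to the nearest even integer and would therefore map 0.5 to 0.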
[Figure panels B, C, E, F and G; plot content not recoverable. Labels: x-axes Log10(splice site distance [nt]) and Log10(peak_coverage); y-axis Density; maximum median density of EJC protection at −21.7 relative to the exon-exon junction (EJC protected core region); proximal read coverage peaks ≤ 50 nt (n=31715; canonical EJC binding) and remote read coverage peaks >50 nt (n=1498); proportion (%) of peaks at transcript exons in EJC ipaRt libraries (first, internal and last exon); percentage of remote coverage peaks in EJC ipaRt libraries along exon-exon junctions (all EJC coverage maxima; 75%, 50% and 25% best covered); mean read_coverage; EJC protected and RBP protected replicate tracks (Rep. #1 to #3) for oskar (3R: 8,935,126 - 8,938,395 (+)) and RpL32 (3R: 30,045,229 - 30,046,161 (-)).]
Figure 3
[Panels A and B; plot content not recoverable. Labels: GO category: Biological processes; EJC enriched genes (n = 3108); RBP enriched genes (n = 3837); p.adjusted 0.05; log2(baseMean); Count; −log2(p.adjust).]
EJC enriched (RBP depleted)
log2FoldChange >1 & p.adj.