The transcriptome-wide landscape and modalities of EJC ... - bioRxiv

0 downloads 0 Views 10MB Size Report
Nov 4, 2018 - showed significant enrichment upon RNA fragmentation (Figure 2C), ... EJC binding in Drosophila cytoplasm maps to canonical deposition.
bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1 2

The transcriptome-wide landscape and modalities of EJC

3

binding in adult Drosophila

4 5 6

Ales Obrdlik1,*, Gen Lin1, Nejc Haberman2, Jernej Ule2,3, and Anne Ephrussi1,*

7 8

1

9 10 11 12 13 14

2

European Molecular Biology Laboratory, Heidelberg, 69117, Germany

Department for Neuromuscular Diseases UCL Institute of Neurology, London WC1N 3BG, United Kingdom 3

The Francis Crick Institute, London NW1 1AT, United Kingdom

* corresponding authors: [email protected], [email protected]

15 16 17

KEYWORDS: Exon Junction Complex, Drosophila EJC, ipaRt, organismal

18

interactome, DSP crosslinking, EJC-ipaRt, cytoplasm, EJC assembly

19

regulation

20 21

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

22

ABSTRACT

23 24

Splicing-dependent assembly of the exon junction complex (EJC) at canonical

25

sites -20 to -24 nucleotides upstream of exon-exon junctions in mRNAs occurs

26

in all higher eukaryotes and affects most major regulatory events in the life of a

27

transcript. In mammalian cell cytoplasm, EJC is essential for efficient RNA

28

surveillance, while in Drosophila the most essential cytoplasmic EJC function

29

is in localization of oskar mRNA. Here we developed a method for isolation of

30

protein complexes and associated RNA-targets (ipaRt), which provides a

31

transcriptome-wide view of RNA binding sites of the fully assembled EJC in

32

adult Drosophila. We find that EJC binds at canonical positions, with highest

33

occupancy on mRNAs from genes comprising multiple splice sites and long

34

introns. Moreover, the occupancy is highest at junctions adjacent to strong

35

splice sites, CG-rich hexamers and RNA structures. These modalities have not

36

been identified by previous studies in mammals, where more binding was seen

37

at non-canonical positions. The most highly occupied transcripts in Drosophila

38

have increased tendency to be maternally localized, and are more likely to

39

derive from genes involved in differentiation or development. Taken together,

40

we identify the RNA modalities that specify EJC assembly in Drosophila on a

41

biologically coherent set of transcripts.

42 43



2

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

44

INTRODUCTION

45 46

The exon junction complex (EJC) consists of a hetero-tetramer core

47

composed of eIF4AIII, Mago, Y14 and Barentsz (Btz) (1,2), and auxiliary factors

48

that form the EJC periphery (3). The complex assembles on mRNAs during

49

splicing, -20 to -24 nucleotides (nt) upstream of exon-exon junctions (4). EJC

50

assembly is a multi-step process that begins with the CWC22-mediated

51

deposition of the DEAD-box helicase eIF4AIII on nascent pre-mRNAs (5-7) and

52

is followed by the recruitment of Mago and Y14, forming a pre-EJC

53

intermediate. The pre-EJC is stably bound to RNA due to the ATPase inhibiting

54

activity of the (non-RNA-binding) Mago-Y14 heterodimer, which “locks” eIF4AIII

55

helicase in its RNA-bound state (1,2,8-10). Once formed, the pre-EJC is

56

completed by recruitment of Barentsz (Btz), forming mature EJCs (1,3,11). The

57

roles of the EJC in post-transcriptional control of gene expression are manifold.

58

In the nucleus, EJC subunits have been shown to play a role in splicing (12-

59

15), mRNA export (16), and nuclear retention of intron-containing RNAs (17).

60

In the cytoplasm, the EJC is reported to play a role in translation (18,19),

61

nonsense-mediated decay (NMD) (7,20-26) and RNA localization (23,27-30).

62

While most of the EJC’s functions appear conserved, in Drosophila the EJC is

63

not crucial for NMD (31), but it is essential for oskar mRNA transport and

64

localization within the developing oocyte (23,27-30,32,33). To better

65

understand the engagement of the EJC in the fly we set out to define the EJC

66

mRNA interactome in adult Drosophila melanogaster using a novel strategy

67

that stabilizes mRNA binding proteins (mRBPs) associated with their RNA

68

templates within multi-protein mRNP assemblies. This method through the use



3

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

69

of the crosslinking agent dithio(bis-) succinimidylpropionate (DSP) captures

70

stable and transient protein interactions in close proximity (34,35) and allows

71

definition of the binding sites of specific protein (holo-) complexes associated

72

with their RNA templates (ipaRt). Indeed, our analysis of EJC protected sites

73

defined by ipaRt reveals that, in Drosophila, EJC binding occurs at canonical

74

deposition sites (4), with a median coordinate ~22 nt upstream of exon-exon

75

junctions. While in mammals EJC mediated protection outside canonical sides

76

has been reported (36,37), we find that in Drosophila the degree of non-

77

canonical EJC mediated RNA protection is minimal. We show in Drosophila

78

that RNA polymerase II transcripts primarily protected by the EJC derive from

79

genes involved in differentiation or development, while mRNAs primarily

80

protected by mRBPs derive from genes with homeostatic functions. Our

81

analysis suggests that the EJC’s bias for transcripts is in Drosophila a

82

consequence of several modalities in the genes’ architecture, particularly the

83

number of splice sites and intron length. Moreover, EJC binding is enhanced

84

by adjacent RNA secondary structures and CUG-rich hexamer sequences

85

located 3’ to the EJC binding site. These modalities have not been identified in

86

previous studies of mammalian EJC binding sites (36-38), which could either

87

reflect the greater specificity of our method for the fully assembled EJC, or could

88

reflect differences in EJC binding between flies and human. Our study provides

89

a first comprehensive transcriptome-wide view of EJC-RNA interactions in a

90

whole organism, which unravels RNA modalities that contribute to the

91

unforeseen biological coherence of the bound transcripts.

92 93

RESULTS



4

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

94 95

Stabilization of the Exon Junction Complex on mRNAs by DSP

96

The EJC is maintained in its RNA-bound state through the direct

97

interaction of the Mago-Y14 hetero-dimer with the otherwise dynamically

98

binding RNA helicase eIF4AIII (1-3,9,22,39). EJC binding to RNA is labile, as

99

under stringent washing conditions (~ 1M salt concentrations) the interaction of

100

eIF4AIII and Mago-Y14 is abolished and the RNA is released from the complex

101

(36). We therefore hypothesized that the EJC might be stabilized on its RNA

102

targets by introducing covalent bonds between the Mago-Y14 heterodimer and

103

eIF4AIII, which would render the protein-RNA complex resistant to the high salt

104

concentrations commonly used in iCLIP studies. Furthermore, a stabilized EJC

105

complex would enable us to “pull” on EJC subunits other than the RNA-binding

106

eIF4AIII, ensuring isolation of the complex under stringent conditions. To test

107

this

108

succinimidylpropionate (DSP), which crosslinks reversibly primary amino

109

groups of polypeptides in close proximity (34,35). We isolated poly(A)

110

containing mRNPs on an oligo d(T)25 resin (40,41) from cytoplasm of adult

111

Drosophila either kept untreated or treated with UV, DSP, or UV plus DSP

112

(Figure 1). SDS-PAGE silver staining and western analyses of mRNA-RNP

113

precipitates (Figure 1A,B) revealed that irradiation of cytoplasmic lysates by UV

114

ex vivo only marginally increased co-precipitation of proteins with poly(A)

115

containing RNAs (Figure 1A, lane 7 and 8). Upon UV irradiation, only faint

116

signals of known RBPs such as eIF4AIII and cytoplasmic poly(A) binding

117

protein (PABP) were detected in the poly(A) RNA precipitates. Non RNA-

118

binding EJC subunits such as Y14 were not detected (Figure 1B, lane 7 and 8,



we

made

use

of

the

bivalent

crosslinking

agent

dithio(bis-)

5

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

119

Figure 1C) in agreement with previous observations (41). In contrast, treatment

120

of cytoplasmic lysates with DSP led to strong protein co-precipitation with

121

mRNAs (Figure 1A, compare lanes 7-10). Western analysis of precipitates from

122

DSP and UV-DSP treated cytoplasmic lysates revealed strong signals not only

123

for direct mRNA binding proteins such as eIF4AIII and PABP, but also mRNP

124

components not directly bound to RNA, such as Y14 (compare Figure 1A-B,

125

lanes 7-10, Figure 1C). In none of the precipitates were cytoplasmic proteins

126

such as kinesin heavy chain (Khc) or the small ribosomal subunit protein RpS6

127

observed (Figure 1B, lanes 7-10, Figure 1C), confirming the stringency of the

128

assay. Furthermore, precipitates from the “beads” only control were free of all

129

proteins tested (Figure 1A, lane 6), indicating that artifacts due to DSP or UV

130

derived treatment are unlikely. These observations suggest that, for EJC

131

stabilization on RNA, DSP-mediated covalent bond formation between

132

individual EJC subunits is superior to UV-crosslinking and support the use of

133

this agent when studying other mRNP assemblies (Figure 1D).

134 135

ipaRt: a novel approach for high quality isolation of EJC complexes

136

associated to RNA templates

137

Of all the tagged EJC subunits we tested in the fly, GFP-Mago showed

138

the highest degree of incorporation into endogenous EJCs (Figure S1B).

139

Furthermore, the eIF4AIII subunit was additionally found to co-sediment with

140

polysome fractions in sucrose density gradients, independently of Mago-Y14

141

(Figure S1C), suggesting that the DEAD-box helicase might have yet unknown

142

EJC independent functions in the fly (see Supplementary Results). Therefore,

143

we carried out EJC specific RNA immunoprecipitation (RIP) from DSP treated



6

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

144

cytoplasmic extracts prepared from GFP-Mago and GFP-tag expressing flies.

145

By titrating salt and detergent concentrations, we identified stringent washing

146

conditions (see Materials and Methods) that yielded high quality RNA profiles

147

from GFP-Mago RIPs and only scant RNA profiles from GFP control RIPs,

148

compared with standard IP washing conditions (Figure 2A; compare lanes 2-

149

5). To test whether the presence of RNA in GFP-Mago precipitates was a

150

consequence of its incorporation into the EJC rather than by virtue of transient

151

interactions of Mago with other RBPs, we treated the DSP treated or untreated

152

lysates immunoprecipitates with RNAseI. Western analysis revealed the

153

presence of all tested EJC subunits in the GFP-Mago precipitates (Figure 2B,

154

lanes 7-8), but no protein other than GFP itself in the GFP only controls (Figure

155

2B, lanes 3-4). In contrast we detected no signal for the proteins probed in the

156

“beads only” control precipitates (Figure 2B, lanes 2 and 6) and observed

157

signals for the RNA non-binding Khc only in lysate inputs (Figure 2B, compare

158

lanes 1, 5; lanes 2-4, 6-8), indicating high stringency of the assay.

159

The stabilizing effect of DSP on the EJC is evident from the enhanced

160

eIF4AIII signals in GFP-Mago precipitates when cytoplasm was treated with

161

DSP prior to immunoprecipitation (Figure 2B, lanes 5, 7-8). Conversely, the

162

PABP signal in the GFP-Mago precipitates disappeared upon incubation with

163

RNase of both the DSP treated and the untreated samples (Figure 2B, lanes 1,

164

5; lanes 2-4, 6-8). This shows that even when exposed to DSP, proteins whose

165

association with the EJC is bridged by RNA can be removed by RNA

166

fragmentation.

167

To confirm the “cleansing effect” of RNAseI, we perfomed IPs from DSP

168

treated cytoplasm with or without an RNA fragmentation step and analyzed the



7

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

169

precipitates by tandem mass spectrometry (MS). Expression set analysis of MS

170

signals obtained in GFP-Mago and GFP control precipitates identified 45

171

versus 35 significantly enriched proteins in the RNase untreated and treated

172

samples respectively (Figure 2C, Figure S3). While all EJC subunits were

173

enriched independently of RNA integrity (Figure 2C), only upon RNA

174

fragmentation was Btz enriched to a similar degree as Mago, Y14, and eIF4AIII.

175

Except for the poly(A) binding protein Nab2 (42), no EJC-unrelated RBPs

176

showed significant enrichment upon RNA fragmentation (Figure 2C), showing

177

that RNA fragmentation by RNaseI increases the sensitivity and specificity of

178

EJC IPs.

179

Based on these findings, we conclude that DSP is a suitable tool for

180

stablilization and isolation of EJC RNA complexes from animal tissues and that

181

our protocol provides a reliable approach for isolation of proteins (or protein

182

complexes) associated to their RNA targets. Consequently we termed our

183

experimental strategy “ipaRt”.

184 185

EJC binding in Drosophila cytoplasm maps to canonical deposition

186

sites

187

We sought to determine which sites in the Drosophila transcriptome are

188

protected by the EJC as opposed to other mRNA binding proteins (mRBPs).

189

To this end, we performed EJC ipaRt and oligo(dT) mediated mRNP capture in

190

parallel, both followed by an RNase digestion step (see Material and Methods),

191

and constructed cDNA libraries of the protein-protected RNA fragments as

192

previously described for iCLIP (43).

193

revealed that more than 92% of all reads aligned uniquely to the Drosophila

194

genome (Figure S3B). 85% of EJC ipaRt reads mapped to exons as opposed



Analysis of the sequencing results

8

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

195

to 34% in the mRBP footprinting mapped reads, indicating specificity of the

196

ipaRt library (Figure 3A).

197

To define the median binding coordinates of the protected sites, we

198

determined the sequence coverage +/- 50 nt of exon-exon junctions in EJC

199

ipaRt and mRNP footprinting (Figure 3B) and averaged the coverage profile

200

over all junctions. The mean coverage profile in mRBP footprinting appeared

201

evenly distributed, indicating an absence of protection bias (Figure 3B). In

202

contrast, the protected sites in EJC ipaRt were located almost exclusively in

203

upstream exons, with an EJC coverage median -21,7 nt 5’ to the exon-exon

204

junction (Figure 3B), consistent with previous studies (4,16,44). Similarly, we

205

estimated a 13nt long region of saturated RNA protection in the EJC RNA

206

footprints, from -27 to -15nt 5’ of the exon-exon junction (Figure S3D) (4,8,36).

207 208

Protection sites remote of exon-exon junctions in EJC ipaRt are not of

209

EJC origin

210

Studies of mammalian EJCs have reported a high frequency of EJC-

211

mediated protection outside of canonical binding sites (non-canonical EJC

212

deposition sites) (36,37). To test whether this non-canonical distribution is

213

representative of EJC protection across the Drosophila transcriptome, we

214

determined coverage maxima for every exon-exon junction protected by EJC.

215

The majority (~ 95.5%) of protection peaks in EJC ipaRt libraries mapped to

216

sites of canonical EJC binding, proximal to exon-exon junctions (Figure 3 C, D).

217

The remaining ipaRt protection coverage peaks were located remotely (>50nt)

218

of canonical EJC binding regions (Figure 3D,G and S4A).



9

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

219

Our analysis shows that proximal peaks map mainly to internal exons

220

(79%), to first exons (20.6%), and only minimally to terminal exons (0.4%), as

221

expected given the splicing-dependent deposition of EJCs upstream of splice

222

junctions. Remote protection site peaks mostly mapped to internal exons (67%)

223

and last exons (25%), and only minimally (8%) to first exons of the bound

224

mRNAs (Figure 3F).

225

Three main features characterized the remote peaks in our EJC ipaRt

226

libraries: low abundance, low sequencing coverage (Figure 3E,G) and relatively

227

low expression compared with proximal peaks (Figure S4C,D). Further analysis

228

using AME (45) revealed that the remote peaks in EJC ipaRt are significantly

229

enriched in RNA binding motifs corresponding to known Drosophila splicing

230

regulators (Figure S4E), whose binding might be a consequence of DSP

231

crosslinking and co-purification due to direct interaction with the EJC. Our

232

analysis shows that, in contrast to EJC in mammals (36,37), the proportion of

233

remote EJC binding sites is negligible in Drosophila.

234 235

EJCs mark multi-intron genes important for differentiation and

236

development

237

We determined which RNAs are bound preferentially by EJC versus

238

other RBPs using DESeq (46). This revealed a bias in EJC binding towards

239

mRNAs of protein coding genes comprising greater than 1 exon (Figure S5D)

240

and identified 3332 enriched and 4436 depleted genes (FDR < 0.05 and log2

241

fold change > log2(1.5)). Gene ontology (GO) term analysis of EJC enriched

242

genes (Figure 4A) revealed significant association with genes involved in

243

development or specialized cellular functions such as cell polarity,



10

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

244

differentiation and cell signaling. In contrast, genes with homeostatic functions,

245

involved in cytoskeletal and chromatin organization, in transcription or

246

translation, and metabolic processes are biased for protection by other RBPs

247

(Figure 4B, compare left and right plot). The bias of EJC binding to mRNAs

248

from genes with functions in development and cell polarity, suggested that

249

transcripts under spatial or temporal control might also show a preference for

250

EJC binding. Consistent with this hypothesis, analysis of EJC and RBP

251

protection sites on Drosophila transcripts annotated in the FlyFISH RNA

252

localization database (47,48) revealed that localized maternal mRNAs are more

253

likely to be bound by EJC than non-localizing transcripts (Figure 4C).

254

It was reported that in mammalian cells EJCs are enriched at mRNAs

255

from alternatively spliced genes (38). In the fly, the EJC has been reported to

256

promote correct splicing of long intron-containing genes (12,14). This

257

relationship between EJCs and gene architecture led us to ask which features

258

could explain the gene-to-gene variation in EJC deposition. We used a multiple

259

regression model (see Materials and Methods) to assess how five features

260

(number of introns, maximum intron length, transcript abundance, transcript

261

length and the degree of alternative splicing) influence gene deposition of EJC.

262

We checked that the effects estimated from the model holds given underlying

263

correlations of the features. For example, after accounting for intron number,

264

which has a strong effect on EJC binding (Figure S6A), we determined that

265

intron length has a strong effect on EJC deposition (Figure S6B), while

266

alternative splicing has only a minimal effect on EJC binding (Figure S6C). We

267

assessed the relative importance of each feature from the multivariate

268

regression analysis (Figure 4D). Two features dominated our model of EJC



11

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

269

deposition: the number of introns per gene and the maximum intron length of

270

the gene (Figure 4D) are positively associated with EJC deposition. The

271

enhancement of EJC deposition with intron length agrees with previous studies

272

showing that splicing of long intron-containing genes is sensitive to EJC activity

273

(12,14).

274 275

Splice site strength and hexamer composition influence EJC deposition

276

We estimated enrichment of EJC at the junction level using reads within

277

+/- 50 nt of the splice site and observed a strong dependency on the gene EJC

278

estimate (Pearson correlation = 0.66, p < 2e-16). This suggests that the

279

junction’s EJC profile is primarily determined by its parent gene architecture.

280

We next tested whether any exon-exon junction deviates significantly from its

281

parent gene EJC binding. ~31% of detected junctions have an enrichment that

282

deviates significantly from the gene level (Figure 5A). Furthermore we observed

283

that EJCs reads cover only a subset of junctions within a gene and show higher

284

read-coverage coefficient of variation than reads protected by other RBPs

285

(Figure S6D and S6E).

286

We asked if a known sequence context related to splicing might be

287

responsible for the variability in EJC binding to exon-exon junctions within a

288

gene. We found that strong 5’ and 3’ splice site signals, and presence of intronic

289

5’ ISE correlates with increased EJC deposition, while the presence of an

290

exonic 5’ ESS correlates with reduced EJC deposition (Table S1, Figure 7A),

291

consistent with the fact that EJC assembly is dependent on splicing.

292

Surprisingly, 5’ and 3’ ESEs and intronic ISEs at 3’ splice sites have no effect.

293

Next we tested the effects of unannotated hexamers, while accounting for ESS,



12

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

294

ISE and splice strength. We detected 63 hexamers that are associated with a

295

change in EJC deposition. By clustering the hexamers based on sequence

296

similarity, two major groups with either a negative effect or a positive effect on

297

EJC assembly emerged (Figure 5B). Strikingly, 5 out of the 16 hexamers

298

associated with increased EJC deposition contain the trinucleotide CUG, and

299

28 out of the 47 hexamers associated with decreased EJC deposition contain

300

the trinucleotide UUU (Figure 5B). The strongest effect of these CUG or UUU

301

trinucleotides is observed when they are present around the region

302

downstream of EJC binding (approximately -16 to -18nt). This indicates that the

303

sequence composition of this region is a strong determinant of EJC binding

304

(Figure 5C).

305 306

RNA structure modulates the degree and position of EJC binding in

307

Drosophila

308

Deposition of an EJC at the first exon-exon junction and presence of a

309

structured element next to the deposition site are required for localization of

310

oskar mRNA at the posterior pole of the Drosophila oocyte (30,49).

311

Interestingly, we observed a 2.38-fold stronger enrichment of the first oskar

312

exon-exon junction than anticipated from oskar mRNA enrichment in our data,

313

indicating that structures near EJC binding sites might affect EJC assembly.

314

To test whether RNA structures might have an impact on EJC binding,

315

we estimated for every junction the probability of base pairing for each

316

nucleotide -37 to +28 bp of the splice site. We observed three distinct average

317

base pairing probability (bpp) profiles for exon-exon junctions with unaffected,

318

positively correlated, or negatively correlated EJC binding (Figure 6A). Two



13

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

319

regions showed significantly different bpps for junctions with a positive versus

320

a negative effect on EJC binding (Figure 6B). The first region, located in the

321

canonical EJC binding site, showed a decreased bpp for junctions with a

322

positive EJC binding effect (Figure 6A,B) and increased bpp for junctions with

323

negative EJC binding effect (Figure 6A,B). Surprisingly the second region,

324

located directly downstream of the canonical deposition site (Figure 6A,B),

325

showed an elevated bpp near junctions with positive EJC binding, but

326

decreased bpp at junctions with negative EJC binding (Figure 6A). This result

327

indicates that while EJC binding in Drosophila occurs on single stranded RNA

328

(ssRNA), in agreement with previous reports (1,9), EJC binding to RNA may be

329

enhanced by RNA secondary structures proximal to the EJC binding site.

330

Given the redundant information between bpp of each nt pair, we

331

performed dimension reduction on bpp profiles within the -24 to -11 region

332

(Materials and Methods, Figure 6B) using a Gaussian mixture model, to

333

facilitate subsequent analysis of EJC binding. We obtained 4 folding categories

334

(Figure 6C) and observed an association between significant positive EJC

335

binding (log2fold-change >log2(1.5)) and junctions harboring folding categories

336

2 and 3, which contain a bpp elevation downstream of, or surrounding, EJC

337

binding sites (Figure 6C, D). Junctions with an unstructured profile (folding

338

category 1) show no such bias, and junctions with a bpp elevation in the EJC

339

binding site (category 4) are associated with a negative EJC binding effect. This

340

suggests that, when located near EJC deposition sites, RNA structures may

341

positively affect EJC binding in Drosophila (Figure 6C, 6D lanes 2 and 3) and

342

could explain the enhanced binding of EJC to the first exon-exon junction in

343

oskar mRNA.



14

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

344

Previous in vitro experiments have shown that RNA secondary

345

structures affect EJC assembly site coordinates (50). We asked whether

346

predicted stable RNA stem structures within or flanking a canonical EJC binding

347

site at exon-exon junctions have any impact on the precise coordinates of EJC

348

assembly in the fly. For this we estimated potential dsRNA content in region A,

349

spanning from -30 to -21 and for region B spanning from -23 to -14 within exon-

350

exon junctions (see sketch in Figure 6E). We focused on junctions with a

351

positive EJC binding effect to allow robust estimates based on junctions with

352

high coverage. The analysis revealed a strong downstream shift of EJC

353

assembly coordinates when dsRNA content in region A was high and low in

354

region B, and only a minor shift when the dsRNA content was low in region A

355

and high in region B (Figure 6F). Taken together, these observations confirm

356

that the structural context of exon-exon junctions not only affects EJC binding

357

efficiency, but also directs the site of EJC assembly.

358 359

Comparison of human and Drosophila datasets reveals common factors

360

that influence EJC enrichment.

361

Previous studies of the EJC in Drosophila and human have reported

362

differences between these two species in terms of EJC protein components

363

and cellular function. We asked whether any of the factors we identified in

364

Drosophila would also influence EJC deposition in human cell lines. Using the

365

tools and models described in previous sections, we analyzed CLIP enrichment

366

of the EJC component BTZ in HeLa cells (51). Results from the multiple

367

regression analysis showed that the number of introns is a major determinant

368

of the gene-to-gene variation in EJC deposition (Figure S8A). This agrees with



15

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

369

our finding in Drosophila and suggests that this mechanism is conserved

370

between Drosophila and humans. In contrast to our observations in the fly, we

371

found that in mammals the extent of alternative splicing in genes has a small

372

but positive effect on EJC deposition (Figure S8A), in agreement with previous

373

findings that alternatively spliced genes are over-represented in EJC enriched

374

genes (37,38). However, in contrast to Drosophila, in humans intron length did

375

not facilitate but rather antagonizes EJC mRNA binding (Figure S8A).

376

Next we investigated the factors that determine variability in EJC

377

deposition within genes (Table S2, Figure S8D). Similar to our Drosophila

378

findings, high 5’ and 3’ splice strength and presence of an ISE at the 5’ junction

379

enhances EJC deposition within a junction. In this analysis, presence of ESEs

380

in the upstream, and of ESS in the downstream 50nt strongly affects EJC

381

deposition, something we did not observe in Drosophila (Figure 7A). A possible

382

explanation for this is that ESE/ESS/ISE sequences are annotated based on

383

experiments in human cell lines and may not exert a similar effect in Drosophila.

384

Among the 238 ESEs, we observed 28 hexamers containing AGAA which is

385

similar to a motif (GAAGA) found in a previous EJC clip study (37,38). We

386

separated ESEs according to presence of AGAA, and indeed found that the

387

ESEs containing AGAA are associated with stronger EJC deposition. We asked

388

whether bpp-profiles differ between junctions with positive and negative EJC

389

binding. We found that an overall negative bpp (-24 to -18) around the EJC

390

deposition site favors EJC deposition (Figure S8B,C). Unlike Drosophila, we

391

did not find other regions whose bpp are associated with increased EJC

392

deposition in mammalian cells. Taken together, these results indicate that

393

across Drosophila and human, not only do conserved regulatory mechanisms



16

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

394

such as intron counts, splice strength, or structural hindrance within EJC

395

deposition sites influence EJC deposition, but also divergent regulatory factors

396

such as ESEs in mammals or RNA folding in proximity of EJC deposition sites

397

in Drosophila can affect EJC deposition.

398 399

Features that inform on EJC binding may predict mRNA localization

400

The bias of EJC binding to mRNAs from genes with functions in

401

development and cell polarity suggested that EJC binding might be indicative

402

of transcripts under spatial or temporal control. Consistent with this hypothesis,

403

analysis of EJC and RBP protection sites on Drosophila transcripts annotated

404

in the FlyFISH RNA localization database (47,48) revealed that localized

405

maternal mRNAs are more bound by EJC than non-localizing transcripts

406

(Figure 4C). We postulated that modalities underlying EJC enrichment at the

407

gene and junction level can inform us about the localization of a transcript and

408

applied decision tree learning on RNA localization using the R package rpart

409

(52). Given the imbalance between localized to non-localized dataset, we used

410

Cohen’s kappa coefficient to assess the predictive value of the model using

411

different data groups. The use of gene features and features of the most

412

enriched EJC junction is sufficient to achieve predictive accuracy comparable

413

to a model incorporating all possible features (Figure 8A). Using the gene and

414

highest EJC covered junction information as a working model, we asked which

415

variables are important in this model. To prevent bias in estimate, the data were

416

bootstrapped 1000 times. We observed that transcript length, maximum intron

417

size in the gene, and 5’ splice strength and folding of the most enriched junction

418

are more useful predictors (Figure 8B) compared to other variables. This



17

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

419

suggests that features of gene architecture that orchestrate EJC deposition can

420

distinguish localizing and non-localizing mRNA and might be important for

421

mRNA localization.

422 423 424

DISCUSSION

425 426

ipaRt: a method for high-confidence identification of protein binding

427

sites on RNAs in vivo

428

We have profiled the landscape of EJC binding across the transcriptome

429

of a whole animal, Drosophila melanogaster, and determined the parameters

430

that influence the distribution of the complex on RNAs in the organism. Previous

431

knowledge of EJC-RNA interactions was based on UV-crosslinking-based

432

experiments in specific cell types grown as homogeneous cultures for the

433

individual studies (37,38,43,53-60). While UV-crosslinking remains a method of

434

choice for identification of protein binding sites on nucleic acids, due to the

435

inefficient penetration of UV light into tissues and organisms, the method is

436

most useful when applied to cells in culture. In contrast, our analysis of EJC

437

distribution in the tissues of whole Drosophila flies was made possible by ipaRt,

438

which employs the crosslinking agent DSP to freeze protein-protein interactions

439

within otherwise dynamic RNP complexes, such as the EJC.

440

We have demonstrated that DSP-mediated covalent bond formation

441

between the RNA helicase eIF4AIII and the Mago-Y14 heterodimer preserves

442

EJCs in their “locked” state on mRNAs (1,2,8,9) and that efficient recovery of

443

the bound RNAs does not require their crosslinking to eIF4AIII using UV light.



18

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

444

Our “ipaRt” approach, like CLIP and iCLIP (53,56), enables highly stringent

445

washing of the samples. In support of the robustness and reliability of our DSP

446

based assays, we demonstrated high reproducibility not only among technical

447

but also biological replicates of EJC ipaRt, as well as mRBP footprinting

448

sequencing results (Figure S5A).

449

Furthermore, ipaRt allows enables use of non-RNA-binding subunits of

450

the EJC, such as Mago, as immunoprecipitation baits. This is highly relevant in

451

the context of studies of the EJC, as we and others have shown that its RNA-

452

binding subunit, the RNA helicase eIF4AIII, may have other, EJC-independent

453

functions in the cell (Figure S1C) (61,62). ipaRt afforded us the option of using

454

Mago as our EJC bait, and indeed this is a main reason for the high-quality

455

definition of the EJC binding landscape in the fly cytoplasm that we have

456

achieved. The protection site reads we obtained from EJC ipaRt map almost

457

exclusively to canonical EJC deposition sites (4) with a median protection ~22nt

458

of the upstream exon’s 3’end. In contrast to mammalian EJC CLIP and RIP

459

studies, in which eIF4AIII served as an immunoprecipitation bait (36,37), EJC

460

ipaRt sequencing reads mapping to regions distant from canonical deposition

461

sites are of low abundance and sequencing coverage. While this discrepancy

462

could reflect differences in EJC engagement in humans and Drosophila, it more

463

likely reflects the choice of bait or the cell compartment in which the analysis

464

was executed. Indeed, a recent study in human cell lines revealed that when

465

the cytoplasmic EJC component Btz was chosen as the immunoprecipitation

466

bait rather than eIF4AIII, the proportion of non-canonical EJC deposition sites

467

was negligible (38).



19

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

468

Finally, in ipaRt the DSP crosslinker is applied ex vivo during tissue

469

disruption, and does not require inhibition of translation in vivo. We therefore

470

consider ipaRt a method of choice for functional investigations of protein-RNA

471

complexes in fully developed organisms and organ tissues.

472 473

Regulation of EJC assembly in Drosophila

474

Through our analysis, we defined factors that contribute to or inhibit EJC

475

assembly on mRNAs and at individual exon-exon junctions in Drosophila. From

476

this we deduce that the landscape of EJC binding to RNAs is sculpted through

477

regulation of EJC assembly at two levels in the fly (Figure 7B).

478

At the upstream regulatory level, the degree to which EJCs are

479

assembled on an mRNA is dictated by the complexity of the gene’s architecture:

480

mRNAs produced from genes of simple architecture are marked by fewer EJCs,

481

while mRNAs from genes of complex architecture, comprising multiple splice

482

sites and long introns are EJC bound to a higher degree. Due to the EJC’s

483

assembly co-transcriptionally during splicing (4-7) it is not surprising that

484

mRNAs from genes containing a larger number of introns are more likely to be

485

bound by EJC. However, it is unprecedented that EJC assembly is enhanced

486

on mRNAs of genes with large introns (Figure 4D). Although the EJC has been

487

implicated to play a role in exon definition in the case of long-intron containing

488

genes in Drosophila (12,14), the fact we observe a preference of EJC binding

489

to the mRNAs, but not to the pre-mRNA splice junctions of long-intron

490

containing genes (Figure S6F) disqualifies this simple explanation. We

491

therefore propose that it is the number and the resting time of co-

492

transcriptionally assembled spliceosomes at active RNA pol II sites that



20

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

493

determines the likelihood of CWC22-dependent eIF4AIII recruitment and

494

deposition, and thereby ascertains the degree of EJC assembly at mRNAs

495

(Figure 7B).

496

At the downstream regulatory level, after EJC assembly rates at

497

transcripts are defined, deposition of EJCs along mRNA exon-exon junctions is

498

modulated by the structural and sequence context of the splice sites (Figure 7

499

A,B). dsRNA stem structures in exon-exon junctions of Drosophila mRNAs

500

either antagonize EJC assembly when present within canonical EJC deposition

501

sites, or enhance EJC assembly when located in the vicinity of the EJC

502

deposition site (Figure 6D, 7A). Absence of dsRNA within the EJC binding

503

moiety is in agreement with reported preference of EJCs for ssRNA (1,9,50);

504

yet it remains to be elucidated, how and why EJC binding is positively affected

505

when RNA-stem structures are found in its direct proximity on the bound

506

template.

507

While it is likely that the structural context of exon-exon junctions in

508

Drosophila directly influences the degree of EJC assembly, sequence

509

composition derived effects on EJC binding to mRNA are the consequence of

510

the assigned roles of these sequences during pre-mRNA splicing. We have

511

demonstrated that exon-exon junctions with strong 5’ and strong 3’ splice sites

512

(ss) are biased towards junctions with enhanced EJC binding (Table S1, Figure

513

7A). Importantly regulation of weak 5’ and 3’ ss which commonly occur at

514

alternatively spliced junctions, cis-acting splicing regulatory elements (SREs)

515

were shown to be of importance (63-65). In light of the negative impact of

516

alternative splicing at the level of EJC mRNA binding (Figure 4D), it is not

517

surprising that conventional ESEs and ESSs hardly affect EJC binding at the



21

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

518

level of individual exon-exon junctions. However, whether the position-

519

dependent bias mediated by the UUU-triplet and CUG-triplet containing

520

hexamers towards inhibited or enhanced EJC binding which we have

521

discovered in our Drosophila data set (Figure 5B, C) is due to a direct or indirect

522

influence of these hexamers on splicing remains to be addressed. UUU-triplet

523

containing hexamers which are strongly biased towards inhibition of EJC

524

binding, could potentially function as yet undefined 5’ESS in Drosophila.

525

Interestingly, CUG-triplet containing hexamers which are strongly biased

526

towards enhanced EJC binding, share some sequence similarity with a

527

previously predicted CUG containing 5’ ESE of short intron splice sites (64). It

528

therefore appears likely that the CUG-triplet and UUU-triplet hexamers exert

529

their effect on EJC binding as a yet undefined and novel class of SREs.

530 531

RNA modalities influencing EJC binding in mammals and fly

532

In agreement with previous reports in mammals (36-38,66), we find that

533

the extent of EJC occupancy varies between mRNAs and exon-exon junctions

534

also in Drosophila. The splice site score next to a junction correlates with

535

increased EJC deposition in the fly, and this relationship between splicing

536

efficiency and EJC deposition has been proposed also in previous mammalian

537

studies (67,68). Analysis of published mammalian Btz iCLIP data (38) revealed

538

several modalities that correlate with the increased binding landscape of the

539

EJC on mRNAs in mammals as well as in Drosophila, including the high number

540

of introns, high transcript abundance and the sequence context of individual

541

exon-exon junctions (Figure S8). Interestingly, the presence of long introns has

542

a slightly negative effect, and the amount of alternative splicing a slightly



22

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

543

positive effect on EJC occupancy in mammals (Figure 4D, S8); the latter is in

544

agreement with previous observations (36-38). Previous studies on mammalian

545

tissue cell cultures have reported that EJC enriched junctions contain a

546

relatively high proportion of so called non-canonical protection sites, which

547

were enriched for RBP consensus sequences of the SR protein family (36,37).

548

Our analysis of Btz iCLIP data from mammals (38) confirms that the presence

549

of ESEs in upstream exons and 5’ISEs in introns correlates with enhanced EJC

550

binding (Table S2). Moreover, we have identified a group of junctions in

551

mammals containing AGAA hexamers that are biased for enhanced EJC

552

binding (Figure S7A), but their effects are not especially strong near the

553

canonical EJC deposition site (Figure S7B). These hexamers match the AGAA

554

encompassing consensus sequence of the mammalian SR protein SRSF10

555

known to function as splicing enhancers (69,70) and have been found

556

previously in EJC bound exon-exon junctions (36-38). Not only do our in silico

557

results agree with these reports and support the proposed cooperative binding

558

of EJC with SR proteins (36-38), they also partially explain the EJC’s preference

559

in mammals for alternatively spliced mRNAs.

560

One observation deriving from the analysis of published mammalian Btz

561

iCLIP data sets is surprising, however. While we have observed junctions in

562

Drosophila to be enhanced or inhibited in EJC binding by specific base pairing

563

probability (bpp)-profiles, thus by specific RNA folding categories (Figure 6A-

564

D), we could not detect any striking differences between overall bpp-profiles of

565

exon-exon junctions with enhanced or inhibited EJC binding in mammals

566

(Figure S8B). Indeed, the only aspect of RNA structure shared by mammals

567

and Drosophila is the negative effect of dsRNA when directly overlapping with



23

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

568

the canonical EJC deposition site (Figure S8C) (50). In Drosophila, however,

569

the presence of dsRNA close to canonical deposition sites enhances EJC

570

binding, an effect that is not observed in mammalian cells.

571 572

Insights into evolution and divergence of EJC functions

573

Our findings regarding the differences in the RNA modalities enriched at

574

highly occupied mammalian and Drosophila EJC sites provide insight into the

575

expansion of EJC functions during eukaryotic evolution. Spliceosome catalyzed

576

splicing reactions are bidirectional and efficient formation of exon-exon

577

junctions during RNA maturation is achieved by Prp22-induced release of

578

spliceosomes from mRNAs (71-73). Importantly, EJC is absent in organisms

579

with low rates of RNA splicing such as Saccharomyces cerevisiae, but present

580

in organisms with high splicing rates such as Schizosaccharomyces pombe

581

(74-76). This suggests that with the increased demand for splicing accuracy in

582

higher eukaryotes, the EJC evolved to function as an exon-exon junction “lock”

583

inhibiting spliceosome reassembly. Ensuring splicing reaction directionality, the

584

EJC further evolved to become a central component of the NMD pathway (7,20-

585

26) in mammals, where more than 95% of all genes are alternatively spliced

586

(77). This may provide an explanation as to why EJCs in mammals are enriched

587

on alternatively spliced transcripts. In Drosophila, where only 30% of all genes

588

appear to be alternatively spliced (78), the EJC is not a component of the main

589

NMD pathway (31). We therefore propose that although the EJC-NMD pathway

590

evolved before the segregation of the proto- and deuterostome clades, it gained

591

importance to complement the faux 3’UTR-NMD pathway during the evolution



24

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

592

of vertebrates (79), where RNA surveillance and spatio-temporal control of

593

gene expression are essential.

594

Similarly, the recruitment of EJC and interacting proteins upon splicing

595

of mRNA to facilitate the localization of the mRNA so far seems exclusive to

596

Drosophila. Two Drosophila specific features that modulate EJC binding,

597

namely the presence of a large intron and secondary structure near the

598

junction, are also predictive of mRNA localization. Although the precise strength

599

of association between these features and mRNA localization remains to be

600

verified with larger and more quantitative datasets, previous studies with the

601

SOLE in oskar RNA have shown that RNA structure and EJC binding are

602

indeed crucial for oskar mRNA localization (30).

603 604 605

ACKNOWLEDGMENTS

606

We thank Marco Blanchette, Matthias Hentze, Isabel Palacios, and Jean-Yves

607

Roignant for their gifts of antibodies and fly stocks. We thank Sandra Müller,

608

Alessandra Reversi and Anna Cyrklaff for technical assistance. We thank

609

Vladimir Benes and the EMBL Gene Core Facility for their advice and service.

610

We thank Mandy Rettel, Frank Stein and the EMBL Proteomics Core Facility,

611

for their service and assistance with mass spectrometry analysis. We are

612

grateful to members of the Ephrussi lab for discussions during the course of

613

the work.

614 615 616

AUTHOR CONTRIBUTIONS



25

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

617

Conceived and designed wet lab experiments: AO. Performed the wet lab

618

experiments: AO. Contributed reagents/materials/analysis tools: AO, LG, NH.

619

Conceived and designed the in silico analyses: LG, NH, AO. Analyzed and

620

interpreted the results: AO, LG, NH, AE. Contributed to the writing of the

621

manuscript: AO, LG, NH, JU, AE.

622 623 624

FUNDING

625

AO

626

Vetenskapsradet (Reg. No. 2010-6728) and from Marie Curie Actions (FP7-

627

PEOPLE-IEF No. 2763207), and by a Deutsche Forschungsgemeinschaft

628

Network Grant (DFG FOR 2333). This study was funded by the DFG (FOR

629

2333) and the EMBL.

was

supported

by

postdoctoral

fellowships

from

the

Swedish

630 631 632

CONFLICT OF INTEREST

633 634

The authors declare that they have no conflicting or competing interests.

635 636 637

REFERENCES

638 639

1.

Bono, F., Ebert, J., Lorentzen, E. and Conti, E. (2006) The crystal

640

structure of the exon junction complex reveals how it maintains a stable

641

grip on mRNA. Cell, 126, 713-725.



26

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

642

2.

Stroupe, M.E., Tange, T.O., Thomas, D.R., Moore, M.J. and Grigorieff,

643

N. (2006) The three-dimensional arcitecture of the EJC core. Journal of

644

molecular biology, 360, 743-749.

645

3.

Tange, T.O., Shibuya, T., Jurica, M.S. and Moore, M.J. (2005)

646

Biochemical analysis of the EJC reveals two new factors and a stable

647

tetrameric protein core. Rna, 11, 1869-1883.

648

4.

Le Hir, H., Izaurralde, E., Maquat, L.E. and Moore, M.J. (2000) The

649

spliceosome deposits multiple proteins 20-24 nucleotides upstream of

650

mRNA exon-exon junctions. The EMBO journal, 19, 6860-6869.

651

5.

Steckelberg, A.L., Altmueller, J., Dieterich, C. and Gehring, N.H. (2015)

652

CWC22-dependent pre-mRNA splicing and eIF4A3 binding enables

653

global deposition of exon junction complexes. Nucleic acids research,

654

43, 4687-4700.

655

6.

Barbosa, I., Haque, N., Fiorini, F., Barrandon, C., Tomasetto, C.,

656

Blanchette, M. and Le Hir, H. (2012) Human CWC22 escorts the

657

helicase eIF4AIII to spliceosomes and promotes exon junction complex

658

assembly. Nature structural & molecular biology, 19, 983-990.

659

7.

Alexandrov, A., Colognori, D., Shu, M.D. and Steitz, J.A. (2012) Human

660

spliceosomal protein CWC22 plays a role in coupling splicing to exon

661

junction

662

Proceedings of the National Academy of Sciences of the United States

663

of America, 109, 21313-21318.

664

8.

665

complex

deposition

and

nonsense-mediated

decay.

Ballut, L., Marchadier, B., Baguet, A., Tomasetto, C., Seraphin, B. and Le Hir, H. (2005) The exon junction core complex is locked onto RNA by



27

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

666

inhibition of eIF4AIII ATPase activity. Nature structural & molecular

667

biology, 12, 861-869.

668

9.

Andersen, C.B.F., Ballut, L., Johansen, J.S., Chamieh, H., Nielsen, K.H.,

669

Oliveira, C.L.P., Pedersen, J.S., Séraphin, B., Hir, H.L. and Andersen,

670

G.R. (2006) Structure of the Exon Junction Core Complex with a

671

Trapped DEAD-Box ATPase Bound to RNA. Science, 313, 1968-1972.

672

10.

Andersen, C.B., Ballut, L., Johansen, J.S., Chamieh, H., Nielsen, K.H.,

673

Oliveira, C.L., Pedersen, J.S., Seraphin, B., Le Hir, H. and Andersen,

674

G.R. (2006) Structure of the exon junction core complex with a trapped

675

DEAD-box ATPase bound to RNA. Science, 313, 1968-1972.

676

11.

Bono, F. and Gehring, N.H. (2011) Assembly, disassembly and

677

recycling: the dynamics of exon junction complexes. RNA biology, 8, 24-

678

29.

679

12.

Roignant, J.Y. and Treisman, J.E. (2010) Exon junction complex

680

subunits are required to splice Drosophila MAP kinase, a large

681

heterochromatic gene. Cell, 143, 238-250.

682

13.

Lloret-Llinares, M., Mapendano, C.K., Martlev, L.H., Lykke-Andersen, S.

683

and Jensen, T.H. (2016) Relationships between PROMPT and gene

684

expression. RNA biology, 13, 6-14.

685

14.

Ashton-Beaucage, D., Udell, C.M., Lavoie, H., Baril, C., Lefrancois, M.,

686

Chagnon, P., Gendron, P., Caron-Lizotte, O., Bonneil, E., Thibault, P. et

687

al. (2010) The exon junction complex controls the splicing of MAPK and

688

other long intron-containing transcripts in Drosophila. Cell, 143, 251-262.



28

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

689

15.

Ashton-Beaucage, D. and Therrien, M. (2011) The exon junction

690

complex: a splicing factor for long intron containing transcripts? Fly, 5,

691

224-233.

692

16.

Gatfield, D., Le Hir, H., Schmitt, C., Braun, I.C., Kocher, T., Wilm, M. and

693

Izaurralde, E. (2001) The DExH/D box protein HEL/UAP56 is essential

694

for mRNA nuclear export in Drosophila. Current biology : CB, 11, 1716-

695

1721.

696

17.

Shiimori, M., Inoue, K. and Sakamoto, H. (2013) A specific set of exon

697

junction complex subunits is required for the nuclear retention of

698

unspliced RNAs in Caenorhabditis elegans. Molecular and cellular

699

biology, 33, 444-456.

700

18.

Nott, A., Le Hir, H. and Moore, M.J. (2004) Splicing enhances translation

701

in mammalian cells: an additional function of the exon junction complex.

702

Genes & development, 18, 210-222.

703

19.

Chazal, P.E., Daguenet, E., Wendling, C., Ulryck, N., Tomasetto, C.,

704

Sargueil, B. and Le Hir, H. (2013) EJC core component MLN51 interacts

705

with eIF3 and activates translation. Proceedings of the National

706

Academy of Sciences of the United States of America, 110, 5903-5908.

707

20.

Gehring, N.H., Kunz, J.B., Neu-Yilik, G., Breit, S., Viegas, M.H., Hentze,

708

M.W. and Kulozik, A.E. Exon-Junction Complex Components Specify

709

Distinct Routes of Nonsense-Mediated mRNA Decay with Differential

710

Cofactor Requirements. Molecular cell, 20, 65-75.

711

21.

712

Singh, G., Jakob, S., Kleedehn, M.G. and Lykke-Andersen, J. (2007) Communication with the exon-junction complex and activation of



29

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

713

nonsense-mediated decay by human Upf proteins occur in the

714

cytoplasm. Molecular cell, 27, 780-792.

715

22.

Shibuya, T., Tange, T.O., Stroupe, M.E. and Moore, M.J. (2006)

716

Mutational analysis of human eIF4AIII identifies regions necessary for

717

exon junction complex formation and nonsense-mediated mRNA decay.

718

Rna, 12, 360-374.

719

23.

Palacios, I.M., Gatfield, D., St Johnston, D. and Izaurralde, E. (2004) An

720

eIF4AIII-containing complex required for mRNA localization and

721

nonsense-mediated mRNA decay. Nature, 427, 753-757.

722

24.

Okada-Katsuhata, Y., Yamashita, A., Kutsuzawa, K., Izumi, N.,

723

Hirahara, F. and Ohno, S. (2012) N- and C-terminal Upf1

724

phosphorylations create binding platforms for SMG-6 and SMG-5:SMG-

725

7 during NMD. Nucleic acids research, 40, 1251-1266.

726

25.

Melero, R., Buchwald, G., Castano, R., Raabe, M., Gil, D., Lazaro, M.,

727

Urlaub, H., Conti, E. and Llorca, O. (2012) The cryo-EM structure of the

728

UPF-EJC complex shows UPF1 poised toward the RNA 3' end. Nature

729

structural & molecular biology, 19, 498-505, S491-492.

730

26.

Buchwald, G., Ebert, J., Basquin, C., Sauliere, J., Jayachandran, U.,

731

Bono, F., Le Hir, H. and Conti, E. (2010) Insights into the recruitment of

732

the NMD machinery from the crystal structure of a core EJC-UPF3b

733

complex. Proceedings of the National Academy of Sciences of the

734

United States of America, 107, 10050-10055.

735

27.

736

van Eeden, F.J., Palacios, I.M., Petronczki, M., Weston, M.J. and St Johnston, D. (2001) Barentsz is essential for the posterior localization of



30

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

737

oskar mRNA and colocalizes with it to the posterior pole. The Journal of

738

cell biology, 154, 511-523.

739

28.

740 741

Hachet, O. and Ephrussi, A. (2004) Splicing of oskar RNA in the nucleus is coupled to its cytoplasmic localization. Nature, 428, 959-963.

29.

Hachet, O. and Ephrussi, A. (2001) Drosophila Y14 shuttles to the

742

posterior of the oocyte and is required for oskar mRNA transport. Current

743

biology : CB, 11, 1666-1674.

744

30.

Ghosh, S., Marchand, V., Gaspar, I. and Ephrussi, A. (2012) Control of

745

RNP motility and localization by a splicing-dependent structure in oskar

746

mRNA. Nature structural & molecular biology, 19, 441-449.

747

31.

Behm-Ansmant, I., Gatfield, D., Rehwinkel, J., Hilgers, V. and Izaurralde,

748

E. (2007) A conserved role for cytoplasmic poly(A)-binding protein 1

749

(PABPC1) in nonsense-mediated mRNA decay. The EMBO journal, 26,

750

1591-1601.

751

32.

Zimyanin, V.L., Belaya, K., Pecreaux, J., Gilchrist, M.J., Clark, A., Davis,

752

I. and St Johnston, D. (2008) In vivo imaging of oskar mRNA transport

753

reveals the mechanism of posterior localization. Cell, 134, 843-853.

754

33.

Ghosh, S., Obrdlik, A., Marchand, V. and Ephrussi, A. (2014) The EJC

755

binding and dissociating activity of PYM is regulated in Drosophila. PLoS

756

genetics, 10, e1004455.

757

34.

Lomant, A.J. and Fairbanks, G. (1976) Chemical probes of extended

758

biological structures: synthesis and properties of the cleavable protein

759

cross-linking reagent [35S]dithiobis(succinimidyl propionate). Journal of

760

molecular biology, 104, 243-261.



31

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

761

35.

Schweizer, E., Angst, W. and Lutz, H.U. (1982) Glycoprotein topology

762

on intact human red blood cells reevaluated by cross-linking following

763

amino group supplementation. Biochemistry, 21, 6807-6818.

764

36.

Singh, G., Kucukural, A., Cenik, C., Leszyk, J.D., Shaffer, S.A., Weng,

765

Z. and Moore, M.J. (2012) The cellular EJC interactome reveals higher-

766

order mRNP structure and an EJC-SR protein nexus. Cell, 151, 750-

767

764.

768

37.

Sauliere, J., Murigneux, V., Wang, Z., Marquenet, E., Barbosa, I., Le

769

Tonqueze, O., Audic, Y., Paillard, L., Roest Crollius, H. and Le Hir, H.

770

(2012) CLIP-seq of eIF4AIII reveals transcriptome-wide mapping of the

771

human exon junction complex. Nature structural & molecular biology, 19,

772

1124-1131.

773

38.

Hauer, C., Sieber, J., Schwarzl, T., Hollerer, I., Curk, T., Alleaume, A.M.,

774

Hentze, M.W. and Kulozik, A.E. (2016) Exon Junction Complexes Show

775

a Distributional Bias toward Alternatively Spliced mRNAs and against

776

mRNAs Coding for Ribosomal Proteins. Cell reports, 16, 1588-1603.

777

39.

Nielsen, K.H., Chamieh, H., Andersen, C.B., Fredslund, F., Hamborg,

778

K., Le Hir, H. and Andersen, G.R. (2009) Mechanism of ATP turnover

779

inhibition in the EJC. Rna, 15, 67-75.

780

40.

Castello, A., Horos, R., Strein, C., Fischer, B., Eichelbaum, K.,

781

Steinmetz, L.M., Krijgsveld, J. and Hentze, M.W. (2013) System-wide

782

identification of RNA-binding proteins by interactome capture. Nature

783

protocols, 8, 491-500.

784

41.

785

Castello, A., Fischer, B., Eichelbaum, K., Horos, R., Beckmann, B.M., Strein, C., Davey, N.E., Humphreys, D.T., Preiss, T., Steinmetz, L.M. et



32

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

786

al. (2012) Insights into RNA biology from an atlas of mammalian mRNA-

787

binding proteins. Cell, 149, 1393-1406.

788

42.

Bienkowski, R.S., Banerjee, A., Rounds, J.C., Rha, J., Omotade, O.F.,

789

Gross, C., Morris, K.J., Leung, S.W., Pak, C., Jones, S.K. et al. (2017)

790

The Conserved, Disease-Associated RNA Binding Protein dNab2

791

Interacts with the Fragile X Protein Ortholog in Drosophila Neurons. Cell

792

reports, 20, 1372-1384.

793

43.

Konig, J., Zarnack, K., Rot, G., Curk, T., Kayikci, M., Zupan, B., Turner,

794

D.J., Luscombe, N.M. and Ule, J. (2011) iCLIP--transcriptome-wide

795

mapping of protein-RNA interactions with individual nucleotide

796

resolution. Journal of visualized experiments : JoVE.

797

44.

Kataoka, N., Diem, M.D., Kim, V.N., Yong, J. and Dreyfuss, G. (2001)

798

Magoh, a human homolog of Drosophila mago nashi protein, is a

799

component of the splicing-dependent exon-exon junction complex. The

800

EMBO journal, 20, 6424-6433.

801

45.

Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L.,

802

Ren, J., Li, W.W. and Noble, W.S. (2009) MEME Suite: tools for motif

803

discovery and searching. Nucleic acids research, 37, W202-W208.

804

46.

Love, M.I., Huber, W. and Anders, S. (2014) Moderated estimation of

805

fold change and dispersion for RNA-seq data with DESeq2. Genome

806

biology, 15, 550.

807

47.

Wilk, R., Hu, J., Blotsky, D. and Krause, H.M. (2016) Diverse and

808

pervasive subcellular distributions for both coding and long noncoding

809

RNAs. Genes & development, 30, 594-609.



33

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

810

48.

Lecuyer, E., Yoshida, H., Parthasarathy, N., Alm, C., Babak, T.,

811

Cerovina, T., Hughes, T.R., Tomancak, P. and Krause, H.M. (2007)

812

Global analysis of mRNA localization reveals a prominent role in

813

organizing cellular architecture and function. Cell, 131, 174-187.

814

49.

815 816

Simon, B., Masiewicz, P., Ephrussi, A. and Carlomagno, T. (2015) The structure of the SOLE element of oskar mRNA. Rna, 21, 1444-1453.

50.

Mishler, D.M., Christ, A.B. and Steitz, J.A. (2008) Flexibility in the site of

817

exon junction complex deposition revealed by functional group and RNA

818

secondary structure alterations in the splicing substrate. Rna, 14, 2657-

819

2670.

820

51.

Brooks, A.N., Duff, M.O., May, G., Yang, L., Bolisetty, M., Landolin, J.,

821

Wan, K., Sandler, J., Booth, B.W., Celniker, S.E. et al. (2015) Regulation

822

of alternative splicing in Drosophila by 56 RNA binding proteins.

823

Genome research, 25, 1771-1780.

824

52.

825 826

Therneau, T. and Atkinson, B. (2018). 4.1-13 ed, pp. rpart: Recursive Partitioning and Regression Trees.

53.

Ule, J., Jensen, K.B., Ruggiu, M., Mele, A., Ule, A. and Darnell, R.B.

827

(2003) CLIP identifies Nova-regulated RNA networks in the brain.

828

Science, 302, 1212-1215.

829

54.

Licatalosi, D.D., Mele, A., Fak, J.J., Ule, J., Kayikci, M., Chi, S.W., Clark,

830

T.A., Schweitzer, A.C., Blume, J.E., Wang, X. et al. (2008) HITS-CLIP

831

yields genome-wide insights into brain alternative RNA processing.

832

Nature, 456, 464-469.



34

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

833

55.

Ule, J. (2009) High-throughput sequencing methods to study neuronal

834

RNA-protein interactions. Biochemical Society transactions, 37, 1278-

835

1280.

836

56.

Konig, J., Zarnack, K., Rot, G., Curk, T., Kayikci, M., Zupan, B., Turner,

837

D.J., Luscombe, N.M. and Ule, J. (2010) iCLIP reveals the function of

838

hnRNP particles in splicing at individual nucleotide resolution. Nature

839

structural & molecular biology, 17, 909-915.

840

57.

Wang, Z., Kayikci, M., Briese, M., Zarnack, K., Luscombe, N.M., Rot, G.,

841

Zupan, B., Curk, T. and Ule, J. (2010) iCLIP predicts the dual splicing

842

effects of TIA-RNA interactions. PLoS biology, 8, e1000530.

843

58.

Tollervey, J.R., Curk, T., Rogelj, B., Briese, M., Cereda, M., Kayikci, M.,

844

Konig, J., Hortobagyi, T., Nishimura, A.L., Zupunski, V. et al. (2011)

845

Characterizing the RNA targets and position-dependent splicing

846

regulation by TDP-43. Nature neuroscience, 14, 452-458.

847

59.

Ince-Dunn, G., Okano, H.J., Jensen, K.B., Park, W.Y., Zhong, R., Ule,

848

J., Mele, A., Fak, J.J., Yang, C., Zhang, C. et al. (2012) Neuronal Elav-

849

like (Hu) proteins regulate RNA splicing and abundance to control

850

glutamate levels and neuronal excitability. Neuron, 75, 1067-1080.

851

60.

Modic, M., Ule, J. and Sibley, C.R. (2013) CLIPing the brain: studies of

852

protein-RNA interactions important for neurodegenerative disorders.

853

Molecular and cellular neurosciences, 56, 429-435.

854

61.

Alexandrov, A., Colognori, D. and Steitz, J.A. (2011) Human eIF4AIII

855

interacts with an eIF4G-like partner, NOM1, revealing an evolutionarily

856

conserved function outside the exon junction complex. Genes &

857

development, 25, 1078-1090.



35

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

858

62.

Choudhury, S.R., Singh, A.K., McLeod, T., Blanchette, M., Jang, B.,

859

Badenhorst, P., Kanhere, A. and Brogna, S. (2016) Exon junction

860

complex proteins bind nascent transcripts independently of pre-mRNA

861

splicing in Drosophila melanogaster. eLife, 5.

862

63.

Shepard, P.J., Choi, E.-A., Busch, A. and Hertel, K.J. (2011) Efficient

863

internal exon recognition depends on near equal contributions from the

864

3′ and 5′ splice sites. Nucleic acids research, 39, 8928-8937.

865

64.

Brooks, A.N., Aspden, J.L., Podgornaia, A.I., Rio, D.C. and Brenner,

866

S.E. (2011) Identification and experimental validation of splicing

867

regulatory elements in Drosophila melanogaster reveals functionally

868

conserved splicing enhancers in metazoans. Rna, 17, 1884-1894.

869

65.

Koren, E., Lev-Maor, G. and Ast, G. (2007) The Emergence of

870

Alternative 3′ and 5′ Splice Site Exons from Constitutive Exons. PLoS

871

computational biology, 3, e95.

872

66.

Sauliere, J., Haque, N., Harms, S., Barbosa, I., Blanchette, M. and Le

873

Hir, H. (2010) The exon junction complex differentially marks spliced

874

junctions. Nature structural & molecular biology, 17, 1269-1271.

875

67.

Gudikote, J.P., Imam, J.S., Garcia, R.F. and Wilkinson, M.F. (2005) RNA

876

splicing promotes translation and RNA surveillance. Nature structural &

877

molecular biology, 12, 801-809.

878

68.

Custódio, N., Carvalho, C., Condado, I., Antoniou, M., Blencowe, B.J.

879

and Carmo-Fonseca, M. (2004) In vivo recruitment of exon junction

880

complex proteins to transcription sites in mammalian cell nuclei. RNA

881

(New York, N.Y.), 10, 622-633.



36

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

882

69.

Clery, A., Jayne, S., Benderska, N., Dominguez, C., Stamm, S. and

883

Allain, F.H. (2011) Molecular basis of purine-rich RNA recognition by the

884

human SR-like protein Tra2-beta1. Nature structural & molecular

885

biology, 18, 443-450.

886

70.

Tsuda, K., Someya, T., Kuwasako, K., Takahashi, M., He, F., Unzai, S.,

887

Inoue, M., Harada, T., Watanabe, S., Terada, T. et al. (2011) Structural

888

basis for the dual RNA-recognition modes of human Tra2-beta RRM.

889

Nucleic acids research, 39, 1538-1553.

890

71.

Hoskins, A.A. and Moore, M.J. (2012) The spliceosome: a flexible,

891

reversible macromolecular machine. Trends in biochemical sciences,

892

37, 179-188.

893

72.

894 895

reversible splicing catalysis. RNA (New York, N.Y.), 14, 1975-1978. 73.

896 897

Smith, D.J. and Konarska, M.M. (2008) Mechanistic insights from

Tseng, C.K. and Cheng, S.C. (2008) Both catalytic steps of nuclear premRNA splicing are reversible. Science, 320, 1782-1784.

74.

Wood, V., Gwilliam, R., Rajandream, M.A., Lyne, M., Lyne, R., Stewart,

898

A., Sgouros, J., Peat, N., Hayles, J., Baker, S. et al. (2002) The genome

899

sequence of Schizosaccharomyces pombe. Nature, 415, 871-880.

900

75.

Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B., Feldmann,

901

H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M. et al. (1996) Life

902

with 6000 Genes. Science, 274, 546.

903

76.

Wen, J. and Brogna, S. (2010) Splicing‐dependent NMD does not

904

require the EJC in Schizosaccharomyces pombe. The EMBO journal,

905

29, 1537.



37

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

906

77.

Gromadzka, A.M., Steckelberg, A.L., Singh, K.K., Hofmann, K. and

907

Gehring, N.H. (2016) A short conserved motif in ALYREF directs cap-

908

and EJC-dependent assembly of export complexes on spliced mRNAs.

909

Nucleic acids research, 44, 2348-2361.

910

78.

Gibilisco, L., Zhou, Q., Mahajan, S. and Bachtrog, D. (2016) Alternative

911

Splicing within and between Drosophila Species, Sexes, Tissues, and

912

Developmental Stages. PLoS genetics, 12, e1006464.

913

79.

Eberle, A.B., Stalder, L., Mathys, H., Orozco, R.Z. and Mühlemann, O.

914

(2008) Posttranscriptional Gene Regulation by Spatial Rearrangement

915

of the 3′ Untranslated Region. PLoS biology, 6, e92.

916

80.

Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W. and

917

Smyth, G.K. (2015) limma powers differential expression analyses for

918

RNA-sequencing and microarray studies. Nucleic acids research, 43,

919

e47.

920 921 922

FIGURE LEGENDS

923 924

Figure 1. Dithio(bi-)succinimidyl-propionate (DSP), stabilizes EJC in its

925

mRNA bound state.

926 927

(A) SDS-PAGE and silver staining of proteins co-precipitated with oligo-

928

d(T)25 bound mRNAs from cytoplasm. Left to right: Input cytoplasmic

929

samples (0.01% of total input; lanes 1-4); protein MW standards (lane 5);

930

“beads-only” control precipitate from “all-condition-mixture” (3.3%, lane 6);



38

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

931

individual oligo-d(T)25 precipitates (3.3%; lanes 7-10). Treatment conditions,

932

“untreated”, “UV irradiated”, “DSP supplemented” and “UV irradiated-DSP

933

supplemented” are indicated (+ or -) above image. Note that for samples treated

934

with DSP ex vivo, UV-exposure does not increase the amount of recovered

935

proteins.

936

(B) Western blot analysis of oligo-d(T)25 precipitates. Gel loading order

937

same as in (A); 0.03% of the input cytoplasm and 33% of each precipitate were

938

resolved. Blot was probed with antibodies directed against the proteins

939

indicated at the right of the panel. Khc: kinesin heavy chain; PABP: cytoplasmic

940

poly A binding protein; RpS6 = small ribosomal subunit protein S6.

941

(C) Quantification of proteins detected on the western blot. Crosslinking

942

conditions are indicated in upper panel.

943

densitometry measurement using the Fiji image analysis package. Relative

944

protein abundance is the fraction of the signal in the precipitate relative to the

945

cytoplasm. Note that in contrast to the RNA-binding eIF4AIII, the non-RNA-

946

binding EJC subunit Y14 is only detected when the cytoplasm was treated with

947

DSP.

948

(D) Schematic of the net effect of crosslinking with UV versus DSP.

949

Exposure of the cytoplasm to UV leads primarily to stabilization of direct protein-

950

RNA interactions. DSP treatment results in efficient retention of proteins

951

associated with RNA by stabilization of polypeptide interactions either within

952

individual an RBP or between an RBP and other moieties within a complex.

Signals were quantified by

953 954



39

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

955

Figure 2. DSP crosslinking stabilizes EJC for stringent

956

immunoprecipitation and RNA fragmentation

957 958

(A) Comparison of IP washing conditions for RNA isolation. Anti-GFP RNA

959

IP assays on DSP-treated cytoplasm of GFP-Mago and GFP tag expressing

960

flies. 0.02% of RNA isolated from input lysate (lane 1) and 20% from GFP tag

961

(lanes 2 and 3) and GFP-Mago (lanes 4 and 5) RIP precipitates were resolved

962

by capillary gel electrophoresis on an RNA 6000 Pico Chip Bioanalyzer.

963

Washing conditions are indicated at bottom. SW: standard stringency washing

964

conditions; HSW: high stringency washing conditions (see Materials and

965

Methods).

966

(B) DSP stabilized EJC core is resistant to RNaseI treatment. Effects of

967

RNaseI treatment on GFP-Mago co-immunoprecipitations in DSP crosslinked

968

and untreated cytoplasm. Western blots of anti-GFP IPs from DSP treated and

969

from untreated cytoplasm of GFP-Mago (lanes 5-8) and GFP tag (lanes 1-4)

970

expressing flies processed under HSW conditions. Primary antibodies utilized

971

are indicated on the right. Inputs (0.01%) are shown in lane 1 and 5. Bead-only

972

control precipitates are shown in lanes 2 and 6. IP precipitates (20%) from DSP

973

treated (lanes 3 and 7) or untreated cytoplasm (lanes 4 and 8). Details of

974

samples and experimental conditions indicated at top of panel.

975

(C) RNA fragmentation depletes EJC unrelated proteins from GFP-Mago

976

co-precipitates. Anti-GFP IPs (HSW condition) from DSP treated cytoplasm of

977

GFP-Mago and GFP tag expressing flies. Precipitates were subjected to isobar

978

labeling and peptide content was defined by tandem mass spectrometry.

979

Presented bar plots of protein-enrichment are defined by Limma(80). GFP-



40

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

980

Mago specific protein enrichments in “intact RNA” and “fragmented RNA”

981

conditions are highlighted in left and right plot, respectively. Y-axis shows

982

individual proteins detected. X-axis shows scale of enrichment (Log2 fold

983

change). Dashed line indicates average enrichment of all significant proteins in

984

each condition. Protein-enrichment >2x or 0.05).

987 988 989

Figure 3. EJC binding to mRNA in Drosophila cytoplasm occurs within

990

exons at canonical deposition sites.

991 992

(A) EJC ipaRt library reads map to exonic sites in the Drosophila genome.

993

Summary of genomic features detected in mRBP footprinting and EJC ipaRt

994

sequencing results. Y-axis indicates proportion (percent) of all uniquely aligning

995

read-counts. Color-code of genomic features is highlighted in the legend on

996

right side of the plot. Note that EJC protected sites map in majority to exons

997

and UTR-exons.

998

(B) EJC protection median is on upstream exons approximately -22 nt

999

from the 3’ end. Read coverage profiles from EJC ipaRt and mRBP footprinting

1000

cDNA libraries. Coordinates of metagene covering -+50 nt of exon-exon

1001

junctions are indicated on x-axis. Note position “0” defines the last nucleotide

1002

of upstream exons. Scale of mean sequencing read coverage for all normalized

1003

biological replicates is indicated on y-axis. Profiles from RBP protected and

1004

EJC protected junctions are presented in grey and red respectively. Note that



41

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1005

mRBP footprinting sequencing reads show a homogenous RBP protection

1006

profile over the entire metagene body. On the contrary, sequencing reads from

1007

EJC ipaRt indicate a modal distribution of EJC protections with a median at -

1008

21.5 nt upstream of the exon-exon ligation point. Region of maximal EJC

1009

protection (protection core) spans from -27 nt to -15 nt and is highlighted in

1010

orange.

1011

(C) Genomic coverage profile visualized by IGB viewer. Coverage-profiles

1012

of 3 individual sequencing replicates from EJC ipaRt and mRNP footprinting

1013

along the oskar and RpL32 genes. Size and position of gene regions are

1014

indicated. Exons and introns are indicated below the coverage profiles as boxes

1015

or lines, respectively. Coverage depth (normalized reads) is indicated on the y-

1016

axis. Note that cDNA reads from EJC protected fragments map upstream of

1017

and in direct proximity to splice sites, while reads from RBP protected fragments

1018

distribute over whole exon bodies. Sequencing coverage depth of EJC

1019

protected sites at exon-exon junctions is variable.

1020

(D) Protection site peaks (modes) in EJC ipaRt libraries cluster primarily

1021

within 50nt around splice junctions. Density plot highlighting distribution of

1022

coverage peaks in EJC ipaRt in respect to the splice site distance. Y-axis

1023

defines estimated density of peaks at defined splice site distances. X-axis

1024

indicates log10-transformed distance to splice site. Vertical dashed line

1025

highlights border between proximal and remote protection peak estimates. Note

1026

all peaks in ipaRt were defined within exons at a 20 nt frame resolution. Peaks

1027

chosen for the analysis were 2x more covered in ipaRt than in mRBP

1028

footprinting and had a coverage > 30 reads.



42

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1029

(E) Proximal (canonical) protection peaks in EJC ipaRt libraries are

1030

stronger covered than remote protection peaks. Density plot showing the

1031

distribution of estimated coverage of splice junctions proximal (blue) and

1032

remote (grey) protection peaks. Y-axis defines estimated densities and x-axis

1033

the peak coverage . Note that peak coverage stands for the sum of all reads

1034

within a +/-10 nt window surrounding a protection site peak.

1035

(F) Peaks within canonical EJC binding sites map to first and internal

1036

exons. Bar plot showing proportion of peaks mapping to first, internal and last

1037

exons. Peaks mapping within or in direct proximity to canonical EJC binding

1038

regions (proximal peaks) are highlighted in dark blue. Peaks remote from EJC

1039

binding regions are highlighted in dark grey. Estimated peak proportions within

1040

exon classes are indicated in the plot. Y-axis indicates proportion in percent; X-

1041

axis indicates the tested exon classes of a metagene. Note that proximal peaks

1042

are nearly exclusively found in first and internal exons while remote peaks are

1043

detected in all three classes of exons.

1044

(G) Proportion of remote protection peaks in EJC ipaRt libraries

1045

decreases with sequencing coverage. Plot highlighting the proportion of

1046

remote peaks among all EJC ipaRt sequencing coverage peaks, and among

1047

sequencing coverage subsets of the best 75%, best 50% and best 25% covered

1048

protection peaks. Y-axis defines proportion (in %) of remote peaks. X-axis

1049

highlights peak coverage subsets. Note that proportion of EJC protection peaks

1050

reduces significantly when increased coverage cutoffs are chosen.

1051 1052



43

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1053

Figure 4.

1054

specialized cellular functions is determined by gene architecture.

Preferential recruitment of EJC to mRNAs of genes with

1055 1056

(A) MA-plot of DESeq results from EJC ipaRt and mRBP footprinting.

1057

Genes that are either enriched or depleted for EJC (RBP enriched) are

1058

indicated in red or grey, respectively. Genes that are not significantly different

1059

(p.adjusted >0.05) between the EJC ipaRt and mRBP libraries are transparent.

1060

Relative enrichment (log2FoldChange) is indicated on the y-axis. Base Mean

1061

of signal is highlighted on the x-axis. Dashed line defines log2foldChange =0.

1062

(B) GO term analysis of EJC enriched and depleted transcripts

1063

|log2FoldChange| >1. GO terms for biological processes of EJC enriched

1064

transcripts (left plot) and of EJC depleted transcripts (right plot) are presented

1065

to the left and the right of the plots, respectively. Y-axis list GO categories of

1066

the biological processes most highly represented. Gene counts in individual

1067

categories versus overall count of analyzed genes (Gene ratios) are shown on

1068

the x-axis. Legend indicating number of genes falling into a particular GO term

1069

(size of circles) and its significance of association (color of circles) are (circles)

1070

shown on the plots is highlighted in the center.

1071

(C) Scatter-Boxplot showing DESeq enrichment estimates of Drosophila

1072

mRNAs with known localization patterns in the early embryo.

1073

products in DESeq analysis were subset to maternally expressed mRNAs,

1074

which localize or do not localize to specific foci in early embryos (48). Relative

1075

enrichment (log2FoldChange) is indicated on y-axis. Localization categories

1076

are indicated on x-axis. Non-localizing and localizing gene products are

1077

highlighted in green and blue, respectively. Horizontal solid line in



Gene

44

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1078

corresponding box plots indicates median log2FoldChange for each of the

1079

categories. Dashed line highlights log2FoldChange =0.

1080

(D) Result of Multiple regression model for EJC enrichment highlighting

1081

parameters which contribute most to preferential binding of EJC to

1082

mRNA. Plots showing the relative contribution and effect of each factor (listed

1083

on y-axis) to EJC enrichment, as estimated from the full regression model. For

1084

the plot showing relative importance (left), estimates were obtained by

1085

bootstrapping the data; the dot indicates the median and whiskers indicate the

1086

95% confidence interval. For the plot showing the effect of each factor (right),

1087

the T-statistic of each factor was calculated from the estimate and the standard

1088

error of each coefficient obtained after fitting the full model.

1089 1090 1091

Figure 5. EJC assembly reveals a sequence bias

1092 1093

(A) EJC enrichment – observed versus expected. Scatterplot showing EJC

1094

enrichment defined by DESeq for individual exon-exon junction relative to

1095

DESeq estimates of corresponding templates. Enrichment for each exon-exon

1096

junction (log2FoldChange exon-exon junction) was calculated from read counts

1097

falling within 50bp upstream and 50bp downstream of the exon-exon junction.

1098

The EJC enrichment for the exon-exon junction was compared to its associated

1099

gene and a Z score and p-value was calculated. Exon-exon junctions whose

1100

EJC enrichment was not significantly different from that of its associated gene

1101

(p.adjusted >0.05) are shown in grey. Exon-exon junctions with significantly

1102

higher EJC enrichment than their associated genes are highlighted in red, and



45

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1103

those with significantly lower EJC enrichment are highlighted in blue. The exon-

1104

exon junction EJC enrichment is highly correlated with the corresponding gene

1105

EJC enrichment (Spearman correlation = 0.664) and twice as many exon-exon

1106

junctions show significantly lower EJC enrichment than their corresponding

1107

gene product: 4334 vs. 2275, out of a total of 15186 junctions tested).

1108

(B) Hexamers containing CUG and UUU contribute most to deviation

1109

between observed and expected EJC enrichment. Plot showing the effect

1110

of each hexamer on EJC enrichment for each exon-exon junction, after

1111

accounting for splice strength and presence of an ESS. For each hexamer, the

1112

deviation (of the exon-exon junction EJC enrichment from its gene EJC

1113

enrichment) was regressed against (1) the presence / absence of hexamer, (2)

1114

splice strength, and (3) ESS counts. The x-axis shows the associated T-statistic

1115

obtained for the hexamer effect after fitting such a model for each individual

1116

hexamer. Dendrogram of hexamers was performed using hierarchical

1117

clustering with Ward’s method, based on pairwise Damerau–Levenshtein

1118

distance modelling between all hexamers. CUG containing hexamers show a

1119

concordant positive effect and UUU containing hexamers show a concordant

1120

negative effect on EJC enrichment.

1121

(C) CUG and UUU effect EJC binding is position biased. Plot showing the

1122

positional effect of the tri-nucleotides CUG and UUU. Similar to the analysis

1123

performed for hexamers, for each possible position of the tri-nucleotide, the

1124

deviation (of the exon-exon junction EJC enrichment from its gene EJC

1125

enrichment) was regressed against (1) the presence / absence of CUG / UUU

1126

at the position, (2) splice strength, and (3) ESS counts. Y-axis shows the

1127

associated T-statistic obtained for the position effect after fitting such a model



46

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1128

for each individual position upstream of the exon-exon junction. X-axis defines

1129

the nt coordinates in the upstream exon. Both CUG and UUU show a highly

1130

positive or negative T-statistic around -18 bp upstream of exon-exon junction,

1131

indicating a strong position-specific effect.

1132 1133

Figure 6. mRNA secondary structures modulate EJC binding

1134 1135

(A) Pairing probability profiles of EJC bound exon-exon junctions.

1136

Predicted Base pairing probability (bpp) profiles, by RNAplfold from Vienna

1137

RNA-package, of junctions with enhanced, inhibited and unaffected EJC

1138

binding. Black dashed, red and grey solid lines highlight average bpp profiles

1139

of junctions that are unaffected, enhanced and inhibited for EJC binding

1140

respectively. Note: The region utilized for RNA-Fold analysis covers the last

1141

37nt of the upstream exon and the first 28nt of the downstream exon. Y-axis

1142

shows predicted bpp. X-axis highlights nucleotide positions relative to exon-

1143

exon junction. Position 0 represents last RNA nucleotide of upstream exons.

1144

Dashed vertical line highlights the coordinate (-21.7) of the average EJC

1145

protection median.

1146

(B) Definition of bpp regions distinct between exon-exon junctions with

1147

enhanced and inhibited EJC binding. To define a cut-off for significantly

1148

distinct regions in the exon-exon junctions with enhanced and inhibited EJC

1149

binding, the bpp significance for each nucleotide in the two junction groups was

1150

tested by ANOVA, then subjected to permutation testing. Note that at alpha of

1151

0.05 the regions -11 to -16 and -20 to -24 are significantly distinct in bpp

1152

between junctions enhanced and depleted in EJC binding. Y-axis defines –



47

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1153

log10 transformed ANOVA p-values. Horizontal red dashed line indicates cutoff

1154

at a=0.05 of the permutation test. X-axis highlights nucleotide position relative

1155

to exon-exon junction.

1156

(C) Folding categories of analyzed junctions. To define folding categories,

1157

exon-exon junctions were clustered based on bpp estimates within the region -

1158

24 to -11 (see B). Clustering was performed with MClust using Gaussian

1159

mixture model assuming 4 clusters, and from the resulting categories the mean

1160

bpp profile of all cluster members are shown as individual bpp profile plots.

1161

Shaded region around solid lines indicates 95% confidence interval estimated

1162

using normal approximation to binomial distribution.

1163

indicate borders of selected region for bpp profile clustering. Y-axis shows

1164

predicted bpp estimates, x-axis highlights nucleotide positions relative to exon-

1165

exon junction.

1166

(D) Impact of RNA structures on EJC binding. Segmented bar plot showing

1167

proportion of significantly affected exon-exon junctions that are enhanced for

1168

EJC binding. Horizontal dashed line indicates overall proportion of junctions

1169

with enhanced EJC binding. Segments referring to proportion of junctions with

1170

enhanced or inhibited EJC binding are highlighted in red or grey, respectively.

1171

Y-axis defines proportion (in %) of all junctions significantly effected in EJC

1172

binding. X-axis indicates plotted folding categories (shown in panel C). Percent

1173

estimates for junctions with enhanced EJC binding are highlighted within the

1174

bar segments. Note that folding categories 2 and 3 have a positive, while folding

1175

category 4 - which has an increased bpp within EJC deposition sites - has a

1176

negative impact on EJC binding.



Vertical dashed lines

48

bioRxiv preprint first posted online Nov. 4, 2018; doi: http://dx.doi.org/10.1101/459354. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1177

(E-F) RNA stem structures impact EJC binding coordinates. Plots showing

1178

distribution densities of EJC protection site medians. Double stranded (ds) RNA

1179

content in Region A and B was identified using the mean of rounded bpp

1180

estimates [0 and 1] in the corresponding sequence stretches. High and low

1181

dsRNA content are defined as >0.7 and 50 nt

EJC protected 0.0

0

B

1 2 Log10(splice site distance [nt])

3

E

max. median density of EJC protection −21,7

6e-07

proximal read coverage peaks ≤ 50 nt (n=31715)

EJC protected

remote read coverage peaks >50 nt (n=1498)

1.0

Density

EJC protected core region

4e-07

RBP protected

0.5

2e-07

0

0.0 0 exon-exon junction

exon (n-1)

2

50 nt

25

3

exon n

C

4

Log10 (peak_coverage)

F Rep. #1 Rep. #1

proportion (%) of peaks at transcript exons in EJC ipaRt libraries

−50 nt

−25

Rep. #2 Rep. #2 Rep. #3

Rep. #3 8,936,000

8,937,000

8,938,000

79.0%

80%

67.2% proximal coverage peaks (canonical EJC binding)

60%

remote read coverage peaks

40%

20%

24.6%

20.6% 8.3% 0.4%

0%

3’

5’ 3R: 8,935,126 - 8,938,395 (+)

first exon

oskar

Rep. #1

int. exon

last exon

G

Rep. #1 Rep. #2 Rep. #2 Rep. #3 Rep. #3 30,046,000

30,045,600

30,045,200

percentage of remote coverage peaks in EJC ipaRt libraries

along exon-exon junctions

mean read_coverage

1.5

5%

4.51%

4%

2.74%

3% 2%

1.67% 0.84%

1% 0%

3R: 30,045,229 - 30,046,161 (-)

RpL32

EJC protected RBP protected

all EJC coverage maxima

75% best covered

50% best covered

25% best covered

Figure 3

A.

B. GO category: Biological processes

GO category: Biological processes EJC enriched genes (n = 3108)

p.adjusted 0.05

0

−4

−8 RBP enriched genes (n = 3837) 4

8

12

16

log2(baseMean)

20 40 60

Count 180 200 220 240

−log2(p.adjust) 20 40 60 80

EJC enriched (RBP depleted)

log2FoldChange >1 & p.adj.