Article
Control of Gene Expression in Senescence through Transcriptional Read-Through of Convergent Protein-Coding Genes
Authors Lisa Muniz, Maharshi Krishna Deb, Marion Aguirrebengoa, Sandra Lazorthes, Didier Trouche, Estelle Nicolas
Correspondence
[email protected]
In Brief Muniz et al. identified a family of functional antisense RNAs produced by transcriptional read-through downstream of convergent genes. These RNAs are activated during senescence by mechanisms relying on the control of RNA pol II elongation rate and H2A.Z local occupancy, emphasizing the importance of controlling chromatin structure at intergenic regions.
Highlights
• START RNAs are produced by transcriptional read-through at convergent genes
• In senescence, they repress the expression of the genes to which they are antisense
• RNA pol II elongation rate is regulated downstream of convergent genes at START loci
• H2A.Z histone variant represses START RNAs in proliferative cells
Muniz et al., 2017, Cell Reports 21, 2433–2446 November 28, 2017 © 2017 The Author(s). https://doi.org/10.1016/j.celrep.2017.11.006
Data and Software Availability GSE85085
Cell Reports
Article
Control of Gene Expression in Senescence through Transcriptional Read-Through of Convergent Protein-Coding Genes
Lisa Muniz,1,2 Maharshi Krishna Deb,1,2 Marion Aguirrebengoa,1 Sandra Lazorthes,1 Didier Trouche,1,3 and Estelle Nicolas1,3,4,*
1LBCMCP, Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, Toulouse, France
2These authors contributed equally to this work
3These authors contributed equally to this work
4Lead Contact
*Correspondence: [email protected]
https://doi.org/10.1016/j.celrep.2017.11.006
SUMMARY
Antisense RNAs are non-coding RNAs that can regulate their corresponding sense RNAs and are generally produced from specific promoters. We uncover here a family of antisense RNAs, named START RNAs, produced during cellular senescence by transcriptional read-through at convergent protein-coding genes. Importantly, START RNAs repress the expression of their corresponding sense RNAs. In proliferative cells, we found that the Pol II elongation rate is limited downstream of TTS at START loci, allowing transcription termination to occur before Pol II reaches the convergent genes, thus preventing antisense RNA production and interference with the expression of the convergent genes. START RNAs are repressed by H2A.Z histone variant, whose local occupancy decreases in senescence. Our results thus uncover a mechanism of gene expression regulation relying on read-through antisense transcript production at convergent genes, underlining the functional importance of chromatin regulation in the control of RNA pol II elongation rate at intergenic regions.
INTRODUCTION
Cell adaptation to intra- or extra-cellular environmental changes involves the regulation of a specific set of genes, which is orchestrated by factors affecting the local chromatin structure and the efficiency of transcription initiation or elongation. One example of such adaptation to environmental changes is cellular senescence, which can be induced in mammalian cells in response to various stresses. Senescence is characterized by a potent and irreversible cell-cycle arrest (Campisi and d’Adda di Fagagna, 2007). It is generally considered a major anticancer pathway because it restricts the proliferation of cells that have lost their normal control of cellular proliferation (Xue et al., 2007). It is associated with the setting up of a specific genetic program.
Another frequent hallmark of senescence is the reorganization of chromatin into condensed foci, named SAHF (senescence-associated heterochromatic foci) (Salama et al., 2014). Accordingly, more and more data highlight the role of chromatin modifiers in the control of senescence. Recent findings have also uncovered the importance of long non-coding RNAs (ncRNAs) as major players in the control of specific gene expression. Long ncRNAs regulate gene expression in cis or in trans, mainly by allowing the correct structuring or targeting of chromatin-modifying complexes, such as Polycomb group proteins (Magistri et al., 2012). Because of associated changes in gene expression and global chromatin structure, the commitment of cells into senescence is an ideal context for investigating the interplay between long ncRNAs and chromatin modifiers. For example, by studying progression into senescence, we unexpectedly found that ncRNAs can regulate the genomic localization of the histone variant H2A.Z (Lazorthes et al., 2015). Non-coding RNAs can be antisense to protein-coding genes. Such antisense RNAs can affect the expression of the protein encoded by the gene to which they are antisense by various mechanisms. For example, some antisense RNAs affect the efficiency of mRNA translation, mRNA stability, or the control of the local chromatin structure (Faghihi and Wahlestedt, 2009; Pelechano and Steinmetz, 2013). Antisense RNAs are usually produced from specific promoters located downstream of a protein-coding gene or from internal promoters arising from protein-coding regions. However, in yeast mutants, transcriptional read-through (loss of transcription termination) from one of the genes of a convergent gene pair has been shown to produce a transcript antisense to the other convergent gene.
In wild-type cells, these transcripts are repressed by a pathway that involves the H2A.Z histone variant and the RNAi machinery and relies on the degradation of these transcripts by the RNA exosome (Zhang et al., 2011; Zofall et al., 2009). In human cell lines, ncRNA gene loci have been reported to produce such read-through antisense RNAs in a few particular situations: at an imprinted locus in neuronal cells and by inhibiting the microRNA-processing machinery (Dhir et al., 2015; Powell et al., 2013; Skourti-Stathaki and Proudfoot, 2014). In addition, in response to herpes simplex
Cell Reports 21, 2433–2446, November 28, 2017 © 2017 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
virus 1 (HSV-1) infection, host transcription termination is disrupted, leading to extensive transcriptional read-through, including antisense transcription at convergent genes (Rutkowski et al., 2015). Here, we observed in cells undergoing senescence that antisense RNAs are produced at some protein-coding convergent gene loci because of inefficient transcription termination. The increase in such read-through antisense RNAs regulates the expression of the genes to which they are antisense. We demonstrate at two of these loci that, in proliferative cells, the RNA polymerase II (Pol II) elongation rate downstream of the poly(A) site of one gene is decreased, therefore limiting the extent of transcriptional read-through and the production of RNA antisense to the converging gene. Importantly, these read-through antisense RNAs are repressed in proliferative cells by the histone variant H2A.Z local occupancy. We thus uncovered the functional importance of the regulation of read-through antisense RNAs at convergent protein-coding genes for correct genome expression in a physiological situation.
RESULTS
Antisense Transcripts Produced by Transcriptional Read-Through Occur during Oncogene-Induced Senescence at Convergent Gene Loci
In order to identify functions and regulatory mechanisms of antisense RNAs in humans, we asked whether senescence induced by oncogene activation in WI38 fibroblasts could be associated with the appearance of antisense transcripts produced by loss of accurate transcription termination at convergent gene loci. For clarity, in the convergent gene pair, we called the gene from which the putative read-through transcript originates the “forward gene” and the gene to which the putative read-through is antisense the “reverse gene” (Figure 1A).
On two independent replicates of strand-specific RNA-sequencing (RNA-seq) datasets (proliferation and senescence), we applied a multistep pipeline (see Supplemental Experimental Procedures for a detailed description) to identify the genomic boundaries of read-through antisense transcripts at convergent gene loci, whose expression changes in senescence. For each experiment, we selected such transcripts, which are reproducibly activated in senescence and with respect to their forward gene in the two independent RNA-seq experiments, likely reflecting a defect in transcription termination. We refer to these RNAs as “START RNAs” (for senescence-triggered antisense read-through RNAs) (see Figure S1 for examples of such START RNAs). We identified 65 START RNAs in one RNA-seq experiment and 66 in the other, out of which 40 were found in common, which is a highly significant intersection (p < 10^-300, χ² test), considering that there are 4,068 converging genes in the genome from which START RNAs could originate (Supplemental Experimental Procedures), underlining the robustness of the analysis. We thus identified 91 START RNAs in total (see Table S1 for the list and Figure 1B for metadata analysis of strand-specific transcripts at these 91 loci) from protein-encoding genes and annotated lincRNAs (long intergenic ncRNAs). Importantly, START RNAs were expressed at much higher levels than those of intergenic
sequences between all expressed converging genes and at levels comparable to those of expressed genes (Figures 1B and S2A), indicating that they are not produced by transcriptional noise. Forward genes were characterized by high expression levels, whereas reverse genes were expressed at levels comparable to those of all expressed genes (Figures 1B and S2). Although we removed from this analysis LOC (uncharacterized) ncRNA- and miRNA (microRNA)-annotated genes, because their 3′ end annotation is usually not precise enough, we found 3 and 4 senescence-induced antisense read-through transcripts potentially produced from, respectively, LOC ncRNA- and miRNA-annotated genes (Table S2). As expected, the 91 START RNAs are reproducibly activated in senescence in the two independent RNA-seq experiments (Figure 1C). Consistent with a defect in transcription termination of the forward genes in senescence, they are significantly more increased than their forward genes (Figure 1C). Interestingly, we found that the expression of START RNAs increased more in senescence far from the transcription termination site (TTS) of their forward gene (read-through transcript, part 2) than close to the TTS (read-through transcript, part 1) (Figure 1C). This result is consistent with an increase in the length of transcriptional read-through in senescence as Pol II transcribes a few kilobases downstream of the TTS in normal conditions (Fong et al., 2015; Nojima et al., 2015).
Validation of the Senescence-Associated Expression Changes at Two START Loci
To investigate in depth the regulation and function of START RNAs, we focused on two examples of loci harboring START RNAs, the ARHGAP18/LAMA2 and the MFSD4B (KIAA1919)/REV3L gene loci, with a senescence-associated increase in antisense transcripts to LAMA2 and REV3L genes, respectively (see Figures 2A and S3A for a larger scale view at the ARHGAP18/LAMA2 locus).
Indeed, LAMA2 and REV3L gene products are linked to senescence-associated processes or diseases. Mutation of LAMA2 (laminin, alpha 2), one of the subunits of merosin, a component of the extracellular matrix, causes congenital muscular dystrophy (Helbling-Leclerc et al., 1995), whereas REV3L (DNA polymerase zeta) is required for the maintenance of genomic integrity and cell proliferation (Lange et al., 2012). The increase in antisense transcription at these loci could, indeed, originate from a defect in transcription termination of the forward genes, ARHGAP18 and KIAA1919, because the expression of these genes does not change much in senescence, while expression of transcript increases just downstream of their TTS up to the convergent gene (see Figures 2A and 2B for quantification of the RNA-seq data). Moreover, as observed for the 91 START RNAs (Figure 1C), the ARHGAP18 and KIAA1919 START RNAs increase more in senescence further from the TTS of their forward gene than closer to it, consistent with an increase in read-through transcripts (Figure S3B). In addition, the reverse genes, LAMA2 and REV3L, are repressed when their antisense RNAs are induced (Figures 2A and 2B). The senescence-associated changes in the expression of LAMA2 and REV3L protein-coding genes, of the START RNAs, and of the sense and antisense transcripts of LAMA2 and
Figure 1. Occurrence of Antisense RNAs Generated by Regulated Transcriptional Read-Through in Senescence (START RNAs) (A) Schematic representations of antisense RNA generated by transcriptional read-through (START RNA) from a convergent gene (the forward gene) located on the plus strand (+) of DNA (left) or on the minus strand (−) of DNA (right). The START RNA is antisense to the other gene of the convergent gene pair (reverse gene). (B) Proliferative and senescent WI38 hTERT RAF1-ER cells were subjected to two independent strand-specific RNA-seq experiments (RNA-seq #1 and RNA-seq #2). Metadata analyses of the 2 RNA-seq datasets in proliferation (PROLIF, green) or in senescence (SEN, red) at the 91 START loci are shown. Each gene and intergenic region was divided into 20 equal regions. For each of these regions, the mean of the normalized number of aligned reads per base was computed for each gene, and the median within the gene population is plotted. The forward strand and the reverse strand of RNA-seq datasets are shown with a solid line and a dashed line, respectively. (C) Boxplots showing the log2 of the expression change in senescence, log2(sen/prolif), of the forward genes and of different analyzed regions in the 91 START RNAs: the antisense (AS) parts, the entire read-through (RT) transcript domains, corresponding to the entire START RNAs, as well as their first (read-through, part 1) and second (read-through, part 2) halves. Two independent datasets are shown. The read-through transcript domains and the AS parts of the 91 START RNAs are significantly more increased in senescence than their forward genes (paired Wilcoxon test). Note that the second halves of START RNAs are significantly more increased in senescence than the first halves (paired Wilcoxon test). See also Figures S1 and S2 and Tables S1 and S2.
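The binning described for panel (B) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline (which is described in the Supplemental Experimental Procedures): the function names `bin_means` and `metagene_profile` and the toy coverage vectors are hypothetical, and each region is assumed to be supplied as a list of normalized per-base read counts at least 20 bases long.

```python
from statistics import mean, median


def bin_means(coverage, n_bins=20):
    """Split one region's per-base coverage into n_bins contiguous parts
    of (nearly) equal length and return the mean coverage of each part."""
    n = len(coverage)
    if n < n_bins:
        raise ValueError("region shorter than the number of bins")
    # Integer boundaries; they stay strictly increasing because n >= n_bins.
    edges = [i * n // n_bins for i in range(n_bins + 1)]
    return [mean(coverage[edges[i]:edges[i + 1]]) for i in range(n_bins)]


def metagene_profile(regions, n_bins=20):
    """Median, across all regions, of the per-bin mean coverage."""
    per_region = [bin_means(cov, n_bins) for cov in regions]
    return [median(values) for values in zip(*per_region)]


# Toy data (hypothetical): three regions of different lengths,
# each with a flat normalized coverage.
regions = [[1.0] * 40, [3.0] * 100, [2.0] * 63]
profile = metagene_profile(regions)
print(len(profile), profile[:3])
```

Taking the median across loci, as the legend states, keeps a few very highly expressed loci from dominating the plotted profile.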
Figure 2. Analysis of START Expression at the ARHGAP18/LAMA2 and KIAA1919/REV3L Loci (A) Total RNA was extracted from cells in proliferation (PROLIF) or in senescence (SEN) and subjected to strand-specific RNA-seq experiments. For the (+) and (−) DNA strands at the ARHGAP18/LAMA2 and KIAA1919/REV3L loci, strand-specific RNA-seq tracks (RNA-seq#1) show the density of aligned reads normalized by the total number of aligned reads multiplied by 100 million. RefSeq genes (hg38) are also shown. The red dotted arrows show the boundaries of the START RNAs found by the algorithm. (B) The total number of reads from strand-specific RNA-seq data (RNA-seq#1) in the indicated genomic regions in proliferative and senescent cells was calculated for the two loci. The log2 of the SEN/PROLIF value is plotted. The chromosome strand of the analyzed region is annotated. The values for the forward gene (ARHGAP18 or KIAA1919), the intergenic region of the read-through (RT) transcript, the read-through entire domain, the antisense (AS) part of the read-through (AS part to LAMA2 or REV3L), and the reverse gene (LAMA2 or REV3L) are shown. (C) WI38 hTERT RAF1-ER cells were induced (+4-HT) or not induced (−4-HT) to enter senescence for 3 days. Total RNA was extracted and subjected to random qRT-PCR using the indicated primers: e, exon; e-e, exon-exon junction; and down, downstream of the TTS of the gene. Data were normalized to GAPDH mRNA expression and calculated relative to 1 in proliferative cells for each experiment. Data are indicated as means ± SD from 3 independent experiments. (D) Same as in (C), except that total RNA was subjected to strand-specific qRT-PCR and analyzed using the indicated primers (i, intron). See also Figure S3.
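The normalization and log2 ratio named in this legend amount to simple arithmetic, sketched below in Python. The read counts are invented for illustration, and the function names are hypothetical. How reads are assigned to a region and how zero counts are handled are not specified in the legend, so the sketch simply rejects non-positive counts.

```python
import math

SCALE = 100_000_000  # "100 million", the scaling factor stated in the legend


def normalized_density(reads_in_region, total_aligned_reads):
    """Reads in a region divided by library size, multiplied by 100 million."""
    return reads_in_region / total_aligned_reads * SCALE


def log2_sen_over_prolif(sen_reads, sen_total, prolif_reads, prolif_total):
    """log2(SEN/PROLIF) of the library-size-normalized read counts."""
    sen = normalized_density(sen_reads, sen_total)
    prolif = normalized_density(prolif_reads, prolif_total)
    if sen <= 0 or prolif <= 0:
        raise ValueError("log2 ratio is undefined for zero counts")
    return math.log2(sen / prolif)


# Hypothetical counts: 400 reads out of 50 million aligned reads in
# senescence versus 100 reads out of 25 million in proliferation.
ratio = log2_sen_over_prolif(400, 50_000_000, 100, 25_000_000)
print(round(ratio, 3))
```

A positive value means the region is more highly represented in senescent than in proliferative cells once library size is taken into account.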
REV3L were confirmed by reverse transcription followed by qPCR (qRT-PCR) (Figures 2C and 2D). We confirmed that the increase in antisense expression was not due to an increase in the expression from the upstream genes, since ARHGAP18 and KIAA1919 gene expression did not vary much upon senescence induction (Figure 2C).
ARHGAP18 and KIAA1919 START RNAs Are Produced by a Defect in Transcription Termination of Their Forward Gene
ARHGAP18 and KIAA1919 START RNAs could be produced through an accumulation of transcriptional read-through in senescence or through transcription initiation at a promoter located close to the 3′ end of the ARHGAP18 or KIAA1919 gene. To discriminate between these two possibilities, we intended to test whether these transcripts antisense to LAMA2 or REV3L were produced from a long RNA originating from the promoter of the forward gene (ARHGAP18 and KIAA1919, respectively). We used an siRNA (small interfering RNA)-based approach because, in a previous study, we successfully used this strategy to deplete nuclear ncRNAs in these cells (Lazorthes et al., 2015). Senescent cells were transfected with siRNAs targeting the two ends of the putative RNA at the ARHGAP18/LAMA2 locus (that is, the exon 1 of ARHGAP18 or the intergenic region located between the two genes; see Figure 3A). We found that the siRNA targeting the exon 1 of ARHGAP18 decreased by approximately 5-fold the expression of ARHGAP18 mRNA (measured in the spliced transcript) and of the ARHGAP18 mRNA/pre-mRNA (measured in exon 1), demonstrating that the siRNA is efficient (Figure 3B). The effect on ARHGAP18 pre-mRNA (measured in the last intron) was much lower (about 40%), as expected, since siRNAs are more efficient on mature cytoplasmic RNAs. A similar 40%
Figure 3. ARHGAP18 and KIAA1919 START RNAs Are Generated by Transcriptional Read-Through (A) Schematic representation of the ARHGAP18/ LAMA2 and KIAA1919/REV3L loci with the location of siRNAs (orange) and PCR primers (purple) used in Figures 3 and 4. (B) Senescent cells were transfected using an siRNA targeting ARHGAP18 exon 1 (ARHG e1) or control (Ctrl). 72 hr after transfection, total RNA was extracted and subjected to random qRT-PCR using the indicated primers (left) or analyzed by strand-specific qRT-PCR to monitor the expression of the region antisense to LAMA2 (right). Data were normalized to GAPDH mRNA expression and calculated relative to 1 in siRNA Ctrl-treated cells for each experiment. Data are indicated as means ± SD from 3 independent experiments. (C) Same as in (B), except that cells were transfected using an siRNA targeting the read-through region (ARHG R-th) in the intergenic region between ARHGAP18 and LAMA2 (38.7 kb downstream of the TTS of ARHGAP18). (D) Same as in (B), except that cells were transfected using an siRNA targeting KIAA1919 read-through region (KIAA R-th, 2.5 kb downstream of the TTS of KIAA1919), and strand-specific qRT-PCR was performed to monitor the expression of the region antisense to REV3L (right panel).
Figure 4. START RNAs Repress the Genes to which They Are Antisense (A) Total RNAs from senescent cells transfected with an siRNA targeting ARHGAP18 exon 1 (ARHG e1), targeting the read-through region (ARHG R-th) or control (Ctrl) were extracted and analyzed by strand-specific qRT-PCR for LAMA2 (pre-mRNA/mRNA) sense expression (e64). LAMA2 mRNA expression (e64-65) was analyzed by random qRT-PCR using primers in an exon-exon junction. Data were normalized to GAPDH mRNA expression and calculated relative to 1 in siRNA Ctrl-treated cells for each experiment. Data are indicated as means ± SD from 3 independent experiments. (B) Same as in (A), except that cells were transfected using an siRNA targeting KIAA1919 read-through region (KIAA R-th). REV3L (pre-mRNA) sense expression was monitored by strand-specific qRT-PCR in an intron (left panel, i32) or by random qRT-PCR using primers in an intron located far away from the read-through antisense transcript (right panel, i2). The spliced REV3L mRNA was assessed by random qRT-PCR using primers in an exon-exon junction (right panel, e32-e33). (C) Total RNAs from proliferative or senescent cells were extracted and subjected to strand-specific RNA-seq experiments. For the 91 loci harboring START RNAs, the log2 of the variation in senescence (log2(sen/prolif)) of the expression of the reverse genes was computed and represented as a boxplot for both RNA-seq experiments (#1 and #2). Note the significant decrease (Wilcoxon test) in the expression of the reverse gene population compared to all expressed genes to which an antisense transcript is increased in senescence (2,135 and 1,349 genes in the first and second RNA-seq replicates, respectively). (D) Senescent cells were transfected using the indicated siRNAs. Transfected cells were then subjected to a colony formation assay in duplicate. Colony number was counted twice in each plate and calculated relative to 1 for the control siRNA.
Data are indicated as means ± SD from four entirely independent experiments. See also Figure S4.
depletion was observed both on the RNA product measured in the intergenic region (1 kb downstream of the end of ARHGAP18 gene) and when measuring the transcript antisense to LAMA2 by strand-specific qRT-PCR (Figure 3B). This indicates that the RNA produced downstream of the ARHGAP18 gene and making a transcript antisense to LAMA2 originates from a long RNA containing the exon 1 of ARHGAP18 and not from a distinct promoter downstream of the 3′ end of ARHGAP18. We confirmed this result with the siRNA targeting the intergenic region (Figure 3C). We observed an approximately 5-fold decrease in RNA expression at the intergenic region, indicating that siRNA-mediated depletion was efficient. Strikingly, the effect of this siRNA was similar on ARHGAP18 pre-mRNA, on the transcript antisense to LAMA2, and on ARHGAP18 mRNA (Figure 3C). Altogether, these results indicate that a large transcript, originating from the ARHGAP18 promoter and reaching the opposite gene, accumulates in senescence by transcriptional read-through from the ARHGAP18 gene. Similar results were obtained using an siRNA targeting the intergenic region between the two convergent genes KIAA1919 and REV3L: we found that this siRNA decreased to the same extent RNA expression measured in the intergenic region, the KIAA1919 mRNA/pre-mRNA, as well as the transcript antisense to REV3L (Figure 3D). Taken together, these data indicate that, at least on these two convergent gene loci, antisense RNAs are, indeed, generated by transcriptional read-through from the forward gene.
START RNAs Repress the Expression of Their Reverse Genes
In senescence, while the ARHGAP18 and KIAA1919 START RNAs increased, their reverse genes (LAMA2 and REV3L, respectively) were strongly repressed, suggesting that the two START RNAs could inhibit their expression. Accordingly, we found that depletion of the ARHGAP18 START RNA using both siRNAs led to an increase in LAMA2 pre-mRNA/mRNA (measured in an exon) (Figure 4A). This increase was inversely correlated to the decrease of ARHGAP18 START RNA, but not that of ARHGAP18 mRNA (compare Figures 3B and 3C). Moreover, strong depletion of ARHGAP18 START RNA using the siRNA in the intergenic region between ARHGAP18/LAMA2 also leads to an increase in the mature mRNA (measured in an exon/exon junction) (Figure 4A). Very similar results were obtained when depleting the KIAA1919 START RNA, with an increase in REV3L pre-mRNA (measured in introns) and mature mRNA expression (Figure 4B). Thus, these two START RNAs repress the expression of the genes to which they are antisense. Importantly, when we analyzed the expression changes in senescence of the 91 reverse genes, which are the genes to which the START RNAs are antisense, we found that this population is significantly repressed in senescence (difference to 0: p values = 0.014 and 0.02 for the first and second replicates of RNA-seq, respectively; Wilcoxon test) (Figure 4C). In particular, the population of the 91 reverse genes is significantly more repressed, compared to all genes
to which an antisense RNA is activated in senescence (Figure 4C). Moreover, within this population, 66% of the genes (60/91) are repressed in senescence, log2(sen/prolif [senescence/proliferation]) < 0, in the two RNA-seq replicates. This proportion is significantly higher than expected, with 54% of all expressed genes (8,752/16,288) being repressed in senescence, log2(sen/prolif) < 0, in the two RNA-seq experiments (p < 0.02, χ² test). The data strongly suggest that START RNAs repress the genes to which they are antisense (the reverse genes) and could, thus, participate in the control of the genetic program of senescence. Interestingly, gene ontology (GO) analysis demonstrated that genes involved in positive regulation of cell proliferation are specifically enriched within these 91 reverse genes (see Figure S4A for GO results). We next tested whether depleting specific START RNAs (ARHGAP18 and KIAA1919 START RNAs as well as START RNAs that are antisense to genes involved in proliferation control, corresponding to examples shown in Figure S1) could affect the senescence-associated permanent cell-cycle arrest. Because this permanent cell-cycle arrest is dependent on both the Rb and p53 pathways (Jeanblanc et al., 2012; Lazorthes et al., 2015), we also performed such experiments in cells made defective for either of these pathways by depleting p16 or p21. We first analyzed whether START depletion allows a significant number of senescent cells to re-enter the cell cycle by monitoring 5-ethynyl-2’-deoxyuridine (EdU) incorporation. We found that depleting the SERBP1 START RNA (generating a transcript antisense to IL12RB2) in senescent cells could promote EdU incorporation in a p16-depleted context, whereas depleting ARHGAP18, KIAA1919, or LINC01433 START RNAs (generating, respectively, antisense to LAMA2, REV3L, and ADRA1D) did not induce any significant effect (Figure S4B).
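The comparison of proportions reported above (60/91 reverse genes repressed versus 8,752/16,288 of all expressed genes) can be checked with a short Python sketch. The text names a χ² test but not its exact form, so the version below is an assumption: a one-sample goodness-of-fit test with 1 degree of freedom, taking the genome-wide proportion as the expected value. The function name `chi2_one_proportion` is hypothetical.

```python
import math


def chi2_one_proportion(successes, n, p0):
    """Chi-square goodness-of-fit test (1 degree of freedom) of an observed
    count against an expected proportion p0. Returns (statistic, p value)."""
    expected_yes = n * p0
    expected_no = n * (1.0 - p0)
    chi2 = ((successes - expected_yes) ** 2 / expected_yes
            + ((n - successes) - expected_no) ** 2 / expected_no)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Numbers taken from the text; the test design is an assumption.
background = 8752 / 16288   # fraction of all expressed genes repressed
chi2, p = chi2_one_proportion(60, 91, background)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under this assumed design, the p value comes out just under 0.02, in line with the reported p < 0.02. A 2×2 contingency test on the same counts would be an equally plausible reading of the text.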
Thus, some siRNAs targeting START RNAs can allow a significant fraction of senescent cells to enter S phase. We also performed a clonogenic assay, which is different from measuring EdU incorporation as it allows measuring sustained cell proliferation, even in a small fraction of senescent cells. We found that depletion of LINC01433 or ARHGAP18 START RNAs in senescent cells consistently led to an increase in the number of clones (Figures 4D and S4C). Altogether, the data in Figures 4 and S4 indicate that some siRNAs targeting START RNAs can reverse the permanent cell proliferation arrest associated with senescence. Importantly, in the case of the siRNAs targeting ARHGAP18 and SERBP1 START RNAs at least, this reversal was not due to forward gene depletion. Indeed, using siRNAs targeting SERBP1 or ARHGAP18 mRNAs, which decrease their respective mRNA at least as efficiently as the siRNA targeting the START RNA (compare the effects of siRNAs targeting the START RNA or the mRNA on ARHGAP18 mRNA expression in Figures 3B and 3C or on SERBP1 mRNA expression in Figure S4F), no effect on cell proliferation could be observed (Figures S4D and S4E). Finally, depleting LAMA2 prevented the increase in colony number induced by the depletion of ARHGAP18 START RNA (Figure S4D), indicating that the regulation of LAMA2 expression is required for the function of ARHGAP18 START RNA in senescence.
Altogether, these experiments confirm that some START RNAs participate in the maintenance of senescence-associated permanent proliferation arrest.
Production of START RNAs Is due to an Increase in Transcription Downstream of the TTS in Senescent Cells
We next investigated the mechanism underlying the activation in senescence of ARHGAP18 and KIAA1919 START RNAs. We first analyzed whether senescence is accompanied by a change in their stability. We treated cells with actinomycin D to inhibit transcription, and we monitored the half-life of START transcripts in proliferative and senescent cells. We found that these two START RNAs were more stable than introns, with a half-life between 2.5 and 5 hr. However, we did not find any significant change in their half-life in senescence, suggesting that the change in their expression is mostly transcriptional (Figure 5A). We next analyzed the level of transcription by the capture of nascent transcripts. We found that, indeed, compared to the transcription level in the ARHGAP18 gene (measured in exon 1), a weak increase in transcription was detected just downstream of the end of the ARHGAP18 gene (1 kb) in senescence. However, ongoing transcription levels strongly increased further downstream in the intergenic region of the ARHGAP18/LAMA2 locus (38.3 kb down from the ARHGAP18 TTS) (Figure 5B). Altogether, these data indicate that the increase in read-through transcript expression at these two START loci is largely due to an increase in transcription, indicating that Pol II transcribes farther from the TTS in senescent cells than in proliferative cells, so that the extent of transcriptional read-through increases in senescence.
Production of START RNAs Is due to an Increase in the Pol II Elongation Rate Downstream of the TTS in Senescent Cells
The extent of transcriptional read-through is dictated by the rate of different competing dynamic processes, including the rate of cleavage of the nascent RNA at the poly(A) site, the processivity of exonucleases, and the elongation rate of Pol II downstream of the TTS (Proudfoot, 2016; Richard and Manley, 2009). Indeed, studies using Pol II mutants found that their elongation rate correlates with the extent of read-through (Fong et al., 2015). We thus analyzed whether the Pol II elongation rate downstream of the TTS of the ARHGAP18 gene changed in senescence. To monitor the Pol II elongation rate, we followed the retreating wave of transcription during kinetics after inhibition of Pol II promoter-proximal pause escape by flavopiridol, a potent P-TEFb inhibitor (Chao and Price, 2001; Jonkers et al., 2014; Rahl et al., 2010). We monitored, at different time points following flavopiridol addition, the expression levels of various transcript regions, including the pre-mRNA of the forward genes and the START RNAs. In this assay, the beginning of the RNA decrease approximately reflects the time taken by the last committed RNA polymerase to transcribe from the promoter-proximal pause site to the region analyzed by qRT-PCR, whereas the slope of the curve after this point reflects the stability of the RNA analyzed.
Figure 5. The Increase in ARHGAP18 START RNA Levels in Senescence Is Transcriptional (A) Total RNA from cells in senescence (SEN) and proliferation (PROLIF) was analyzed by qRT-PCR using the indicated primers at different time points following actinomycin D treatment. The levels of RNA were normalized to those of GAPDH mRNA and then normalized to 100% at time 0, for each experiment. Data are indicated as means ± SD from 3 independent experiments (logarithmic scale). (B) Senescent (+4-HT) and proliferative (−4-HT) cells were subjected to an EU pulse labeling of 1 hr, after which nascent RNAs were captured. Nascent RNAs were then analyzed by qRT-PCR using the indicated primers. Data were normalized to GAPDH mRNA expression and calculated relative to 1 in proliferative cells. A representative experiment out of two is shown (means ± SD from the qPCR sample triplicates).
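A half-life can be read off an actinomycin D time course such as the one in panel (A) by fitting a straight line to the logarithm of the remaining RNA. The Python sketch below shows one common way of doing this, assuming first-order decay. It is not taken from the paper's methods: the function name `half_life_hours`, the time points, and the synthetic data (generated from a 3 hr half-life) are all hypothetical.

```python
import math


def half_life_hours(times_h, percent_remaining):
    """Fit ln(% remaining) against time by ordinary least squares and
    convert the decay slope into a half-life (first-order decay assumed)."""
    ys = [math.log(p) for p in percent_remaining]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    slope = sxy / sxx  # per hour; negative when the RNA decays
    if slope >= 0:
        raise ValueError("no decay detected")
    return math.log(2) / -slope


# Synthetic time course (hypothetical): RNA normalized to 100% at time 0
# and decaying with a half-life of 3 hr.
times = [0, 1, 2, 4, 6]
percent = [100 * 0.5 ** (t / 3) for t in times]
estimate = half_life_hours(times, percent)
print(round(estimate, 2))
```

On a logarithmic axis, as used in the figure, first-order decay appears as a straight line, so two conditions with the same half-life give parallel curves.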
At the ARHGAP18 gene, we found that Pol II takes about 30 min to reach the intron 3-exon 4 junction (located 76 kb away from the TSS) (Figure S5A) and about 1 hr to reach intron 14 or the region located immediately downstream of the ARHGAP18 gene (located, respectively, 131 and 134 kb away from the TSS) (Figure 6), times that are consistent with the typical elongation rate of Pol II (1–4 kb/min) (Ardehali and Lis, 2009; Jonkers and Lis, 2015). Up to this location, the elongation rate
of Pol II is similar in proliferative and senescent cells, since the two curves begin to decrease at the same time point (Figure 6). Strikingly, however, when we analyzed the expression of the read-through transcript 38.3 kb downstream of the TTS of the ARHGAP18 gene, which is 171 kb away from the ARHGAP18 TSS, we observed a major difference between proliferative and senescent cells. Whereas, in senescent cells, the decrease also began about 1 hr following flavopiridol treatment, this
Figure 6. Pol II Elongation Rate Increases in Senescence after the Poly(A) Site of the ARHGAP18 and KIAA1919 Genes Total RNA from cells in senescence (SEN) and in proliferation (PROLIF) were analyzed by qRT-PCR at different time points following flavopiridol treatment. The values for ARHGAP18 mRNA (e1) and GAPDH pre-mRNA (i1) (as controls) or ARHGAP18 intron 3-exon 4 junction (i3-e4), ARHGAP18 intron 14 (i14), ARHGAP18 read-through (1 kb and 38.3 kb downstream of the TTS of ARHGAP18 gene), and KIAA1919 read-through (measured 22.3 kb downstream of the TTS of KIAA1919) are shown. The levels of RNA are normalized to those of GAPDH mRNA and then normalized to 100% at time 0 for each experiment. Data are indicated as means ± SD from 3 independent experiments (logarithmic scale). When the SD is higher than the mean (for ARHGAP18 i3-e4 in proliferation, 2 hr), only the upper error bar is shown. See also Figure S5.
decrease was delayed by 1 hr in proliferative cells (Figure 6). After the beginning of RNA decrease, the curves were parallel, as expected, given that its stability does not change between proliferation and senescence (see Figure 5A). These data indicate that the elongation rate of Pol II is strongly decreased in proliferative cells at the ARHGAP18 START region. Importantly, we confirmed this result by monitoring nascent transcripts during a
similar kinetics of flavopiridol treatment (Figure S5C), as in the original technique used to monitor Pol II elongation rate (Jonkers et al., 2014). Similar results were obtained for the KIAA1919 START region, 22.3 kb downstream of the KIAA1919 TTS (Figure 6). These data indicate that, in proliferative cells, the Pol II elongation rate is limited downstream of the poly(A) sites of the ARHGAP18 and
KIAA1919 genes. In senescent cells, these mechanisms are no longer effective, leading to an increase in the extent of read-through, which likely explains the increase of these two START RNAs.
START RNAs Are Repressed by H2A.Z in Proliferative Cells
Because changes in the Pol II elongation rate downstream of the TTS at two START loci explain the increase in the extent of transcriptional read-through, we hypothesized that epigenetic-based mechanisms, meaning mechanisms that rely on chromatin modifications without changing the DNA sequence and that vary with cell fate, could be involved in this regulation. In fission yeast, the histone variant H2A.Z and RNA-processing machineries have been implicated in the repression of read-through antisense transcripts at convergent gene loci (Anver et al., 2014; Zhang et al., 2011; Zofall et al., 2009). Interestingly, H2A.Z expression levels decrease in oncogene-induced (BRAFV600E) or replicative senescence of human melanocytes (Vardabasso et al., 2015). Moreover, depleting H2A.Z in proliferative cells induces senescence (Gévry et al., 2007). In RAF1 oncogene-induced senescence, we confirmed that H2A.Z protein levels strongly decrease (Figure 7A). This suggests that this strong decrease in H2A.Z in RAF1-induced senescent cells might be important for the setting up of the senescent genetic program and could therefore participate in regulating START RNA expression. We thus investigated whether the two START RNAs studied earlier are repressed by H2A.Z in proliferative cells. We found that depleting the histone variant H2A.Z led to an increase in ARHGAP18 and KIAA1919 START RNAs in proliferative cells (Figures 7B and 7C). Such an increase was not observed for ARHGAP18 pre-mRNA and mRNA, although ARHGAP18 was also stimulated when H2A.Z was strongly inhibited (siH2A.Z#1) (Figures 7B and 7C).
KIAA1919 mRNA expression was stimulated by both siRNAs, although the less efficient siRNA (siH2A.Z#2) led to a lower increase of mRNA expression than of START expression (Figure 7C). These data thus indicate that H2A.Z represses transcriptional read-through at least at the ARHGAP18 locus in proliferative cells. Finally, we analyzed the expression changes of START loci in two replicates of RNA-seq experiments in proliferative cells transfected with an siRNA targeting H2A.Z. Statistical analyses indicated that, as for ARHGAP18 START RNA, transcriptional read-through in this population of 91 START loci was significantly activated upon H2A.Z depletion in proliferative cells compared to forward gene expression (Figures 7D and S6A). Moreover, for each RNA-seq replicate in proliferative cells transfected with H2A.Z siRNA, more than 40 START RNAs out of 91 were more activated upon H2A.Z depletion than the pre-mRNA of their forward gene. These data indicate that H2A.Z represses the extent of transcriptional read-through at least at some START loci. Regulators other than H2A.Z are, however, likely involved, because depletion of H2A.Z does not induce the activation of the whole set of START RNAs. To analyze whether the global decrease in H2A.Z expression (Figure 7A) translates into a local decrease in H2A.Z occupancy at START loci, we performed H2A.Z chromatin immunoprecipitation (ChIP)-qPCR experiments. We found that H2A.Z occupancy decreased in senescence both at the promoter of the ARHGAP18 gene and at two locations in the region from which the read-through RNA is transcribed (Figure 7E). These data suggest that, in proliferative cells, H2A.Z is located on the ARHGAP18/LAMA2 locus to prevent accumulation of the read-through RNA. To analyze whether this finding could be generalized to the 91 START loci, we next performed H2A.Z ChIP-sequencing (ChIP-seq) experiments. As expected, we observed peaks of H2A.Z around the TSS, thus validating our ChIP-seq data (see Figure S6B for metadata analysis of H2A.Z occupancy at the TSS with respect to gene expression). We next analyzed senescence-induced changes in H2A.Z occupancy at START loci. In senescent cells, we observed a significant decrease in H2A.Z occupancy both at promoters of the forward genes and on the intergenic regions of the 91 START loci (Figures 7F and S6C), in a manner reminiscent of what we observed at the ARHGAP18/LAMA2 locus (Figure 7E). Importantly, this decrease was still observed when data were normalized for nucleosome occupancy (obtained by a ChIP-seq monitoring histone H3 occupancy) (Figure S6D). Thus, the decrease in H2A.Z occupancy at the 91 intergenic regions of the START loci correlates with the increase in transcriptional read-through during senescence induction. Taken together, these data indicate that transcriptional read-through is repressed in proliferative cells by H2A.Z-dependent mechanisms and that the senescence-associated decrease in H2A.Z expression allows the induction of these START RNAs in senescent cells.
DISCUSSION
We report here the existence of antisense RNAs produced by transcriptional read-through in a physiological context and from human protein-coding convergent genes. These RNAs are common in senescence, representing a whole family of antisense RNAs, whose expression is regulated during senescence progression.
Moreover, we show that START RNAs can inhibit the expression of the gene to which they are antisense. Our data demonstrate that regulation of specific gene expression can occur through controlled transcriptional read-through from a convergent protein-coding gene. As such, this mechanism of gene expression regulation would participate in the response to environmental changes. The mechanism by which START RNAs repress their convergent gene expression remains to be elucidated, as it could be mediated by the RNAs themselves or by their transcription. Indeed, siRNAs have been previously described to affect the transcription of their target RNA (Stojic et al., 2016), and our unpublished results indicate that an siRNA targeting START RNA affects its transcription. At that point, we thus cannot discriminate between these two possibilities. START RNAs could, for example, act through the recruitment of chromatin-modifying enzymes. Examples of chromatin-modifying enzymes recruited by ncRNAs in cis have been widely described (Pelechano and Steinmetz, 2013), including by ourselves in cells undergoing senescence (Lazorthes et al., 2015). However, START RNA
Figure 7. ARHGAP18 and KIAA1919 Read-Through RNAs Are Repressed by H2A.Z in Proliferative Cells (A) WI38 hTERT RAF1-ER cells were induced, or not induced, to enter senescence by the addition of 4-HT for 3 days. Total cell extracts were analyzed by western blot using the indicated antibodies. One representative experiment out of 7 is shown. (B) Proliferative cells were transfected using the indicated siRNAs. 72 hr following transfection, total RNA was prepared and subjected to random qRT-PCR using the indicated primers. Data were normalized to GAPDH mRNA expression and calculated relative to 1 in siRNA control (Ctrl)-treated cells for each experiment. Data are indicated as means ± SD from 3 independent experiments. (C) Same as in (B), except that the H2A.Z#2 siRNA was used, and one representative experiment out of two is shown (means ± SD from the qPCR sample triplicates). (D) Same as in (B), except that total RNAs were prepared, depleted of rRNA, and sequenced. For the 91 START loci, the log2 of the variation upon H2A.Z depletion (log2(siH2A.Z/siCtrl)) of the expression of the forward genes and of the read-through domains (START RNAs) were computed. Note the significant increase (paired-Wilcoxon test) in the expression of the START RNAs compared to the forward genes upon H2A.Z depletion. The RNA-seq replicate #2 is shown. (E) Cells in proliferation (PROLIF) or senescence (SEN) were subjected to ChIP experiments using H2A.Z antibodies, using H3 antibodies, and without antibody (NA). The indicated sequences were quantified by qPCR in the inputs and the immunoprecipitates. The enrichment of the indicated sequence in the H2A.Z immunoprecipitate was calculated relative to the input and normalized to the enrichment in H3 immunoprecipitate. The ratio between the indicated sequence and a control sequence (GAPDH e1) was then performed and calculated relative to 1 in the H2A.Z ChIP in proliferative cells for each experiment. 
Data are indicated as means ± SD from 4 or 5 independent experiments. (F) Proliferative and senescent cells were subjected to an H2A.Z ChIP-seq analysis (ChIP-seq replicate #2). Boxplots show the log2 of the variation in senescence (log2(sen/prolif)) of H2A.Z occupancy at the forward gene TSS and the intergenic regions for the loci harboring START RNAs with an intergenic region >1 kb (to limit the analysis to intergenic regions because of the ChIP resolution of around 500 bp [69/91]). Note that H2A.Z occupancy significantly decreased in senescence among this population on these two regions. The p values of the difference to 0 are 5.33 × 10^-13 for TSS (Student’s t test) and 1.41 × 10^-12 for intergenic regions (Wilcoxon-Mann-Whitney test). See also Figures S6 and S7.
transcription could, by itself, interfere with the transcription of the converging gene. Transcription, indeed, induces the denaturation of double-stranded DNA associated with Pol II progression, which could affect the binding of sequence-specific transcription factors, thereby affecting gene expression (Pelechano and Steinmetz, 2013; Stojic et al., 2016). Such a mechanism could probably explain the lower expression level of downstream genes compared to that of upstream genes present in tandem in the genome that we observe in our RNA-seq experiments (see Figure S2A). Convergent transcription could also lead to a reduction of gene expression by transcriptional collision mechanisms (Pelechano and Steinmetz, 2013; Prescott and Proudfoot, 2002). Interestingly, START RNA expression strongly decreases at the converging gene boundary (Figures 1B and S2C), in agreement with this mechanism taking place at START loci. We also found that the expression levels of convergent genes were lower than those of other genes (divergent genes and genes in tandem). Although we did not detect a global occurrence of read-through and antisense transcripts at these loci, this general tendency of convergent genes to be less expressed could reflect transcriptional interference between convergent genes (Figure S2A). In this study, we also provide major insights into START RNA regulation. We propose that the genes from which START RNAs are produced are prone to read-through transcription, because their transcription termination efficiency is intrinsically low. Indeed, read-through is already observed from the ARHGAP18 and KIAA1919 genes in proliferative cells, although to a lower extent than that in senescent cells (Figure 2A). This finding can be extended to the 91 START loci, in which read-through is found in proliferative cells from the forward gene at much higher levels than that from the reverse gene (Figure S7A). 
We propose that, in proliferative cells, Pol II is slowed down downstream of the poly(A) site in order to favor transcription termination at such genes with low transcription termination efficiency. In senescence, these mechanisms are less efficient, and the elongation rate of Pol II concomitantly increases, allowing Pol II to reach the converging gene before transcription termination, therefore producing an antisense RNA. Expression of START RNAs would, thus, be regulated by the rate of elongation by Pol II downstream of the poly(A) site. Our data, thus, underline the importance of controlling the Pol II elongation rate downstream of TTS for the correct control of gene expression. Our data suggest that the histone variant H2A.Z participates in the epigenetic-based mechanisms that restrict the Pol II elongation rate at intergenic regions, therefore limiting the extent of transcriptional read-through. Indeed, we found that depleting H2A.Z increased transcriptional read-through at a significant number of genes (Figure S7B). Moreover, our data indicate that the occurrence of transcriptional read-through at START loci correlates during senescence induction with a decreased occupancy of H2A.Z at intergenic regions. Interestingly, H2A.Z was also shown to repress antisense RNAs produced by read-through transcription in yeast (Zofall et al., 2009). However, in this latter case, H2A.Z was shown to promote antisense RNA degradation by the RNA exosome instead of repressing their transcription. Our study indicates that, while the control of antisense read-through transcripts by H2A.Z is conserved, underlining its functional importance, the molecular mechanism involved has diverged during evolution. Such a mechanism could allow the regulation of a whole family of antisense RNAs in regions prone to read-through transcription, and, as a consequence, this could be a major mechanism controlling the genetic program of senescence. Depletion of H2A.Z is known to promote senescence (Gévry et al., 2007), a finding that we confirmed in our experimental settings (Figures S7C and S7D), supporting the hypothesis that START RNA regulation could be important for senescence induction. In agreement with this, GO analysis demonstrated that genes involved in the positive regulation of cell proliferation are specifically enriched within the population of reverse genes that are repressed by START RNAs (Figure S4A). Moreover, depletion of specific START RNAs, although not having a drastic effect on senescence maintenance, leads to an inhibition of senescence-associated proliferation arrest in a subset of senescent cells (Figures 4D and S4). Although depletion of many START RNAs simultaneously may have much stronger effects, senescence reversal in only a few cells could be sufficient to favor the emergence of post-senescence precancerous cells. Thus, our finding suggests the functional importance of START RNAs in the maintenance of the senescent phenotype. Commitment into a given cell fate would, as such, induce a specific signature of antisense read-through RNAs that could participate in setting up the genetic program associated with this cell fate. The mechanism of gene regulation that we uncover here, without new transcription initiation events, might be a simple and rapid way to respond to stress signals, such as oncogenic stress, as shown here. In agreement with this hypothesis, osmotic stress induces widespread read-through transcription in human cells (Vilborg et al., 2015) and a global non-coding RNA response in S.
pombe (Leong et al., 2014). Moreover, cancerous cells were shown to exhibit frequent transcriptional read-through that could lead to chimeric RNAs at neighboring genes (Grosso et al., 2015; Maher et al., 2009; Varley et al., 2014). Our findings thus suggest that stress-response genes or other classes of genes that need to be rapidly regulated upon environmental changes may be evolutionarily selected to be convergent to other genes, allowing their rapid regulation by the mechanism we describe here. This could provide a basis for the conservation, during evolution, of the positioning and orientation of genes within eukaryotic genomes.
EXPERIMENTAL PROCEDURES Cell Culture WI38 hTERT RAF1-ER cells, which are immortalized by hTERT expression and contain an inducible RAF1 oncogene fused to an estrogen receptor (ER), were maintained in minimum essential medium (MEM) supplemented with glutamine, non-essential amino acids, sodium pyruvate, penicillin-streptomycin, and 10% fetal bovine serum in normoxic culture conditions (5% O2) (Jeanblanc et al., 2012). For the induction of oncogene-induced senescence, cells were treated with 20 nM 4-HT (H7904, Sigma) for 3 days. siRNA transfection was performed using the Dharmafect 4 reagent (Dharmacon) according to the manufacturer’s recommendations, except that 100 nM siRNA was used, and an equal volume of the culture medium was added 24 hr after transfection.
Cells were then harvested 48 hr later. Actinomycin D (A1410 Sigma) was used at a final concentration of 10 µg/mL. Flavopiridol (F3055 Sigma) was used at a final concentration of 1 µM.
analyzed using the Operetta High-Content Imaging System (PerkinElmer, Harmony Imaging Software 4.1). After data acquisition, subsequent analyses were performed with the Columbus 2.5.0 software (PerkinElmer).
Antibodies and Western Blotting GAPDH antibody (MAB 374) was purchased from Millipore. H3 (ab1791) and H2A.Z (ab4174) antibodies were purchased from Abcam. Whole-cell protein extracts were prepared using boiling buffer as in Lazorthes et al. (2015). Western blots were performed using standard procedures.
DATA AND SOFTWARE AVAILABILITY
RNA Extraction and Reverse Transcription Total RNA was prepared using the MasterPure RNA Purification Kit (Epicentre Biotechnologies) supplemented with Baseline-ZERO DNase (Epicentre) according to the manufacturer’s recommendations. Detailed procedures for strand-specific and random primed qRT-PCR are described in the Supplemental Experimental Procedures. ChIP-Seq and RNA-Seq ChIP was performed as previously described (Lazorthes et al., 2015), and total RNA was extracted as described earlier. Preparation of the different samples for sequencing and sequencing strategy are detailed in the Supplemental Experimental Procedures. Statistical Tests For each list of log2 ratios obtained as described in the Supplemental Experimental Procedures, we applied the statistical test of Shapiro to determine whether the list of ratios is normally distributed (p > 0.05) or not normally distributed (p < 0.05). To compare 2 lists, if at least one of the lists is not normally distributed, we applied the Mann-Whitney-Wilcoxon test; otherwise, we applied Student’s t test (Student’s t test if variances from the two lists are homogeneous; Welch t test if not). In both cases, if the p value is 0) in both proliferation and senescence datasets for the two RNA-seq replicates and based on the gene size and intergenic region size of more than 100 nt. Each gene and intergenic regions were divided in 20 equal regions. For each of these regions, the mean of the normalized number of aligned reads per base was computed for each gene and the median within the gene population is plotted. The (-) strand of RNA-seq datasets is shown with a dashed lane. Note the general trend of gene repression during senescence induced by the activation of RAF1 oncogene, as we already observed on 2 chromosomes (Lazorthes et al., 2015). (B) Same as in (A) except that isolated genes (without any neighboring genes in the 100 kb upstream or downstream regions) were analyzed. 
Note that the median signals of RNA-seq datasets on both strands were 0 for isolated genes, for (+) stranded-genes (281) and (-) stranded genes (276). (C) Graphs showing only the reverse genes of the 91 START loci that are shown in Figure 1B. To compute metadata analyses of RNA-seq datasets at the 91 START loci, the two DNA strands were assigned based on the direction of the forward genes or the reverse genes. The expression levels of the reverse genes (dashed lanes) at START loci are comparable to all expressed convergent genes shown in (A). The forward strand (antisense parts of the START RNAs) is shown with a full line.
Figure S3: Analyses of RNA-seq datasets in proliferation and senescence at the ARHGAP18/LAMA2 and at the KIAA1919/REV3L loci. Related to Figure 2. (A) Larger scale view of the (-) strand of RNA-seq datasets at the ARHGAP18/LAMA2 convergent gene locus. WI38 hTERT RAF1-ER cells were induced or not to enter senescence by 4-HT addition for 3 days. Total RNA was extracted and subjected to strand-specific RNA-Seq experiments (replicates 1 and 2). Strand-specific RNA-seq tracks show the density of aligned reads normalized by the total number of reads multiplied by 100 millions. Annotations from the Ref Seq (hg38), visualized in IGB browser, are also shown. The region indicated by dotted arrow corresponds to putative read-through RNA arising from the ARHGAP18 gene and whose expression is increased in senescence, while the ARHGAP18 gene expression does not change much. (B) The ARHGAP18 and KIAA1919 START RNAs are more increased far from the TTS than close to the TTS of the gene from which they originate. Strand-specific RNA-seq datasets from two independent experiments (RNA-seq #1 and RNA-seq #2) were analyzed to show the expression changes in senescence (log2(sen/prolif)) at each 10 kb interval of the region starting from the TSS of ARHGAP18 or KIAA1919 genes to the end of the read-through domain. ARHGAP18 gene (133 kb) and KIAA1919 gene (9.8kb) are shown with black arrows. Note that the increase in START RNA expression (red triangle, 194 kb for ARHGAP18 START RNA or 115 kb for KIAA1919 START RNA) is greater as one goes farther from the TTS.
Figure S4: Role of specific START RNAs in senescence maintenance. Related to Figure 4. (A) Gene Annotations co-occurrence discovery (GeneCodis) analysis of the 91 START RNAs’ reverse genes. Only GO categories with three or more associated genes and with an adjusted p-value 25. We sorted data by position on the genome, and created an index of each bam file (bai format). For the ChIP-seq datasets, we removed PCR duplicates (i.e. reads mapped at the exact same position on the genome) using the samtools command rmdup -s. We obtained the following numbers of aligned reads: 58,217,899 (paired-end, RNA-Seq #1 in proliferation); 56,211,122 (paired-end, RNA-Seq #1 in senescence); 60,397,157 (paired-end, siCtrl RNA-Seq #1); 59,362,119 (paired-end, siH2A.Z RNA-Seq #1); 87,492,889 (paired-end, RNA-Seq #2 in proliferation); 118,930,777 (paired-end, RNA-Seq #2 in senescence); 124,469,392 (paired-end, siCtrl RNA-Seq #2); 165,195,640 (paired-end, siH2A.Z RNA-Seq #2); 35,734,480 (H2A.Z ChIP-Seq #1 in proliferation); 26,471,830 (H2A.Z ChIP-Seq #1 in senescence); 28,510,436 (H2A.Z ChIP-Seq #2 in proliferation); 23,072,049 (H2A.Z ChIP-Seq #2 in senescence); 61,367,366 (H3 ChIP-Seq in proliferation) and 58,281,965 (H3 ChIP-Seq in senescence). Aligned reads on the dm6 genome were as follows: 793,404 reads in the H2A.Z ChIP-Seq #2 in proliferation; 999,843 reads in the H2A.Z ChIP-Seq #2 in senescence; 52,944 reads in the H3 ChIP-Seq in proliferation and 56,277 reads in the H3 ChIP-Seq in senescence. We then converted the files into wiggle files using R via the rtracklayer Bioconductor package.
2) Normalization: On these wiggle files, we then applied a normalization factor for each base position of the genome. For the RNA-Seq datasets, the normalization factor was: 10^8 / total number of aligned reads. For the first replicates of H2A.Z ChIP-Seq experiments, the normalization factor was: 10^7 / total number of aligned reads.
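The per-base normalization described above amounts to a single scaling factor per dataset. The following minimal sketch, which is not the authors' code, computes that factor for two of the read counts listed above and applies it to a toy coverage vector; the function names and the toy coverage values are hypothetical.

```python
def normalization_factor(total_aligned_reads: int, scale: float) -> float:
    """Scaling factor applied to every base position of a wiggle track."""
    if total_aligned_reads <= 0:
        raise ValueError("total_aligned_reads must be positive")
    return scale / total_aligned_reads


def normalize_coverage(per_base_counts, total_aligned_reads, scale):
    """Return the coverage multiplied by the dataset-wide factor."""
    factor = normalization_factor(total_aligned_reads, scale)
    return [count * factor for count in per_base_counts]


RNA_SEQ_SCALE = 1e8    # RNA-Seq datasets
CHIP_SEQ_SCALE = 1e7   # first replicates of H2A.Z ChIP-Seq

# Read counts reported in the text.
rna_factor = normalization_factor(58_217_899, RNA_SEQ_SCALE)    # RNA-Seq #1, proliferation
chip_factor = normalization_factor(35_734_480, CHIP_SEQ_SCALE)  # H2A.Z ChIP-Seq #1, proliferation

# Toy per-base coverage (hypothetical values, for illustration only).
toy_coverage = [0, 3, 12, 7]
normalized = normalize_coverage(toy_coverage, 58_217_899, RNA_SEQ_SCALE)
print(round(rna_factor, 4), round(chip_factor, 4), normalized)
```

Because the factor only depends on sequencing depth, tracks from datasets of different depth become directly comparable base by base.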
For these first replicates of H2A.Z ChIP-seq (which do not include spike-in chromatin), we added another normalization step to the normalization by the total number of aligned reads. For that, we calculated the mean number of reads surrounding the TSS of specific genes that were also analyzed by different numbers of conventional H2A.Z ChIP experiments followed by qPCR, as described in the table below. We then calculated the difference between the variation in senescence versus proliferation obtained from the ChIP-seq data and the variation in senescence versus proliferation obtained by conventional ChIP-qPCR. We then multiplied the mean number of reads per base pair in proliferation by the resulting factor (2.27) to obtain the standardized data used throughout the paper.

Promoter name             Variation (Sen/Prolif)¹    Variation (Sen/Prolif)²    Difference ChIP-Seq/
                          ChIP-Seq (Log2)            ChIP-qPCR (Log2)           ChIP-qPCR (Log2)
ARHGAP18                   0.7277                    -0.3426 (3)                1.0703
CYR61                      2.3814                     0.6964 (2)                1.6850
CDKN2A (p16)               0.3428                    -0.9476 (5)                1.2904
CDKN2B                    -0.7849                    -1.9344 (5)                1.1495
CDKN2A (p14ARF)           -0.3491                    -1.1693 (5)                0.8202
SYDE2                      0.4838                    -1.4888 (2)                1.6953
RNU6-1 (NR_004394.1)       0.4838                    -0.7330 (3)                1.2169
KIAA1919                   0.4784                    -0.0192 (3)                0.4977
GAPDH                      0.7189                     0.3248 (3)                0.3941
DDAH1 (NM_001134445.1)     1.6407                    -0.3514 (2)                1.9921
mean (standard. factor)                                                         1.1811
Table showing the calculation of the standardization factor in the first replicate of H2A.Z ChIP-seq experiment in proliferation and senescence.
¹ Variation of H2A.Z occupancy at promoters (from -1000 to +1000 from the TSS) between senescent and proliferative cells calculated from ChIP-Seq datasets.
² Variation of H2A.Z occupancy at promoters between senescent and proliferative cells calculated by ChIP-qPCR. The number of independent experiments is indicated between brackets.
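The standardization factor quoted above can be checked from the last column of the table. The sketch below assumes that the factor is 2 raised to the mean of the log2 differences; this is an interpretation rather than something stated explicitly in the text, but it reproduces both the tabulated mean and the reported factor of about 2.27. Variable names are hypothetical.

```python
# Log2 differences (ChIP-Seq minus ChIP-qPCR) copied from the table above.
log2_differences = {
    "ARHGAP18": 1.0703,
    "CYR61": 1.6850,
    "CDKN2A (p16)": 1.2904,
    "CDKN2B": 1.1495,
    "CDKN2A (p14ARF)": 0.8202,
    "SYDE2": 1.6953,
    "RNU6-1": 1.2169,
    "KIAA1919": 0.4977,
    "GAPDH": 0.3941,
    "DDAH1": 1.9921,
}

# Mean of the log2 differences (last row of the table).
mean_log2_difference = sum(log2_differences.values()) / len(log2_differences)

# Assumption: the linear-scale factor is 2 ** (mean log2 difference).
standardization_factor = 2 ** mean_log2_difference

print(f"mean log2 difference: {mean_log2_difference:.4f}")
print(f"standardization factor: {standardization_factor:.2f}")
```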
For the second replicates of the H2A.Z ChIP-Seq experiments and for the H3 ChIP-seq datasets (which do include spike-in chromatin), numbers of reads were adjusted using the Drosophila reference genome. The normalization factor was calculated as recommended by the manufacturer (sample 1 with the lowest Drosophila tag count / sample 2). The ChIP-Seq and RNA-Seq normalized wig files as well as RefSeq genes were visualized and explored using the Integrated Genome Browser (IGB). To simplify the visualization of RefSeq genes, only one transcript variant is shown (the transcript variant #1 if several transcript variants exist with the same start and end or the longest transcript variant if several transcript variants exist with different starts or ends).
3) Calculation on a given interval of the genome: From the normalized wig files, we calculated on a given interval (e.g. the forward gene) the mean of the normalized (as described above) number of aligned reads per base pair (roughly corresponding to RPK10M (Reads Per Kb per 10 Million mapped reads) for datasets that were normalized with 10^8 / total number of mapped reads, as described above). For calculation of the ratios between 2 conditions (e.g. senescence versus proliferation), we added 0.01 to the mean of the normalized number of aligned reads per base pair in each condition (e.g. (normalized number of reads in senescence + 0.01) / (normalized number of reads in proliferation + 0.01)). When we calculated the mean over the introns of a gene, we computed the mean of the normalized number of aligned reads per base for each intron and then computed the mean of these values. Note that although the first replicate of the RNA-seq siCtrl and siH2A.Z was designed to be strand-specific, those datasets did not show strand-specificity.
For this dataset, we thus only analyzed intervals that do not show evidence of transcription on the 2 strands of DNA, such as the intergenic regions of the convergent gene pairs, to analyze START RNA expression (Figure S6A).
Identification of START RNAs using RNA-Seq datasets in senescence and proliferation
The following graph summarizes the pipeline applied for the identification of START RNAs. All these steps are fully described below.
Part 1) Finding transcripts regulated in senescence in the 2 replicates: To identify START RNAs, we had to define their boundaries over the genome. To search for the boundaries of unannotated transcripts, we first designed an algorithm that we called “domainer” to determine all transcribed domains regulated (activated or repressed) in senescence existing in the two datasets (RNA-Seq in proliferation and in senescence). We applied this algorithm to the 2 RNA-seq experiments in proliferation and in senescence (replicates #1 and #2). This “domainer” is a multi-step algorithm that starts from very small intervals to compute the fold changes and then merges consecutive intervals whenever the conditions are conserved (absolute log2 ratio Sen/Prolif > 0.5 and of the same sign) to finally define a large “transcript domain”, characterized by its genomic boundaries and whose expression changes in senescence. This analysis is exemplified on one locus for the 2 RNA-seq replicates #1 and #2 at the end of this section (Part 1) and is described in detail below:
STEP 1: First, for each dataset (proliferation and senescence) we divided the entire genome into intervals of 200 bp. For each interval with a mean of the normalized number of aligned reads per base lower than 1, we set the mean at 1 by manually setting the total normalized number of aligned reads at 200. We applied this correction to avoid dividing by zero or by a number close to zero, which could distort our computation of the log2 ratio Senescence/Proliferation. After this step, we computed, for each interval, the log2 ratio of the mean of the normalized number of aligned reads per base in Senescence over the mean of the normalized number of aligned reads per base in Proliferation. We computed this ratio for both (+) and (-) strands. Because we were looking for domains regulated in Senescence, we kept intervals with an absolute log2 ratio higher than 0.5.
STEP 2: For each strand, we then merged consecutive intervals with ratios of the same sign and then we re-computed the log2 ratio Senescence over Proliferation for these new domains.
STEP 3: For each strand, we merged consecutive intervals with ratios of the same sign when they were closer than 5 kb. We re-computed the log2 ratio Senescence over Proliferation for these new domains.
STEP 4: We removed domains shorter than 1 kb to restrict the analysis to long transcribed regions.
STEP 5: We next merged again consecutive domains with ratios of the same sign when they were closer than 5 kb. We re-computed the log2 ratio Senescence over Proliferation for these final domains.
FINAL: We selected domains with an absolute log2 ratio higher than 0.5. This final list of domains (35,717 and 15,727 transcript domains were found in the first and second RNA-seq replicates, respectively) represented transcripts regulated (activated or repressed) in senescence and was crossed with the list of convergent genes in Part 3.
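The steps above can be sketched as follows for one strand of one chromosome. This is an illustrative re-implementation under stated assumptions, not the original “domainer” code: the input is taken to be the mean normalized signal per 200-bp interval in each condition, the ratio of a merged domain is recomputed over its whole span (including any gap that was bridged), and all function and constant names are hypothetical.

```python
import numpy as np

BIN = 200           # interval size in bp (STEP 1)
MIN_ABS_LOG2 = 0.5  # threshold on |log2(Sen/Prolif)|
MAX_GAP = 5_000     # merging distance in bp (STEPS 3 and 5)
MIN_LEN = 1_000     # minimal domain length in bp (STEP 4)


def log2_ratio(prolif, sen, start, end):
    """log2(Sen/Prolif) of the mean signal over bins [start, end)."""
    return float(np.log2(sen[start:end].mean() / prolif[start:end].mean()))


def merge(domains, prolif, sen, gap_bins):
    """Merge consecutive same-sign domains separated by fewer than gap_bins bins."""
    merged = []
    for start, end, ratio in domains:
        if merged:
            p_start, p_end, p_ratio = merged[-1]
            if (ratio > 0) == (p_ratio > 0) and start - p_end < gap_bins:
                # Recompute the ratio over the whole merged span (assumption).
                merged[-1] = (p_start, end, log2_ratio(prolif, sen, p_start, end))
                continue
        merged.append((start, end, ratio))
    return merged


def domainer(prolif_bins, sen_bins):
    """Return (start_bp, end_bp, log2 ratio) for each regulated domain."""
    # STEP 1: floor the signal at 1, compute per-bin ratios, keep regulated bins.
    prolif = np.maximum(np.asarray(prolif_bins, dtype=float), 1.0)
    sen = np.maximum(np.asarray(sen_bins, dtype=float), 1.0)
    ratios = np.log2(sen / prolif)
    domains = [(i, i + 1, float(r)) for i, r in enumerate(ratios)
               if abs(r) > MIN_ABS_LOG2]
    domains = merge(domains, prolif, sen, 1)               # STEP 2: adjacent bins
    domains = merge(domains, prolif, sen, MAX_GAP // BIN)  # STEP 3: gaps < 5 kb
    domains = [d for d in domains if (d[1] - d[0]) * BIN >= MIN_LEN]  # STEP 4
    domains = merge(domains, prolif, sen, MAX_GAP // BIN)  # STEP 5
    # FINAL: keep domains that are still regulated after merging.
    return [(s * BIN, e * BIN, r) for s, e, r in domains if abs(r) > MIN_ABS_LOG2]


# Toy example: two activated stretches 2 kb apart and one short isolated stretch.
prolif_toy = np.ones(100)
sen_toy = np.ones(100)
sen_toy[10:20] = 4.0
sen_toy[30:40] = 4.0
sen_toy[80:82] = 4.0   # 400 bp only, removed at STEP 4
toy_domains = domainer(prolif_toy, sen_toy)
print(toy_domains)
```

In the toy example the two 2-kb stretches are fused into a single domain because the gap between them is shorter than 5 kb, whereas the isolated 400-bp stretch is discarded by the length filter.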
Examples of transcript domains identified by the algorithm in the two replicates of RNA-seq datasets in proliferation and senescence at the ARHGAP18 locus. Strand-specific RNA-seq tracks ((-) strand) show the density of aligned reads, normalized by the total number of reads and multiplied by 100 million. The RefSeq annotation (hg38) is also shown. The different steps of the algorithm used to find the boundaries of transcripts whose expression changes in senescence are shown. This algorithm consists of a sequential fusion of small consecutive domains whenever their fold change in senescence is conserved. Note that at this locus, the algorithm found a long transcript domain downstream of the ARHGAP18 gene in both RNA-seq replicates.
Part 2) Finding convergent gene loci:
We downloaded from the UCSC table browser the database hg38 (Dec. 2013 GRCh38/hg38), group Genes and Gene Prediction, track RefSeq Genes, table RefFlat, which contains all natural mRNAs and non-coding RNAs. For each gene (unique gene name), if the gene mapped to one location on the genome with only one variant, we kept one line with the coordinates of the gene. If the gene mapped to different places in the genome, we kept one line per position and counted the gene as many times as it mapped to different locations. If the gene mapped to one location but with more than one variant, we kept one line for the locus of the gene, with coordinates starting at the first start of the variants and ending at the last end of the variants (28,450 gene loci in total). To select convergent gene pairs, we removed from this list the genes whose boundaries were entirely included in the boundaries of another gene. We ended with 23,620 RefSeq genes. From this database, we removed HIST (histone) genes and we selected convergent genes. We ended with a list of convergent gene loci (6,342) that we used for Part 3.
Part 3) Identifying ART RNAs:
To select antisense read-through (ART) RNAs, we kept pairs of convergent genes (selected in Part 2) when there was at least one domain (identified in Part 1) present 2 kb downstream of one of the 2 genes and overlapping the other gene in the antisense orientation. When read-through occurred on both strands, we kept the gene pair twice, once for each strand. Then, we cleaned our selection by removing read-throughs when the convergent genes overlapped. We removed convergent gene pairs when at least one of the two genes was not expressed, i.e., when its signal was log2(0.01), in proliferation or in senescence.
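The selection of convergent pairs (Part 2) and the overlap test for ART RNAs (Part 3) can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the `Gene` record, the function names, and the reading of "present 2 kb downstream" as "overlapping the 2-kb window downstream of the forward gene's 3' end" are all hypothetical choices, and nested genes are assumed to have been removed beforehand.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"

def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end

def convergent_pairs(genes):
    """Neighbouring, non-overlapping genes whose 3' ends face each other."""
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        # Convergent: (+) gene on the left, (-) gene on the right.
        if left.strand == "+" and right.strand == "-" and left.end < right.start:
            pairs.append((left, right))
    return pairs

def is_art(domain, forward, other, window=2000):
    """domain = (start, end, strand) from Part 1; forward = read-through source."""
    d_start, d_end, d_strand = domain
    if d_strand != forward.strand:
        return False
    if forward.strand == "+":
        win = (forward.end, forward.end + window)      # downstream is to the right
    else:
        win = (forward.start - window, forward.start)  # downstream is to the left
    near_3prime_end = overlaps(d_start, d_end, *win)
    antisense_to_other = overlaps(d_start, d_end, other.start, other.end)
    return near_3prime_end and antisense_to_other
```

Each convergent pair would be tested twice, once with each gene as `forward`, which mirrors keeping a pair once per strand when read-through occurs on both strands.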
We ended with 2 lists of 1130 and 784 convergent gene pairs, identified in the first and second replicates of RNA-seq datasets, respectively, with a transcript domain (activated or repressed in senescence) that could result from read-through of one of the convergent genes and generate an antisense to the other gene (ART RNAs). To find START loci, we removed convergent gene pairs when at least one of the genes was a miRNA (microRNA) or a LOC (uncharacterized) ncRNA annotated gene, resulting in two lists of 1088 and 749 ART RNAs from the first and second RNA-seq replicates, respectively. However, we analyzed miRNA or LOC ncRNA annotated gene loci separately, with the same criteria as for the identification of START RNAs described below in Part 4.
Part 4) Identifying START RNAs:
To each of the lists of transcript domains found in Part 3, which were identified in each RNA-seq replicate and correspond to putative ART RNAs, we then applied stringent parameters based on the reproducibility between the 2 RNA-seq replicates. To do so, we computed the fold change in senescence over different genomic regions of these convergent loci for the 2 RNA-seq replicates. We first selected ART RNAs that are activated in senescence (log2 ratio (senescence/proliferation) of the antisense part of the ART RNA > 0.5) in both RNA-seq replicates. We then selected ART RNAs activated relative to their forward gene by calculating a read-through index, defined as the difference between the log2 ratio (senescence/proliferation) on the antisense part of the read-through and the log2 ratio (senescence/proliferation) on the forward gene pre-mRNA (mean of the means of the normalized number of aligned reads per base for all introns of the gene); we kept ART RNAs for which this index was higher than 0 in both RNA-seq replicates.
Finally, we removed from these lists ART RNAs when the score (that is, the mean of the normalized number of aligned reads per base) computed for the intergenic region between the 2 convergent genes in senescence was lower than 1 in at least one of the RNA-seq replicates. This last cutoff favored transcripts whose read density was continuous from the forward gene towards the antisense part of the read-through. We ended with 65 and 66 transcript domains found in each RNA-seq replicate, respectively. These transcript domains correspond to START RNAs that are, reproducibly in both RNA-seq replicates, activated in senescence (log2 ratio sen/prolif > 0.5), activated in senescence with respect to their forward gene pre-mRNA expression (read-through index > 0) and well-expressed in senescence (score in senescence of the read-through in the intergenic region > 1). For the 40 transcript domains found in common between the two lists, we kept the genomic boundaries of the one with the longest read-through for further analysis, ending with a list of 91 transcripts.
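The three reproducibility criteria of Part 4, and the count of 91 transcripts, can be illustrated with the short sketch below. The function name `is_start`, the dictionary keys and the toy identifiers are hypothetical and only serve the example; the intergenic score filter follows the "removed when lower than 1" wording.

```python
def is_start(replicates, min_activation=0.5, min_score=1.0):
    """replicates: one dict per RNA-seq replicate with (hypothetical) keys
    'antisense_log2'       log2 ratio sen/prolif on the antisense part,
    'forward_premrna_log2' log2 ratio sen/prolif on the forward gene introns,
    'intergenic_score_sen' mean normalized reads per base in the intergenic
                           region in senescence."""
    for rep in replicates:
        readthrough_index = rep["antisense_log2"] - rep["forward_premrna_log2"]
        if rep["antisense_log2"] <= min_activation:
            return False  # not activated in senescence
        if readthrough_index <= 0:
            return False  # not more activated than the forward gene
        if rep["intergenic_score_sen"] < min_score:
            return False  # read density not continuous enough
    return True  # all criteria met in every replicate

# Counting check with toy identifiers: 65 domains from replicate #1 and 66 from
# replicate #2, 40 of which are shared, give 65 + 66 - 40 = 91 distinct transcripts.
rep1_ids = set(range(0, 65))
rep2_ids = set(range(25, 91))
n_common = len(rep1_ids & rep2_ids)
n_total = len(rep1_ids | rep2_ids)
```

Requiring every criterion in every replicate is what makes the selection stringent: a candidate that is strongly activated in one replicate but fails any single cutoff in the other is discarded.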
For the Chi2 test, we tested the probability that 40 of these transcript domains were found in common between the 2 RNA-seq replicates, considering that there were 4068 convergent genes in the genome from which START RNAs could originate. We selected these 4068 convergent genes using criteria similar to those used for START selection: we removed convergent gene pairs when at least one of the two genes was not expressed, when the pair contained a histone, miRNA or LOC ncRNA gene, or when the two genes overlapped (an intergenic distance of 500 to 200,000 nt between the 2 convergent genes was used). We thus ended up with 91 START RNAs, which are, reproducibly in both RNA-seq replicates, 1) activated in senescence, 2) more activated than their forward gene in senescence and 3) well-expressed in senescence. This final list with the genomic annotation of these 91 START RNAs was then used throughout the manuscript for detailed analysis on two replicates of the various genome-wide datasets (RNA-seq in proliferative and senescent cells, RNA-seq following H2A.Z depletion, H2A.Z and H3 ChIP-seq) (Table S1).
For miRNA or LOC ncRNA gene loci, we applied the same criteria as for START RNA selection (log2 ratio of the antisense part > 0.5 in senescence, read-through index > 0 and score of the intergenic region in senescence > 1). For the read-through index, if there was no annotation of introns, we used the genomic boundaries of the forward gene to compute the index. One LINC RNA convergent gene locus, which had no intron annotation but met the criteria above, was added to this list. The final list of convergent gene loci containing a miRNA or LOC ncRNA gene and presenting an antisense read-through transcript activated in senescence is shown in Table S2.
Boxplots and metadata
Figures and statistical tests were done using R.
Boxplots show the median, the 1st and 3rd quartiles and the highest and lowest values (excluding outliers). Values were considered outliers if they were lower than 1st quartile - 1.5 × (3rd quartile - 1st quartile) or higher than 3rd quartile + 1.5 × (3rd quartile - 1st quartile). For metadata analysis of H2A.Z enrichment at +/- 2 kb around the TSS (Figure S6B), we first divided the genes into four equal classes (high, medium high, medium low, low) based on their expression in both senescence and proliferation RNA-seq datasets for each RNA-seq replicate. For the first replicate of RNA-seq datasets, the following numbers of genes were obtained: High = 5969, Medium High = 5135, Medium Low = 5343 and Low = 6084. For the second replicate of RNA-seq datasets, the following numbers of genes were obtained: High = 6079, Medium High = 5366, Medium Low = 5300 and Low = 6016. For each base in the region +/- 2 kb around the TSS, we computed the mean of the normalized number of aligned reads for all the genes of the different classes for each replicate of H2A.Z ChIP-seq datasets in senescence and in proliferation. In Figure 4, to compare the expression changes of the reverse genes in senescence with those of all the other genes, or of all the other genes with an activated antisense transcript, we used the same cutoffs as for the identification of START loci. As such, we removed from these lists the genes that were not expressed in proliferation or senescence (i.e., the signal is log2(0.01) on the gene body) and we removed histone, miRNA and LOC ncRNA genes for these analyses. For the Chi2 test, we thus analyzed the expression changes of all expressed genes in the 2 RNA-seq replicates (16288 genes in common). We also analyzed all the genes for which an antisense transcript is increased in senescence (2135 and 1349 genes in the first and second replicates, respectively).
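The outlier rule stated for the boxplots can be written as the short sketch below. The figures themselves were made in R; this Python version is only an illustration, the function names are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from the hinges R uses for boxplots.

```python
import statistics

def tukey_fences(values):
    """Return (low, high) limits: Q1 - 1.5 x IQR and Q3 + 1.5 x IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def split_outliers(values):
    """Separate the values drawn within the whiskers from the outliers."""
    low, high = tukey_fences(values)
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers
```

The whiskers of each boxplot would then extend to the minimum and maximum of the `inliers` list.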
Supplemental References Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G., Chepelev, I., and Zhao, K. (2007). High-resolution profiling of histone methylations in the human genome. Cell 129, 823-837. Contrepois, K., Thuret, J.Y., Courbeyrette, R., Fenaille, F., and Mann, C. (2012). Deacetylation of H4-K16Ac and heterochromatin assembly in senescence. Epigenetics Chromatin 5, 15. Djebali, S., Davis, C.A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., Xue, C., et al. (2012). Landscape of transcription in human cells. Nature 489, 101-108. Mattera, L., Courilleau, C., Legube, G., Ueda, T., Fukunaga, R., Chevillard-Briet, M., Canitrot, Y., Escaffit, F., and Trouche, D. (2010). The E1A-associated p400 protein modulates cell fate decisions by the regulation of ROS homeostasis. PLoS Genet 6, e1000983. Orlando, D.A., Chen, M.W., Brown, V.E., Solanki, S., Choi, Y.J., Olson, E.R., Fritz, C.C., Bradner, J.E., and Guenther, M.G. (2014). Quantitative ChIP-Seq normalization reveals global modulation of the epigenome. Cell Rep 9, 1163-1170. Parkhomchuk, D., Borodina, T., Amstislavskiy, V., Banaru, M., Hallen, L., Krobitsch, S., Lehrach, H., and Soldatov, A. (2009). Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res 37, e123. Tran, T., Teoh, C.M., Tam, J.K., Qiao, Y., Chin, C.Y., Chong, O.K., Stewart, A.G., Harris, T., Wong, W.S., Guan, S.P., Leung, B.P., Gerthoffer, W.T., Unruh, H., and Halayko, A.J. (2013). Laminin drives survival signals to promote a contractile smooth muscle phenotype and airway hyperreactivity. FASEB J 27, 3991-4003.