Chromatin and DNA sequences in defining promoters

BBAGRM-00643; No. of pages: 11; 4C: 4, 8 Biochimica et Biophysica Acta xxx (2013) xxx–xxx


Biochimica et Biophysica Acta journal homepage: www.elsevier.com/locate/bbagrm

Review

Chromatin and DNA sequences in defining promoters for transcription initiation☆

Ferenc Müller b,⁎, Làszlò Tora a,c,⁎⁎

a Cellular Signaling and Nuclear Dynamics Program, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), UMR 7104 CNRS, UdS, INSERM U964, BP 10142, F-67404 Illkirch Cedex, CU de Strasbourg, France
b School of Clinical and Experimental Medicine, College of Medical and Dental Sciences, University of Birmingham, B15 2TT Edgbaston, Birmingham, UK
c School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore

Article info

Article history: Received 26 June 2013; received in revised form 11 November 2013; accepted 11 November 2013; available online xxxx

Keywords: Epigenetic mark; Nucleosome depleted region; CpG island; Core promoter; RNA polymerase II; Transcription initiation

Abstract

One of the key events in eukaryotic gene regulation and consequent transcription is the assembly of general transcription factors and RNA polymerase II into a functional pre-initiation complex at core promoters. An emerging view of complexity arising from a variety of promoter associated DNA motifs, their binding factors and recent discoveries in characterising promoter associated chromatin properties brings an old question back into the limelight: how is a promoter defined? In addition to position-dependent DNA sequence motifs, accumulating evidence suggests that several parallel acting mechanisms are involved in orchestrating a pattern marked by the state of chromatin and general transcription factor binding in preparation for defining transcription start sites. In this review we attempt to summarise these promoter features and discuss the available evidence pointing at their interactions in defining transcription initiation in developmental contexts. This article is part of a Special Issue entitled: Chromatin and epigenetic regulation of animal development, edited by Dr. Peter Verrijzer and Dr. Elissa Lei.

© 2013 Elsevier B.V. All rights reserved.

☆ This article is part of a Special Issue entitled: Chromatin and epigenetic regulation of animal development, edited by Dr. Peter Verrijzer and Dr. Elissa Lei.
⁎ Correspondence to: F. Mueller, School of Clinical and Experimental Medicine, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK. Tel.: +44 12141 42895.
⁎⁎ Correspondence to: L. Tora, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), UMR 7104 CNRS, UdS, INSERM U964, BP 10142, F-67404 Illkirch Cedex, CU de Strasbourg, France. Tel.: +33 3 88 653444.
E-mail addresses: [email protected] (F. Müller), [email protected] (L. Tora).

1. Introduction

One of the key events in eukaryotic gene regulation and consequent transcription is the assembly of general transcription factors (called GTFs or TFIIs) and RNA polymerase II (Pol II) into a pre-initiation complex (PIC) at core promoters. Initially GTFs were purified biochemically from human HeLa cell or rat liver extracts and defined in vitro as a set of factors essential for accurate transcription initiation at a strong TATA box- and initiator element (Inr)-containing viral core promoter [1,2]. The Pol II core promoter is defined as the stretch of DNA from where the transcription of a Pol II transcribed RNA is started [3]. GTFs include TFIID, which is composed of TATA-binding protein (TBP) and 13 TBP-associated factors (TAFs) [4–6]. According to the textbook view, the binding and recognition of core promoter sequences by the canonical TFIID is the first step that nucleates PIC formation. These early in vitro experiments led to generalised models of Pol II transcription initiation that were biased by the nature of the strong viral promoters, the naked DNA templates used and the cellular extracts, which were prepared from highly differentiated cells. Subsequent functional and genetic studies carried out in model systems, which reflect various stages of development in different model organisms, revealed the existence of alternative initiation complexes that have been suggested to replace canonical TFIID ([7–9] and refs therein). The diversity of the core Pol II promoter binding machinery has also been coupled with the characterisation of many different core promoter types and chromatin architectures at promoters [3,10]. The diversities in core promoter binding factors, core promoter elements and the associated chromatin signatures argue for an as yet unappreciated dynamic regulatory step in transcription that is central to cellular homeostasis and thus to many developmental processes. The complexity arising from a variety of promoter-associated DNA motifs, their binding factors and associated chromatin properties brings an old question to the fore once again: how is a promoter defined in the cells of an organism? This question has not been satisfactorily answered despite 30 years of investigation. Accumulating evidence suggests that several parallel acting mechanisms are involved in orchestrating a pattern marked by the state of chromatin and general transcription factor binding in preparation for directing transcription at predefined sites. These mechanisms can be associated with at least three sets of core promoter features in vertebrates, which together provide an integrated platform for transcription initiation: i) DNA sequence motifs in promoters, such as the TATA-box, remain important factors in recruitment of initiation complex-forming transcription factors, as suggested

1874-9399/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.bbagrm.2013.11.003

Please cite this article as: F. Müller, L. Tora, Chromatin and DNA sequences in defining promoters for transcription initiation, Biochim. Biophys. Acta (2013), http://dx.doi.org/10.1016/j.bbagrm.2013.11.003


by the original textbook view; ii) a second promoter-associated property is a characteristic base composition at transcriptional start site regions, which may promote nucleosome-free regions or support nucleosome depletion by chromatin remodelling; and iii) the enrichment of CG dinucleotides in many promoters appears to provide templates for epigenetic mechanisms involved in defining promoter regions, such as DNA (hypo)methylation and deposition of promoter associated histone modifications, at least in mammals. In this review we attempt to summarise these promoter features and discuss the available evidence supporting their interactions in various developmental and other ontogenic contexts.

2. Positionally constrained sequence motifs involved in transcription initiation

2.1. Core promoter elements

Due to our still limited understanding of the exact molecular recognition mechanisms that lead to Pol II transcription initiation in vivo, prediction of eukaryotic Pol II promoters by sequence analysis has remained uncertain and inaccurate. Pol II core promoters are recognised as modular in structure and many of them contain specific consensus sequence elements which are positionally constrained in relation to the transcription start site (TSS) [10–13]. The advancement of high-throughput-sequencing-based TSS mapping technologies, such as CAGE, GRO-cap, RAMPAGE and CapSeq, allowed genome-wide (GW) mapping of the transcription starting positions of mRNAs on a variety of eukaryotic genomes and thus the very accurate identification of TSSs ([14–20] and refs therein). These GW mapping methods fostered generalised models to aid in understanding of eukaryotic Pol II transcription regulation [14]. Core promoters typically span between −50 and +50 relative to the +1 TSS. The modular structure of core promoters consists of multiple short sequence elements that surround the TSS and can be dispersed or overlapping.
Their functions are mostly context-dependent ([12] and refs therein) and their organisation is often gene or pathway specific (e.g. see the TCT initiator associated with ribosomal protein gene and translation-associated snoRNA gene promoters, described later in this section). Although core promoters direct initiation of transcription, they exhibit different transcriptional initiation properties. Depending on the number of sites from where Pol II initiates transcription, core promoters can be divided into two main categories: “single” or “narrow” TSS-containing promoters, where Pol II transcription initiates at a single base pair or a narrow cluster of base pairs, and “broad” TSS promoters, in which transcription is initiated at several seemingly independent sites within a window of about 50–100 bp [21,22]. Interestingly, genes with restricted spatial expression patterns, often corresponding to tissue-specific structural genes, tend to have a single TSS-containing promoter, while genes with ubiquitous spatial expression often have broad TSS promoters [14,16] (see also below). Despite the fact that many eukaryotic core promoters contain specific sequence motifs, e.g. a TATA box and/or an Inr, there seemed to be no universal motifs that would allow the unambiguous definition of a core promoter in a given eukaryotic genome. This view has recently been challenged by the Pugh lab, which analysed the organisation of human TFIIB-containing initiation complexes genome-wide [23]. They reported the widespread existence of a human core promoter consensus (SSRCGCCTATAWAWRNRTDKKKK(N)13YYA+1NWYY) that could be built from 4 previously described consensus motifs: the upstream TFIIB responsive element (BREu; SSRCGCC) [24], the TATA box consensus (TATAWAWR) [25], the downstream BRE (BREd; RTDKKKK) [26] and the Inr (YYA+1NWYY) [27], where the tolerance for mismatches in these elements would be 2, 3, 2 and 1, respectively [23].
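The consensus strings above are written in the standard IUPAC nucleotide ambiguity code (for example S = C or G, W = A or T, R = A or G, Y = C or T, K = G or T, D = not C, N = any base). As a minimal sketch of how such a motif can be compared with a sequence under a mismatch tolerance, the Python below scans a DNA string for windows that match one element within a given number of mismatches. The function names `mismatches` and `scan` and the `ELEMENTS` table are illustrative assumptions; the element strings and the 2, 3, 2, 1 tolerances are the ones quoted in the text, and the code does not attempt to reproduce the analysis of [23].

```python
# Standard IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Consensus and mismatch tolerance per element, as quoted in the text.
# The Inr is written without the "+1" annotation: the A at index 2 is +1.
ELEMENTS = {
    "BREu": ("SSRCGCC", 2),
    "TATA": ("TATAWAWR", 3),
    "BREd": ("RTDKKKK", 2),
    "Inr": ("YYANWYY", 1),
}


def mismatches(site, consensus):
    """Count positions where `site` is not allowed by the IUPAC consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have equal length")
    return sum(base not in IUPAC[code]
               for base, code in zip(site.upper(), consensus))


def scan(sequence, consensus, max_mismatches):
    """Return (0-based start, mismatch count) for each window within tolerance."""
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        n = mismatches(sequence[start:start + width], consensus)
        if n <= max_mismatches:
            hits.append((start, n))
    return hits
```

For instance, `scan(seq, *ELEMENTS["TATA"])` would list every window of `seq` that lies within three mismatches of TATAWAWR. A tolerance that loose returns many hits on real genomic sequence, which is one reason why positional constraints relative to the TSS matter for interpreting such matches.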
Furthermore, this study detected about 8000 PICs at expressed coding gene promoters, and about 160,000 PICs genome-wide, but did not define whether the detected TFIIB-containing PICs were at single TSS- or broad TSS-containing promoters. Note, however, that TFIIB-recognition elements (BREu and BREd) can have positive or negative effects on transcription in a promoter context-dependent manner [28]. At present it is not well understood how and where PICs are assembled on “broad” TSS-containing core promoters and how Pol II initiates transcription from seemingly random, independent sites within a broad window. At a “broad” TSS-containing core promoter several scenarios are conceivable: i) canonical or non-canonical TFIID complexes bind specifically to several sites at the broad promoter region and together with TFIIB determine multiple positions for PIC formation that could also vary from cell to cell; ii) PICs always assemble at the same position, but the diffuseness in start site selection on these promoters occurs downstream of PIC assembly; or iii) the nucleosome-free regions are longer on these broad TSS-containing promoters than on sharp TSS-containing promoters, allowing several less specific TFIID/PIC positionings and consequently multiple transcription starts (see below). Several excellent recent reviews have described the characterisation and sequence composition of all the identified core promoter elements [12–14,29], such as the TATA box [30–32], the Inr [27], the DPE (downstream promoter element) [33], the MTE (motif ten element) [34], the Motif 1 and 6 elements [35], the BREu and BREd [24,26], the DRE (DNA replication-related-element) [35,36], the DCE (downstream core element) [37], the XCPE (X core promoter element) [38], the TCE (translational control element) [39], and the PB (the pause button) [40] (see also Fig. 1 in [14]). Here we therefore focus on just one and briefly describe the TCT initiator element, which demonstrates a specialised core promoter type adapted to a specialised biological process. The canonical transcription initiation region is characterised either by a full Inr motif (YYA(+1)NWYY) [41], or by a YR(+1) consensus [21] at the TSS of polyadenylated transcripts of mammals.
In contrast, a conserved but divergent sequence motif was described in a number of vertebrate genes including all of the ribosomal protein genes [42,43]. One of the main features of the promoters of this gene class was a cytosine at +1 of the dominant TSS, surrounded by a tract of 4 to 13 pyrimidines, occasionally interrupted by one or two guanosine residues. This motif was termed the polypyrimidine initiator [42]. Later this initiator (also named TCT) was further analysed functionally in Drosophila [44–46]. The TCT motif coincides with and replaces the canonical initiator at the TSS in nearly all Drosophila ribosomal protein genes, as well as in at least 48 human ribosomal protein genes [42,46]. In addition, the core promoters of some genes encoding translation initiation and elongation factors also contain a TCT element. In the TCT motif, which spans from −2 to +6 relative to the +1 transcription start site, pyrimidine nucleotides encompass the C+1 start site. This type of initiation is distinct from canonical initiation, where Pol II has a very strong preference for A/G +1 start sites. Furthermore, in vitro foot-printing experiments demonstrated that the TCT element is not recognised by the canonical TFIID [46]. Thus, it appears that the TCT motif-containing promoters recruit a specialised PIC distinct from TFIID to achieve high levels of expression of ribosomal protein genes as well as to coordinate the relative amounts of corresponding translation initiation and elongation factors. This example demonstrates how a divergent and specialised sequence motif, together with an as yet unknown initiation complex, coevolved for a specialised transcriptional function.

2.2. Differential usage of core promoter elements

Alternative core promoter usage in a developmental stage-specific manner is a means for the utilisation of alternative transcription initiation mechanisms for individual genes active during development [8,13].
For example, transcripts expressed in the oocyte in Drosophila are deposited and maternally inherited in the embryo and have been shown to be transcribed from alternative promoters, selectively utilised in development [47]. Core promoters of genes expressing maternally inherited transcripts showed differences in motif composition when compared to zygotically active promoters. For example, DRE, Ohler 1 and 6 elements are enriched


in Drosophila genes that are transcribed in the female germline, and deposited in developing oocytes, while the canonical elements (TATA box, Inr or DPE) are largely absent from these genes [47–49]. In turn, TATA box and Inr are enriched in embryonically transcribed zygotic genes, particularly those that encode developmental regulators [47,50]. A global analysis of mammalian promoters concluded that alternative promoters mostly play a role at highly regulated developmental genes and that single peak promoter genes are more likely involved in general cellular processes active in a broad range of tissues [13,14]. Alternative promoter usage with changes in core promoter motif dependence has also been described in vertebrates. For example, mouse embryos from the one-cell to eight-cell stage of development or undifferentiated mouse embryonic stem cells did not require the TATA box for herpes simplex virus thymidine kinase promoter activity. Instead, the need for a functional TATA box was developmentally acquired and was shown to be dependent on at least two parameters: the differentiated state of the cell and stimulation of the promoter by either an enhancer or a transactivator [51]. Whether this kind of ontogenic specialisation of promoters is a general phenomenon, remains to be explored. The differential utilisation of core promoter motifs in development appears to coincide with differential expression of PIC subunits, which together suggest that the composition of PICs assembled at alternative promoters likely varies in a developmental stage or signalling pathway-specific manner [52,53]. Thus, sequence elements in core promoters are directly associated with initiation patterns and the spatiotemporal conditions under which they are utilised.
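The distinction drawn in Section 2.1 between “single” or “narrow” and “broad” TSS promoters can be made concrete with a small sketch. The Python below takes per-position counts of 5′ tags for one promoter (as a CAGE-like assay might produce) and measures the width of the interval that holds the central share of the signal. The 10th to 90th percentile bounds, the 10 bp cut-off and the function names are assumptions chosen for illustration; the text itself only describes broad promoters as initiating at several sites within a window of about 50–100 bp.

```python
def interquantile_width(tag_counts, lower=0.1, upper=0.9):
    """Width in bp of the span holding the central share of TSS tags.

    tag_counts maps a genomic position to the number of 5' tags that
    start at that position. The quantile bounds are illustrative defaults.
    """
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("no tags in cluster")
    cumulative = 0
    low_pos = high_pos = None
    for pos in sorted(tag_counts):
        cumulative += tag_counts[pos]
        # First position at which the lower quantile of tags is reached.
        if low_pos is None and cumulative >= lower * total:
            low_pos = pos
        # First position at which the upper quantile of tags is reached.
        if cumulative >= upper * total:
            high_pos = pos
            break
    return high_pos - low_pos + 1


def classify_promoter(tag_counts, narrow_max_width=10):
    """Label a tag cluster 'narrow' or 'broad' using a hypothetical cut-off."""
    width = interquantile_width(tag_counts)
    return "narrow" if width <= narrow_max_width else "broad"
```

A cluster such as `{100: 2, 101: 95, 102: 3}` has nearly all of its signal on one base pair and is labelled narrow, whereas tags spread over several positions across some tens of base pairs are labelled broad. Any real analysis would need thresholds justified by the data set in question.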

2.3. Promoter classes are characterised by variable transcription initiation patterns

Initially, when comparing a limited number of promoter sequences, the existence of two classes of promoters was suggested, the TATA-containing and the TATA-less promoters [54]. Recent genome-wide TSS identification methods extended these original observations. These GW studies indicated that vertebrate TATA-less core promoters often have high CG content and are characterised by their overlap with CpG islands [3]. These CG-rich promoters are mostly broad TSS-containing promoters characterised by multiple TSSs and have been associated with widely expressed or developmentally regulated genes [22]. The second class of promoters (originally called TATA-containing) has low CG content and exhibits a precise start site at which most transcription initiates from a single nucleotide position (‘single’ or ‘narrow’ TSS promoters). Many, but not all, have a TATA box at a fixed distance from the TSS [14]. This distance is about 30 bp in metazoans, whereas a variable distance of 40–120 nucleotides is observed in yeast [55]. TATA-box-containing promoters are often associated with tissue-specific transcription [16,22]. A recent review, comparing GW studies from human and Drosophila cells and combining core promoter sequence motif identification with gene ontology analyses, suggested a more elaborate sub-classification of promoters and called them types I, II and III (see Table 2 in [14]). Type I has been suggested to consist of tissue-specific and/or ontogenic stage specific promoters characterised by a high enrichment for a TATA box at an appropriate distance from an Inr element. Type II promoters are associated with ‘housekeeping’ genes and genes that are regulated at the level of individual cells; in Drosophila they have either a DRE or a combination of motifs (Motifs 1 or 6, DRE) [29].
In mammals, type II promoters are also associated with ubiquitously expressed genes, and there is usually a short CpG island that overlaps with the TSS. Type III promoters in Drosophila have an Inr element only, or an Inr element plus a DPE. These promoters are preferentially associated with developmentally regulated genes, the expression of which is precisely coordinated across different cells in a tissue [50]. In mammals, type III promoters are associated with developmental


genes, which harbour several large CpG islands that often extend across the whole body of the gene [14].

3. Nucleosome positioning and its sequence determinants at promoters

The structure and composition of chromatin at promoters have a fundamental role in maintaining different cell states [56,57]. Nucleosomes are characteristically positioned and/or depleted at regulatory regions in eukaryotic genomes and thus contribute to the definition of promoters and enhancers. Most transcribed genes have reduced nucleosome occupancy over their promoters, organised in a canonical −1 nucleosome, nucleosome-depleted region (NDR), and +1 nucleosome arrangement (Fig. 1). Note that the +1 and −1 nucleosomes lie downstream and upstream of the TSS, respectively, relative to the direction of transcription. The well-positioned −1 and +1 nucleosomes, which flank the NDRs at the TSS of active genes, contain the variant histone H2A.Z (called Htz1 in yeast). Nucleosome positioning and depletion at promoters is a highly dynamic process influenced by DNA sequences, DNA-binding transcription factors, histone variants, post-translational histone modifiers, and ATP-dependent chromatin remodelling complexes ([58–60] and refs therein) (Fig. 1). NDRs at promoters are also associated with Pol II pausing and divergent transcription of short unstable nuclear RNAs [61,62]. The GW mapping of nucleosomes in Saccharomyces cerevisiae has explored how the characteristic nucleosome positioning at promoters is regulated by the underlying DNA sequence. This work has revealed a 10 bp periodicity of bendable dinucleotides throughout nearly the entire 147 bp region wrapped around the histone octamer [63]. This distinctive dinucleotide pattern was proposed to facilitate the sharp bending of DNA around the nucleosome. The pattern includes ~10 bp periodic AA, TT or TA dinucleotides that oscillate in phase with each other and out of phase with ~10 bp periodic GC dinucleotides.
Moreover, the linker regions exhibit a strong preference for sequences that resist DNA bending and thus disfavour nucleosome formation and likely contribute to nucleosome-free regions upstream of the +1 nucleosome ([59] and refs therein). While sequence preferences contribute substantially to nucleosome organisation in vivo, evidence for anti-nucleosomal DNA sequences has also been reported in eukaryotes. Poly(dA:dT) and poly(dG:dC) tracts are intrinsically stiff and thus inhibitory to nucleosome formation [64–66]. Analyses of the yeast and Caenorhabditis elegans genomes have reported that sequences which are less favourable for nucleosome formation are overrepresented at promoters [64,67]. Yeast promoters have been divided into two classes based on their NDRs: i) the housekeeping gene promoters, which have well defined NDRs (also called nucleosome-free regions, or NFRs), where the +1 nucleosome contains Htz1 (and the −1 nucleosome sometimes also contains Htz1), and which have no TATA box, contain longer poly(dA:dT) stretches and are TFIID regulated; and ii) the inducible or stress-regulated gene promoters, which have less well defined −1 and +1 nucleosome positions, contain a consensus TATA box with shorter poly(dA:dT) stretches and are SAGA regulated ([68] and refs therein). In yeast, the location of the TSS 12–13 nucleotides inside the border of the +1 nucleosome may suggest that the +1 nucleosome can control transcription initiation. Using differential MNase digestion of chromatin and high-throughput sequencing, a special group of nucleosomes termed “fragile nucleosomes” was detected throughout the yeast genome, mainly at NDRs. At stress-regulated genes, the presence of fragile nucleosomes prior to the occurrence of environmental changes suggests that the “fragile” nucleosomes at these promoters prepare the chromatin and the consequent transcription for a quick response to the environmental changes [69].
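The ~10 bp dinucleotide periodicity described above can be illustrated with a simple distance histogram. The sketch below, in Python, records where AA, TT or TA dinucleotides start in a sequence and counts how often two such dinucleotides lie a given number of base pairs apart; in periodic sequence the counts peak at multiples of about 10. The function names and the 30 bp cut-off are assumptions made for illustration, and this is not the statistical procedure used in [63].

```python
from collections import Counter


def dinucleotide_positions(sequence, dinucleotides=("AA", "TT", "TA")):
    """Return 0-based start positions of the chosen dinucleotides."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in dinucleotides]


def distance_histogram(positions, max_distance=30):
    """Count pairs of positions by their separation, up to max_distance bp.

    `positions` must be sorted in ascending order, as returned by
    dinucleotide_positions.
    """
    hist = Counter()
    for i, first in enumerate(positions):
        for second in positions[i + 1:]:
            distance = second - first
            if distance > max_distance:
                break  # later positions are even further away
            hist[distance] += 1
    return hist
```

On real nucleosomal DNA the periodic signal is weak in any single sequence and only emerges when many aligned nucleosome-bound sequences are pooled, so a histogram of this kind is best read as an aggregate over a large set.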
Drosophila and mammalian promoters have a different frequency and distribution of mononucleotides in their promoters than that in S. cerevisiae. Drosophila promoters are generally A/T rich with a peak of


Fig. 1. Mechanisms defining nucleosome depleted or nucleosome free regions in core promoters. These mechanisms may act individually and/or cooperatively, and involve a) DNA sequences, such as poly(dA:dT) tracts, that are intrinsically inhibitory to nucleosome formation; b) CpG density enrichment marks, which are hypomethylated (labelled with empty circles) and can nucleate H3K4me3 occupancy (see Fig. 2), c) DNA binding transcription factors, d) chromatin remodelers that reposition nucleosomes and histone modification complexes that deposit posttranslational modifications on nucleosomes around the TSS, and e) specific histone variants that replace canonical histones may create “fragile” nucleosomes, are dynamically positioned and/or removed from promoters. Abbreviations are: NDR, nucleosome depleted region; REM, ATP-dependent chromatin remodeler complexes; MOD, chromatin remodelling co-activator complexes that modify histones covalently; Act, transcription activation domain; TSS, transcription start site; H, histone.

A and T dinucleotides between −200 bp and the TSS. In contrast, many mammalian promoters have high C+G content and are characterised by CpG islands (CGIs) [70]. As non-homopolymeric (G+C)-rich regions generally favour nucleosome formation in vitro, human promoters were suggested to favour high nucleosome occupancy [65]. Recent deep sequencing data of human nucleosome positions indicated, however, that the influence of sequence on positioning of nucleosomes in vivo is rather modest [71]. In addition, nucleosome exclusion by (dA:dT) dinucleotide stretches has been suggested to have a less important role in nucleosome positioning within fly and human genomes [71,72]. Thus, it appears that anti-nucleosomal-positioning DNA sequences contribute to transcriptional regulation in eukaryotes, but the extent of their importance seems to vary from yeast to mammals. The mechanisms that control the size and location of NDRs are not yet fully understood. NDRs are thought to be flanked on at least one side, or on both sides, by histone H2A.Z variant-containing nucleosomes in eukaryotic cells [72–75]. In addition to these observations, the distribution of nucleosomes carrying both H2A.Z and H3.3 histone variants in vertebrates was described to be unstable and distinct from the distributions of nucleosomes carrying either H3.3 or H2A.Z alone [60]. The NDRs of active promoters are likely to be occupied to a certain extent by labile H3.3/H2A.Z nucleosomes. These unstable nucleosomes could then serve to prevent the NDR from being covered by adjacent canonical nucleosomes or other factors. At the same time, because of their relative instability the H3.3/H2A.Z nucleosomes could more easily be displaced by transcription factors or different components of the PIC.
These results suggest a model for the chromatin structure at vertebrate promoters where the TSS regions dynamically cycle between being occupied by unstable nucleosomes or by GTFs, and being occupied by canonical nucleosomes without variant histones when the TSS is silent [60]. In addition to DNA sequences and histone variants, transcription factors also play an important role in regulating nucleosome occupancy at promoters. Transcription factor binding sites are often found within NDRs, and it is widely believed that transcription factors participate in nucleosome positioning. Transcription factors also recruit chromatin modifiers and remodelling complexes that are involved in modifying and reshuffling nucleosomes around promoters. Although DNA sequences can be potential drivers of nucleosome organisation at certain promoters, chromatin modifying and remodelling machineries often override sequence signals and can drive nucleosomes to occupy intrinsically unfavourable DNA elements or evict nucleosomes from intrinsically favourable sites. In agreement with this, it has been demonstrated in S. cerevisiae that at the majority of promoters, normal positioning of NDR-flanking nucleosomes requires the essential multisubunit chromatin remodelling complex RSC: in RSC-depleted cells, NDRs shrink such that the average positions of flanking nucleosomes move toward predicted positions [76]. Moreover, it has been proposed that RSC can partially unwind nucleosome complexes around the NDRs and that this can facilitate activator binding to NDR sites [77]. Thus, chromatin remodelling can displace or destabilise nucleosomes from the vicinity of TSSs, leaving nucleosome-free DNA for PIC assembly. As TFIID/TBP cannot bind nucleosomal DNA [78], the nucleosome-free region may be the signal for TFIID binding, PIC assembly and consequent Pol II transcription. Following its binding to the pre-assembled PIC scaffold, Pol II starts to transcribe and then pauses +30 to +60 bp downstream from the TSS at many if not all transcribed gene promoters ([79,80] and refs therein). In Drosophila, systematic assessment of genome-wide Pol II pausing positions relative to the TSS revealed two characteristic groups of pausing patterns: focused proximal (Prox) and dispersed distal (Dist) pausing. On genes having strong Prox Pol IIs, a complex interaction model has been suggested that involves the physical interaction of the PIC scaffold, which can extend up to 30 bp from the TSS (28), with the paused Pol II complex [80].
These GW observations basically extend the space that the PIC (including the engaged and paused Pol II) occupies at promoters and at the very 5′ end of the transcription units. Furthermore, it has been proposed, that in Drosophila most genes position the +1 nucleosome such that they interact with a transcriptionally


engaged, but paused polymerase [72]. Because, in Drosophila and mammals, Pol II pausing downstream of core promoters of transcribed genes coincides with lower nucleosome occupancy in the NDRs, it has been suggested that nucleosomes are positioned by paused polymerases binding downstream of the TSSs ([62] and refs therein). Thus, at present it is not known whether the +1 nucleosome is causative or just participatory in the pausing and consequent transcription regulation. Nevertheless, it seems that PIC components such as TFIID and paused Pol II compete with nucleosomes for core promoter occupancy in the NDR and this is an important regulatory step in transcription. In mammalian cells NDRs at most promoters were also reported to be regulated by divergent transcription [61]. Surprisingly, on transcribed Pol II gene promoters, a second peak of transcriptionally engaged polymerase was found to pause in the anti-sense direction upstream of promoters. Interestingly, this pair of paused polymerases would keep this region nucleosome free or depleted. Thus, the NDR associated with most active promoters could also, at least in part, be defined by divergent transcription [61]. It is therefore conceivable that paused polymerases in both orientations are necessary to keep the NDRs open and free of nucleosomes during the expression period of a given gene.

4. CpG island, a platform for promoter associated epigenetic regulation

DNA methylation is a reversible epigenetic mark with wide ranging functions in transcriptional silencing, genomic imprinting and cellular differentiation [81,82]. Methylation of cytosine at CpG dinucleotides is established by the methyltransferases DNMT3A and DNMT3B and is subsequently maintained by the methyltransferase DNMT1. In vertebrates DNA methylation of CpGs is associated with transcriptional repression across the genome through methyl cytosine binding proteins (MBD) recruiting histone deacetylases (HDAC) or through blocking transcription factor binding.
Core promoter regions often carry enrichment for CpG islands, or CGIs ([83], reviewed in [84]). CGIs often mark transcription initiation sites and have been suggested to be present in the majority of mammalian promoters including both tissue specific and constitutively active genes [85]. Many CGIs have also been identified either within or between characterised transcription units and have been termed “orphan” CGIs. Interestingly, GW analyses have confirmed that about 40% of orphan CGIs represent novel promoters [85,86]. Mammalian cells also use active DNA demethylation for gene regulation. Most CGIs when demethylated participate in promoter function by destabilising nucleosomes, thereby attracting the transcription machinery and thus creating a transcriptionally permissive chromatin state ([87] and refs therein). Methylation-free CGIs appear in orthologous genes among vertebrates suggesting a conserved epigenetic feature [88], which mostly remains hypomethylated even in terminally differentiated tissues irrespective of gene activity [89]. A recent study suggests that protection from DNA methylation is a built-in characteristic of CGI-containing promoters and is associated with R loop structures [90]. CGIs are characterised by coverage with nucleosomes, which carry promoter associated posttranslational histone modifications (PTMs, e.g. H3K4me3, H3K9Ac and H3K27me3), which raises the question of the role of CGIs in regulation by epigenetic mechanisms involved in defining promoter regions and promoter activity. Strikingly, CGIs are almost always a feature at promoters active during early embryonic development [91]. The family of zinc finger CXXC proteins, which include MLL, Cfp1 and KDM proteins (reviewed in [92]) have been shown to recognise CGIs and recruit chromatin modifying enzymes to regulate methylation of histone tails associated with both active and inactive chromatin. 
CGI sequences exogenously inserted into the genome can induce the recruitment of chromatin modifying machineries that deposit H3K4me3 [93] and as a result CGIs at promoters tend to be marked by H3K4me3 independently of gene activity [94,95]. Thus,

CGIs themselves can nucleate the deposition of H3K4me3, but this alone does not suffice to induce the recruitment of transcriptional machinery to a promoter [93]. In other words, CGIs do not function as promoters themselves. How they contribute to the generation of transcriptionally permissive regions at core promoters (but not at sites of ectopic CGIs with H3K4me3) is unclear. Nevertheless, CGI regions do coincide with short NDRs at the transcription start region [96]. Thus, nucleosome depletion has been proposed to be due to nucleosome instability intrinsic to CGIs, however, the argument against this model is that NDRs tend to be shorter than the CGIs. Alternatively, binding of pioneering transcription factors and general transcription factors, which often contain GC rich binding motifs, such as Sp1, co-occurs in CGIs [84,97,98], which may better explain the nucleosome depletion in core promoters. Consistent with a model, which suggests that a transcriptionally permissive environment is generated at CGIs, Pol II has been found to be present in many CGI promoter regions even when transcripts were not detected from the corresponding genes [84,94]. However, recent papers studying DNA methylation patterns in diverse vertebrates suggest that CGIs as calculated in mammals are not accurate predictors of DNA methylation patterns at promoters in anamniotes (such as fish and frogs) in which CpG and GC contents are substantially different [88]. Yet, a high degree of evolutionary conservation of orthologous unmethylated promoters was observed between fish and mammals [88,99]. This raises the question of what exactly is the equivalent signal in fish, originally associated with CGIs in promoters of mammals, and whether there is a unifying DNA sequence property responsible for the highly conserved pattern of promoter hypomethylation in vertebrate species with very different GC contents and CpG densities. 
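The phrase "CGIs as calculated in mammals" refers to sequence-based criteria. As an illustration only, the sketch below computes the two quantities that such criteria typically combine, the GC fraction and the observed/expected CpG ratio, for a single sequence window. The thresholds used (a window of at least 200 bp, GC fraction above 0.5 and observed/expected ratio above 0.6) are the commonly used Gardiner-Garden and Frommer values and are an assumption here; they are not stated in this review. The function names `cpg_stats` and `looks_like_cgi` are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence.

    Observed/expected is computed as (#CpG * N) / (#C * #G), where N is the
    sequence length. An empty sequence, or one lacking C or G, yields 0.0.
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    expected = c * g / n
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def looks_like_cgi(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply assumed mammalian-style CGI thresholds to one sequence window."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    window = "CG" * 100  # artificial, maximally CpG-dense 200 bp window
    print(cpg_stats(window), looks_like_cgi(window))
```

Because CpG and GC contents differ substantially in anamniote genomes [88], fixed thresholds of this kind would be expected to behave differently in fish and frogs, which is why they are exposed as parameters rather than hard-coded.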
In addition, it seems that CGIs are not the only sequence determinants known to contribute to hypomethylation of promoters. The comprehensive analysis of Lienert and colleagues identified promoter specific short sequence modules uncoupled from CpG density that were able to define hypomethylation in ES cells on the nanog promoter in a transcription-independent manner [100]. These DNA sequences contained binding motifs for Rfx family factors and together with CGIs were suggested to generate methylation depleted regions in promoters. Besides generating transcriptionally permissive states, CpG methylation is also associated with transcriptional silencing through chromatin condensation into a transcriptionally repressive conformation [101–103]. Silencing of CGI promoters is achieved through recruitment of the Polycomb protein complex PRC2, which is followed by H3K27 trimethylation and DNA methylation of otherwise methylation-free, broad CGIs, and subsequent silencing [104]. A unique subclass of genes encoding transcription factors and developmental regulators appears to share exceptionally long CGIs in ES cells and embryos [105]. These long hypermethylated CGIs are also covered by H3K27me3 indicating that Polycomb repression is clustered across the whole gene locus in this subclass of genes [88,95,104,106]. Intriguingly, the presence of H3K27me3 on promoters of lineage specifying transcription factors and other developmental regulators has been shown to coincide with H3K4me3 in pluripotent ES cells and early embryos [107–109] and the coexistence of these contrasting epigenetic marks has been referred to as bivalency (reviewed in [110]). Cooccurrence of both repressive and activity associated histone marks on promoters was not confirmed in Xenopus, where these marks were seen to be deposited in separate cellular lineages from the commencement of their appearance [106] and bivalency has not been detected in Drosophila either. 
Nevertheless, bivalency in mammalian ES cells has been well established and the distribution of H3K4me3 and H3K27me3 on nucleosomes has recently been molecularly dissected [111]. Voigt et al. (2012) provide direct evidence for asymmetrically modified nucleosomes with opposing histone tails carrying the opposing histone marks and suggest that these asymmetric nucleosomes are responsible for pluripotency-associated functions [111]. Bivalency may contribute to correct spatial and temporal regulation of lineage specifying genes through poising the promoter for later lineage specific activity by H3K4me3 and

Please cite this article as: F. Müller, L. Tora, Chromatin and DNA sequences in defining promoters for transcription initiation, Biochim. Biophys. Acta (2013), http://dx.doi.org/10.1016/j.bbagrm.2013.11.003

F. Müller, L. Tora / Biochimica et Biophysica Acta xxx (2013) xxx–xxx

parallel repression through H3K27me3 to avoid premature or ectopic activation (reviewed in [110]). How the opposing histone modifications target these special classes of genes remains mostly unknown. In ES cells H3K4me3 is likely deposited in a CGI associated manner, and this process includes interaction between Wdr5 of the MLL complex and pluripotency factors such as Oct4 ([112,113] and see in this issue Jarabek et al.). The mechanism and sequence specificity of targeting of Polycomb repression complexes to promoters remain one of the key problems of epigenetics. Sporadic data so far suggest involvement of cis-acting Polycomb response elements in Drosophila and in mammals [114–116]. In addition, trans-acting long non-coding RNAs, such as HOTAIR ([117] and see in this issue Nakagawa and Kageyama), have been shown to bring PRC2 complexes to their targets (reviewed in [118,119]).

5. Hierarchies and interplay of epigenetic mechanisms and transcription initiation: a developmental perspective

The co-localisation of DNA sequence information in promoters for transcription factor binding and for deposition of promoter-specific histone modification marks associated with lineage specifying regulatory interactions raises the question of hierarchy between transcription regulatory mechanisms at core promoters. Recent developments in elucidating the epigenetic events acting on promoters during early development in vertebrate embryos may provide clues to these hierarchies. However, the technological limitations of detecting transcription factor binding and histone modifications in very early stage embryos and in the same cells have so far prevented a conclusive description of temporal events and thus hindered the analysis of causative relationships.
Nevertheless, high-resolution GW analyses of transcription initiation and associated chromatin states have been carried out in a variety of vertebrate organisms with focus on early development when genome activation can be associated with epigenetic regulatory mechanisms. Here we focus on the development of anamniotes, which provide a unique advantage among vertebrates for the analysis of the temporal hierarchies of transcriptional and epigenetic events on promoters. These vertebrate model organisms have a long early period of transcriptionally inactive cell divisions during which epigenetic events can be monitored on promoters and their interaction with transcription initiation addressed. Early development in fish and frogs is characterised by a series of fast, synchronous cell divisions, which lack G phases and transcriptional activity until the midblastula transition [120–122]. Promoters are mostly hypomethylated from the start of zebrafish embryonic development despite major remodelling of DNA methylation patterns inherited from the oocyte and sperm [99,123]. The generally hypomethylated promoters become marked by promoter-associated histone modifications (H3K4me3, H3K27me3 and H3K9ac) when the zygotic genome is activated [106,109,124] suggesting that histone modification patterns contribute to the preparation of promoters for gene activation. Whether this pre-marking involves general transcription factor binding has not yet been demonstrated. A sharp upregulation of H3K4me3 at the start of transcription has been described in fish and frog [106,109] suggesting transcription-coupled enrichment for H3K4me3 at promoters [109]. However, low but distinct levels of histone modification (H3K4me3, H3K27me3, H3K9ac) were observed on promoters of homeostasis and developmental genes well before zygotic genome activation [124], suggesting pre-marking of promoters prior to developmentally controlled activation during embryogenesis.
This observation suggests that histone modification patterns could be predictive or even instructive for transcription regulation during early development; however, the temporal resolution of transcription initiation and histone modification patterns on promoters does not allow the establishment of causative relationships. Histone modifications may merely indicate the establishment of the permissive state of promoters before genome activation. While the causality between chromatin modifications and dynamic transcription is debated [125,126], it is undisputed that histone
modifications – or at least the presence of modifiable histone residues – are required for correct gene regulation. Loss-of-function evidence implies requirements for histone PTMs in Polycomb repression of developmental genes [127] and maintenance of epigenetic inheritance in the frog embryo [128]. Modifiable residues in histones have been shown to be critical for heterochromatin formation in the mammalian embryo [129]. Moreover, loss of function of proteins in H3K4 trimethylating complexes and PRC2 histone modifying complexes leads to severe embryonic lethal and ES cell phenotypes (reviewed in [110]). Interestingly, the presence of PTMs (e.g. H3K4me3, H3K27me3) on promoters in human, mouse and zebrafish sperm [130–133] has also been described. In zebrafish these modifications appear to occur on an overlapping set of genes that are pre-marked in early embryos [124]. Similarly, in early mouse embryos, expression of genes shows correlation with histone modification patterns observed in the sperm [131]. These remarkable parallels suggest that epigenetic information laid down during germ-cell formation in the form of histone modifications, could be inherited from the sperm to the embryo. With current technologies this suggestion is hard to prove and the possibility of a transgenerational epigenetic inheritance through histone marks remains an exciting hypothesis at this point. Histone marks can be inherited during cell division (reviewed in [134,135]), which may occur through the copying of histone PTM patterns by TrxG and Polycomb (PcG) proteins during replication [136,137]. Evidence for trans-generational epigenetic inheritance is accumulating (reviewed in [138]), but whether histone PTMs are involved in this process remains contested. Nevertheless, a H3K4me3 methylation complex composed of ASH-2, WDR-5 and SET-2 (and probably other subunits) has recently been implicated in transgenerational epigenetic transmission in C. elegans [139]. 
The presence of similar histone modifications on promoters in sperm and the early embryo may, however, merely reflect the intrinsic capacity of specific DNA sequences (such as CGIs) to drive similar modification patterns. Indeed, establishment of H3K4me3 occurs de novo in the pre-transcriptional embryo upon developmental beta-catenin signalling, which promotes Pol II binding to beta-catenin target promoters before zygotic genome activation and transcription [140]. This observation is in line with the model in which the deposition of promoter-associated marks, such as H3K4me3, occurs before transcription due to the recruitment of paused Pol II (reviewed in [141]). The recently uncovered molecular interactions between H3K4me3 and TAF3 in mammalian cells suggest an alternative, by which H3K4me3 may recruit the preinitiation complex. Thus, according to this model, the positive mark precedes Pol II [142,143] (and see below). Taken together, promoter-associated histone PTMs are either inherited or deposited in the early embryo prior to gene activation and potentially represent regulatory signals for developmental gene regulation. Determining the contribution of modified histones to defining the site of transcription initiation during zygotic genome activation would require both i) a detailed, high-resolution analysis of histone PTMs in the gametes and the earliest stages of embryos, and ii) analysis of the transcriptome before and after genome activation. This has not yet been possible due to the technological constraint of a limited number of cells, particularly in mammalian embryos. Future work will be required to explore the earliest events on promoters of the developing embryo leading to transcriptional activation of the zygotic genome.

6. Protein interactions between PIC and promoter associated chromatin

“Reading”, or binding to, histone PTMs by PIC components around core promoters is thought to participate in transcriptional regulatory mechanisms [144]. The tandem bromodomains of human TATA-binding protein-associated factor-1 (TAF1) bind diacetylated histone H4 tails [145], an epigenetic mark that is often present at transcriptionally active promoters. Thus, the bromodomains of TAF1 may serve to target TFIID to

promoters near or within regions that contain nucleosomes with H4 acetylation. Additional HAT cofactors could then acetylate appropriate lysine residues on histone H4 near core promoters thereby increasing the affinity of TFIID for the promoter or stabilising its binding. Interestingly, many transcriptional co-activator or chromatin modifying complexes contain bromodomains that may bind to other acetylated histone residues and thereby participate in further opening the NDRs or keeping them open for more efficient transcription. The mouse TAF1 homologue Brdt, a testis-specific member of the double bromodomain-containing protein family, cooperatively binds via its two bromodomains to hyperacetylated histone H4 tails and is implicated in the marked chromatin remodelling that follows histone hyperacetylation during spermiogenesis [146]. Thus, Brdt is an essential regulator of male germ cell differentiation that uses its bromodomains in a developmentally controlled manner, first to drive a specific spermatogenic gene expression programme by binding to hyperacetylated H4 tails, and later to control the tight packaging of the male genome [147]. Furthermore, a direct interaction between the PHD domain of human TAF3, another TFIID subunit, and H3K4me3 has been demonstrated to be important for Pol II transcription on target promoters [142]. In a further study, H3K4me3–TAF3/TFIID interactions were shown to nucleate PIC assembly and potentiate transcriptional activation [143]. Moreover, in vitro the H3K4me3–TAF3/TFIID interaction became crucial for function when the tested promoter was TATA-less [143]. Thus, it is conceivable that in vivo H3K4me3–TAF3/TFIID interactions are important in stimulating PIC formation and transcription initiation mainly on TATA-less core promoters.
As TFIID is supposed to bind directly to nucleosome-free promoter DNA sequences, the TAF1–H4ac and the TAF3–H3K4me3 interactions suggest that TFIID would bind to nucleosomes on the borders of nucleosome-free regions established around promoters and that the TAF-chromatin interactions could participate in further stabilising and/or localising TFIID at or close to the TSSs [144]. This model is attractive because promoters are generally located near nucleosome-free regions, and it would include a direct targeting of TFIID to transcriptionally relevant positive chromatin marks. Such a mechanism could play a role in enhanced initiation through increased recruitment of TFIID to genes containing a combination of multiply acetylated and methylated nucleosomes at the borders of NDRs in the vicinity of TSSs. Nevertheless, future experiments will be needed to better understand how PIC subunits achieve combinatorial readout of multiple histone modifications possibly on both sides of NDRs and how these histone PTM binding domains contribute to regulating transcription initiation by Pol II on different types of promoters bearing distinct sets of epigenetic modifications. At this stage two scenarios are conceivable: i) positive histone PTMs are first deposited around the NDRs by transcriptional co-activators and the TFIID-containing PIC including Pol II would bind to this pre-marked region, or ii) TFIID-containing PIC including Pol II will first bind to the nucleosome depleted promoters and the positive histone PTMs are deposited as a consequence of this binding to mark promoters that will become active (Fig. 2). In both cases a synergistic relationship between TFIID binding and promoter-associated positive chromatin marks would be needed for the transcription machinery to achieve efficient transcription initiation in vivo (Fig. 2).

7. Perspectives and future directions

Despite the major breakthroughs of the recent decade in defining transcription start sites at single nucleotide resolution and genome wide observations of promoter characteristics, the old question of what defines a core promoter remains only partially answered. While a range of sequence determinants of general transcription factor binding have been identified and GTF complexes binding to promoters characterised, we remain unable to predict TSS position and distribution

from the analysis of the DNA sequence, indicating that the underlying codes are only partially understood. Importantly, the relationship between transcription factor binding sites and nucleosome positioning signals in defining the chromatin architecture at promoter regions is even less well understood. Taken together, the definition of the core promoter remains an unresolved problem, but it is expected to consist of sequence determinants for 3 layers of information: i) nucleosome positioning, ii) promoter associated histone variants and modifications and iii) binding sites for core promoter binding transcription factors. The relative importance of these determinants appears to vary between promoter classes and promoter function in distinct spatial and temporal ontogenic contexts. To better understand the promoter code, the new genome wide assays need to be combined with classic reductionist approaches and the functional relevance of predicted sequence determinants meticulously interrogated on selected examples by conventional as well as novel molecular biology tools. Below, we highlight areas of investigation, which are likely to enhance our understanding of promoter biology in the future. High throughput sequencing technologies have made a huge impact on genomics and functional genomics and are expected to dominate in many areas of biomedical research for years to come. A previously unanticipated complexity of gene regulation has been uncovered by multinational genomics projects, such as ENCODE and FANTOM. However, these pioneering efforts have only reached as far as describing the components of regulatory networks, out of which many networks remain at the early discovery phase and are far from being comprehensive.
The predominant majority of the landmark papers, such as those published by the ENCODE consortia, achieve the daunting task of cataloguing units of DNA regulatory elements, RNAs and proteins, which mostly represent snapshots in isolated cells, cell cultures or homogenised embryos ([148] and refs therein). These GW, but rather descriptive, datasets have already provoked a truly dramatic shift in the narrative about basic genetic units and raise provocative questions about definitions such as, what is a gene? What is a functional genomic element? And what is the difference between an enhancer and a promoter? The countless new questions arising with these omics discoveries urgently demand the development of efficient and more functional assays for digging deeper into mechanisms. It is expected that transcription regulation analysis on the large scale will have to utilise the conventional, more functional and still indispensable assays of genetics and/or molecular biology. Forward and reverse genetic tools and other perturbation methods will have to be applied together with in vitro biochemical tests and omics in the discovery of causative relationships between various epigenetic mechanisms involved in transcription regulation underlying cellular and organismal phenotypes. There is an increasing pressure on computational biologists to analyse the exploding volume of data in a multi-dimensional manner and to bring about visualisation tools for a fast expanding but not necessarily computer-savvy user base [149–151]. Another key development required to make sense of the complexity emerging from omics data is dissecting the data components generated from many millions of heterogeneous cells. GW protocols based on low cell numbers (reviewed in [152]), such as nanoCAGE [153] and micro ChIP [154] methods, and single cell based analyses (e.g. [155]) are required i) to further investigate bivalency of histone modifications; ii) to explain parallel observation of multiple and divergent transcription at regulatory sites such as promoters and enhancers; iii) to make sense of complex patterns of histone modification marks at single sites of transcription activity and associated with a variety of non-coding RNAs; and iv) to address the parameters of temporal dynamics during the cell cycle or in development. For centuries, direct observation of events taking place in the nucleus has relied on light microscopy and its limited optical resolution. However, several new super-resolution imaging technologies have recently been developed that bypass this limit. These new technologies are either based on tailored illumination, nonlinear fluorophore responses, or the precise localisation of single molecules [156]. Overall,

[Figure 2: schematic with panels a to c showing the NDR, the TSS, the −1 and +1 nucleosomes, TFIID, TAFs and Pol II.]

Fig. 2. TFIID and PIC subunits, including Pol II, achieve a combinatorial readout of positive histone PTMs and nucleosome-depleted regions (NDRs) at, or close to, transcription start sites (TSSs). a) NDRs are generated. For further details see Fig. 1 and text. In regulating transcription initiation by Pol II on different types of promoters bearing distinct sets of epigenetic modifications two scenarios are conceivable: b) TFIID-containing PIC including Pol II first binds to the nucleosome-depleted promoters and the positive histone PTMs are deposited as a consequence of this binding to mark promoters that will become active. c) Positive histone PTMs are first deposited around the NDRs by transcriptional co-activator complexes and the TFIID-containing PIC including Pol II binds to this pre-marked region as a second step. In both cases (see panels b and c) a synergistic relation between TFIID binding and promoter-associated positive chromatin marks would be needed for efficient transcription of full-length transcripts.

these new approaches have created unprecedented possibilities to investigate the structure and function of dynamic processes in eukaryotic cells, especially in the nucleus [157]. The new super-resolution imaging methods will allow the integration of the mostly ChIP-seq and larger cell population based GW mapping information with high resolution spatial information gained on the dynamic behaviour of chromatin modifications, movements of transcription factors and their transient interactions with chromatin. These novel single-cell imaging methods with near-molecular resolution are expected to revolutionise the study of chromatin biology and transcriptional regulation in the near future.

Acknowledgement

We apologise to colleagues whose work could only be cited indirectly within reviews because of space limitations. We are grateful to M. Featherstone for critically reading the manuscript and for suggestions. The work of the authors was supported by grants from ANR (ChromACT) to LT and ZF-Health Framework 7 Integrating Project of the European Commission to FM.

References

[1] R.G. Roeder, The complexities of eukaryotic transcription initiation: regulation of preinitiation complex assembly, Trends Biochem. Sci. 16 (1991) 402–408. [2] R.G. Roeder, The role of general initiation factors in transcription by RNA polymerase II, Trends Biochem. Sci. 21 (1996) 327–335. [3] T. Juven-Gershon, J.Y. Hsu, J.W. Theisen, J.T. Kadonaga, The RNA polymerase II core promoter: the gateway to transcription, Curr. Opin. Cell Biol. 20 (2008) 253–259. [4] G. Orphanides, T. Lagrange, D. Reinberg, The general transcription factors of RNA polymerase II, Genes Dev. 10 (1996) 2657–2683. [5] L. Tora, A unified nomenclature for TATA box binding protein (TBP)-associated factors (TAFs) involved in RNA polymerase II transcription, Genes Dev. 16 (2002) 673–675.

[6] M.C. Thomas, C.M. Chiang, The general transcription machinery and general cofactors, Crit. Rev. Biochem. Mol. Biol. 41 (2006) 105–178. [7] J.C. Dantonel, S. Quintin, L. Lakatos, M. Labouesse, L. Tora, TBP-like factor is required for embryonic RNA polymerase II transcription in C. elegans, Mol. Cell 6 (2000) 715–722. [8] F. Muller, A. Zaucker, L. Tora, Developmental regulation of transcription initiation: more than just changing the actors, Curr. Opin. Genet. Dev. 20 (2010) 533–540. [9] J.A. Goodrich, R. Tjian, Unexpected roles for core promoter recognition factors in cell-type-specific transcription and gene regulation, Nat. Rev. Genet. 11 (2010) 549–558. [10] F. Muller, M.A. Demeny, L. Tora, New problems in RNA polymerase II transcription initiation: matching the diversity of core promoters with a variety of promoter recognition factors, J. Biol. Chem. 282 (2007) 14685–14689. [11] S.T. Smale, J.T. Kadonaga, The RNA polymerase II core promoter, Annu. Rev. Biochem. 72 (2003) 449–479. [12] T. Juven-Gershon, J.T. Kadonaga, Regulation of gene expression via the core promoter and the basal transcriptional machinery, Dev. Biol. 339 (2010) 225–229. [13] U. Ohler, D.A. Wassarman, Promoting developmental transcription, Development 137 (2010) 15–26. [14] B. Lenhard, A. Sandelin, P. Carninci, Metazoan promoters: emerging characteristics and insights into transcriptional regulation, Nat. Rev. Genet. 13 (2010) 233–245. [15] P. Batut, A. Dobin, C. Plessy, P. Carninci, T.R. Gingeras, High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression, Genome Res. 23 (2013) 169–180. [16] R.A. Hoskins, J.M. Landolin, J.B. Brown, J.E. Sandler, H. Takahashi, T. Lassmann, C. Yu, B.W. Booth, D. Zhang, K.H. Wan, L. Yang, N. Boley, J. Andrews, T.C. Kaufman, B.R. Graveley, P.J. Bickel, P. Carninci, J.W. Carlson, S.E. Celniker, Genome-wide analysis of promoter architecture in Drosophila melanogaster, Genome Res. 
21 (2011) 182–192. [17] W.S. Kruesi, L.J. Core, C.T. Waters, J.T. Lis, B.J. Meyer, Condensin controls recruitment of RNA polymerase II to achieve nematode X-chromosome dosage compensation, eLife 2 (2013) e00808. [18] T.L. Saito, S. Hashimoto, S.G. Gu, J.J. Morton, M. Stadler, T. Blumenthal, A. Fire, S. Morishita, The transcription start site landscape of C. elegans, Genome Res. 23 (2013) 1348–1361. [19] W. Gu, H.C. Lee, D. Chaves, E.M. Youngman, G.J. Pazour, D. Conte Jr., C.C. Mello, CapSeq and CIP-TAP identify Pol II start sites and reveal capped small RNAs as C. elegans piRNA precursors, Cell 151 (2012) 1488–1500. [20] R.A. Chen, T.A. Down, P. Stempor, Q.B. Chen, T.A. Egelhofer, L.W. Hillier, T.E. Jeffers, J. Ahringer, The landscape of RNA polymerase II transcription initiation in C. elegans reveals promoter and enhancer architectures, Genome Res. 23 (2013) 1339–1347.

[21] P. Carninci, A. Sandelin, B. Lenhard, S. Katayama, K. Shimokawa, J. Ponjavic, C.A. Semple, M.S. Taylor, P.G. Engstrom, M.C. Frith, A.R. Forrest, W.B. Alkema, S.L. Tan, C. Plessy, R. Kodzius, T. Ravasi, T. Kasukawa, S. Fukuda, M. Kanamori-Katayama, Y. Kitazume, H. Kawaji, C. Kai, M. Nakamura, H. Konno, K. Nakano, S. Mottagui-Tabar, P. Arner, A. Chesi, S. Gustincich, F. Persichetti, H. Suzuki, S.M. Grimmond, C.A. Wells, V. Orlando, C. Wahlestedt, E.T. Liu, M. Harbers, J. Kawai, V.B. Bajic, D.A. Hume, Y. Hayashizaki, Genome-wide analysis of mammalian promoter architecture and evolution, Nat. Genet. 38 (2006) 626–635. [22] A. Sandelin, P. Carninci, B. Lenhard, J. Ponjavic, Y. Hayashizaki, D.A. Hume, Mammalian RNA polymerase II core promoters: insights from genome-wide studies, Nat. Rev. Genet. 8 (2007) 424–436. [23] B.J. Venters, B.F. Pugh, Genomic organization of human transcription initiation complexes, Nature 502 (2013) 53–58. [24] T. Lagrange, A.N. Kapanidis, H. Tang, D. Reinberg, R.H. Ebright, New core promoter element in RNA polymerase II-dependent transcription: sequence-specific DNA binding by transcription factor IIB, Genes Dev. 12 (1998) 34–44. [25] G.A. Patikoglou, J.L. Kim, L. Sun, S.H. Yang, T. Kodadek, S.K. Burley, TATA element recognition by the TATA box-binding protein has been conserved throughout evolution, Genes Dev. 13 (1999) 3217–3230. [26] W. Deng, S.G. Roberts, A core promoter element downstream of the TATA box that is recognized by TFIIB, Genes Dev. 19 (2005) 2418–2423. [27] S.T. Smale, D. Baltimore, The “initiator” as a transcription control element, Cell 57 (1989) 103–113. [28] W. Deng, B. Malecova, T. Oelgeschlager, S.G. Roberts, TFIIB recognition elements control the TFIIA–NC2 axis in transcriptional regulation, Mol. Cell. Biol. 29 (2009) 1389–1400. [29] U. 
Ohler, Identification of core promoter modules in Drosophila and their application in accurate transcription start site prediction, Nucleic Acids Res. 34 (2006) 5943–5950. [30] R.P. Lifton, M.L. Goldberg, R.W. Karp, D.S. Hogness, The organization of the histone genes in Drosophila melanogaster: functional and evolutionary implications, Cold Spring Harb. Symp. Quant. Biol. 42 (Pt 2) (1978) 1047–1051. [31] J. Corden, B. Wasylyk, A. Buchwalder, P. Sassone-Corsi, C. Kedinger, P. Chambon, Promoter sequences of eukaryotic protein-coding genes, Science 209 (1980) 1406–1414. [32] R. Breathnach, P. Chambon, Organization and expression of eucaryotic split genes coding for proteins, Annu. Rev. Biochem. 50 (1981) 349–383. [33] T.W. Burke, J.T. Kadonaga, Drosophila TFIID binds to a conserved downstream basal promoter element that is present in many TATA-box-deficient promoters, Gene Dev 10 (1996) 711–724. [34] C.Y. Lim, B. Santoso, T. Boulay, E. Dong, U. Ohler, J.T. Kadonaga, The MTE, a new core promoter element for transcription by RNA polymerase II, Genes Dev. 18 (2004) 1606–1617. [35] U. Ohler, G.C. Liao, H. Niemann, G.M. Rubin, Computational analysis of core promoters in the Drosophila genome, Genome Biol. 3 (2002)(RESEARCH0087-0087). [36] A. Hochheimer, S. Zhou, S. Zheng, M.C. Holmes, R. Tjian, TRF2 associates with DREF and directs promoter-selective gene expression in Drosophila, Nature 420 (2002) 439–445. [37] D.H. Lee, N. Gershenzon, M. Gupta, I.P. Ioshikhes, D. Reinberg, B.A. Lewis, Functional characterization of core promoter elements: the downstream core element is recognized by TAF1, Mol. Cell. Biol. 25 (2005) 9674–9686. [38] Y. Tokusumi, Y. Ma, X. Song, R.H. Jacobson, S. Takada, The new core promoter element XCPE1 (X Core Promoter Element 1) directs activator-, mediator-, and TATA-binding protein-dependent but TFIID-independent RNA polymerase II transcription from TATA-less promoters, Mol. Cell. Biol. 27 (2007) 1844–1858. [39] R.J. Katzenberger, E.A. Rach, A.K. 
Anderson, U. Ohler, D.A. Wassarman, The Drosophila Translational Control Element (TCE) is required for high-level transcription of many genes that are specifically expressed in testes, PLoS One 7 (2012) e45009. [40] D.A. Hendrix, J.W. Hong, J. Zeitlinger, D.S. Rokhsar, M.S. Levine, Promoter elements associated with RNA Pol II stalling in the Drosophila embryo, Proc. Natl. Acad. Sci. U. S. A. 105 (2008) 7762–7767. [41] P. Bucher, Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences, J. Mol. Biol. 212 (1990) 563–578. [42] N. Hariharan, R.P. Perry, Functional dissection of a mouse ribosomal protein promoter: significance of the polypyrimidine initiator and an element in the TATA-box region, Proc. Natl. Acad. Sci. U. S. A. 87 (1990) 1526–1530. [43] C.M. Smith, J.A. Steitz, Classification of gas5 as a multi-small-nucleolar-RNA (snoRNA) host gene and a member of the 5′-terminal oligopyrimidine gene family reveals common features of snoRNA host genes, Mol. Cell. Biol. 18 (1998) 6897–6909. [44] R.P. Perry, The architecture of mammalian ribosomal protein promoters, BMC Evol. Biol. 5 (2005) 15. [45] S. Roepcke, D. Zhi, M. Vingron, P.F. Arndt, Identification of highly specific localized sequence motifs in human ribosomal protein gene promoters, Gene 365 (2006) 48–56. [46] T.J. Parry, J.W. Theisen, J.Y. Hsu, Y.L. Wang, D.L. Corcoran, M. Eustice, U. Ohler, J.T. Kadonaga, The TCT motif, a key component of an RNA polymerase II transcription system for the translational machinery, Genes Dev. 24 (2010) 2013–2018. [47] E.A. Rach, H.Y. Yuan, W.H. Majoros, P. Tomancak, U. Ohler, Motif composition, conservation and condition-specificity of single and alternative transcription start sites in the Drosophila genome, Genome Biol. 10 (2009) R73. [48] P.C. FitzGerald, D. Sturgill, A. Shyakhtenko, B. Oliver, C. Vinson, Comparative genomics of Drosophila and human core promoters, Genome Biol. 7 (2006) R53. [49] T.A. 
Down, C.M. Bergman, J. Su, T.J. Hubbard, Large-scale discovery of promoter motifs in Drosophila melanogaster, PLoS Comput. Biol. 3 (2007) e7.


[50] P.G. Engstrom, S.J. Ho Sui, O. Drivenes, T.S. Becker, B. Lenhard, Genomic regulatory blocks underlie extensive microsynteny conservation in insects, Genome Res. 17 (2007) 1898–1908. [51] S. Majumder, M.L. DePamphilis, TATA-dependent enhancer stimulation of promoter activity in mice is developmentally acquired, Mol. Cell. Biol. 14 (1994) 4258–4268. [52] L. Xiao, M. Kim, J. DeJong, Developmental and cell type-specific regulation of core promoter transcription factors in germ cells of frogs and mice, Gene Expr. Patterns 6 (2006) 409–419. [53] F. Muller, L. Tora, TBP2 is a general transcription factor specialized for female germ cells, J. Biol. 8 (2009) 97. [54] S.T. Smale, Transcription initiation from TATA-less promoters within eukaryotic protein-coding genes, Biochim. Biophys. Acta 1351 (1997) 73–88. [55] M. Seizl, H. Hartmann, F. Hoeg, F. Kurth, D.E. Martin, J. Soding, P. Cramer, A conserved GA element in TATA-less RNA polymerase II promoters, PLoS One 6 (2011) e27595. [56] K. Sha, L.A. Boyer, The Chromatin Signature of Pluripotent Cells, 2008. [57] N.S. Christophersen, K. Helin, Epigenetic control of embryonic stem cell fate, J. Exp. Med. 207 (2010) 2287–2295. [58] C. Jiang, B.F. Pugh, Nucleosome positioning and gene regulation: advances through genomics, Nat. Rev. Genet. 10 (2009) 161–172. [59] K. Struhl, E. Segal, Determinants of nucleosome positioning, Nat. Struct. Mol. Biol. 20 (2013) 267–273. [60] C. Jin, C. Zang, G. Wei, K. Cui, W. Peng, K. Zhao, G. Felsenfeld, H3.3/H2A.Z double variant-containing nucleosomes mark ‘nucleosome-free regions’ of active promoters and other regulatory regions, Nat. Genet. 41 (2009) 941–945. [61] A.C. Seila, L.J. Core, J.T. Lis, P.A. Sharp, Divergent transcription: a new feature of active promoters, Cell Cycle 8 (2009) 2557–2564. [62] D.A. Gilchrist, K. Adelman, Coupling polymerase pausing and chromatin landscapes for precise regulation of transcription, Biochim. Biophys. Acta 1819 (2012) 700–706. [63] N. Kaplan, I. Moore, Y. 
Fondufe-Mittendorf, A.J. Gossett, D. Tillo, Y. Field, T.R. Hughes, J.D. Lieb, J. Widom, E. Segal, Nucleosome sequence preferences influence in vivo nucleosome organization, Nat. Struct. Mol. Biol. 17 (2010) 918–920(author reply 920–912). [64] N. Kaplan, I.K. Moore, Y. Fondufe-Mittendorf, A.J. Gossett, D. Tillo, Y. Field, E.M. LeProust, T.R. Hughes, J.D. Lieb, J. Widom, E. Segal, The DNA-encoded nucleosome organization of a eukaryotic genome, Nature 458 (2009) 362–366. [65] Y. Zhang, Z. Moqtaderi, B.P. Rattner, G. Euskirchen, M. Snyder, J.T. Kadonaga, X.S. Liu, K. Struhl, Intrinsic histone–DNA interactions are not the major determinant of nucleosome positions in vivo, Nat. Struct. Mol. Biol. 16 (2009) 847–852. [66] R. Wu, H. Li, Positioned and G/C-capped poly(dA:dT) tracts associate with the centers of nucleosome-free regions in yeast promoters, Genome Res. 20 (2010) 473–484. [67] A. Valouev, J. Ichikawa, T. Tonthat, J. Stuart, S. Ranade, H. Peckham, K. Zeng, J.A. Malek, G. Costa, K. McKernan, A. Sidow, A. Fire, S.M. Johnson, A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning, Genome Res. 18 (2008) 1051–1063. [68] O.J. Rando, F. Winston, Chromatin and transcription in yeast, Genetics 190 (2012) 351–387. [69] Y. Xi, J. Yao, R. Chen, W. Li, X. He, Nucleosome fragility reveals novel functional states of chromatin and poises genes for activation, Genome Res. 21 (2011) 718–724. [70] V.B. Bajic, S.L. Tan, A. Christoffels, C. Schonbach, L. Lipovich, L. Yang, O. Hofmann, A. Kruger, W. Hide, C. Kai, J. Kawai, D.A. Hume, P. Carninci, Y. Hayashizaki, Mice and men: their promoter properties, PLoS Genet. 2 (2006) e54. [71] A. Valouev, S.M. Johnson, S.D. Boyd, C.L. Smith, A.Z. Fire, A. Sidow, Determinants of nucleosome organization in primary human cells, Nature 474 (2011) 516–520. [72] T.N. Mavrich, C. Jiang, I.P. Ioshikhes, X. Li, B.J. Venters, S.J. Zanton, L.P. Tomsho, J. Qi, R.L. Glaser, S.C. Schuster, D.S. 
Gilmour, I. Albert, B.F. Pugh, Nucleosome organization in the Drosophila genome, Nature 453 (2008) 358–362. [73] I. Albert, T.N. Mavrich, L.P. Tomsho, J. Qi, S.J. Zanton, S.C. Schuster, B.F. Pugh, Translational and rotational settings of H2A.Z nucleosomes across the Saccharomyces cerevisiae genome, Nature 446 (2007) 572–576. [74] R.M. Raisner, P.D. Hartley, M.D. Meneghini, M.Z. Bao, C.L. Liu, S.L. Schreiber, O.J. Rando, H.D. Madhani, Histone variant H2A.Z marks the 5′ ends of both active and inactive genes in euchromatin, Cell 123 (2005) 233–248. [75] H. Zhang, D.N. Roberts, B.R. Cairns, Genome-wide dynamics of Htz1, a histone H2A variant that poises repressed/basal promoters for activation through histone loss, Cell 123 (2005) 219–231. [76] P.D. Hartley, H.D. Madhani, Mechanisms that specify promoter nucleosome location and identity, Cell 137 (2009) 445–458. [77] M. Floer, X. Wang, V. Prabhu, G. Berrozpe, S. Narayan, D. Spagna, D. Alvarez, J. Kendall, A. Krasnitz, A. Stepansky, J. Hicks, G.O. Bryant, M. Ptashne, A RSC/nucleosome complex determines chromatin architecture and facilitates activator binding, Cell 141 (2010) 407–418. [78] J.L. Workman, R.E. Kingston, Alteration of nucleosome structure as a mechanism of transcriptional regulation, Annu. Rev. Biochem. 67 (1998) 545–579. [79] K. Anamika, A. Gyenis, L. Poidevin, O. Poch, L. Tora, RNA polymerase II pausing downstream of core histone genes is different from genes producing polyadenylated transcripts, PLoS ONE 7 (2012). [80] H. Kwak, N.J. Fuda, L.J. Core, J.T. Lis, Precise maps of RNA polymerase reveal how promoters direct initiation and pausing, Science 339 (2013) 950–953. [81] A. Bird, DNA methylation patterns and epigenetic memory, Genes Dev. 16 (2002) 6–21. [82] W. Reik, Stability and flexibility of epigenetic gene regulation in mammalian development, Nature 447 (2007) 425–432.

Please cite this article as: F. Müller, L. Tora, Chromatin and DNA sequences in defining promoters for transcription initiation, Biochim. Biophys. Acta (2013), http://dx.doi.org/10.1016/j.bbagrm.2013.11.003



[83] A.P. Bird, M.H. Taggart, R.D. Nicholls, D.R. Higgs, Non-methylated CpG-rich islands at the human alpha-globin locus: implications for evolution of the alpha-globin pseudogene, EMBO J. 6 (1987) 999–1004. [84] A.M. Deaton, A. Bird, CpG islands and the regulation of transcription, Genes Dev. 25 (2011) 1010–1022. [85] A.K. Maunakea, R.P. Nagarajan, M. Bilenky, T.J. Ballinger, C. D'Souza, S.D. Fouse, B.E. Johnson, C. Hong, C. Nielsen, Y. Zhao, G. Turecki, A. Delaney, R. Varhol, N. Thiessen, K. Shchors, V.M. Heine, D.H. Rowitch, X. Xing, C. Fiore, M. Schillebeeckx, S.J. Jones, D. Haussler, M.A. Marra, M. Hirst, T. Wang, J.F. Costello, Conserved role of intragenic DNA methylation in regulating alternative promoters, Nature 466 (2010) 253–257. [86] R.S. Illingworth, U. Gruenewald-Schneider, S. Webb, A.R. Kerr, K.D. James, D.J. Turner, C. Smith, D.J. Harrison, R. Andrews, A.P. Bird, Orphan CpG islands identify numerous conserved promoters in the mammalian genome, PLoS Genet. 6 (2010) e1001134. [87] H. Zhang, J.K. Zhu, Active DNA demethylation in plants and animals, Cold Spring Harb. Symp. Quant. Biol. 77 (2012) 161–173. [88] H.K. Long, D. Sims, A. Heger, N.P. Blackledge, C. Kutter, M.L. Wright, F. Grutzner, D.T. Odom, R. Patient, C.P. Ponting, R.J. Klose, Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates, eLife 2 (2013) e00348. [89] M. Weber, I. Hellmann, M.B. Stadler, L. Ramos, S. Paabo, M. Rebhan, D. Schubeler, Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome, Nat. Genet. 39 (2007) 457–466. [90] P.A. Ginno, P.L. Lott, H.C. Christensen, I. Korf, F. Chedin, R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters, Mol. Cell 45 (2012) 814–825. [91] L. Ponger, L. Duret, D. Mouchiroud, Determinants of CpG islands: expression in early embryo and isochore structure, Genome Res. 11 (2001) 1854–1860. [92] H.K. Long, N.P. 
Blackledge, R.J. Klose, ZF-CxxC domain-containing proteins, CpG islands and the chromatin connection, Biochem. Soc. Trans. 41 (2013) 727–740. [93] J.P. Thomson, P.J. Skene, J. Selfridge, T. Clouaire, J. Guy, S. Webb, A.R. Kerr, A. Deaton, R. Andrews, K.D. James, D.J. Turner, R. Illingworth, A. Bird, CpG islands influence chromatin structure via the CpG-binding protein Cfp1, Nature 464 (2010) 1082–1086. [94] M.G. Guenther, S.S. Levine, L.A. Boyer, R. Jaenisch, R.A. Young, A chromatin landmark and transcription initiation at most promoters in human cells, Cell 130 (2007) 77–88. [95] T.S. Mikkelsen, M. Ku, D.B. Jaffe, B. Issac, E. Lieberman, G. Giannoukos, P. Alvarez, W. Brockman, T.K. Kim, R.P. Koche, W. Lee, E. Mendenhall, A. O'Donovan, A. Presser, C. Russ, X. Xie, A. Meissner, M. Wernig, R. Jaenisch, C. Nusbaum, E.S. Lander, B.E. Bernstein, Genome-wide maps of chromatin state in pluripotent and lineage-committed cells, Nature 448 (2007) 553–560. [96] J.K. Choi, Contrasting chromatin organization of CpG islands and exons in the human genome, Genome Biol. 11 (2010) R70. [97] K.S. Zaret, J.S. Carroll, Pioneer transcription factors: establishing competence for gene expression, Genes Dev. 25 (2011) 2227–2241. [98] J.M. Landolin, D.S. Johnson, N.D. Trinklein, S.F. Aldred, C. Medina, H. Shulha, Z. Weng, R.M. Myers, Sequence features that drive human promoter function and tissue specificity, Genome Res. 20 (2010) 890–898. [99] M.E. Potok, D.A. Nix, T.J. Parnell, B.R. Cairns, Reprogramming the maternal zebrafish genome after fertilization to match the paternal methylation pattern, Cell 153 (2013) 759–772. [100] F. Lienert, C. Wirbelauer, I. Som, A. Dean, F. Mohn, D. Schubeler, Identification of genetic elements that autonomously determine DNA methylation states, Nat. Genet. 43 (2011) 1091–1097. [101] S.L. Berger, The complex language of chromatin regulation during transcription, Nature 447 (2007) 407–412. [102] A. Bird, The essentials of DNA methylation, Cell 70 (1992) 5–8. 
[103] J. Newell-Price, A.J. Clark, P. King, DNA methylation and silencing of gene expression, Trends Endocrinol. Metab. 11 (2000) 142–148. [104] D.A. Orlando, M.G. Guenther, G.M. Frampton, R.A. Young, CpG island structure and Trithorax/Polycomb chromatin domains in human cells, Genomics 100 (2012) 320–326. [105] R. Sawarkar, R. Paro, Interpretation of developmental signaling at chromatin: the Polycomb perspective, Dev. Cell 19 (2010) 651–661. [106] R.C. Akkers, S.J. van Heeringen, U.G. Jacobi, E.M. Janssen-Megens, K.J. Francoijs, H.G. Stunnenberg, G.J. Veenstra, A hierarchy of H3K4me3 and H3K27me3 acquisition in spatial gene regulation in Xenopus embryos, Dev. Cell 17 (2009) 425–434. [107] B.E. Bernstein, T.S. Mikkelsen, X. Xie, M. Kamal, D.J. Huebert, J. Cuff, B. Fry, A. Meissner, M. Wernig, K. Plath, R. Jaenisch, A. Wagschal, R. Feil, S.L. Schreiber, E.S. Lander, A bivalent chromatin structure marks key developmental genes in embryonic stem cells, Cell 125 (2006) 315–326. [108] V. Azuara, P. Perry, S. Sauer, M. Spivakov, H.F. Jorgensen, R.M. John, M. Gouti, M. Casanova, G. Warnes, M. Merkenschlager, A.G. Fisher, Chromatin signatures of pluripotent cell lines, Nat. Cell Biol. 8 (2006) 532–538. [109] N.L. Vastenhouw, Y. Zhang, I.G. Woods, F. Imam, A. Regev, X.S. Liu, J. Rinn, A.F. Schier, Chromatin signature of embryonic pluripotency is established during genome activation, Nature 464 (2010) 922–926. [110] N.L. Vastenhouw, A.F. Schier, Bivalent histone modifications in early embryogenesis, Curr. Opin. Cell Biol. 24 (2012) 374–386. [111] P. Voigt, G. LeRoy, W.J. Drury III, B.M. Zee, J. Son, D.B. Beck, N.L. Young, B.A. Garcia, D. Reinberg, Asymmetrically modified nucleosomes, Cell 151 (2012) 181–193. [112] M.M. Steward, J.S. Lee, A. O'Donovan, M. Wyatt, B.E. Bernstein, A. Shilatifard, Molecular regulation of H3K4 trimethylation by ASH2L, a shared subunit of MLL complexes, Nat. Struct. Mol. Biol. 13 (2006) 852–854.

[113] Y.S. Ang, S.Y. Tsai, D.F. Lee, J. Monk, J. Su, K. Ratnakumar, J. Ding, Y. Ge, H. Darr, B. Chang, J. Wang, M. Rendl, E. Bernstein, C. Schaniel, I.R. Lemischka, Wdr5 mediates self-renewal and reprogramming via the embryonic stem cell core transcriptional network, Cell 145 (2011) 183–197. [114] C.J. Woo, P.V. Kharchenko, L. Daheron, P.J. Park, R.E. Kingston, A region of the human HOXD cluster that confers Polycomb-group responsiveness, Cell 140 (2010) 99–110. [115] L. Ringrose, R. Paro, Polycomb/Trithorax response elements and epigenetic memory of cell identity, Development 134 (2007) 223–232. [116] A. Sing, D. Pannell, A. Karaiskakis, K. Sturgeon, M. Djabali, J. Ellis, H.D. Lipshitz, S.P. Cordes, A vertebrate Polycomb response element governs segmentation of the posterior hindbrain, Cell 138 (2009) 885–897. [117] M.C. Tsai, O. Manor, Y. Wan, N. Mosammaparast, J.K. Wang, F. Lan, Y. Shi, E. Segal, H.Y. Chang, Long noncoding RNA as modular scaffold of histone modification complexes, Science 329 (2010) 689–693. [118] C. Lanzuolo, V. Orlando, Memories from the Polycomb group proteins, Annu. Rev. Genet. 46 (2012) 561–589. [119] A. Pauli, J.L. Rinn, A.F. Schier, Non-coding RNAs as regulators of embryogenesis, Nat. Rev. Genet. 12 (2011) 136–149. [120] D.A. Kane, C.B. Kimmel, The zebrafish midblastula transition, Development 119 (1993) 447–456. [121] J. Newport, M. Kirschner, A major developmental transition in early Xenopus embryos: I. Characterization and timing of cellular changes at the midblastula stage, Cell 30 (1982) 675–686. [122] J. Newport, M. Kirschner, A major developmental transition in early Xenopus embryos: II. Control of the onset of transcription, Cell 30 (1982) 687–696. [123] L. Jiang, J. Zhang, J.J. Wang, L. Wang, L. Zhang, G. Li, X. Yang, X. Ma, X. Sun, J. Cai, X. Huang, M. Yu, X. Wang, F. Liu, C.I. Wu, C. He, B. Zhang, W. Ci, J. Liu, Sperm, but not oocyte, DNA methylome is inherited by zebrafish early embryos, Cell 153 (2013) 773–784. [124] L.C. 
Lindeman, I.S. Andersen, A.H. Reiner, N. Li, H. Aanes, O. Ostrup, C. Winata, S. Mathavan, F. Muller, P. Alestrom, P. Collas, Prepatterning of developmental gene expression by modified histones before zygotic genome activation, Dev. Cell 21 (2011) 993–1004. [125] S. Henikoff, A. Shilatifard, Histone modification: cause or cog? Trends Genet. 27 (2011) 389–396. [126] B.M. Turner, The adjustable nucleosome: an epigenetic signaling module, Trends Genet. 28 (2012) 436–444. [127] A.R. Pengelly, O. Copur, H. Jackle, A. Herzig, J. Muller, A histone mutant reproduces the phenotype caused by loss of histone-modifying factor Polycomb, Science 339 (2013) 698–699. [128] R.K. Ng, J.B. Gurdon, Epigenetic memory of an active gene state depends on histone H3.3 incorporation into chromatin in the absence of transcription, Nat. Cell Biol. 10 (2008) 102–109. [129] A. Santenard, C. Ziegler-Birling, M. Koch, L. Tora, A.J. Bannister, M.E. Torres-Padilla, Heterochromatin formation in the mouse embryo requires critical residues of the histone variant H3.3, Nat. Cell Biol. 12 (2010) 853–862. [130] S.F. Wu, H. Zhang, B.R. Cairns, Genes for embryo development are packaged in blocks of multivalent chromatin in zebrafish sperm, Genome Res. 21 (2011) 578–589. [131] S. Erkek, M. Hisano, C.Y. Liang, M. Gill, R. Murr, J. Dieker, D. Schubeler, J.V. Vlag, M.B. Stadler, A.H. Peters, Molecular determinants of nucleosome retention at CpG-rich sequences in mouse spermatozoa, Nat. Struct. Mol. Biol. 20 (2013) 868–875. [132] U. Brykczynska, M. Hisano, S. Erkek, L. Ramos, E.J. Oakeley, T.C. Roloff, C. Beisel, D. Schubeler, M.B. Stadler, A.H. Peters, Repressive and active histone methylation mark distinct promoters in human and mouse spermatozoa, Nat. Struct. Mol. Biol. 17 (2010) 679–687. [133] S.S. Hammoud, D.A. Nix, H. Zhang, J. Purwar, D.T. Carrell, B.R. Cairns, Distinctive chromatin in human sperm packages genes for embryo development, Nature 460 (2009) 473–478. [134] D. 
Moazed, Mechanisms for the inheritance of chromatin states, Cell 146 (2011) 510–518. [135] E. Danchin, A. Charmantier, F.A. Champagne, A. Mesoudi, B. Pujol, S. Blanchet, Beyond DNA: integrating inclusive inheritance into an extended theory of evolution, Nat. Rev. Genet. 12 (2011) 475–486. [136] K.H. Hansen, A.P. Bracken, D. Pasini, N. Dietrich, S.S. Gehani, A. Monrad, J. Rappsilber, M. Lerdrup, K. Helin, A model for transmission of the H3K27me3 epigenetic mark, Nat. Cell Biol. 10 (2008) 1291–1300. [137] S. Petruk, Y. Sedkov, D.M. Johnston, J.W. Hodgson, K.L. Black, S.K. Kovermann, S. Beck, E. Canaani, H.W. Brock, A. Mazo, TrxG and PcG proteins but not methylated histones remain associated with DNA through replication, Cell 150 (2012) 922–933. [138] L. Daxinger, E. Whitelaw, Understanding transgenerational epigenetic inheritance via the gametes in mammals, Nat. Rev. Genet. 13 (2012) 153–162. [139] E.L. Greer, T.J. Maures, D. Ucar, A.G. Hauswirth, E. Mancini, J.P. Lim, B.A. Benayoun, Y. Shi, A. Brunet, Transgenerational epigenetic inheritance of longevity in Caenorhabditis elegans, Nature 479 (2011) 365–371. [140] S.A. Blythe, S.W. Cha, E. Tadjuidje, J. Heasman, P.S. Klein, beta-Catenin primes organizer gene expression by recruiting a histone H3 arginine 8 methyltransferase, Prmt2, Dev. Cell 19 (2010) 220–231. [141] K. Adelman, J.T. Lis, Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans, Nat. Rev. Genet. 13 (2012) 720–731. [142] M. Vermeulen, K.W. Mulder, S. Denissov, W.W. Pijnappel, F.M. van Schaik, R.A. Varier, M.P. Baltissen, H.G. Stunnenberg, M. Mann, H.T. Timmers, Selective



anchoring of TFIID to nucleosomes by trimethylation of histone H3 lysine 4, Cell 131 (2007) 58–69. [143] S.M. Lauberth, T. Nakayama, X. Wu, A.L. Ferris, Z. Tang, S.H. Hughes, R.G. Roeder, H3K4me3 interactions with TAF3 regulate preinitiation complex assembly and selective gene activation, Cell 152 (2013) 1021–1036. [144] M. Vermeulen, H.T. Timmers, Grasping trimethylation of histone H3 at lysine 4, Epigenomics 2 (2010) 395–406. [145] R.H. Jacobson, A.G. Ladurner, D.S. King, R. Tjian, Structure and function of a human TAFII250 double bromodomain module, Science 288 (2000) 1422–1425. [146] J. Moriniere, S. Rousseaux, U. Steuerwald, M. Soler-Lopez, S. Curtet, A.L. Vitte, J. Govin, J. Gaucher, K. Sadoul, D.J. Hart, J. Krijgsveld, S. Khochbin, C.W. Muller, C. Petosa, Cooperative binding of two acetylation marks on a histone tail by a single bromodomain, Nature 461 (2009) 664–668. [147] J. Gaucher, F. Boussouar, E. Montellier, S. Curtet, T. Buchou, S. Bertrand, P. Hery, S. Jounier, A. Depaux, A.L. Vitte, P. Guardiola, K. Pernet, A. Debernardi, F. Lopez, H. Holota, J. Imbert, D.J. Wolgemuth, M. Gerard, S. Rousseaux, S. Khochbin, Bromodomain-dependent stage-specific male genome programming by Brdt, EMBO J. 31 (2012) 3809–3820. [148] J.R. Ecker, W.A. Bickmore, I. Barroso, J.K. Pritchard, Y. Gilad, E. Segal, Genomics: ENCODE explained, Nature 489 (2012) 52–55. [149] R.E. Munro, Y. Guo, Solutions for complex, multi data type and multi tool analysis: principles and applications of using workflow and pipelining methods, Methods Mol. Biol. 563 (2009) 259–271.


[150] S. Pettifer, D. Thorne, P. McDermott, J. Marsh, A. Villeger, D.B. Kell, T.K. Attwood, Visualising biological data: a semantic approach to tool and database integration, BMC Bioinforma. 10 (Suppl. 6) (2009) S19. [151] M. Schulz, F. Krause, N. Le Novere, E. Klipp, W. Liebermeister, Retrieval, alignment, and clustering of computational models based on semantic annotations, Mol. Syst. Biol. 7 (2011) 512. [152] D. Wang, S. Bodovitz, Single cell analysis: the new frontier in ‘omics’, Trends Biotechnol. 28 (2010) 281–290. [153] C. Plessy, N. Bertin, H. Takahashi, R. Simone, M. Salimullah, T. Lassmann, M. Vitezic, J. Severin, S. Olivarius, D. Lazarevic, N. Hornig, V. Orlando, I. Bell, H. Gao, J. Dumais, P. Kapranov, H. Wang, C.A. Davis, T.R. Gingeras, J. Kawai, C.O. Daub, Y. Hayashizaki, S. Gustincich, P. Carninci, Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan, Nat. Methods 7 (2010) 528–534. [154] P. Shankaranarayanan, M.A. Mendoza-Parra, W. van Gool, L.M. Trindade, H. Gronemeyer, Single-tube linear DNA amplification for genome-wide studies using a few thousand cells, Nat. Protoc. 7 (2012) 328–338. [155] F. Tang, C. Barbacioru, Y. Wang, E. Nordman, C. Lee, N. Xu, X. Wang, J. Bodeau, B.B. Tuch, A. Siddiqui, K. Lao, M.A. Surani, mRNA-Seq whole-transcriptome analysis of a single cell, Nat. Methods 6 (2009) 377–382. [156] L. Schermelleh, R. Heintzmann, H. Leonhardt, A guide to super-resolution fluorescence microscopy, J. Cell Biol. 190 (2010) 165–175. [157] S. Herbert, H. Soares, C. Zimmer, R. Henriques, Single-molecule localization super-resolution microscopy: deeper and faster, Microsc. Microanal. 18 (2012) 1419–1429.
