PhD Thesis - Bejerano Lab, Stanford University

DEVELOPMENTAL ENHANCERS: ANCIENT ORIGINS, NEOFUNCTIONALIZATION, AND PLEIOTROPY

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF GENETICS AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Shoa Long Clarke May 2013

Abstract

In this dissertation, I present two studies of the evolution and function of developmental enhancers. I begin in Chapter 1 by discussing motivations for the study of cis-regulatory elements, adding historical context. I go on to summarize key concepts of enhancer function and structure, and I discuss two approaches for genome-wide enhancer prediction. I also summarize the evolutionary developmental biology (evo-devo) framework, which serves as an important backdrop for interpreting the results of the subsequent chapters.

In Chapter 2, I describe the discovery of bilaterian conserved regulatory elements (Bicores), the first examples of cis-regulatory elements conserved between the two branches of the bilaterian tree. Bicores show conservation of sequence and gene synteny. Sequence conservation of Bicores reflects conserved patterns of transcription factor binding sites, and the conserved binding sites of Bicores indicate that these elements respond to signaling pathways. Enhancer assays in mouse and zebrafish show that Bicores are developmental enhancers that drive expression in the vertebrate central nervous system. Given that the origins of Bicores can be traced back to over 100 million years before the emergence of the vertebrate central nervous system, we can infer that these elements have neofunctionalized in present-day species.

In Chapter 3, I identify a set of over 6,000 candidate enhancers that are likely involved in the development of the neocortex. Target gene enrichment analysis as well as motif enrichment analysis support the set as functioning in the E14.5 cerebral wall, the structure from which the neocortex develops. Eight of ten candidates tested in a mouse transgenic enhancer assay drive activity in laminar patterns within the cerebral wall. Many genes associate with more than 10 candidate enhancers, suggesting a high level of cis-regulatory redundancy. Nearly a quarter of the candidate enhancers are anciently conserved beyond mammals, and we show that older elements are more likely to be pleiotropic. Pleiotropic elements most likely function in other aspects of central nervous system development, and elements that are conserved to zebrafish function in the developing zebrafish forebrain. In addition, overlap with mobile elements suggests that specific repeat families have played a major role in the generation of enhancers that function in neocortex development.

Finally, in Chapter 4, I discuss how enhancer neofunctionalization, pleiotropy, and redundancy fit into a model of enhancer evolution and the evo-devo framework.


Acknowledgments

I dedicate this work to the memory of my teacher Seung Ook Choi. Mr. Choi brought discipline to my childhood and taught me to tirelessly pursue excellence and expertise. More importantly, he was the closest thing that I ever had to a father.

I would like to begin by thanking my advisor Gill Bejerano for his years of support and mentorship. His insightful comments and difficult questions helped me to continuously grow as a scientist and as a person. I also thank the wonderful members of the Bejerano lab who kept me curious and excited. Surrounded by such brilliant scientists, I always felt inspired to ask one more question. In particular, I must recognize Aaron Wenger and Bruce Schaar, who were close collaborators and significant contributors to the work presented here. Aaron is immensely generous with his expertise and his time. He developed and maintained many of the tools and much of the infrastructure that allowed me to continuously ask the questions that most interested me. Moreover, our frequent conversations helped me to formulate many of the ideas presented here. Bruce’s expertise in brain development and experimental biology was invaluable throughout my work. He was fundamental to generating all of the experimental results presented here.

I am deeply indebted to my thesis committee, Minx Fuller, Mike Snyder, and Carlos Bustamante. I am humbled by their patience and their kindness. During the most challenging moments of my research, they offered their time, their expertise, and their genuine support. I would like to extend a special thank you to Carlos. As an undergraduate at Cornell University, I worked in his lab, and his influence inspired me to pursue Ph.D. training. He helped me to realize the privileged position scientists hold and the responsibility we have to serve and to educate.

I thank the Stanford MSTP and the HHMI Gilliam fellowship for financial support and career development. Furthermore, through these programs, I gained lifelong friends. I am particularly grateful to Lorie Langdon for her administrative support and for helping me to navigate through the entire M.D., Ph.D. process.

Lastly, I would like to take a moment to thank my mother, Dianne Clarke. I have never known a better example of compassion, generosity, and humanity. Anything good that I accomplish is because of her, and any achievement that is mine is also hers.

Chapter 2 is based on Clarke et al. 2012 [1]. Julie VanderMeer and Nadav Ahituv performed zebrafish experiments and offered important discussions. The Lastz search engine was originally conceived and implemented by Gill Bejerano and Craig Lowe, with further improvements made by Cory McLean, Cory Barr, and Aaron Wenger. Chris Lowe, David McClay, Greg Wray, Will Talbot, and Tom Glenn offered helpful discussions and aid in sea urchin and zebrafish analysis. I also thank Zfin and the Vista Enhancer Browser for providing invaluable public databases of experimental data.

Chapter 3 is based on currently unpublished work. Aaron Wenger was an equal contributor to this work. Bruce Schaar and Geetu Tuteja helped to design and implement ChIP-seq experiments. Bruce Schaar and Tisha Chung performed cloning, transgenics, and brain sectioning. Harendra Guturu helped to design and implement the motif enrichment tool.


Contents

Abstract
Acknowledgments
1 Introduction
1.1 Enhancers and gene regulation
1.2 Identifying enhancers
1.2.1 Conserved non-coding elements
1.2.2 ChIP-seq
1.3 The evo-devo framework
2 Bilaterian conserved regulatory elements
2.1 Results
2.1.1 Searching for bilaterian conserved regulatory elements
2.1.2 Bicore1
2.1.3 Bicore2
2.2 Discussion
2.3 Future directions
2.4 Methods
2.4.1 Coding versus non-coding aligning bases
2.4.2 Vertebrate CNEs
2.4.3 Screen for bilaterian conserved regulatory elements
2.4.4 Other computational analysis
2.4.5 Zebrafish transgenics
2.4.6 Mouse transgenics
2.4.7 Sea urchin transgenics
3 Developmental enhancers of the neocortex
3.1 Results
3.1.1 ChIP-seq identifies candidate cerebral wall enhancers
3.1.2 Motif enrichments and functional associations
3.1.3 Experimental validation of cerebral wall enhancers
3.1.4 Evolutionary origins of cerebral wall enhancers
3.2 Discussion
3.3 Future directions
3.4 Methods
3.4.1 p300 ChIP-seq
3.4.2 ChIP-seq peak calling
3.4.3 Functional and expression enrichment analysis with GREAT
3.4.4 Motif discovery and enrichment analysis
3.4.5 Mouse transient transgenic enhancer assay and sectioning
3.4.6 Evolutionary conservation analysis
3.4.7 Overlap with VISTA Enhancer Browser enhancers
3.4.8 Overlap with zebrafish cneBrowser enhancers
3.4.9 Overlap with mobile elements
4 Conclusion
4.1 Enhancer evolution
4.2 Adding to the evo-devo framework
A Supplementary material for Chapter 2
A.1 Supplemental Discussion
A.1.1 Past studies
A.1.2 Sox21-CNR
A.2 Supplementary Figures
A.3 Supplementary Tables
A.4 Supplementary Data
B Supplementary material for Chapter 3
B.1 Supplementary Figures
B.2 Supplementary Tables
Bibliography

List of Tables

A.1 GREAT enrichment analysis of vertebrate CNEs
A.2 Candidate vertCNEs
A.3 List of invertebrate species with genome assemblies
A.4 Bicore1 to Id distances
A.5 Bicore2 to Znf503 distances
A.6 Zebrafish expression pattern counts
A.7 Primers used in Bicore experiments
B.1 Primers used for candidate cerebral wall enhancer transgenics experiments

List of Figures

2.1 Conservation of CNEs versus coding sequence
2.2 Computational search for Bicores
2.3 Summary of bilaterian conserved regulatory elements
2.4 Bicore1 is a bilaterian conserved enhancer
2.5 Bicore2 is a bilaterian conserved enhancer
3.1 Neocortex development and evolution
3.2 Enrichment analyses of candidate E14.5 cerebral wall enhancers
3.3 Expression patterns of tested cerebral wall enhancers
3.4 Conservation and pleiotropy of candidate cerebral wall enhancers
3.5 Overlap of repeat families with candidate cerebral wall enhancers
3.6 The binomial test for the ChEAP-seq method
A.1 Multiple alignment of Bicore1 with paralogs
A.2 Bicore1 zebrafish expression at 21 hpf
A.3 Background expression of empty vector in zebrafish
A.4 Sox21-CNR locus in the human genome
A.5 Multiple sequence alignment of Sox21-CNR
A.6 Non-syntenic match to Sox21-CNR in D. virilis
B.1 Pleiotropy analysis excluding E11.5 forebrain elements

Chapter 1 Introduction

In 1865, an Austrian monk stood before a meeting of the Brünn Natural History Society and described the surprising results of years of plant breeding experiments. The man, Gregor Johann Mendel, outlined for the first time a mathematical model of heredity. A year later, he published the paper Versuche über Pflanzenhybriden (Experiments in Plant Hybridization), and the first concept of a gene was born [2]. Mendel’s paper was largely ignored for more than three decades after its publication. However, in the early 1900s, his work was rediscovered, and the field of genetics blossomed. The word “gene” itself was first used in 1909 by Wilhelm Johannsen in his book Elemente der exakten Erblichkeitslehre (The Elements of Heredity) [3].

For the next one hundred years, geneticists focused on describing and understanding genes. Although the term gene may be used loosely, most use it (as I do here) to refer to a genomic locus that encodes the information for generating a protein. The desire to understand genes served as a primary motivation for the human genome project. The initial proposal, submitted to the Department of Energy in 1987, argued:

“Genes for all the enzymes involved in metabolism, in biosynthesis and in repair need to be localized. Structural proteins, proteins of the immune response, transport proteins and the RNAs of protein synthesis are all important. The genes for hormones, which act in very small amounts, need to be identified and entered in the genome map. The largely unknown control proteins, which orchestrate differentiation, development and senescence, may be the most important to characterize. As more genes are identified and their gene products determined, the polygenic disorders like heart disease, hypertension, diabetes, schizophrenia, manic depression and even some symptoms of aging can be attacked. It will become possible to develop methods for early diagnosis and effective treatment” [4].

At that time, it was estimated that the human genome contained 50,000 to 150,000 genes [5], and such a large catalog of genes might serve as the basis for human complexity. The completion of the first draft of the human genome at the turn of the millennium marked a dramatic shift in thinking. Careful analysis of the 2.9 billion bases revealed that the human genome contained no more than 25,000 genes [6, 7] (the best estimate being ∼20,500 genes [8]), and less than 2% of the genome was protein coding [6, 7]. By comparison, the nematode worm Caenorhabditis elegans (an organism consisting of only 959 somatic cells) has a genome containing ∼20,000 genes [9]. In fact, there exist plant species with many more genes than humans [10, 11].

The announcement of the human gene count sent shockwaves throughout the scientific community. In an article published in The New York Times in 2001, Craig Venter (leader of one of the two competing human genome sequencing groups at the time) was quoted as saying, “there was almost panic because the genes weren’t there.” Another investigator who was interviewed believed the gene counting methods must be flawed and was described as being “unshaken in his estimate of 100,000 to 120,000 genes,” despite overwhelming evidence to the contrary [12]. For the majority of scientists who did accept the surprising gene count, it was clear that a major piece of the puzzle was missing. What that piece looked like or where it would fit into the big picture was not so clear. The first major clue came a year later.
The mouse genome was sequenced, and for the first time, the human genome could be compared to that of another mammal. By aligning the two genomes, it was possible to identify those regions that had evolved under selective constraint over the ∼75 million years since the human and mouse lineages diverged. Such conservation of sequence indicates that changes to the given genomic region were associated with a fitness cost, implying that the region serves a function dependent on its sequence. The important finding of the mouse genome project was that not only genes were conserved between human and mouse. In fact, whereas less than 2% of the genome codes for genes, at least 5% of the genome shows a signature of conservation [13]. The majority of the genomic sequence that natural selection has preserved among mammals does not code for genes.

Thus, with the birth of the genomics era, the focus of many biologists shifted from genes to everything else. The genome had spoken, making it abundantly clear that genes are only part of the story. If organismal complexity is not correlated with gene number, then how does complexity arise? If the majority of conserved sequence in our genome does not code for genes, what is the function of this conserved non-coding sequence? As these questions have been explored using both computational and experimental methods, evidence has mounted that much of the functional genome is dedicated to regulating gene expression [14]. Many of the highly conserved regions of the genome that do not code for genes are now understood to be “cis-regulatory elements”: genomic sequences that act to regulate the expression of a nearby target gene. Like the term gene, the term cis-regulatory element is not strictly defined. It may refer to enhancers, silencers, or promoters, and some may even apply the term to insulators [15].

I have focused on the study of enhancers. As will be explained in the following sections, the majority of cis-regulatory elements likely function as enhancers, and enhancers may play a crucial role in evolution. In this dissertation, I present the work I have done identifying enhancers, tracing their evolutionary origins, and predicting and testing their function. In this chapter, I will outline the motivations for studying the conservation and function of enhancers.
In Chapter 2, I describe the discovery of Bicores (bilaterian conserved regulatory elements), developmental enhancers that are the first cis-regulatory sequences found to be conserved between the two major branches of the bilaterian tree (deuterostomes and protostomes). In Chapter 3, I present a genome-wide set of elements that are predicted to function in neocortex development, and I describe computational approaches for analyzing such sets that offer both specific and broad insights into evolution and development. Finally, in Chapter 4, I discuss how the results of the preceding chapters fit into a model of enhancer evolution and how this model fits into the current theory of evolution and development (evo-devo).

1.1 Enhancers and gene regulation

In bilaterian species, precise gene expression patterns are generated through the coordination of cis-regulatory elements, transcription factors, co-factors, and the transcriptional machinery [16, 17]. These interactions impact and are impacted by chromatin structure [18]. Enhancers play a principal role in initiating and driving gene expression. Since the first enhancer was described in 1981 [19], huge strides have been made in understanding the mechanisms of gene regulation. Still, much of the molecular biology of enhancers remains contentious. As such, I will focus only on those key features of enhancers that are generally accepted and pertinent to the work presented in the following chapters.

An enhancer is a genomic sequence that drives expression of a target gene in a specific context (for example, a specific developmental structure, cell type, cellular event, etc.). The enhancer may lie upstream or downstream of the transcription start site of the target gene (hence the term cis-regulatory element), and the vast majority of enhancers in vertebrate genomes are more than one thousand bases away from their target gene [20]. The greatest observed distance between an enhancer and its target is roughly one million bases [21], though there is no reason to believe an enhancer cannot lie even further away. Although enhancers may lie far from their target genes in linear space, in three-dimensional space they are likely to exist in close proximity to the promoter. This system allows for a given gene to potentially respond to many different enhancers, depending on the context [20].

An enhancer drives expression of a target gene by recruiting transcription factors that are able to directly or indirectly interact with the transcriptional machinery. In order to recruit a transcription factor, an enhancer must be accessible (i.e. not tightly bound to a nucleosome) and it must present a sequence that the transcription factor is able to bind [18]. A given transcription factor will have a preference for binding a specific 6-12 base pair sequence. However, transcription factor binding is quite promiscuous, and most factors will bind many possible variations on their preferred sequence. Further, some transcription factors may have more than one distinct binding preference (what some may call a primary and secondary motif) [22]. Thus, an enhancer represents a functional grouping of transcription factor binding sites that, in sum, is able to drive expression of a target gene in a context-specific manner. As the codon is the language of genes, transcription factor binding motifs are the language of enhancers.

The manner by which enhancers encode context specificity within their DNA sequence is still an area of active research. One leading model posits that specificity is achieved through the combinatorics of transcription factor binding. For example, a well-characterized enhancer drives expression of Interferon-β only in cells that have been infected by a virus [23, 24]. The sequence of the enhancer encodes a series of eight transcription factor binding sites. Translating the DNA sequence into the transcription factor binding preferences that it matches, the enhancer reads

(ATF2)-(c-Jun)-(IRF3A)-(IRF7B)-(IRF-3C)-(IRF-7D)-(p50)-(RelA)

If any one of these factors is missing, the enhancer will not activate transcription of Interferon-β. When all the factors are present, they bind the enhancer and recruit the co-activator complex p300. P300 remodels the local chromatin through histone acetylation, allowing RNA polymerase to access the promoter and initiate transcription [23, 24]. While there may be many cellular contexts where ATF2 is present or where IRF factors are present, only in the context of a viral infection of the cell are all of the appropriate factors present in the nucleus and active. It is likely the case that many enhancers do not operate with such extreme rigidity as the Interferon-β enhanceosome [18]. Other enhancers may have flexibility, requiring some subset of possible transcription factors to initiate transcription. Nonetheless, by depending on a specific combination of transcription factors to initiate transcription, an enhancer limits its activity to only those cellular contexts where the appropriate factors are present and active.
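The combinatorial logic described above can be illustrated with a minimal Python sketch. It is only a toy model, not a description of any published method: the site labels are copied as written in the text and treated as opaque strings, while the function names and the `min_occupied` threshold are hypothetical choices made for illustration.

```python
from typing import Iterable, Sequence

# Site labels copied verbatim from the text; they are treated as opaque strings.
IFNB_SITES = ("ATF2", "c-Jun", "IRF3A", "IRF7B", "IRF-3C", "IRF-7D", "p50", "RelA")


def strict_enhancer_active(bound_sites: Iterable[str],
                           required: Sequence[str] = IFNB_SITES) -> bool:
    """All-or-nothing model: active only if every required site is occupied."""
    return set(required) <= set(bound_sites)


def flexible_enhancer_active(bound_sites: Iterable[str],
                             candidate_sites: Sequence[str],
                             min_occupied: int) -> bool:
    """Flexible model (hypothetical): active if at least `min_occupied`
    of the candidate sites are occupied."""
    return len(set(candidate_sites) & set(bound_sites)) >= min_occupied


infected_cell = set(IFNB_SITES)        # every site occupied
other_context = {"ATF2", "c-Jun"}      # only some factors present

print(strict_enhancer_active(infected_cell))   # True
print(strict_enhancer_active(other_context))   # False
print(flexible_enhancer_active(other_context, IFNB_SITES, min_occupied=2))  # True
```

The strict function mirrors the all-factors-required behavior attributed to the Interferon-β enhancer, while the flexible function stands in for enhancers that tolerate a subset of factors.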

1.2 Identifying enhancers

Both computational and experimental methods may be used to identify putative enhancers. Here, I will focus on describing the two most commonly used genome-wide approaches for discovering enhancers, both of which I utilize in the subsequent chapters.

1.2.1 Conserved non-coding elements

The most commonly used computational method for identifying enhancers relies on finding “conserved non-coding elements” (CNEs). This method is rooted in the central dogma of comparative genomics: conservation implies function. The term “non-coding” can sometimes be used loosely. In this work, I specifically use the term to refer to genomic regions that do not code for proteins or functional non-coding RNAs (micro RNAs, snoRNAs, non-coding genes). Thus, a conserved non-coding region is likely to be functional, but that function is unlikely to be related to a transcriptional product. We therefore infer a possible cis function.

There are many methods for identifying conserved genomic regions. The simplest approach is to generate a pairwise sequence alignment between two species and scan for windows with sequence identity above some predetermined threshold. For closely related species, this method is not useful, since neutrally evolving sequence may align with high sequence identity if insufficient time has passed to accumulate mutations. Instead, this approach has been most useful when comparing highly diverged species [25, 26]. For example, the neutral branch length (i.e. the expected number of substitutions at each site for neutrally evolving sequence) between humans and fugu fish is ∼2 substitutions per site [27]. Thus, neutrally evolving DNA should be essentially unalignable. For that reason, finding a window that shows, say, 70% sequence identity is strong evidence for conservation. However, pairwise alignments can be fooled by genome assembly contamination, simple repeats, and low complexity sequences. These problems are partially mitigated by only considering alignments that fall into larger synteny blocks. Still, when only considering two species, many false positives may pass a sequence identity filter.
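The window-based scan described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the function names, the 50-column window, and the treatment of gap columns as mismatches are all hypothetical choices, and the Jukes-Cantor substitution model is introduced here only to show why high identity is unexpected at a neutral distance of about 2 substitutions per site.

```python
import math
from typing import List, Tuple


def jukes_cantor_identity(d: float) -> float:
    """Expected fraction of identical sites after d substitutions per site,
    assuming the Jukes-Cantor model (an assumption made for illustration)."""
    return 1.0 - 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def conserved_windows(aln_a: str, aln_b: str, window: int = 50,
                      min_identity: float = 0.7) -> List[Tuple[int, float]]:
    """Return (start, identity) for every alignment window whose identity is
    at least min_identity. Gap columns ('-') are counted as mismatches."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(a == b and a != "-")
               for a, b in zip(aln_a.upper(), aln_b.upper())]
    hits: List[Tuple[int, float]] = []
    if len(matches) < window:
        return hits
    running = sum(matches[:window])          # matches in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:                        # slide the window by one column
            running += matches[start + window - 1] - matches[start - 1]
        identity = running / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# Under this model, neutral sequence at d = 2 retains roughly 30% identity,
# close to the 25% expected for unrelated sequence.
print(round(jukes_cantor_identity(2.0), 2))

# Toy alignment with a single mismatch at column 7.
print(conserved_windows("ACGTACGTAC", "ACGTACGAAC", window=5, min_identity=0.9))
```

In practice the two strings would be rows of a pairwise genome alignment restricted to syntenic blocks, and the threshold would be chosen relative to the neutral expectation for the species pair.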


More advanced methods define an expected neutral rate of mutation between the species being compared and search for windows that show a lower rate of substitution [13]. This method has also been used to find genomic regions that have resisted insertions and deletions [28]. By defining a neutral rate, it is possible to identify conserved regions in more closely related species. For both simple methods that use sequence identity thresholds and complicated methods that compare observed and expected mutation rates, the use of multiple alignments improves sensitivity and specificity. Using multiple alignments, it is even possible to identify individual bases that appear to have resisted substitutions [29, 30]. Perhaps the most advanced method for identifying conserved regions incorporates a multiple alignment and a phylogeny with estimated neutral branch lengths. A Hidden Markov Model is then used to define genomic regions that are better modeled by a conserved state rather than a neutral state [31]. In my experience, no one method is completely satisfactory, and it is sometimes necessary to integrate multiple metrics of conservation.

Several lines of evidence suggest that the majority of CNEs are cis-regulatory elements, and in fact, most likely function as enhancers. CNEs are not randomly distributed in the genome, but rather cluster around transcription factors and developmental genes (often referred to as transdev genes) [32, 25]. Moreover, as would be expected for enhancers, most CNEs are far from a transcription start site. Over large evolutionary distances, CNEs maintain syntenic relationships with genes, especially transdev genes [33]. One group has tested over one thousand CNEs for enhancer function during mouse development. In their survey, ∼45% of CNEs drive tissue-specific expression at embryonic day 11.5 [34, 35]. Notably, an enhancer that is active at earlier or later time points but whose activity does not span the assayed time point would not be identified by this survey. It is reasonable to think that many of the elements that do not show enhancer activity at embryonic day 11.5 might be active enhancers at another time point. Even if only a quarter of those CNEs that are negative at day 11.5 are functional at some other time point, this would imply that more than half of all conserved non-coding elements in the genome function as developmental enhancers.

Although I claim that most CNEs in the genome function as enhancers, I do not exclude the possibility that many of these elements also function as silencers. Assaying for silencers is a challenge, and current methods only allow for in vitro assays (unless an expensive and time-consuming knockout experiment is performed). Thus, we lack a good estimate for the prevalence of silencers in the genome. Intriguingly, the same genomic sequence may act as both an enhancer and a silencer [36, 37, 38]. For this reason, I prefer not to make a strong distinction between enhancers and silencers. The same genomic sequence that increases expression of its target gene in one context may decrease expression of that target gene in another context.
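The back-of-the-envelope estimate earlier in this section, that more than half of CNEs may function as developmental enhancers, can be checked with a short calculation. The sketch below only restates the arithmetic from the text (∼45% positive at embryonic day 11.5, plus a hypothesized quarter of the remainder); the function and parameter names are illustrative.

```python
def estimated_enhancer_fraction(positive_at_assay: float,
                                active_elsewhere: float) -> float:
    """Fraction of CNEs acting as enhancers: those positive at the assayed
    time point plus an assumed share of the negatives active at other times."""
    return positive_at_assay + active_elsewhere * (1.0 - positive_at_assay)


# 0.45 + 0.25 * 0.55 = 0.5875, i.e. just under 59% of CNEs.
print(f"{estimated_enhancer_fraction(0.45, 0.25):.4f}")
```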

1.2.2 ChIP-seq

Computational approaches have the benefit of being genome-wide. However, a major drawback is that the function of a CNE is difficult to predict. While it may be the case that most CNEs are enhancers, predicting the context specificity of CNEs remains an open challenge. Recently, an experimental approach has become the gold standard for genome-wide identification of cis-regulatory elements that function in a given context. The method relies on first generating an antibody to a protein/structure that interacts with DNA (e.g. a transcription factor, co-factor, histone). Through chromatin immunoprecipitation (ChIP), it is possible to use the antibody to separate out those genomic regions that were bound (directly or indirectly) to the protein or structure. Those regions can then be sequenced (hence, ChIP-seq). The sequence reads are aligned back to the genome, and using one of several computational methods, it is possible to identify the sites that were bound to the protein at the time of the experiment [39]. This method was initially used to study specific transcription factors [40], though the approach has now been expanded to many other uses.

A particularly useful application of ChIP-seq has been the identification of developmental enhancers [41, 42, 43]. P300, a nucleosome acetylator protein, associates with active enhancers. By “ChIPing” on p300, it is therefore possible to identify active enhancers in whatever cellular context the experiment is performed. By dissecting out a developmental tissue (e.g. limb buds, heart, forebrain, etc.) at a specific time point and using it as a substrate for ChIP-seq, it is possible to identify enhancers that drive expression in that tissue at that time point. ChIP-seq has also been used to identify enhancers by ChIPing for histone modifications that have been associated with active enhancers [40]. However, p300 ChIP-seq is the best studied approach for finding developmental enhancers.

One potential shortcoming of ChIP-seq is that it identifies binding events that are both transient and maintained. Given the stochasticity of the cellular milieu, it is easy to imagine that transcription factors transiently interact with many genomic regions. When a factor finds the correct enhancer, we would expect a longer binding event, thanks to cooperativity with other factors, increased stability through the formation of large complexes, and/or strong binding due to the presence of a preferred motif match. Indeed, there is some evidence that transient binding events are prevalent and are less likely to be functional. Moreover, current ChIP-seq methods do not offer a means of differentiating transient from prolonged binding events [44]. Even so, this method has proven invaluable to several areas of investigation.
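To make the read-alignment step described above more concrete, the following is a toy Python sketch of how bound sites might be flagged from aligned read positions. It is not the peak-calling procedure used in this work: it assumes a single uniform genome-wide Poisson background with no control sample, and the function names, bin size, and significance cutoff are all hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), via the complement of the CDF.
    Precision is limited for very small tail probabilities."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)        # P(X = 0)
    cdf = term
    for i in range(1, k):        # accumulate P(X = 1) ... P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_enriched_bins(read_starts: Iterable[int], genome_length: int,
                       bin_size: int = 200,
                       alpha: float = 1e-5) -> List[Tuple[int, int, int, float]]:
    """Return (start, end, read_count, p_value) for bins whose read count is
    unlikely under a uniform background (assumed model, for illustration)."""
    counts = Counter(pos // bin_size for pos in read_starts)
    n_bins = math.ceil(genome_length / bin_size)
    lam = sum(counts.values()) / n_bins      # mean reads per bin
    peaks = []
    for b in sorted(counts):
        p = poisson_sf(counts[b], lam)
        if p < alpha:
            end = min((b + 1) * bin_size, genome_length)
            peaks.append((b * bin_size, end, counts[b], p))
    return peaks


# Sparse background reads plus a pile-up of 30 reads starting at position 5000.
reads = list(range(0, 10000, 1000)) + [5000 + i for i in range(30)]
for start, end, n, p in call_enriched_bins(reads, genome_length=10000):
    print(start, end, n)
```

Real analyses typically model local background and use a control sample; this sketch only conveys the idea of testing read pile-ups against an expected count.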

1.3 The evo-devo framework

Thanks to the genomic era and modern techniques like ChIP-seq, we have now begun to appreciate the complexity of cis-regulation in the genome. However, even before the genomics era, when the research atmosphere was still largely gene-centric, the importance of cis-regulation was not lost on all. A small number of investigators, with almost prophetic foresight, theorized that the basis for organismal complexity may lie outside of genes. In their study of transposable elements, Eric Davidson and Roy Britten hypothesized that such sequences may carry the ability to become cis-regulatory elements. In particular, they believed that new gene regulatory networks might burst into existence through the rapid distribution of a mobile element. According to their theory, evolutionary novelty arose from new gene regulatory networks [45]. Their hypothesis carried the brilliant insight that the evolution of new forms may not depend on the emergence of new genes. Rather, evolution might be driven by alterations to gene

regulation. Susumu Ohno, a pioneer of the field of molecular evolution, made the stunning prediction in 1971 that coding sequence would account for less than 2% of the genome. He went on to postulate: “Drastic evolutional changes in organisms’ appearances are usually due to changes in regulatory systems rather than in structural genes. Man uses all five digits while the horse stands on its middle-toes. Nevertheless, a digit is a digit. The same set of structural genes are mobilized for the formation of human fingers and equine cannons. It would be safe to say that the creation of additional regulatory systems contributed more to big evolutional changes than did the creation of new structural genes” [46]. Four years later, Mary-Claire King and Allan Wilson famously refined this hypothesis. They pointedly noted that the degree of phenotypic difference between humans and chimpanzees cannot be accounted for by the degree of genetic difference when comparing protein sequence and function. They therefore speculated that there might exist nucleotide changes that “affect the production, but not the amino acid sequence, of proteins” [47]. They also suggested that chromosomal rearrangements (translocations, inversions) may have important effects on gene expression (though they could not explain a mechanism at the time). Impressively, they appreciated that understanding developmental gene expression would be essential to understanding evolution. They wrote, “Most important for the future study of human evolution would be the demonstration of differences between apes and humans in the timing of gene expression during development, particularly during the development of adaptively crucial organ systems such as the brain” [47]. Although the thinking of King and Wilson was sound, it was based on limited evidence. After all, it might be that the genes they studied were not the genes that are important for morphogenesis.
It took another decade before the genes involved with embryonic patterning and morphogenesis were identified, and the extent of their conservation has been even more surprising than what King and Wilson had observed. Beginning with the Hox cluster [48], it became clear that organisms as different as

mammals and insects use the same genes to regulate embryonic patterning. It is now accepted that there exists a core set of transcription factors and signaling pathways (sometimes referred to as the developmental toolkit) that is shared by all bilaterians [16, 17]. Many of these genes even predate bilaterians [49, 50]. How is it then, with this same toolkit, “endless forms most beautiful and most wonderful have been, and are being evolved?” The field of evolutionary developmental biology (evo-devo) offers a framework for answering this question, and this framework centers on cis-regulatory elements. The developmental toolkit genes serve multiple functions throughout development [16]. It is their extensive pleiotropy that likely accounts for their deep conservation and points towards cis-regulation as a primary mode of altering the developmental process over evolutionary time. Evo-devo posits that functional mutations in a developmental gene are detrimental because such mutations likely impact each of the many developmental functions of that gene. Instead, the primary mode of morphological evolution (and the means by which toolkit genes have become so pleiotropic) is through the gain and loss of cis-regulatory elements. According to the evo-devo paradigm, cis-regulatory elements are modular; each is responsible for regulating the correct target gene in the correct context. Thus, the gain, loss, or modification of a cis-regulatory element allows a developmental gene to gain or lose a function without altering its other functions [16]. As an example, consider the Sonic hedgehog signaling pathway. The Shh gene is conserved in almost all bilaterians [51, 52]. Knockout of Shh in mice causes neonatal lethality, cyclopia, and defects in the brain, spinal cord, vertebrae, ribs, and limbs [53]. Further studies have implicated Shh in multiple distinct developmental processes, including central nervous system [54], gut [55], and limb [56] development. 
Like most developmental genes, the Shh locus is rich with CNEs, each a putative cis-regulatory element. One such CNE has been found to be an enhancer that drives precise expression of Shh in the developing limb bud. Specific point mutations in this element cause ectopic expression during limb development, and in humans, these mutations cause an autosomal dominant polydactyly phenotype [21]. Homozygous deletion of this enhancer causes loss of limbs in mice, without other phenotypes [57].

Thus, one can easily imagine how changes in cis-regulation can lead to changes in Shh expression in a single context, leading to a new morphology. While the theory of evo-devo is elegant and appealing, it remains a challenge to identify cis-regulatory changes that have led to morphological differences between or within species. Over recent years, some notable examples have been discovered. A small number of nucleotide differences in a conserved developmental enhancer of Hoxc8 have been associated with delayed expression of Hoxc8 in chicken development compared to mouse development, likely leading to differences in relative lengths of the thoracic region [58]. Some subpopulations of stickleback fish have undergone adaptive reductions in pelvic morphologies. These reductions can be attributed to deletion of a pelvic enhancer of Pitx1 [59]. Enhancer loss has also been associated with human phenotypes. Deletion of a developmental enhancer of the androgen receptor gene is likely the genetic basis for loss of penile spines in humans [60]. As we improve our ability to identify cis-regulatory elements, trace their homologies, and predict their function, we will continue to accumulate examples of the evo-devo process in nature. I hope, in the following chapters, to contribute towards this effort.

Chapter 2 Bilaterian conserved regulatory elements

The bilaterian tree unites two major clades, deuterostomes (e.g. humans) and protostomes (e.g. flies) [61]. Protostome species such as insects, nematodes, annelids, and mollusks have served as invaluable model organisms. Much of the utility of these model systems stems from fundamental homologies between the two clades. Across bilaterians, early embryos undergo gastrulation to form three germ layers. These germ layers are patterned along dorsal-ventral and anterior-posterior axes. Underlying these processes are ancient conserved signaling pathways and transcription factors, often interacting as part of conserved genetic circuits. In both deuterostomes and protostomes, the precise expression of each circuit component depends on cis-regulatory elements [16, 17]. As described in Chapter 1, cis-regulatory elements are genomic regions that transcription factors bind in order to modify the expression of a target gene [15]. Cis-regulatory elements are often identified as conserved non-coding elements (CNEs) [32, 25, 35, 34]. Among closely related species, CNEs can show extreme conservation. For example, the human genome contains hundreds of non-coding ultraconserved elements that align to mouse and rat with 100 percent identity across 200 bases or more [32]. Many of these elements function as developmental enhancers [35]. Protostome genomes contain a distinct set of similarly ultraconserved elements

[62]. Strikingly, in contrast to the genes they regulate, no cis-regulatory elements have previously been found to be conserved between deuterostomes and protostomes [32, 25, 63, 64, 65] (see Appendix Section A.1). Even the oldest known enhancer, conserved between deuterostomes and the cnidarian sea anemone, has not been found to be conserved in protostomes [65]. These observations may suggest that the cis-regulatory component of genetic circuits has been completely rewired between deuterostomes and protostomes. Alternatively, it may be that some ancestral regulatory regions are conserved between these clades and have remained elusive due to limitations in our tools and our sample of bilaterian genomes. If conserved cis-regulatory elements do exist, such elements offer a new avenue for exploring how developmental logic is encoded in the genome and how this logic evolves. Here we present the first examples of cis-regulatory elements conserved between deuterostomes and protostomes. These elements have conserved sequence and gene synteny. The conserved sequence reflects conservation of a series of transcription factor binding sites, and we show that these elements function as developmental enhancers.

2.1 Results

2.1.1 Searching for bilaterian conserved regulatory elements

Conservation of non-coding sequence is rare compared to coding sequence, even over much shorter evolutionary distances than that between deuterostomes and protostomes. For example, nearly a third of human coding bases (11 Mb out of 34 Mb) align to the amphioxus genome. In stark contrast, less than one percent of CNE bases ( 0, where fb is the fraction of aligning bases of base b . For each element with at least one hit passing these filters, we manually inspected the alignment and the surrounding genomic landscape. As a final filter, we only kept elements that have maintained synteny with the same target gene in vertebrates and the aligning invertebrates. We associated each vertebrate CNE with the two nearest genes in the human genome (hg18). For any hit to an invertebrate genome, we found the nearest annotated mRNAs and compared these to the database of validated Refseq proteins using blastx [117]. If the top hit for this search was an ortholog of the appropriate human target gene, then we called the hit syntenic (Supplementary Tables A.4 and A.5). Five vertebrate CNEs had at least one such syntenic hit (Supplementary Table A.2). To more fully characterize the evolution of these five elements across the metazoan tree of life, we then performed a second more comprehensive and more sensitive search. For each of the five CNEs, we extracted all vertebrate instances of the element using the UCSC 44-way multiple alignment [118] on the hg18 genome browser. As a query to our search, we used all vertebrate instances of each element as well as the previously

discovered invertebrate instances. Each query was searched against all available nonvertebrate metazoan sequence data (Figure 2.2). Each new hit found with this comprehensive search was then used as a query to repeat the search process until no new hits were found. We manually inspected all hits, checking for the quality of the alignment and for gene synteny. For hits to genomes without an annotated set of mRNAs, we checked for synteny using nearby spliced ESTs. Of the five elements, one was conserved in chordates, two were conserved in deuterostomes, and two were conserved in both deuterostomes and protostomes (Supplementary Table A.2).
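The repeat-until-no-new-hits procedure described above amounts to computing a closure over search results. The following is a minimal sketch of that loop only. The `toy_search` function and the `TOY_HITS` table are hypothetical stand-ins for a real sequence search, and the manual inspection and synteny checks are not modeled.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set


def iterative_search(seeds: Iterable[str],
                     search: Callable[[str], Set[str]]) -> Set[str]:
    """Use every newly found hit as a query until no new hits appear."""
    found = set(seeds)
    queue = deque(found)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit not in found:      # only new hits are queued again
                found.add(hit)
                queue.append(hit)
    return found


# Hypothetical toy data: each query maps to the hits a search would return.
TOY_HITS: Dict[str, Set[str]] = {
    "human": {"zebrafish", "amphioxus"},
    "zebrafish": {"human"},
    "amphioxus": {"sea_urchin"},
    "sea_urchin": {"limpet"},
    "limpet": set(),
}


def toy_search(query: str) -> Set[str]:
    """Stand-in for a real sequence search (assumption, not the thesis tool)."""
    return TOY_HITS.get(query, set())


result = iterative_search(["human"], toy_search)
print(sorted(result))
```

Tracking the `found` set guarantees termination, since each sequence is queried at most once.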

2.4.4 Other computational analysis

Multiple alignments were generated using ClustalW [119] and manually edited using Jalview [120]. Conservation profiles of the multiple alignments were generated using WebLogo [121]. Conservation profiles were compared to a library of position weight matrices (PWMs) from UniPROBE [122], TRANSFAC [123], and GENOMATIX. PWMs that best matched the substitution pattern in the multiple alignment were manually chosen and aligned to the conservation profile.
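As an illustration of what a conservation profile measures, the sketch below computes per-column information content in bits for a DNA alignment, the quantity a sequence logo displays. It is a simplified assumption-laden example: gaps are ignored, no small-sample correction is applied, and the toy alignment is invented for illustration.

```python
import math
from collections import Counter
from typing import List


def column_information(alignment: List[str]) -> List[float]:
    """Information content per column: log2(4) minus the Shannon entropy."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    info = []
    for i in range(n_cols):
        # Count only unambiguous bases; gap characters are skipped.
        counts = Counter(seq[i] for seq in alignment if seq[i] in "ACGT")
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)
            continue
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        info.append(2.0 - entropy)
    return info


# Hypothetical four-sequence alignment used only to exercise the function.
toy_alignment = ["ACGT", "ACGA", "ACTC", "AC-G"]
bits = column_information(toy_alignment)
print([round(b, 3) for b in bits])
```

Fully conserved columns score 2 bits and columns with all four bases equally represented score 0, which is why conserved binding sites stand out in such a profile.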

2.4.5 Zebrafish transgenics

Human, zebrafish, sea urchin, owl limpet and tick sequences were PCR amplified from genomic DNA samples or synthesized (GenScript, Piscataway, NJ) (Supplementary Dataset A.4 and Supplementary Table A.7). All sequences were cloned into the E1BGFP Tol2 vector [124] using the D-TOPO and Gateway cloning systems (Invitrogen, Life Technologies Corporation). Wildtype AB strain zebrafish were bred according to standard methods and 1-cell stage embryos were injected with the enhancer assay vector and Tol2 transposase mRNA according to previously described methods [125]. Embryos were examined for GFP expression at 6 hours post fertilization (hpf), 21 hpf and 48 hpf. A minimum of two independent transgenic experiments was done for each construct, and over 60 morphologically healthy embryos were scored for GFP expression at each time point.

To test for background patterns that may be generated by our expression vector, we injected the empty vector, lacking any added enhancer sequence. We observed weak expression in about 50 percent of fish. However, expression was too minimal to determine cell type using the dissecting microscope. Using confocal microscopy, we could identify expression in single cells. The most common cell types that showed such weak expression were notochord, skeletal muscle, and heart (Supplementary Figure A.3).

2.4.6 Mouse transgenics

The human genomic region encompassing Bicore2 was synthesized with flanking NotI restriction sites (GenScript, Piscataway, NJ) and cloned into the Not5’hsp68lacZ minimal promoter expression vector [126]. The construct was linearized with SalI prior to injection. Transgenic mice were generated by pronuclear injections of FVB embryos (Xenogen Biosciences, Cranberry, NJ). Embryos were harvested at embryonic day 11.5, fixed, and whole mount stained for lacZ as described [126].

2.4.7 Sea urchin transgenics

The sea urchin Bicore1 sequence was PCR amplified from Strongylocentrotus purpuratus genomic DNA. The product was digested and cloned into the EcoRI-BglII sites of the EpGFPII vector [79]. Carrier DNA was prepared using HindIII digestion of S. purpuratus sperm, followed by phenol chloroform extraction and precipitation with sodium acetate. The carrier DNA was brought to 0.5-1 µg/µL, spun, and filtered with a 0.2 µm filter. The construct DNA was mixed with carrier DNA at a 1:3 molar ratio. 50% glycerol was added to the mixture to bring the construct DNA to a final concentration of 2,000 molecules/2 pL. Over 1,000 fertilized eggs were injected and over 700 embryos were scored.

Chapter 3 Developmental enhancers of the neocortex

The neocortex (isocortex) is a complex six-layered structure unique to mammals [127, 128]. It has been associated with higher cognitive functions [129], and defects in this structure are the likely source for many neurologic and psychiatric diseases [130]. Understanding how the neocortex develops and how that process evolved is a primary goal of neurobiology. Among all vertebrates, the developing central nervous system segments into a forebrain, midbrain, hindbrain, and spinal cord [95]. The forebrain is further segmented into the telencephalon and diencephalon. In mammals, the neocortex develops from a structure called the cerebral wall, which exists in the dorsal portion of the telencephalon (Figure 3.1A). Early in development, this region consists of a layer of progenitor cells lining the ventricles called the ventricular zone (VZ) (Figure 3.1A-B). Progenitor cells of the VZ undergo two types of division. Symmetric division produces identical daughter cells that remain in the VZ as progenitor cells. Asymmetric division produces an intermediate progenitor cell (IPC) that migrates out of the VZ to form the subventricular and intermediate zones (SVZ-IZ); the other daughter cell migrates past the SVZ-IZ to form structures (e.g. preplate, cortical plate) that go on to become the neocortex. The IPCs in the SVZ-IZ may divide to form more IPCs or may divide into post-mitotic neurons that migrate to the cortical plate and ultimately contribute to the neocortex [131, 132].

[Figure 3.1: image not recoverable from the extracted text. Labels: (A) cerebral wall, dorsal telencephalon, CP, SVZ-IZ, VZ; (B) stages E11.5, E14.5, and Adult, with PP, VZ, SVZ-IZ, CP, and layers I-VI; (C) Fish, Amphibian, Bird, Platypus, Dog, Human, Mouse, with clades Gnathostomata, Tetrapoda, Amniota, Mammalia, Eutheria, Euarchontoglires, and the emergence of neocortex marked.]

Figure 3.1: Simplified model of neocortex development and evolution. (A) One hemisphere of a coronal section of an embryonic mouse brain. The neocortex develops from the cerebral wall in the dorsal telencephalon. (B) The six layers of the neocortex develop from progenitor cells and intermediate progenitor cells, located in the VZ and SVZ-IZ respectively. (C) Only mammals have a neocortex, implying it first emerged in the mammalian ancestor. (A-B adapted from [131]). PP - preplate, VZ - ventricular zone, SVZ-IZ - subventricular zone and intermediate zone, CP - cortical plate, I-VI - the six layers of the neocortex.

In non-mammalian vertebrates (Figure 3.1C), the early dorsal telencephalon also consists of a VZ. However, only in mammals do post-mitotic neurons organize into a six-layered cortex [127, 132]. In birds, for example, the neurons in the CP develop into the hyperpallium. Although the hyperpallium is topologically analogous to the neocortex, it has a nuclear structure rather than a laminar structure and is thought to be evolutionarily distinct [127]. Furthermore, the development of the hyperpallium does not appear to involve any structure like the SVZ-IZ, though there is evidence that other structures in the avian brain (nidopallium, mesopallium) do develop from a structure similar to the mammalian SVZ-IZ [133, 134]. While the anatomy, histology, and gene expression patterns of the neocortex have been well studied, there is little known about the cis-regulation of neocortex development. Chromatin immunoprecipitation with the enhancer-associated co-activator

protein p300 and high throughput sequencing (p300 ChIP-seq) has become the gold standard for identifying active enhancers genome-wide. When applied to embryonic tissue, this method reliably identifies developmental enhancers that drive expression in the appropriate structure at the appropriate time [41, 42, 43]. In order to identify active enhancers during neocortex development, we dissected the cerebral wall of the dorsal telencephalon of embryonic day 14.5 mouse embryos and performed p300 ChIP-seq. Here, we present the first genome-wide set of candidate cerebral wall enhancers. We also present several novel methods for analyzing such datasets and show how these methods offer new insights into neocortex development and evolution.

3.1 Results

3.1.1 ChIP-seq identifies candidate cerebral wall enhancers

To identify enhancers that function in neocortex development, we dissected the cerebral wall of the dorsal telencephalon from E14.5 mouse embryos (Figure 3.1A) and performed chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) with an antibody to the enhancer-associated p300 co-activator complex. This approach has successfully identified tissue specific developmental enhancers in several other contexts [41, 42, 43]. We identified 6,629 p300 bound sites (>2.5 kilobases from the nearest transcription start site), which are candidate developmental enhancers of the cerebral wall. As seen with other sets of developmental enhancers [135], the majority of these elements are distal, with 65% being more than 50 kilobases from the nearest transcription start site. We used the Genomic Regions Enrichment of Annotations Tool (GREAT) [135] to evaluate the biological functions of our 6,629 candidate cerebral wall enhancers (Figure 3.2A). GREAT associates each candidate enhancer with its nearby genes, applies the annotations of the genes to the enhancer, and performs a statistical enrichment test. The top Gene Ontology Biological Process enrichments for the candidate enhancers include gliogenesis (300 enhancers; p-value: 1.8 × 10^-45), axon guidance (418

enhancers; p-value: 2.7 × 10^-43), and telencephalon development (397 enhancers; p-value: 1.0 × 10^-42). According to the Mouse Phenotypes ontology, our candidate cerebral wall enhancers are enriched in the regulatory domains of genes whose knockout or mutation results in abnormal forebrain development (419 enhancers; p-value: 1.9 × 10^-56) and abnormal brain commissure development (396 enhancers; p-value: 1.7 × 10^-57). Our set is strongly enriched for occurring near genes expressed in the telencephalon, including specifically the cerebral cortex at Theiler Stage 22 (1,811 enhancers; p-value: 9.5 × 10^-124), which corresponds to the sampled time point [136]. A recent RNA-seq analysis of microdissected E14.5 developing neocortex identified genes expressed specifically in each of the major developmental zones (VZ, SVZ-IZ, and CP) [137]. Our 6,629 elements are enriched near genes that are exclusively expressed in each zone: VZ (126 elements; p-value: 1.1 × 10^-25), SVZ-IZ (101 elements; p-value: 8.8 × 10^-25), and CP (251 elements; p-value: 1.8 × 10^-18). GREAT analysis also revealed a tendency for candidate cerebral wall enhancers to cluster together, with some genes having tens of elements in their regulatory domains. To determine what would be expected by chance, we randomly distributed the 6,629 peaks across the genome 1,000 times. In this random null, we never observed a gene associate with more than 15 peaks (Figure 3.2B). In the true set, several genes associate with more than 20 peaks. Some of these genes (e.g. Nfib, Sox4, Sox11) have known roles in brain development. Others have unknown functions and are good candidates for future study. For example, the zinc-finger gene Zfp608 is associated with 42 p300 peaks. It is expressed specifically in the SVZ-IZ at E14.5 [137], but its function is unknown. Auts2, a gene associated with autism but with unknown function, has 29 peaks in its regulatory domain. It is expressed in the SVZ-IZ and CP at E14.5 [137].
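The random null described above can be sketched on a toy genome. Everything in this example is an assumption made for illustration: the genome is a single linear coordinate, each peak is assigned to the nearest transcription start site (a simplification of GREAT's regulatory domains), and the peak and gene counts are far smaller than the 6,629 peaks and 1,000 shuffles used in the text.

```python
import bisect
import random
from collections import Counter
from typing import List


def nearest_gene(pos: int, tss_sorted: List[int]) -> int:
    """Index of the TSS closest to pos (ties go to the upstream TSS)."""
    i = bisect.bisect_left(tss_sorted, pos)
    if i == 0:
        return 0
    if i == len(tss_sorted):
        return len(tss_sorted) - 1
    return i if tss_sorted[i] - pos < pos - tss_sorted[i - 1] else i - 1


def max_peaks_per_gene(peaks: List[int], tss_sorted: List[int]) -> int:
    """Largest number of peaks assigned to any single gene."""
    counts = Counter(nearest_gene(p, tss_sorted) for p in peaks)
    return max(counts.values())


def shuffled_null(n_peaks: int, genome_len: int, tss_sorted: List[int],
                  n_iter: int, rng: random.Random) -> List[int]:
    """Max peaks per gene for n_iter uniformly random placements of the peaks."""
    return [
        max_peaks_per_gene([rng.randrange(genome_len) for _ in range(n_peaks)],
                           tss_sorted)
        for _ in range(n_iter)
    ]


rng = random.Random(0)
GENOME_LEN = 1_000_000                      # toy genome length (assumption)
tss = sorted(rng.sample(range(GENOME_LEN), 200))

# Toy "observed" set: 300 scattered peaks plus 40 clustered around one gene.
observed = [rng.randrange(GENOME_LEN) for _ in range(300)]
gap = min(tss[51] - tss[50], tss[50] - tss[49])
jitter = gap // 4                           # keeps the cluster nearest to gene 50
observed += [tss[50] + rng.randint(-jitter, jitter) for _ in range(40)]

obs_max = max_peaks_per_gene(observed, tss)
null_max = shuffled_null(len(observed), GENOME_LEN, tss, 100, rng)
print("observed max:", obs_max, "largest null max:", max(null_max))
```

Comparing the observed maximum with the distribution of null maxima mirrors the comparison of the true set against shuffled regions in Figure 3.2B.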

3.1.2 Motif enrichments and functional associations

To investigate the transcription factors that regulate our 6,629 candidate cerebral wall enhancers, we used a motif discovery and enrichment analysis (see Section 3.4). We identify a number of distinct enriched motifs, many for important regulators of

[Figure 3.2: image not recoverable; panel contents as recovered from the extracted text.]

(A) GREAT enrichments for candidate enhancers (-log10 p-value)

GO Biological Process
  CNS neuron differentiation               55.8
  glial cell differentiation               49.5
  gliogenesis                              44.7
  axon guidance                            42.6
  telencephalon development                42.0
Mouse Phenotypes
  abnormal neuron differentiation          65.3
  complete perinatal lethality             64.0
  abnormal nervous system tract            57.5
  abnormal brain commissure morphology     56.8
  abnormal forebrain development           55.7
MGI Expression
  Theiler stage 22 telencephalon          127.2
  Theiler stage 22 cerebral cortex        123.0
  Theiler stage 24 nervous system         121.9
  Theiler stage 19 forebrain              120.0
  Theiler stage 22 forebrain              118.5

(B) Number of genes (log scale) versus number of regions in gene regulatory domain, for candidate enhancers and shuffled regions. Labeled genes: Auts2, Nfib, Zfp608.

(C) Motif enrichments (motif logos omitted). Function fold enrichment compares enhancers with the motif to all enhancers.

Transcription Factor   Motif Fold Enrichment   Predicted Regulatory Function             Function Fold Enrichment
Neurod family          2.39                    Notch signaling pathway                   1.70
Lhx / Lmx family       2.42                    negative regulation of neuron apoptosis   1.56
Nfi family dimer       4.14                    cell surface proteins                     1.96
Rfx family dimer       3.33                    protein kinase activity                   2.44
Novel Hox dimer        2.32                    autism spectrum disease                   2.15
Novel Nfi dimer        2.06                    -                                         -

Figure 3.2: Enrichment analyses support candidate enhancers as having function in the developing neocortex. (A) Top 5 terms per ontology of GREAT enrichments for the candidate cerebral wall enhancers (p-value is the region-based binomial p-value). Theiler stage 22 corresponds to E13.5-E15. (B) Number of candidate enhancers in the regulatory domain of a gene compared to random expectation. (C) Motifs enriched in candidate cerebral wall enhancers. Motif fold enrichment is relative to [GC]-matched regions of the mouse genome. Predicted regulatory function is the top relevant GREAT enrichment for the elements with a motif hit compared to all 6,629 elements.

neocortex development (Figure 3.2C). Compared to [GC]-matched regions from the mouse genome, the Neurod/Neurog (2,452 / 6,629 enhancers = 37%; fold: 2.39), Lhx/Lmx (2,129 = 32%; fold: 2.42), Nfi (325 = 5%; fold: 4.14), and Rfx (195 = 3%; fold: 3.33) family motifs are all highly enriched in the candidate cerebral wall enhancers. Factors from all of these families have known roles in mammalian brain development [138, 139, 131, 140]. We also discovered two novel motifs enriched in the set. One motif appears to be an alternative orientation for the Nfi dimer (379 = 6%; fold: 2.06) and the other a novel Hox dimer motif (473 = 7%; fold: 2.32). We hypothesized that a given motif may mark enhancers that regulate genes involved in a specific process or pathway in neocortex development. To test this hypothesis, we used GREAT [135] to evaluate the enrichments of the enhancers with matches to a motif compared to the full set of candidate enhancers. Candidate cerebral wall enhancers with matches to the Neurod/Neurog motif are enriched near genes that are part of the Notch signaling pathway (34 enhancers; fold: 1.70; p-value: 9.0 × 10^-5). Elements with matches to the Lhx/Lmx family motif are associated with genes that negatively regulate neuron apoptosis (44 enhancers; fold: 1.56; p-value: 3.4 × 10^-4). We also predict members of the Hox family as upstream regulators of genes associated with autism spectrum disease (33 enhancers; fold: 2.15; p-value: 1.9 × 10^-5).
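The arithmetic behind these figures can be sketched as follows. The counts and fold values are taken from the text; the background frequencies are not reported, so the code only back-calculates what each fold value implies. The hypergeometric tail is a generic foreground-versus-background test offered as an illustration, and the rounded count of 54 Notch-associated enhancers in the full set is inferred from the reported fold, not stated in the source.

```python
from math import comb

TOTAL_ENHANCERS = 6629

# (enhancers with a motif match, fold enrichment vs. [GC]-matched regions)
MOTIFS = {
    "Neurod/Neurog": (2452, 2.39),
    "Lhx/Lmx": (2129, 2.42),
    "Nfi": (325, 4.14),
    "Rfx": (195, 3.33),
    "Novel Nfi dimer": (379, 2.06),
    "Novel Hox dimer": (473, 2.32),
}


def fold_enrichment(fg_hits: int, fg_total: int,
                    bg_hits: int, bg_total: int) -> float:
    """Ratio of the foreground hit fraction to the background hit fraction."""
    return (fg_hits / fg_total) / (bg_hits / bg_total)


def hypergeom_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n of N items are drawn and K of the N are annotated."""
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(K, n) + 1))
    return numerator / comb(N, n)


for name, (hits, fold) in MOTIFS.items():
    fraction = hits / TOTAL_ENHANCERS
    implied_background = fraction / fold   # back-calculated, not reported
    print(f"{name}: {fraction:.1%} of enhancers, "
          f"implied background {implied_background:.1%}")

# Notch example: 34 of the 2,452 Neurod/Neurog enhancers, fold 1.70.
# ASSUMPTION: the fold implies roughly 54 such enhancers in the full set.
implied_notch_total = round(34 * TOTAL_ENHANCERS / (2452 * 1.70))
p_notch = hypergeom_tail(34, implied_notch_total, 2452, TOTAL_ENHANCERS)
print(implied_notch_total, p_notch)
```

Because the annotated total is inferred and rounded, the resulting p-value should only be expected to land in the same general range as the reported one, not to reproduce it.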

3.1.3 Experimental validation of cerebral wall enhancers

It has been well validated that p300 ChIP-seq of embryonic tissue has 80% specificity for predicting enhancers that function in the appropriate tissue [41, 42]. Further, p300 ChIP-seq can identify enhancers that function in specific substructures of the sampled tissue [42]. We were thus interested in evaluating our enhancer predictions for elements that drive zone specific expression in the developing neocortex. Because few CP and SVZ-IZ enhancers have been previously characterized, we aimed to identify enhancers in these zones. Transcription factors thought to play an essential role in specifying these layers are Neurod family factors, Tbr1, and Tbr2 [141, 142, 143]. We therefore searched for candidate enhancers with conserved matches to the Neurod and/or Tbr motifs. We chose ten such candidates near genes known or hypothesized

to play important roles in corticogenesis and tested these elements in a conventional developmental enhancer assay. Eight of the ten assayed candidates drive reproducible expression in the cerebral wall visible in a whole mount (Figure 3.3A-H).

Figure 3.3: Candidate cerebral wall enhancers drive laminar expression in the cerebral wall. Of the 10 assayed candidates, 8 drive reproducible expression in the cerebral wall. (A-H) Whole mounts show expression in the dorsal telencephalon. (I-P) Coronal sections reveal cerebral wall specific expression, exclusive of the ganglionic eminences. (Q-X) Zooms of coronal sections reveal distinct laminar patterns. (Y-AF) In situ of the candidate target gene at E14.5; coronal: Y-Z, sagittal: AA-AF. Y,Z,AE from Allen Brain Atlas; AA-AD,AF from Eurexpress. The bottom row shows target gene expression pattern based on [137].

Coronal sections reveal that the assayed enhancers drive dorsal-specific expression, exclusive of the ganglionic eminences of the ventral telencephalon (Figure 3.3I-P), corresponding to the cerebral wall. Sections also reveal laminar restriction of enhancer activity. Two enhancers, eltD and eltG, drive expression in the most superficial cells of the developing cortex. These patterns precisely match a domain of the expression and functional activity of Tbr1 and Bhlhb5, their respective target genes [144, 143]. The other six enhancers are active primarily in the CP and SVZ-IZ. In total, six of the eight positive enhancers drive expression within the domain of activity of the putative

target gene [145, 137]. Two enhancers drive expression patterns that include a zone outside the detected expression regions of the putative target. These elements, eltA and eltF, drive expression in the CP and SVZ-IZ although their putative target genes (Eomes/Tbr2 and Id4) are expressed primarily in the SVZ-IZ and VZ (Figure 3.3Y-AF). It may be that other cis-regulatory elements in these loci act to repress the expression of Tbr2 and Id4 in the CP.

3.1.4 Evolutionary origins of cerebral wall enhancers

We examined the conservation of our 6,629 candidate cerebral wall enhancers to trace the origins of neocortex regulatory elements. The majority (4,278; 65%) of the candidate enhancers exhibit signatures of evolutionary constraint (PhastCons score >350). Very few elements appear specific to the mouse lineage, as over 95% (6,317) are conserved to human. Over 86 percent (5,737) are common to all eutherian (placental) mammals. Many candidate enhancers (1,543; 23%) pre-date the innovation of the neocortex in mammals, and 289 (4%) are conserved to fish (Figure 3.4A). For comparison, fewer than 5 percent of heart p300 ChIP-seq peaks [42] are conserved outside of mammals. At the other end of the spectrum, nearly 40 percent of forebrain p300 ChIP-seq peaks from E11.5 embryos [41] are conserved outside of mammals. The forebrain encompasses both the telencephalon and diencephalon, and at E11.5 it consists of mostly progenitor cells forming a VZ [131] (Figure 3.1B). The deep conservation of E11.5 forebrain enhancers is consistent with the hypothesis that the early forebrain is homologous across vertebrates [95]. The neocortex, however, is a mammalian-specific structure [127, 128]. It is striking that more than 1,500 putative enhancers predate the structure they help to pattern. One possibility is that deeply conserved cerebral wall enhancers function in the VZ, a structure that is present in non-mammals [128]. Indeed, compared to the whole set, elements that align to non-mammals are enriched near genes expressed specifically in the VZ (56 elements; p-value: 1.0 × 10^-7). However, many pre-mammalian elements are not associated with VZ genes. We hypothesized that some cerebral wall enhancers evolved from preexisting enhancers

[Figure 3.4: image not recoverable from the extracted text. Panel labels: (A) fraction conserved by clade (mouse, euarchontoglires, eutheria, mammalia, amniota, tetrapoda, gnathostomata) for E14.5 cerebral wall p300, E11.5 forebrain p300, E11.5 heart p300, and non-exonic basepairs; (B) fraction pleiotropic by clade; (C) candidate cerebral wall enhancers, observed versus expected, by mouse E11.5 enhancer activity; (D) candidate cerebral wall enhancers, observed versus expected, by zebrafish enhancer activity.]

Figure 3.4: Conservation and pleiotropy of candidate cerebral wall enhancers. (A) Evolutionary conservation of candidate cerebral wall enhancers compared to other p300 ChIP-seq sets and non-exonic bases (see Section 3.4). (B) Relationship between pleiotropy of candidate cerebral wall enhancers and evolutionary conservation. Pleiotropy is quantified by overlap with p300 ChIP-seq peaks from E11.5 forebrain, midbrain, limb, and heart. (C) Overlap of candidate cerebral wall enhancers with regions for which the human ortholog has been tested in an E11.5 mouse transgenic enhancer assay (* denotes enrichment p-value < 10−5 ). (D) Overlap of candidate cerebral wall enhancers with regions for which the zebrafish ortholog has been tested in a zebrafish transgenic enhancer assay (* denotes enrichment p-value < 0.05).
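The pleiotropy measure in panel B can be illustrated with a minimal sketch. It assumes that each candidate enhancer has already been assigned the deepest clade to which it aligns, and that an element counts as pleiotropic if it overlaps at least one p300 peak from another tissue. The function names, the interval representation, and the per-clade binning are hypothetical choices made for illustration and are not taken from the analysis itself.

```python
from collections import defaultdict

# Clade labels ordered from youngest to oldest, as on the x-axis of Figure 3.4.
CLADES = ["mouse", "euarchontoglires", "eutheria", "mammalia",
          "amniota", "tetrapoda", "gnathostomata"]


def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def is_pleiotropic(element, other_tissue_peaks):
    """Assumed definition: the element overlaps a peak from any other tissue."""
    return any(overlaps(element, peak)
               for peaks in other_tissue_peaks.values()
               for peak in peaks)


def fraction_pleiotropic_by_clade(elements, other_tissue_peaks):
    """elements: list of ((chrom, start, end), clade) pairs.

    Returns {clade: fraction of elements in that clade that are pleiotropic},
    omitting clades with no elements.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for interval, clade in elements:
        totals[clade] += 1
        hits[clade] += is_pleiotropic(interval, other_tissue_peaks)
    return {c: hits[c] / totals[c] for c in CLADES if totals[c]}
```

Dropping one tissue from `other_tissue_peaks` (for example the E11.5 forebrain set) gives the kind of control comparison described for Supplementary Figure B.1.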

3.1. RESULTS


that functioned in other contexts. If this hypothesis is correct, we might expect cerebral wall enhancers that are conserved outside of mammals to have functions outside of the developing neocortex. In other words, such enhancers are likely to be pleiotropic. As a measure of pleiotropy, we overlapped our candidate cerebral wall enhancers with sets of p300 ChIP-seq peaks from embryologically distinct tissues (E11.5 forebrain, midbrain, limb, and heart). When comparing pleiotropy to evolutionary age, a clear trend emerges: the older an element, the more likely it is to be pleiotropic (Figure 3.4B). This trend holds even when the E11.5 forebrain set is not used to measure pleiotropy (Supplementary Figure B.1), suggesting that older enhancers are pleiotropic not just in terms of developmental timing but also in terms of developmental space (i.e. organ structure). Corroborating this idea, the ancient transcription factors that pattern the neocortex often function in other contexts of brain development. For example, the cerebellum is a structure found across vertebrates. Several of the conserved regulators of cerebellar development (e.g. Pax6, Neurod, Tbr) are instrumental in neocortex development [141].

We further assessed the pleiotropy of our candidate cerebral wall enhancers by comparing to a large database of human enhancers [35]. For 214 of our elements, the human ortholog has been tested in a mouse transgenic enhancer assay at E11.5. Of these elements, 148 function as enhancers at this early time point, most often driving expression in the developing central nervous system (Figure 3.4C). Given these results, we hypothesized that candidate cerebral wall enhancers with conservation in non-mammalian species are most likely to function in the central nervous system of those species. This supposition is supported by data from a large enhancer screen in zebrafish [124]. The zebrafish orthologs of 21 of our elements were assayed, and 20 drive reproducible expression patterns in the developing zebrafish embryo, with 11 driving expression in forebrain (Figure 3.4D).

Although some enhancers that function in neocortex development likely evolved from preexisting enhancers, we expect others to have arisen de novo. One hypothesized mechanism for the generation of new enhancers is through the cooption of mobile elements [45]. Bejerano and Lowe have previously described an example of a mobile element becoming a highly conserved developmental enhancer [146] and have shown


that thousands of mobile elements in the human genome are under strong purifying selection, clustering near developmental genes [147]. To determine if repeat elements may have been coopted as cerebral wall enhancers, we compared our set to annotated repeat families. In total, 3,591 candidate enhancers overlapped an annotated repeat in the mouse genome or a region that maps to an annotated repeat in the human genome. The MER130 repeat family is not yet annotated in the mouse genome. However, 97 instances are annotated in the human genome, and 89 instances map to the mouse genome. We find 18 candidate cerebral wall enhancers that overlap a MER130 repeat, over 70-fold more than would be expected by chance (Figure 3.5). This enrichment is the strongest of all pairings of developmental p300 ChIP-seq sets and repeat families. Other repeat families that are significantly enriched in our candidate cerebral wall enhancers include MER124 (13-fold enriched) and AmnSINE1 (7-fold enriched). These data suggest that the emergence of the mammalian neocortex involved the cooption of specific families of mobile elements as new enhancers. Further study of such cooption events may shed light on how new gene regulatory networks evolve.

3.2

Discussion

In this chapter, I have described the first genome-wide set of p300-bound regions in the cerebral wall of the dorsal telencephalon at E14.5. This set of candidate cerebral wall enhancers provides a rich dataset for studying neocortex development and evolution. The set is associated with genes with diverse functions in neocortex development.

Curiously, some genes carry tens of candidate cerebral wall enhancers in their regulatory domains, many more than would be expected by chance. It has been hypothesized that redundant enhancers exist in order to generate expression patterns that are robust to environmental variation [148, 149]. Furthermore, multiple redundant enhancers likely produce greater precision. Even under ideal conditions, stochasticity results in normal variance in the observed activity of a single enhancer. For a gene with several enhancers, the total variance in the summed activity of the enhancers is less than that of a single strong enhancer (law of large numbers). In the context of development, especially for a complex organ with many cell types, precise


[Figure 3.5 plot. Y-axis: fold enrichment of repeat & p300 peaks overlap compared to shuffles. X-axis: z-score of repeat & p300 peaks overlap compared to shuffles. Series: E11.5 forebrain p300, E11.5 heart p300, E11.5 limb p300, E11.5 midbrain p300, E14.5 cerebral wall p300. Labeled point: MER130.]

Figure 3.5: Cooption of mobile elements as cerebral wall enhancers. Each p300 ChIP-seq set was compared to each repeat family. For each set, the expected number of overlaps was determined using 1,000 simulations where the p300 set was randomly distributed across the genome and overlaps were counted. Fold enrichment is observed/expected. Z-score is (observed-expected)/standard deviation.
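The two statistics defined in this caption can be sketched as follows. This is a simplified illustration on a single toy chromosome, not the pipeline used for the figure: the function names, the half-open interval convention, the uniform placement of shuffled peaks, and the use of the population standard deviation are all assumptions.

```python
import random
import statistics


def count_overlaps(peaks, repeats):
    """Count peaks (start, end) that overlap at least one repeat (start, end)."""
    return sum(
        any(p_start < r_end and r_start < p_end for r_start, r_end in repeats)
        for p_start, p_end in peaks
    )


def shuffle_peaks(peaks, genome_length, rng):
    """Place each peak at a uniformly random position, preserving its length."""
    shuffled = []
    for start, end in peaks:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def enrichment(observed, simulated_counts):
    """Return (fold enrichment, z-score) of observed against simulated counts."""
    expected = statistics.mean(simulated_counts)
    sd = statistics.pstdev(simulated_counts)  # assumption: population SD
    fold = observed / expected if expected > 0 else float("inf")
    z = (observed - expected) / sd if sd > 0 else float("inf")
    return fold, z


def shuffle_enrichment(peaks, repeats, genome_length, n_shuffles=1000, seed=0):
    """Compare the observed overlap count to n_shuffles random placements."""
    rng = random.Random(seed)
    observed = count_overlaps(peaks, repeats)
    simulated = [
        count_overlaps(shuffle_peaks(peaks, genome_length, rng), repeats)
        for _ in range(n_shuffles)
    ]
    return enrichment(observed, simulated)
```

A genome-wide version would shuffle within and across chromosomes and would typically exclude assembly gaps; those details are omitted here.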


expression patterns are essential.

Using motif discovery, we are able to identify transcription factors that are likely to be prevalently bound to our candidate cerebral wall enhancers. We further show that individual motifs can identify subsets of candidate enhancers that show associations with genes involved in specific functions during neocortex development. This proof-of-principle analysis suggests a computational route for beginning to elucidate gene regulatory networks.

Notably, we identified enhancers that drive expression in the CP alone or the CP plus SVZ-IZ, but we did not observe any enhancers that drive expression exclusively in the SVZ-IZ. Using the recently published expression data [137], we can see that none of the target genes of our tested enhancers are expressed exclusively in the SVZ-IZ. In fact, SVZ-IZ specific expression is far less common than the other observed gene expression patterns. This observation makes intuitive sense, given that many of the cells in the SVZ-IZ are either migrating from the VZ or migrating to the CP [131, 132]. Thus, expression precisely in the SVZ-IZ is likely to be less common than expression crossing adjacent zones. Nonetheless, there is a small set of genes that do show SVZ-IZ specific expression. Further study of the cis-regulation of these genes may shed new light on how the SVZ-IZ is defined and how it evolved.

By tracing the conservation of our elements across the vertebrate tree, we find that many cerebral wall enhancers have ancient origins. Such enhancers may therefore have neofunctionalized with the evolution of the neocortex. Moreover, we show a general trend that more deeply conserved enhancers are more likely to be pleiotropic. It has been shown that as new organs evolve, old genetic circuits may be reused [150]. As such, the master regulator genes of new organs (e.g. the neocortex) are often ancient and highly pleiotropic. The conventional model, however, suggests that pleiotropic genes are regulated by modular (single function) enhancers [151]. The observation that old enhancers are in fact pleiotropic themselves supports the idea that as a gene takes on a new function, some of its regulatory apparatus is likely to be utilized for that function as well.

An alternative mechanism for generating new enhancers is through cooption of mobile elements. Indeed, it has long been hypothesized that mobile elements may offer


a means of rapidly spreading new cis-regulatory elements, allowing for the emergence of new regulatory circuitry [45]. Our data suggest that such cooption events may have played a major role in generating the cis-regulatory architecture of neocortex development. We expect future work will reveal why specific repeat families have been most likely to be coopted.

3.3

Future directions

My work has focused on studying properties of cis-regulatory element evolution. I have spent time highlighting how the neofunctionalization of old enhancers allows genes to gain new expression through existing cis-regulatory elements. However, no gene or cis-regulatory element stands alone. When an enhancer neofunctionalizes, whole gene regulatory networks (or sub-networks) are likely to be coopted. Eventually, we must move from studying gene evolution or enhancer evolution to studying the evolution of entire gene regulatory networks.

In Chapter 2, I used transcription factor binding site prediction to identify transcription factors that bind Bicores to regulate expression of the Bicore target genes. These predictions were strongly supported by experimental evidence. Binding site analysis of single cis-regulatory elements is common practice and often offers new insights into gene regulatory networks. However, analysis of a single enhancer sheds light on only a small piece of a larger gene regulatory network. A genome-wide approach to studying gene regulatory networks has been made possible with ChIP-seq. Performing ChIP on an important transcription factor in a given context can identify the target genes of that factor in that context. However, to elucidate whole networks, it would be necessary to perform ChIP on all of the active transcription factors in the given context. Even with decreasing sequencing costs, such comprehensive ChIP-seq is not feasible, due to the requirement of good antibodies for each transcription factor.

One of the holy grails of computational genomics is to use genome-wide binding site prediction in order to make meaningful inferences about gene regulatory networks on a grander scale. In the past, the utility of genome-wide binding site prediction was limited by sparse transcription factor motif libraries and poorly performing prediction


methods. Both of these problems have been tackled, and great strides have been made [104]. Even so, binding site prediction alone cannot generate meaningful gene regulatory networks for at least two reasons. The first barrier is the ambiguity of transcription factor binding motifs. Very often, transcription factors from the same family use the same DNA-binding domain and thus have similar or even identical binding motifs. A highly conserved match to the HoxD13 motif could represent a binding site for HoxD13 or for one of several other Hox factors. Perhaps it is bound by many different Hox factors, depending on the context. Biological context is the second problem with binding site prediction. Genome-wide prediction lacks any context, and gene regulatory networks inherently depend on context.

Both of these challenges can be partly overcome through the utilization of context-specific experimental genomics data. As illustrated in Chapter 3 and elsewhere [41, 42, 43], p300 ChIP-seq allows for genome-wide identification of context-specific (e.g. E14.5 cerebral wall) enhancer sets. Other methods, such as ChIP-seq of histone modifications that mark active enhancers, DNase-seq, which marks open chromatin, and ChIA-PET, which maps chromatin interactions, are adding to our ability to comprehensively define the set of active cis-regulatory elements in a specific biological context. In conjunction with these data, we are accumulating context-specific expression data from RNA-seq experiments and massive in situ hybridization efforts. It is therefore quite feasible to identify both the active cis-regulatory elements and the expressed transcription factors in a given context. Together, these data may be integrated with transcription factor binding site prediction to predict gene regulatory networks.

As an example, consider the context of mouse neocortex development at embryonic day 14.5. RNA-seq on this tissue indicates that the transcription factors Otx1 and Tbr1 are expressed in the developing neocortex at day 14.5. For both of these factors, we have well-characterized binding motifs. In this tissue, p300 ChIP-seq identifies active enhancers around each of these genes. If we observe a conserved motif for Otx1 in a p300 site upstream of Tbr1, this is evidence for Otx1 regulating Tbr1 expression. This thinking, of course, can be extended. By searching for conserved Otx1 binding sites across the whole set of p300 sites, we approximate an Otx1 ChIP-seq


experiment. An actual ChIP-seq experiment will have better sensitivity since not all Otx1 binding sites will be conserved and since p300 ChIP-seq does not identify all active cis-regulatory elements (though other techniques like DNase-seq may be more comprehensive). Otx1 ChIP-seq would also have better specificity, since even the best prediction methods have false positives [104]. Nonetheless, by integrating p300 ChIP-seq, expression data, and binding site prediction, we can potentially mimic ChIP-seq.

By applying this approach to each transcription factor that is expressed in the given context, it may be possible to predict a gene regulatory network. However, if each binding site prediction is used as a regulatory link, the network would be noisy and would not distinguish regulatory relationships based on single binding events from those based on multiple binding events. For example, observing a single instance of the Otx1 motif in a single p300 site near Tbr1 is different from observing five instances of the Otx1 motif across several p300 sites near Tbr1. This problem can be addressed by using an approach similar to what is used for measuring gene annotation enrichments for ChIP-seq data. The Genomic Regions Enrichment of Annotations Tool (GREAT) [135] defines a regulatory domain for each gene and uses a binomial statistic to test for non-random distributions of cis-regulatory elements. Similarly, we could use a binomial statistic to compare the observed and expected number of binding sites for a given factor in p300 sites associated with each possible target gene. I call this approach of combining ChIP-seq, expression, and prediction data “ChEAP-seq” (Figure 3.6).

In summary, the ChEAP-seq concept takes advantage of a whole-genome, context-specific set of putative cis-regulatory elements (e.g. p300 ChIP-seq), context-specific expression data, and transcription factor binding prediction. False positives are limited by only predicting binding sites in the putative enhancers and by only predicting binding sites for transcription factors that are expressed in the given context. Putative enhancers can be associated with target genes through a validated method [135], and a binomial statistic can be used to generate p-values for predicted regulatory relationships.
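The filtering and counting steps summarized above might be sketched as follows. This is an illustration only, under several assumptions: predicted binding sites and putative enhancers are given as (chrom, start, end) half-open intervals, each enhancer has a single assigned target gene (GREAT can assign a region to more than one gene), and all names are hypothetical.

```python
from collections import Counter


def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_sites_per_target(sites, enhancers, enhancer_target, expressed_tfs):
    """Count predicted binding sites per (transcription factor, target gene) pair.

    sites:           list of (tf_name, (chrom, start, end)) predicted binding sites
    enhancers:       dict enhancer_id -> (chrom, start, end) putative enhancer
    enhancer_target: dict enhancer_id -> target gene (assumed one gene per enhancer)
    expressed_tfs:   set of transcription factors expressed in the context
    """
    counts = Counter()
    for tf, site in sites:
        # Filter 1: ignore factors that are not expressed in this context.
        if tf not in expressed_tfs:
            continue
        # Filter 2: keep only sites that fall inside a putative enhancer.
        for enhancer_id, interval in enhancers.items():
            if overlaps(site, interval):
                counts[(tf, enhancer_target[enhancer_id])] += 1
                break  # count each site at most once
    return counts
```

The resulting counts are the per-gene observations to which a binomial statistic could then be applied.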


[Figure 3.6 schematic: E14.5 cerebral wall p300 peaks (A to E) and predicted Otx1 binding sites around Tbr1, with the Tbr1 regulatory domain marked. Number of trials: n = 4 (total predicted Otx1 binding sites in candidate cerebral wall enhancers). Number of successes: k = 3 (predicted Otx1 binding sites within the Tbr1 regulatory domain). Probability of success: p = (C + D) / (A + B + C + D + E). Binomial p-value = 0.05.]

Figure 3.6: ChEAP-seq binomial statistic. To test the hypothesis that Otx1 regulates Tbr1 in the context of neocortex development at E14.5, we consider the total number of matches to the Otx1 motif in predicted enhancers to be the number of trials. A success is a match to the motif in a predicted enhancer within the Tbr1 regulatory domain (as defined by GREAT). The probability of a random success is the fraction of all predicted enhancer bases that are in the Tbr1 regulatory domain. We can thus calculate a p-value that describes how likely it is to see some number of binding sites for a given factor in the regulatory domain of a given target gene.
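As a worked illustration of this statistic, the sketch below computes an upper-tail binomial probability, P(X ≥ k), for n trials with success probability p. Using the upper tail is an assumption, chosen to match how a binomial enrichment test is usually framed. The value p = 0.25 is hypothetical and was picked only for illustration; it is not taken from the figure.

```python
from math import comb


def binomial_tail(n, k, p):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# n: all predicted Otx1 sites in candidate enhancers (trials).
# k: predicted Otx1 sites inside the Tbr1 regulatory domain (successes).
# p: fraction of candidate-enhancer bases inside that domain (hypothetical value).
n_sites = 4
k_in_domain = 3
p_domain = 0.25

p_value = binomial_tail(n_sites, k_in_domain, p_domain)
print(f"{p_value:.4f}")  # prints 0.0508
```

With these inputs the tail probability is 4(0.25)^3(0.75) + (0.25)^4, about 0.051.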

3.4

Methods

3.4.1

p300 ChIP-seq

E14.5 embryos were harvested from timed-pregnant Swiss Webster mice (Charles River). The dermis, skull mesenchyme, and bone primordia were removed, and cortical caps were dissected with curved forceps and placed in PNGM (Lonza). The medial structures, cortical hem/hippocampus and choroid plexus were cut off in a secondary excision. Dissected tissue (0.15 g) was snap frozen in liquid nitrogen. Tissue was fixed in 1% formaldehyde for 15 minutes. Chromatin was isolated and sheared, and immunoprecipitation was performed using 30 micrograms of chromatin and 4 micrograms of anti-p300 antibody, C-20 (Santa Cruz SC-585; Genpathway). Chromatin from the same sample was processed for the input control. Library construction and sequencing were done using the Illumina GA II format (Illumina). This produced 17,460,074 uniquely mapped 36 bp reads for the treatment and 15,669,334 uniquely mapped reads for the input control.

3.4.2

ChIP-seq peak calling

ChIP-seq reads were mapped to the mouse genome (mm9) using ELAND, retaining only reads that map uniquely with 2 or fewer mismatches. Peaks were called using MACS [152] with the p300 ChIP-seq reads as the treatment file, input DNA reads as the control file, and the parameters --nomodel, --shiftsize=100, -g mm. Peaks overlapped by an exon, within 2.5 kb of a transcription start site, or covered over 50 percent by annotated repeats were removed. Exon and transcription start site annotations are from the UCSC knownGene track (build 5) [105]. Repeat annotation is from the RepeatMasker track. The median fold enrichment over input for our 6,629 peaks is 7.11 (mean 7.83).
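The three peak filters described above could be expressed as in the following sketch. It is not the code used for this analysis. It assumes half-open (chrom, start, end) intervals, measures the distance to a transcription start site from the peak edges rather than from the summit, and assumes repeat annotations do not overlap one another; all names are hypothetical.

```python
def overlap_len(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals on the same chromosome."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def keep_peak(peak, exons, tss_positions, repeats,
              tss_window=2500, max_repeat_fraction=0.5):
    """Return True if a peak passes all three filters.

    peak:          (chrom, start, end)
    exons:         list of (chrom, start, end)
    tss_positions: list of (chrom, position)
    repeats:       list of (chrom, start, end), assumed non-overlapping
    """
    chrom, start, end = peak

    # Filter 1: remove peaks overlapped by any exon.
    if any(c == chrom and overlap_len(start, end, s, e) > 0
           for c, s, e in exons):
        return False

    # Filter 2: remove peaks within 2.5 kb of a transcription start site.
    if any(c == chrom and start - tss_window <= pos < end + tss_window
           for c, pos in tss_positions):
        return False

    # Filter 3: remove peaks covered more than 50 percent by annotated repeats.
    covered = sum(overlap_len(start, end, s, e)
                  for c, s, e in repeats if c == chrom)
    if covered / (end - start) > max_repeat_fraction:
        return False

    return True
```

In practice the same filtering is often done with interval tools operating on BED files; the plain-Python version is shown only to make the three criteria explicit.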

3.4.3

Functional and expression enrichment analysis with GREAT

To evaluate functional and expression enrichments, we used GREAT v2.0.0 [135] using default parameters, with the exception that a lower region-based binomial fold


criterion (≥1.6) was used for the MGI Expression ontology. We evaluated specific enrichment in the ventricular zone, subventricular zone, and cortical plate using a custom ontology based on a recent RNA-seq dataset [137]. We consider a gene to be specific to a layer if it has a layer RPKM (reads per kilobase of exon model per million mapped reads) >64 and >2×(RPKM of the adjacent layer). For the SVZ-IZ, we used 2×(average RPKM of CP and VZ).
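The layer-specificity rule can be written out as a small function. This is a sketch under one stated assumption: the SVZ-IZ is taken to be the adjacent layer for both the VZ and the CP. The function and parameter names are hypothetical.

```python
def specific_layers(rpkm, min_rpkm=64.0, fold=2.0):
    """Return the layers for which a gene meets the specificity rule.

    rpkm: dict with keys "VZ", "SVZ-IZ", and "CP" giving the gene's RPKM per layer.
    """
    vz, svz_iz, cp = rpkm["VZ"], rpkm["SVZ-IZ"], rpkm["CP"]

    # Reference expression that each layer must exceed by the fold threshold.
    reference = {
        "VZ": svz_iz,               # assumed adjacent layer
        "CP": svz_iz,               # assumed adjacent layer
        "SVZ-IZ": (cp + vz) / 2.0,  # average of CP and VZ, as stated in the text
    }
    return [
        layer for layer in ("VZ", "SVZ-IZ", "CP")
        if rpkm[layer] > min_rpkm and rpkm[layer] > fold * reference[layer]
    ]
```

For example, a gene with RPKM 200 in the VZ, 50 in the SVZ-IZ, and 10 in the CP would be called VZ-specific under this rule.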

3.4.4

Motif discovery and enrichment analysis

We compiled a library of 1,339 motifs (position weight matrices) that model the binding preferences for 646 transcription factors. We predicted binding sites using the motifs in the 6,629 candidate cerebral wall enhancers and in randomly selected [GC]-matched regions from the mouse genome at a motif match threshold of 0.9 [153]. Motif fold enrichment is the number of candidate enhancers with a match to the motif divided by the number of random regions with a motif match. To predict biological functions for the subset of enhancers with a match for each enriched motif, we performed a GREAT foreground/background test [135] with default association rules. Significance criteria are: fold >1.5, genes hit >10, FDR […]

[…] 10 cerebral wall enhancers, implying a great deal of cis-regulatory redundancy. I show that some enhancers that play a role in the development of the neocortex are conserved in species that lack a neocortex, and such elements existed prior to the emergence of the neocortex in the mammalian ancestor, implying neofunctionalization. Further, I show evidence that many developmental enhancers are pleiotropic, and evolutionarily older enhancers are more likely to be pleiotropic. In this final chapter, I will discuss how enhancer redundancy, neofunctionalization, and pleiotropy fit into our current understanding of enhancer evolution and add to the evo-devo framework.


CHAPTER 4. CONCLUSION

4.1

Enhancer evolution

As I showed in Chapter 2, cis-regulatory elements are much less likely to be deeply conserved than coding genes (Figure 2.1). This and other observations suggest that enhancers evolve more quickly and turn over more rapidly than the genes they target (at least on average). While a given gene may be conserved across species and the expression pattern of that gene also conserved, it is not necessarily the case that the cis-regulatory elements are also conserved [157]. For example, the receptor tyrosine kinase RET is expressed in specific structures during development, including the neural crest, adrenal medulla, and thyroid. This expression pattern is highly conserved across vertebrates. However, the RET locus contains no cis-regulatory sequence homologies between mammals and teleost fish. In a zebrafish developmental enhancer assay, cis sequences from the zebrafish RET locus drive expression patterns matching RET expression. Surprisingly, cis-regulatory elements from the human RET locus, when tested in zebrafish, also drive expression patterns that match zebrafish RET expression. Human and zebrafish cis-regulatory elements of RET, though functionally homologous, show no sequence homology [158].

ChIP-seq experiments have provided further evidence that conserved transcription factors may regulate conserved target genes through non-conserved binding events. ChIP-seq of four highly conserved liver-specific transcription factors, using human and mouse hepatocytes, showed that most binding events (up to ∼90%) are species-specific. Even for the subset of cases when the same factor is binding upstream of the same target gene, two thirds of binding events do not align [159]. Thus, one thought is that cis-regulatory elements are rapidly gained and lost over evolutionary timescales, even while gene regulatory relationships may be preserved.

How are cis-regulatory elements rapidly gained and lost? New genes arise through duplication events [160]. It is likely that many enhancers have arisen through duplication, though demonstrating paralogy of non-coding sequence has been a challenge [161]. As was seen in Chapter 3, new enhancers may also arise from the cooption of transposable elements (Figure 3.5) [45, 146, 162]. Lastly, enhancers may evolve de novo. While it is exceedingly unlikely for a gene to arise by chance through random


mutations, the ubiquity of transcription factor binding sites across the genome [14] suggests that new cis-regulatory elements may arise randomly. It is possible that the widespread binding of transcription factors to the genome not only allows new enhancers to arise de novo but also serves as a driving force for generating the high redundancy of cis-regulatory elements that was described in Chapter 3 (Figure 3.2B).

The mechanisms of enhancer loss are straightforward. An enhancer may be deleted, translocated away from its target gene, or deteriorate through mutation. High levels of enhancer redundancy likely relieve selective constraint, allowing for frequent enhancer loss.

Thus, I would propose a model where enhancers are rapidly gained and lost over evolutionary time scales. The rapid gain of enhancers allows for the accumulation of redundancy, which (as described in Chapter 3) may be beneficial for achieving more robust and precise expression patterns. Further, redundancy allows for frequent enhancer loss by relieving selective constraint. The dilemma of this model, however, is that it does not explain the striking prevalence of highly conserved cis-regulatory elements in the genome. If complex patterns of gene expression and gene regulatory networks can be maintained without conserving cis-regulatory sequence, why do thousands of putative cis-regulatory elements show clear conservation across vertebrates (Chapter 2) [25], an evolutionary timescale sufficiently long to allow each base of each element to mutate two times [27]? Why might two elements be conserved across the enormous distance spanned by the bilaterian tree, present in species as diverged as humans, acorn worms, sea urchins, and mollusks?

The answer to this question should not be surprising. Deep homology of developmental genes has been explained by pleiotropy [16], and I believe pleiotropy is the most likely explanation for deeply conserved enhancers. In Chapter 3, I provide some preliminary evidence that supports this idea. It also makes intuitive sense that those enhancers that operate in more than one context will evolve under greater selective constraint. Indeed, I would speculate that most deeply conserved enhancers are pleiotropic. Of 605 known enhancers that drive expression in embryonic day 11.5


mice, 275 drive reproducible expression in more than one tissue [34]. Fifty percent of the candidate cerebral wall enhancers that are conserved to fish show evidence of pleiotropy (Figure 3.4B). This pleiotropy analysis relies on comparisons to only four contexts of development at a single time point. If it were possible to assess pleiotropy over all spatiotemporal contexts, I suspect we would find that highly conserved developmental enhancers (like developmental genes) rarely serve a single function.

Figure 3.4B also offers the observation that evolutionarily young enhancers tend not to be pleiotropic. I suspect that when a new enhancer arises and fixes in a population, it is most likely to have a single function. Although it is possible for a new enhancer to have low specificity and thus function in multiple contexts, it is less likely that such an element would be selectively beneficial, and therefore such elements are less likely to fix in the population. Due to cis-regulatory redundancy and high turnover, single function enhancers are ultimately doomed to fade out of the genome, just as they faded in. However, those single function enhancers that gain new functions over time may escape this fate.

4.2

Adding to the evo-devo framework

At the crux of the evo-devo paradigm is the supposition that ancient developmental toolkit genes are pleiotropic but cis-regulatory elements are modular. Developmental pleiotropy generates strong selective constraints, preventing functional mutations in the coding sequence of toolkit genes. The primary mode of morphological evolution, therefore, is through changes in expression patterns of developmental genes. Such expression changes may occur through the gain, loss, or modification of cis-regulatory elements. In this simple but compelling model, a developmental gene that functions in the heart and limb will have an enhancer to drive heart expression and an enhancer to drive limb expression. A new limb morphology may arise if the limb enhancer is mutated to expand the limb expression domain of that gene (e.g. as with point mutations in the Shh enhancers [21]) or if the enhancer is deleted (e.g. as with deletion of the Pitx1 enhancer [59]). Further, such a gene may gain expression in a new tissue


altogether if it acquires a new enhancer through any of the mechanisms described in the previous section.

The work presented in Chapter 2 and Chapter 3 does not contradict the evo-devo model. Rather, it suggests that the evo-devo model is a simplification of a more complex reality. A developmental gene that functions in heart and limb patterning does not simply have a heart and limb enhancer but likely has several of each. Further, many of the most conserved regulators of that gene are likely themselves to be pleiotropic. The gene may gain expression in new tissues not only through the acquisition of new enhancers but also through the neofunctionalization of existing enhancers. Such neofunctionalization allows some enhancers to remain conserved over great timescales and is responsible for the increased likelihood of pleiotropy with greater evolutionary age.

I would argue that the neofunctionalization of enhancers is an inherent necessity of morphological evolution. This necessity can be best understood when we shift our focus from genes and enhancers to whole gene regulatory networks (GRNs). GRNs are the functional units that guide morphogenesis. When a new morphology evolves, it is not simply a matter of a new enhancer driving expression of a toolkit gene in a new tissue. Rather, sub-circuits of a complex GRN are coopted into a new context. Thus, as a new structure emerges (e.g. the vertebrate central nervous system), existing GRNs are utilized. When part or all of an existing GRN is utilized in a new context, the existing cis-regulatory apparatus that ties the GRN together becomes neofunctionalized. It is no coincidence that the most highly conserved non-coding elements in the genome appear near the most highly pleiotropic developmental genes (Supplementary Table A.1). I believe these elements play a role in GRNs that have been repeatedly coopted. Thus, such highly conserved elements have experienced repeated neofunctionalization and are likely to be highly pleiotropic. Ultimately, our focus will shift from toolkit genes to toolkit GRNs, and those toolkit GRNs will consist of both deeply conserved pleiotropic genes and deeply conserved pleiotropic enhancers.

Appendix A

Supplementary material for Chapter 2

A.1

Supplemental Discussion

A.1.1

Past studies

There have been several previous studies that have searched for cis-regulatory elements conserved between deuterostome and protostome species [163, 164, 165, 32, 63, 25, 64, 65]. To my knowledge, the earliest such study describes the HB1 element in the intron of Hoxa7 [163]. HB1 is a short 36 base pair sequence consisting of three homeodomain binding sites. The sequence is conserved among mammals but not well conserved in other vertebrates. Haerry and Gehring were able to align HB1 to Drosophila melanogaster, Drosophila funebris, and Drosophila virilis sequence [163, 164]. However, only 18-20 of the 36 bases (∼55%) match between mouse and fly. The longest contiguous match is 5 bases. While HB1 is found in the intron of Hoxa7, it is absent from the intron of Ubx, the Drosophila ortholog of Hoxa7. Instead, it is found in the 5′ UTR of Ubx [163]. Moreover, several other matches to HB1 were found in Drosophila, including matches around non-Hox genes [164]. Mammalian and Drosophila HB1 elements are unlikely to be truly orthologous sequences. Rather, HB1 elements represent a common mechanism for the regulation of genes by

A.1. SUPPLEMENTAL DISCUSSION

59

homeobox transcription factors. Similarly, Kuntz et al. has found short (Bicore2_human CTCCCACCCTGTTTTCCTCCCTCCCCCCCTTCTTTGGGCATCTCCACCCC TCCATCAATTGTCAATGTTCCTCGACCGCAATCAATCAGTTATTTGTCAG CTCTTGTCAATCCTCCCGTGATTTATGTCAGCTTTTGTTGCTGATTACAA GGCGGGTGCGACTTGAAGGGAAAAAGAGAGAGGGAGAGAGAGACGGAGAG GAAGGAGGAGATTGAGAGGGAACTGGAGGAGGGGAAAAGAGGAGCGGCCT


>Bicore2_zebrafish
AAGTGACTCTTCTTCTGGATGCTTCTTTTGCCCTTTTCTCTGTCACTCGC GGCACATAATCAGCTTCAAAAGCTGACGTGAAGTAAGAAAAGGATTGACA GCGACTGACAAATAACTGATTGATTGCGGTCGAGCAGAATTGACAGTTGA CGGAGGGGTGGGGATAGAAATAGAAGGGCGGTGTATATTATATTGAAAAC ATGTGACATTCACATCCCTCTATCACTGCATTGGTCTGCTCTTGT

>Bicore2_urchin
CGATTTATCTCGGGGAATCATACCCCCTTTCAAAAGCCGTTTTTCAAATT ACTCGAGCCATGTTATCAACAACAAAACGCTGACATAAATCAGACGGAAT TGACAAGGCTGACACCGCATTTGATTGATCGCAGAACAGCGAGCGGCACT GGACGCCTGCCAACGGGAATTGCTAATCGCGAATTTGATTGGCTGGCGCG GAGCATGTGACAGGATCACATATCAATCATCCCCG

>Bicore2_tick
CCGACGGCCGTCCACCCAAGCGGCCCAAAGGAAGACCGCCCCGGCTCTGA CAACAAAAGCTGACATAAATCATGCCTGATTGGCAGCTCTGACAACGGGC TGATGGATAGCGTTGTAAAAGTTGACAGCTGGCCCCACGCCAGATCATTA GCTCGGCCGGGCTTCCATTGGCCGCCGCGACCGCGGAGCGCTTCGGGGGG

Sequence tested in sea urchin transgenic enhancer assay:

>Bicore1_urchin_long
CGTGGAGGCGTCAAATAGTCTCTGCAATCTACAAGCGCAGACCTAACACA AAAATCCGATATAAAAGACGATTGCCATCTTATGGTCTTCATTTGTATTT TGACGGATTTTTTAGTAGAATATGAGCAGATAGTACTGATATGGTTTTGT AGACAAATAGGATATCATAATCCTGAAAATCTGTATTTTAAACTTCCTCA AAAGTTTTGGACTTTCTTAAAGGTCTGTAAAGTAGACTTGGCAATGTCTC AGGCGGGAGTGCAGGTGTGCAAGCGACGGGTGTGAATAGTGGGCGCCCCC AAAGACCCGACCACCAGTTTAGCAAAGTGACGATCCCAAAAGCCATTTAT


CGCTGCTCTCGATAGTTTGTTAGTAATTGATTCAGGCTGCGGGACGCCTT TGATGTTGGTGATTGGCGTGTGCGTTACTATGGCGACCGCCAGACCGGCG CCCGTCTGCTATCGGATTGTGGGTGGATGAATGGGTGACGTCACAGATCG TGGCGCCGGTCTGTCGCCGTGAGCTCTGGTAGTATTGGAGGGGGGCTTAG GAGGGCCAATTAATCACCTGTCACTGGGAACCAGCGGAGTTCCACCACAC CCCTAGCCTCAAAATACGAACCCCTGTACAACAGGTGGATGCCACAGACA ACTAACATTATCCATCCGAACATTCTGATACTCCAGATGATTAATGCAAC CATTCCAGAAATGAAGTCGCTCTATTTTAGGGTACTTTAAAAAGTTACGA AACAATAATAATGCAATAAGAATACCGAATGTATGGATACATTTTCGGTT TGAGACTACGAAGTCTCGGGGGAGGTGTATGAGCTTATGAGGGGTATGTT TACAGGTTTACCTATGGTAAAGAATAGGGTGAGTGTTTGGATTACCATCA TGTTCGTCTGCACAGCTGTGAACTGGTTCATGTGACCGCCATATACCAAG G

Sequence tested in mouse transgenic enhancer assay:

>Bicore2_human
CCAGAAGGAGCACCGGTGGGAGTGTGGACACCGGCCGGACTGTCAACTCC AGGGGCGAAGGGAACCTGCACACCCAGTGTTTTTTCCTTCGCAGTAATCG AGATCCCGCGCGGCGCAGCGCAGCCACCAGGGTAAGAAGGCAAGGTGGGG AGCCGGAGCTGGAAGAAGCCCGCCCGCCCGCTCTAATTTCCTCAGATTCC GCGGCGGAGAAACCAGAAGCTAGATGGGCAGTCGCAGCGGCGGCGGCTCA ACACCGCGAGGAGCGCTGGGCTCTCCGCCCTTCCCGGCCACGTGACGCCC GGGGACGCGTAGATTGGGGCAGCAGCGGGGGTCACATGTTTCCTCTGTTT CACCCTCAGTCTGTCCCCCAACCCCCCATTCTTACTCTCCCACCCTGTTT TCCTCCCTCCCCCCCTTCTTTGGGCATCTCCACCCCTCCATCAATTGTCA ATGTTCCTCGACCGCAATCAATCAGTTATTTGTCAGCTCTTGTCAATCCT CCCGTGATTTATGTCAGCTTTTGTTGCTGATTACAAGGCGGGTGCGACTT GAAGGGAAAAAGAGAGAGGGAGAGAGAGACGGAGAGGAAGGAGGAGATTG AGAGGGAACTGGAGGAGGGGAAAAGAGGAGCGGCCTCCTGGGATGGGGGT


GGGGTGGGGGCTCTAAGAAAAAGAATGAAAGAGGCGCACGGTGTCAGGAA AATGAATAGCGAGAGTAAAGTGCGCAGGTGCGCCCAGGGCGCCGAGAGGG GCGCGCAGGCCTGGAGTGTGCGCCTGCCCTCTCGGTGTCGGAGAGACGCC CTTCCACCTCTGGGAGCCTCGGTCTGTTGGGGTCGCGGAGTTCGGGCGCG GCTCCGGGTACCCGAGACCAGCGGCGGCAACTTCTAACACGGGAGATTTC CCGCCACCCCACCCCGCCGCCGCGAGTCCTCGCGGGGCGTGTTGCGTGCG GAGGTCAGGCTGCCACCCTCTGTAGTTCCCTAACCCCAAACTCGGAGACT

Appendix B

Supplementary material for Chapter 3

B.1 Supplementary Figures


[Plot: y-axis, fraction pleiotropic (0% to 25%); x-axis, clade.]

Figure B.1: Figure 3.4B excluding overlaps with E11.5 forebrain elements in measure of pleiotropy. Here, pleiotropy is quantified by overlap with p300 ChIP-seq peaks from E11.5 midbrain, limb, and heart.
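The overlap-based measure of pleiotropy described in this caption can be illustrated with a minimal sketch. It is not the pipeline used in this work: the function names, the half-open coordinate convention, and the toy intervals below are all assumptions made for illustration only. An element is counted as pleiotropic if it overlaps at least one peak from any of the supplied tissues.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def is_pleiotropic(element, peak_sets):
    """True if the element overlaps a peak in any of the given tissues.

    element   -- (chrom, start, end) tuple (hypothetical representation)
    peak_sets -- dict mapping tissue name to a list of (chrom, start, end) peaks
    """
    chrom, start, end = element
    for peaks in peak_sets.values():
        for p_chrom, p_start, p_end in peaks:
            if p_chrom == chrom and overlaps(start, end, p_start, p_end):
                return True
    return False


def fraction_pleiotropic(elements, peak_sets):
    """Fraction of elements that overlap at least one peak in any tissue."""
    if not elements:
        return 0.0
    hits = sum(is_pleiotropic(el, peak_sets) for el in elements)
    return hits / len(elements)


# Toy data (made up; not real ChIP-seq coordinates).
candidates = [
    ("chr1", 100, 200),   # overlaps the midbrain peak
    ("chr1", 500, 600),   # overlaps the heart peak
    ("chr2", 50, 150),    # only touches the limb peak boundary: no overlap
    ("chr3", 10, 20),     # no peaks on this chromosome
]
peaks_by_tissue = {
    "midbrain": [("chr1", 150, 250)],
    "limb": [("chr2", 150, 300)],
    "heart": [("chr1", 590, 700)],
}

print(fraction_pleiotropic(candidates, peaks_by_tissue))  # prints 0.5
```

The nested loops are quadratic and only suitable for small examples; for genome-scale peak sets, a sorted index or an interval tree would be the usual choice.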

B.2 Supplementary Tables

Table B.1: Primers used to amplify candidate cerebral wall enhancers for mouse transgenics experiments. 5′-CACC was added to each left primer.

Element  Left primer                Right primer
eltA     AACCCCGCTGTCTTCTAGGT       TTTTTAACAATGTTCACACACTTCA
eltB     GCGTTCACTTTCCCAGCTAC       CCTGATAAATGCTGAGGCTAGA
eltC     TGGTTGGAAGTTAGCTCTGATG     AAGAGACTGGGGTTGGAGGT
eltD     ATTCCGGAGCCCAGAAGTAT       CATGAAACCCTTCAAAATTGC
eltE     GTCCACGGAGACCAGAGGTA       CATGCACATGCACAAGAACA
eltF     GCATGCTTGTGCTATCCATT       CCCATCATGCATTCTGCTAA
eltG     TGGACTTCGAATAATGAGGACA     TGTTTTCTCTGGCTGTCTCAAA
eltH     TGTCAAAGGCAATTAATGAGAAAA   GAGATGGTTGCTTCATTGCAT
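As noted in the caption of Table B.1, a CACC tag was added to the 5′ end of each left primer. A minimal sketch of that step follows. The two left primers are taken from the table; the function name and the validation check are hypothetical and added only for illustration.

```python
# Left primers for two elements, copied from Table B.1.
LEFT_PRIMERS = {
    "eltA": "AACCCCGCTGTCTTCTAGGT",
    "eltB": "GCGTTCACTTTCCCAGCTAC",
}

FIVE_PRIME_TAG = "CACC"  # tag named in the table caption


def add_five_prime_tag(primer, tag=FIVE_PRIME_TAG):
    """Return the primer with the tag prepended at its 5' end.

    Hypothetical helper: rejects anything that is not a plain ACGT string.
    """
    primer = primer.upper()
    if set(primer) - set("ACGT"):
        raise ValueError(f"unexpected characters in primer: {primer!r}")
    return tag + primer


tagged = {name: add_five_prime_tag(seq) for name, seq in LEFT_PRIMERS.items()}

for name, seq in tagged.items():
    print(name, seq, len(seq))
```

Primer sequences are written 5′ to 3′ by convention, so prepending the tag to the string places it at the 5′ end.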

Bibliography

[1] S. L. Clarke, J. E. VanderMeer, A. M. Wenger, B. T. Schaar, N. Ahituv, and G. Bejerano. Human developmental enhancers conserved between deuterostomes and protostomes. PLoS Genet., 8(8):e1002852, 2012.

[2] Gregor Mendel. Versuche über Pflanzenhybriden. Verhandlungen des naturforschenden Vereines in Brünn, Bd. IV für das Jahr:3–47, 1866.

[3] Wilhelm Johannsen. Elemente der exakten Erblichkeitslehre. Verlag von Gustav Fischer in Jena, 1909.

[4] Ignacio Tinoco Jr., George Cahill, Charles Cantor, Thomas Caskey, Renato Dulbecco, Dean L. Engelhardt, Leroy Hood, Leonard S. Lerman, Mortimer L. Mendelsohn, Robert L. Sinsheimer, Temple Smith, Dieter Söll, Gary Stormo, and Raymond L. White. Report on the human genome initiative. Office of Health and Environmental Research, 1987.

[5] U.S. Congress and Office of Technology Assessment. Mapping our genes: the genome projects: how big, how fast? Washington, DC: U.S. Government Printing Office, OTA-BA-373, 1988.

[6] E. S. Lander, L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, R. Funke, D. Gage, K. Harris, A. Heaford, J. Howland, L. Kann, J. Lehoczky, R. LeVine, P. McEwan, K. McKernan, J. Meldrim, J. P. Mesirov, C. Miranda, W. Morris, J. Naylor, C. Raymond, M. Rosetti, R. Santos, A. Sheridan, C. Sougnez, N. Stange-Thomann,

N. Stojanovic, A. Subramanian, D. Wyman, J. Rogers, J. Sulston, R. Ainscough, S. Beck, D. Bentley, J. Burton, C. Clee, N. Carter, A. Coulson, R. Deadman, P. Deloukas, A. Dunham, I. Dunham, R. Durbin, L. French, D. Grafham, S. Gregory, T. Hubbard, S. Humphray, A. Hunt, M. Jones, C. Lloyd, A. McMurray, L. Matthews, S. Mercer, S. Milne, J. C. Mullikin, A. Mungall, R. Plumb, M. Ross, R. Shownkeen, S. Sims, R. H. Waterston, R. K. Wilson, L. W. Hillier, J. D. McPherson, M. A. Marra, E. R. Mardis, L. A. Fulton, A. T. Chinwalla, K. H. Pepin, W. R. Gish, S. L. Chissoe, M. C. Wendl, K. D. Delehaunty, T. L. Miner, A. Delehaunty, J. B. Kramer, L. L. Cook, R. S. Fulton, D. L. Johnson, P. J. Minx, S. W. Clifton, T. Hawkins, E. Branscomb, P. Predki, P. Richardson, S. Wenning, T. Slezak, N. Doggett, J. F. Cheng, A. Olsen, S. Lucas, C. Elkin, E. Uberbacher, M. Frazier, R. A. Gibbs, D. M. Muzny, S. E. Scherer, J. B. Bouck, E. J. Sodergren, K. C. Worley, C. M. Rives, J. H. Gorrell, M. L. Metzker, S. L. Naylor, R. S. Kucherlapati, D. L. Nelson, G. M. Weinstock, Y. Sakaki, A. Fujiyama, M. Hattori, T. Yada, A. Toyoda, T. Itoh, C. Kawagoe, H. Watanabe, Y. Totoki, T. Taylor, J. Weissenbach, R. Heilig, W. Saurin, F. Artiguenave, P. Brottier, T. Bruls, E. Pelletier, C. Robert, P. Wincker, D. R. Smith, L. Doucette-Stamm, M. Rubenfield, K. Weinstock, H. M. Lee, J. Dubois, A. Rosenthal, M. Platzer, G. Nyakatura, S. Taudien, A. Rump, H. Yang, J. Yu, J. Wang, G. Huang, J. Gu, L. Hood, L. Rowen, A. Madan, S. Qin, R. W. Davis, N. A. Federspiel, A. P. Abola, M. J. Proctor, R. M. Myers, J. Schmutz, M. Dickson, J. Grimwood, D. R. Cox, M. V. Olson, R. Kaul, C. Raymond, N. Shimizu, K. Kawasaki, S. Minoshima, G. A. Evans, M. Athanasiou, R. Schultz, B. A. Roe, F. Chen, H. Pan, J. Ramser, H. Lehrach, R. Reinhardt, W. R. McCombie, M. de la Bastide, N. Dedhia, H. Blocker, K. Hornischer, G. Nordsiek, R. Agarwala, L. Aravind, J. A. Bailey, A. Bateman, S. Batzoglou, E. Birney, P. Bork, D. G. Brown, C. B. 
Burge, L. Cerutti, H. C. Chen, D. Church, M. Clamp, R. R. Copley, T. Doerks, S. R. Eddy, E. E. Eichler, T. S. Furey, J. Galagan, J. G. Gilbert, C. Harmon, Y. Hayashizaki, D. Haussler, H. Hermjakob, K. Hokamp, W. Jang, L. S. Johnson, T. A. Jones, S. Kasif, A. Kaspryzk, S. Kennedy, W. J. Kent, P. Kitts, E. V. Koonin, I. Korf, D. Kulp, D. Lancet,


T. M. Lowe, A. McLysaght, T. Mikkelsen, J. V. Moran, N. Mulder, V. J. Pollara, C. P. Ponting, G. Schuler, J. Schultz, G. Slater, A. F. Smit, E. Stupka, J. Szustakowski, D. Thierry-Mieg, J. Thierry-Mieg, L. Wagner, J. Wallis, R. Wheeler, A. Williams, Y. I. Wolf, K. H. Wolfe, S. P. Yang, R. F. Yeh, F. Collins, M. S. Guyer, J. Peterson, A. Felsenfeld, K. A. Wetterstrand, A. Patrinos, M. J. Morgan, P. de Jong, J. J. Catanese, K. Osoegawa, H. Shizuya, S. Choi, Y. J. Chen, and J. Szustakowki. Initial sequencing and analysis of the human genome. Nature, 409(6822):860–921, February 2001.

[7] J. C. Venter, M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, H. O. Smith, M. Yandell, C. A. Evans, R. A. Holt, J. D. Gocayne, P. Amanatides, R. M. Ballew, D. H. Huson, J. R. Wortman, Q. Zhang, C. D. Kodira, X. H. Zheng, L. Chen, M. Skupski, G. Subramanian, P. D. Thomas, J. Zhang, G. L. Gabor Miklos, C. Nelson, S. Broder, A. G. Clark, J. Nadeau, V. A. McKusick, N. Zinder, A. J. Levine, R. J. Roberts, M. Simon, C. Slayman, M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern, S. Hannenhalli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco, P. Dunn, K. Eilbeck, C. Evangelista, A. E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu, P. Guan, T. J. Heiman, M. E. Higgins, R. R. Ji, Z. Ke, K. A. Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G. V. Merkulov, N. Milshina, H. M. Moore, A. K. Naik, V. A. Narayan, B. Neelam, D. Nusskern, D. B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Wang, A. Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang, H. Zhang, Q. Zhao, L. Zheng, F. Zhong, W. Zhong, S. Zhu, S. Zhao, D. Gilbert, S. Baumhueter, G. Spier, C. Carter, A. Cravchik, T. Woodage, F. Ali, H. An, A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow, K. Beeson, D. Busam, A. Carver, A. Center, M. L. Cheng, L. Curry, S. Danaher, L. Davenport, R. Desilets, S. Dietz, K. Dodson, L. Doup, S. Ferriera, N. Garg, A. Gluecksmann, B. Hart, J. Haynes,


C. Haynes, C. Heiner, S. Hladun, D. Hostin, J. Houck, T. Howland, C. Ibegwam, J. Johnson, F. Kalush, L. Kline, S. Koduru, A. Love, F. Mann, D. May, S. McCawley, T. McIntosh, I. McMullen, M. Moy, L. Moy, B. Murphy, K. Nelson, C. Pfannkoch, E. Pratts, V. Puri, H. Qureshi, M. Reardon, R. Rodriguez, Y. H. Rogers, D. Romblad, B. Ruhfel, R. Scott, C. Sitter, M. Smallwood, E. Stewart, R. Strong, E. Suh, R. Thomas, N. N. Tint, S. Tse, C. Vech, G. Wang, J. Wetter, S. Williams, M. Williams, S. Windsor, E. Winn-Deen, K. Wolfe, J. Zaveri, K. Zaveri, J. F. Abril, R. Guigo, M. J. Campbell, K. V. Sjolander, B. Karlak, A. Kejariwal, H. Mi, B. Lazareva, T. Hatton, A. Narechania, K. Diemer, A. Muruganujan, N. Guo, S. Sato, V. Bafna, S. Istrail, R. Lippert, R. Schwartz, B. Walenz, S. Yooseph, D. Allen, A. Basu, J. Baxendale, L. Blick, M. Caminha, J. Carnes-Stine, P. Caulk, Y. H. Chiang, M. Coyne, C. Dahlke, A. Mays, M. Dombroski, M. Donnelly, D. Ely, S. Esparham, C. Fosler, H. Gire, S. Glanowski, K. Glasser, A. Glodek, M. Gorokhov, K. Graham, B. Gropman, M. Harris, J. Heil, S. Henderson, J. Hoover, D. Jennings, C. Jordan, J. Jordan, J. Kasha, L. Kagan, C. Kraft, A. Levitsky, M. Lewis, X. Liu, J. Lopez, D. Ma, W. Majoros, J. McDaniel, S. Murphy, M. Newman, T. Nguyen, N. Nguyen, M. Nodell, S. Pan, J. Peck, M. Peterson, W. Rowe, R. Sanders, J. Scott, M. Simpson, T. Smith, A. Sprague, T. Stockwell, R. Turner, E. Venter, M. Wang, M. Wen, D. Wu, M. Wu, A. Xia, A. Zandieh, and X. Zhu. The sequence of the human genome. Science, 291(5507):1304–1351, February 2001.

[8] M. Clamp, B. Fry, M. Kamal, X. Xie, J. Cuff, M. F. Lin, M. Kellis, K. Lindblad-Toh, and E. S. Lander. Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl. Acad. Sci. U.S.A., 104(49):19428–19433, December 2007.

[9] C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 282(5396):2012–2018, December 1998.

[10] J. Yu, S. Hu, J. Wang, G. K. Wong, S. Li, B. Liu, Y. Deng, L. Dai, Y. Zhou, X. Zhang, M. Cao, J. Liu, J. Sun, J. Tang, Y. Chen, X. Huang, W. Lin, C. Ye, W. Tong, L. Cong, J. Geng, Y. Han, L. Li, W. Li, G. Hu, X. Huang, W. Li, J. Li, Z. Liu, L. Li, J. Liu, Q. Qi, J. Liu, L. Li, T. Li, X. Wang, H. Lu, T. Wu, M. Zhu, P. Ni, H. Han, W. Dong, X. Ren, X. Feng, P. Cui, X. Li, H. Wang, X. Xu, W. Zhai, Z. Xu, J. Zhang, S. He, J. Zhang, J. Xu, K. Zhang, X. Zheng, J. Dong, W. Zeng, L. Tao, J. Ye, J. Tan, X. Ren, X. Chen, J. He, D. Liu, W. Tian, C. Tian, H. Xia, Q. Bao, G. Li, H. Gao, T. Cao, J. Wang, W. Zhao, P. Li, W. Chen, X. Wang, Y. Zhang, J. Hu, J. Wang, S. Liu, J. Yang, G. Zhang, Y. Xiong, Z. Li, L. Mao, C. Zhou, Z. Zhu, R. Chen, B. Hao, W. Zheng, S. Chen, W. Guo, G. Li, S. Liu, M. Tao, J. Wang, L. Zhu, L. Yuan, and H. Yang. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science, 296(5565):79–92, April 2002.

[11] S. A. Goff, D. Ricke, T. H. Lan, G. Presting, R. Wang, M. Dunn, J. Glazebrook, A. Sessions, P. Oeller, H. Varma, D. Hadley, D. Hutchison, C. Martin, F. Katagiri, B. M. Lange, T. Moughamer, Y. Xia, P. Budworth, J. Zhong, T. Miguel, U. Paszkowski, S. Zhang, M. Colbert, W. L. Sun, L. Chen, B. Cooper, S. Park, T. C. Wood, L. Mao, P. Quail, R. Wing, R. Dean, Y. Yu, A. Zharkikh, R. Shen, S. Sahasrabudhe, A. Thomas, R. Cannings, A. Gutin, D. Pruss, J. Reid, S. Tavtigian, J. Mitchell, G. Eldredge, T. Scholl, R. M. Miller, S. Bhatnagar, N. Adey, T. Rubano, N. Tusneem, R. Robinson, J. Feldhaus, T. Macalma, A. Oliphant, and S. Briggs. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science, 296(5565):92–100, April 2002.

[12] Nicholas Wade. Reading the book of life; genome’s riddle: Few genes, much complexity. The New York Times, February 13, 2001.

[13] R. H. Waterston, K. Lindblad-Toh, E. Birney, J. Rogers, J. F. Abril, P. Agarwal, R. Agarwala, R. Ainscough, M. Alexandersson, P. An, S. E. Antonarakis, J. Attwood, R. Baertsch, J. Bailey, K. Barlow, S. Beck, E. Berry, B. Birren, T. Bloom, P. Bork, M. Botcherby, N. Bray, M. R. Brent, D. G. Brown, S. D. Brown, C. Bult, J. Burton, J.
Butler, R. D. Campbell, P. Carninci, S. Cawley, F. Chiaromonte, A. T. Chinwalla, D. M. Church, M. Clamp, C. Clee, F. S. Collins, L. L. Cook, R. R. Copley, A. Coulson, O. Couronne, J. Cuff, V. Curwen,


T. Cutts, M. Daly, R. David, J. Davies, K. D. Delehaunty, J. Deri, E. T. Dermitzakis, C. Dewey, N. J. Dickens, M. Diekhans, S. Dodge, I. Dubchak, D. M. Dunn, S. R. Eddy, L. Elnitski, R. D. Emes, P. Eswara, E. Eyras, A. Felsenfeld, G. A. Fewell, P. Flicek, K. Foley, W. N. Frankel, L. A. Fulton, R. S. Fulton, T. S. Furey, D. Gage, R. A. Gibbs, G. Glusman, S. Gnerre, N. Goldman, L. Goodstadt, D. Grafham, T. A. Graves, E. D. Green, S. Gregory, R. Guigo, M. Guyer, R. C. Hardison, D. Haussler, Y. Hayashizaki, L. W. Hillier, A. Hinrichs, W. Hlavina, T. Holzer, F. Hsu, A. Hua, T. Hubbard, A. Hunt, I. Jackson, D. B. Jaffe, L. S. Johnson, M. Jones, T. A. Jones, A. Joy, M. Kamal, E. K. Karlsson, D. Karolchik, A. Kasprzyk, J. Kawai, E. Keibler, C. Kells, W. J. Kent, A. Kirby, D. L. Kolbe, I. Korf, R. S. Kucherlapati, E. J. Kulbokas, D. Kulp, T. Landers, J. P. Leger, S. Leonard, I. Letunic, R. Levine, J. Li, M. Li, C. Lloyd, S. Lucas, B. Ma, D. R. Maglott, E. R. Mardis, L. Matthews, E. Mauceli, J. H. Mayer, M. McCarthy, W. R. McCombie, S. McLaren, K. McLay, J. D. McPherson, J. Meldrim, B. Meredith, J. P. Mesirov, W. Miller, T. L. Miner, E. Mongin, K. T. Montgomery, M. Morgan, R. Mott, J. C. Mullikin, D. M. Muzny, W. E. Nash, J. O. Nelson, M. N. Nhan, R. Nicol, Z. Ning, C. Nusbaum, M. J. O’Connor, Y. Okazaki, K. Oliver, E. Overton-Larty, L. Pachter, G. Parra, K. H. Pepin, J. Peterson, P. Pevzner, R. Plumb, C. S. Pohl, A. Poliakov, T. C. Ponce, C. P. Ponting, S. Potter, M. Quail, A. Reymond, B. A. Roe, K. M. Roskin, E. M. Rubin, A. G. Rust, R. Santos, V. Sapojnikov, B. Schultz, J. Schultz, M. S. Schwartz, S. Schwartz, C. Scott, S. Seaman, S. Searle, T. Sharpe, A. Sheridan, R. Shownkeen, S. Sims, J. B. Singer, G. Slater, A. Smit, D. R. Smith, B. Spencer, A. Stabenau, N. Stange-Thomann, C. Sugnet, M. Suyama, G. Tesler, J. Thompson, D. Torrents, E. Trevaskis, J. Tromp, C. Ucla, A. UretaVidal, J. P. Vinson, A. C. Von Niederhausern, C. M. Wade, M. Wall, R. J. Weber, R. B. Weiss, M. C. 
Wendl, A. P. West, K. Wetterstrand, R. Wheeler, S. Whelan, J. Wierzbowski, D. Willey, S. Williams, R. K. Wilson, E. Winter, K. C. Worley, D. Wyman, S. Yang, S. P. Yang, E. M. Zdobnov, M. C. Zody, and E. S. Lander. Initial sequencing and comparative analysis of the mouse genome. Nature, 420(6915):520–562, December 2002.


[14] I. Dunham, A. Kundaje, S. F. Aldred, P. J. Collins, C. A. Davis, F. Doyle, C. B. Epstein, S. Frietze, J. Harrow, R. Kaul, J. Khatun, B. R. Lajoie, S. G. Landt, B. K. Lee, F. Pauli, K. R. Rosenbloom, P. Sabo, A. Safi, A. Sanyal, N. Shoresh, J. M. Simon, L. Song, N. D. Trinklein, R. C. Altshuler, E. Birney, J. B. Brown, C. Cheng, S. Djebali, X. Dong, I. Dunham, J. Ernst, T. S. Furey, M. Gerstein, B. Giardine, M. Greven, R. C. Hardison, R. S. Harris, J. Herrero, M. M. Hoffman, S. Iyer, M. Kelllis, J. Khatun, P. Kheradpour, A. Kundaje, T. Lassman, Q. Li, X. Lin, G. K. Marinov, A. Merkel, A. Mortazavi, S. C. Parker, T. E. Reddy, J. Rozowsky, F. Schlesinger, R. E. Thurman, J. Wang, L. D. Ward, T. W. Whitfield, S. P. Wilder, W. Wu, H. S. Xi, K. Y. Yip, J. Zhuang, B. E. Bernstein, E. Birney, I. Dunham, E. D. Green, C. Gunter, M. Snyder, M. J. Pazin, R. F. Lowdon, L. A. Dillon, L. B. Adams, C. J. Kelly, J. Zhang, J. R. Wexler, E. D. Green, P. J. Good, E. A. Feingold, B. E. Bernstein, E. Birney, G. E. Crawford, J. Dekker, L. Elinitski, P. J. Farnham, M. Gerstein, M. C. Giddings, T. R. Gingeras, E. D. Green, R. Guigo, R. C. Hardison, T. J. Hubbard, M. Kellis, W. J. Kent, J. D. Lieb, E. H. Margulies, R. M. Myers, M. Snyder, J. A. Starnatoyannopoulos, S. A. Tennebaum, Z. Weng, K. P. White, B. Wold, J. Khatun, Y. Yu, J. Wrobel, B. A. Risk, H. P. Gunawardena, H. C. Kuiper, C. W. Maier, L. Xie, X. Chen, M. C. Giddings, B. E. Bernstein, C. B. Epstein, N. Shoresh, J. Ernst, P. Kheradpour, T. S. Mikkelsen, S. Gillespie, A. Goren, O. Ram, X. Zhang, L. Wang, R. Issner, M. J. Coyne, T. Durham, M. Ku, T. Truong, L. D. Ward, R. C. Altshuler, M. L. Eaton, M. Kellis, S. Djebali, C. A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer, J. Lagarde, W. Lin, F. Schlesinger, C. Xue, G. K. Marinov, J. Khatun, B. A. Williams, C. Zaleski, J. Rozowsky, M. Roder, F. Kokocinski, R. F. Abdelhamid, T. Alioto, I. Antoshechkin, M. T. Baer, P. Batut, I. Bell, K. Bell, S. 
Chakrabortty, X. Chen, J. Chrast, J. Curado, T. Derrien, J. Drenkow, E. Dumais, J. Dumais, R. Duttagupta, M. Fastuca, K. Fejes-Toth, P. Ferreira, S. Foissac, M. J. Fullwood, H. Gao, D. Gonzalez, A. Gordon, H. P. Gunawardena, C. Howald, S. Jha, R. Johnson, P. Kapranov, B. King, C. Kingswood, G. Li, O. J. Luo, E. Park, J. B. Preall, K. Presaud, P. Ribeca, B. A. Risk,


D. Robyr, X. Ruan, M. Sammeth, K. S. Sandu, L. Schaeffer, L. H. See, A. Shahab, J. Skancke, A. M. Suzuki, H. Takahashi, H. Tilgner, D. Trout, N. Walters, H. Wang, J. Wrobel, Y. Yu, Y. Hayashizaki, J. Harrow, M. Gerstein, T. J. Hubbard, A. Reymond, S. E. Antonarakis, G. J. Hannon, M. C. Giddings, Y. Ruan, B. Wold, P. Carninci, R. Guigo, T. R. Gingeras, K. R. Rosenbloom, C. A. Sloan, K. Learned, V. S. Malladi, M. C. Wong, G. P. Barber, M. S. Cline, T. R. Dreszer, S. G. Heitner, D. Karolchik, W. J. Kent, V. M. Kirkup, L. R. Meyer, J. C. Long, M. Maddren, B. J. Raney, T. S. Furey, L. Song, L. L. Grasfeder, P. G. Giresi, B. K. Lee, A. Battenhouse, N. C. Sheffield, J. M. Simon, K. A. Showers, A. Safi, D. London, A. A. Bhinge, C. Shestak, M. R. Schaner, S. K. Kim, Z. Z. Zhang, P. A. Mieczkowski, J. O. Mieczkowska, Z. Liu, R. M. McDaniell, Y. Ni, N. U. Rashid, M. J. Kim, S. Adar, Z. Zhang, T. Wang, D. Winter, D. Keefe, E. Birney, V. R. Iyer, J. D. Lieb, G. E. Crawford, G. Li, K. S. Sandhu, M. Zheng, P. Wang, O. J. Luo, A. Shahab, M. J. Fullwood, X. Ruan, Y. Ruan, R. M. Myers, F. Pauli, B. A. Williams, J. Gertz, G. K. Marinov, T. E. Reddy, J. Vielmetter, E. C. Partridge, D. Trout, K. E. Varley, C. Gasper, A. Bansal, S. Pepke, P. Jain, H. Amrhein, K. M. Bowling, M. Anaya, M. K. Cross, B. King, M. A. Muratet, I. Antoshechkin, K. M. Newberry, K. McCue, A. S. Nesmith, K. I. Fisher-Aylor, B. Pusey, G. DeSalvo, S. L. Parker, S. Balasubramanian, N. S. Davis, S. K. Meadows, T. Eggleston, C. Gunter, J. S. Newberry, S. E. Levy, D. M. Absher, A. Mortazavi, W. H. Wong, B. Wold, M. J. Blow, A. Visel, L. A. Pennachio, L. Elnitski, E. H. Margulies, S. C. Parker, H. M. Petrykowska, A. Abyzov, B. Aken, D. Barrell, G. Barson, A. Berry, A. Bignell, V. Boychenko, G. Bussotti, J. Chrast, C. Davidson, T. Derrien, G. Despacio-Reyes, M. Diekhans, I. Ezkurdia, A. Frankish, J. Gilbert, J. M. Gonzalez, E. Griffiths, R. Harte, D. A. Hendrix, C. Howald, T. Hunt, I. Jungreis, M. Kay, E. Khurana, F. 
Kokocinski, J. Leng, M. F. Lin, J. Loveland, Z. Lu, D. Manthravadi, M. Mariotti, J. Mudge, G. Mukherjee, C. Notredame, B. Pei, J. M. Rodriguez, G. Saunders, A. Sboner, S. Searle, C. Sisu, C. Snow, C. Steward, A. Tanzer, E. Tapanan, M. L. Tress, M. J. van Baren, N. Walters, S. Washieti, L. Wilming, A. Zadissa, Z. Zhengdong, M. Brent, D. Haussler, M. Kellis, A. Valencia,


M. Gerstein, A. Raymond, R. Guigo, J. Harrow, T. J. Hubbard, S. G. Landt, S. Frietze, A. Abyzov, N. Addleman, R. P. Alexander, R. K. Auerbach, S. Balasubramanian, K. Bettinger, N. Bhardwaj, A. P. Boyle, A. R. Cao, P. Cayting, A. Charos, Y. Cheng, C. Cheng, C. Eastman, G. Euskirchen, J. An integrated encyclopedia of DNA elements in the human genome. Nature, 489(7414):57–74, September 2012.

[15] Glenn A Maston, Sara K Evans, and Michael R Green. Transcriptional regulatory elements in the human genome. Annual Review of Genomics and Human Genetics, 7:29–59, 2006.

[16] Sean Carroll. Evo-devo and an expanding evolutionary synthesis: A genetic theory of morphological evolution. Cell, 134(1):25–36, July 2008.

[17] Isabelle S Peter and Eric H Davidson. Evolution of gene regulatory networks controlling body plan development. Cell, 144(6):970–985, March 2011.

[18] F. Spitz and E. E. Furlong. Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet., 13(9):613–626, September 2012.

[19] J. Banerji, S. Rusconi, and W. Schaffner. Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences. Cell, 27(2 Pt 1):299–308, December 1981.

[20] A. Visel, E. M. Rubin, and L. A. Pennacchio. Genomic views of distant-acting enhancers. Nature, 461(7261):199–205, September 2009.

[21] L. A. Lettice, S. J. Heaney, L. A. Purdie, L. Li, P. de Beer, B. A. Oostra, D. Goode, G. Elgar, R. E. Hill, and E. de Graaff. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet., 12(14):1725–1735, July 2003.

[22] G. Badis, M. F. Berger, A. A. Philippakis, S. Talukder, A. R. Gehrke, S. A. Jaeger, E. T. Chan, G. Metzler, A. Vedenko, X. Chen, H. Kuznetsov, C. F. Wang, D. Coburn, D. E. Newburger, Q. Morris, T. R. Hughes, and M. L. Bulyk. Diversity and complexity in DNA recognition by transcription factors. Science, 324(5935):1720–1723, June 2009.

[23] T. Maniatis, J. V. Falvo, T. H. Kim, T. K. Kim, C. H. Lin, B. S. Parekh, and M. G. Wathelet. Structure and function of the interferon-beta enhanceosome. Cold Spring Harb. Symp. Quant. Biol., 63:609–620, 1998.

[24] D. Panne. The enhanceosome. Curr. Opin. Struct. Biol., 18(2):236–242, April 2008.

[25] Adam Woolfe, Martin Goodson, Debbie K Goode, Phil Snell, Gayle K McEwen, Tanya Vavouri, Sarah F Smith, Phil North, Heather Callaway, Krys Kelly, Klaudia Walter, Irina Abnizova, Walter Gilks, Yvonne J K Edwards, Julie E Cooke, and Greg Elgar. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biology, 3(1):e7, January 2005.

[26] L. A. Pennacchio, N. Ahituv, A. M. Moses, S. Prabhakar, M. A. Nobrega, M. Shoukry, S. Minovitsky, I. Dubchak, A. Holt, K. D. Lewis, I. Plajzer-Frick, J. Akiyama, S. De Val, V. Afzal, B. L. Black, O. Couronne, M. B. Eisen, A. Visel, and E. M. Rubin. In vivo enhancer analysis of human conserved non-coding sequences. Nature, 444(7118):499–502, November 2006.

[27] W. Miller, K. Rosenbloom, R. C. Hardison, M. Hou, J. Taylor, B. Raney, R. Burhans, D. C. King, R. Baertsch, D. Blankenberg, S. L. Kosakovsky Pond, A. Nekrutenko, B. Giardine, R. S. Harris, S. Tyekucheva, M. Diekhans, T. H. Pringle, W. J. Murphy, A. Lesk, G. M. Weinstock, K. Lindblad-Toh, R. A. Gibbs, E. S. Lander, A. Siepel, D. Haussler, and W. J. Kent. 28-way vertebrate alignment and conservation track in the UCSC Genome Browser. Genome Res., 17(12):1797–1808, December 2007.

[28] G. Lunter, C. P. Ponting, and J. Hein. Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput. Biol., 2(1):e5, January 2006.


[29] E. V. Davydov, D. L. Goode, M. Sirota, G. M. Cooper, A. Sidow, and S. Batzoglou. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol., 6(12):e1001025, December 2010.

[30] K. S. Pollard, M. J. Hubisz, K. R. Rosenbloom, and A. Siepel. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res., 20(1):110–121, January 2010.

[31] Adam Siepel, Gill Bejerano, Jakob S Pedersen, Angie S Hinrichs, Minmei Hou, Kate Rosenbloom, Hiram Clawson, John Spieth, Ladeana W Hillier, Stephen Richards, George M Weinstock, Richard K Wilson, Richard A Gibbs, W James Kent, Webb Miller, and David Haussler. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Research, 15(8):1034–1050, August 2005.

[32] G. Bejerano, M. Pheasant, I. Makunin, S. Stephen, W. J. Kent, J. S. Mattick, and D. Haussler. Ultraconserved elements in the human genome. Science, 304(5675):1321–5, May 2004.

[33] Hiroshi Kikuta, Mary Laplante, Pavla Navratilova, Anna Z Komisarczuk, Pär G Engström, David Fredman, Altuna Akalin, Mario Caccamo, Ian Sealy, Kerstin Howe, Julien Ghislain, Guillaume Pezeron, Philippe Mourrain, Staale Ellingsen, Andrew C Oates, Christine Thisse, Bernard Thisse, Isabelle Foucher, Birgit Adolf, Andrea Geling, Boris Lenhard, and Thomas S Becker. Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. Genome Research, 17(5):545–555, May 2007.

[34] Axel Visel, Simon Minovitsky, Inna Dubchak, and Len A Pennacchio. VISTA enhancer browser–a database of tissue-specific human enhancers. Nucleic Acids Research, 35(Database issue):D88–92, January 2007.

[35] Len A Pennacchio, Nadav Ahituv, Alan M Moses, Shyam Prabhakar, Marcelo A Nobrega, Malak Shoukry, Simon Minovitsky, Inna Dubchak, Amy Holt, Keith D Lewis, Ingrid Plajzer-Frick, Jennifer Akiyama, Sarah De Val, Veena Afzal, Brian L Black, Olivier Couronne, Michael B Eisen, Axel Visel, and Edward M Rubin. In vivo enhancer analysis of human conserved non-coding sequences. Nature, 444(7118):499–502, November 2006.

[36] A. Bessis, N. Champtiaux, L. Chatelin, and J. P. Changeux. The neuron-restrictive silencer element: a dual enhancer/silencer crucial for patterned expression of a nicotinic receptor gene in the brain. Proc. Natl. Acad. Sci. U.S.A., 94(11):5906–5911, May 1997.

[37] Yibin Kang, Chang-Rung Chen, and Joan Massagué. A self-enabling TGFbeta response coupled to stress signaling: Smad engages stress response factor ATF3 for id1 repression in epithelial cells. Molecular Cell, 11(4):915–926, April 2003.

[38] K. Y. Kwan, M. M. Lam, Z. Krsnik, Y. I. Kawasawa, V. Lefebvre, and N. Sestan. SOX5 postmitotically regulates migration, postmigratory differentiation, and projections of subplate and deep-layer neocortical neurons. Proc. Natl. Acad. Sci. U.S.A., 105(41):16021–16026, October 2008.

[39] T. S. Furey. ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions. Nat. Rev. Genet., 13(12):840–852, December 2012.

[40] A. A. Bhinge, J. Kim, G. M. Euskirchen, M. Snyder, and V. R. Iyer. Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res., 17(6):910–916, June 2007.

[41] Axel Visel, Matthew J Blow, Zirong Li, Tao Zhang, Jennifer A Akiyama, Amy Holt, Ingrid Plajzer-Frick, Malak Shoukry, Crystal Wright, Feng Chen, Veena Afzal, Bing Ren, Edward M Rubin, and Len A Pennacchio. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature, 457(7231):854–858, February 2009.

[42] M. J. Blow, D. J. McCulley, Z. Li, T. Zhang, J. A. Akiyama, A. Holt, I. Plajzer-Frick, M. Shoukry, C. Wright, F. Chen, V. Afzal, J. Bristow, B. Ren, B. L. Black, E. M. Rubin, A. Visel, and L. A. Pennacchio. ChIP-Seq identification of weakly conserved heart enhancers. Nat Genet, 42(9):806–10, September 2010.

[43] D. May, M. J. Blow, T. Kaplan, D. J. McCulley, B. C. Jensen, J. A. Akiyama, A. Holt, I. Plajzer-Frick, M. Shoukry, C. Wright, V. Afzal, P. C. Simpson, E. M. Rubin, B. L. Black, J. Bristow, L. A. Pennacchio, and A. Visel. Large-scale discovery of enhancers from human heart tissue. Nat Genet, 44(1):89–93, January 2012.

[44] C. R. Lickwar, F. Mueller, S. E. Hanlon, J. G. McNally, and J. D. Lieb. Genome-wide protein-DNA binding dynamics suggest a molecular clutch for transcription factor function. Nature, 484(7393):251–255, April 2012.

[45] R. J. Britten and E. H. Davidson. Repetitive and non-repetitive DNA sequences and a speculation on the origins of evolutionary novelty. The Quarterly Review of Biology, 46(2):111–38, June 1971.

[46] Susumu Ohno. An argument for the genetic simplicity of man and other mammals. Journal of Human Evolution, 1(6):651–662, 1972.

[47] M. C. King and A. C. Wilson. Evolution at two levels in humans and chimpanzees. Science, 188(4184):107–116, April 1975.

[48] W McGinnis, R L Garber, J Wirz, A Kuroiwa, and W J Gehring. A homologous protein-coding sequence in Drosophila homeotic genes and its conservation in other metazoans. Cell, 37(2):403–408, June 1984.

[49] Mansi Srivastava, Emina Begovic, Jarrod Chapman, Nicholas H Putnam, Uffe Hellsten, Takeshi Kawashima, Alan Kuo, Therese Mitros, Asaf Salamov, Meredith L Carpenter, Ana Y Signorovitch, Maria A Moreno, Kai Kamm, Jane Grimwood, Jeremy Schmutz, Harris Shapiro, Igor V Grigoriev, Leo W Buss, Bernd Schierwater, Stephen L Dellaporta, and Daniel S Rokhsar. The Trichoplax genome and the nature of placozoans. Nature, 454(7207):955–960, August 2008.


[50] Mansi Srivastava, Oleg Simakov, Jarrod Chapman, Bryony Fahey, Marie E A Gauthier, Therese Mitros, Gemma S Richards, Cecilia Conaco, Michael Dacre, Uffe Hellsten, Claire Larroux, Nicholas H Putnam, Mario Stanke, Maja Adamska, Aaron Darling, Sandie M Degnan, Todd H Oakley, David C Plachetzki, Yufeng Zhai, Marcin Adamski, Andrew Calcino, Scott F Cummins, David M Goodstein, Christina Harris, Daniel J Jackson, Sally P Leys, Shengqiang Shu, Ben J Woodcroft, Michel Vervoort, Kenneth S Kosik, Gerard Manning, Bernard M Degnan, and Daniel S Rokhsar. The amphimedon queenslandica genome and the evolution of animal complexity. Nature, 466(7307):720–726, August 2010.

[51] G. Aspock, H. Kagoshima, G. Niklaus, and T. R. Burglin. Caenorhabditis elegans has scores of hedgehog-related genes: sequence and expression analysis. Genome Res., 9(10):909–923, October 1999.

[52] T. R. Burglin. Evolution of hedgehog and hedgehog-related genes, their origin from Hog proteins in ancestral eukaryotes and discovery of a novel Hint motif. BMC Genomics, 9:127, 2008.

[53] C. Chiang, Y. Litingtung, E. Lee, K. E. Young, J. L. Corden, H. Westphal, and P. A. Beachy. Cyclopia and defective axial patterning in mice lacking Sonic hedgehog gene function. Nature, 383(6599):407–413, October 1996.

[54] M. P. Matise and H. Wang. Sonic hedgehog signaling in the developing CNS where it has been and where it is going. Curr. Top. Dev. Biol., 97:75–117, 2011.

[55] M. Ramalho-Santos, D. A. Melton, and A. P. McMahon. Hedgehog signals regulate multiple aspects of gastrointestinal development. Development, 127(12):2763–2772, June 2000.

[56] J. Zhu, E. Nakamura, M. T. Nguyen, X. Bao, H. Akiyama, and S. Mackem. Uncoupling Sonic hedgehog control of pattern and expansion of the developing limb bud. Dev. Cell, 14(4):624–632, April 2008.

[57] T. Sagai, M. Hosoya, Y. Mizushina, M. Tamura, and T. Shiroishi. Elimination of a long-range cis-regulatory module causes complete loss of limb-specific Shh expression and truncation of the mouse limb. Development, 132(4):797–803, February 2005.

[58] H. G. Belting, C. S. Shashikant, and F. H. Ruddle. Modification of expression and cis-regulation of Hoxc8 in the evolution of diverged axial morphology. Proc. Natl. Acad. Sci. U.S.A., 95(5):2355–2360, March 1998.

[59] Y. F. Chan, M. E. Marks, F. C. Jones, G. Villarreal, M. D. Shapiro, S. D. Brady, A. M. Southwick, D. M. Absher, J. Grimwood, J. Schmutz, R. M. Myers, D. Petrov, B. Jonsson, D. Schluter, M. A. Bell, and D. M. Kingsley. Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science, 327(5963):302–305, January 2010.

[60] C. Y. McLean, P. L. Reno, A. A. Pollen, A. I. Bassan, T. D. Capellini, C. Guenther, V. B. Indjeian, X. Lim, D. B. Menke, B. T. Schaar, A. M. Wenger, G. Bejerano, and D. M. Kingsley. Human-specific loss of regulatory DNA and the evolution of human-specific traits. Nature, 471(7337):216–219, March 2011.

[61] Casey W Dunn, Andreas Hejnol, David Q Matus, Kevin Pang, William E Browne, Stephen A Smith, Elaine Seaver, Greg W Rouse, Matthias Obst, Gregory D Edgecombe, Martin V Sørensen, Steven H D Haddock, Andreas Schmidt-Rhaesa, Akiko Okusu, Reinhardt Møbjerg Kristensen, Ward C Wheeler, Mark Q Martindale, and Gonzalo Giribet. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature, 452(7188):745–749, April 2008.

[62] Evgeny A Glazov, Michael Pheasant, Elizabeth A McGraw, Gill Bejerano, and John S Mattick. Ultraconserved elements in insect genomes: a highly conserved intronic sequence implicated in the control of homothorax mRNA splicing. Genome Research, 15(6):800–808, June 2005.

[63] International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature, 432(7018):695–716, December 2004.

[64] Tanya Vavouri, Klaudia Walter, Walter R Gilks, Ben Lehner, and Greg Elgar. Parallel evolution of conserved non-coding elements that target a common set of developmental regulatory genes from worms to humans. Genome Biology, 8(2):R15, 2007.

[65] José Luis Royo, Ignacio Maeso, Manuel Irimia, Feng Gao, Isabelle S Peter, Carla S Lopes, Salvatore D’Aniello, Fernando Casares, Eric H Davidson, Jordi Garcia-Fernàndez, and José Luis Gómez-Skarmeta. Transphyletic conservation of developmental regulatory state in animal evolution. Proceedings of the National Academy of Sciences of the United States of America, 108(34):14186–14191, August 2011.

[66] R.S. Harris. Improved pairwise alignment of genomic DNA. PhD thesis, The Pennsylvania State University, 2007.

[67] Paramvir Dehal, Yutaka Satou, Robert K Campbell, Jarrod Chapman, Bernard Degnan, Anthony De Tomaso, Brad Davidson, Anna Di Gregorio, Maarten Gelpke, David M Goodstein, Naoe Harafuji, Kenneth E M Hastings, Isaac Ho, Kohji Hotta, Wayne Huang, Takeshi Kawashima, Patrick Lemaire, Diego Martinez, Ian A Meinertzhagen, Simona Necula, Masaru Nonaka, Nik Putnam, Sam Rash, Hidetoshi Saiga, Masanobu Satake, Astrid Terry, Lixy Yamada, Hong-Gang Wang, Satoko Awazu, Kaoru Azumi, Jeffrey Boore, Margherita Branno, Stephen Chin-Bow, Rosaria DeSantis, Sharon Doyle, Pilar Francino, David N Keys, Shinobu Haga, Hiroko Hayashi, Kyosuke Hino, Kaoru S Imai, Kazuo Inaba, Shungo Kano, Kenji Kobayashi, Mari Kobayashi, Byung-In Lee, Kazuhiro W Makabe, Chitra Manohar, Giorgio Matassi, Monica Medina, Yasuaki Mochizuki, Steve Mount, Tomomi Morishita, Sachiko Miura, Akie Nakayama, Satoko Nishizaka, Hisayo Nomoto, Fumiko Ohta, Kazuko
Oishi, Isidore Rigoutsos, Masako Sano, Akane Sasaki, Yasunori Sasakura, Eiichi Shoguchi, Tadasu Shin-i, Antoinetta Spagnuolo, Didier Stainier, Miho M Suzuki, Olivier Tassy, Naohito Takatori, Miki Tokuoka, Kasumi Yagi, Fumiko Yoshizaki, Shuichi Wada, Cindy Zhang, P Douglas Hyatt, Frank Larimer, Chris Detter, Norman Doggett, Tijana Glavina, Trevor Hawkins, Paul Richardson, Susan Lucas, Yuji Kohara, Michael Levine, Nori Satoh, and Daniel S Rokhsar. The draft genome of ciona intestinalis: insights into chordate and vertebrate origins. Science (New York, N.Y.), 298(5601):2157–2167, December 2002.

[68] Nicholas H Putnam, Thomas Butts, David E K Ferrier, Rebecca F Furlong, Uffe Hellsten, Takeshi Kawashima, Marc Robinson-Rechavi, Eiichi Shoguchi, Astrid Terry, Jr-Kai Yu, E Lia Benito-Gutiérrez, Inna Dubchak, Jordi Garcia-Fernàndez, Jeremy J Gibson-Brown, Igor V Grigoriev, Amy C Horton, Pieter J de Jong, Jerzy Jurka, Vladimir V Kapitonov, Yuji Kohara, Yoko Kuroki, Erika Lindquist, Susan Lucas, Kazutoyo Osoegawa, Len A Pennacchio, Asaf A Salamov, Yutaka Satou, Tatjana Sauka-Spengler, Jeremy Schmutz, Tadasu Shin-I, Atsushi Toyoda, Marianne Bronner-Fraser, Asao Fujiyama, Linda Z Holland, Peter W H Holland, Nori Satoh, and Daniel S Rokhsar. The amphioxus genome and the evolution of the chordate karyotype. Nature, 453(7198):1064–1071, June 2008.

[69] Erica Sodergren, George M Weinstock, Eric H Davidson, R Andrew Cameron, Richard A Gibbs, Robert C Angerer, Lynne M Angerer, Maria Ina Arnone, David R Burgess, Robert D Burke, James A Coffman, Michael Dean, Maurice R Elphick, Charles A Ettensohn, Kathy R Foltz, Amro Hamdoun, Richard O Hynes, William H Klein, William Marzluff, David R McClay, Robert L Morris, Arcady Mushegian, Jonathan P Rast, L Courtney Smith, Michael C Thorndyke, Victor D Vacquier, Gary M Wessel, Greg Wray, Lan Zhang, Christine G Elsik, Olga Ermolaeva, Wratko Hlavina, Gretchen Hofmann, Paul Kitts, Melissa J Landrum, Aaron J Mackey, Donna Maglott, Georgia Panopoulou, Albert J Poustka, Kim Pruitt, Victor Sapojnikov, Xingzhi Song, Alexandre Souvorov, Victor Solovyev, Zheng Wei, Charles A Whittaker, Kim Worley, K James

Durbin, Yufeng Shen, Olivier Fedrigo, David Garfield, Ralph Haygood, Alexander Primus, Rahul Satija, Tonya Severson, Manuel L Gonzalez-Garay, Andrew R Jackson, Aleksandar Milosavljevic, Mark Tong, Christopher E Killian, Brian T Livingston, Fred H Wilt, Nikki Adams, Robert Bell, Seth Carbonneau, Rocky Cheung, Patrick Cormier, Bertrand Cosson, Jenifer Croce, Antonio Fernandez-Guerra, Anne-Marie Genevière, Manisha Goel, Hemant Kelkar, Julia Morales, Odile Mulner-Lorillon, Anthony J Robertson, Jared V Goldstone, Bryan Cole, David Epel, Bert Gold, Mark E Hahn, Meredith Howard-Ashby, Mark Scally, John J Stegeman, Erin L Allgood, Jonah Cool, Kyle M Judkins, Shawn S McCafferty, Ashlan M Musante, Robert A Obar, Amanda P Rawson, Blair J Rossetti, Ian R Gibbons, Matthew P Hoffman, Andrew Leone, Sorin Istrail, Stefan C Materna, Manoj P Samanta, Viktor Stolc, Waraporn Tongprasit, Qiang Tu, Karl-Frederik Bergeron, Bruce P Brandhorst, James Whittle, Kevin Berney, David J Bottjer, Cristina Calestani, Kevin Peterson, Elly Chow, Qiu Autumn Yuan, Eran Elhaik, Dan Graur, Justin T Reese, Ian Bosdet, Shin Heesun, Marco A Marra, Jacqueline Schein, Michele K Anderson, Virginia Brockton, Katherine M Buckley, Avis H Cohen, Sebastian D Fugmann, Taku Hibino, Mariano Loza-Coll, Audrey J Majeske, Cynthia Messier, Sham V Nair, Zeev Pancer, David P Terwilliger, Cavit Agca, Enrique Arboleda, Nansheng Chen, Allison M Churcher, F Hallböök, Glen W Humphrey, Mohammed M Idris, Takae Kiyama, Shuguang Liang, Dan Mellott, Xiuqian Mu, Greg Murray, Robert P Olinski, Florian Raible, Matthew Rowe, John S Taylor, Kristin Tessmar-Raible, D Wang, Karen H Wilson, Shunsuke Yaguchi, Terry Gaasterland, Blanca E Galindo, Herath J Gunaratne, Celina Juliano, Masashi Kinukawa, Gary W Moy, Anna T Neill, Mamoru Nomura, Michael Raisch, Anna Reade, Michelle M Roux, Jia L Song, Yi-Hsien Su, Ian K Townley, Ekaterina Voronina, Julian L Wong, Gabriele Amore, Margherita Branno, Euan R Brown, Vincenzo Cavalieri, Véronique Duboc, Louise Duloquin, Constantin Flytzanis, Christian Gache, François Lapraz, Thierry Lepage, Annamaria Locascio, Pedro Martinez, Giorgio Matassi, Valeria Matranga, Ryan Range, Francesca Rizzo, Eric Röttinger, Wendy Beane, Cynthia Bradham, Christine
Byrum, Tom Glenn, Sofia Hussain, Gerard Manning, Esther Miranda, Rebecca Thomason, Katherine Walton, Athula Wikramanayke, Shu-Yu Wu, Ronghui Xu, C Titus Brown, Lili Chen, Rachel F Gray, Pei Yun Lee, Jongmin Nam, Paola Oliveri, Joel Smith, Donna Muzny, Stephanie Bell, Joseph Chacko, Andrew Cree, Stacey Curry, Clay Davis, Huyen Dinh, Shannon Dugan-Rocha, Jerry Fowler, Rachel Gill, Cerrissa Hamilton, Judith Hernandez, Sandra Hines, Jennifer Hume, Laronda Jackson, Angela Jolivet, Christie Kovar, Sandra Lee, Lora Lewis, George Miner, Margaret Morgan, Lynne V Nazareth, Geoffrey Okwuonu, David Parker, Ling-Ling Pu, Rachel Thorn, and Rita Wright. The genome of the sea urchin strongylocentrotus purpuratus. Science (New York, N.Y.), 314(5801):941–952, November 2006.

[70] Y Yokota. Id and development. Oncogene, 20(58):8290–8298, December 2001.

[71] Teresa López-Rovira, Elisabet Chalaux, Joan Massagué, Jose Luis Rosa, and Francesc Ventura. Direct binding of smad1 and smad4 to two distinct motifs mediates bone morphogenetic protein-specific transcriptional activation of id1 gene. The Journal of Biological Chemistry, 277(5):3176–3185, February 2002.

[72] Olexander Korchynskyi and Peter ten Dijke. Identification and functional characterization of distinct critically important bone morphogenetic protein-specific response elements in the id1 promoter. The Journal of Biological Chemistry, 277(7):4883–4891, February 2002.

[73] Claudia M Mizutani and Ethan Bier. EvoD/Vo: the origins of BMP signalling in the neuroectoderm. Nat Rev Genet, 9(9):663–677, 2008.

[74] Chang-Rung Chen, Yibin Kang, Peter M Siegel, and Joan Massagué. E2F4/5 and p107 as smad cofactors linking the TGFbeta receptor to c-myc repression. Cell, 110(1):19–32, July 2002.

[75] Brian J Raney, Melissa S Cline, Kate R Rosenbloom, Timothy R Dreszer, Katrina Learned, Galt P Barber, Laurence R Meyer, Cricket A Sloan, Venkat S
Malladi, Krishna M Roskin, Bernard B Suh, Angie S Hinrichs, Hiram Clawson, Ann S Zweig, Vanessa Kirkup, Pauline A Fujita, Brooke Rhead, Kayla E Smith, Andy Pohl, Robert M Kuhn, Donna Karolchik, David Haussler, and W James Kent. ENCODE whole-genome data in the UCSC genome browser (2011 update). Nucleic Acids Research, 39(Database issue):D871–875, January 2011.

[76] Kevin J Peterson, James A Cotton, James G Gehling, and Davide Pisani. The ediacaran emergence of bilaterians: congruence between the genetic and the geological fossil records. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 363(1496):1435–1443, April 2008.

[77] G.J. Rauch, D.A. Lyons, I. Middendorf, B. Friedlander, N. Arana, T. Reyes, and W.S. Talbot. Submission and curation of gene expression data. ZFIN direct data submission, 2003.

[78] Paul A Gray, Hui Fu, Ping Luo, Qing Zhao, Jing Yu, Annette Ferrari, Toyoaki Tenzen, Dong-In Yuk, Eric F Tsung, Zhaohui Cai, John A Alberta, Le-Ping Cheng, Yang Liu, Jan M Stenman, M Todd Valerius, Nathan Billings, Haesun A Kim, Michael E Greenberg, Andrew P McMahon, David H Rowitch, Charles D Stiles, and Qiufu Ma. Mouse brain organization revealed through direct genome-scale TF expression analysis. Science (New York, N.Y.), 306(5705):2255–2257, December 2004.

[79] Roger Revilla-i Domingo, Takuya Minokawa, and Eric H Davidson. R11: a cis-regulatory node of the sea urchin embryo gene network that controls early expression of SpDelta in micromeres. Developmental Biology, 274(2):438–451, October 2004.

[80] Alexandra Saudemont, Emmanuel Haillot, Flavien Mekpoh, Nathalie Bessodes, Magali Quirin, François Lapraz, Véronique Duboc, Eric Röttinger, Ryan Range, Arnaud Oisel, Lydia Besnardeau, Patrick Wincker, and Thierry Lepage. Ancestral regulatory circuits governing ectoderm patterning downstream of nodal
and BMP2/4 revealed by gene regulatory network analysis in an echinoderm. PLoS Genetics, 6(12):e1001259, 2010.

[81] P Y Cheah, Y B Meng, X Yang, D Kimbrell, M Ashburner, and W Chia. The drosophila l(2)35Ba/nocA gene encodes a putative zn finger protein involved in the development of the embryonic brain and the adult ocellar structures. Molecular and Cellular Biology, 14(2):1487–1499, February 1994.

[82] Alexander P Runko and Charles G Sagerström. Isolation of nlz2 and characterization of essential domains in nlz family proteins. The Journal of Biological Chemistry, 279(12):11917–11925, March 2004.

[83] Alexander P Runko and Charles G Sagerström. Nlz belongs to a family of zinc-finger-containing repressors and controls segmental gene expression in the zebrafish hindbrain. Developmental Biology, 262(2):254–267, October 2003.

[84] Jacqueline Hoyle, Yixin P Tang, Elizabeth L Wiellette, Fiona C Wardle, and Hazel Sive. nlz gene family is required for hindbrain patterning in the zebrafish. Developmental Dynamics: An Official Publication of the American Association of Anatomists, 229(4):835–846, April 2004.

[85] N Vlachakis, S K Choe, and C G Sagerström. Meis3 synergizes with pbx4 and hoxb1b in promoting hindbrain fates in the zebrafish. Development (Cambridge, England), 128(8):1299–1312, April 2001.

[86] Seong-Kyu Choe, Nikolaos Vlachakis, and Charles G. Sagerström. Meis family proteins are required for hindbrain development in the zebrafish. Development, 129(3):585–595, February 2002.

[87] H D Ryoo, T Marty, F Casares, M Affolter, and R S Mann. Regulation of hox target genes by a DNA bound Homothorax/Hox/Extradenticle complex. Development (Cambridge, England), 126(22):5137–5148, November 1999.

[88] Yuan Jiang, Herong Shi, and Jun Liu. Two hox cofactors, the Meis/Hth homolog UNC-62 and the Pbx/Exd homolog CEH-20, function together during
c. elegans postembryonic mesodermal development. Developmental Biology, 334(2):535–546, October 2009.

[89] Christian P Petersen and Peter W Reddien. Wnt signaling and the polarity of the primary body axis. Cell, 139(6):1056–1068, December 2009.

[90] Muriel Rhinn, Klaus Lun, Marta Luz, Michaela Werner, and Michael Brand. Positioning of the midbrain-hindbrain boundary organizer through global posteriorization of the neuroectoderm mediated by wnt8 signaling. Development (Cambridge, England), 132(6):1261–1272, March 2005.

[91] Edwina McGlinn, Joy M Richman, Vicki Metzis, Liam Town, Natalie C Butterfield, Brandon J Wainwright, and Carol Wicking. Expression of the NET family member zfp503 is regulated by hedgehog and BMP signaling in the limb. Developmental Dynamics: An Official Publication of the American Association of Anatomists, 237(4):1172–1182, April 2008.

[92] E H Davidson, R A Cameron, and A Ransick. Specification of cell fate in the sea urchin embryo: summary and some proposed mechanisms. Development (Cambridge, England), 125(17):3269–3290, September 1998.

[93] P Cubas, J Modolell, and M Ruiz-Gómez. The helix-loop-helix extramacrochaetae protein is required for proper specification of many cell types in the drosophila embryo. Development (Cambridge, England), 120(9):2555–2566, September 1994.

[94] Y Tomoyasu, M Nakamura, and N Ueno. Role of dpp signalling in prepattern formation of the dorsocentral mechanosensory organ in drosophila melanogaster. Development (Cambridge, England), 125(21):4215–4224, November 1998.

[95] Linda Z Holland. Chordate roots of the vertebrate nervous system: expanding the molecular toolkit. Nature Reviews. Neuroscience, 10(10):736–746, October 2009.

[96] Detlev Arendt, Alexandru S Denes, Gáspár Jékely, and Kristin Tessmar-Raible. The evolution of nervous system centralization. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 363(1496):1523–1528, April 2008.

[97] Florian Raible, Kristin Tessmar-Raible, Kazutoyo Osoegawa, Patrick Wincker, Claire Jubin, Guillaume Balavoine, David Ferrier, Vladimir Benes, Pieter de Jong, Jean Weissenbach, Peer Bork, and Detlev Arendt. Vertebrate-type intron-rich genes in the marine annelid platynereis dumerilii. Science (New York, N.Y.), 310(5752):1325–1326, November 2005.

[98] Nicholas H Putnam, Mansi Srivastava, Uffe Hellsten, Bill Dirks, Jarrod Chapman, Asaf Salamov, Astrid Terry, Harris Shapiro, Erika Lindquist, Vladimir V Kapitonov, Jerzy Jurka, Grigory Genikhovich, Igor V Grigoriev, Susan M Lucas, Robert E Steele, John R Finnerty, Ulrich Technau, Mark Q Martindale, and Daniel S Rokhsar. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science (New York, N.Y.), 317(5834):86–94, July 2007.

[99] France Denoeud, Simon Henriet, Sutada Mungpakdee, Jean-Marc Aury, Corinne Da Silva, Henner Brinkmann, Jana Mikhaleva, Lisbeth Charlotte Olsen, Claire Jubin, Cristian Cañestro, Jean-Marie Bouquet, Gemma Danks, Julie Poulain, Coen Campsteijn, Marcin Adamski, Ismael Cross, Fekadu Yadetie, Matthieu Muffato, Alexandra Louis, Stephen Butcher, Georgia Tsagkogeorga, Anke Konrad, Sarabdeep Singh, Marit Flo Jensen, Evelyne Huynh Cong, Helen Eikeseth-Otteraa, Benjamin Noel, Véronique Anthouard, Betina M Porcel, Rym Kachouri-Lafond, Atsuo Nishino, Matteo Ugolini, Pascal Chourrout, Hiroki Nishida, Rein Aasland, Snehalata Huzurbazar, Eric Westhof, Frédéric Delsuc, Hans Lehrach, Richard Reinhardt, Jean Weissenbach, Scott W Roy, François Artiguenave, John H Postlethwait, J Robert Manak, Eric M Thompson, Olivier Jaillon, Louis Du Pasquier, Pierre Boudinot, David A Liberles, Jean-Nicolas Volff, Hervé Philippe, Boris Lenhard, Hugues Roest Crollius, Patrick
Wincker, and Daniel Chourrout. Plasticity of animal genome architecture unmasked by rapid evolution of a pelagic tunicate. Science (New York, N.Y.), 330(6009):1381–1385, December 2010.

[100] A E Pasquinelli, B J Reinhart, F Slack, M Q Martindale, M I Kuroda, B Maller, D C Hayward, E E Ball, B Degnan, P Müller, J Spring, A Srinivasan, M Fishman, J Finnerty, J Corbo, M Levine, P Leahy, E Davidson, and G Ruvkun. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature, 408(6808):86–89, November 2000.

[101] Simon E Prochnik, Daniel S Rokhsar, and A Aziz Aboobaker. Evidence for a microRNA expansion in the bilaterian ancestor. Development Genes and Evolution, 217(1):73–77, January 2007.

[102] O. Hallikas, K. Palin, N. Sinjushina, R. Rautiainen, J. Partanen, E. Ukkonen, and J. Taipale. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell, 124(1):47–59, January 2006.

[103] H. Meng, A. Banerjee, and L. Zhou. BLISS: binding site level identification of shared signal-modules in DNA regulatory sequences. BMC Bioinformatics, 7:287, 2006.

[104] A. M. Wenger, S. L. Clarke, H. Guturu, J. Chen, B. T. Schaar, C. Y. McLean, and G. Bejerano. PRISM offers a comprehensive genomic approach to transcription factor function prediction. Genome Res., 23(5):889–904, May 2013.

[105] F. Hsu, W. J. Kent, H. Clawson, R. M. Kuhn, M. Diekhans, and D. Haussler. The UCSC known genes. Bioinformatics, 22(9):1036–46, May 2006.

[106] Kim D Pruitt, Tatiana Tatusova, and Donna R Maglott. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research, 35(Database issue):D61–65, January 2007.

[107] T Hubbard, D Barker, E Birney, G Cameron, Y Chen, L Clark, T Cox, J Cuff, V Curwen, T Down, R Durbin, E Eyras, J Gilbert, M Hammond, L Huminiecki, A Kasprzyk, H Lehvaslaiho, P Lijnzaad, C Melsopp, E Mongin, R Pettett, M Pocock, S Potter, A Rust, E Schmidt, S Searle, G Slater, J Smith, W Spooner, A Stabenau, J Stalker, E Stupka, A Ureta-Vidal, I Vastrik, and M Clamp. The ensembl genome database project. Nucleic Acids Research, 30(1):38–41, January 2002.

[108] Adam Siepel and David Haussler. Computational identification of evolutionarily conserved exons. In Proceedings of the eighth annual international conference on Research in computational molecular biology, RECOMB ’04, pages 177–186, New York, NY, USA, 2004. ACM.

[109] Dennis A Benson, Ilene Karsch-Mizrachi, David J Lipman, James Ostell, and David L Wheeler. GenBank: update. Nucleic Acids Research, 32(Database issue):D23–26, January 2004.

[110] Daniela S Gerhard, Lukas Wagner, Elise A Feingold, Carolyn M Shenmen, Lynette H Grouse, Greg Schuler, Steven L Klein, Susan Old, Rebekah Rasooly, Peter Good, Mark Guyer, Allison M Peck, Jeffery G Derge, David Lipman, Francis S Collins, Wonhee Jang, Steven Sherry, Mike Feolo, Leonie Misquitta, Eduardo Lee, Kirill Rotmistrovsky, Susan F Greenhut, Carl F Schaefer, Kenneth Buetow, Tom I Bonner, David Haussler, Jim Kent, Mark Kiekhaus, Terry Furey, Michael Brent, Christa Prange, Kirsten Schreiber, Nicole Shapiro, Narayan K Bhat, Ralph F Hopkins, Florence Hsie, Tom Driscoll, M Bento Soares, Tom L Casavant, Todd E Scheetz, Michael J Brownstein, Ted B Usdin, Shiraki Toshiyuki, Piero Carninci, Yulan Piao, Dawood B Dudekula, Minoru S H Ko, Koichi Kawakami, Yutaka Suzuki, Sumio Sugano, C E Gruber, M R Smith, Blake Simmons, Troy Moore, Richard Waterman, Stephen L Johnson, Yijun Ruan, Chia Lin Wei, S Mathavan, Preethi H Gunaratne, Jiaqian Wu, Angela M Garcia, Stephen W Hulyk, Edwin Fuh, Ye Yuan, Anna Sneed, Carla Kowis, Anne Hodgson, Donna M Muzny, John McPherson, Richard A Gibbs,
Jessica Fahey, Erin Helton, Mark Ketteman, Anuradha Madan, Stephanie Rodrigues, Amy Sanchez, Michelle Whiting, Anup Madari, Alice C Young, Keith D Wetherby, Steven J Granite, Peggy N Kwong, Charles P Brinkley, Russell L Pearson, Gerard G Bouffard, Robert W Blakesly, Eric D Green, Mark C Dickson, Alex C Rodriguez, Jane Grimwood, Jeremy Schmutz, Richard M Myers, Yaron S N Butterfield, Malachi Griffith, Obi L Griffith, Martin I Krzywinski, Nancy Liao, Ryan Morin, Ryan Morrin, Diana Palmquist, Anca S Petrescu, Ursula Skalska, Duane E Smailus, Jeff M Stott, Angelique Schnerch, Jacqueline E Schein, Steven J M Jones, Robert A Holt, Agnes Baross, Marco A Marra, Sandra Clifton, Kathryn A Makowski, Stephanie Bosak, and Joel Malek. The status, quality, and expansion of the NIH full-length cDNA project: the mammalian gene collection (MGC). Genome Research, 14(10B):2121–2127, October 2004.

[111] Sam Griffiths-Jones, Harpreet Kaur Saini, Stijn van Dongen, and Anton J Enright. miRBase: tools for microRNA genomics. Nucleic Acids Research, 36(Database issue):D154–158, January 2008.

[112] Laurent Lestrade and Michel J Weber. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Research, 34(Database issue):D158–162, January 2006.

[113] John E Karro, Yangpan Yan, Deyou Zheng, Zhaolei Zhang, Nicholas Carriero, Philip Cayting, Paul Harrison, and Mark Gerstein. Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Research, 35(Database issue):D55–60, January 2007.

[114] J L Ashurst, C-K Chen, J G R Gilbert, K Jekosch, S Keenan, P Meidl, S M Searle, J Stalker, R Storey, S Trevanion, L Wilming, and T Hubbard. The vertebrate genome annotation (Vega) database. Nucleic Acids Research, 33(Database issue):D459–465, January 2005.

[115] Jakob Skou Pedersen, Gill Bejerano, Adam Siepel, Kate Rosenbloom, Kerstin Lindblad-Toh, Eric S Lander, Jim Kent, Webb Miller, and David Haussler.
Identification and classification of conserved RNA secondary structures in the human genome. PLoS Computational Biology, 2(4):e33, April 2006.

[116] F Chiaromonte, V B Yap, and W Miller. Scoring pairwise genomic sequence alignments. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, pages 115–126, 2002.

[117] S F Altschul, W Gish, W Miller, E W Myers, and D J Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410, October 1990.

[118] Mathieu Blanchette, W James Kent, Cathy Riemer, Laura Elnitski, Arian F A Smit, Krishna M Roskin, Robert Baertsch, Kate Rosenbloom, Hiram Clawson, Eric D Green, David Haussler, and Webb Miller. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Research, 14(4):708–715, April 2004.

[119] M A Larkin, G Blackshields, N P Brown, R Chenna, P A McGettigan, H McWilliam, F Valentin, I M Wallace, A Wilm, R Lopez, J D Thompson, T J Gibson, and D G Higgins. Clustal w and clustal x version 2.0. Bioinformatics (Oxford, England), 23(21):2947–2948, November 2007.

[120] Andrew M Waterhouse, James B Procter, David M A Martin, Michèle Clamp, and Geoffrey J Barton. Jalview version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics (Oxford, England), 25(9):1189–1191, May 2009.

[121] Gavin E Crooks, Gary Hon, John-Marc Chandonia, and Steven E Brenner. WebLogo: a sequence logo generator. Genome Research, 14(6):1188–1190, June 2004.

[122] Daniel E Newburger and Martha L Bulyk. UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Research, 37(Database issue):D77–82, January 2009.

[123] V Matys, O V Kel-Margoulis, E Fricke, I Liebich, S Land, A Barre-Dirrie, I Reuter, D Chekmenev, M Krull, K Hornischer, N Voss, P Stegmaier, B Lewicki-Potapov, H Saxel, A E Kel, and E Wingender. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Research, 34(Database issue):D108–110, January 2006.

[124] Qiang Li, Deborah Ritter, Nan Yang, Zhiqiang Dong, Hao Li, Jeffrey H Chuang, and Su Guo. A systematic approach to identify functional motifs within vertebrate developmental enhancers. Developmental Biology, 337(2):484–495, January 2010.

[125] Shannon Fisher, Elizabeth A Grice, Ryan M Vinton, Seneca L Bessling, Akihiro Urasaki, Koichi Kawakami, and Andrew S McCallion. Evaluating the biological relevance of putative enhancers using tol2 transposon-mediated transgenesis in zebrafish. Nature Protocols, 1(3):1297–1305, 2006.

[126] R J DiLeone, L B Russell, and D M Kingsley. An extensive 3’ regulatory region controls expression of bmp5 in specific anatomical structures of the mouse embryo. Genetics, 148(1):401–408, January 1998.

[127] E. D. Jarvis, O. Gunturkun, L. Bruce, A. Csillag, H. Karten, W. Kuenzel, L. Medina, G. Paxinos, D. J. Perkel, T. Shimizu, G. Striedter, J. M. Wild, G. F. Ball, J. Dugas-Ford, S. E. Durand, G. E. Hough, S. Husband, L. Kubikova, D. W. Lee, C. V. Mello, A. Powers, C. Siang, T. V. Smulders, K. Wada, S. A. White, K. Yamamoto, J. Yu, A. Reiner, and A. B. Butler. Avian brains and a new understanding of vertebrate brain evolution. Nat Rev Neurosci, 6(2):151–9, February 2005.

[128] Z. Molnar. Evolution of cerebral cortical development. Brain, behavior and evolution, 78(1):94–107, 2011.

[129] J. H. Lui, D. V. Hansen, and A. R. Kriegstein. Development and evolution of the human neocortex. Cell, 146(1):18–36, July 2011.

[130] J. L. Rubenstein. Annual research review: Development of the cerebral cortex: implications for neurodevelopmental disorders. Journal of child psychology and psychiatry, and allied disciplines, 52(4):339–55, April 2011.

[131] B. J. Molyneaux, P. Arlotta, J. R. Menezes, and J. D. Macklis. Neuronal subtype specification in the cerebral cortex. Nat Rev Neurosci, 8(6):427–37, June 2007.

[132] K. Y. Kwan, N. Sestan, and E. S. Anton. Transcriptional co-regulation of neuronal migration and laminar identity in the neocortex. Development, 139(9):1535–46, May 2012.

[133] A. F. Cheung, A. A. Pollen, A. Tavare, J. DeProto, and Z. Molnar. Comparative aspects of cortical neurogenesis in vertebrates. J. Anat., 211(2):164–176, August 2007.

[134] Z. Molnar. Evolution of cerebral cortical development. Brain Behav. Evol., 78(1):94–107, 2011.

[135] Cory Y McLean, Dave Bristor, Michael Hiller, Shoa L Clarke, Bruce T Schaar, Craig B Lowe, Aaron M Wenger, and Gill Bejerano. GREAT improves functional interpretation of cis-regulatory regions. Nature Biotechnology, 28(5):495–501, May 2010.

[136] Matthew H. Kaufman. The atlas of mouse development. Academic Press, London ; San Diego, 1992.

[137] A. E. Ayoub, S. Oh, Y. Xie, J. Leng, J. Cotney, M. H. Dominguez, J. P. Noonan, and P. Rakic. Transcriptional programs in transient embryonic zones of the cerebral cortex defined by high-resolution mRNA sequencing. Proc Natl Acad Sci U S A, 108(36):14950–5, September 2011.

[138] L. das Neves, C. S. Duchala, F. Tolentino-Silva, M. A. Haxhiu, C. Colmenares, W. B. Macklin, C. E. Campbell, K. G. Butz, and R. M. Gronostajski. Disruption of the murine nuclear factor i-a gene (Nfia) results in perinatal lethality,
hydrocephalus, and agenesis of the corpus callosum. Proceedings of the National Academy of Sciences of the United States of America, 96(21):11946–51, October 1999.

[139] G. Steele-Perkins, C. Plachez, K. G. Butz, G. Yang, C. J. Bachurski, S. L. Kinsman, E. D. Litwack, L. J. Richards, and R. M. Gronostajski. The transcription factor gene nfib is essential for both lung maturation and brain development. Molecular and cellular biology, 25(2):685–98, January 2005.

[140] D. Zhang, D. C. Zeldin, and P. J. Blackshear. Regulatory factor x4 variant 3: a transcription factor involved in brain development and disease. Journal of neuroscience research, 85(16):3515–22, December 2007.

[141] R. F. Hevner, R. D. Hodge, R. A. Daza, and C. Englund. Transcription factors in glutamatergic neurogenesis: conserved programs in neocortex, cerebellum, and adult hippocampus. Neuroscience research, 55(3):223–33, July 2006.

[142] Sebastian J Arnold, Guo-Jen Huang, Amanda F P Cheung, Takumi Era, Shin-Ichi Nishikawa, Elizabeth K Bikoff, Zoltán Molnár, Elizabeth J Robertson, and Matthias Groszer. The t-box transcription factor Eomes/Tbr2 regulates neurogenesis in the cortical subventricular zone. Genes & Development, 22(18):2479–2484, September 2008.

[143] F. Bedogni, R. D. Hodge, G. E. Elsen, B. R. Nelson, R. A. Daza, R. P. Beyer, T. K. Bammler, J. L. Rubenstein, and R. F. Hevner. Tbr1 regulates regional and laminar identity of postmitotic neurons in developing neocortex. Proceedings of the National Academy of Sciences of the United States of America, 107(29):13129–34, July 2010.

[144] P. S. Joshi, B. J. Molyneaux, L. Feng, X. Xie, J. D. Macklis, and L. Gan. Bhlhb5 regulates the postmitotic acquisition of area identities in layers II-V of the developing neocortex. Neuron, 60(2):258–72, October 2008.

[145] Allen Institute for Brain Science. Allen developing mouse brain atlas. http://developingmouse.brain-map.org, 2009.

[146] G. Bejerano, C. B. Lowe, N. Ahituv, B. King, A. Siepel, S. R. Salama, E. M. Rubin, W. J. Kent, and D. Haussler. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature, 441(7089):87–90, May 2006.

[147] C. B. Lowe, G. Bejerano, and D. Haussler. Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proceedings of the National Academy of Sciences of the United States of America, 104(19):8005–10, May 2007.

[148] N. Frankel, G. K. Davis, D. Vargas, S. Wang, F. Payre, and D. L. Stern. Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature, 466(7305):490–3, July 2010.

[149] M. W. Perry, A. N. Boettiger, J. P. Bothma, and M. Levine. Shadow enhancers foster robustness of drosophila gastrulation. Current biology : CB, 20(17):1562–7, September 2010.

[150] Neil Shubin, Cliff Tabin, and Sean Carroll. Deep homology and the origins of evolutionary novelty. Nature, 457(7231):818–823, February 2009.

[151] A. Visel, E. M. Rubin, and L. A. Pennacchio. Genomic views of distant-acting enhancers. Nature, 461(7261):199–205, September 2009.

[152] Y. Zhang, T. Liu, C. A. Meyer, J. Eeckhoute, D. S. Johnson, B. E. Bernstein, C. Nusbaum, R. M. Myers, M. Brown, W. Li, and X. S. Liu. Model-based analysis of ChIP-Seq (MACS). Genome Biol, 9(9):R137, 2008.

[153] A. E. Kel, E. Gossling, I. Reuter, E. Cheremushkin, O. V. Kel-Margoulis, and E. Wingender. MATCH: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res, 31(13):3576–9, July 2003.

[154] W James Kent, Robert Baertsch, Angie Hinrichs, Webb Miller, and David Haussler. Evolution’s cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proceedings of the National Academy of Sciences of the United States of America, 100(20):11484–11489, September 2003.


[155] J. A. Bailey, Z. Gu, R. A. Clark, K. Reinert, R. V. Samonte, S. Schwartz, M. D. Adams, E. W. Myers, P. W. Li, and E. E. Eichler. Recent segmental duplications in the human genome. Science, 297(5583):1003–7, August 2002.

[156] J. Persampieri, D. I. Ritter, D. Lees, J. Lehoczky, Q. Li, S. Guo, and J. H. Chuang. cneViewer: a database of conserved non-coding elements for studies of tissue-specific gene regulation. Bioinformatics, 24(20):2418–9, October 2008.

[157] M. T. Weirauch and T. R. Hughes. Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same. Trends Genet., 26(2):66–74, February 2010.

[158] Shannon Fisher, Elizabeth A Grice, Ryan M Vinton, Seneca L Bessling, and Andrew S McCallion. Conservation of RET regulatory function from human to zebrafish without sequence similarity. Science (New York, N.Y.), 312(5771):276–279, April 2006.

[159] D. T. Odom, R. D. Dowell, E. S. Jacobsen, W. Gordon, T. W. Danford, K. D. MacIsaac, P. A. Rolfe, C. M. Conboy, D. K. Gifford, and E. Fraenkel. Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nat. Genet., 39(6):730–732, June 2007.

[160] Susumu Ohno. Evolution by gene duplication. New York: Springer-Verlag, 1970.

[161] G. Bejerano, D. Haussler, and M. Blanchette. Into the heart of darkness: large-scale clustering of human non-coding DNA. Bioinformatics, 20 Suppl 1:i40–48, August 2004.

[162] C. B. Lowe, G. Bejerano, and D. Haussler. Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proc. Natl. Acad. Sci. U.S.A., 104(19):8005–8010, May 2007.

[163] T E Haerry and W J Gehring. Intron of the mouse Hoxa-7 gene contains conserved homeodomain binding sites that can function as an enhancer element in Drosophila. Proceedings of the National Academy of Sciences of the United States of America, 93(24):13884–13889, November 1996.

[164] T E Haerry and W J Gehring. A conserved cluster of homeodomain binding sites in the mouse Hoxa-4 intron functions in Drosophila embryos as an enhancer that is directly regulated by Ultrabithorax. Developmental Biology, 186(1):1–15, June 1997.

[165] Steven G Kuntz, Erich M Schwarz, John A DeModena, Tristan De Buysscher, Diane Trout, Hiroaki Shizuya, Paul W Sternberg, and Barbara J Wold. Multigenome DNA sequence conservation identifies Hox cis-regulatory elements. Genome Research, 18(12):1955–1968, December 2008.

[166] Francesco Argenton, Simona Giudici, Gianluca Deflorian, Simona Cimbro, Franco Cotelli, and Monica Beltrame. Ectopic expression and knockdown of a zebrafish sox21 reveal its role as a transcriptional repressor in early development. Mechanisms of Development, 121(2):131–142, February 2004.

[167] Magnus Sandberg, Magdalena Källström, and Jonas Muhr. Sox21 promotes the progression of vertebrate neurogenesis. Nature Neuroscience, 8(8):995–1001, August 2005.