Differentiation of Human Parthenogenetic Pluripotent ... - Cell Press

Article

Differentiation of Human Parthenogenetic Pluripotent Stem Cells Reveals Multiple Tissue- and Isoform-Specific Imprinted Transcripts Graphical Abstract

Authors Yonatan Stelzer, Shiran Bar, ..., Sebastian Kadener, Nissim Benvenisty

Correspondence [email protected] (S.K.), [email protected] (N.B.)

In Brief By differentiating human parthenogenetic induced pluripotent stem cells into different cell types and combining DNA methylation with 5′ RNA sequencing analyses, Stelzer et al. identify tissue- and isoform-dependent imprinted genes in a genome-wide manner. This study provides a global analysis of tissue-specific imprinting in humans.

Highlights

• Differentiating parthenogenetic iPSCs uncover paternally imprinted genes

• DNA methylation and 5′ RNA-seq reveal tissue- and isoform-dependent imprinting

• Nearly half of all known imprinted genes express both biallelic and monoallelic isoforms

• Alternative promoters are suggested to be central to the regulation of imprinting

Stelzer et al., 2015, Cell Reports 11, 308–320 April 14, 2015 ©2015 The Authors http://dx.doi.org/10.1016/j.celrep.2015.03.023

Accession Numbers GSE65002

Cell Reports

Article

Differentiation of Human Parthenogenetic Pluripotent Stem Cells Reveals Multiple Tissue- and Isoform-Specific Imprinted Transcripts

Yonatan Stelzer,1,2,4 Shiran Bar,1,4 Osnat Bartok,3 Shaked Afik,3 Daniel Ronen,1 Sebastian Kadener,3,* and Nissim Benvenisty1,*

1Azrieli Center for Stem Cells and Genetic Research, Department of Genetics, Institute of Life Sciences, Edmond J. Safra Campus-Givat Ram, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
2Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
3Department of Biological Chemistry, Institute of Life Sciences, Edmond J. Safra Campus-Givat Ram, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
4Co-first author
*Correspondence: [email protected] (S.K.), [email protected] (N.B.)
http://dx.doi.org/10.1016/j.celrep.2015.03.023

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

SUMMARY

Parental imprinting results in monoallelic parent-of-origin-dependent gene expression. However, many imprinted genes identified by differential methylation do not exhibit complete monoallelic expression. Previous studies demonstrated complex tissue-dependent expression patterns for some imprinted genes. Still, the complete magnitude of this phenomenon remains largely unknown. By differentiating human parthenogenetic induced pluripotent stem cells into different cell types and combining DNA methylation with a 5′ RNA sequencing methodology, we were able to identify tissue- and isoform-dependent imprinted genes in a genome-wide manner. We demonstrate that nearly half of all imprinted genes express both biallelic and monoallelic isoforms that are controlled by tissue-specific alternative promoters. This study provides a global analysis of tissue-specific imprinting in humans and suggests that alternative promoters are central in the regulation of imprinted genes.

INTRODUCTION

Parental imprinting involves a subset of genes that is expressed exclusively from one of the parental alleles. One of the hallmarks of this phenomenon is considered to be the preclusion of asexual forms of reproduction in placental mammals. This was manifested by the inability of complete maternal (parthenogenetic) and paternal (androgenetic) mouse embryos to develop normally and survive to term (McGrath and Solter, 1984; Surani and Barton, 1983; Surani et al., 1986). It is estimated that roughly 100 genes are imprinted in both human and mouse, many of which are expressed in the placenta and the brain (Coan et al., 2005; Davies et al., 2005; Fowden et al., 2006; Wilkinson et al., 2007). Intriguingly, chimeric mouse models suggest that in addition to their role in early development, imprinted genes are subjected to a complex tissue-specific expression in the developing embryo (Thomson and Solter, 1988; Wilkinson et al., 2007). Specifically, a highly complex spatial distribution of imprinted genes was observed in the brain (Davies et al., 2005), making it a useful tissue in the study of parental imprinting.

In humans, because of limited accessibility to study material, little is known about the magnitude of tissue-specific imprinted gene expression. One major limitation in studying tissue-specific expression of imprinted genes is the requirement to analyze a homogeneous population of cells at a specific developmental stage. Parthenogenetic pluripotent stem cells (PSCs) serve as an attractive tool to study the role of imprinted genes in early embryonic development because they completely lack paternal alleles. Both parthenogenetic and normal biparental PSCs can be differentiated into specific cell types, allowing for a comprehensive comparison of gene expression in a tissue-specific manner.

We have previously shown that parthenogenetic induced PSCs (Pg-iPSCs) can be generated successfully by reprogramming of parthenogenetic ovarian teratomas (Stelzer et al., 2011). Genome-wide gene expression and DNA methylation analyses confirmed the parthenogenetic origin of these cells and enabled the identification of additional paternally expressed genes (PEGs) and imprinted differentially methylated regions (iDMRs) throughout the human genome (Stelzer et al., 2011, 2013). Furthermore, differentiating the Pg-iPSCs both in vivo and in vitro identified marked effects on the extra-embryonic trophectoderm and on embryonic liver and muscle tissues. These results demonstrated that Pg-iPSCs may be utilized to study tissue-specific effects of imprinted genes in early human development.
Here we conducted a genome-wide tissue-specific study of imprinted genes in humans. Our analyses uncovered additional candidate imprinted genes that are differentially expressed between normal and parthenogenetic differentiated cells and expressed in a monoallelic fashion in normal cells. In addition, analyzing high-throughput sequencing data of both gene expression and DNA methylation suggests that nearly half of the previously identified PEGs are subjected to a complex isoform- and tissue-dependent regulation, which is controlled by iDMRs that reside at alternative promoters. Finally, as these complex tissue-specific isoforms are not easily detected using current methods, we developed a technique that enables a genome-wide search for tissue- and isoform-dependent imprinted genes.

RESULTS

Differentiation of Parthenogenetic and Normal PSCs to Different Cell Types

To analyze the expression of human imprinted genes in various embryonic lineages, we differentiated normal PSCs and Pg-iPSCs into neural progenitor cells (NPCs) (Kim et al., 2010), as well as into early endodermal cells (Kopper and Benvenisty, 2012) (Figures 1A–1C). A major roadblock for enabling a comprehensive tissue-specific analysis of parental imprinting is the inherent heterogeneity in the differentiated cell populations. To overcome this limitation, we used surface markers for sorting the cells in order to generate more homogeneous cell populations (Figures 1A–1E). We used NCAM1 as a marker for NPCs and CXCR4 as a marker for early endoderm progenitor cells (Kopper and Benvenisty, 2012) (Figures 1D and 1E). Analyzing the gene expression of both positive and negative sorted cell populations confirmed their ectodermal and endodermal identity (Figures S1A and S1B, respectively). As expected, both NCAM1+ and CXCR4+ cell populations downregulated pluripotency-associated markers and upregulated characteristic tissue-specific genes (Figure S1). In addition, comparing the gene expression profiles of the sorted cell populations and their parental undifferentiated cells demonstrated marked differences between their overall gene expression patterns (Figure 1F). Next, we compared the gene expression of the parthenogenetic and control cells for both NCAM1+ and CXCR4+ populations.
This analysis identified a high correlation between the parthenogenetic and control cells, indicative of successful and robust differentiation (Figure 1G). In spite of this general similarity, close examination revealed some of the known PEGs to be downregulated in the parthenogenetic cells as compared with control cells (Figure 1G). Maternally expressed genes (MEGs) are less easily detected using parthenogenetic cells, as they are expected to exhibit merely a 2-fold increase in their gene expression as compared with normal cells. Therefore, we focused our analysis on studying tissue-dependent PEGs in both NCAM1+ and CXCR4+ sorted cell populations.

Charting the Dynamics of Known PEGs during Early Human Differentiation

Analyzing known PEGs in the three different cell types (i.e., PSC, NCAM1+, and CXCR4+) enabled us to classify the expressed PEGs according to two distinct expression patterns: (1) known PEGs that are consistently downregulated in the parthenogenetic cells as compared with control cells (group I) and (2) PEGs that are highly expressed in the parthenogenetic cells at levels comparable with the control cells in at least one examined cell type (group II) (Figure 2A). Notably, unlike in group I, in which known PEGs showed consistent downregulation in all parthenogenetic cell types (e.g., PEG10; Figure 2B), some genes in group II showed high and comparable expression levels between the parthenogenetic and control cells in all cell types examined (e.g., GNAS; Figure 2B). Other genes in group II exhibited a more complex expression signature, with high expression levels in some cell types and complete downregulation in others (e.g., MEST; Figure 2B). Accordingly, we sought to study the different genomic and epigenomic characteristics of the two groups of PEGs.

Interestingly, close examination of annotated transcripts (RefSeq and UCSC genes) of PEGs that showed consistent downregulation in the parthenogenetic cells (group I) revealed that the vast majority are single-isoform genes (Figure 2C). In contrast, PEGs that showed high or variable expression levels in the parthenogenetic cells (group II) tended to exhibit a complex genomic organization consisting of multiple isoforms (Figure 2D). The parent-of-origin-specific marks are established in the germ cells by differential DNA methylation (Reik et al., 2001; Sasaki and Matsui, 2008). Unlike gene expression, which is tissue dependent and regulated by cell-specific transcription factors, iDMRs are maintained in most tissues to allow the monoallelic parent-of-origin expression of imprinted genes.
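The grouping of known PEGs into group I and group II described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis pipeline used in the study: the function name `classify_peg`, the toy expression values, and the choice of a 3-fold cutoff for calling a gene downregulated are all hypothetical.

```python
from typing import Dict

# Assumption: a 3-fold control/parthenogenetic difference counts as
# "downregulated" for the purpose of this sketch.
FOLD_CUTOFF = 3.0


def classify_peg(expr: Dict[str, Dict[str, float]],
                 cutoff: float = FOLD_CUTOFF) -> str:
    """Assign a known PEG to group I or group II.

    expr maps a cell type to the expression level in control ("WT")
    and parthenogenetic ("Pg") cells. Only cell types in which the
    gene is expressed in control cells are expected as input.
    """
    down_everywhere = all(
        levels["WT"] >= cutoff * levels["Pg"] for levels in expr.values()
    )
    # Group I: downregulated in Pg cells in every cell type examined.
    # Group II: expression retained in Pg cells in at least one cell type.
    return "group I" if down_everywhere else "group II"


# Hypothetical values, chosen only to mimic the patterns in the text.
consistently_down = {
    "PSC": {"WT": 900.0, "Pg": 20.0},
    "NCAM1+": {"WT": 700.0, "Pg": 15.0},
    "CXCR4+": {"WT": 800.0, "Pg": 30.0},
}
variable = {
    "PSC": {"WT": 500.0, "Pg": 10.0},
    "NCAM1+": {"WT": 600.0, "Pg": 550.0},
    "CXCR4+": {"WT": 400.0, "Pg": 12.0},
}

print(classify_peg(consistently_down))  # group I
print(classify_peg(variable))           # group II
```

The second toy gene is lost in two cell types but retained in one, which is the kind of variable signature the text attributes to some group II genes.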
Recently, a high-resolution single-base DNA methylation sequencing in multiple undifferentiated and adult tissues was reported (Bernstein et al., 2010). We therefore applied these data to link between gene expression of known PEGs and the genomic organization of their iDMRs. Normal post-zygotic tissues should exhibit intermediate methylation levels (30%–70% methylated CpGs) in their iDMRs. We also utilized a complete sperm methylome (Bernstein et al., 2010; Molaro et al., 2011), which should appear either hypermethylated (>70% methylated CpGs) or hypomethylated ([…] 3-fold) in both NCAM1+ and CXCR4+ parthenogenetic cells as compared with control cells (i.e., genes from group I; Figure 2A). Among the genes resulting from this analysis, we further analyzed two putative PEGs that are significantly downregulated (p ≤ 0.05) (Figure 3C). The two candidate genes, NAIP and WDR17, were not expressed in the undifferentiated cells (Figure S3A) and therefore could not have been detected in our previous analysis (Stelzer et al., 2011). To verify whether the candidate PEGs are expressed in a monoallelic fashion, we utilized informative single-nucleotide polymorphisms (SNPs) located in the transcribed region of both NAIP and WDR17. By applying RT-PCR and direct sequencing on two independent control sorted cell populations, we validated the monoallelic expression of both genes (Figure 3D), indicating that they are indeed new PEGs.

Genome-wide Search for iDMRs in Genes with Alternative Promoters

Next, we searched globally for imprinted genes that are regulated by alternative promoters (i.e., group II genes; Figure 2A). We therefore utilized single-base resolution DNA methylation data (Bernstein et al., 2010; Molaro et al., 2011) and performed an unbiased genome-wide search for iDMRs within genes that comprise alternative promoters. To reduce the potential tissue-specific variation and to increase the confidence of our findings, we included different cell types in our analysis (see Supplemental Experimental Procedures). The use of distinct cell types derived from different individuals significantly minimized the possibility of including DMRs that resulted from random monoallelic methylation rather than parental imprinting. We analyzed two adult tissues, representative of ectodermal and endodermal lineages (brain hippocampus and liver, respectively); PSC-derived NPCs that closely resemble our established NCAM1+ cells; and two undifferentiated PSC lines (H1 and H9). In addition, to identify the parent-of-origin signature of the putative iDMRs, we included testis spermatozoa primary cells (Bernstein et al., 2010; Molaro et al., 2011). The analysis comprised the subgroup of autosomal genes in the human genome that harbor alternative promoters, including 1-kb regions both upstream and downstream of these genes. Subsequently, we searched for regions with consistent intermediate methylation levels in all samples.
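The two methylation criteria used in this search (intermediate methylation in every somatic sample, and a resolved state in sperm) can be illustrated with a small sketch. The 30%–70% window and the >70% hypermethylation threshold come from the text; the hypomethylation threshold of 30% is an assumption made here for symmetry, and the function names, sample labels, and values are hypothetical. The actual thresholds and region-size filters are described in the Supplemental Experimental Procedures and are not reproduced here.

```python
from typing import Mapping

INTERMEDIATE = (0.30, 0.70)  # from the text: 30%-70% methylated CpGs
SPERM_HYPER = 0.70           # from the text: >70% methylated CpGs
SPERM_HYPO = 0.30            # assumption: not stated in the available text


def is_candidate_idmr(somatic: Mapping[str, float], sperm: float) -> bool:
    """Return True if a region looks like a candidate iDMR.

    somatic maps a sample name to the average methylation level (0 to 1)
    of the region; sperm is the level of the same region in sperm.
    """
    lo, hi = INTERMEDIATE
    intermediate_everywhere = all(lo <= m <= hi for m in somatic.values())
    sperm_resolved = sperm > SPERM_HYPER or sperm < SPERM_HYPO
    return intermediate_everywhere and sperm_resolved


def methylated_allele(sperm: float) -> str:
    """Infer which parental allele carries the methylation mark."""
    if sperm > SPERM_HYPER:
        return "paternal"
    if sperm < SPERM_HYPO:
        return "maternal"
    return "unresolved"


# Hypothetical region: half-methylated in all somatic samples and
# unmethylated in sperm, so the methylated allele is the maternal one.
region = {"hippocampus": 0.52, "liver": 0.47, "NPC": 0.55, "H1": 0.50, "H9": 0.58}
print(is_candidate_idmr(region, sperm=0.04))  # True
print(methylated_allele(0.04))                # maternal
```

Requiring the intermediate state in every sample, rather than on average, is what makes the filter robust to tissue-specific methylation differences.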
Then we further analyzed the levels of DNA methylation in primary testis cells and included regions that were either hypermethylated or hypomethylated in the testis (see Supplemental Experimental Procedures). Taken together, this analysis validated the majority (13 of 18) of the known PEGs with iDMRs at alternative promoters (Table S2). Notable exceptions were IGF2 and INS-IGF2, which did not include consistent iDMRs between the samples; SLC22A18 and TSC22D1, which comprise relatively small iDMRs and therefore did not pass our threshold (see Supplemental Experimental Procedures); and NTM, for which we could not detect an iDMR. In addition to 13 known imprinted genes, we identified 28 putative candidate genes with alternative promoters that harbor an intragenic iDMR (Figure 4A; Table S2). To further link the genomic location of the iDMRs to their putative regulatory function, we analyzed the distances between the iDMRs and the transcription start site (TSS; Figure 4B). This analysis demonstrated that the vast majority of putative iDMRs reside in close proximity to at least one TSS of their associated gene. Nevertheless, as some putative iDMRs are located relatively far from any known TSS, we further analyzed the distance between the iDMRs and the promoter-associated histone mark H3K4me3 (Figure 4C). Interestingly, virtually all putative iDMRs were associated with H3K4me3 marks (Figure 4C), suggesting that additional isoforms remain to be identified for some of these genes.

maternal parental bias, respectively; black arrows point at biallelic expressed SNPs. iDMR is highlighted in bright blue. Note the schematic setting of the monoallelic (mono)- and biallelic (bi)-specific primer sets used to specifically identify ZNF331 isoforms. (B) Expression patterns of ZNF331 across different tissues of both biallelic and monoallelic isoforms as measured by RT-PCR in representative control (WT) and parthenogenetic (Pg) cell lines. TE, trophectoderm. Band sizes are designated in bp. (C) Average expression levels ± SD of two candidate PEGs in control and parthenogenetic cell types. (D) Sequencing of NAIP and WDR17 in two independent control PSC lines. Shown are heterozygote SNPs in the genomic DNA (gDNA) and monoallelic expression in the complementary DNA (cDNA).

[Figure 4 panels: (A) DMRs in alternative promoters: 13 known, 28 putative. (B) Distance to TSS: density versus distance normalized to gene length. (C) H3K4me3: density versus distance to DMR. (D) Conserved versus not-conserved DMRs in alternative promoters, mouse and chimp. (E) DUSP22 locus, 20 kb scale, with mono and bi primer sets and methylation tracks for CD34 primary cells, adult liver, PSC-derived NPCs, PSCs, and sperm. (F) DUSP22 total transcript: relative expression in undifferentiated, NCAM1+, and CXCR4+ cells. (G) DUSP22 RT-PCR bands: bi, 227 bp; mono, 95 bp.]

Figure 4. Genome-wide DNA Methylation Analysis Identified New iDMR within Genes with Alternative Promoters (A) Pie chart showing the number of known and new iDMRs in alternative promoters, which were identified by our analysis.

Next we checked whether the iDMRs that reside in genes with alternative promoters are conserved among species. We thus utilized previously reported single-base resolution DNA methylation analysis in both mouse (Stadler et al., 2011) and chimpanzee (Zeng et al., 2012). In addition to searching for intermediate methylation levels in different cell types, we also verified that the corresponding region was either hypomethylated or hypermethylated in mouse (Kobayashi et al., 2012) and chimpanzee (Molaro et al., 2011) testis cells. Our results demonstrate that nearly half of the iDMRs that reside within genes with alternative promoters are conserved between human and mouse (Figure 4D; Table S3). These results are in agreement with our previous analysis of all iDMRs in both mouse and human (Stelzer et al., 2013) and show conservation levels comparable with those of the group of previously identified human iDMRs that are associated with alternative promoters (Figure S3B). Our analysis of the chimpanzee methylome covers only 36 of the 46 human counterpart iDMRs because of gaps in the current chimpanzee genome map. Unlike in the mouse, analyzing the chimpanzee methylome identified high levels of conservation between the species (Figure 4D; Table S3), at levels comparable with those of the group of previously identified iDMRs that are associated with alternative promoters (Figure S3B). To further support our findings, we validated the candidate isoform-dependent genes.
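The distance measure used above for Figure 4B, the distance from an iDMR to the nearest TSS normalized to gene length, can be sketched as follows. The text does not specify how the distance was measured, so using the iDMR midpoint is an assumption, as are the function name and the coordinates in the example.

```python
from typing import Sequence


def normalized_tss_distance(idmr_start: int, idmr_end: int,
                            tss_positions: Sequence[int],
                            gene_start: int, gene_end: int) -> float:
    """Distance from an iDMR to the nearest TSS, as a fraction of gene length.

    Assumption: the distance is measured from the iDMR midpoint.
    """
    gene_length = gene_end - gene_start
    if gene_length <= 0:
        raise ValueError("gene_end must be greater than gene_start")
    if not tss_positions:
        raise ValueError("at least one TSS is required")
    midpoint = (idmr_start + idmr_end) / 2
    nearest = min(abs(midpoint - tss) for tss in tss_positions)
    return nearest / gene_length


# Hypothetical gene of 100 kb with two annotated TSSs.
d = normalized_tss_distance(
    idmr_start=10_000, idmr_end=12_000,
    tss_positions=[1_000, 31_000],
    gene_start=1_000, gene_end=101_000,
)
print(round(d, 3))  # 0.1
```

Values near 0 correspond to iDMRs that sit at a known promoter; larger values flag iDMRs that may mark a promoter that has not yet been annotated, which is why the comparison with H3K4me3 peaks is informative.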
As shown for substantiated PEGs, DUSP22 is regulated by two promoters: an upstream promoter that harbors an iDMR and a downstream promoter that is hypermethylated (Figure 4E). This epigenomic setting suggests that DUSP22 consists of both a long monoallelic isoform and a short biallelic isoform. Our global gene expression analysis comparing parthenogenetic and control cells could not detect differences in expression of DUSP22 (Figure 4F). However, by applying specific primers that were able to distinguish the two isoforms, we could detect a complete downregulation of the long isoform in two Pg-NCAM1+ cell lines (Figure 4G), demonstrating that DUSP22 is indeed a new isoform-dependent imprinted gene.

5′ Transcript Sequencing for Confirmation of Imprinted Genes with Alternative Promoters

Our analysis demonstrated that nearly 50% of the known imprinted genes incorporate alternative promoters. As a result of the overlap between the imprinted and non-imprinted isoforms, only the promoter region of each isoform can exhibit differential expression between the parthenogenetic and normal cells. Therefore, we implemented a method that enables a genome-wide expression analysis based on sequencing of the 5′ end of the mRNA (see Supplemental Experimental Procedures). RNA was extracted from NCAM1+ and PSC cell lines from both parthenogenetic and control samples, and 5′ cDNA libraries were then generated. Sequencing of these libraries resulted in identification of active promoters in both parthenogenetic and control cells.

As a proof of concept, we compared the 5′ expression patterns of some known PEGs with alternative promoters between the parthenogenetic and control cells. In agreement with previous reports of isoform-specific imprinting for the known PEG GRB10 (Monk et al., 2009), we demonstrated differential promoter expression between parthenogenetic and control cells in only one of the three promoters of this gene (Figure S3C). Another example is the imprinted gene GNAS, which was previously shown to exhibit a complex isoform- and tissue-dependent imprinting pattern (Kelsey, 2010). Close examination of the complex genomic and epigenomic organization of the GNAS locus revealed five distinct promoters that were marked by the H3K4me3 chromatin modification: NESP55, GNASAS, GNASXL, GNAS1A, and GNAS1 (Figure 5A). The allelic expression analysis revealed paternal bias of the isoforms that originated from the GNASAS and GNASXL promoters and biallelic expression of the isoform that originated from the GNAS1 promoter. The 5′ RNA sequencing unraveled a more convoluted expression pattern of these genes, supporting a complex isoform-dependent parental expression. Some isoforms demonstrated paternal monoallelic expression (i.e., GNASAS and GNAS1A), whereas others demonstrated maternal (NESP55) or biallelic expression (GNAS1) (Figures 5A and 5B).
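The promoter-level comparison described above can be sketched as a ratio of normalized 5′ reads between control and parthenogenetic cells, computed separately for each promoter of a gene. The sketch below is illustrative only: the function names, the pseudocount, the 3-fold cutoff, and the read counts are assumptions, and it is not the read-processing pipeline of the study.

```python
from typing import Dict, List, Tuple

Reads = Dict[str, Tuple[float, float]]  # promoter -> (WT reads, Pg reads)


def downregulated_promoters(reads: Reads, cutoff: float = 3.0,
                            pseudocount: float = 1.0) -> List[str]:
    """Promoters whose 5' reads drop at least `cutoff`-fold in Pg cells.

    The pseudocount (an assumption) avoids division by zero when a
    promoter is silent in parthenogenetic cells.
    """
    return [
        name for name, (wt, pg) in reads.items()
        if (wt + pseudocount) / (pg + pseudocount) >= cutoff
    ]


def is_isoform_dependent(reads: Reads, cutoff: float = 3.0) -> bool:
    """True if some, but not all, promoters are lost in Pg cells."""
    down = downregulated_promoters(reads, cutoff)
    return 0 < len(down) < len(reads)


# Hypothetical gene with one biallelic and one paternally expressed promoter.
gene = {"upstream": (40.0, 42.0), "downstream": (30.0, 0.0)}
print(downregulated_promoters(gene))  # ['downstream']
print(is_isoform_dependent(gene))     # True
```

Working per promoter rather than per gene is the point of the 5′ approach: a gene-level measurement would sum both promoters and could mask the loss of the imprinted isoform.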
The 5′ RNA sequencing analysis can enable the identification of alternative promoters in genes that are known to have only a single TSS. Therefore, we performed our DNA methylation analysis on single-isoform genes and filtered the results to include iDMRs located far from the known TSS but next to H3K4me3 peaks (Table S4). By combining the 5′ RNA sequencing data with the DNA methylation results, we validated another candidate imprinted gene, NHP2L1, which consists of two alternative promoters: a known upstream promoter located in a hypomethylated region with a high expression pattern in all cell types and an additional downstream promoter located in an iDMR, which displayed complete downregulation in the parthenogenetic cells (Figures 5C and 5D). Thus, our results established that NHP2L1 includes an imprinted isoform. Finally, to conduct a genome-wide search for additional PEGs with alternative promoters, we compared the 5′ RNA sequencing between parthenogenetic and control samples in different cell types. Using a 3-fold cutoff, this analysis identified 18 genes with alternative promoters whose 5′ ends were downregulated in the parthenogenetic cells as compared with control cells (Table S5). Among the 18 putative PEGs, 10 were uniquely expressed in the NCAM1+ cells, 5 were expressed exclusively in the undifferentiated cells, and the rest were downregulated in both cell types (Table S5).

Figure 4 (legend continued). (B and C) Distribution of distances calculated from the putative iDMRs to the nearest TSS (B) or to enrichment sites of histone mark H3K4me3 (p < 10⁻⁴) (C). Distances were normalized to gene length. (D) Pie chart comparing the number of conserved iDMRs in alternative promoters between human and mouse (left) and human and chimpanzee (right). (E) Regional view of DUSP22. DNA methylation varies from 1 (fully methylated) to 0 (non-methylated); shown are average methylation levels of different cell types with respect to the genomic organization of the gene. iDMR is highlighted in bright blue. Note the schematic setting of the monoallelic (mono)- and biallelic (bi)-specific primer sets used to specifically identify DUSP22 isoforms. (F) Average expression levels ± SD of DUSP22 comparing between the parthenogenetic and control cells in the different cell types. (G) Expression patterns of the monoallelic and biallelic isoforms of DUSP22 in NCAM1+ cells as measured by RT-PCR in two independent control (WT) and parthenogenetic (Pg) cell lines. Band sizes are designated in bp.

[Figure 5 panels: (A) GNAS locus, 20 kb scale: 5′ read tracks for WT-NCAM1+ and Pg-NCAM1+ cells; isoforms GNAS-AS1, GNAS (NESP55), GNAS (GNASXL), GNAS (GNAS1A), and GNAS (GNAS1); H3K4me3; methylation tracks for CD34 primary cells, adult liver, PSC-derived NPCs, PSCs, and sperm; CpG island; parental bias. (B) GNAS relative fold change (WT/Pg) for NESP55, GNAS-AS, GNAS1A, and GNAS1. (C) NHP2L1 locus, 5 kb scale, with the same tracks; upstream promoter and putative downstream promoter. (D) NHP2L1 relative fold change (WT/Pg).]

Figure 5. Utilizing a 5′ RNA Sequencing Method to Identify Promoter-Specific Differential Expression (A) Regional genomic and epigenomic view of the different isoforms of GNAS. DNA methylation varies from 1 (fully methylated) to 0 (non-methylated); shown are average methylation levels of different cell types with respect to the genomic organization of the different isoforms. Maternal-derived iDMRs are highlighted in bright blue, while the secondary paternal iDMR is highlighted in bright red. Parent-of-origin-specific allelic expression was analyzed on LCLs derived from seven individuals. Shown is the average parental bias of heterozygote SNPs, calculated as the difference between β-values of gDNA and cDNA for each sample. Blue and red are paternal or maternal parental bias, respectively; black arrows point at biallelic expressed SNPs. Normalized 5′ reads for representative control and parthenogenetic NCAM1+ cells are shown for each of the promoters. (B) Expression analysis of the different GNAS promoters quantifying the relative ratios ± SD of 5′ reads between control and parthenogenetic cells. Black dashed line represents similar expression levels. (C) Regional genomic and epigenomic view of NHP2L1. DNA methylation varies from 1 (fully methylated) to 0 (non-methylated); shown are average methylation levels of different cell types with respect to the genomic organization of the different isoforms. iDMR is highlighted in bright blue and associates with the H3K4me3 signature. Normalized 5′ reads for representative control and parthenogenetic NCAM1+ cells are shown for both the known upstream and the new downstream promoters. (D) Expression analysis of the different NHP2L1 promoters quantifying the relative ratios ± SD of 5′ reads between control and parthenogenetic cells. Black dashed line represents similar expression levels.

DISCUSSION

Recent advances in high-throughput genomic and transcriptomic technologies have added only a few imprinted genes to the previously known ones, suggesting that the vast majority of imprinted genes have already been identified (Babak et al., 2008; Choufani et al., 2011; Morcos et al., 2011; Pollard et al., 2008). Nevertheless, recent genome-wide DNA methylation analyses in mouse (Smallwood et al., 2011; Xie et al., 2012) and human (Choufani et al., 2011; Court et al., 2014; Stelzer et al., 2013) uncovered additional iDMRs, thus implying that more imprinted genes remain to be identified. One possible explanation for this discrepancy is that some of the genes that are marked with an iDMR are regulated in a monoallelic fashion according to their tissue-specific expression. Considering that most of the research in the field was conducted on only a small number of cell types, it is possible that more genes remain to be identified in other tissues. Indeed, comparing the gene expression between different parthenogenetic and control cell types uncovered two candidate PEGs (NAIP and WDR17) that are downregulated in the parthenogenetic cells and expressed in a monoallelic fashion in normal cells.

Figure 6. Tissue- and Isoform-Dependent Imprinted Genes (A–C) Concluding scheme depicting possible expression patterns of tissue- and isoform-dependent imprinted genes. (A) Tissues that express both imprinted and non-imprinted isoforms. (B) Tissues in which only the non-imprinted biallelic isoforms are expressed. (C) Tissues in which only the imprinted monoallelic isoforms are expressed.

Surprisingly, in addition to the PEGs that are consistently downregulated in the parthenogenetic cells, we demonstrated that nearly half of all known PEGs are highly expressed in the parthenogenetic cells, at levels comparable with their expression in the control cells. We further showed that the vast majority of these PEGs exhibited both biallelic and imprinted monoallelic isoforms. Isoform-specific regulation was previously associated with some imprinted genes (Hikichi et al., 2003; Kamei et al., 2007; Liu et al., 2005; Wood et al., 2007); however, the magnitude of this phenomenon was largely underappreciated. The simultaneous expression of biallelic and monoallelic isoforms in specific cell types complicates the identification of the imprinted isoform (Figures 6A–6C). In most cases, the biallelic and monoallelic isoforms differ in only a few exons (e.g., MEST; Figure 2D). Therefore, only exon-based methods can distinguish between the different isoforms. Similar to array and sequencing-based methods, utilizing SNPs to identify monoallelic expression depends heavily on the ratio of expression between the two isoforms and is expected to show only minor parental bias in most of these genes.
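The dilution of parental bias by a co-expressed biallelic isoform can be made concrete with a short calculation. The sketch assumes an idealized case that is not taken from the study: the SNP lies in an exon shared by both isoforms, the imprinted isoform is expressed only from the paternal allele, and the biallelic isoform is expressed equally from both alleles. The function name and the expression values are hypothetical.

```python
def parental_bias(mono: float, bi: float) -> float:
    """Expected paternal bias at a SNP shared by two isoforms.

    mono: expression of the paternally expressed (imprinted) isoform.
    bi:   expression of the biallelic isoform.
    Returns the paternal allele fraction minus 0.5, so 0 means no bias
    and 0.5 means fully paternal expression.
    """
    total = mono + bi
    if total <= 0:
        raise ValueError("total expression must be positive")
    paternal_fraction = (mono + bi / 2) / total
    return paternal_fraction - 0.5


# Hypothetical expression levels (arbitrary units).
print(round(parental_bias(mono=100, bi=0), 3))   # 0.5
print(round(parental_bias(mono=50, bi=50), 3))   # 0.25
print(round(parental_bias(mono=10, bi=90), 3))   # 0.05
```

Under these assumptions the bias equals mono / (2 × (mono + bi)), so an imprinted isoform that contributes a tenth of the transcripts shifts the allelic ratio by only 5 percentage points, which is easy to miss in SNP-based assays.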
To overcome these limitations, we utilized two complementary approaches to identify putative imprinted genes with alternative promoters: (1) a genome-wide search for iDMRs, taking advantage of DNA methylation data from multiple published cell lines, and (2) a technique identifying only the 5′ mRNA expression levels, which enabled us to distinguish appropriately between the imprinted and non-imprinted transcripts. Analyzing the 5′ RNA sequencing results enabled a genome-wide identification of differential expression between the parthenogenetic and control cells, including within genes that are thought to consist of a single TSS. We suggest that executing this method on additional tissues and combining the results with appropriate DNA methylation analysis will facilitate the identification of additional, previously unidentified, isoform-specific imprinted genes. Considering the specificity of the cell populations included in our analysis, our findings support the notion that dozens of tissue- and isoform-dependent imprinted genes remain to be identified in human cells.

Finally, we show that the vast majority of the newly identified iDMRs are also present in the chimpanzee, supporting their evolutionary conservation. Interestingly, although genomic imprinting was primarily and extensively studied in mouse, recent data hint at differences between human and mouse imprinted genes (Court et al., 2014; Nakabayashi et al., 2011; Stelzer et al., 2011, 2013). Consistent with previous reports (Court et al., 2014; Stelzer et al., 2013), nearly half of the iDMRs identified in this study are not conserved between mouse and human. Taken together, many of the recently identified imprinted genes and iDMRs appear to be species specific, strengthening the importance of studying parental imprinting in human cells.

EXPERIMENTAL PROCEDURES

Cell Types

Pg-iPSCs were established and cultured as previously described (Stelzer et al., 2011). NCAM1-positive cells were derived from human PSCs using a previously reported protocol for differentiation toward early NPCs (Kim et al., 2010).
CXCR4-positive cells were derived using a protocol for differentiation toward early endodermal progenitor cells, as described in the Supplemental Experimental Procedures. BMP4-treated cells were derived as previously described (Stelzer et al., 2011). Allelic Expression Analysis Parent-of-origin allelic expression analysis was performed as previously described (Morcos et al., 2011). Genome-wide DMR Analysis The methylation analysis was performed using six independent whole-genome bisulfite sequencing (WGBS) samples from different tissues, downloaded from the NIH Roadmap Epigenomics Project (Bernstein et al., 2010), as described in detail in Supplemental Experimental Procedures. Generation and Analysis of 50 RNA Sequencing Libraries 50 RNA sequencing libraries were generated from undifferentiated and NCAM1+ from both parthenogenetic and control cells. For more detailed information regarding protocols and procedures, see Supplemental Experimental Procedures.

ACCESSION NUMBERS
The NCBI Gene Expression Omnibus accession number for the microarray data reported in this paper is GEO: GSE65002.

SUPPLEMENTAL INFORMATION
Supplemental Information includes Supplemental Experimental Procedures, three figures, and five tables and can be found with this article online at http://dx.doi.org/10.1016/j.celrep.2015.03.023.

AUTHOR CONTRIBUTIONS
Y.S., S.B., and N.B. conceived the study and designed the experiments. Y.S. and S.B. performed the experiments and analyzed and interpreted the data. O.B. prepared the 5' RNA sequencing libraries, and S.A. analyzed the sequencing data. D.R. performed genome-wide DMR analysis. Y.S. and S.B. wrote the manuscript with input from D.R., S.K., and N.B.

ACKNOWLEDGMENTS
N.B. is supported by the Israel Science Foundation-Morasha Foundation (grant number 1252/12), by the Israel Ministry of Science and Technology Infrastructure (grant number 3-9693), by the Rosetrees Trust, and by the Azrieli Foundation. Y.S. is supported by a Human Frontier Science Program postdoctoral fellowship. D.R. is supported by the Israel Cancer Research Fund. S.K. is supported by the HFSP Program Grant (grant number 31/2011) and by the Israel Science Foundation-Morasha Foundation (grant number 839/10).

Received: October 10, 2014
Revised: January 19, 2015
Accepted: March 10, 2015
Published: April 2, 2015

REFERENCES
Babak, T., Deveale, B., Armour, C., Raymond, C., Cleary, M.A., van der Kooy, D., Johnson, J.M., and Lim, L.P. (2008). Global survey of genomic imprinting by transcriptome sequencing. Curr. Biol. 18, 1735–1741.
Bernstein, B.E., Stamatoyannopoulos, J.A., Costello, J.F., Ren, B., Milosavljevic, A., Meissner, A., Kellis, M., Marra, M.A., Beaudet, A.L., Ecker, J.R., et al. (2010). The NIH Roadmap Epigenomics Mapping Consortium. Nat. Biotechnol. 28, 1045–1048.
Choufani, S., Shapiro, J.S., Susiarjo, M., Butcher, D.T., Grafodatskaya, D., Lou, Y., Ferreira, J.C., Pinto, D., Scherer, S.W., Shaffer, L.G., et al. (2011). A novel approach identifies new differentially methylated regions (DMRs) associated with imprinted genes. Genome Res. 21, 465–476.
Coan, P.M., Burton, G.J., and Ferguson-Smith, A.C. (2005). Imprinted genes in the placenta–a review. Placenta 26 (Suppl A), S10–S20.
Court, F., Tayama, C., Romanelli, V., Martin-Trujillo, A., Iglesias-Platas, I., Okamura, K., Sugahara, N., Simón, C., Moore, H., Harness, J.V., et al. (2014). Genome-wide parent-of-origin DNA methylation analysis reveals the intricacies of human imprinting and suggests a germline methylation-independent mechanism of establishment. Genome Res. 24, 554–569.
Davies, W., Isles, A.R., and Wilkinson, L.S. (2005). Imprinted gene expression in the brain. Neurosci. Biobehav. Rev. 29, 421–430.
Ferrón, S.R., Charalambous, M., Radford, E., McEwen, K., Wildner, H., Hind, E., Morante-Redolat, J.M., Laborda, J., Guillemot, F., Bauer, S.R., et al. (2011). Postnatal loss of Dlk1 imprinting in stem cells and niche astrocytes regulates neurogenesis. Nature 475, 381–385.
Fowden, A.L., Sibley, C., Reik, W., and Constancia, M. (2006). Imprinted genes, placental development and fetal growth. Horm. Res. 65 (Suppl 3), 50–58.
Ge, B., Pokholok, D.K., Kwan, T., Grundberg, E., Morcos, L., Verlaan, D.J., Le, J., Koka, V., Lam, K.C., Gagné, V., et al. (2009). Global patterns of cis variation in human cells revealed by high-density allelic expression analysis. Nat. Genet. 41, 1216–1222.
Hikichi, T., Kohda, T., Kaneko-Ishino, T., and Ishino, F. (2003). Imprinting regulation of the murine Meg1/Grb10 and human GRB10 genes; roles of brain-specific promoters and mouse-specific CTCF-binding sites. Nucleic Acids Res. 31, 1398–1406.
Kamei, Y., Suganami, T., Kohda, T., Ishino, F., Yasuda, K., Miura, S., Ezaki, O., and Ogawa, Y. (2007). Peg1/Mest in obese adipose tissue is expressed from the paternal allele in an isoform-specific manner. FEBS Lett. 581, 91–96.
Kelsey, G. (2010). Imprinting on chromosome 20: tissue-specific imprinting and imprinting mutations in the GNAS locus. Am. J. Med. Genet. C. Semin. Med. Genet. 154C, 377–386.
Kim, D.S., Lee, J.S., Leem, J.W., Huh, Y.J., Kim, J.Y., Kim, H.S., Park, I.H., Daley, G.Q., Hwang, D.Y., and Kim, D.W. (2010). Robust enhancement of neural differentiation from human ES and iPS cells regardless of their innate difference in differentiation propensity. Stem Cell Rev. 6, 270–281.
Kobayashi, H., Sakurai, T., Imai, M., Takahashi, N., Fukuda, A., Yayoi, O., Sato, S., Nakabayashi, K., Hata, K., Sotomaru, Y., et al. (2012). Contribution of intragenic DNA methylation in mouse gametic DNA methylomes to establish oocyte-specific heritable marks. PLoS Genet. 8, e1002440.
Kopper, O., and Benvenisty, N. (2012). Stepwise differentiation of human embryonic stem cells into early endoderm derivatives and their molecular characterization. Stem Cell Res. 8, 335–345.
Li, J., Bench, A.J., Vassiliou, G.S., Fourouclas, N., Ferguson-Smith, A.C., and Green, A.R. (2004). Imprinting of the human L3MBTL gene, a polycomb family member located in a region of chromosome 20 deleted in human myeloid malignancies. Proc. Natl. Acad. Sci. USA 101, 7341–7346.
Li, J., Bench, A.J., Piltz, S., Vassiliou, G., Baxter, E.J., Ferguson-Smith, A.C., and Green, A.R. (2005). L3mbtl, the mouse orthologue of the imprinted L3MBTL, displays a complex pattern of alternative splicing and escapes genomic imprinting. Genomics 86, 489–494.
Liu, J., Chen, M., Deng, C., Bourc’his, D., Nealon, J.G., Erlichman, B., Bestor, T.H., and Weinstein, L.S. (2005). Identification of the control region for tissue-specific imprinting of the stimulatory G protein alpha-subunit. Proc. Natl. Acad. Sci. USA 102, 5513–5518.
McGrath, J., and Solter, D. (1984). Completion of mouse embryogenesis requires both the maternal and paternal genomes. Cell 37, 179–183.
Molaro, A., Hodges, E., Fang, F., Song, Q., McCombie, W.R., Hannon, G.J., and Smith, A.D. (2011). Sperm methylation profiles reveal features of epigenetic inheritance and evolution in primates. Cell 146, 1029–1041.
Monk, D., Arnaud, P., Frost, J., Hills, F.A., Stanier, P., Feil, R., and Moore, G.E. (2009). Reciprocal imprinting of human GRB10 in placental trophoblast and brain: evolutionary conservation of reversed allelic expression. Hum. Mol. Genet. 18, 3066–3074.
Morcos, L., Ge, B., Koka, V., Lam, K.C., Pokholok, D.K., Gunderson, K.L., Montpetit, A., Verlaan, D.J., and Pastinen, T. (2011). Genome-wide assessment of imprinted expression in human cells. Genome Biol. 12, R25.
Nakabayashi, K., Trujillo, A.M., Tayama, C., Camprubi, C., Yoshida, W., Lapunzina, P., Sanchez, A., Soejima, H., Aburatani, H., Nagae, G., et al. (2011). Methylation screening of reciprocal genome-wide UPDs identifies novel human-specific imprinted genes. Hum. Mol. Genet. 20, 3188–3197.
Noguer-Dance, M., Abu-Amero, S., Al-Khtib, M., Lefèvre, A., Coullin, P., Moore, G.E., and Cavaillé, J. (2010). The primate-specific microRNA gene cluster (C19MC) is imprinted in the placenta. Hum. Mol. Genet. 19, 3566–3582.

Pant, P.V., Tao, H., Beilharz, E.J., Ballinger, D.G., Cox, D.R., and Frazer, K.A. (2006). Analysis of allelic differential expression in human white blood cells. Genome Res. 16, 331–339.
Pollard, K.S., Serre, D., Wang, X., Tao, H., Grundberg, E., Hudson, T.J., Clark, A.G., and Frazer, K. (2008). A genome-wide approach to identifying novel imprinted genes. Hum. Genet. 122, 625–634.
Reik, W., Dean, W., and Walter, J. (2001). Epigenetic reprogramming in mammalian development. Science 293, 1089–1093.
Sasaki, H., and Matsui, Y. (2008). Epigenetic events in mammalian germ-cell development: reprogramming and beyond. Nat. Rev. Genet. 9, 129–140.
Smallwood, S.A., Tomizawa, S., Krueger, F., Ruf, N., Carli, N., Segonds-Pichon, A., Sato, S., Hata, K., Andrews, S.R., and Kelsey, G. (2011). Dynamic CpG island methylation landscape in oocytes and preimplantation embryos. Nat. Genet. 43, 811–814.
Stadler, M.B., Murr, R., Burger, L., Ivanek, R., Lienert, F., Schöler, A., van Nimwegen, E., Wirbelauer, C., Oakeley, E.J., Gaidatzis, D., et al. (2011). DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490–495.
Stelzer, Y., Yanuka, O., and Benvenisty, N. (2011). Global analysis of parental imprinting in human parthenogenetic induced pluripotent stem cells. Nat. Struct. Mol. Biol. 18, 735–741.
Stelzer, Y., Ronen, D., Bock, C., Boyle, P., Meissner, A., and Benvenisty, N. (2013). Identification of novel imprinted differentially methylated regions by global analysis of human-parthenogenetic-induced pluripotent stem cells. Stem Cell Reports 1, 79–89.
Surani, M.A., and Barton, S.C. (1983). Development of gynogenetic eggs in the mouse: implications for parthenogenetic embryos. Science 222, 1034–1036.
Surani, M.A., Barton, S.C., and Norris, M.L. (1986). Nuclear transplantation in the mouse: heritable differences between parental genomes after activation of the embryonic genome. Cell 45, 127–136.
Thomson, J.A., and Solter, D. (1988). The developmental fate of androgenetic, parthenogenetic, and gynogenetic cells in chimeric gastrulating mouse embryos. Genes Dev. 2, 1344–1351.
Vu, T.H., and Hoffman, A.R. (1994). Promoter-specific imprinting of the human insulin-like growth factor-II gene. Nature 371, 714–717.
Wilkinson, L.S., Davies, W., and Isles, A.R. (2007). Genomic imprinting effects on brain development and function. Nat. Rev. Neurosci. 8, 832–843.
Wood, A.J., Bourc’his, D., Bestor, T.H., and Oakey, R.J. (2007). Allele-specific demethylation at an imprinted mammalian promoter. Nucleic Acids Res. 35, 7031–7039.
Xie, W., Barr, C.L., Kim, A., Yue, F., Lee, A.Y., Eubanks, J., Dempster, E.L., and Ren, B. (2012). Base-resolution analyses of sequence and parent-of-origin dependent DNA methylation in the mouse genome. Cell 148, 816–831.
Zeng, J., Konopka, G., Hunt, B.G., Preuss, T.M., Geschwind, D., and Yi, S.V. (2012). Divergent whole-genome methylation maps of human and chimpanzee brains reveal epigenetic basis of human regulatory evolution. Am. J. Hum. Genet. 91, 455–465.

Cell Reports Supplemental Information

Differentiation of Human Parthenogenetic Pluripotent Stem Cells Reveals Multiple Tissue- and Isoform-Specific Imprinted Transcripts Yonatan Stelzer, Shiran Bar, Osnat Bartok, Shaked Afik, Daniel Ronen, Sebastian Kadener, and Nissim Benvenisty

Supplementary Figure 1, Relates to Figure 1

[Figure: panels A (WT-iPSCs) and B (Pg-iPSCs). Bar plots of relative fold change (y-axis) for NANOG, OCT4, SOX2, NESTIN, NEUROD1, NCAM1, CXCR4, FOXA2, and HNF4a. Samples on the x-axis: Undiff., EB 20 d, and the sorted differentiated populations (Diff. NCAM1+ and NCAM1-negative; Diff. CXCR4+ and CXCR4-negative).]

Supplemental Figure 1. Characterization of the different sorted cell populations. (A and B) qRT-PCR analysis showing the mean relative fold change ± SD of pluripotency markers and selected tissue-specific markers in (A) WT-iPSCs following in vitro differentiation to neural progenitor cells (NPCs) and (B) Pg-iPSCs following in vitro differentiation to endodermal progenitor cells. In (A), gene expression analysis was conducted on control undifferentiated iPSCs, 20-day-old embryoid bodies (EBs), and sorted NPCs (both NCAM1-positive and NCAM1-negative populations). In (B), gene expression analysis was conducted on control undifferentiated Pg-iPSCs, 20-day-old EBs, and sorted CXCR4 cells (both positive and negative populations).

Supplementary Figure 2, Relates to Figure 3

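[Figure: (A) Genome-browser view (scale bar 20 kb) of the L3MBTL1/SGK2 locus, showing the positions of the Primer-bi and Primer-mono sets, the parental-bias track, DNA methylation tracks (0 to 1) for CD34 primary cells, adult liver, PSC-derived NPCs, PSCs, and sperm, and the CpG island track. (B) RT-PCR of L3MBTL1 bi and mono isoforms in PSCs and NCAM1+ cells from WT and Pg lines; band-size labels 433 and 184 bp.]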

Supplemental Figure 2. Known PEGs with alternative promoters. (A) Regional view of L3MBTL1. DNA methylation varies from 1 (hyper-methylated) to 0 (hypo-methylated). Shown are average methylation levels of different cell types, with respect to the genomic organization of the genes. Parent-of-origin-specific allelic expression was analyzed on LCLs derived from seven individuals. Shown is the average parental bias of heterozygous SNPs, calculated as differences between b-values of gDNA and cDNA for each sample. Blue and red indicate paternal and maternal bias, respectively. Black arrows point at biallelically expressed SNPs. iDMRs are highlighted in bright blue. Note the schematic positions of the monoallelic-specific (mono) and biallelic-specific (bi) primer sets, used to specifically identify L3MBTL1 isoforms. (B) Expression patterns of the biallelic and monoallelic L3MBTL1 isoforms in PSCs and NPCs, as measured by RT-PCR in representative control (WT) and parthenogenetic (Pg) cell lines. Band sizes are designated in bp.
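The parental-bias calculation mentioned in this legend can be sketched in Python. This is a minimal illustration and not the authors' pipeline: the function names, the input layout, and the sign convention (cDNA minus gDNA when the array allele is the phased paternal allele, the reverse otherwise, so that positive values indicate paternal bias) are assumptions made for the example, and the numbers are hypothetical.

```python
from statistics import mean


def allelic_bias(cdna_b, gdna_b, b_allele_is_paternal):
    """Signed bias for one heterozygous SNP in one sample.

    cdna_b / gdna_b: b-values measured on cDNA and genomic DNA.
    b_allele_is_paternal: True when the measured allele is the phased
    paternal allele (assumed convention; see lead-in).
    """
    diff = cdna_b - gdna_b
    # Flip the sign when the measured allele is maternal, so that a
    # positive result always means over-representation of the paternal allele.
    return diff if b_allele_is_paternal else -diff


def mean_parental_bias(samples):
    """Average the phased per-sample biases for one SNP."""
    return mean(allelic_bias(c, g, p) for c, g, p in samples)


# Hypothetical b-values for one SNP in three individuals:
# (cDNA b-value, gDNA b-value, measured allele is paternal?)
example = [(0.90, 0.50, True), (0.10, 0.50, False), (0.85, 0.55, True)]
bias = mean_parental_bias(example)
print(f"mean parental bias: {bias:+.2f}")
```

Under this convention a value near zero corresponds to biallelic expression, and values far from zero correspond to expression from predominantly one parental allele.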

Supplementary Figure 3, Relates to Figures 4 & 5

[Figure: (A) Bar plot of relative expression (y-axis, 0 to 500) of NAIP and WDR17 in WT-PSCs and Pg-PSCs. (B) Pie charts titled "Known DMRs in Alternative Promoters" with Conserved and Not-Conserved sectors; Mouse: 12 (63%) and 7 (37%); Chimp: 17 (94%) and 1 (6%). (C) Genome-browser view (scale bar 20 kb) of GRB10 with tracks for WT-NCAM1+ and Pg-NCAM1+ (0 to 50), H3K4me3 (0 to 130), DNA methylation (0 to 1) in CD34 primary cells, adult liver, PSC-derived NPCs, PSCs, and sperm, and the CpG island track. (D) Relative expression plot.]

(A) Average expression levels ± S.D of the putative imprinted genes comparing between the parthenogenetic and control cells in the undifferentiated state. Vertical segmented line represents minimal expression values. (B) Pie-chart comparing the conservation of the

[Figure residue: relative expression axis (0 to 12) with bars labeled "Downstream monoallelic isoform" and "Upstream biallelic isoform".]

Table S1 - Groups of known PEG. Relates to Figure 2

Alternative Promoters Genes    Single Isoform Genes
DIRAS3                         RNU5D
GPR1                           LRRTM1
PLAGL1                         ZDBF2
GRB10                          NAP1L5
DDC                            FAM50B
MEST                           SGCE
ZFAT                           PEG10
GLIS3                          COPG2IT1
INPP5F                         MESTIT1
WT1                            DLGAP2
IGF2                           ZFAT-AS1
ZC3H12C                        KCNQ1OT1
SNRPN                          IGF2AS
SNURF                          DLK1
PEG3                           MAGEL2
BLCAP                          MKRN3
L3MBTL                         SNORD116
GNAS                           SNORD107
HYMAI                          SNORD108
INS                            SNORD109
ZIM2                           SNORD115
NDN                            SNORD64
MIMT1                          PSIMCT-1
NNAT                           GNASAS

Table S2-Alternative Promoter DMR Analysis. Relates to Figure 4

Chr    Location              Locus            Testis Methylation  Nearby          Previously associated with alternative promoter regulation

Known DMR - Alternative Promoters
chr1   68517144-68517427     DIRAS3           Hypo                Promoter        No
chr1   3649251-3649625       TP73             Hyper               End of gene     No
chr6   144328918-144329830   PLAGL1           Hypo                Promoter        Yes
chr7   50849818-50850621     GRB10            Hypo                Promoter        Yes
chr7   130131731-130132414   MEST             Hypo                Promoter        Yes
chr10  121577751-121578357   INPP5F           Hypo                Promoter        Yes
chr15  25200520-25201021     SNRPN            Hypo                Promoter        No (All promoters are differentially methylated)
chr16  3493246-3493519       NAA60/ZNF597     Hypo                Promoter        No
chr20  57463504-57464520     GNAS             Hypo                Promoter        Yes
chr20  42143149-42143564     L3MBTL1          Hypo                Promoter        No
chr20  36149667-36150104     BLCAP/NNAT       Hypo                Promoter        No
chr19  54041020-54041818     ZNF331           Hypo                Promoter        No
chr8   141108148-141109518   TRAPPC9          Hypo                K4me3           No

Novel DMR - Alternative Promoters
chr1   16974427-16974826     MST1P2           Hypo                Promoter
chr1   6514916-6516071       ESPN             Hypo                Middle of gene
chr1   895064-895414         KLHL17/NOC2L     Hypo                Promoter
chr1   175568378-175568751   TNR              Hypo                K4me3
chr1   245851289-245851829   KIF26B           Hypo                End of gene
chr3   50313087-50313458     SEMA3B           Hypo                Promoter
chr3   122640626-122641352   SEMA5B           Hypo                K4me3
chr3   47050555-47051595     NBEAL2           Hypo                K4me3
chr3   127795521-127795920   RUVBL1           Hypo                K4me3
chr6   35108820-35109165     TCP11            Hypo                Promoter
chr6   292237-292649         DUSP22           Hypo                Promoter
chr7   138348883-138349584   SVOPL            Hypo                Promoter
chr11  94278171-94278816     PIWIL4/FUT4      Hypo                Promoter
chr12  1905963-1906726       LRTM2/CACNA2D4   Hyper               K4me3
chr12  52408540-52408845     GRASP            Hypo                End of gene
chr14  102554649-102555064   HSP90AA1         Hypo                Promoter
chr15  65688019-65688522     IGDCC4           Hypo                Promoter
chr15  31775770-31776367     OTUD7A           Hypo                K4me3
chr16  88947647-88947970     CBFA2T3          Hyper               Promoter
chr16  2140693-2141158       PKD1             Hypo                K4me3
chr17  26708160-26708676     SARM1            Hypo                Promoter
chr17  43507014-43507754     ARHGAP27         Hypo                Promoter
chr17  1960566-1961664       HIC1             Hypo                Promoter
chr18  13641530-13641941     LDLRAD4          Hypo                Promoter
chr19  49232188-49232587     RASIP1/MAMSTR    Hypo                K4me3
chr19  49713476-49714087     TRPM4            Hypo                K4me3
chr19  19221179-19221569     SLC25A42         Hypo                End of gene
chr20  62327941-62328513     RTEL1-TNFRSF6B   Hyper               Promoter

Table S3-DMR Conservation. Relates to Figure 4

Gene             Chimp DMR   Mouse DMR

Known DMR With Alt. Promoters
DIRAS3           Yes         No Orthologue
TP73             No          No
PLAGL1           Yes         Yes
GRB10            Yes         Yes
MEST             Yes         Yes
INPP5F           Yes         Yes
SNRPN            Yes         Yes
NAA60/ZNF597     Yes         No
GNAS             Yes         Yes
L3MBTL1          Yes         Yes
ZNF331           Yes         No Orthologue
TRAPPC9          Yes         Yes (Known - Peg13)
BLCAP/NNAT       Yes         Yes
KCNQ1            Yes         Yes
INS-IGF2         Yes         Yes
NTM              Yes         No
SLC22A18         Yes         No
TSC22D1          NA          No

Novel DMR
MST1P2           Yes         Yes
ESPN             Yes         Yes
KLHL17/NOC2L     Yes         No
TNR              Yes         No
KIF26B           No          No
SEMA3B           Yes         Yes
SEMA5B           Yes         Yes
NBEAL2           No          No
RUVBL1           NA          No
TCP11            No          No
DUSP22           NA          No
SVOPL            NA          No
PIWIL4/FUT4      Yes         Yes
LRTM2/CACNA2D4   NA          Yes
GRASP            NA          Yes
HSP90AA1         NA          No
IGDCC4           Yes         Yes
OTUD7A           NA          No
CBFA2T3          No          No
PKD1             NA          No
SARM1            Yes         Yes
ARHGAP27         Yes         Yes
HIC1             NA          No
LDLRAD4          Yes         No
RASIP1/MAMSTR    Yes         Yes
TRPM4            Yes         No
SLC25A42         Yes         No
RTEL1-TNFRSF6B   No          No

NA - No Available data in the Chimp genome

Table S4-Single Isoforms DMR Analysis. Relates to Figure 5.

Chr    Location              Locus      Testis Methylation
chr13  21295591-21296127     AK055408   Hypo
chr5   60921809-60922421     BC032910   Hypo
chr19  1295447-1295842       EFNA2      Hypo
chr11  639698-640381         DRD4       Hypo
chr15  82335385-82336358     MEX3B      Hypo
chr13  113763756-113765376   F7         Hypo
chr5   140871005-140872423   PCDHGC5    Hypo
chr22  42078062-42078508     NHP2L1     Hypo
chr5   172110797-172111403   NEURL1B    Hypo
chr20  3732154-3732475       HSPA12B    Hypo

Table S5-5' RNA sequencing analysis - Candidate PEGs with alternative promoters. Relates to Figure 5

Location                     Gene      WT Undiff Average  Pg Undiff Average  WT-NCAM1 Average  Pg-NCAM1 Average

Expression in Pluripotent and NCAM1+ Cells
chr2:133402336-133427780     LYPD1     69.2               16.2               162.6             27.8
chr7:94034984-94037203       COL1A2    166.0              28.3               91.9              12.7
chr11:11984542-12030917      DKK3      41.2               12.0               41.8              12.1
chr12:133345494-133405288    GOLGA3    59.1               18.8               40.3              12.2

Expression restricted to Pluripotent Cells
chr1:115828536-115880857     NGF       33.4               1.6                NE                NE
chr18:5392387-5543986        EPB41L3   32.4               7.5                NE                NE
chr19:45417576-45422606      APOC1     38.6               8.8                NE                NE
chr19:47278139-47288134      SLC1A5    77.9               6.3                NE                NE

Expression restricted to NCAM1+ Cells
chr1:11539294-11541938       PTCHD2    NE                 NE                 46.8              10.2
chr1:201372894-201390874     TNNI1     NE                 NE                 40.2              5.3
chr2:66662531-66799583       MEIS1     NE                 NE                 37.7              8.5
chr4:157682762-157892546     PDGFC     NE                 NE                 37.2              3.0
chr5:92919042-92929786       NR2F1     NE                 NE                 1745.3            246.6
chr6:37600283-37665766       MDGA1     NE                 NE                 35.9              7.6
chr8:124260689-124286727     ZHX1      NE                 NE                 34.9              2.4
chr12:106976684-107156582    RFX4      NE                 NE                 245.3             41.2
chr12:47469489-47473734      AMIGO2    NE                 NE                 32.0              5.5
chr13:31774111-31906411      B3GALTL   NE                 NE                 43.2              7.0

NE - Not Expressed
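As a reading aid for Table S5, the sketch below computes the WT-to-Pg expression ratio for a few rows copied from the table. It is only an illustration: the helper name `fold_reduction` and the handling of "NE" entries are assumptions for the example, and the table itself does not state which ratio cutoff defined a candidate.

```python
def fold_reduction(wt, pg):
    """Return the WT/Pg ratio, or None when either value is 'NE' (not expressed)."""
    if wt == "NE" or pg == "NE":
        return None
    return wt / pg


# (WT Undiff, Pg Undiff, WT-NCAM1, Pg-NCAM1) for three genes from Table S5
rows = {
    "LYPD1": (69.2, 16.2, 162.6, 27.8),
    "NGF": (33.4, 1.6, "NE", "NE"),
    "NR2F1": ("NE", "NE", 1745.3, 246.6),
}

for gene, (wt_u, pg_u, wt_n, pg_n) in rows.items():
    undiff = fold_reduction(wt_u, pg_u)
    ncam1 = fold_reduction(wt_n, pg_n)
    print(gene, "undiff:", undiff, "NCAM1+:", ncam1)
```

A ratio well above 1 in a given cell type indicates reduced expression in the parthenogenetic cells, which is the pattern expected for a paternally expressed transcript.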

Supplemental Experimental Procedures

Cell types
Parthenogenetic induced pluripotent stem cells were established and cultured as previously described (Stelzer et al., 2011). NCAM1-positive cells were derived from human PSCs using a previously reported protocol for differentiation towards early neural progenitor cells (Kim et al., 2010) with slight modifications. Briefly, PSC colonies were cultured as embryoid bodies (EBs) for 4 days in DMEM-F12 medium (Sigma) supplemented with 15% Knockout Serum Replacement -mercaptoethanol (Sigma), 2 mM L-glutamine, 0.1 mM nonessential

dorsomorphin (Tocris Bioscien plated for neural expansion on 0.2% gelatin-coated plates and cultured for an additional 6 days in DMEM-F12 medium supplemented with 1× N2 (Invitrogen), 2 mM L-glutamine and 20 ng/ml basic fibroblast growth factor (R&D). Neural cells expressing NCAM1 were sorted by flow cytometry using hNCAM-1/CD56 antibody (1:150, R&D). CXCR4-positive cells were derived using the following protocol for differentiation towards early endodermal progenitor cells: PSC colonies were separated, plated on Matrigel-coated plates (BD Pharmingen), and cultured for 24 hours in mTeSR1 medium (Stemgent). The medium was then replaced with 10 ml RPMI (Sigma) and 100 ng ml-1 Activin (Peprotech Inc.). This treatment was repeated every 48 hours during the next 4 days, with fresh medium, Activin, and the addition of B-27 (Invitrogen) at a 1:50 dilution and 0.5 mM NaBut (Sigma). After 7 days from

the time of plating on Matrigel, endodermal cells expressing CXCR4 were sorted by flow cytometry using CD184 (CXCR4) antibodies (1:20, BD Pharmingen). BMP4-treated cells were derived as previously described (Stelzer et al., 2011). Briefly, PSCs were treated with 50 ng ml-1 of BMP4 for 7 days and then subjected to microarray analysis.

Isolation of genomic DNA and RNA and reverse transcription
Total genomic DNA was extracted using the Nucleic Acid and Protein Purification kit (Macherey-Nagel Corporation), and RNA was purified with the PerfectPure RNA Cultured Cell kit (5-Prime) or the RNeasy® Mini Kit (QIAGEN). One microgram of total RNA was used for reverse transcription using ImProm-II reverse transcriptase (Promega) with random hexamer primers. Sequencing and RT-PCR experiments were performed with GoTaq® (Promega), whereas quantitative real-time PCR was performed using TaqMan® Universal Master Mix or SYBR green qPCR Supermix. Data were analyzed with the 7300 real-time PCR system (Applied Biosystems). For sequencing, the PCR products were cleaned using a Mega Quick spin purification kit (Intron Biotechnology).

RT PCR

Gene      Isoform       5' Primer                 3' Primer               Size (bp)  No. of cycles  Tm
MEST      Biallelic     AAACATGGAGTCCTGTAGGCAA    ATGTGCAGGTACGCAGCAA     105        35             56
MEST      Monoallelic   GATAACGCGGCCATGGTG        ACTTCCATGAGTGAAGGGCA    145        30             56
ZNF331    Biallelic     GGGTCTCCGTGTCTCTGAAA      GAAGGCCAGCTCTTTCTTCC    433        35             55
ZNF331    Monoallelic   CTGTCCCCGTAACTGTGACA      GATTCGTCTTCCTCCTCGGG    161+260    35             55
L3MBTL1   Biallelic     CTCGGACCGTAGCTAGGC        GTGAGGACCAGAACCGGG      184        35             55
L3MBTL1   Monoallelic   TGAGGGTTTGGCTGGTGTAG      AGGGCAGCCATCATTAGAGG    433        30             55
DUSP22    Monoallelic   TGTAACATGCCATAGTGCGC      CGGGCAGGATCTTGTTCATC    95         35             55
DUSP22    Biallelic     GGGGAGTGTGGCTGTAGAAT      GCGGCTGTGAAGAAAGAACA    227        30             56

RT PCR for Sequencing

Genomic DNA
Gene    SNP          5' Primer genomic DNA   3' Primer genomic DNA    Size (bp)  No. of cycles  Tm
NAIP    rs28409706   CATCCAGATTGTGGGTTCCT    AAACGCCAGAGAAACACTTCA    658        35             56
WDR17   rs4276243    TGATGGTCAAATCCATCCAA    CCCTCACTTAAAAATGGCAGTC   272        35             55

cDNA
Gene    SNP          5' Primer cDNA          3' Primer cDNA           Size (bp)  No. of cycles  Tm
NAIP    rs28409706   GGACGGACAGAAGACAAAGC    AGCATTTGTTCAGCCTCTGA     248        35             55
WDR17   rs4276243    Same as gDNA            Same as gDNA             272        35             55

qRT PCR

Gene     5' Primer               3' Primer                Size (bp)
GAPDH    AGCCACATCGCTCAGACACC    GTACTCAGCGCCAGCATCG      302
HNF4A    TGTACTCCTGCAGATTTAGCC   CTGTCCTCATAGCTTGACCT     163
FOXA2    GGGAGCGGTGAAGATGGA      TCATGTTGCTCACGGAGGAGTA   89
SOX2     TCACGCAAAAACCGCGAT      TATACAAGGTCCATTCCCCCG    129
NESTIN   ATCTGCAAACCCATCGGACTC   TGAGGCACCTTTTCTTCCTGG    121
NCAM1    GATGCGACCATCCACCTCAA    TCTCCGGAGGCTTCACAGGTA    113

Gene      TaqMan Probe
GAPDH     Hs 99999905_m1
OCT4      Hs 00742896_s1
NANOG     Hs 02387400_g1
NEUROD1   Hs 00159598_m1

DNA microarray analysis
Total RNA was extracted according to the manufacturer's protocol (Affymetrix, CA). RNA was subjected to the Human Gene 1.0 ST microarray platform (Affymetrix, CA), and washing and scanning were performed according to the manufacturer's protocol. Arrays were analyzed using Robust Multichip Analysis (RMA) in the Affymetrix Expression Console. In our search for putative imprinted genes, we focused on genes whose expression is down-regulated in the parthenogenetic cells by at least 3-fold. Genes that were further analyzed were verified to be significantly down-regulated in the parthenogenetic cells with p-value
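The fold-change criterion described above can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes the RMA expression values are on a log2 scale, so that a 3-fold reduction corresponds to a difference of at least log2(3); the function name, the input layout, and the example genes are hypothetical. The significance filter is not reproduced because its threshold is not given in the text above.

```python
import math


def downregulated_in_pg(log2_expr, min_fold=3.0):
    """Return genes whose expression in Pg cells is reduced by at least min_fold.

    log2_expr maps gene name -> (WT log2 expression, Pg log2 expression).
    On a log2 scale, an n-fold reduction is a difference of log2(n).
    """
    threshold = math.log2(min_fold)
    return sorted(
        gene for gene, (wt, pg) in log2_expr.items() if wt - pg >= threshold
    )


# Hypothetical log2 expression values (WT, Pg)
example = {
    "GENE_A": (10.0, 7.0),  # 8-fold lower in Pg: kept
    "GENE_B": (9.0, 8.5),   # about 1.4-fold lower: dropped
    "GENE_C": (8.0, 7.0),   # 2-fold lower: dropped
    "GENE_D": (12.0, 10.0), # 4-fold lower: kept
}
candidates = downregulated_in_pg(example)
print(candidates)
```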

Allelic expression analysis
Parent-of-origin allelic expression analysis was performed as previously described (Morcos et al., 2011) on 7 individuals of CEPH family 1,420. In order to calculate allelic bias we applied the

(cDNA – gDNA) ; (Paternal bias). If AB_allele != phased genotype then Al – cDNA) ; (Maternal bias). The final value is the average of the phased AIs from all samples.

Genome wide DMR analysis
The methylation analysis was performed using 6 independent whole genome bisulfite sequencing (WGBS) samples from different tissues, downloaded from the NIH Roadmap Epigenomics Project (Bernstein et al., 2010), as follows: Adult Liver (GSM916049); Brain Hippocampus middle (GSM916050); H1 Derived Neural Progenitor (GSM675542); Human Embryonic Stem Cells H1 (GSM429321) and H9 (GSM706059); Testis (GSM1127117). DNA methylation varies from 1 (fully methylated) to 0 (non-methylated). CpGs with values between 0.3 and 0.7 (30%-70% methylation) were considered intermediate-methylated. A list of all UCSC genes was filtered to include only genes with known variants driven by alternative promoters. Data from a region of 2 kb surrounding each of the genes were extracted for analysis. Methylation values from these coordinates were used for the detection of intermediate-methylated regions using the following algorithm. Following initial smoothing, intermediate-methylated CpGs (methylation values between 0.3 and 0.7) were scored based on the position of the next 5 intermediate-methylated CpGs. Every intermediate-methylated value found within 75 bp of the analyzed CpG added the value 1 to its score, making the maximum score 5 for each CpG. Only CpGs with scores >3 were selected to create a list of coordinates for intermediate-methylation peaks. Next, the best peak in each region was selected based on the highest score (for equal scores the shortest region was chosen), and based on overlap with peaks found in other samples. Subsequently, a list of minimal consensus regions and maximal spanning regions of the putative iDMRs was generated. In addition, we filtered out all regions with overlapping intermediate-methylation scores in the testis, to include only regions that were either hyper- or hypo-methylated in the testis. The final list of putative iDMRs was filtered according to the following conditions: average maximal spanning region score >70, minimal number of overlapping files for a given peak >3, and peak length >220. Notably, the sex chromosomes were not included in this analysis.

Histone H3K4me3 peak analysis
ChIP-Seq analysis of H3K4me3 in hESC H1 cells (GEO GSM433170) and ChIP-Seq analysis of WCE in human H1 cells (GEO GSM433179, used as input) were downloaded from the NIH Roadmap Epigenomics Project (Bernstein et al., 2010) and aligned using "Map with Bowtie for Illumina" version 1.1.2 (Langmead et al., 2009). Peaks were found using "MACS" version 1.0.1 with tag size 27 based on FASTQ Summary Statistics (Blankenberg et al., 2010), band width 300, p-value cutoff for peak detection 0.0001, and MFOLD 20. The remaining settings were left at their defaults. In total, 30,513 regions were identified, and for every putative iDMR the closest peak coordinates were assigned.

Generation and analysis of 5' RNA sequencing libraries
5' RNA sequencing libraries were generated from undifferentiated and NCAM1+ populations of both parthenogenetic and control cells (Shaked Afik, Osnat Bartok, and Sebastian Kadener, manuscript in preparation). Total RNA was polyA selected using Oligo-dT beads (Invitrogen), fragmented, and then enriched for 5' ends using Terminator Exonuclease (Epicentre). The reaction mixture was cleaned up with 2.5X SPRI beads, dephosphorylated with FastAP (Fermentas), cleaned again (2.5X SPRI, Agencourt), and then ligated to linker1 (5Phos/AXXXXXXXXAGATCGGAAGAGCGTCGTGTAG/3ddC/, where XXXXXXXX is an internal barcode specific for each sample) using T4 RNA ligase I (NEB). Ligated RNA was cleaned up with Silane beads (Dynabeads MyOne, Life Technologies) and pooled into a single tube. RT was then performed on the pooled sample with a specific primer (5'-CCTACACGACGCTCTTCC-3') using an AffinityScript Multiple Temperature cDNA Synthesis Kit (Agilent Technologies). Then, RNA-DNA hybrids were degraded by incubating the RT mixture with 10% 1 M NaOH (e.g., 2 µl to 20 µl of RT mixture) at 70°C for 12 min. The pH was then normalized by addition of a corresponding amount of 0.5 M AcOH (e.g., 4 µl for 22 µl of NaOH+RT mixture). The reaction mixture was cleaned up using Silane beads, and a second ligation was performed, in which the 3' end of the cDNA was ligated to linker2 (5Phos/AGATCGGAAGAGCACACGTCTG/3ddC/) using T4 RNA ligase I. The sequences of linker1 and linker2 are partially complementary to the standard Illumina read1 and read2 barcode adapters, respectively. The reaction mixture was cleaned up (Silane beads), and PCR enrichment was set up using enrichment primers 1 and 2 (5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3', 5'-CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3', where XXXXXXXX is the barcode sequence) and Phusion HF MasterMix (NEB). Ten to 12 cycles of enrichment were performed depending on the initial input amount of RNA. After clean-up with 0.8X volume of SPRI beads, the library was sequenced using Illumina HiSeq 2500. Subsequently, the data were processed through a bioinformatic pipeline and analyzed as described in (Shaked Afik, Osnat Bartok, and Sebastian Kadener, manuscript in preparation).
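The CpG scoring step described under "Genome wide DMR analysis" can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the initial smoothing, peak merging, cross-sample overlap, and testis filtering are omitted; "within 75 bp" is interpreted as a distance of at most 75 bp downstream of the analyzed CpG; and the function names and input layout are hypothetical.

```python
def score_intermediate_cpgs(cpgs, low=0.3, high=0.7, window_bp=75, lookahead=5):
    """Score each intermediate-methylated CpG by its downstream neighbours.

    cpgs: iterable of (position, methylation) pairs for one region.
    A CpG gains 1 point for each of the next `lookahead`
    intermediate-methylated CpGs lying within `window_bp` of it,
    so the maximum score equals `lookahead`.
    """
    # Keep only intermediate-methylated CpGs, ordered by position.
    inter = [pos for pos, meth in sorted(cpgs) if low <= meth <= high]
    scores = {}
    for i, pos in enumerate(inter):
        following = inter[i + 1 : i + 1 + lookahead]
        scores[pos] = sum(1 for nxt in following if nxt - pos <= window_bp)
    return scores


def select_peak_cpgs(scores, min_score=3):
    """Return positions whose score is strictly greater than min_score."""
    return [pos for pos, score in sorted(scores.items()) if score > min_score]


# Hypothetical region: a dense cluster of intermediate CpGs, one fully
# methylated CpG inside the cluster, and one isolated intermediate CpG.
region = [
    (100, 0.5), (105, 0.9), (110, 0.5), (120, 0.45),
    (130, 0.6), (140, 0.5), (150, 0.55), (1000, 0.5),
]
scores = score_intermediate_cpgs(region)
peak_cpgs = select_peak_cpgs(scores)
print(scores)
print(peak_cpgs)
```

In this toy region only the first two CpGs of the cluster pass the score >3 cutoff, because CpGs near the downstream edge of a cluster have fewer intermediate-methylated neighbours ahead of them.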

Supplemental References
Blankenberg, D., Gordon, A., Von Kuster, G., Coraor, N., Taylor, J., and Nekrutenko, A. (2010). Manipulation of FASTQ data with Galaxy. Bioinformatics 26, 1783-1785.
Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25.
