Spatial Transcriptome for the Molecular Annotation of ... - Cell Press

Resource

Spatial Transcriptome for the Molecular Annotation of Lineage Fates and Cell Identity in Mid-gastrula Mouse Embryo

Authors Guangdun Peng, Shengbao Suo, Jun Chen, ..., Patrick P.L. Tam, Jing-Dong J. Han, Naihe Jing

Correspondence [email protected] (J.-D.J.H.), [email protected] (N.J.)

In Brief Peng et al. apply high-resolution RNA-seq to mid-gastrulation mouse embryos to collate a spatial transcriptome resource. 3D rendition of the quantitative data enables visualization of spatial gene expression patterns in a web-based database and identifies zip code marker genes for mapping the position of single epiblast cells in the embryo by gene expression profile concordance.

Highlights

- Spatial transcriptome collated from RNA-seq data of the epiblast of mouse gastrula
- Rendition of expression pattern of 20,000 genes as digital whole-mount and corn plot
- Domains of transcription and signaling activity correlate with epiblast cell fates
- Mapping cell identity and home address in the epiblast by zip code marker genes

Peng et al., 2016, Developmental Cell 36, 681–697, March 21, 2016 ©2016 Elsevier Inc. http://dx.doi.org/10.1016/j.devcel.2016.02.020

Accession Numbers GSE65924

Developmental Cell

Resource

Spatial Transcriptome for the Molecular Annotation of Lineage Fates and Cell Identity in Mid-gastrula Mouse Embryo

Guangdun Peng,1,4 Shengbao Suo,2,4 Jun Chen,1,4 Weiyang Chen,2 Chang Liu,1 Fang Yu,1 Ran Wang,1 Shirui Chen,1 Na Sun,2 Guizhong Cui,1 Lu Song,1 Patrick P.L. Tam,3 Jing-Dong J. Han,2,* and Naihe Jing1,*

1 State Key Laboratory of Cell Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
2 Key Laboratory of Computational Biology, CAS Center for Excellence in Molecular Cell Science, Collaborative Innovation Center for Genetics and Developmental Biology, Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
3 Embryology Unit, Children’s Medical Research Institute and School of Medical Sciences, Sydney Medical School, The University of Sydney, Westmead, NSW 2145, Australia
4 Co-first author
*Correspondence: [email protected] (J.-D.J.H.), [email protected] (N.J.)
http://dx.doi.org/10.1016/j.devcel.2016.02.020

SUMMARY

Gastrulation of the mouse embryo entails progressive restriction of lineage potency and the organization of the lineage progenitors into a body plan. Here we performed a high-resolution RNA sequencing analysis on single mid-gastrulation mouse embryos to collate a spatial transcriptome that correlated with the regionalization of cell fates in the embryo. 3D rendition of the quantitative data enabled the visualization of the spatial pattern of all expressing genes in the epiblast in a digital whole-mount in situ format. The dataset also identified genes that (1) are coexpressed in a specific cell population, (2) display similar global pattern of expression, (3) have lineage markers, (4) mark domains of transcriptional and signaling activity associated with cell fates, and (5) can be used as zip codes for mapping the position of single cells isolated from the mid-gastrula stage embryo and the embryo-derived stem cells to the equivalent epiblast cells for delineating their prospective cell fates.

INTRODUCTION

Gastrulation is a critical milestone of early embryogenesis in mammals when the primary germ layers are formed and the multipotent embryonic cells are allocated to the progenitors of tissue lineages within the germ layers. Morphogenesis of the germ layers during gastrulation entails a complex mechanism that regulates the proliferation, movement, and patterning of cell populations, and the choreography of switches in genetic and signaling activity that may drive lineage specification and tissue modeling in the embryo (Arnold and Robertson, 2009; Kojima et al., 2014b; Pfister et al., 2007; Solnica-Krezel and Sepich, 2012; Tam and Behringer, 1997; Tam and Loebel, 2007). Fate mapping of the mouse embryo has shown that there is distinctive regionalization of cell fates in the germ layers before gastrulation is completed (Beddington, 1981, 1982; Tam and Behringer, 1997), and lineage tracing studies showed that concurrent with the formation of primary germ layer, progressive restriction of differentiation potency of embryonic cells takes place, culminating in the generation of lineage-restricted progenitors for specific tissue types (Beddington, 1981, 1982; Cajal et al., 2012; Lawson et al., 1991; Li et al., 2013). However, the molecular controls in time and space that underpin the exit of cells from the multipotent state, the specification of lineage-restricted progenitors and the regionalization of cell fates pertaining to the establishment of the body plan are not fully known.

A holistic knowledge of the activity of the genome specifically at gastrulation is essential for gleaning a better understanding of the molecular mechanism for lineage specification and embryonic patterning. Developmental changes in transcriptional activity during gastrulation have been studied by genome-wide microarray analysis of whole embryo or the epiblast (Kojima et al., 2014a; Mitiku and Baker, 2007). These studies have revealed that genes associated with regulation of pluripotency, germ layer formation, transcriptional regulation, cell metabolism, and transport and ion homeostasis are highly enriched during gastrulation, and the progression to organogenesis is accompanied by the activation of tissue-specific genes and morphogenetic drivers. Of special significance is the finding of a major overhaul of the transcriptome at the transition from mid- to late gastrulation (Kojima et al., 2014b), which reflects the downregulation of the pluripotency genes and concurrent acquisition of the ectoderm propensity and restriction of the mesendoderm fate.
In the context of derivation of pluripotent stem cells, the dismantling of the pluripotency gene network marks the most advanced developmental stage at which epiblast stem cells could be isolated from the epiblast (Kojima et al., 2014a; Osorno et al., 2012). Taking into consideration the regionalization of cell fates in the germ layers, the mid-gastrulation stage is therefore the most advanced stage at which cells at the transition between pluripotency and lineage-restricted state co-exist in the epiblast. However, neither whole embryo nor whole-epiblast transcriptome is
adequate to provide the molecular annotation of the body plan in the gastrulating embryo. Hence, it is imperative to undertake an analysis of the transcriptome at the resolution of discrete cell populations at anatomically defined sites in the epiblast.

In this study, we documented the spatial transcriptome of the epiblast of the mid-gastrulation (E7.0 late mid-streak stage) mouse embryo. We performed low-input RNA sequencing (RNA-seq) analysis on individual populations, of about 20 epiblast cells each, that were sampled by laser capture microdissection at known positional addresses in single embryos. This has enabled the collation of a genome-wide profile of the expressed genes in each location-defined cell population, which can be rendered into a spatial display of the transcriptional architecture for the whole epiblast. From the three mid-gastrulation embryos, the spatially defined and quantified expression data of over 20,000 genes are collated into a web-based database. This in silico spatial transcriptome (iTranscriptome) enables the visualization of the expression pattern of specific genes by 3D rendering and query for the cohort of genes displaying specific syn-expression patterns. In addition, the iTranscriptome of the mid-gastrulation embryo has a unique attribute of zip code mapping that enables pinpointing the position, with sufficient precision, of single cells isolated from the epiblast and identifying the equivalent cell population of embryo-derived stem cells such as the epiblast stem cells.

RESULTS

Construction of the Spatial Transcriptome

For profiling the gene expression pattern of cell populations in different regions of the epiblast, samples were harvested by laser capture microdissection (LCM) from non-overlapping sites in late mid-streak stage (E7.0) mouse embryo (Downs and Davies, 1993). To minimize the noise in the transcriptome created by between-embryo variance, the analysis was performed on single embryos in triplicate (E1, E2, and E3). Each embryo was cryo-sectioned serially, and alternate sections in the distal-proximal series were used for sample collection (S) and reference templates (R) for digital 3D reconstruction, respectively (Figures 1A, S1A, and S1B). A population of approximately 20 cells was captured from four quadrants of each sample section: anterior (A), posterior (P, which contains the primitive streak), left lateral (L), and right lateral (R) (Figures 1A and S1C), except for the most distal section (S1) where only cells in the anterior and posterior positions were harvested. Altogether, 42 anatomical regions from 11 sections per embryo were sampled. Given that the late mid-streak epiblast contains about 2,000 cells (Power and Tam, 1993; Snow, 1977; Tam and Beddington, 1992), the capture of 42 × 20 cells represented over 80% of the 1,000 cells in the sample sections (i.e., 40% of total epiblast population).

To enhance the efficiency of RNA-seq with low RNA input, a single-cell RNA-seq technique was adapted and optimized for analyzing the small-cell-number LCM samples. RNA-seq was performed at a depth of over 20 million reads per library to achieve sufficient sequencing depth saturation (Figure S1D). After removing samples with low sequencing quality, data analyses were carried out on the three embryos. For embryo E1, which provided the reference transcriptome for data mining, the dataset of a missing sample was re-created computationally (Figure S1E). Mapping of the raw reads onto the mouse mm10 genome revealed that transcripts of between 13,500 and 14,500 unique genes (fragments per kilobase per million [FPKM] > 1.0 in at least two samples for each embryo) were detected in individual embryos (altogether about 20,000 unique genes from the three embryos), and the samples in each quadrant have similar expression density distribution (Figure S1F).

3D Visualization of the Spatial Transcriptome as Digital WISH

Whole-mount in situ hybridization (WISH) has been the most practiced means to study the spatial pattern of gene expression in gastrulation-stage mouse embryos, but it is inherently a low-throughput gene-by-gene approach.
Our RNA-seq data that integrated gene expression level (by transcript counts) with the precise position of the cell population made it feasible to reconstruct a quasi-quantitative 3D expression map for every transcript in a single epiblast in WISH format. An algorithm was

Figure 1. Spatial RNA-Seq Analysis of the Mid-gastrulation Embryo
(A) Experimental strategy: cells were captured by laser capture microdissection from four quadrants: anterior (A), posterior (P), and lateral (left/right, L/R), of each sample section (S1–S11) of the epiblast of late mid-streak stage (E7.0) C57BL/6 embryos, and analyzed by RNA-seq. The reference sections (R1–R11) were used as templates for 3D reconstruction of the embryo for data visualization.
(B) The presentation of the spatial pattern of gene expression in the corn plot. Each dot in the plot represents the cell sample at the specific positional address, and the color indicates the level of gene expression computed from the transcript counts in the RNA-seq dataset.
(C) Whole-mount in situ hybridization (WISH) result of T gene expression (upper paired panels): the in situ image was digitally scanned in 11 slices and expression values for anterior (A), left (L), and posterior (P) regions for each slice were quantified by the average color intensity and plotted in the corn plot format (values were scaled to [0, 1], see Supplemental Experimental Procedures). As scanning was performed from the left side of the embryo, only the expression values of the left quadrants were shown. Digital (d)-WISH (lower paired panels): display of the RNA-seq data for the T gene on the reconstructed template of the embryo and the corn plot (FPKM values were scaled to [0, 1]).
(D) Comparison of WISH (images, upper panels) and d-WISH (corn plot data, lower panels) for Pou3f1, Sox2, Noto, Six3 (our data), Otx2, Lefty2, Eomes, Wnt3, and Tdgf1 (from EMAGE in eMouseAtlas). d-WISH reveals Noto expression in the distal posterior epiblast that is masked by the overlying endoderm when examined by WISH. Six3 expression in the anterior proximal region was detected by d-WISH but not detected by WISH. Inset shows the reads coverage of Six3 in sample 11A (data displayed by the genome browsing application IGV).
(E) The linear correlation analysis of d-WISH (RNA-seq) and in situ hybridization (image scan) data of genes shown in (C) and (D). Expression values are scaled and the Pearson correlation coefficient values are indicated for each plot.
(F) The functionalities of the iTranscriptome web portal: (1) search of expression pattern by gene, (2) search of co-expressing genes, (3) search of genes by expression pattern, and (4) zip code mapping to equivalent cell population or positional address by transcriptome comparison. The output examples were the T gene, top ten genes co-expressed with T, genes displaying a predefined expression pattern, and the mapping of cells from an ESC-derived EpiSC line (ESD-EpiSC).
See also Figure S1.
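The positional addresses of the corn plot (panel B) and the scaled-value comparison of panel E can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, min-max scaling is an assumption (the legend only states that values were scaled to [0, 1]), and the numbers are invented for the example.

```python
import math


def corn_plot_addresses(n_sections=11):
    """List positional addresses such as '1A' or '11P'.

    Follows the sampling scheme in the text: section 1 (most distal) has
    only anterior (A) and posterior (P) samples; sections 2-11 have A, L, R, P.
    """
    addresses = []
    for section in range(1, n_sections + 1):
        quadrants = ("A", "P") if section == 1 else ("A", "L", "R", "P")
        addresses.extend(f"{section}{q}" for q in quadrants)
    return addresses


def scale_unit(values):
    """Min-max scale values to [0, 1] (assumed scaling method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant profile carries no pattern; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five positions (not data from the paper).
fpkm = [0.0, 2.0, 8.0, 20.0, 40.0]
wish_intensity = [5.0, 10.0, 30.0, 70.0, 120.0]
r = pearson_r(scale_unit(fpkm), scale_unit(wish_intensity))
```

Because Pearson's r is unchanged by linear rescaling, the scaling step matters for display on a common color scale rather than for the coefficient itself.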

Figure 2. Spatial Transcriptome Reveals Transcription Domains of the Mid-gastrulation Embryo
(A) Clustering of differentially expressed genes (DEGs) of the expression domains in the reference embryo (embryo E1). Hierarchical clustering and BIC-SKmeans separated the laser capture microdissection samples and DEGs into four expression domains (D1–D4) and three gene groups (G1–G3), respectively. A resampling method was applied to identify gene groups significantly associated with the two major PCs, and the FDRs (one-sided Mann-Whitney test followed by BH correction) for G1, G2, and G3 are 4.1e-259, 0, and 3.8e-196, respectively. Right column: contribution of each gene (rows) to the first two PCs (PC loading). Bottom row: projection scores of the cell samples on the first two principal components (PC score).
(legend continued below)

devised for displaying the RNA-seq data by mapping to an anatomical template of the embryo. Virtually, the cylindrical embryo was transformed into a planar (2D) configuration. The position of each cell sample was represented by the grid node in a corn plot (so named for its resemblance to a corn cob), which is defined by the coordinate position in the cross-sectional plane (A, P, R, L) and the distal-proximal axis (1–11). Samples in the lateral quadrants (R, L) are displayed as parallel sets of nodes in the middle of the plot and the gene expression level is presented in a color-coded format (Figure 1B).

To validate that the corn plot is a faithful emulation of the spatial pattern of gene expression, we compared the digital display with conventional WISH, using the Brachyury (T) gene as a worked example. For the WISH, the stained embryo was scanned in 11 slices and the staining intensity was digitally rendered as color gradient data (signal values) displayed in a corn plot (Figure 1C). A high degree of concordance was found between the ISH pattern in whole mount and corn plot (T in situ) and the digital gene expression data (digital (d)-WISH) displayed in the 3D reconstruction and the corn plot (T RNA-seq), respectively (Figure 1C). The veracity of the d-WISH (our data) against conventional WISH (our data and from EMAGE: http://www.emouseatlas.org/emage/) was further validated for examples of genes that were expressed in the anterior epiblast (Pou3f1, Otx2, and Sox2), and the posterior epiblast (Lefty2, Eomes, Tdgf1, and Wnt3) (Figure 1D). The RNA-seq FPKM values were found to correlate significantly with the WISH signal values (Figure 1E). The d-WISH analysis has a potentially higher sensitivity than WISH for detecting genes that are expressed weakly or in a confined region (e.g., Six3 and Noto; Figure 1D).
Six3 expression in the anterior proximal epiblast was not detected by WISH, but d-WISH revealed reads of Six3 transcripts by RNA-seq in region 11A of the epiblast (Figure 1D and inset). The spatial transcriptome also revealed the epiblast-specific component of the expression pattern: WISH of Noto did not distinguish between the expression in distal visceral endoderm and distal epiblast, while d-WISH revealed a discrete expression domain of Noto in the 1P-4P epiblast (Figure 1D). In these two cases, the difference in sensitivity between d-WISH and WISH underpinned the poor correlation of FPKM and staining intensity value. In essence, d-WISH offers unprecedented resolution and data content: the expression results can be viewed in a spatial context, the data give a finer measurement of gene activity, and this genome-wide spatial transcriptome has documented both spatial and quantitative expression data of transcripts of over 20,000 unique genes in the epiblast.

To make the spatial transcriptome data available as a resource to the scientific community, a web portal, iTranscriptome (for in silico transcriptome) at http://www.itranscriptome.org, has been established to provide open access to the spatial transcriptome data (Figure 1F). The iTranscriptome offers data search functionalities in (1) pattern search by gene: querying and displaying the expression pattern of genes of interest in either a corn plot or a digitally reconstructed d-WISH format; (2) gene search by gene: searching for genes that share a similar expression pattern with the queried gene. For each queried gene, the guilt-by-association (GBA) method (Walker et al., 1999) is used to identify the co-expressed genes, and the genes with a p value smaller than the input cutoff are returned; and (3) gene search by pattern: searching for a set of genes displaying a predefined (20 in the database) or customized (user-defined) expression pattern. Besides these functionalities, the iTranscriptome also enables matching queried cells to their in vivo counterparts in the epiblast (by zip code mapping; see details later).

Spatial Domains of Gene Expression

To identify the transcriptome domains in the mid-gastrulation embryo, genes that were expressed (FPKM > 1.0) in at least two samples in each embryo and with a variance in transcript level > 0.05 (transcript levels are measured as log10(FPKM+1)) across all samples were retrieved for further analysis. Unsupervised hierarchical clustering of cell samples using differentially expressed genes (DEGs; Figure S2A and Table S1) identified four spatial expression domains (D1, D2, D3, and D4) that display distinct gene expression patterns (Figure 2A). Principal component analysis (PCA) showed that PC1 and PC2 were sufficient for separating the samples into the same four domains (Figure 2A, bottom, heatmap for PC1 and PC2 scores). Among the spatial expression domains, D1 encompassed all the anterior samples, whereas the posterior region samples were clustered into D4, aligning in order of their distal-proximal positions (Figure 2A).
D2 consisted of samples from the distal lateral region of the epiblast and D3 contained those from the proximal lateral region. The anterior (D1) and distal-lateral (D2) domains were more closely related to each other than to the other domains, suggesting a developmental coherence of these two domains. This classification indicated that the epiblast at this stage can be partitioned into four distinctive domains of gene activity, each displaying remarkable spatial coherence of samples (Figure 2B). The concordance (measured by Spearman rank correlation coefficients [RCC]) of the DEGs across all samples between the embryos was high (average RCC ± SD, 0.74 ± 0.13, Figure 2C). The significantly higher intra-domain correlation (average RCC ± SD, 0.75 ± 0.04) compared with inter-domain correlations (average RCC ± SD, 0.66 ± 0.04, two-sided t test, p = 3.9e-3) indicates that the spatial transcriptome is reproducible and the delineation of spatial expression domains is consistent among the mid-gastrulation embryos.
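The gene filter that opens this section (FPKM > 1.0 in at least two samples, and variance of log10(FPKM+1) above 0.05 across samples) can be written compactly. The sketch below is illustrative only: the function name and the toy profiles are hypothetical, it treats a single embryo rather than all three, and it assumes population variance because the text does not specify the estimator.

```python
import math
from statistics import pvariance


def select_variable_genes(fpkm_by_gene, min_fpkm=1.0, min_samples=2, min_var=0.05):
    """Return names of genes that are both expressed and spatially variable.

    fpkm_by_gene maps a gene name to its FPKM values across samples.
    Thresholds default to the values quoted in the text.
    """
    selected = []
    for gene, values in fpkm_by_gene.items():
        n_expressed = sum(1 for v in values if v > min_fpkm)
        log_values = [math.log10(v + 1.0) for v in values]
        # pvariance = population variance (an assumption, see lead-in).
        if n_expressed >= min_samples and pvariance(log_values) > min_var:
            selected.append(gene)
    return selected


# Hypothetical profiles over four samples (not data from the paper).
toy_profiles = {
    "geneA": [0.0, 0.5, 12.0, 30.0],  # expressed in two samples, variable
    "geneB": [5.0, 5.2, 4.9, 5.1],    # expressed everywhere, nearly flat
    "geneC": [0.0, 0.0, 0.0, 50.0],   # variable, but expressed in one sample
}
variable_genes = select_variable_genes(toy_profiles)
```

Only genes passing both criteria would go forward to clustering; a flat housekeeping-like profile or a single-sample spike is excluded.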

(B) Four expression domains in the embryo (upper panel) and the 3D presentation of the domains, visualized by DEG clustering in the four quadrants of the epiblast (lower panel).
(C) Heatmaps of Spearman rank correlation coefficient (RCC) analysis showing pairwise comparison of inter-domain DEG expression profiles across all samples of the three embryos (E1, E2, and E3). The RCC of each sample between any two embryos across all inter-domain DEGs was calculated. The samples in each embryo were sorted according to the results of hierarchical clustering in the reference embryo (E1).
(D) The GO and KEGG enrichment analysis for the three DEG gene expression groups (G1–G3, p < 0.01).
(E) Violin plot showing the mean expression values of each of three gene groups across the four sample domains. Each violin plot shows the frequency distribution of the mean transcript level (log10 transformed FPKM) of DEGs per sample.
See also Figure S2 and Table S1.
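The between-embryo comparison of panel C can be illustrated with a small Spearman rank correlation routine. This is a sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, ties receive average ranks, the domain labels are supplied by hand, and the expression vectors are invented.

```python
import math


def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rcc(x, y):
    """Spearman RCC, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def mean_rcc_by_domain(embryo_a, embryo_b, domain_of):
    """Mean RCC over same-domain and different-domain sample pairs."""
    intra, inter = [], []
    for sample_a, profile_a in embryo_a.items():
        for sample_b, profile_b in embryo_b.items():
            rcc = spearman_rcc(profile_a, profile_b)
            same = domain_of[sample_a] == domain_of[sample_b]
            (intra if same else inter).append(rcc)
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical log-expression of five DEGs in two samples of two embryos.
e1 = {"8A": [2.0, 1.5, 0.2, 0.1, 0.9], "8P": [0.1, 0.3, 1.8, 2.2, 1.0]}
e2 = {"8A": [1.8, 1.6, 0.3, 0.2, 1.0], "8P": [0.2, 0.1, 1.5, 2.0, 0.9]}
domain_of = {"8A": "D1", "8P": "D4"}
intra_mean, inter_mean = mean_rcc_by_domain(e1, e2, domain_of)
```

A higher mean for same-domain pairs than for different-domain pairs is the pattern the text reports as evidence of reproducible domains.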


Distinctive Functional Gene Ontology in the Spatial Expression Domains

Next, BIC-SKmeans (Zhang et al., 2013) was used to search for the minimal number of gene clusters based on DEGs that can account for the variations of the transcriptome across the 42 samples (Figure S2B). Three distinct gene groups (referred to as G1, G2, and G3; Figure 2A) were identified based on similarity in their expression patterns across different samples. G1 was highly expressed in the posterior epiblast. G2 was highly expressed in both the anterior and posterior epiblast. G3 was significantly upregulated in the anterior epiblast and also highly expressed in the distal lateral regions, further suggesting a developmental relationship between the cells in the anterior and distal lateral regions. Mapping the median expression level of each gene group in different samples onto the 3D embryo template (Figure S2C) revealed that the transcriptional relationships between samples recapitulate the spatial topography of the embryo. In addition, by a re-sampling method (Chung and Storey, 2015), we found these three groups significantly associated with the two major PCs (p < 1.0e-195; Figure 2A, right column).

We next examined the Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enriched in each group (Huang et al., 2009) (Figure 2D). G1, which consisted of genes highly expressed in the posterior region of the embryo, showed a high enrichment for embryonic morphogenesis and pattern specification under the regulation of WNT signaling, corroborating known functions for posterior development. In contrast, we found relatively fewer function terms enriched for G3, except for ‘‘regulation of neurological system process’’ and ‘‘tight junction’’ functions, consistent with epithelial morphogenesis and the neuroectoderm differentiation of cells in this domain.
G2 genes, which were highly expressed in both the anterior and posterior epiblast, were significantly enriched for cell-cycle and metabolism functions. The average expression level of G2 genes was the highest in all domains, compared with other gene groups (Figure 2E), indicating that the epiblast is at a phase of rapid cell proliferation (Lawson et al., 1991; Snow, 1977). Interestingly, we also observed a high enrichment for epigenetic modifications in the G2-expressing anterior and posterior domains (Figure S2D).

Epithelial-mesenchymal transition (EMT) is an important morphogenetic process during gastrulation (Thiery et al., 2009). The EMT-regulated genes were strongly enriched in the posterior domain (D4) (Figure S2E), consistent with the heightened EMT activity in the primitive streak. These EMT genes were expressed at a lower level in the anterior domain (D1) (Figure S2E), where cells maintain an epithelial architecture.

Whereas there was a significant difference in the transcriptome between the anterior (D1) and posterior (D4) domains, analysis of the lateral domains showed that between 26 and 34 genes in each sample embryo displayed contralaterally different expression patterns (Figure S2F). To validate the DEG data on left-right asymmetrical expression of the genes identified in the spatial transcriptome of embryos E1–E3, WISH was performed on six differentially expressed genes, each on more than three embryos (Figure S2G). Overall, there was no consistent left-right asymmetry in expression (Figure S2H). It is therefore likely that the left-right asymmetry has yet to be consistently established in the late mid-streak stage mouse embryo (Raya and Izpisua Belmonte, 2006).
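The GO and KEGG enrichment of gene groups G1–G3 described in this section rests on asking whether a gene group contains more members of an annotation term than expected by chance. The text cites Huang et al. (2009) for this step and does not state the exact statistic, so the sketch below substitutes a plain one-sided hypergeometric test as a stand-in; the function name and the counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric p value, P(X >= n_overlap).

    n_universe:  number of background genes
    n_annotated: background genes carrying the term
    n_selected:  genes in the group being tested
    n_overlap:   genes in the group that carry the term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 carry the term,
# 4 genes in the group, 2 of which carry the term.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
```

In practice the resulting p values would be corrected for the number of terms tested before applying a cutoff such as the p < 0.01 quoted in the Figure 2D legend.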

The spatial transcriptome is a unique resource for identifying region-specific marker genes for epiblast cells of the mid-gastrulation embryo. Since PC1 and PC2 accounted for the majority (60%) of variance, and selection of genes by PCA loading could preserve more effective vectors on each variable (Wang et al., 2015), the top 40 genes with highest or lowest PCA loadings in the first two PCs (Table S2; 93.7% of them overlapped with DEGs, Fisher’s exact test, p = 8.4e-58) were reanalyzed by unsupervised k-means (k = 4) clustering for all 42 samples. From this analysis, four module domains (MD1–MD4) and four module gene families (MG1–MG4) were identified (Figures 3A and S3A). The module domains MD1–MD4 were highly consistent with the four domains identified by DEGs (D1–D4). MG1, MG2, and MG3 showed expression patterns similar to those of G1, G2, and G3, respectively (Figure 2A). MG4, which was highly expressed in MD3, characterized the proximal lateral domain. The gene expression density distribution of each gene module in each domain (Figure S3B) was similar to that of DEGs (Figure 2E). Like G2, MG2, which was common to the anterior and posterior domains, displayed the highest expression level. Therefore, we identified a smaller set of ‘‘region-specific’’ genes (Table S2), which are sufficient to characterize each expression domain (MD1–MD4) in the epiblast.

PCA of these PCA loading genes (Table S2) showed that they constituted four sample groups (Figure 3B) and four distinct gene groups (Figure 3C). On the basis of co-expression with known markers (anterior epiblast, Sox2; posterior epiblast, Mixl1, T, and Mesp1), two gene groups were identified on the PC1 axis for the anterior and posterior regions of the epiblast (Figure 3C), and contained within each group were domain-specific signature genes.
Among them, Pou3f1, a POU III subfamily transcription factor that belongs to the anterior gene group, was recently found to be associated with the neuroectoderm lineage (Zhu et al., 2014). Several newly identified region-specific genes were validated by WISH, and the results agreed well with the domain classification: Cbx7, Cldn7, Uchl1, Sall2, Utf1, Slc7a3, and Zfp462 were expressed exclusively in the anterior region, and Ccnd2, Sall3, and Sp5 were expressed mainly in the posterior region (Figures 3D and S3C). Therefore, the spatial transcriptome enables identification of spatially restricted transcripts that could be putative lineage markers. On the PC2 axis, two gene groups were differentially expressed in the anterior/posterior (D1/D4) and proximal lateral (D3) domains (Figure 3C). Gene set enrichment analysis (GSEA) based on KEGG pathways (false discovery rate [FDR] < 0.25) showed that the PC1 axis was enriched for development-related pathways, such as the WNT and TGF-β signaling pathways. The PC2 axis was enriched for cell-growth- and metabolism-related pathway terms, such as spliceosome, purine metabolism, and cell cycle (Figure S3D), representing different functional attributes of the epiblast cells. For the four region-specific marker gene sets, we further verified the expression pattern by microfluidic multiplex qPCR using the E1 LCM samples. Four similar spatial domains were identified by PCA of the marker gene expression profiles (Figure S3E). Taken together, unsupervised hierarchical clustering and PCA have defined four spatial expression domains in the

Figure 3. Spatial Transcriptome Uncovers Gene Markers for Different Expression Domains
(A) The heatmap of clustering analysis of marker genes. The samples are clustered by K-means (k = 4), and the marker genes are clustered by BIC-SKmeans. (B) Principal component analysis (PCA) of all cell samples and their projections on PC1 (54.41% of variance) and PC2 (17.98% of variance). PC1 discriminates between the anterior and the posterior samples, and PC2 between the proximal lateral samples and the others. (C) PC projections showing the contribution of each gene to the first two PCs. The arrow tips indicate the correlation coefficient of the respective genes with each principal component. Mixl1, Mesp1, and T represent the posterior enrichment. Pou3f1 and Utf1 indicate an anterior enrichment. Cxcl12 and Lars2 denote the correlation with the proximal lateral population. Only genes with cosine correlation >0.8 (or >0.55 for the purple genes) are shown. (D) WISH validation (upper panels) of newly identified region-specific genes against the d-WISH data presented in corn plots (lower panels). Anterior, Cbx7, Cldn7, Uchl1, Sall2, Utf1, Slc7a3, and Zfp462; posterior, Ccnd2, Sall3, and Sp5. See also Figure S3 and Table S2.

epiblast of the mid-gastrulation embryo, which may be correlated with the regional variations of the molecular activity associated with a full spectrum of functional attributes such as cell proliferation, lineage differentiation, epithelial-mesenchymal transition, and epigenetic modification.
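The loading-based marker selection described in this section can be illustrated with a small sketch. It is not the authors' pipeline: it approximates only the first principal component by power iteration on a toy matrix, the gene names and values are hypothetical, and the paper's selection additionally uses PC2 and a cutoff of 40 genes per direction.

```python
import math


def center_rows(matrix):
    """Subtract each gene's mean across samples (rows = genes, columns = samples)."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


def first_pc_loadings(matrix, n_iter=200):
    """Approximate the PC1 loading of every gene by power iteration."""
    x = center_rows(matrix)
    n_genes, n_samples = len(x), len(x[0])
    # Deterministic, slightly uneven start vector (an arbitrary choice).
    v = [1.0 + 0.01 * g for g in range(n_genes)]
    for _ in range(n_iter):
        # Project samples onto the current direction, then back onto genes;
        # this multiplies v by the (unnormalized) gene-by-gene covariance.
        scores = [sum(x[g][s] * v[g] for g in range(n_genes)) for s in range(n_samples)]
        w = [sum(x[g][s] * scores[s] for s in range(n_samples)) for g in range(n_genes)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return [0.0] * n_genes  # no variance to explain
        v = [c / norm for c in w]
    return v


def extreme_loading_genes(names, loadings, n_top):
    """Return the n_top genes with the highest and the n_top with the lowest loadings."""
    ranked = sorted(zip(loadings, names))
    lowest = [name for _, name in ranked[:n_top]]
    highest = [name for _, name in ranked[-n_top:]]
    return highest, lowest


# Toy matrix: 4 hypothetical genes x 4 samples.
toy_names = ["geneA", "geneB", "geneC", "geneD"]
toy_matrix = [
    [0.0, 0.0, 10.0, 10.0],  # high in the last two samples
    [10.0, 10.0, 0.0, 0.0],  # high in the first two samples
    [5.0, 5.0, 5.0, 5.0],    # uniform, carries no spatial information
    [1.0, 0.0, 1.0, 0.0],    # weak pattern unrelated to the main axis
]
toy_loadings = first_pc_loadings(toy_matrix)
toy_highest, toy_lowest = extreme_loading_genes(toy_names, toy_loadings, 1)
```

The sign of a principal component is arbitrary, so which of the two opposing genes lands in the "highest" list depends on the start vector; only the pairing of genes at opposite extremes is meaningful.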

Subdomains of Transcriptional Activity in the Anterior and Posterior Epiblast

Next, we examined if the spatial transcriptome may reveal the molecular activity that is associated with the regionalization of cell fates in the anterior and posterior epiblast. To do this,

Figure 4. Subdomains of the Anterior and Posterior Epiblast
(A) Top 30 co-expressed genes in the anterior (anchored by Pou3f1) and posterior (by Mixl1) regions identified by guilt-by-association analysis (GBA). These co-expressed genes are sorted by the BH multiple testing corrected p values. Known domain-specific markers are highlighted in blue and WISH-validated newly identified markers in red. (B and C) Subdomains of the posterior transcription domain (B) and the anterior transcription domain (C) delineated by clustering analysis using the combined set of genes co-expressed with the seed marker and genes of highest PCA loading. The subdomains of the samples are classified by hierarchical clustering and the gene groups are identified by BIC-SKmeans. The enriched GO/KEGG terms (one-sided Fisher’s exact test, p < 0.01), representative subdomain genes, motif-enriched transcription factors (TFs) (known, p < 0.01; de novo, p < 1.0e−12), and corn plots of highly expressed TFs are shown for each subgroup. See also Figure S4 and Tables S3 and S4.
Figure 5. Transcription Factors Regulatory Network
(A) Hierarchical clustering by the connection specific index (CSI, CSI > 0.7) reveals four development-related transcriptional gene cliques (TC1–4). (B) The corn plot shows the average expression values of TFs in each transcription factor clique (TC1–4). (C) The co-expression network of development-related TFs based on the CSI values (CSI > 0.7). The node with different colors represents the TF cluster and the edges are the CSI value of two nodes. Red lines indicate the positive correlation and green lines indicate negative correlation. Edge weights are proportional to the CSI values of two correlated nodes. See also Figures S5, S6, and Table S5.

Pou3f1 in the anterior region and Mixl1 in the posterior region, which showed the highest square cosine correlation to both PC1 and PC2 simultaneously, were selected as the seed marker genes (see Experimental Procedures). We expanded the gene set by including genes that were co-expressed with each seed marker gene, identified through GBA analysis (Walker et al., 1999) (Figure 4A and Table S3). Functional enrichment analysis showed that the genes co-expressed with Pou3f1 and Mixl1 were related to neuron development and pattern specification, respectively, and the average expression pattern of the co-expressed genes was also highly consistent with that of the seed genes (Figures S4A–S4D). Unsupervised hierarchical clustering of posterior samples based on the combined posterior gene set partitioned the posterior region into three subdomains: 1P to 4P as the distal posterior subdomain (PD1), 5P to 8P as the intermediate posterior subdomain (PD2), and 9P to 11P as the proximal posterior subdomain (PD3) (Figure 4B). BIC-SKmeans analysis clustered the combined posterior marker gene set into four subgroups: posterior gene groups PG1–3, corresponding to the distally, intermediately, and proximally expressed gene groups, respectively, and PG4, which showed no proximal-distal bias (Figures 4B and S4E). Some known genes of mesodermal progenitors were represented in this marker set. Hhex, a PG1 gene related to cardiac mesoderm formation (Liu et al., 2014), was specifically expressed in the distal subdomain (PD1); Tbx6, a PG2 gene related to the specification of paraxial mesoderm (Chapman et al., 1996), was exclusively expressed in the intermediate subdomain (PD2); and Gata4, a PG3 gene associated with lateral mesoderm differentiation (Rojas et al., 2005), was strongly expressed in the proximal posterior subdomain (PD3). The expression pattern of these PG genes aligned well with the regionalization of mesodermal fates of cells in the primitive streak (Tam and Loebel, 2007). 
The different PG groups also displayed enriched biological functions, pathways, and upstream transcription factors (TFs) (Figure 4B), which are associated with mesendoderm specification under the influence of WNT signaling. Consistent with this inference, enforced expression of Foxh1, a core upstream transcription factor for the PG3 proximal subdomain gene group, in embryonic stem cells (ESCs) led to the upregulation of PG3 genes, such as Dkk1 and Mesp1, in a WNT-signaling-dependent manner (Figure S4F). This example corroborated the previous finding that Foxh1 functions as a potential regulator for mesendoderm progenitors (Hoodless et al., 2001; Yamamoto et al., 2001). To identify gene signatures that correlated with the three distinct subdomains in the posterior region, a domain specificity score was defined and used (see Supplemental Experimental Procedures). The expression patterns of the top five genes for PD1–3 were found to be spatially matched to each subdomain (Figure S4G), further confirming that the data allow detection of a wide range of spatial expression patterns at the genome-wide level. The full dataset (Table S4) provides a useful resource for further research on cell-fate determinants.

The aforementioned analysis identified subregions within the apparently homogeneous transcription domain in the posterior epiblast; we therefore sought to find out whether there are also subregions in the anterior epiblast. Unsupervised clustering identified three proximal-distal subdomains (AD1–3) (Figures 4C and S4E). However, the difference in expression profiles was not as clear-cut as that of the posterior domain, and some gene groups (e.g., AG1 and AG3) showed uniform expression in the anterior samples. The biological functions, pathways, and TFs for each of these subdomains were also less well defined. These results suggest that the lineage fates of cells in the anterior epiblast are not as well segregated as those in the posterior epiblast at mid-gastrulation.

Regionalization of Transcription Factor Regulatory Network Activity

To characterize the transcriptional regulation network in the epiblast, the Connection Specificity Index (CSI) (Bass et al., 2013) of all the development-related TFs in DEGs (TFs with GO terms related to development; Table S5) was calculated and subjected to unsupervised clustering analysis. The TFs formed four distinct module TF cliques (TC1–TC4, Figure 5A). Interestingly, these four modules were roughly divided into two mutually exclusive expression categories based on the average expression level of genes in each TF clique. TC2, TC3, and TC4 showed

Figure 6A, values printed in the heatmap cells (rows, inter-domain DEG gene groups; columns, signaling target sets; A, activated; I, inhibited):

      BMP (A)  BMP (I)  FGF (A)  FGF (I)  Nodal (A)  Nodal (I)  Wnt (A)  Wnt (I)
G1    8.46     −0.05    0.11     −0.00    11.12      −0.02      15.89    −0.01
G2    0.00     −0.05    0.09     −0.00    0.00       −0.02      0.00     −0.01
G3    3.37     −0.05    0.55     −0.00    0.00       −0.95      0.07     −1.68


a posterior-specific expression pattern, whereas TC1 genes were exclusively expressed in the anterior region (Figure 5B). This mutually exclusive expression pattern suggests that these two categories of TFs may have divergent functions in the epiblast. To reveal the underlying connections between these TF groups, we further constructed a TF co-expression network on the basis of their expression correlation CSI (CSI > 0.7). Consistent with the clustering result, the development-related TFs formed four groups in the network recapitulating TC1–4 (Figure 5C). TFs within each group showed positive interactions with each other, indicating that TFs of each group may form a combinatorial regulatory circuitry. Two TF groups that showed strong negative correlation in expression represented the anterior and posterior domains, respectively (TC1 and TC2). Since negative correlation often occurs between genes regulating switch-like behavior or alternative cellular states (Xue et al., 2007), the negative correlation between TC1 and TC2 may suggest that these TFs contribute to the divergence of cell identity in the anterior and posterior epiblast. Indeed, many hub TFs (e.g., Mixl1, Snai1, Gsc, Sox2) in the network (Figure 5C) are associated with the determination of anterior versus posterior fate (Arnold and Robertson, 2009; Robb and Tam, 2004). Many of the TFs in combinations (e.g., Pou3f1, Sox2, and Sox3; Snai1 and Smad1) also shared development-related targets (Figures S5A–S5D). Similar analysis on all differentially expressed TFs showed the presence of similar anterior and posterior TF cliques (AllTC) (Figures S6A–S6C). 
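To make the CSI-based network construction more concrete, the following sketch computes a connection specificity index from pairwise Pearson correlations. It assumes one common formulation, in which the CSI of a pair (A, B) is the fraction of genes whose correlation with both A and B lies below PCC(A, B) minus a small offset; the offset of 0.05, the function names, and the toy profiles are assumptions for illustration, since this section does not spell out the formula.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def csi_matrix(profiles, offset=0.05):
    """Connection specificity index for every pair of genes.

    profiles: dict mapping gene name -> list of expression values per sample.
    CSI(a, b) = fraction of genes x with PCC(a, x) and PCC(b, x) both
    below PCC(a, b) - offset (assumed definition, see lead-in).
    """
    names = list(profiles)
    pcc = {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}
    csi = {}
    for a in names:
        csi[a] = {}
        for b in names:
            cutoff = pcc[a][b] - offset
            specific = sum(
                1 for x in names if pcc[a][x] < cutoff and pcc[b][x] < cutoff
            )
            csi[a][b] = specific / len(names)
    return csi


# Hypothetical TF profiles across four samples: two rising, two falling.
toy_profiles = {
    "tf1": [1.0, 2.0, 3.0, 4.0],
    "tf2": [1.0, 2.0, 3.0, 5.0],
    "tf3": [4.0, 3.0, 2.0, 1.0],
    "tf4": [5.0, 3.0, 2.0, 1.0],
}
toy_csi = csi_matrix(toy_profiles)
```

In this toy example the two co-varying pairs receive a CSI of 0.5, whereas anti-correlated pairs receive 0; a network like the one in Figure 5C would keep only pairs above a chosen CSI threshold.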
The GO enrichment analysis showed that a clique enriched for terms associated with metabolism was less represented in the development-related TF groups (AllTC2; Figures S6D and S6E), suggesting that this clique of TFs and its enriched function “metabolism and cell cycle” may have insignificant impact on the divergence of cell fates of the anterior and posterior epiblast.

Regionalization of Signaling Pathway Activity

In order to chart the activity of signaling pathways in the epiblast, we examined the enrichment of signaling target genes of the WNT, BMP, FGF, and Nodal pathways (using microarray data from published perturbation experiments). The results of the enrichment analysis revealed that target genes of active WNT, BMP, and Nodal signaling pathways were significantly enriched in the posterior gene group (G1) (Figure 6A, one-sided Fisher’s exact test with FDR < 0.01), whereas FGF signaling targets showed no obvious spatial enrichment. Key signaling pathway factors, including ligands, targets, and antagonists, all showed enhanced expression in the posterior domain (Figure 6B). In contrast, the anterior epiblast was largely devoid of FGF/WNT/Nodal activity, while expression of BMP-responsive genes was detected in the proximal anterior and lateral epiblast (Figure 6B). GSEA revealed that cells in the posterior epiblast displayed activated Nodal/WNT activity and inhibitory BMP activity, whereas the anterior epiblast displayed both inhibitory and activated FGF activity (Figure 6C, FDR < 0.25). Of interest is that the regionalization of signaling activity in the epiblast revealed by the spatial transcriptome (Figure 6D) is related to the prospective lineage fate of the epiblast cells signified by the region-specific expression of germ layer markers (Figure 6E) and results of previous fate mapping studies (Figure 6F). For example, the BMP active domain is populated by progenitors of surface ectoderm and extraembryonic mesoderm, the Nodal/WNT active domain contains the mesendoderm progenitors, and the anterior epiblast with the least signaling activity is fated for neuroectoderm. The correlation of the gene expression pattern with the prospective lineage fates of cells at a specific region of the epiblast provides the basis for mapping exogenous cells to their equivalent cell types in the mid-gastrulation epiblast by zip code mapping.

Zip Code Mapping of Single Cells and Embryo-Derived Stem Cells

Analysis of the spatial transcriptome of the transcription domains has defined a small set of domain-specific signature genes (Figures 3A–3C, Table S2) that would enable mapping the position of cells in the epiblast on the basis of the concordance of the gene expression profile. To examine whether this gene set in the D1–D4 domains was consistent across different embryos, a machine learning method was used to analyze the cell samples of embryos E2 and E3 against the reference embryo E1. A support vector machine (SVM) was trained on the expression values of these genes in E1, and the trained model predicted the domain label for each sample in E2 and E3 with 85.0% and 92.7% accuracy, respectively (Figure 7A), suggesting that this gene set could be used to map the position of individual epiblast cells sourced independently from other

Figure 6. Signaling Regulatory Network (A) The enrichment for target genes of development-related signaling pathways in the inter-domain DEG gene groups (G1–G3, tested by one-sided Fisher’s exact test followed by BH correction). Red and blue indicate activated enrichment and inhibitory enrichment, respectively. The value in each cell is the log10(FDR) and the green border indicates significant enrichments (FDR < 0.01). (B) Corn plots of the expression pattern of key component genes of development-related signaling pathways in each sample. The pathway components include ligands and receptors, downstream targets, and antagonists. (C) Corn plots showing the GSEA of the target genes related to the activated and inhibitory modes of signaling pathways in all samples. The fold changes (divided by mean expression value of each gene across all samples) of DEGs were used as the rank list for GSEA. A hollow dot with a red or green border indicates enrichment FDR < 0.25 by GSEA. (D) The territory of active BMP, Wnt, and Nodal signaling in the epiblast revealed by the spatial transcriptome. (E) The expression pattern of molecular markers of the precursor cells of the three germ layers across the domains encompassing epiblasts of the anterior (A), left (L), right (R), and posterior (P) regions. Ectoderm genes (red) were expressed predominantly in the anterior and lateral domains, while mesoderm (green) and endoderm (blue) genes were preferentially expressed in the posterior domain. (F) Regionalization of prospective cell fates revealed by fate mapping studies and correlated with molecular annotation by germ layer markers. emb-mes, embryonic mesoderm; end, endoderm; exm, extraembryonic mesoderm; ne, neuroectoderm; ps, primitive streak; se, surface ectoderm. Crossed arrows mark the three embryonic axes.
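The test named in this legend, a one-sided Fisher's exact test followed by BH correction, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy counts are hypothetical, and the gene universe and target lists used by the authors are not reproduced here.

```python
from math import comb


def fisher_enrichment_p(n_overlap, n_group, n_targets, n_universe):
    """One-sided (over-representation) Fisher's exact test p value.

    Probability of seeing at least n_overlap pathway targets in a gene
    group of size n_group, when n_targets of the n_universe genes are
    targets (upper tail of the hypergeometric distribution).
    """
    total = comb(n_universe, n_group)
    upper = min(n_group, n_targets)
    tail = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_group - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 6 genes in the universe, 3 are pathway targets, and a
# gene group of size 3 contains all 3 targets.
toy_p = fisher_enrichment_p(3, 3, 3, 6)
toy_fdr = benjamini_hochberg([0.01, 0.04, 0.03])
```

Each gene group and target set (for example, G1 against activated WNT targets) would contribute one p value, and the adjusted values would then be compared with the FDR < 0.01 threshold.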

Figure 7, panel annotations. (A) Embryo 2, accuracy 85.00%; Embryo 3, accuracy 92.68%. (D) Reference samples S_i (i = 1, 2, …, 42); single cells C_j (j = 1, 2, …, 70); R_{i,j}, RCC between single cell C_j and reference sample S_i; C_j is located in the sample S_i with max(R_{i,j}). With E(G, C_j) the expression of gene G in single cell C_j (N = 5 in the illustrated example), the imputed expression of G is defined as $G_{S_i} = \frac{1}{N}\sum_{j=1}^{N} R_{i,j}\, E(G, C_j)$. (F) AUC: Meis2, 0.96; Enpp3, 0.75; Cmtm8, 0.79; Pcdh1, 0.95; Msx1as, 0.91. (H) EpiSC lines: (i) PS3, (ii) LMS2, (iii) PS4, (iv) CAV4, (v) CAV2. N-Ect, neuroectoderm; ME, mesendoderm. Lineage propensity: +++, high; ++, medium; +, low.

embryos of the late to mid-streak stage (see later). This finding raised the possibility that these signature genes (designated as the zip code genes; Table S2) may also be used for mapping samples of cells from other sources to the equivalent epiblast cell population. To this end, RCCs were computed between each queried sample and all 42 LCM samples based on the expression values of the zip code genes, resulting in 42 RCC values for each sample. The RCC data were displayed in a corn plot (zip code mapping, Corn plotter; Figure 1F), with a higher RCC indicating a stronger probability of an epiblast position. The protocol of zip code mapping is available in the online users’ manual on the iTranscriptome portal (also see Supplemental Experimental Procedures). To demonstrate the utility of zip code mapping, 70 single cells isolated from the anterior and posterior fragments (each including part of the lateral domain) of the epiblast of one mid-gastrulation embryo (stage-matched with the embryos used for generating the spatial transcriptome) were subjected to single-cell RNA-seq. Zip code mapping showed that most of the cells had an RCC value above 0.7 across all 42 samples (Figures S7A and S7B). The position with the maximum RCC (single best fit) was retained as the location of each single cell. The results showed that 66 of 70 single cells could be mapped back to the original half-epiblast (Figure 7B). To evaluate whether the single best fit of mapping for a single cell is reliable, we calculated a p value for each single cell through 10,000 permutations of the zip code genes (Figure S7C). We found that all the mapping results were significant (p < 0.05). With best-RCC-based single-cell mapping, some single cells could be mapped to a number of adjoining locations. 
To resolve this ambiguity, a smoothing procedure was used instead to render the single cell’s location as a diffused domain that is defined by the top smoothed RCC values (diffused domain of the ten examples are colored in the corn plot; Figure 7C). Good agreement was found between the maximum RCC (Figure 7B, the same cells in Figure 7C were numbered) and the smoothed RCC results (Figure 7C) in pinpointing the single cell’s position. To further evaluate the accuracy of the zip code mapping, we attempted to infer a spatial expression pattern for each zip code gene. First, the mapped embryo position of each of the 70 single

cells were recorded and the weighted average expression of the particular zip code gene in each node of the corn plot was calculated based on single-cell RNA-seq values in this position (Satija et al., 2015). In this way, a cross-validation was performed by taking out the zip code gene one at a time from the mapping exercise, zip code mapping was then reiterated to re-infer the location of the single cells. Finally, the imputed expression pattern for the held-out gene was compared with its spatial expression pattern as documented in the iTranscriptome (Figure 7D). This reiterative analysis revealed good concordance between the iTranscriptome expression pattern and the imputed pattern for the held-out genes (Figure 7E). When individual 42 sample regions can be correctly classified into the expressed (1) or not expressed (0) state (binarization defined by GBA analysis) as gold standards, the average area under the curve (AUC) for all zip code genes approached 0.7 (Figures 7F and 7G), indicating that the zip code mapping of the single cells can be achieved with high confidence. However, while most zipocode genes performed well (i.e., AUC > 0.6), a subset of genes performed less satisfactorily. The reduced fidelity of imputed expression to the cell position might be affected by insufficient coverage of the single cells isolated from a particular location (e.g., only one single cell has been mapped to region 9A) and the range of innate cellular heterogeneity. We further applied zip code mapping to identify the epiblast cells that are developmentally similar to the epiblast stem cells (EpiSCs). 
In agreement with our previous finding that EpiSCs are congruent with the anterior primitive streak cells of the mid-gastrulation embryo (Kaufman-Francis et al., 2014; Kojima et al., 2014a), cells of ten embryo-derived EpiSC lines and one ES cell-derived EpiSC line (ESD-EpiSC) (Zhang et al., 2010) were mapped to epiblast cell populations that encompassed those in the distal posterior domains (1P–4P) (Figures 7H, i-iii and 1F). Interestingly, EpiSCs with enhanced mesendoderm propensity displayed transcriptome characteristics closer to those of the posterior epiblast (the presumptive mesoderm and endoderm progenitors in the anterior primitive streak, Figure 7H iii; cf. Figure 6F), while EpiSCs with enhanced ectoderm propensity were more similar to the anterior and distal lateral epiblasts that are fated for the ectoderm progenitor (Figure 7H iv, v, cf. Figure 6F).
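The held-out evaluation of imputed expression described above can be illustrated with a minimal sketch. It assumes that the imputed expression of one held-out gene is available as one score per reference region and that the binarized (1/0) state serves as the gold standard. The function name `roc_auc` and the toy values are hypothetical, and the rank-based (Mann-Whitney) formula is a stand-in for whichever ROC routine was actually used.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    labels: 1 (expressed) or 0 (not expressed) per reference region.
    scores: imputed expression of the held-out gene per region.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one expressed and one non-expressed region")
    # Count expressed/non-expressed pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): six regions, one held-out gene.
gold = [1, 1, 1, 0, 0, 0]
imputed = [0.9, 0.7, 0.3, 0.4, 0.2, 0.1]
auc = roc_auc(gold, imputed)
```

An AUC of 0.5 corresponds to random ordering of the regions and 1.0 to perfect separation of expressed from non-expressed regions.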

Figure 7. Zip Code Mapping of Single Epiblast Cells and Stem Cells (A) The SVM prediction of domain label for each sample in embryos E2 and E3 using the scaled expression values of marker genes from reference embryo E1 as the input features. The accuracy is calculated by comparing the predicted and real domain labels. X, imperfect match. (B) Zip code mapping of single cells isolated from the anterior and posterior fragments of the epiblast of mid-gastrulation embryo. The positions of the single cells in the epiblast are predicted based on the single-best fit maximum RCC value. Yellow dots, cells from the anterior epiblast fragment; blue dots, cells from the posterior epiblast fragment. Cells that were also mapped to diffused domains (Figure 7C) by the smoothing procedure are numbered. (C) The location of single cells (numbered in Figure 7B, five single cells each from anterior and posterior regions) defined as diffused domains based on the top smoothed RCC values. Nodes of the six top values are shown for each single cell in the corn plots. (D) The workflow of assessment of single-cell zip code mapping accuracy (detailed method in the Supplemental Experimental Procedures). (E) The mapping results of the iTranscriptome expression pattern (derived from spatial RNA-seq) and the imputed expression pattern of a zip code gene (see Supplemental Experimental Procedures for details of the analysis). The results of five randomly selected held-out zip code genes are presented. The reference sample regions that the single cells were not mapped to are shown as hollow spots. (F) Receiver operating characteristics results showing the accuracy of the imputed expression levels versus the binarized expression values from the spatial RNA-seq data for the five randomly selected zip code genes in (E). AUC values are in brackets. (G) Histogram summarizing AUC values of all zip code genes.
(H) Zip code mapping of EpiSCs to five patterns of equivalent in vivo cell populations in the epiblast: (i) posterior-distal, lateral, and anterior epiblast, (ii) posterior-distal and distal lateral epiblast, (iii) posterior epiblast and lateral-intermediate epiblast, (iv) lateral epiblast and distal posterior epiblast, and (v) lateral epiblast and anterior epiblast. The table shows examples of EpiSC lines with different propensity of neuroectoderm and mesendoderm differentiation revealed by the expediency of activation of the lineage markers (data from GEO: GSE46227). See also Figure S7.


Zip code mapping by spatial transcriptome can therefore be applied for characterizing the spectrum of prospective cell fates of embryo-derived stem cells such as the epiblast stem cells.

DISCUSSION

Post-implantation mouse development, especially during gastrulation, is a complex process involving rapid cell proliferation, extensive tissue re-modeling and significant changes in transcriptional architecture. It is technically challenging to study the molecular activities underlying mouse gastrulation because of the limited amount of tissue that can be procured from an individual embryo. Moreover, the rapid changes in genetic activity require a genome-wide and unbiased gene expression profiling of embryos within a narrow window of development. In this study, we have profiled the transcriptome by low-input RNA-seq analysis of groups of about 20 cells in the epiblast of single mid-gastrulation (late mid-streak) stage embryos that were systematically sampled by laser capture microdissection and with all the spatial information retained. On the basis that the genome-wide transcriptome data are integrated with the positional address of the cell samples from a single embryo, we termed this approach single-embryo spatial RNA-seq. Analysis of the single-embryo spatial transcriptome enabled the characterization of expression domains in the epiblast (that correspond to the regionalization of cell fates), the identification of region-specific marker genes for each domain, the inference of the interaction network for TFs and signaling pathways, and the generation of an in-depth quantitative in silico expression map of over 20,000 transcribed genes in the epiblast of mid-gastrulation embryos in a 3D topographic context. Fate mapping studies of the gastrulating embryo have demonstrated that the progenitors of the major germ layer tissues reside in different regions in the epiblast (Tam and Behringer, 1997; Tam et al., 2006).
However, fate maps reveal little about the molecular activities underlying cell fate specification. Although the current study only focuses on one developmental stage, we identified gene expression domains and subdomains that represent the different transcriptional states in the epiblast, providing a high-fidelity molecular annotation for the fate map of germ layer progenitors. Many of the signature genes are known to be functionally relevant to germ layer specification. However, a large number of genes with uncharacterized function could be important new regulators and await further investigation. In this light, the mouse spatial RNA-seq dataset is an invaluable resource for elucidating the functional genomics of lineage specification and differentiation. The advent of high-resolution transcriptome analysis, particularly of single cells, opens an avenue to investigate in depth the molecular mechanism of embryogenesis, tissue patterning, lineage allocation, and differentiation (Klein et al., 2015; Macosko et al., 2015; Morrison et al., 2015; Tang et al., 2010; Xue et al., 2013; Yan et al., 2013). For single-cell analysis of tissues, organs, or embryos where topographical location of the cells has a significant bearing on the study of regionalized cellular heterogeneity, it is imperative to know the positional address of the cell of interest. While barcoding or unique gene expression profiling may identify the genetic materials sourced from individual cells (Wilson et al., 2015), an effective systematic coding of the cells by position is not yet feasible. Spatial mapping of cell positions has been reported recently; however, these studies either revealed spatial resolution primarily in a 2D context with a much lower number of genes detected (Zechel et al., 2014), or required the assembly of sets of known position markers to build a reference map of positional addresses (Achim et al., 2015; Satija et al., 2015), and thus may have limitations in wider application when a detailed expression pattern of gene markers is not available for precise positioning of the single cells in the tissue or embryo. An unbiased spatial transcriptome of the zebrafish embryo has been generated by the tomo-seq technique (Junker et al., 2014). As it is based on the tomographic reconstruction of the RNA-seq data from multiple embryos, the dataset might be subject to between-embryo heterogeneity even if the samples were nominally of the same developmental stages. Analysis of our spatial transcriptome of the mouse epiblast through the unbiased interrogation of the gene expression profile of small cell populations from precisely defined anatomical sites has allowed us to define a set of position-related genes. This gene set could be used as the zip code for mapping the position of a specific cell population of unknown address to the best-matched transcriptional domain in the epiblast. As a validation of this approach, 94% of the single cells were mapped to the correct halves of the embryo and each cell to a predicted position with a high degree of precision (Figure 7B). The unique attributes of the positional dataset are (1) it is a stage-specific reference for the epiblast of embryos at a defined developmental stage, and (2) the dataset is generated from the region-specific quantitative profile of the genome-wide transcriptome of cell populations at known locations.
The latter attribute distinguishes our analytical approach (based on d-WISH) from that of two recently published studies (Achim et al., 2015; Satija et al., 2015), where the spatial information is derived from the binary scores of gene expression (based on conventional ISH) at each grid position. It is envisaged that with further training of the dataset using transcriptome data of additional single cells harvested from cell populations at defined positions, the zip code functionality could be enhanced to allow precise mapping of the position of every single cell in the mid-gastrulation epiblast. Work is in progress for the acquisition of new single-cell data to be used in machine learning for zip code mapping, and the generation of a spatial transcriptome for the full range of stages of gastrulation. Our spatial transcriptome dataset can also be used for pinpointing the best-fit cell populations that match the embryo-derived stem cells. We demonstrated the utility of zip code mapping to delineate the heterogeneous lineage propensity (the spectrum of cell fates displayed upon differentiation) of EpiSCs derived from the epiblast and the ESCs. The mapping functionality of the spatial transcriptome is a useful adjunct to the elucidation of the cell potency and lineage property of multipotent stem cells. The transcriptome profiling of the current study provides a snapshot of the molecular activity at one stage of gastrulation. Extension of this study to cover the whole period of gastrulation would allow a longitudinal study of lineage hierarchy and diversification of cell populations based on the history of expression of lineage-specific marker genes, gleaned from the expression profile or by lineage-gene reporter expression, in selected cell lineages throughout gastrulation. Taken together, the spatial RNA-seq data of the iTranscriptome provide an unprecedented resource of information on the lineage status and differentiation trajectories of epiblast cells and on the transcriptional and signaling activity that is vital for lineage specification and differentiation. Such knowledge gleaned from the spatial transcriptome has significant implications for understanding the mechanism of embryogenesis and the biological attributes of stem cells.

EXPERIMENTAL PROCEDURES

Embryo Laser Capture Microdissection and RNA Isolation

Late mid-streak (E7.0) embryos (Downs and Davies, 1993) of C57BL/6J mice were embedded in OCT compound and cryosectioned serially at 15 μm. Alternate sections were mounted on polyethylene terephthalate-coated slides and on regular glass slides. Frozen sections were allowed to quickly thaw at room temperature and then dehydrated in ice-cold 100% ethanol for 30 s. Fixation was performed in 75% ethanol for 30 s, then the slides were stained for 30 s with 1% cresyl violet acetate solution (Sigma-Aldrich, prepared in 75% ethanol), dehydrated in a series of 75%, 95%, 100% ethanol (30 s for each step), and finally subjected to LCM on an MMI Cellcut Plus system (MMI, Zurich, Switzerland). Sections mounted on plain slides were stained with 1% cresyl violet acetate solution for histology and imaged for the construction of the 3D embryo template for data presentation. Approximately 20 cells from four positions in each section were harvested by LCM. Cell samples were lysed in 50 μl of 4 M guanidine isothiocyanate solution (GuSCN; Invitrogen, 15577-018) at 42°C for 10 min. The volume of the lysate was adjusted to 200 μl with nuclease-free water, and the RNA was concentrated by ethanol precipitation in the presence of 1/10 volume of sodium acetate (pH 5.7, 3 M; Ambion) and 2 μl of carrier glycogen (20 mg/ml; Roche). Total RNA pellets were dissolved in lysis solution and used as a template for low-cell number RNA-seq.
Mouse Embryo Single-Cell RNA-Seq

E7.0 C57BL/6E embryos were harvested and incubated in 0.5% Trypsin and 2.5% pancreatin at 4°C. The epiblast, dissected free from the extraembryonic tissues, the mesoderm, and the endoderm, was bisected into the anterior and the posterior fragments. These fragments were dissociated into single cells by digestion with 50 μl of Accutase (STEMCELL Technologies) at 37°C for 15 min. The single-cell suspension was diluted in PBS-BSA solution and individual cells were picked into PCR tubes by micropipette for a modified Smart-seq2 protocol (Picelli et al., 2014). All animal experiments were performed in compliance with the guidelines of the Animal Ethical Committee of the Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences.

Mouse Embryo 3D Model Reconstruction

The mouse embryo 3D model was visualized with Vaa3D (Peng et al., 2010). For 3D visualization of the gene expression pattern, four regions were first colored in each 2D image of the sample slice by a custom-written MATLAB program, and the slices were then assembled in a 3D format for visualization in Vaa3D. For 3D visualization of the four transcription domains, each region was colored in the 2D image of the sample slice before the colored slice images were input into Vaa3D.

Identifying Additional Marker Genes for Anterior and Posterior Regions Based on Seed Genes

To uncover previously unknown anterior- and posterior-specific marker genes besides the marker genes identified by PCA loading (using the FactoMineR package in R), a “seed gene” for each of these two domains was defined as the gene having the highest squared cosine (cos2) correlation to both PC1 and PC2 simultaneously, as computed with the FactoMineR package (the plot.PCA function with the parameter “select” set to cos2). GBA (Walker et al., 1999) analysis was used to find the genes co-expressed with the seed genes. To do this, all gene expression values were first Z score normalized across all samples. Then, a binary expression matrix was built by defining a gene as expressed (defined as 1) in a given sample if the normalized expression value was larger than or equal to 0, and as not expressed (defined as 0) if it was smaller than 0. Finally, the p values for observing a given co-expression of two genes were calculated using the hypergeometric distribution, followed by Benjamini-Hochberg (BH) multiple testing correction. Only co-expressed genes with FDR ≤ 0.05 were identified as additional marker genes for both anterior and posterior regions.
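The GBA steps described above (Z score normalization, binarization at 0, hypergeometric test, and BH correction) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the toy expression values, and the handling of a gene with zero variance are assumptions.

```python
from math import comb
from statistics import mean, pstdev


def binarize(expr):
    """Z-score a gene across samples, then call it expressed (1) when z >= 0."""
    mu, sd = mean(expr), pstdev(expr)
    if sd == 0:
        return [0] * len(expr)  # assumption: a flat gene is treated as not expressed
    return [1 if (x - mu) / sd >= 0 else 0 for x in expr]


def hypergeom_pvalue(a, b):
    """P(overlap >= observed) for two binary vectors under the hypergeometric null."""
    n = len(a)                  # total number of samples
    k_a, k_b = sum(a), sum(b)   # samples in which each gene is expressed
    overlap = sum(x & y for x, y in zip(a, b))
    upper = min(k_a, k_b)
    tail = sum(comb(k_a, i) * comb(n - k_a, k_b - i) for i in range(overlap, upper + 1))
    return tail / comb(n, k_b)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values (FDR), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example (hypothetical values): one seed gene against two candidates, six samples.
seed = binarize([5.0, 4.0, 6.0, 0.5, 0.2, 0.1])
candidates = {
    "geneA": binarize([4.0, 5.0, 4.5, 0.3, 0.1, 0.2]),
    "geneB": binarize([0.1, 3.0, 0.2, 2.5, 0.3, 2.8]),
}
pvals = [hypergeom_pvalue(seed, vec) for vec in candidates.values()]
fdr = dict(zip(candidates, benjamini_hochberg(pvals)))
# Genes passing the FDR <= 0.05 cutoff would be kept as additional markers.
markers = [gene for gene, q in fdr.items() if q <= 0.05]
```

With only six toy samples neither candidate passes the cutoff. With all 42 samples the hypergeometric tail becomes far more discriminating.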
Comparison between RNA-Seq Data and In Situ Data

To calculate the correlation of RNA-seq and in situ data for a particular gene, the expression values were scaled to [0, 1] using the formula $(E_i - \min(E)) / (\max(E) - \min(E))$, $i = 1, \ldots, 42$, where $E$ is the set of expression values of this gene in all 42 samples, and then Pearson correlation coefficients were calculated.
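As a minimal sketch of this comparison, the following scales two profiles of one gene to [0, 1] and computes their Pearson correlation coefficient. The function names and the five toy values are hypothetical; the analysis in the paper uses all 42 samples per gene.

```python
from math import sqrt


def min_max_scale(values):
    """Scale values to [0, 1] with (E_i - min(E)) / (max(E) - min(E))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant profile cannot be scaled")
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy example (hypothetical values) for one gene measured by both methods.
rna_seq = [12.0, 30.5, 3.2, 18.0, 25.1]
in_situ = [0.8, 2.1, 0.1, 1.0, 1.9]
r = pearson(min_max_scale(rna_seq), min_max_scale(in_situ))
```

Pearson's coefficient is unchanged by this linear rescaling, so the scaling mainly serves to place the two data types on a common [0, 1] range.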

Identification of Inter-domain Differentially Expressed Genes

To find the DEGs between different domains, we first used the unsupervised hierarchical clustering method to classify all samples into four domains (based on a distinctly separated dendrogram; Figure S2A) by using all genes expressed (FPKM > 1.0) in more than two samples and with a variance in transcript level (log10(FPKM + 1)) across all samples greater than 0.05. Next, to identify inter-domain differentially expressed genes, the genes in each domain were compared with the other three domains using the t test in MATLAB with 10,000 permutations (Huber et al., 2002). Genes were defined as DEGs if they exhibited a p value ≤ 0.01 and a mean fold change of >2.0 or 1.0) in at least two samples and with a variance in transcript level (log10(FPKM + 1) > 0.05) across all samples as the zip code genes (158 zip code genes were identified).

2. Calculate the Spearman RCCs between the 42 reference samples and the user’s query transcriptome data based on the expression of these zip code genes. The RCC values for each query sample were visualized in a corn plot. A higher RCC indicates a higher probability of a matching position.

3. A smoothing procedure was applied to fit the query sample’s location as a diffused domain. In the smoothing procedure, the 42 RCC values between all samples of the reference embryo and the query sample were computed. Each RCC value was mapped on the corn plot by calculating the average value of this RCC and the RCCs of cell samples of adjoining regions in the same spatial domain. The position of the single cell was displayed as a diffused domain, encompassing regions with top smoothed RCC values, in the corn plot (e.g., Figure 7C).
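The correlation and smoothing steps can be sketched as below, assuming the reference profiles are already restricted to the zip code genes. The function names, the toy profiles, and the adjacency table are hypothetical. In particular, which regions count as adjoining within a spatial domain follows the corn plot layout, which is not reproduced here.

```python
from math import sqrt


def ranks(values):
    """Average ranks (1-based), with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def map_query(reference, query):
    """RCC between the query and every reference sample over the zip code genes."""
    return {name: spearman(profile, query) for name, profile in reference.items()}


def smooth(rcc, neighbors):
    """Average each sample's RCC with the RCCs of its adjoining samples."""
    smoothed = {}
    for name, value in rcc.items():
        group = [value] + [rcc[n] for n in neighbors.get(name, [])]
        smoothed[name] = sum(group) / len(group)
    return smoothed


# Toy example (hypothetical values): three reference regions, four zip code genes.
reference = {
    "1A": [9.0, 7.0, 1.0, 0.5],
    "2A": [8.0, 6.5, 2.0, 1.0],
    "1P": [0.5, 1.0, 8.0, 9.0],
}
# Assumed adjacency within one spatial domain.
neighbors = {"1A": ["2A"], "2A": ["1A"], "1P": []}
query = [7.5, 8.0, 0.8, 0.2]

rcc = map_query(reference, query)
smoothed = smooth(rcc, neighbors)
best = max(smoothed, key=smoothed.get)  # region with the top smoothed RCC
```

Reporting the few regions with the highest smoothed values, rather than `best` alone, corresponds to displaying the query as a diffused domain.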

Zip Code Mapping for EpiSCs

The gene expression datasets of ten mouse EpiSC lines retrieved from the GEO database (GEO: GSE46227) and the data of one EpiSC line generated by in vitro conversion of ESCs (ESD-EpiSC) were used for zip code mapping. Spearman RCCs were computed between each query sample and all 42 samples of the reference embryo based on the expression values of the zip code genes. The RCC values for each EpiSC sample were collated in a corn plot.

Web Service

The Django package in Python was used to construct the web site. The SQLite database was used to store the original gene expression data, the GBA values of any two genes in E1, and the mouse gene annotation data (downloaded from NCBI). The interfaces enable the user to search for the expression pattern of queried genes, and for genes that display a similar expression profile to a query gene or pattern. In addition, transcriptome data (from microarray or RNA-seq analysis) can be submitted to the web portal for zip code mapping onto the spatial transcriptome for delineating cell identity.

Received: September 25, 2015 Revised: February 10, 2016 Accepted: February 22, 2016 Published: March 21, 2016

REFERENCES

Achim, K., Pettit, J.B., Saraiva, L.R., Gavriouchkina, D., Larsson, T., Arendt, D., and Marioni, J.C. (2015). High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat. Biotechnol. 33, 503–509. Arnold, S.J., and Robertson, E.J. (2009). Making a commitment: cell lineage allocation and axis patterning in the early mouse embryo. Nat. Rev. Mol. Cell Biol. 10, 91–103. Bass, J.I.F., Diallo, A., Nelson, J., Soto, J.M., Myers, C.L., and Walhout, A.J.M. (2013). Using networks to measure similarity between genes: association index selection. Nat. Methods 10, 1169–1176. Beddington, R.S. (1981). An autoradiographic analysis of the potency of embryonic ectoderm in the 8th day postimplantation mouse embryo. J. Embryol. Exp. Morphol. 64, 87–104. Beddington, R.S. (1982). An autoradiographic analysis of tissue potency in different regions of the embryonic ectoderm during gastrulation in the mouse. J. Embryol. Exp. Morphol. 69, 265–285. Cajal, M., Lawson, K.A., Hill, B., Moreau, A., Rao, J., Ross, A., Collignon, J., and Camus, A. (2012). Clonal and molecular analysis of the prospective anterior neural boundary in the mouse embryo. Development 139, 423–436. Chapman, D.L., Agulnik, I., Hancock, S., Silver, L.M., and Papaioannou, V.E. (1996). Tbx6, a mouse T-box gene implicated in paraxial mesoderm formation at gastrulation. Dev. Biol. 180, 534–542.

ACCESSION NUMBERS

Chung, N.C., and Storey, J.D. (2015). Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics 31, 545–554.

The RNA-seq data reported in this paper were deposited in NCBI Gene Expression Omnibus under accession number GEO: GSE65924.

Downs, K.M., and Davies, T. (1993). Staging of gastrulating mouse embryos by morphological landmarks in the dissecting microscope. Development 118, 1255–1266.

SUPPLEMENTAL INFORMATION

Hoodless, P.A., Pye, M., Chazaud, C., Labbe, E., Attisano, L., Rossant, J., and Wrana, J.L. (2001). FoxH1 (Fast) functions to specify the anterior primitive streak in the mouse. Genes Dev. 15, 1257–1271.

Supplemental Information includes Supplemental Experimental Procedures, seven figures, and five tables and can be found with this article online at http://dx.doi.org/10.1016/j.devcel.2016.02.020.

Huang, D.W., Sherman, B.T., and Lempicki, R.A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57.

AUTHOR CONTRIBUTIONS

N.J., P.P.L.T., and J.D.J.H. conceived the study. N.J. and J.D.J.H. supervised the project. N.J., P.P.L.T., J.D.J.H., and G.P. designed the experiments. G.P., J.C., F.Y., C.L., S.C., G.C., and L.S. performed experiments. S.S., R.W., and N.S. analyzed the sequencing data. W.C. wrote algorithms for 3D image construction and in situ image analysis. G.P., P.P.L.T., J.D.J.H., S.S., and N.J. wrote the paper with the help of all other authors.

ACKNOWLEDGEMENTS

We thank the Uli Schawarz public laboratory platform in PICB, SIBS, CAS for technical support, Fuchou Tang for help with the single-cell sequencing technology and insightful ideas, Janet Rossant for invaluable suggestions, Pingyu Liu and Qingqing Zhu for assistance and providing materials, Yun Qian for the care of animals, and all the laboratory members for discussions and helpful suggestions. This work was supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA01010201 to N.J., XDA01010303 to J.D.J.H.), National Key Basic Research and Development Program of China (2014CB964804, 2015CB964500, 2015CB964803), and National Natural Science Foundation of China (91219303, 31430058, 31401261, 91329302, 31210103916, and 91519330). P.P.L.T. is a Senior Principal Research Fellow of the National Health and Medical Research Council of Australia (grants 1003100 and 1110751).

Huber, W., von Heydebreck, A., Sultmann, H., Poustka, A., and Vingron, M. (2002). Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18 (Suppl 1 ), S96–S104. Junker, J.P., Noel, E.S., Guryev, V., Peterson, K.A., Shah, G., Huisken, J., McMahon, A.P., Berezikov, E., Bakkers, J., and van Oudenaarden, A. (2014). Genome-wide RNA tomography in the zebrafish embryo. Cell 159, 662–675. Kaufman-Francis, K., Goh, H.N., Kojima, Y., Studdert, J.B., Jones, V., Power, M.D., Wilkie, E., Teber, E., Loebel, D.A., and Tam, P.P. (2014). Differential response of epiblast stem cells to Nodal and Activin signalling: a paradigm of early endoderm development in the embryo. Philos. Trans. R. Soc. Lond. B Biol. Sci. 369, http://dx.doi.org/10.1098/rstb.2013.0550. Klein, A.M., Mazutis, L., Akartuna, I., Tallapragada, N., Veres, A., Li, V., Peshkin, L., Weitz, D.A., and Kirschner, M.W. (2015). Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201. Kojima, Y., Kaufman-Francis, K., Studdert, J.B., Steiner, K.A., Power, M.D., Loebel, D.A., Jones, V., Hor, A., de Alencastro, G., Logan, G.J., et al. (2014a). The transcriptional and functional properties of mouse epiblast stem cells resemble the anterior primitive streak. Cell Stem Cell 14, 107–120. Kojima, Y., Tam, O.H., and Tam, P.P. (2014b). Timing of developmental events in the early mouse embryo. Semin. Cell Dev. Biol. 34C, 65–75. Lawson, K.A., Meneses, J.J., and Pedersen, R.A. (1991). Clonal analysis of epiblast fate during germ layer formation in the mouse embryo. Development 113, 891–911.


Li, L., Liu, C., Biechele, S., Zhu, Q., Song, L., Lanner, F., Jing, N., and Rossant, J. (2013). Location of transient ectodermal progenitor potential in mouse development. Development 140, 4533–4543. Liu, Y., Kaneda, R., Leja, T.W., Subkhankulova, T., Tolmachov, O., Minchiotti, G., Schwartz, R.J., Barahona, M., and Schneider, M.D. (2014). Hhex and Cer1 mediate the Sox17 pathway for cardiac mesoderm formation in embryonic stem cells. Stem Cells 32, 1515–1526. Macosko, E.Z., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., Tirosh, I., Bialas, A.R., Kamitaki, N., Martersteck, E.M., et al. (2015). Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214. Mitiku, N., and Baker, J.C. (2007). Genomic analysis of gastrulation and organogenesis in the mouse. Dev. Cell 13, 897–907. Morrison, J.A., Box, A.C., McKinney, M.C., McLennan, R., and Kulesa, P.M. (2015). Quantitative single cell gene expression profiling in the avian embryo. Dev. Dyn. 244, 774–784. Osorno, R., Tsakiridis, A., Wong, F., Cambray, N., Economou, C., Wilkie, R., Blin, G., Scotting, P.J., Chambers, I., and Wilson, V. (2012). The developmental dismantling of pluripotency is reversed by ectopic Oct4 expression. Development 139, 2288–2298.

Tam, P.P., and Behringer, R.R. (1997). Mouse gastrulation: the formation of a mammalian body plan. Mech. Dev. 68, 3–25. Tam, P.P., and Loebel, D.A. (2007). Gene function in mouse embryogenesis: get set for gastrulation. Nat. Rev. Genet. 8, 368–381. Tam, P.P., Loebel, D.A., and Tanaka, S.S. (2006). Building the mouse gastrula: signals, asymmetry and lineages. Curr. Opin. Genet. Dev. 16, 419–425. Tang, F., Barbacioru, C., Bao, S., Lee, C., Nordman, E., Wang, X., Lao, K., and Surani, M.A. (2010). Tracing the derivation of embryonic stem cells from the inner cell mass by single-cell RNA-Seq analysis. Cell Stem Cell 6, 468–478. Thiery, J.P., Acloque, H., Huang, R.Y., and Nieto, M.A. (2009). Epithelial-mesenchymal transitions in development and disease. Cell 139, 871–890. Walker, M.G., Volkmuth, W., Sprinzak, E., Hodgson, D., and Klingler, T. (1999). Prediction of gene function by genome-scale expression analysis: prostate cancer-associated genes. Genome Res. 9, 1198–1203. Wang, B., Yan, X.F., and Jiang, Q.C. (2015). Loading-based principal component selection for PCA integrated with support vector data description. Ind. Eng. Chem. Res. 54, 1615–1627.

Peng, H., Ruan, Z., Long, F., Simpson, J.H., and Myers, E.W. (2010). V3D enables real-time 3D visualization and quantitative analysis of large-scale biological image data sets. Nat. Biotechnol. 28, 348–353.

Wilson, N.K., Kent, D.G., Buettner, F., Shehata, M., Macaulay, I.C., Calero-Nieto, F.J., Sanchez Castillo, M., Oedekoven, C.A., Diamanti, E., Schulte, R., et al. (2015). Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell 16, 712–724.

Pfister, S., Steiner, K.A., and Tam, P.P. (2007). Gene expression pattern and progression of embryogenesis in the immediate post-implantation period of mouse development. Gene Expr. Patterns 7, 558–573.

Xue, H., Xian, B., Dong, D., Xia, K., Zhu, S., Zhang, Z., Hou, L., Zhang, Q., Zhang, Y., and Han, J.D. (2007). A modular network model of aging. Mol. Syst. Biol. 3, 147.

Picelli, S., Faridani, O.R., Bjorklund, A.K., Winberg, G., Sagasser, S., and Sandberg, R. (2014). Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181.

Xue, Z., Huang, K., Cai, C., Cai, L., Jiang, C.Y., Feng, Y., Liu, Z., Zeng, Q., Cheng, L., Sun, Y.E., et al. (2013). Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature 500, 593–597.

Power, M.A., and Tam, P.P. (1993). Onset of gastrulation, morphogenesis and somitogenesis in mouse embryos displaying compensatory growth. Anat. Embryol. (Berl) 187, 493–504.

Yamamoto, M., Meno, C., Sakai, Y., Shiratori, H., Mochida, K., Ikawa, Y., Saijoh, Y., and Hamada, H. (2001). The transcription factor FoxH1 (FAST) mediates Nodal signaling during anterior-posterior patterning and node formation in the mouse. Genes Dev. 15, 1242–1256.

Raya, A., and Izpisua Belmonte, J.C. (2006). Left-right asymmetry in the vertebrate embryo: from early information to higher-level integration. Nat. Rev. Genet. 7, 283–293. Robb, L., and Tam, P.P. (2004). Gastrula organiser and embryonic patterning in the mouse. Semin. Cell Dev. Biol. 15, 543–554. Rojas, A., De Val, S., Heidt, A.B., Xu, S.M., Bristow, J., and Black, B.L. (2005). Gata4 expression in lateral mesoderm is downstream of BMP4 and is activated directly by Forkhead and GATA transcription factors through a distal enhancer element. Development 132, 3405–3417. Satija, R., Farrell, J.A., Gennert, D., Schier, A.F., and Regev, A. (2015). Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502. Snow, M. (1977). Gastrulation in the mouse: growth and regionalization of the epiblast. J. Embryol. Exp. Morphol. 42, 293–303. Solnica-Krezel, L., and Sepich, D.S. (2012). Gastrulation: making and shaping germ layers. Annu. Rev. Cell Dev. Biol. 28, 687–717. Tam, P.P., and Beddington, R.S. (1992). Establishment and organization of germ layers in the gastrulating mouse embryo. Ciba Found. Symp. 165, 27–41, [discussion: 42–9].

Yan, L., Yang, M., Guo, H., Yang, L., Wu, J., Li, R., Liu, P., Lian, Y., Zheng, X., Yan, J., et al. (2013). Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nat. Struct. Mol. Biol. 20, 1131–1139. Zechel, S., Zajac, P., Lonnerberg, P., Ibanez, C.F., and Linnarsson, S. (2014). Topographical transcriptome mapping of the mouse medial ganglionic eminence by spatially resolved RNA-seq. Genome Biol. 15, 486. Zhang, K., Li, L., Huang, C., Shen, C., Tan, F., Xia, C., Liu, P., Rossant, J., and Jing, N. (2010). Distinct functions of BMP4 during different stages of mouse ES cell neural commitment. Development 137, 2095–2105. Zhang, W., Liu, Y., Sun, N., Wang, D., Boyd-Kirkup, J., Dou, X., and Han, J.D. (2013). Integrating genomic, epigenomic, and transcriptomic features reveals modular signatures underlying poor prognosis in ovarian cancer. Cell Rep. 4, 542–553. Zhu, Q., Song, L., Peng, G., Sun, N., Chen, J., Zhang, T., Sheng, N., Tang, W., Qian, C., Qiao, Y., et al. (2014). The transcription factor Pou3f1 promotes neural fate commitment via activation of neural lineage genes and inhibition of external signaling pathways. Elife 3, e02224.


Developmental Cell, Volume 36

Supplemental Information

Spatial Transcriptome for the Molecular Annotation of Lineage Fates and Cell Identity in Mid-gastrula Mouse Embryo Guangdun Peng, Shengbao Suo, Jun Chen, Weiyang Chen, Chang Liu, Fang Yu, Ran Wang, Shirui Chen, Na Sun, Guizhong Cui, Lu Song, Patrick P.L. Tam, Jing-Dong J. Han, and Naihe Jing

INVENTORY OF SUPPLEMENTAL MATERIALS

Figure S1, related to Figure 1. The spatial RNA-seq analysis of the mid-gastrulation mouse embryo
Figure S2, related to Figure 2. Comparative analysis of inter-domain and inter-embryo differentially expressed genes (DEG)
Figure S3, related to Figure 3. The region-specific gene identification
Figure S4, related to Figure 4. The anterior and posterior sub-region partition and response of SG3 genes to Foxh1
Figure S5, related to Figure 5. The overlapped target genes for selected TC1 and TC2 transcription factors
Figure S6, related to Figure 5. The transcription factor (TF) network inferred from the spatial transcriptome data
Figure S7, related to Figure 7. The determination of mapped position for single cells
Table S1, related to Figure 2. Expression values for DEGs (log10 transformed)
Table S2, related to Figure 3. Region-specific marker gene (zip code gene set) information
Table S3, related to Figure 4. Subdomain genes of anterior and posterior epiblast
Table S4, related to Figure 4. Posterior subdomain (PD1-PD3) signature gene information
Table S5, related to Figure 5. Developmental-related TF information
Supplemental Experimental Procedures
Supplemental References

[Figure S1 image. Panel A workflow labels: Embryo collection; Fixation and cresyl violet staining; Cryosection; Laser microdissection; Low RNA input PCR amplification; High throughput sequencing. Panel B: sections S1-S12 with A, P, L and R orientation. Panel C: LCM steps (i)-(v). Panel D: number of genes (FPKM>1.0) plotted against number of reads for the anterior, posterior, left and right samples. Panel E: PCC of 3A, and of virtual 3A (mean of 2A and 4A in E1), with the mean values of [2A, 4A] (vertical), [3L, 3R] (horizontal) and [2A, 4A, 3L, 3R] (vertical-horizontal) in E2 and E3. Panel F: density plotted against Log10(FPKM+1) for samples 1A-11A, 1P-11P, 2L-11L and 2R-11R.]

Figure S1, related to Figure 1: The spatial RNA-seq analysis of the mid-gastrulation mouse embryo

(A) The procedure of spatial RNA-sequencing. Embryos were cryo-sectioned serially along the distal to proximal axis. Cells from four quadrants (A, anterior; P, posterior; R, right; L, left) of each cresyl violet-stained section were collected by laser capture microdissection (LCM). RNA samples were subjected to PCR preamplification and low input RNA sequencing.
(B) An overview of the embryo sections before LCM. S1-S11 represent the sections from the distal to the proximal region of the embryo. S12, which contained the amniotic fold, was in the extraembryonic region.
(C) The series of steps for LCM capture of the cell population: (i) the sample section, (ii) anterior LCM area marked, (iii) anterior quadrant harvested, posterior sample being dissected, (iv) posterior quadrant harvested, left LCM area marked and (v) left quadrant harvested, right area marked for dissection.
(D) Saturation analysis of reads for samples from each quadrant. Different numbers of mapped reads were randomly selected thrice, and the mean number of detected genes with FPKM>1.0 was plotted.
(E) Computational generation of the sample 3A dataset for the reference embryo (E1). First, the correlation between 3A values and the mean values of surrounding samples was determined by Principal Component Correlation (PCC) analysis in embryo 2 (E2) and embryo 3 (E3). Then, a virtual 3A was generated from the mean value of 2A and 4A from E1. The value of virtual 3A was further compared with the actual 3A samples in E2 and E3.
(F) Gene expression density for each quadrant position. Gene expression (FPKM) was log10 transformed and plotted against the density.

[Figure S2 image. Panel A: hierarchical clustering of the anterior (A), posterior (P), left and right samples. Panel B: number of gene group plotted against the λ parameter. Panel C: 3D mapping for gene expression patterns of gene groups G1-G3 in Embryo 1, Embryo 2 and Embryo 3 (Anterior, Posterior, Lateral-Left, Lateral-Right; color scale Low to High). Panel E row labels (up-regulated and down-regulated EMT genes): Col1a2, Mmp2, Sparc, Tcf4, Col5a2, Ahnak, Tfpi2, Rgs2, Snai3, Tmem132a, Cdh1, Spp1, Gng11, Itgav, Vim, Timp1, Vps13a, Igfbp4, Msn, Bmp1, Nudt13, Krt19, Itga5, Cdh2, Gsc, Fn1, Snai1, Twist1, Wnt5b, Vcan, Steap1, Fgfbp1, Desi1, Tmeff1, Sox10, Wnt5a, Tspan13. Panels G and H column labels: Cxx1c, Prex1, Rab28, Rgs10, Spry1, Tes.]

Histone modification enrichment (see panel D):
Modification term          Enrichment   P value
Histone acetylation        1.67         3.67e-06
Histone methylation        1.42         4.14e-03
Histone phosphorylation    2.09         1.25e-05
Histone ubiquitination     2.57         2.72e-10

Asymmetrically expressed genes (see panel F):
Gene     Expression pattern by spatial transcriptome   # embryo displaying asymmetrical expression (N)
Cxx1c    E2 (L<R)                                      0 (7)
Prex1    E1 (L>R)                                      0 (4)
Rab28    E1 (L>R)                                      0 (3)
Rgs10    E1 (L>R), E3 (L<R)                            0 (6)
Tes      E3 (L>R)                                      0 (4)

Figure S2, related to Figure 2: Comparative analysis of inter-domain and inter-embryo differentially expressed genes (DEG)
(A) The hierarchical clustering analysis for genes that are expressed (FPKM>1.0) in at least two samples and with a variance in transcript level log10(FPKM+1) of > 0.05 across all samples. The samples with different colors represent different domains identified in the dendrogram.
(B) Tuning the λ parameter of the BIC scoring function to find the optimal number of clusters, showing the relationship between the optimal number of gene clusters (y axis) and the λ parameter in the BIC scoring function (x axis).
(C) 3D visualization of the median expression level of each gene group in the four quadrants of the epiblast of the three embryos in this study.
(D) The expression pattern of epigenetic modification factors across the D1-D4 transcriptome domains (samples are ordered as in Figure 2A) and the different enriched modification types for genes that are highly expressed in anterior and posterior regions.
(E) The expression pattern of up-regulated (red color) and down-regulated (green color) EMT genes across the D1-D4 transcriptome domains (samples are ordered as in Figure 2A).
(F) Asymmetrically expressed genes in three separate embryos (E1-E3). Asymmetric genes between left samples (2L-11L) and right samples (2R-11R) were identified based on the paired t-test P value ([...]) and fold change ([...]). Only [...] genes were found to be asymmetrically expressed in more than one embryo.
(G) WISH showing expression of six selected asymmetric genes identified in the spatial transcriptome: Cxx1c, Prex1, Rab28, Rgs10, Spry1 and Tes. Pairs of images are shown for the left and right views of each embryo.
(H) WISH data showing no discernible left-right difference in the expression of the six selected genes that showed asymmetrical expression in the spatial transcriptome. Abbreviations: L, left; R, right; <, weaker expression; >, stronger expression.
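The gene filter described in legend (A) of Figure S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `filter_genes`, the dictionary input layout and the use of the population variance are all hypothetical choices, since the legend does not say which variance estimator was used.

```python
from math import log10
from statistics import pvariance

def filter_genes(fpkm, min_fpkm=1.0, min_samples=2, min_var=0.05):
    """Keep genes expressed in enough samples and variable enough across samples.

    fpkm maps gene name -> list of FPKM values, one per sample (hypothetical layout).
    """
    kept = []
    for gene, values in fpkm.items():
        # count samples in which the gene is expressed (FPKM > 1.0)
        n_expressed = sum(v > min_fpkm for v in values)
        # variance of log10(FPKM + 1); population variance is an assumption
        log_values = [log10(v + 1.0) for v in values]
        if n_expressed >= min_samples and pvariance(log_values) > min_var:
            kept.append(gene)
    return kept

# toy data, invented for illustration only
toy = {
    "geneA": [0.0, 5.0, 50.0, 200.0],   # expressed and variable
    "geneB": [2.0, 2.1, 2.0, 2.2],      # expressed but nearly constant
    "geneC": [0.1, 0.2, 30.0, 0.0],     # expressed in one sample only
}
selected = filter_genes(toy)
```

In this toy input only `geneA` passes both criteria; `geneB` fails the variance cutoff and `geneC` fails the two-sample expression requirement.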

[Figure S3 image. Panel A: number of gene group plotted against the BIC scoring parameter. Panel B: violin plots of Log10 (FPKM+1) for module gene groups MG1-MG4 in module domains MD1-MD4. Panel C: scaled expression value for RNA-seq plotted against scaled expression value for in situ for Cbx7, Slc7a3, Zfp462, Sall3, Ccnd2, Sall2, Sp5 and Cldn7; the correlation values printed in the panels range from r = 0.60 to r = 0.82. Panel D: KEGG pathways with NES values along PC1 and PC2 (for example Wnt Signaling Pathway, Hedgehog Signaling Pathway, Axon Guidance, Cell Cycle, DNA Replication and Spliceosome). Panel E: hierarchical clustering (upper) and plot of PC1 against PC2 (lower) of the qPCR samples; gene labels: Pou3f1, Slc7a3, Grb7, Cldn7, Crb3, Sox2, Dtx1, L1td1, Gap43, Gse1, Cgn, Pgap1, Sall2, Enc1, Stmn2, Grp, Meg3, Foxn3, Rab25, Eras, Sfrp2, Hes6, Zbtb8a, Rln3, Marveld3, Rab20, Mt2, Gadd45g, Snap25, Mafg, G2e3, Dph5, Fst, Cxx1c, Hes1, Nanog, Grrp1, Wnt3, Sp8, Nodal, Cdkn1a, Mixl1, Fn1, Lhx1, Aplnr, Fgf3, Dll3, Rbp1, Ninj2, Cdkn3, Rbm47, Foxa2, Pou5f1, Otx2, Pml, T.]

Figure S3, related to Figure 3: The region-specific gene identification
(A) The determination of optimal gene group numbers by the BIC scoring function for selected marker genes identified by the highest PC loadings.
(B) The violin plot of mean gene expression values for each module gene expression group (MG) in each module expression domain (MD). Each violin plot shows the frequency distribution of the mean transcript level (log10 transformed FPKM) of marker genes per sample.
(C) The linear correlation between the scaled expression patterns from RNA-seq and from in situ hybridization for the genes shown in Figure 3D. The RCC value is indicated for each plot.
(D) The gene set enrichment analysis (GSEA) for enriched KEGG pathways along the first two PCs. The correlations between PC1 or PC2 and the genes in GS1 were calculated and used as the ranked list for GSEA. Gene function was ranked by the enrichment score. NES: normalized enrichment score.
(E) The qPCR verification of the module gene expression domains. Domain-specific genes were profiled by multiplex qPCR on Fluidigm Biomark HD and the results were analyzed by hierarchical clustering (upper panel) and PCA (lower panel).

[Figure S4 image. Expression maps over positions A, L, R and P (columns) and sections 1-11 (rows), with Anterior and Posterior labels, for GBA-Pou3f1, GBA-Mixl1, Pou3f1, Mixl1, Dkk1, Mesp1, Mesp2, Noto, Prss22, Wnt2, Wnt3a, 1700003E16Rik, Dnajb13, Them5, Apoc2, Lgals2, Dcpp3, Apoa4, Dll1, Spag8 and Hoxb1, and for the posterior subdomains PD1-PD3. Enriched GO terms plotted as -Log10 (p value), first list: Nervous system development; Multicellular organismal development; Anatomical structure development; Single-organism developmental process; Developmental process; Neurogenesis; Generation of neurons; Negative regulation of kinase activity; System development. Second list: Pattern specification process; Regionalization; Tissue development; Embryonic morphogenesis; Anterior/posterior pattern specification; Epithelium development; Embryo development; Tissue morphogenesis; Regulation of anatomical structure morphogenesis; Anatomical structure morphogenesis. Further plots: number of gene group; bar plots of expression for Control and Foxh1 KD conditions.]

Figure S4, related to Figure 4: The anterior and posterior sub-region partition and response of SG3 genes to Foxh1

(A-D) The GO enrichment analysis for GBA gene lists (FDR [...]

[...] 5 taken as the candidate subdomain-specific marker genes.

Transcription factors (TFs) analysis
Mouse TFs were collected from the RIKEN (Kanamori et al., 2004) and AnimalTFDB (Zhang et al., 2012) databases. The pairwise expression PCCs of these TFs were used to calculate CSI scores (Bass et al., 2013). Hierarchical clustering was performed based on the CSI scores (cutoff: CSI>0.7). Next, the CSI coexpression network was constructed based on the CSI scores. We also added the edges with negative PCC, where the corresponding CSI scores were calculated based on the absolute pairwise PCC.

ChIP-seq data collection
The ChIP-seq data for the hub genes Sox2 (GSE44288), Sox3 (GSM818950), Pou3f1 (GSE54569), Smad1 (GSE29196) and Snai1 (GSE24852) were downloaded from NCBI GEO.

ChIP-seq data processing
The Bowtie2 version 2.0.0 alignment tool was used to align ChIP-seq reads to the mouse genome build mm10 (Langmead and Salzberg, 2012). Only reads with fewer than two mismatches that mapped uniquely to the genome were used in subsequent analyses. Using the findPeaks program of the HOMER software, Sox2, Sox3, Pou3f1, Smad1 and Snai1 binding peaks were identified in ChIP experiments compared with the control (Heinz et al., 2010). We calculated the distance from the peak centers to the annotated transcription start sites (TSS) and then defined the nearest genes as peak-related genes. R was used to draw the Venn diagram of overlapped genes.

Target gene enrichment analysis for different signaling pathways

Four published GEO datasets (GSE48092, GSE41260, GSE17879 and GSE31544) for signaling pathway perturbations were assessed by comparing control samples with treatment samples using RankProd (p value≤0.001) (Hong et al., 2006) to identify the potential signaling target genes of the BMP, FGF, Nodal and WNT pathways. To determine the enrichment of the target genes in different gene groups, a one-sided Fisher's exact test followed by BH multiple testing correction was performed to evaluate the statistical significance. GSEA was used to evaluate whether a given signaling pathway target gene set is significantly enriched in each sample (only inter-domain DEGs were used). Briefly, the mean-scaled expression values (divided by the mean) of inter-domain DEGs in each sample were used as the rank file and the target gene sets of the different signaling pathways were used as the test gene sets to run GSEA (FDR≤0.25).

Protocol for zipcode mapping in web service
(1) Open the Zipcode mapping page in the web service and click the "Scan..." button to upload the query transcriptome file (see the User's manual for the file format). This file can include multiple samples; rows represent genes and columns represent samples.
(2) Click the "Send" button to submit the query file and calculate the RCC based on the expression of zipcode genes. After several minutes (depending on the number of samples), the page returns the results. When the uploaded file does not contain all zipcode genes, only the genes that overlap with the zipcode genes are used to calculate the RCC, and a notice is displayed in the output page. The Zipcode mapping utility can process more than one sample in a transcriptome file, but only the result for the first sample is shown in the output page.
(3) Click the hyperlink in the output page to download the RCC results for all samples.
(4) Click the other hyperlink in the output page to download the CornPlotter program, then load the downloaded RCC results for all query samples into CornPlotter to visualize them as corn plots.

Evaluating the performance of single cell mapping
To assess whether the single cells are correctly mapped to reference samples, an in silico catalog of inferred d-WISH patterns was generated by calculating, for each gene, its expected expression level in each of the 42 reference samples. For each gene G in the list of zipcode genes that contributed to the mapping of the single cell to a specific reference sample, the following steps of analysis were performed:
(1) Remove G from the zipcode, and retain the remaining genes;
(2) Redo the single cell mapping with the remaining genes by the above single cell mapping procedure. In this case, the expression profile of G was not known, and would not influence the downstream inferences. For all reference samples Si (i=1, 2…42) and single cells Cj (j=1, 2…70), the RCC value between Si and Cj is represented as Ri,j. For each single cell, the mapped location was the reference sample with the maximum Ri,j;
(3) Impute the spatial expression pattern of G across the whole epiblast through the RCC-weighted estimation of gene expression (excluding the regions to which no single cell could be mapped). Specifically, for all reference samples Si, we calculated the imputed expression level of gene G as:

$$G_{S_i} = \frac{\sum_{j=1}^{N} R_{i,j} \ast E(G, C_j)}{N}$$

where E(G,Cj) represents the expression level (log-transformed) of G in cell Cj, and N is the number of single cells that mapped to Si;
(4) Use the area under the curve (AUC) of the receiver operating characteristic (ROC) curve to test whether the imputed expression levels of G were sufficient to correctly classify the binarized expression as defined by the GBA method.

The pseudocode for the ROC analysis is the following:

z-score normalization of the expression of G gene in iTranscriptome as ZscoreValue.
Define BinaryVector;
For i in each of the 42 samples
    if ZscoreValue(i) >= 0
        BinaryVector(i) = 1;
    else
        BinaryVector(i) = 0;
ROC processing:
Define ImputeVector as the imputed expression values for G gene.
TotalPositiveCount = number of 1 in BinaryVector;
TotalNegativeCount = number of 0 in BinaryVector;
For i in each of the 42 samples sorted by imputed expression in ImputeVector, cumulative from high to low
    TruePositive = TruePositiveEnclosed_from_1_to_i / TotalPositiveCount;
    FalsePositive = FalsePositiveEnclosed_from_1_to_i / TotalNegativeCount;
Calculate total AUC.
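The ROC pseudocode above can be made runnable roughly as follows. This is a sketch rather than the authors' implementation: the function names and toy vectors are hypothetical, the AUC is accumulated with the trapezoidal rule, and tied imputed values are treated as a single threshold step, a detail the pseudocode does not specify. Both classes must be present in the binarized vector.

```python
from statistics import mean

def binarize_reference(reference_expr):
    """Label a sample 1 when its z-score is >= 0, i.e. its value is >= the mean."""
    mu = mean(reference_expr)
    return [1 if v >= mu else 0 for v in reference_expr]

def roc_auc(labels, scores):
    """Trapezoidal AUC of the ROC curve traced by lowering the score threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    auc = 0.0
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    i = 0
    while i < len(ranked):
        # consume all samples tied at this score so ties form one segment
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return auc

# toy vectors, invented for illustration (the paper uses 42 reference samples)
reference_expr = [1.0, 2.0, 3.0, 6.0]   # reference expression of gene G
imputed_expr = [0.1, 0.8, 0.3, 0.9]     # imputed expression of gene G
labels = binarize_reference(reference_expr)
auc = roc_auc(labels, imputed_expr)
```

Reference samples to which no cell could be mapped have no imputed value and would need to be dropped from both vectors before calling `roc_auc`.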

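The RCC-weighted imputation in step (3) of "Evaluating the performance of single cell mapping" can be illustrated with a short sketch. It assumes, following the definition of N in the text, that the sum for reference sample Si runs over the N cells mapped to Si; the function names, the nested-list layout of the RCC matrix and the toy numbers are hypothetical.

```python
def best_sample_per_cell(rcc):
    """For each cell j, return the index i of the reference sample with maximum R[i][j]."""
    n_cells = len(rcc[0])
    return [max(range(len(rcc)), key=lambda i: rcc[i][j]) for j in range(n_cells)]

def impute_expression(rcc, expr_g, mapped_to):
    """Impute gene G in each reference sample S_i as sum_j R[i][j] * E(G, C_j) / N.

    rcc[i][j]    : RCC between reference sample i and single cell j
    expr_g[j]    : log-transformed expression of G in cell j
    mapped_to[j] : index of the reference sample that cell j maps to
    Samples with no mapped cell are excluded and reported as None.
    """
    imputed = []
    for i in range(len(rcc)):
        cells = [j for j, s in enumerate(mapped_to) if s == i]
        if not cells:
            imputed.append(None)
            continue
        imputed.append(sum(rcc[i][j] * expr_g[j] for j in cells) / len(cells))
    return imputed

# toy example: 3 reference samples x 3 single cells (invented values)
rcc = [
    [0.9, 0.2, 0.8],
    [0.1, 0.7, 0.3],
    [0.0, 0.1, 0.2],
]
expr_g = [2.0, 1.0, 3.0]
mapped_to = best_sample_per_cell(rcc)
imputed = impute_expression(rcc, expr_g, mapped_to)
```

Here cells 0 and 2 map to the first reference sample and cell 1 to the second, so the third sample receives no imputed value.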

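The zipcode mapping step of the web-service protocol (compute the RCC between a query sample and every reference sample over the shared zipcode genes, then take the best match) can be sketched as below. The sketch assumes that RCC denotes the Spearman rank correlation coefficient, which this section does not define; the function names, dictionary layout, gene names and values are hypothetical, and each profile is assumed to be non-constant over the shared genes.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rcc(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def map_to_reference(query, reference, zipcode_genes):
    """Return (best_sample, rcc_by_sample) using only zipcode genes present in the query."""
    shared = [g for g in zipcode_genes if g in query]
    q = [query[g] for g in shared]
    rcc = {s: spearman_rcc(q, [profile[g] for g in shared])
           for s, profile in reference.items()}
    return max(rcc, key=rcc.get), rcc

# toy data: two reference samples and a query missing one zipcode gene
zipcode_genes = ["g1", "g2", "g3", "g4"]
reference = {
    "1A": {"g1": 5.0, "g2": 1.0, "g3": 3.0, "g4": 0.5},
    "1P": {"g1": 0.2, "g2": 4.0, "g3": 1.0, "g4": 6.0},
}
query = {"g1": 4.0, "g2": 0.5, "g3": 2.0}
best, rcc_by_sample = map_to_reference(query, reference, zipcode_genes)
```

As in the protocol, a query that lacks some zipcode genes is scored on the overlapping genes only.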

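The enrichment test described under "Target gene enrichment analysis for different signaling pathways" (a one-sided Fisher's exact test followed by BH correction) can be sketched in plain Python. The one-sided test is written as the upper tail of the hypergeometric distribution; the counts, group names and variable names are invented for illustration and are not taken from the paper.

```python
from math import comb

def fisher_one_sided(k, K, n, N):
    """P(X >= k) for hypergeometric X: n group genes drawn from N background genes,
    K of which are pathway targets, with k targets observed in the group."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1)) / total

def benjamini_hochberg(pvals):
    """Return BH-adjusted p values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# hypothetical counts: 20 background genes, 5 of them pathway targets
N_BACKGROUND, N_TARGETS = 20, 5
groups = {"G1": (3, 4), "G2": (1, 6), "G3": (0, 5)}   # group -> (overlap, group size)
pvals = [fisher_one_sided(k, N_TARGETS, n, N_BACKGROUND) for k, n in groups.values()]
adjusted = benjamini_hochberg(pvals)
```

Each gene group is tested against the same target set, and the BH step corrects across the groups tested together.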

SUPPLEMENTARY REFERENCES
Bass, J.I.F., Diallo, A., Nelson, J., Soto, J.M., Myers, C.L., and Walhout, A.J.M. (2013). Using networks to measure similarity between genes: association index selection. Nat. Methods 10, 1169-1176.
Cabili, M.N., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A., and Rinn, J.L. (2011). Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915-1927.
Chang, C.C., and Lin, C.J. (2011). LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology 2.
Chung, N.C., and Storey, J.D. (2015). Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics 31, 545-554.
Guo, G., Huss, M., Tong, G.Q., Wang, C., Li Sun, L., Clarke, N.D., and Robson, P. (2010). Resolution of cell fate decisions revealed by single-cell gene expression analysis from zygote to blastocyst. Dev. Cell 18, 675-685.
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576-589.
Hong, F.X., Breitling, R., McEntee, C.W., Wittner, B.S., Nemhauser, J.L., and Chory, J. (2006). RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics 22, 2825-2827.
Huang, C., Xiang, Y., Wang, Y., Li, X., Xu, L., Zhu, Z., Zhang, T., Zhu, Q., Zhang, K., Jing, N., et al. (2010). Dual-specificity histone demethylase KIAA1718 (KDM7A) regulates neural differentiation through FGF4. Cell Res. 20, 154-165.
Huang, D.W., Sherman, B.T., and Lempicki, R.A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44-57.
Kanamori, M., Konno, H., Osato, N., Kawai, J., Hayashizaki, Y., and Suzuki, H. (2004). A genome-wide and nonredundant mouse transcription factor database. Biochem. Biophys. Res. Commun. 322, 787-793.
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359.
Medvedeva, Y.A., Lennartsson, A., Ehsani, R., Kulakovskiy, I.V., Vorontsov, I.E., Panahandeh, P., Khimulya, G., Kasukawa, T., Consortium, F., and Drablos, F. (2015). EpiFactors: a comprehensive database of human epigenetic factors and complexes. Database (Oxford) 2015, bav067.
Picelli, S., Bjorklund, A.K., Faridani, O.R., Sagasser, S., Winberg, G., and Sandberg, R. (2013). Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096-1098.
Trapnell, C., Hendrickson, D.G., Sauvageau, M., Goff, L., Rinn, J.L., and Pachter, L. (2013). Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46-53.
Watanabe, K., Kamiya, D., Nishiyama, A., Katayama, T., Nozaki, S., Kawasaki, H., Watanabe, Y., Mizuseki, K., and Sasai, Y. (2005). Directed differentiation of telencephalic precursors from embryonic stem cells. Nat. Neurosci. 8, 288-296.
Weiszmann, R., Hammonds, A.S., and Celniker, S.E. (2009). Determination of gene expression patterns using high-throughput RNA in situ hybridization to whole-mount Drosophila embryos. Nat. Protoc. 4, 605-618.
Zhang, H.M., Chen, H., Liu, W., Liu, H., Gong, J., Wang, H., and Guo, A.Y. (2012). AnimalTFDB: a comprehensive animal transcription factor database. Nucleic Acids Res. 40, D144-149.
Zhang, K., Li, L., Huang, C., Shen, C., Tan, F., Xia, C., Liu, P., Rossant, J., and Jing, N. (2010). Distinct functions of BMP4 during different stages of mouse ES cell neural commitment. Development 137, 2095-2105.


