Katherine B. McCauley, Konstantinos-Dionysios Alysandratos, Anjali Jacob, Finn ... Morley, Gianni Carraro, Seunghyi Kook, Susan H. Guttentag, Barry R. Stripp, ...
Stem Cell Reports, Volume 10
Supplemental Information
Single-Cell Transcriptomic Profiling of Pluripotent Stem Cell-Derived SCGB3A2+ Airway Epithelium
Katherine B. McCauley, Konstantinos-Dionysios Alysandratos, Anjali Jacob, Finn Hawkins, Ignacio S. Caballero, Marall Vedaie, Wenli Yang, Katherine J. Slovik, Michael Morley, Gianni Carraro, Seunghyi Kook, Susan H. Guttentag, Barry R. Stripp, Edward E. Morrisey, and Darrell N. Kotton
SUPPLEMENTAL DATA ITEMS
Supplemental Figures
[Figure S1 image. Panel a: workflow starting from Nkx2-1GFP; Scgb1a1tdTomatoTrace mouse tail tip fibroblasts: infect with STEMCCA-Frt lentivirus, subclone pluripotent colonies, excise STEMCCA (Adeno-Flp), subclone excised colonies, characterize miPSCs. Panel b: STEMCCA cassette after reverse transcription and integration, before and after Flp recombinase; labeled elements: 5’ LTR, EF1a, Oct4, Klf4, IRES, Sox2, c-Myc, WPRE, 3’ LTR, FRT sites. Panel c: undifferentiated iPSCs shown as phase, alkaline phosphatase, and Oct4/DAPI images. Panel d: karyotype.]
Figure S1. Summary of procedure and results for generating PSCs from mouse tail tip fibroblasts. a) Schematic of generation of reprogrammed PSCs from tail tip fibroblasts. b) Schematic of STEMCCA lentiviral cassette pre- and post-excision of the Frt-flanked STEMCCA sequence. c) Representative images of pluripotency marker staining on reprogrammed, Flp-excised Nkx2-1GFP; Scgb1a1tdTomatoTrace miPSCs. Scale = 50 µm. d) G-band karyotype analysis of reprogrammed line showing normal male karyotype. Related to Figure 1.
[Figure S2 image. Panel a: targeting schematic showing the SCGB3A2 locus (exons e1 to e3, ATG, CRISPR gRNA site), the donor construct (5’ arm, P2A, CherryPicker, LoxP-flanked PGK-NeoR cassette, 3’ arm), the targeted SCGB3A2 locus after homologous recombination, and the targeted, Cre-excised SCGB3A2 locus; screening primers F1 3’, R1 3’, F2 3’, and R2 3’, with labeled product sizes of 2902 bp and 1715 bp. Panel b: PCR gels for the F1 3’ – R1 3’ and F2 3’ – R2 3’ primer pairs; lane labels: H2O, Untargeted (PiZZ1), Targeted (C21.23, C22.17 B). Panel c: SSEA-4, Tra-1-60, and SSEA-1 staining, each shown with DNA counterstain. Panel d: karyotype of RUES2 SCGB3A2CherryPicker C22.17BCr1.]
Figure S2. Summary of procedure and results for generating SCGB3A2CherryPicker hPSC line. a) Full schematic of targeting strategy to insert a CherryPicker gene construct and antibiotic selection cassette at the stop codon of the native SCGB3A2 locus. b) PCR screen showing successful targeting of the SCGB3A2 locus. c) Representative immunofluorescence images of targeted, Cre-excised SCGB3A2CherryPicker line stained for pluripotency markers SSEA-4 and Tra-1-60. SSEA-1 is a negative control, as it is not expressed in human PSCs. Scale = 100 µm. d) G-band karyotype analysis of targeted, Cre-excised SCGB3A2CherryPicker line showing normal female karyotype. Related to Figure 2.
[Figure S3 image. Panel a: heatmap, columns Day 15 CD47high, Day 22, Day 36 SCGB3A2-, Day 36 SCGB3A2+; color scale: row Z-score from −2 to 2; asterisks mark individual genes (positions not recoverable). Row labels: SCGB3A2, SCGB1A1, PTN, ROS1, CYBRD1, SFTPA1, SFTA3, AQP4, LRRK2, ZNF385B, STEAP4, CD36, RP1, HP, CHI3L2, LPL, CXCL17, LMO3, GEM, GPR116, CYP2B7P1, TREM1, CYP4B1, LAMP3, LOC100996304, NAPSA, SLPI, ACPP, HRASLS2, KIAA1324, PI3, ATP8A1, XKRX, LOC728606, ACSL1, SULF1, CYP1B1, SCNN1B, PDPN, HRASLS5, ATP13A4, ABCA3, TRPC6, PCSK2, CDC20B, PRR16, SFTPD, CSF3R, DUOXA1, HOPX, NMU, OSMR, F3, SPOCK3, DUOX1, ACADL, S1PR3, CD9, MLPH, NKX2-1, PCP4L1, ITGA8, CHIA, FAM129A, CXCR7, ST6GALNAC2, C16orf89, CRLF1, ACOXL, LIFR. Panel b: table titled "Upregulated Day 36 SC+ vs. Day 36 SC-" with an FDR column (rows not recoverable). Panel c: table titled "Upregulated Day 36 SC+ vs. Day 22" with an FDR column (rows not recoverable). Panel d: heatmap, columns Day 22, Day 36 SCGB3A2-, Day 36 SCGB3A2+; color scale: row Z-score from −1 to 1. Row labels: CAV2, STX12, PAM, SGMS1, ARFGAP3, RAB2A, SEC24D, STX7, VAMP4, RAB22A, RAB5A, BET1, ATP7A, COPB2, GOLGA4, ATP6V1H, VAMP3, VPS4B, CLN5, VAMP7, SNX2, YIPF6, ARFIP1, RAB14, LMAN1, TMX1, LAMP2, TMED2, COPE, DNM1L, RPS6KA3, ADAM10, KRT18, DST.]
Figure S3. Further microarray analysis of hPSC-derived SCGB3A2+ secretory cells. a) Heatmap showing all genes upregulated between day 36 SCGB3A2CherryPicker+ and SCGB3A2CherryPicker- populations with FDR≤0.05 and FC≥10. Values represent row-normalized gene expression (z-score) on days 15, 22, and 36. * = genes also upregulated (FDR≤0.05) between day 22 and day 36 SCGB3A2CherryPicker+. b-c) Tables of Hallmark pathways upregulated (FDR≤0.05) between b) day 36 SCGB3A2CherryPicker+ and SCGB3A2CherryPicker- cells and c) day 36 SCGB3A2CherryPicker+ vs. day 22 by gene set enrichment analysis. d) Heatmap showing normalized expression (z-score) at days 22 and 36 of genes in the HALLMARK_PROTEIN_SECRETION pathway ranked by fold change and meeting a cutoff of FDR≤0.05. Related to Figure 3.
[Figure S4 image. Panel a: scatter plot "Fluidigm - SCGB3A2 vs. SCGB1A1", axes SCGB3A2 log expression and SCGB1A1 log expression, Corr = 0.38. Panel b: scatter plot "Fluidigm - TP63 vs. SCGB3A2", axes TP63 log expression and SCGB3A2 log expression, Corr = -0.37. Panel c: scatter plot "Fluidigm - TP63 vs. mCherry", axes TP63 log expression and mCherry log expression, Corr = -0.37. Panel d: violin plots of normalized expression for LAMP3, NAPSA, and LPCAT1; sort marker: cherry (SCGB3A2CP+) or green (SCGB3A2CP-); groups: C1: Secretory lung, C2: Non-secretory airway, C3: Non-lung. Panel e: scatter plot with axes SCGB3A2 log expression and SFTPC log expression, Corr = 0.56.]
Figure S4. Single cell mRNA sequencing reveals subsets of proximalized cells and co-expression of proximal airway and type 2 cell marker genes. a-c) Correlation plots comparing log2 gene expression between a) SCGB3A2 vs. SCGB1A1, b) TP63 vs. SCGB3A2, c) TP63 vs. mCherry. d) Violin plots showing normalized expression for indicated genes across cell clusters. Point color indicates sort marker used (red = CherryPicker+ (SC+); green = CherryPicker- (SC-)); violin color indicates assigned cell cluster (red = C1; green = C2; blue = C3). e) Correlation plots comparing log2 gene expression between SCGB3A2 vs. SFTPC. Related to Figure 5.
[Figure S5 image. Panel a: heatmap with columns C7, C6, C5, C4, C3, C2, C1. Panel b: tSNE plots colored by normalized expression (0.0 to 1.0) of MKI67 and TOP2A. Panel c: two tSNE plots, labeled "Cell cycle genes included" and "Cell cycle genes removed", with clusters AEC2 (mitotic, C1), AEC2 (C2), Gastric (C3), Hepatic (C4), Basal (C5), Secretory airway (C6), and AEC2-like (C7). The two plot legends give cluster sizes of C1 (170 cells), C2 (311 cells), C3 (131 cells), C4 (234 cells), C5 (284 cells), C6 (155 cells), C7 (107 cells) and of C1 (272 cells), C2 (218 cells), C3 (130 cells), C4 (233 cells), C5 (285 cells), C6 (165 cells), C7 (89 cells); which legend belongs to which plot is not recoverable. Panel d: tSNE plots colored by normalized expression (0.0 to 1.0), labeled "Cell cycle genes removed", with groups Distal lung, Secretory airway, Gastric, Basal cells, and Hepatic. Panel e: scatter plots labeled "Proximal only", titled "SCGB3A2 vs. SCGB1A1 - 0.31 correlation" and "SCGB3A2 vs. SFTPC - 0.09 correlation"; axes SCGB3A2 log expression, SCGB1A1 log expression, and SFTPC log expression.]
Figure S5. Clustering of single cell transcriptomes of hPSC-derived airway and alveolar cells. a) Heatmap of average log2 values for the top 20 genes expressed highly in each indicated cell cluster relative to the entire dataset. b) tSNE of normalized log2 per-cell gene expression of indicated genes. c) tSNE plot with (left) and without (right) cell cycle genes, with assigned clusters indicated. d) tSNE of normalized expression of indicated genes after analysis without cell cycle genes. e) Correlation plots, shown only for cells differentiated in proximal media, comparing log2 gene expression of SCGB3A2 vs. SFTPC (left) and SCGB3A2 vs. SCGB1A1 (right). Related to Figure 6.
[Figure S6 image. Panel a: heatmap, with cell cycle genes. Panel b: heatmap, without cell cycle genes.]
Figure S6. Detailed heatmap of differentially expressed genes in 7 cell clusters. Top 50 genes by fold change with p < 5×10^-4 by negative binomial exact test in each cluster are shown, as calculated a) with cell cycle genes and b) without cell cycle genes. Related to Figure 6.
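[Figure S7 image. Schematic: on day 15, NKX2-1+ lung epithelium is sorted as CD47hi/CD26lo; three conditions are then followed through days 21, 25, and 35: -CHIR (2+10+DCI throughout), +CHIR D21-35, and +CHIR D25-35; SFTPCtdTomato is analyzed on day 35. Phase contrast and tdTomato images are shown for each condition.]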
Figure S7. Upregulation of SFTPCtdTomato in response to Wnt activation post-proximalization. Representative phase contrast and tdTomato images showing the expression of tdTomato after addition of CHIR for 0 (left), 14 (middle), or 10 (right) days after 20 (left), 6 (middle), or 10 (right) days of differentiation post-lung specification in proximal airway medium, as indicated in the schematic. Scale = 100 µm. Related to Figure 7.
Supplemental Tables
Table S1. Table of all p-values obtained by ANOVA with Tukey’s multiple comparisons test for Figures 2h, 4b, and 7g. Related to Figures 2, 4, and 7.
Table S2. Searchable table of analyzed microarray results with statistical analysis by one-way ANOVA comparing all 5 array samples and moderated pairwise Student’s t-tests to compare each pair of individual samples for each gene. Related to Figure 3.
Table S3. Table of all genes upregulated (by ANOVA with FDR-adjusted p < 0.05; variance > 3) in clusters from analysis in Figure 5. This list of genes is represented in the heatmap in Figure 5c. Related to Figure 5.
Table S4. Searchable table of analyzed single cell RNA-Seq data showing genes differentially expressed (FDR-adjusted p […] 7.5.
The Bioanalyzer trace from one sample differed from those of all other samples, and this sample was excluded from further analysis. Global gene expression was analyzed using Affymetrix GeneChip Human Gene 2.0 ST arrays (ThermoFisher). Array results were normalized using the Robust Multiarray Average (RMA) algorithm. From these results, the Area Under the Receiver Operating Characteristics Curve (AUC) was computed. The AUC was >0.8 for all samples except two; these two samples were excluded from downstream analysis, and the remaining 12 samples were re-normalized using RMA. Principal Component Analysis (PCA) was performed on all samples passing quality control across all sampled genes (23,786 probesets). To determine genes differentially expressed across all biological groups, a moderated one-way analysis of variance (ANOVA) was used for each gene to measure the statistical differences between samples. Benjamini-Hochberg False Discovery Rate (FDR) correction was applied to these values to generate FDR-adjusted p-values (hereafter FDR), which were used for all subsequent determinations of significance (with all statistical values of fold change and p-values provided in Table S2 under the column labeled “one-way ANOVA”).
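As an illustration of the Benjamini-Hochberg adjustment named above, the following minimal Python sketch converts a list of raw p-values into FDR-adjusted p-values. It is not the code used in this study, whose statistics were generated in R; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (not data from this study).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = benjamini_hochberg(raw_p)
significant = [q <= 0.05 for q in fdr]
print(fdr)
print(significant)
```

In this toy input, the third and fourth genes have raw p-values below 0.05 but adjusted values just above it, which shows why significance calls are made on the adjusted values rather than the raw ones.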
To allow head-to-head comparisons between each group indicated in Table S2, moderated pairwise Student’s t-tests were used to compare each gene between the two indicated experimental groups. FDR correction was then applied to these values. Preranked GSEA analysis was performed using MSigDB v5.1 hallmark gene sets (http://software.broadinstitute.org/gsea/index.jsp; Mootha et al., 2003) to compare between indicated samples. Moderated t-statistics for each pairwise comparison were used for preranking of genes. Statistics, including heatmaps, were generated using the R environment for statistical computing. Heatmaps were generated from normalized log2 values for genes meeting the indicated fold change and significance (FDR-adjusted p-value) cutoffs.

Single Cell Analysis using Fluidigm Technology
Cells were dissociated from spheres at day 27 of differentiation and sorted on viability (calcein blue+) and CherryPicker expression, as described above. To distinguish them from CherryPicker+ cells by microscopy, CherryPicker- cells were stained with CellTracker Green CMFDA (ThermoFisher) and re-sorted for dye uptake into CherryPicker+ sorted cells at a ratio of 1:2. After loading onto a 96-well Fluidigm C1 machine (Fluidigm), 70 cells were captured (out of 96 possible wells), and each captured cell was identified as CherryPicker+ (CellTracker Green-) or CherryPicker- (CellTracker Green+) for later analysis. These cells were then lysed, and RNA was converted to cDNA and amplified according to the detailed Fluidigm (South San Francisco, CA) protocol (“Using C1 to Generate Single-Cell cDNA Libraries for mRNA Sequencing”, Fluidigm, PN 100-7168). Cells were barcoded and a sequencing library was prepared using the Illumina Nextera XT DNA kit and preparation protocol from Fluidigm. cDNA concentration was evaluated using the Quant-iT PicoGreen dsDNA Assay (ThermoFisher) on a Tecan (Männedorf, Switzerland) Infinite M1000 Microplate Reader. Sequencing was performed on one pooled, barcoded sample with 75 base pair paired-end reads (150 cycles) with 130 million total reads in one lane of a flow cell using an Illumina NextSeq 500 (San Diego, CA). The total number of reads per cell was 2.3×10^6.

Fluidigm Bioinformatic Analysis
Reads were aligned to the human genome (GRCh38) and quantified using the STAR aligner (Dobin et al., 2013). Cells were deemed to fail quality control metrics when they deviated by at least 3 Median Absolute Deviations in terms of a) total number of aligned reads, b) percentage of reads aligned to mitochondrial genes, or c) number of genes with at least one read count. Approximately 10% of cells failed these metrics and were removed from the analysis. Read counts were normalized using pool-based scaling factors and deconvolution (Lun et al., 2016) via the Scater Bioconductor package (https://academic.oup.com/bioinformatics/article/33/8/1179/2907823/Scater-preprocessing-quality-control). Dimensionality reduction (Figure 5b) was performed by applying a zero-inflated negative binomial model (ZINB-WaVE) (Risso et al., 2017) to the log-normalized expression of the top 1000 most variable genes. This output was clustered using the k-means algorithm.
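The quality-control rule described above (a cell fails when it deviates by at least 3 Median Absolute Deviations on any of the three metrics) can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this study. The function names and per-cell values are hypothetical, and the sketch uses the raw MAD without a consistency constant, because the text does not state whether the MAD was scaled.

```python
from statistics import median


def mad_outlier_flags(values, n_mads=3.0):
    """Flag values lying at least n_mads raw MADs away from the median."""
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    threshold = n_mads * mad
    # "dev > 0" guards the degenerate case mad == 0, where every value
    # equal to the median would otherwise be flagged.
    return [dev > 0 and dev >= threshold for dev in deviations]


def failed_qc(total_reads, pct_mito, n_genes, n_mads=3.0):
    """A cell fails QC if it is an outlier on any of the three metrics."""
    per_metric = [
        mad_outlier_flags(metric, n_mads)
        for metric in (total_reads, pct_mito, n_genes)
    ]
    return [any(cell_flags) for cell_flags in zip(*per_metric)]


# Hypothetical metrics for six cells (not data from this study).
total_reads = [100, 102, 98, 101, 99, 30]        # aligned reads, arbitrary units
pct_mito = [5, 6, 5, 4, 25, 5]                   # % reads on mitochondrial genes
n_genes = [4000, 4100, 3900, 4050, 3950, 4000]   # genes with >= 1 read

failed = failed_qc(total_reads, pct_mito, n_genes)
kept = [i for i, bad in enumerate(failed) if not bad]
print(failed)
print(kept)
```

In this toy input, the fifth cell fails on mitochondrial read percentage and the sixth on total aligned reads, so only the first four cells are kept.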
The Fluidigm heatmap (Figure 5c) was generated by applying Ward's hierarchical agglomerative clustering method (Murtagh and Legendre, 2014) to the row-scaled expression values of selected genes. Genes were selected by applying an ANOVA and filtering for FDR-corrected p-value < 0.05 and variance > 3.

10X Single Cell Analysis of Differentiated Airway and Alveolar Spheres
BU3 NGST was differentiated to NKX2-1GFP+ lung progenitors and replated to form spheres in airway or alveolar conditions, as described above. Airway and alveolar spheres were passaged on day 29 and day 26, respectively. For passaging, spheres were dissociated with 2 mg/mL dispase for 1 hour at 37°C, then dissociated to single cells with 0.05% Trypsin for 10 minutes at 37°C. Trypsin was stopped by addition of media containing 10% FBS, and cells were washed two times with DMEM by spinning at 300 x g for 5 minutes at 4°C. Cell pellets were resuspended in growth factor reduced Matrigel at a concentration of 500-1000 cells/mL. Pellets were solidified for 15-30 minutes at 37°C and re-fed with airway or alveolar culture medium, based on initial condition. Cells were further differentiated to day 41 and dissociated for sorting as described previously. DAPI was used for live/dead exclusion. Cells were sorted as described previously for viability (DAPI-) and resuspended in sort buffer at a concentration of 200 to 700 cells/mL. Single cells were captured for sequencing library preparation using a 10X Chromium (10X Genomics, Pleasanton, CA) instrument at the Cedars-Sinai Genomics Core. Single-cell RNA-seq libraries were prepared according to the Single Cell 3’ v2 Reagent Kits User Guide (10X Genomics). Cellular suspensions were loaded on a Chromium Controller instrument (10X Genomics) to generate single-cell Gel Bead-In-EMulsions (GEMs). Reverse transcription (GEM-RT) was performed in a Veriti 96-well thermal cycler (ThermoFisher).
After RT, GEMs were harvested and the cDNAs were amplified and cleaned with the SPRIselect Reagent Kit (Beckman Coulter). Indexed sequencing libraries were constructed using the Chromium Single-Cell 3’ Library Kit (10X Genomics) for enzymatic fragmentation, end-repair, A-tailing, adapter ligation, ligation cleanup, sample index PCR, and PCR cleanup. The barcoded sequencing libraries were quantified by quantitative PCR using the KAPA Library Quantification Kit (KAPA Biosystems, Wilmington, MA). Sequencing libraries were loaded on a NextSeq500 (Illumina) with a custom sequencing setting (26 bp for Read 1 and 98 bp for Read 2) to obtain a sequencing depth of ~200K reads per cell. 714 distal and 742 proximal cells were captured and sequenced. For the distal cells, 259,583 mean reads per cell were obtained, corresponding to 25,315 median UMI counts and 3,968 median genes expressed per cell. For the proximal cells, 279,812 mean reads per cell were obtained, corresponding to 36,969 median UMI counts and 5,217 median genes expressed per cell.

10X Bioinformatic Analysis
Reads were aligned to the human genome (hg19) and quantified using the STAR aligner via the Cell Ranger Single Cell Software Suite (Dobin et al., 2013). Cells were deemed to fail quality control metrics when they deviated by at least 3 Median Absolute Deviations in terms of a) total number of aligned reads, b) percentage of reads aligned to mitochondrial genes, or c) number of genes with at least one read count. Approximately 4% of cells failed these metrics and were removed from the analysis, leaving 675 alveolar and 717 proximal cells. UMI counts were normalized by adjusting for library size and dividing by the median count of each cell, as implemented by the Cell Ranger R kit (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/rkit). Dimensionality reduction (Figure 6b) was performed by applying Barnes-Hut t-Distributed Stochastic Neighbor Embedding (t-SNE; Van der Maaten, https://arxiv.org/abs/1301.3342) to the log-normalized data using the first 10 principal components and a perplexity of 30. This output was clustered using the k-means algorithm. A range of k values (from 5 to 13) was tried and k=7 was selected, because this value separated the mitotic and the non-mitotic subpopulations of each cell type into different clusters (lower k values tended to combine these two groups in the same cluster, and higher k values tended to split mitotic cells into subclusters). The heatmap (Figure 6e, Figure S6) was generated by applying Ward's hierarchical agglomerative clustering method to the row-scaled expression values of selected genes. Genes were selected by applying a negative binomial exact test for each cluster against all others, applying a stringent p-value threshold (FDR < 5×10^-4), and selecting the top 50 genes for each cluster. All differentially expressed genes meeting a less stringent threshold of FDR < 0.1 are listed in Table S4 along with their p-values and log2 fold expression changes across clusters.

Western Blotting
Cultured cells were treated with lysis buffer (RIPA buffer (ThermoFisher) plus 1x Complete Protease Inhibitor Cocktail (Sigma Aldrich)). Buffer-treated cells were incubated for 30 minutes on ice and cleared by centrifugation at 15,000 x g for 20 minutes. Protein was measured in supernatants using the Bio-Rad (Hercules, CA) DC Protein Assay. A total of 50 µg of lysate from airway spheres, 50 µg of lysate from purified SFTPCtdTomato+ hPSC-derived iAEC2s, and 25 µg of lysate from alveolar epithelial cells isolated from lung explants of 21-week human lung and cultured for 6 days in DCI medium were resolved on pre-cast 10% NuPAGE gels (ThermoFisher) and transferred to PVDF membrane (Bio-Rad). Blots were incubated with the following primary antisera: surfactant protein B (rabbit polyclonal antibody against mature bovine SP-B, 1:3000) (Beers et al., 1992); NFLANK (rabbit polyclonal antibody against a synthetic peptide of Gln186-Gln200 of the human Pro-SPB amino acid sequence, dilution 1:2000) (Korimilli et al., 2000); GAPDH (1:5000, EMD Millipore). Species-specific secondary antisera were all conjugated to IR dyes of either 680 or 800 nm wavelengths (Rockland, Limerick, PA) and used at a dilution of 1:10000. Visualization was accomplished using the Odyssey Imaging System (LiCOR Biosciences, Lincoln, NE).

Electron Microscopy
Whole differentiated spheres were harvested with 2 mg/mL dispase and fixed and processed for electron microscopy into plastic embedded samples as described previously (Jacob et al., 2017). 70 nm thin sections on grids were imaged on a Philips CM12 Transmission Electron Microscope.
References
Beers, M.F., Bates, S.R., and Fisher, A.B. (1992). Differential extraction for the rapid purification of bovine surfactant protein B. Am. J. Physiol. 262, L773–L778.
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21.
Herriges, M., Swarr, D., Morley, M.P., Peng, T., Stewart, K.M., and Morrisey, E.E. (2014). Long noncoding RNAs are spatially correlated with transcription factors and regulate lung development. Genes and Development 28, 1363–1379.
Korimilli, A., Gonzales, L.W., and Guttentag, S.H. (2000). Intracellular localization of processing events in human surfactant protein B biosynthesis. J. Biol. Chem. 275, 8672–8679.
Kurmann, A.A., Serra, M., Hawkins, F., Rankin, S.A., Mori, M., Astapova, I., Ullas, S., Lin, S., Bilodeau, M., Rossant, J., et al. (2015). Regeneration of Thyroid Function by Transplantation of Differentiated Pluripotent Stem Cells. Cell Stem Cell 17, 527–542.
Longmire, T.A., Ikonomou, L., Hawkins, F., Christodoulou, C., Cao, Y., Jean, J.C., Kwok, L.W., Mou, H., Rajagopal, J., Shen, S.S., et al. (2012). Efficient derivation of purified lung and thyroid progenitors from embryonic stem cells. Cell Stem Cell 10, 398–411.
Lun, A.T.L., Bach, K., and Marioni, J.C. (2016). Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75.
Madisen, L., Zwingman, T.A., Sunkin, S.M., Oh, S.W., Zariwala, H.A., Gu, H., Ng, L.L., Palmiter, R.D., Hawrylycz, M.J., Jones, A.R., et al. (2010). A robust and high-throughput Cre reporting and characterization system for the whole mouse brain. Nat. Neurosci. 13, 133–140.
Mootha, V.K., Lindgren, C.M., Eriksson, K.-F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstråle, M., Laurila, E., et al. (2003). PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273.
Murtagh, F., and Legendre, P. (2014). Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion? Journal of Classification 31, 274–295.
Ran, F.A., Hsu, P.D., Wright, J., Agarwala, V., Scott, D.A., and Zhang, F. (2013). Genome engineering using the CRISPR-Cas9 system. Nature Protocols 8, 2281–2308.
Rawlins, E.L., Okubo, T., Xue, Y., Brass, D.M., Auten, R.L., Hasegawa, H., Wang, F., and Hogan, B.L.M. (2009). The role of Scgb1a1+ Clara cells in the long-term maintenance and repair of lung airway, but not alveolar, epithelium. Cell Stem Cell 4, 525–534.
Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S., and Vert, J.-P. (2017). ZINB-WaVE: A general and flexible method for signal extraction from single-cell RNA-seq data. bioRxiv 125112.
Swarr, D.T., Peranteau, W., Pogoriler, J., Frank, D.B., Adzick, S., Hedrick, H., Morley, M.P., Zhou, S., and Morrisey, E.E. (2018). Novel Molecular and Phenotypic Insights into Congenital Pulmonary Airway Malformations. American Journal of Respiratory and Critical Care Medicine, in press.