
www.impactjournals.com/oncotarget/

Oncotarget Supplementary Materials

Model-based unsupervised learning informs metformin-induced cell-migration inhibition through an AMPK-independent mechanism in breast cancer

Supplementary Materials

Supplementary Section 1

Cluster validation using distance-based hierarchical methods

In order to determine if these clusters were recognizable using a distance-based method, we chose two hierarchical clustering methods that use Euclidean distances to find the clusters in metformin-treated cells. Agglomerative hierarchical clustering takes a bottom-up approach by iteratively joining the closest cells; divisive hierarchical clustering takes a top-down approach by first forming clusters with maximized intercluster distance and breaking them down until intercluster distance is minimized.

We tested baseline and metformin-treated cells with both agglomerative hierarchical clustering and divisive hierarchical clustering methods, each of which uses a distance matrix of pairwise Euclidean distances between cells as input. Both hierarchical methods found three clusters in metformin-treated cells, with one cluster comprising the same cells as M2, and with the difference in other clusters being that one cell from M1 was added to M3, with the rest remaining the same. Thus, we were able to establish consistency among the clustering results using different approaches that measured statistical distances between cells (where MiMoSA uses distribution-based distances and hierarchical clustering uses Euclidean distances). Because of the consistent clustering results and high statistical significance in the expression levels of the 230 genes in cluster M2 in comparison with all other clusters, we chose this set of 230 genes for further analyses.
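The bottom-up procedure described in this section can be illustrated with a minimal sketch. It is not the code used in the study: the linkage criterion is not stated in the text, so average linkage is assumed here, and the function names (`euclidean`, `agglomerative`) and the toy expression values are hypothetical. Only the agglomerative variant is shown; the divisive variant is omitted.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering (linkage is an assumption).

    Returns a list of clusters, each a sorted list of point indices.
    """
    n = len(points)
    # Pairwise Euclidean distance matrix between cells (the method's input).
    dist = [[euclidean(points[i], points[j]) for j in range(n)]
            for i in range(n)]
    # Bottom-up: every cell starts in its own cluster.
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        # Find the two clusters with the smallest average pairwise distance.
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # Join the closest pair; a < b, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Hypothetical toy data: 8 "cells" described by 2 expression values each.
    cells = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
             (5.0, 5.1), (5.2, 4.9),
             (9.9, 0.0), (10.1, 0.2), (10.0, -0.1)]
    for k, members in enumerate(agglomerative(cells, 3), start=1):
        print(f"cluster {k}: cells {members}")
```

Cluster memberships obtained this way can then be compared cell by cell against the memberships from another method, which is the kind of consistency check reported in this section.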

Supplementary Figure 1: The probability density function (PDF) of gene expression within a cell, and the model fit for the expression values under a Gaussian mixture model with exponential distributions. We compared samples drawn from the two distributions using the Mann-Whitney U-test and the Kolmogorov-Smirnov test (KS-test). Neither test rejected the null hypothesis that the samples were drawn from the same distribution (p = 0.74 for the Mann-Whitney U-test and p = 0.5869 for the KS-test, both above the significance level of 0.05).
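The two tests named in this caption can be sketched as follows. This is an illustration under stated assumptions, not the analysis behind the reported p-values: the sample sizes and data are invented, the Mann-Whitney p-value uses the large-sample normal approximation without tie correction, and the KS p-value uses the asymptotic Kolmogorov series. The function names are hypothetical; in practice, library routines such as `scipy.stats.mannwhitneyu` and `scipy.stats.ks_2samp` would normally be used.

```python
import bisect
import math
import random


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U-test (normal approximation, no tie correction).

    Returns (U, p), where U counts pairs (xi, yj) with xi > yj; ties count 0.5.
    """
    n1, n2 = len(x), len(y)
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return u, p


def ks_two_sample(x, y):
    """Two-sample KS test. Returns (D, p) with an asymptotic p-value."""
    n1, n2 = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # D is the largest gap between the two empirical CDFs.
    for v in xs + ys:
        f1 = bisect.bisect_right(xs, v) / n1
        f2 = bisect.bisect_right(ys, v) / n2
        d = max(d, abs(f1 - f2))
    en = math.sqrt(n1 * n2 / (n1 + n2))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series below converges slowly here; the true value is ~1.
        return d, 1.0
    series = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                 for j in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical data: two samples from the same exponential distribution,
    # standing in for observed expression values and draws from a fitted model.
    rng = random.Random(0)
    observed = [rng.expovariate(1.0) for _ in range(200)]
    model = [rng.expovariate(1.0) for _ in range(200)]
    print("Mann-Whitney U, p:", mann_whitney_u(observed, model))
    print("KS D, p:", ks_two_sample(observed, model))
```

A p-value above 0.05 in both tests means only that no difference between the two samples was detected at that level; it does not by itself show that the distributions are identical.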

Supplementary Table 1: List of pathways significantly enriched in the 230-gene set. See Supplementary_Table_1

Supplementary Table 2: The 230 differentially expressed genes (bolded and highlighted are 24 genes with less literature evidence in metformin response)

ATP5F1 ATP6V0B C1orf31 CCT3 CDC42 CNIH4 DDAH1 DNTTIP2 GAS5 NDUFS5 NME7 PARK7 PPT1 RHOC RPS8 SDHB SSR2 TAGLN2 UBE2T YBX1 ZBTB8OS ZMYM6NB CCT4 COX5B COX7A2L DPY30 FARSB HAT1 LINC00152 MRPL33 NCL OST4 PTRHD1 RNF181 RRM2 SF3B14 SSB TXNDC9 VAMP8 YWHAQ ARPC4 C3orf78 CCDC72 IMPDH2 LSM3 POLR2H

PSMD6 RPL35A UBA3 UQCRC1 DANCR H2AFZ MRPS18C PAICS RPL34 ARRDC3 BRIX1 CDK7 CSNK1A1 MRPL36 MTRNR2L2 NDUFS4 RPL37 TAF9 VDAC1 CENPW CLIC1 COX7A2 ECI2 EEF1E1 GTF3C6 HIST1H1C MNF1 MRPL18 PAK1IP1 RPL10A RPL7L1 RPS10 TMEM14C ARPC1A CHCHD2 CHCHD3 MRPS17 NAMPT SBDS SHFM1 DCAF13 LACTB2 MRPL13 MRPS28 NDUFB9 PABPC1

PBK SNHG6 UBE2V2 VDAC3 EDF1 TOMM5 ATP5C1 GDI2 GSTO1 NDUFB8 SFTA1P VDAC2 ZWINT ATP5L BANF1 C11orf10 CFL1 CLNS1A CWC15 FAU FTH1 GSTP1 MRPL48 POLR2G POLR2L PPME1 SLC35F2 TALDO1 TIMM10 TMEM179B TMX2 TRMT112 ARPC3 ATP5B CCT2 CD63 COX14 DYNLL1 EMG1 FAM216A GLTP NDUFA12 PTGES3 RPS26 SLC25A3 TMEM106C

TRIAP1 VPS29 PCID2 SAP18 TPT1 C14orf2 CINP DAD1 JKAMP NEDD8 NPC2 PSMA6 PSMC1 RPL36AL SLIRP C15orf23 ETFA SCG5 SRP14 TMEM85 APRT ARL6IP1 COX4I1 MT1E MT2A NQO1 NUTF2 RPS15A RPS2 TCEB2 TNFRSF12A UQCRC2 ANAPC11 ATP5H COPS3 H3F3B KPNA2 LOC100507246 LSM12 MRPS7 NME1 PFN1 PSMB6 SKA2 SUMO2 TUBG1

ATP5A1 IER3IP1 NDUFV2 PMAIP1 RPL17 TXNL1 AP2S1 BST2 C19orf53 COX6B1 EMP3 FXYD5 GPI NDUFA13 NUDT19 PDCD5 PSMD8 RPL36 RPS28 SNRPD2 TECR UBL5 UCA1 AURKA DYNLRB1 FKBP1A RIN2 ROMO1 SRSF6 YWHAB CSTB EIF3D EIF3L NDUFA6 RBX1 TOMM22 TXN2 UQCR10 ACOT9 ATP6AP2 EBP NDUFA1 PRPS2 RPL10 RPS4X TIMP1