PINSPlus: Clustering Algorithm for Data Integration and ... - CRAN-R

Recommend Documents

Algorithm for data clustering in pattern recognition

Jan 7, 2002 - Algorithm for Data Clustering in Pattern Recognition Problems Based on Quantum Mechanics ... and Astronomy, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, ... As an introduction to our approach we start with

A clustering algorithm for multivariate data ... - Journal of Big Data

analysis of video frames for objects recognition, recognition of human movements .... section we face the problem of the estimate of the covariance matrix of each cluster. ..... In our algorithm we will actually use expression (12) for the computatio

A Supervised Clustering and Classification Algorithm for Mining Data ...

AbstractâThis paper presents a data mining algorithm based on supervised ... data mining algorithms must be capable of handling mixed data types. ...... [12] S. Guha, R. Rastogi, and K. Shim, âCURE: An efficient clustering algorithm for large .

An Energy-Efficient Re-Clustering and Data Reporting Algorithm for

compared to LEACH algorithm. The performance is evaluated by considering the impact of energy balancing among the nodes and network lifetime efficiency in ...

Fast Density Clustering Algorithm for Numerical Data and Categorical

Jan 15, 2017 - Fast Density Clustering Algorithm for Numerical Data and. Categorical Data .... expertise. Besides the above-mentioned unsupervised similarity.

A Supervised Clustering and Classification Algorithm for Mining Data ...

A Supervised Clustering and Classification. Algorithm for Mining Data With. Mixed Variables. Xiangyang Li, Member, IEEE, and Nong Ye, Senior Member, IEEE.

Heterogeneous Data Integration with the Consensus Clustering ...

[email protected]. Abstract. Meaningfully integrating massive multi-experimental genomic data sets is becoming critical for the understanding of gene ...

Improving Categorical Data Clustering Algorithm by ... - CiteSeerX

categorical data clustering by giving greater weight to uncommon attribute ... Most previous clustering algorithms focus on numerical data whose inherent ... each attribute value, the uncommon value matches in similarity computation will make ... Def

Fuzzy based clustering algorithm for privacy preserving data mining ...

algorithms over three different datasets are used in the experiments to show the effectiveness of the approach. Keywords: data mining; privacy preserving data ...

Clustering Algorithm for Spatial Data Mining: An Overview - CiteSeerX

Clustering Algorithm for Spatial Data Mining: An. Overview. A.Padmapriya,. M.C.A.,M.Phil.,Ph.D. Department of Computer Science and Engineering. Alagappa ...

Energy Efficient Clustering Algorithm for Data ... - Semantic Scholar

sensor nodes and even premature death of entire network. In this paper, exploiting ... devices using sensors to cooperatively monitor physical or environmental .... to join a cluster based on cluster head sensor Reading (CR) and My local.

Efficient parallel spectral clustering algorithm design for large data ...

According to the experiment, with the processing data scale being enlarged, the clustering ... Large dataSpectral clustering algorithmClustering analysisParallel ...

An Algorithm for Clustering Heterogeneous Data ... - Semantic Scholar

Aug 3, 2017 - categorical data and it could also handle heterogeneous values as well ... experimental analysis and results over different datasets ... The list of.

An Entropy-Based Subspace Clustering Algorithm for Categorical Data

Keywords-clustering; subspace clustering; categorical data; attribute ...... data clustering:a correlation-based approach for unsupervised attribute weight-.

A Quantum-inspired Genetic Algorithm for Data Clustering - CiteSeerX

Mar 21, 2009 - K-Means clustering algorithm based on Quantum-inspired. Genetic Algorithm ..... machine learning repository [25] are carried out to compare.

Functional Clustering Algorithm for High-Dimensional Proteomics Data

Eastern Virginia Medical School, Norfolk, VA 23507, USA. Received 9 September 2004; revised 10 ... [email protected]. This is an open access article ...

Algorithm for data clustering in pattern recognition ... - Semantic Scholar

Jan 7, 2002 - estimator [3] of the probability distribution leading to the ... in Eq. (3) is positive definite, it follows that ... the true classification is quite amazing.

clusiVAT: A Mixed Visual/Numerical Clustering Algorithm for Big Data

labels to the rest of the big data with the nearest prototype rule. Numerical ... limitations of working memory on a computational platform. So, what is the volume of ... But to date, there has been no comparison of clusiVAT to SL clusters built with

A data set oriented approach for clustering algorithm ... - CiteSeerX

Martin Ester, Hans-Peter Kriegel, Jorg Sander, Xiaowei Xu. "A Density-Based Algorithm for. Discovering Clusters in Large Spatial Databases with Noise", ...

A GA-Based Clustering Algorithm for Large Data ... - Semantic Scholar

A GA-Based Clustering Algorithm for Large Data Sets With Mixed. Numeric and Categorical Values. LI Jie, GAO Xinbo, JIAO Li-cheng. National Key Lab. of ...

An efficient algorithm for clustering categorical data - Journal of ...

Clusteringï¼ CategoriCal dataï¼ data streamï¼ data mining ... rithms, such as ROCK[1], C2P[2], DBSCAN[3], BIRTH[4], CURE[5], CHAMELEON[6], WaveCluster[7] ... Most previous clustering algorithms focus on numerical data whose inherent ...

Mean Shift Clustering Algorithm for Data with Missing Values

[email protected] & [email protected]. Abstract. Missing values in data are common in real world applications. There are several methods that deal ...

A Spectral Based Clustering Algorithm for Categorical Data with

Clustering is a method of unsupervised learning allowing the assignment of a ... developed for clustering categorical data, e.g, 1998; Huang [13], 1998; Ganti et.

A Variant of Genetic Algorithm Based Categorical Data Clustering for

Clustering techniques have been widely used in machine learning, pattern ... for categorical data clustering using GA for better partition in less space. It is hard to ...

PINSPlus: Clustering Algorithm for Data Integration and ... - CRAN-R

Download PDF

5 downloads 0 Views 320KB Size Report

Comment

May 2, 2018 - Broad Institute, The Cancer Genome Atlas (TCGA), and the European Genome-Phenome Archive. The approach can accurately identify known ...

PINSPlus: Clustering Algorithm for Data Integration and Disease Subtyping Hung Nguyen, Sangam Shrestha, and Tin Nguyen∗ Department of Computer Science and Engineering University of Nevada, Reno, NV 89557 2018-05-02 Abstract PINS+ provides a robust approach for data integration and disease subtyping. It allows for unsupervised clustering using multi-omics data. The method automatically determines the optimal number of clusters and then partitions the samples in a way such that the results are robust to noise and data perturbation. PINS+ has been validated on thousands of cancer samples obtained from the Gene Expression Omnibus, the Broad Institute, The Cancer Genome Atlas (TCGA), and the European Genome-Phenome Archive. The approach can accurately identify known subtypes and discover novel groups of patients with significantly different survival profiles. The software is extremely fast and able to cluster hundreds of matched samples in minutes.

Contents Introduction

2

PerturbationClustering

2

SubtypingOmicsData

6

References

9

1

Introduction In a recent paper published in Genome Research, Nguyen et al. [1] proposed a robust approach for multi-omics data integration and disease subtyping called PINS. The framework was tested upon many datasets obtained from the Gene Expression Omnibus, the Broad Institute, The Cancer Genome Atlas (TCGA), and the European Genome-Phenome Archive. In the analysis, PINS outperforms state-of-the-art clustering methods like Similarity Network Fusion (SNF) [2], Consensus Clustering (CC) [3], and iClusterPlus [4] in identifying known subtypes and in discovering novel groups of patients with significantly different survival profiles. Please consult Nguyen et al. [1], [5] for the mathematical description. PINS+ offers many improvements of PINS from practical perspectives. One outstanding feature is that the package is extremely fast. For example, it takes PINS+ only five minutes using a single core to analyze the Glioblastoma dataset (273 patients with three data types, mRNA, miRNA, and methylation) while it takes PINS 175 minutes (almost three hours) to analyze the same dataset (see Supplemental Table S14 in Nguyen et al. [1] for running time of PINS). PINS+ also allows for parallelization on any of the three platforms: Windows, Linux, and Mac OS. In addition, PINS+ provides users with more flexibility, including customized basic clustering algorithms, distance metrics, noise levels, subsampling, data perturbation, etc. This document provides a tutorial on how to use the PINS+ package. PINS+ is designed to be convenient for users and uses two main functions: PerturbationClustering and SubtypingOmicsData. PerturbationClustering allows users to cluster a single data type while SubtypingOmicsData allows users to cluster multiple types of data.

PerturbationClustering The PerturbationClustering function automatically determines the optimal number of clusters and the membership of each item (patient or sample) from a single data type in an unsupervised analysis. Preparing data The input of the function PerturbationClustering is a numerical matrix or data frame in which the rows represent items while the columns represent features. Load example data AML2004 library(PINSPlus) data(AML2004)

Run PerturbationClustering Run PerturbationClustering with default parameters system.time(result