Document not found! Please try again

ClustOfVar: an R package for the clustering of variables - R Project

19 downloads 205 Views 412KB Size Report
Likelihood Linkage Analysis (Lerman, 1987). Qualitative variable clustering (Abdallah and Saporta, 2001). Specific metho
Outline

ClustOfVar: an R package for the clustering of variables Marie Chavent & Vanessa Kuentz & Benoˆıt Liquet & J´erˆ ome Saracco IMB, University of Bordeaux, France INRIA Bordeaux Sud-Ouest, CQFD Team CEMAGREF, UR ADBX, Bordeaux, France ISPED, University of Bordeaux, France

The R User Conference 2011 University of Warwick, August 16-18 2011

UseR! 2011

ClustOfVar: an R package for the clustering of variables

Outline

Outline

1

Introduction

2

The methods in ClustOfVar

3

Illustration on simple examples

4

Concluding remarks

UseR! 2011

ClustOfVar: an R package for the clustering of variables

Introduction The methods in ClustOfVar Illustration on simple examples Concluding remarks

Outline

1

Introduction

2

The methods in ClustOfVar

3

Illustration on simple examples

4

Concluding remarks

UseR! 2011

ClustOfVar: an R package for the clustering of variables

Introduction The methods in ClustOfVar Illustration on simple examples Concluding remarks

Introduction

Clustering of variables lumps together strongly related variables Usefulness for case studies, variable selection and dimension reduction A first approach: apply classical method dedicated to the clustering of observations

UseR! 2011

ClustOfVar: an R package for the clustering of variables

Introduction The methods in ClustOfVar Illustration on simple examples Concluding remarks

Introduction

Some specific methods: VARCLUS (SAS) Likelihood Linkage Analysis (Lerman, 1987) Qualitative variable clustering (Abdallah and Saporta, 2001) Specific methods based on PCA: CLV (Vigneau and Qannari, 2003) Diametrical clustering (Dhillon et al., 2003) → For quantitative variables

UseR! 2011

ClustOfVar: an R package for the clustering of variables

Introduction The methods in ClustOfVar Illustration on simple examples Concluding remarks

Introduction

The goal of the package ClustOfVar: Propose methods for the clustering of a mixture of quantitative and qualitative variables Also suitable for non mixed quantitative or qualitative )

0.8

● ●



0.6



0.4







0.2



0.0

mean adjusted Rand criterion

1.0

Stability of the partitions

2

3

4

5

6

7

8

9

number of clusters

UseR! 2011

ClustOfVar: an R package for the clustering of variables

Introduction The methods in ClustOfVar Illustration on simple examples Concluding remarks

First example: ”decathlon” data > part print(part) Call: cutreevar(obj = tree, k = 5) name description "$var" "list of variables in each cluster" "$sim" "similarity matrix in each cluster" "$cluster" "cluster memberships" "$wss" "within-cluster sum of squares" "$E" "gain in cohesion (in %)" "$size" "size of each cluster" "$scores" "score of each cluster"

UseR! 2011

ClustOfVar: an R package for the clustering of variables

Introduction The methods in ClustOfVar Illustration on simple examples Concluding remarks

First example: ”decathlon” data > summary(part) Call: cutreevar(obj = tree, k = 5) Cluster 1 : 100m Long.jump 400m 110m.hurdle

squared loading 0.68 0.69 0.67 0.64

... Gain in cohesion (in %):

65.33

UseR! 2011

ClustOfVar: an R package for the clustering of variables

Introduction The methods in ClustOfVar Illustration on simple examples Concluding remarks

First example: ”decathlon” data

> part$scores # synthetic variables

SEBRLE CLAY KARPOV BERNARD YURKOV WARNERS ...

cluster1 0.26 1.38 1.11 -0.19 -2.03 1.14

cluster2 -0.72 -0.25 -1.41 1.12 -1.62 0.67

UseR! 2011

cluster3 0.94 0.57 0.57 2.03 -0.15 0.57

cluster4 1.02 0.38 -1.68 0.93 1.07 -1.37

cluster5 1.10 1.95 1.84 0.09 -0.23 -0.08

ClustOfVar: an R package for the clustering of variables

Introduction The methods in ClustOfVar Illustration on simple examples Concluding remarks

Second example: ”wine” data > data(wine) #data of the package FactoMineR > head(wine[,c(1:4)]) Label Soil Odor.Intensity 2EL Saumur Env1 3.07 1CHA Saumur Env1 2.96 1FON Bourgueuil Env1 2.85 1VAU Chinon Env2 2.80 1DAM Saumur Reference 3.60 2BOU Bourgueuil Reference 2.85 > > > >

Aroma.quality 3.00 2.82 2.92 2.59 3.42 3.11

X.quanti