Visualizing Cluster Results Using Package FlexClust and ... - R Project

27 downloads 223 Views 1MB Size Report
Theresa Scharl, Ingo Voglhuber (Vienna University of Technology) .... 8 Volunteer Clusters mind.off recognition network benefited good.job children services.
Visualizing Cluster Results Using Package FlexClust and Friendsd Friedrich Leisch University of Munich useR!, Rennes, 10.7.2009

Acknowledgements & Apology • Sara Dolnicar (University of Wollongong)

• Theresa Scharl, Ingo Voglhuber (Vienna University of Technology)

• Paul Murrell, Deepayan Sarkar (R Core)

Friedrich Leisch: Cluster Visualization, 10.7.2009

1

Acknowledgements & Apology • Sara Dolnicar (University of Wollongong)

• Theresa Scharl, Ingo Voglhuber (Vienna University of Technology)

• Paul Murrell, Deepayan Sarkar (R Core)

Apology: Microarray data only in Theresa’s talk (try time-shift back to Wednesday).

Friedrich Leisch: Cluster Visualization, 10.7.2009

1

Partitioning Clustering – KCCA K-centroid cluster algorithms: • data set XN = {x1, . . . , xN }, set of centroids CK = {c1, . . . , cK } • distance measure d(x, y) • centroid c closest to x: c(x) = argmin d(x, c) c∈CK

• Most KCCA algorithms try to find a set of centroids CK for fixed K such that the average distance N 1 X d(xn, c(xn)) → min, D(XN , CK ) = CK N n=1

of each point to the closest centroid is minimal. • Optimization algorithm not important for rest of talk

Friedrich Leisch: Cluster Visualization, 10.7.2009

2

Example: Australian Volunteers Survey among 1415 Australian adults about which organziations they would consider to volunteer for, main motivation to volunteer, actual volunteering, image of organizations, . . . We use a block of 19 binary questions (“applies”, “does not apply”) about motivations to volunteer: “I want to meet people”, “I have no one else”, “I want to set an example”, . . . Organziations investigated: Red Cross, Surf Life Savers, Rural Fire Service, Parents Association, . . . Our analyses show that there is both competition between organizations with similar profiles as well as complimentary effects (idividuals volunteering for more than one arganization, in most cases with very different profiles).

Friedrich Leisch: Cluster Visualization, 10.7.2009

3

8 Volunteer Clusters meet.people no.one.else example socialise help.others give.back career lonely active community cause faith services children good.job benefited network recognition mind.off

Cl.1 Cl.2 Cl.3 Cl.4 Cl.5 Cl.6 Cl.7 9.24 15.77 28.77 97.18 82.17 80.97 90.81 8.21 8.34 6.16 27.72 10.51 12.47 16.75 12.80 14.68 63.56 93.30 35.84 73.33 80.32 14.12 10.68 3.28 88.74 52.79 54.17 83.39 0.00 100.00 92.93 95.17 56.95 89.18 86.98 21.68 29.18 87.73 96.24 43.41 87.60 96.18 12.04 5.54 10.46 71.34 35.01 17.49 18.29 4.55 8.71 2.30 56.55 17.16 9.76 18.93 17.17 15.93 23.38 93.82 81.61 53.97 77.00 16.17 9.64 66.66 93.75 14.67 87.80 90.34 20.49 12.07 79.66 96.91 47.11 83.31 85.12 10.12 7.24 22.84 66.89 10.52 42.10 27.83 7.00 7.10 11.63 78.15 23.99 43.72 44.98 6.88 11.76 11.88 28.58 14.86 16.20 14.64 19.14 23.81 100.00 94.61 49.18 85.35 75.38 10.49 15.09 14.26 74.37 15.04 100.00 0.00 10.75 8.47 6.29 85.86 43.59 22.83 38.98 10.30 8.03 11.29 79.59 12.80 19.56 21.40 8.56 10.95 12.53 87.96 39.55 24.55 47.43

Friedrich Leisch: Cluster Visualization, 10.7.2009

Cl.8 34.23 7.23 45.30 6.35 86.12 89.77 11.54 0.75 23.98 77.21 79.82 19.47 14.64 8.31 0.00 12.68 10.28 3.49 4.31

Total 49.47 11.38 47.63 35.83 66.78 63.75 20.57 13.14 44.88 52.72 58.66 24.03 25.23 13.00 51.80 26.29 25.94 18.73 26.57

4

8 Volunteer Clusters meet.people no.one.else example socialise help.others give.back career lonely active community cause faith services children good.job benefited network recognition mind.off

Cl.1 Cl.2 Cl.3 Cl.4 Cl.5 Cl.6 Cl.7 Cl.8 Total 9 16 29 97 82 81 91 34 49 8 8 6 28 11 12 17 7 11 13 15 64 93 36 73 80 45 48 14 11 3 89 53 54 83 6 36 0 100 93 95 57 89 87 86 67 22 29 88 96 43 88 96 90 64 12 6 10 71 35 17 18 12 21 5 9 2 57 17 10 19 1 13 17 16 23 94 82 54 77 24 45 16 10 67 94 15 88 90 77 53 20 12 80 97 47 83 85 80 59 10 7 23 67 11 42 28 19 24 7 7 12 78 24 44 45 15 25 7 12 12 29 15 16 15 8 13 19 24 100 95 49 85 75 0 52 10 15 14 74 15 100 0 13 26 11 8 6 86 44 23 39 10 26 10 8 11 80 13 20 21 3 19 9 11 13 88 40 25 47 4 27

Friedrich Leisch: Cluster Visualization, 10.7.2009

5

8 Volunteer Clusters 0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

Cluster 1: 322 (23%)

Cluster 2: 140 (10%)

Cluster 3: 166 (12%)

Cluster 4: 136 (10%)

Cluster 5: 160 (11%)

Cluster 6: 147 (10%)

Cluster 7: 178 (13%)

Cluster 8: 166 (12%)

1.0

meet.people no.one.else example socialise help.others give.back career lonely active community cause faith services children good.job benefited network recognition mind.off meet.people no.one.else example socialise help.others give.back career lonely active community cause faith services children good.job benefited network recognition mind.off 0.0

0.2

0.4

0.6

0.8

1.0

Friedrich Leisch: Cluster Visualization, 10.7.2009

0.0

0.2

0.4

0.6

0.8

1.0

6

8 Volunteer Clusters We cannot easily test for differences between clusters, because they were constructed to be different.

Friedrich Leisch: Cluster Visualization, 10.7.2009

7

8 Volunteer Clusters We cannot easily test for differences between clusters, because they were constructed to be different. Improved presentation of results (following advice that is only around for a few decades): • Add reference lines/points • Sort variables by content: 1. sort clusters by mean 2. sort variables by hierarchical clustering • Highlight important points

Friedrich Leisch: Cluster Visualization, 10.7.2009

7

Add reference points 0.0 meet.people no.one.else example socialise help.others give.back career lonely active community cause faith services children good.job benefited network recognition mind.off

0.2

0.4

0.6

0.0

Cluster 2: 140 (10%)

Cluster 3: 166 (12%)







● ●









● ●







● ●

● ●

● ●



● ●



● ●









● ●



● ●



● ●

● ●



● ●







● ●



● ●













Cluster 5: 160 (11%)

Cluster 6: 147 (10%)

Cluster 7: 178 (13%)









● ●



● ●

● ●



● ●











● ●

● ●

● ● ●

● ●







● ●



● ●

● ●

● ●

0.8



● ●

● ●



● ●

● ●



● ●

● ●

0.6

● ●





● ●





0.4





● ●

0.2

● ●





Cluster 8: 166 (12%)





1.0

Friedrich Leisch: Cluster Visualization, 10.7.2009

● ●

0.0

1.0

● ● ●



0.8



● ●



0.6





● ● ●

0.4







0.2

Cluster 4: 136 (10%)





0.0

1.0

Cluster 1: 322 (23%) ●

meet.people no.one.else example socialise help.others give.back career lonely active community cause faith services children good.job benefited network recognition mind.off

0.8

0.2



0.4

0.6

0.8

1.0

8

Sort Clusters 0.0 meet.people no.one.else example socialise help.others give.back career lonely active community cause faith services children good.job benefited network recognition mind.off

0.2

0.4

0.6

0.0

Cluster 2: 140 (10%)

Cluster 3: 136 (10%)







● ●









● ●







● ●

● ●

● ●



● ●



● ●









● ●



● ●



● ●

● ●



● ●







● ●



● ●













Cluster 5: 160 (11%)

Cluster 6: 147 (10%)

Cluster 7: 178 (13%)









● ●



● ●

● ●



● ●











● ●

● ●

● ● ●

● ●







● ●



● ●

● ●

● ●

0.8



● ●

● ●



● ●

● ●



● ●

● ●

0.6

● ●





● ●





0.4





● ●

0.2

● ●





Cluster 8: 166 (12%)





1.0

Friedrich Leisch: Cluster Visualization, 10.7.2009

● ●

0.0

1.0

● ● ●



0.8



● ●



0.6





● ● ●

0.4







0.2

Cluster 4: 166 (12%)





0.0

1.0

Cluster 1: 322 (23%) ●

meet.people no.one.else example socialise help.others give.back career lonely active community cause faith services children good.job benefited network recognition mind.off

0.8

0.2



0.4

0.6

0.8

1.0

9

Friedrich Leisch: Cluster Visualization, 10.7.2009 children

no.one.else

recognition

lonely

benefited

services

career

network

faith

mind.off

socialise

meet.people

active

good.job

example

cause

community

give.back

help.others

Sort Variables

10

Sort Variables 0.0

Cluster 1: 322 (23%) benefited services children no.one.else faith recognition lonely career network mind.off socialise meet.people active good.job example cause community give.back help.others

0.2

0.4

0.6

0.0

Cluster 3: 136 (10%)

● ●

● ●





● ●



● ●

● ● ●







































● ●

Cluster 5: 160 (11%)

● ●

Cluster 6: 147 (10%)

● ●

● ●







● ●

● ● ●



● ●

● ●





● ●

● ●

● ●

● ●

● ● ●



● ● ●











● ●

● ●









● ● ●

Friedrich Leisch: Cluster Visualization, 10.7.2009



● ● ●

1.0





● ●



● ●



0.8

Cluster 8: 166 (12%)

● ●

● ● ●

● ●

Cluster 7: 178 (13%)

● ●

● ●





● ●

0.6





● ●

0.4

● ● ●



0.2







● ● ●

0.0

1.0



● ●

0.8

● ●



● ●

0.6

● ●

● ●

0.4

Cluster 4: 166 (12%)

● ●



0.2

● ●

● ● ●

0.0

1.0

Cluster 2: 140 (10%)

● ●

benefited services children no.one.else faith recognition lonely career network mind.off socialise meet.people active good.job example cause community give.back help.others

0.8

0.2

0.4

0.6

● ●

0.8

1.0

11

Highlight Important Points 0.0

Cluster 1: 322 (23%) benefited services children no.one.else faith recognition lonely career network mind.off socialise meet.people active good.job example cause community give.back help.others

0.2

0.4

0.6

0.0

Cluster 3: 136 (10%)

● ●

● ●





● ●



● ●

● ● ●







































● ●

Cluster 5: 160 (11%)

● ●

Cluster 6: 147 (10%)

● ●

● ●







● ●

● ● ●



● ●

● ●





● ●

● ●

● ●

● ●

● ● ●



● ● ●











● ●

● ●









● ● ●

Friedrich Leisch: Cluster Visualization, 10.7.2009



● ● ●

1.0





● ●



● ●



0.8

Cluster 8: 166 (12%)

● ●

● ● ●

● ●

Cluster 7: 178 (13%)

● ●

● ●





● ●

0.6





● ●

0.4

● ● ●



0.2







● ● ●

0.0

1.0



● ●

0.8

● ●



● ●

0.6

● ●

● ●

0.4

Cluster 4: 166 (12%)

● ●



0.2

● ●

● ● ●

0.0

1.0

Cluster 2: 140 (10%)

● ●

benefited services children no.one.else faith recognition lonely career network mind.off socialise meet.people active good.job example cause community give.back help.others

0.8

0.2

0.4

0.6

● ●

0.8

1.0

12

Example: Car Data A German manufacturer of premium cars asked recent customers about main motivation to buy one of their cars. Binary data, respondents were asked to check properties like sporty, power, interior, safety, quality, resale value, etc. Exploration of the data shows no natural groups, a segmentation of the market imposes a partition on the data (which is absolutely fine for the purpose). Here: hierarchical clustering of variables, partition with 4 clusters from neural gas algorithm for customers.

Friedrich Leisch: Cluster Visualization, 10.7.2009

13

Friedrich Leisch: Cluster Visualization, 10.7.2009 styling

interior

reputation

comfort

technology

reliability

quality

safety

handling concept

model_continuity

space

character

clarity

service

resale_value

consumption

economy

sporty

power

driving_properties

Example: Car Data

14

Example: Car Data 0.0

Cluster 1: 237 (30%) comfort technology reliability quality safety reputation model_continuity styling interior concept handling space character clarity service resale_value consumption economy sporty power driving_properties

0.2

0.4

0.6



● ● ● ●

● ● ● ● ● ●

● ●

● ●



● ●



● ●

● ● ● ● ● ● ●

● ● ● ● ● ●

● ●

● ●



Cluster 3: 205 (26%)

Cluster 4: 71 (9%)





● ● ● ●

● ● ● ●

● ●

● ● ● ●

● ●



● ●



● ●

● ● ● ● ● ● ●

● ● ● ● ● ●

● ●

● ●

0.0

1.0

Cluster 2: 280 (35%)



● ●

comfort technology reliability quality safety reputation model_continuity styling interior concept handling space character clarity service resale_value consumption economy sporty power driving_properties

0.8

0.2

Friedrich Leisch: Cluster Visualization, 10.7.2009

0.4

0.6



0.8

1.0

15

PCA Projection ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●●● ●● ● ●● ● ● ● ●● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ● ●● ● ●● ● ● ● ●● ●● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ●●● ● ● ●●● ● ● ● ●● ● ● ● ●● ●●● ● ● ● ● ● ● ● ●● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●●●●●● ●● ● ● ●● ●● ●●● ● ● ● ● ●● ●●● ●● ●● ●●●●● ●● ● ● ● ●● ●● ●● ● ●● ●● ●●●●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ●●● ● ●●● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ●● ● ● ● ● ● ●● ● ●●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ●● ● ●● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●





economy consumption

space model_continuity resale_value

concept

clarity interiorservice styling reputation character

sporty handling

power driving_properties

reliability comfort safety

technology

quality



● ●

● ●●



●● ● ●



● ●● ● ●

● ●

● ● ●

● ●



● ●

● ● ● ●

Friedrich Leisch: Cluster Visualization, 10.7.2009

16

PCA Projection ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●●● ●● ● ●● ● ● ● ●● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ● ●● ● ●● ● ● ● ●● ●● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ●●● ● ● ●●● ● ● ● ●● ● ● ● ●● ●●● ● ● ● ● ● ● ● ●● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●●●●●● ●● ● ● ●● ●● ●●● ● ● ● ● ●● ●●● ●● ●● ●●●●● ●● ● ● ● ●● ●● ●● ● ●● ●● ●●●●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ●●● ● ●●● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ●● ● ● ● ● ● ●● ● ●●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ●● ● ●● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●





consumption

resale_value

power

comfort

technology ●

● ●

● ●●



●● ● ●



● ●● ● ●

quality ●



● ● ●

● ●



● ●

● ● ● ●

Friedrich Leisch: Cluster Visualization, 10.7.2009

17

PCA Projection ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●●●●●● ● ● ● ● ● ●●●●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●● ●



1

consumption 2

3

power



● ●

technology comfort ● 4

Friedrich Leisch: Cluster Visualization, 10.7.2009

● resale_value

quality

18

PCA Projection ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●●●●●● ● ● ● ● ● ●●●●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●● ●



1

consumption 2

3

power



● ●

technology comfort ● 4

Friedrich Leisch: Cluster Visualization, 10.7.2009

● resale_value

quality

19

PCA Projection ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●●●●●● ● ● ● ● ● ●●●●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●● ●



1

consumption 2

3

power



● ●

technology comfort ● 4

Friedrich Leisch: Cluster Visualization, 10.7.2009

● resale_value

quality

20

PCA Projection ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●●●●●● ● ● ● ● ● ●●●●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●● ●



1

consumption 2

3

power



● ●

technology comfort ● 4

Friedrich Leisch: Cluster Visualization, 10.7.2009

● resale_value

quality

21

PCA Projection

consumption 2

1 3

resale_value

power technology comfort 4

Friedrich Leisch: Cluster Visualization, 10.7.2009

quality

22

Convex Cluster Hulls The main problem for 2-dimensional generalizations of boxplots is that there is no total ordering of R2 (or higher). For data partitioned using a centroid-based cluster algorithm there is a natural total ordering for each point in a cluster: The distance d(x, c(x)) of the point to its respective cluster centroid. Let Ak be the set of points in cluster k, and mk = median{d(xn, ck )|xn ∈ Ak }

inner area: all data points where d(xn, ck ) ≤ mk

outer area: all data points where d(xn, ck ) ≤ 2.5mk

Friedrich Leisch: Cluster Visualization, 10.7.2009

23

Shadow Values Second-closest centroid to x: ˜ c(x) =

argmin

d(x, c)

c∈CK \{c(x)}

Shadow value: 2d(x, c(x)) s(x) = ∈ [0, 1] d(x, c(x)) + d(x, ˜ c(x)) s(x) = 0: centroid

s(x) = 1: on border of clusters

Friedrich Leisch: Cluster Visualization, 10.7.2009

24

Neighborhood Graph Let n

Aij = xn | c(xn) = ci, ˜ c(xn) = cj

o

be the set of all points where ci is the closest centroid and cj is secondclosest. Cluster similarity: (

sij =

|Ai|−1 0,

P x∈|Aij | s(x),

Aij = 6 ∅ Aij = ∅

Use sij + sji for thickness of line connecting ci and cj .

Friedrich Leisch: Cluster Visualization, 10.7.2009

25

Neighborhood Graph ● ●●

● ● ●

● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ●● ● ●● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●

3

1

4 5





6



7 2 8

Friedrich Leisch: Cluster Visualization, 10.7.2009

26

Neighborhood Graph

3 7

● ● ● ● ● ● ● ● ●● ●● ●●● ●●● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ●

11

● ●

5



● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ●●●● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ●● ● ●●

1

15



4

9

8 13 2 6

12 16 ● ●●●● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ●● ● ● ●● ●●● ● ● ●● ●●● ● ● ● ● ● ● ● ● ● ●

10

14

Friedrich Leisch: Cluster Visualization, 10.7.2009







27

Volunteer PCA-Projection

help.others give.back cause community

3

4 example

6 7

2 1 5

8 active socialise network

Friedrich Leisch: Cluster Visualization, 10.7.2009

28

Volunteer PCA-Projection

● 3

4 ● 6 ● 7 ●

2 ● 1 ● 5 ●

Friedrich Leisch: Cluster Visualization, 10.7.2009

8 ●

29

Volunteer PCA-Projection

● 3

4 ● 6 ● 7 ●

2 ● 1 ● 5 ●

Friedrich Leisch: Cluster Visualization, 10.7.2009

8 ●

30

Nonlinear Arrangement

k1 k2 k5 k7 k3 k4 k6 k8 Friedrich Leisch: Cluster Visualization, 10.7.2009

31

Cluster Size

k1 k2 k5 k7 k3 k4 k6 k8 Friedrich Leisch: Cluster Visualization, 10.7.2009

32

Age Distribution ● ● ● ● ● ●

● ●





● ● ●

● ●

● ● ●

● ●

Friedrich Leisch: Cluster Visualization, 10.7.2009

33

Gender Distribution

Friedrich Leisch: Cluster Visualization, 10.7.2009

34

Nice to Look at

Friedrich Leisch: Cluster Visualization, 10.7.2009

35

Model-based Ordinal Clustering Survey data for two fastfood chains (McDonald’s, Subway) from 715 respondents on 10 items (yummy, fattening, greasy, fast, . . . ). We are interested in capturing scale usage heterogeneity under the assumption that different involvement with a brand may provoke different scale usage. Finite mixture model: For each item and group we estimate mean and standard deviation of a latent Gaussian. Assumption of independence between items given group membership, estimation by EM. For a 3-component model we get 30 means and 30 standard deviations.

Friedrich Leisch: Cluster Visualization, 10.7.2009

36

−3 −2 −1 0

0.4

x 0.4

1 2

2 −2

3 −3

Friedrich Leisch: Cluster Visualization, 10.7.2009 −1 x

−2 −1 0

0.4

1

0.3

0.3

0.3

0

0.2

0.2

0.2

−1

0.1

0.1

0.1

−2

0.0

0.0

0.0

0.0

0.0

0.0

0.2

0.3

0.2

0.3

0 1

1 2 3 0.1

0.2

0.3

dnorm(x, mean = m[n], sd = s[n])

0.1

dnorm(x, mean = m[n], sd = s[n])

0.1

0.4

0.4

0.4

Model-based Ordinal Clustering

2 −2

−3

−1

−2 0

−1 0

1

1 2

x

2 3

37

Subway: 2 Components 1 yummy

1 fattening

1 greasy

1 fast

1 cheap

1 tasty

1 healthy

1 1 disgusting convenient

1 spicy

2 yummy

2 fattening

2 greasy

2 fast

2 cheap

2 tasty

2 healthy

2 2 disgusting convenient

2 spicy

0.8 0.6 0.4 0.2

prob

0.0

0.8 0.6 0.4 0.2 0.0 0123456 0123456 0123456 0123456 0123456 0123456 0123456 0123456 0123456 0123456

Friedrich Leisch: Cluster Visualization, 10.7.2009

38

Subway: 2 Components 0.0

1

0.2

0.4

0.6

0.8

1.0

2

yummy fattening greasy fast cheap tasty healthy disgusting convenient spicy 0.0

0.2

0.4

0.6

0.8

1.0

prob

Friedrich Leisch: Cluster Visualization, 10.7.2009

39

Subway: 2 Components Overall Scale Usage 1

2

0.4

prob

0.3

0.2

0.1

0.0 0

1

2

3

4

Friedrich Leisch: Cluster Visualization, 10.7.2009

5

6

0

1

2

3

4

5

6

40

Subway: 2 Components Scale Usage Corrected 1 yummy

1 fattening

1 greasy

1 fast

1 cheap

1 tasty

1 healthy

1 1 disgusting convenient

1 spicy

2 yummy

2 fattening

2 greasy

2 fast

2 cheap

2 tasty

2 healthy

2 2 disgusting convenient

2 spicy

6 5 4 3 2

prob ratio

1 0

6 5 4 3 2 1 0 0123456 0123456 0123456 0123456 0123456 0123456 0123456 0123456 0123456 0123456

Friedrich Leisch: Cluster Visualization, 10.7.2009

41

McDonald’s: 3 Components 0.0

1

0.2

0.4

0.6

0.8

1.0

2

3

yummy fattening greasy fast cheap tasty healthy disgusting convenient spicy 0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

prob

Friedrich Leisch: Cluster Visualization, 10.7.2009

42

McDonald’s: 3 Components Overall Scale Usage 1

2

3

0.5

0.4

prob

0.3

0.2

0.1

0.0 0

1

2

3

4

5

6

0

Friedrich Leisch: Cluster Visualization, 10.7.2009

1

2

3

4

5

6

0

1

2

3

4

5

6

43

McDonald’s: 3 Components Scale Usage Corrected 1 yummy

1 fattening

1 greasy

1 fast

1 cheap

1 tasty

1 healthy

1 1 disgusting convenient

1 spicy

2 yummy

2 fattening

2 greasy

2 fast

2 cheap

2 tasty

2 healthy

2 2 disgusting convenient

2 spicy

5 4 3 2 1 0

5

prob ratio

4 3 2 1 0

3 yummy

3 fattening

3 greasy

3 fast

3 cheap

3 tasty

3 healthy

3 3 disgusting convenient

3 spicy

5 4 3 2 1 0 0123456 0123456 0123456 0123456 0123456 0123456 0123456 0123456 0123456 0123456

Friedrich Leisch: Cluster Visualization, 10.7.2009

44

Crosstabulation of Clusters

2

Pearson residuals: 5.87

1

1

2

4.00 2.00

3

0.00

Only a very small group has extreme response style for both brands, otherwise almost independence.

−2.00 −2.93 p−value = 7.7721e−11

Friedrich Leisch: Cluster Visualization, 10.7.2009

45

Software, Papers, Outlook Packages are available from CRAN and/or R-Forge: flexclust: KCCA clustering for arbitrary distances, shaded barcharts, projections, convex hulls, static neighborhood graphs, . . . gcExplorer: Interactive neighborhood graphs with links to Gene Ontology. symbols: Grid versions of boxplot(), symbols(), stars(), . . . flexmix: Finite mixture models. Model based clustering for various distributions and mixtures of generalized linear models. Public availability of features shown in this talk depends on release version of packages. Papers available at http://www.statistik.lmu.de/~leisch Big 2do: Redo most with iPlots Extreme . . . Friedrich Leisch: Cluster Visualization, 10.7.2009

46