Visualizing Cluster Results Using Package FlexClust and Friends
Friedrich Leisch, University of Munich
useR!, Rennes, 10.7.2009
Acknowledgements & Apology
• Sara Dolnicar (University of Wollongong)
• Theresa Scharl, Ingo Voglhuber (Vienna University of Technology)
• Paul Murrell, Deepayan Sarkar (R Core)
Friedrich Leisch: Cluster Visualization, 10.7.2009
Apology: Microarray data only in Theresa’s talk (try time-shift back to Wednesday).
Partitioning Clustering – KCCA
K-centroid cluster algorithms:
• data set $X_N = \{x_1, \ldots, x_N\}$, set of centroids $C_K = \{c_1, \ldots, c_K\}$
• distance measure $d(x, y)$
• centroid $c$ closest to $x$: $c(x) = \operatorname{argmin}_{c \in C_K} d(x, c)$
• Most KCCA algorithms try to find a set of centroids $C_K$ for fixed $K$ such that the average distance
$$D(X_N, C_K) = \frac{1}{N} \sum_{n=1}^{N} d(x_n, c(x_n)) \to \min_{C_K}$$
of each point to the closest centroid is minimal.
• Optimization algorithm not important for rest of talk
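As a minimal sketch of the two definitions above, the following standard-library Python computes the closest centroid c(x) and the average distance D(X_N, C_K). The function names, the Euclidean default, and the toy data are assumptions made for illustration; they are not the flexclust implementation.

```python
import math

def euclidean(x, y):
    """Euclidean distance; any other distance d(x, y) could be passed instead."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def closest_centroid(x, centroids, dist=euclidean):
    """Index k of the centroid minimizing d(x, c_k), i.e. c(x)."""
    return min(range(len(centroids)), key=lambda k: dist(x, centroids[k]))

def average_distance(data, centroids, dist=euclidean):
    """D(X_N, C_K): mean distance of each point to its closest centroid."""
    total = sum(
        dist(x, centroids[closest_centroid(x, centroids, dist)]) for x in data
    )
    return total / len(data)

# Toy data (made up for illustration): two obvious groups in the plane.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

assignments = [closest_centroid(x, centroids) for x in data]
objective = average_distance(data, centroids)
print(assignments, objective)
```

Because the distance is a parameter, the same two functions work for any d(x, y); how the centroids are optimized is left out, in line with the last bullet.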
Example: Australian Volunteers
Survey among 1415 Australian adults about which organizations they would consider volunteering for, main motivation to volunteer, actual volunteering, image of organizations, ...
We use a block of 19 binary questions (“applies”, “does not apply”) about motivations to volunteer: “I want to meet people”, “I have no one else”, “I want to set an example”, ...
Organizations investigated: Red Cross, Surf Life Savers, Rural Fire Service, Parents Association, ...
Our analyses show that there is both competition between organizations with similar profiles and complementary effects (individuals volunteering for more than one organization, in most cases with very different profiles).
8 Volunteer Clusters (values in percent)
               Cl.1   Cl.2   Cl.3   Cl.4   Cl.5   Cl.6   Cl.7   Cl.8  Total
meet.people    9.24  15.77  28.77  97.18  82.17  80.97  90.81  34.23  49.47
no.one.else    8.21   8.34   6.16  27.72  10.51  12.47  16.75   7.23  11.38
example       12.80  14.68  63.56  93.30  35.84  73.33  80.32  45.30  47.63
socialise     14.12  10.68   3.28  88.74  52.79  54.17  83.39   6.35  35.83
help.others    0.00 100.00  92.93  95.17  56.95  89.18  86.98  86.12  66.78
give.back     21.68  29.18  87.73  96.24  43.41  87.60  96.18  89.77  63.75
career        12.04   5.54  10.46  71.34  35.01  17.49  18.29  11.54  20.57
lonely         4.55   8.71   2.30  56.55  17.16   9.76  18.93   0.75  13.14
active        17.17  15.93  23.38  93.82  81.61  53.97  77.00  23.98  44.88
community     16.17   9.64  66.66  93.75  14.67  87.80  90.34  77.21  52.72
cause         20.49  12.07  79.66  96.91  47.11  83.31  85.12  79.82  58.66
faith         10.12   7.24  22.84  66.89  10.52  42.10  27.83  19.47  24.03
services       7.00   7.10  11.63  78.15  23.99  43.72  44.98  14.64  25.23
children       6.88  11.76  11.88  28.58  14.86  16.20  14.64   8.31  13.00
good.job      19.14  23.81 100.00  94.61  49.18  85.35  75.38   0.00  51.80
benefited     10.49  15.09  14.26  74.37  15.04 100.00   0.00  12.68  26.29
network       10.75   8.47   6.29  85.86  43.59  22.83  38.98  10.28  25.94
recognition   10.30   8.03  11.29  79.59  12.80  19.56  21.40   3.49  18.73
mind.off       8.56  10.95  12.53  87.96  39.55  24.55  47.43   4.31  26.57
8 Volunteer Clusters (values in percent, rounded)
              Cl.1  Cl.2  Cl.3  Cl.4  Cl.5  Cl.6  Cl.7  Cl.8 Total
meet.people      9    16    29    97    82    81    91    34    49
no.one.else      8     8     6    28    11    12    17     7    11
example         13    15    64    93    36    73    80    45    48
socialise       14    11     3    89    53    54    83     6    36
help.others      0   100    93    95    57    89    87    86    67
give.back       22    29    88    96    43    88    96    90    64
career          12     6    10    71    35    17    18    12    21
lonely           5     9     2    57    17    10    19     1    13
active          17    16    23    94    82    54    77    24    45
community       16    10    67    94    15    88    90    77    53
cause           20    12    80    97    47    83    85    80    59
faith           10     7    23    67    11    42    28    19    24
services         7     7    12    78    24    44    45    15    25
children         7    12    12    29    15    16    15     8    13
good.job        19    24   100    95    49    85    75     0    52
benefited       10    15    14    74    15   100     0    13    26
network         11     8     6    86    44    23    39    10    26
recognition     10     8    11    80    13    20    21     3    19
mind.off         9    11    13    88    40    25    47     4    27
8 Volunteer Clusters
[Figure: barchart of the 19 motivation variables (axis from 0.0 to 1.0), one panel per cluster: Cluster 1: 322 (23%), Cluster 2: 140 (10%), Cluster 3: 166 (12%), Cluster 4: 136 (10%), Cluster 5: 160 (11%), Cluster 6: 147 (10%), Cluster 7: 178 (13%), Cluster 8: 166 (12%).]
8 Volunteer Clusters
We cannot easily test for differences between clusters, because they were constructed to be different.
Improved presentation of results (following advice that has been around for only a few decades):
• Add reference lines/points
• Sort variables by content:
  1. sort clusters by mean
  2. sort variables by hierarchical clustering
• Highlight important points
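The step "sort clusters by mean" can be sketched as follows. This assumes that "mean" refers to the average of a cluster's centroid values over all variables, which the slide does not spell out. The function name and the toy centroids are hypothetical, and sorting the variables by hierarchical clustering is not shown.

```python
def sort_clusters_by_mean(centroids):
    """Return cluster indices ordered by the mean of their centroid values.

    Assumption: 'mean' is the average over all variables of one centroid.
    """
    means = [sum(c) / len(c) for c in centroids]
    return sorted(range(len(centroids)), key=lambda k: means[k])

# Toy centroids (made up): three clusters, three binary variables,
# each entry is the share of respondents in the cluster who agree.
centroids = [
    [0.9, 0.8, 0.7],  # cluster 0: agrees with almost everything
    [0.1, 0.2, 0.1],  # cluster 1: agrees with almost nothing
    [0.5, 0.4, 0.6],  # cluster 2: in between
]

order = sort_clusters_by_mean(centroids)
sorted_centroids = [centroids[k] for k in order]
print(order)
```

Relabelling the clusters in this order places clusters with generally low agreement first and those with generally high agreement last, so neighbouring panels are easier to compare.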
Add reference points
[Figure: barchart of the 19 motivation variables (axis from 0.0 to 1.0) with reference points (●) added to the bars, one panel per cluster: Cluster 1: 322 (23%), Cluster 2: 140 (10%), Cluster 3: 166 (12%), Cluster 4: 136 (10%), Cluster 5: 160 (11%), Cluster 6: 147 (10%), Cluster 7: 178 (13%), Cluster 8: 166 (12%).]
Sort Clusters
[Figure: the same barchart with reference points after reordering the clusters; panel titles: Cluster 1: 322 (23%), Cluster 2: 140 (10%), Cluster 3: 136 (10%), Cluster 4: 166 (12%), Cluster 5: 160 (11%), Cluster 6: 147 (10%), Cluster 7: 178 (13%), Cluster 8: 166 (12%).]
Sort Variables
[Figure: hierarchical clustering of the 19 variables; labels as extracted: children, no.one.else, recognition, lonely, benefited, services, career, network, faith, mind.off, socialise, meet.people, active, good.job, example, cause, community, give.back, help.others.]
Sort Variables
[Figure: barchart with reference points, one panel per cluster (Cluster 1: 322 (23%) to Cluster 8: 166 (12%)), axis from 0.0 to 1.0; variables now in the order benefited, services, children, no.one.else, faith, recognition, lonely, career, network, mind.off, socialise, meet.people, active, good.job, example, cause, community, give.back, help.others.]
Highlight Important Points
[Figure: barchart with reference points, one panel per cluster (Cluster 1: 322 (23%) to Cluster 8: 166 (12%)), axis from 0.0 to 1.0; variables in the order benefited, services, children, no.one.else, faith, recognition, lonely, career, network, mind.off, socialise, meet.people, active, good.job, example, cause, community, give.back, help.others.]
Example: Car Data
A German manufacturer of premium cars asked recent customers about their main motivation to buy one of its cars. Binary data: respondents were asked to check properties like sporty, power, interior, safety, quality, resale value, etc.
Exploration of the data shows no natural groups; a segmentation of the market imposes a partition on the data (which is absolutely fine for the purpose).
Here: hierarchical clustering of variables, partition with 4 clusters from neural gas algorithm for customers.
Example: Car Data
[Figure: hierarchical clustering of the 21 variables; labels as extracted: styling, interior, reputation, comfort, technology, reliability, quality, safety, handling, concept, model_continuity, space, character, clarity, service, resale_value, consumption, economy, sporty, power, driving_properties.]
Example: Car Data
[Figure: barchart with reference points (axis from 0.0 to 1.0), one panel per cluster: Cluster 1: 237 (30%), Cluster 2: 280 (35%), Cluster 3: 205 (26%), Cluster 4: 71 (9%); variables in the order comfort, technology, reliability, quality, safety, reputation, model_continuity, styling, interior, concept, handling, space, character, clarity, service, resale_value, consumption, economy, sporty, power, driving_properties.]
PCA Projection
[Figure: PCA projection of the car data points with all 21 variables labeled: economy, consumption, space, model_continuity, resale_value, concept, clarity, interior, service, styling, reputation, character, sporty, handling, power, driving_properties, reliability, comfort, safety, technology, quality.]
PCA Projection
[Figure: the same PCA projection with only six variables labeled: consumption, resale_value, power, comfort, technology, quality.]
PCA Projection
[Figure: PCA projection of the car data points with cluster numbers 1 to 4 and the variable labels consumption, power, technology, comfort, resale_value, quality.]
PCA Projection
[Figure: PCA projection without individual data points, showing cluster numbers 1 to 4 and the variable labels consumption, resale_value, power, technology, comfort, quality.]
Convex Cluster Hulls
The main problem for 2-dimensional generalizations of boxplots is that there is no total ordering of $\mathbb{R}^2$ (or higher).
For data partitioned using a centroid-based cluster algorithm there is a natural total ordering for each point in a cluster: the distance $d(x, c(x))$ of the point to its respective cluster centroid.
Let $A_k$ be the set of points in cluster $k$, and
$$m_k = \operatorname{median}\{\, d(x_n, c_k) \mid x_n \in A_k \,\}$$
inner area: all data points where $d(x_n, c_k) \le m_k$
outer area: all data points where $d(x_n, c_k) \le 2.5\, m_k$
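The selection of points for the inner and outer area can be sketched as below. Only the point selection follows the definitions above; drawing the convex hull of each selected set is not shown. The function name, the Euclidean distance, and the toy cluster are assumptions made for illustration.

```python
import math
from statistics import median

def euclidean(x, y):
    """Euclidean distance, used here as the distance d(x, y)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def hull_point_sets(points, centroid, dist=euclidean, outer_factor=2.5):
    """Split the points of one cluster by their distance to the centroid.

    inner: points with d(x, c_k) <= m_k
    outer: points with d(x, c_k) <= outer_factor * m_k
    where m_k is the median distance within the cluster.
    """
    distances = [dist(p, centroid) for p in points]
    m_k = median(distances)
    inner = [p for p, d in zip(points, distances) if d <= m_k]
    outer = [p for p, d in zip(points, distances) if d <= outer_factor * m_k]
    return m_k, inner, outer

# Toy cluster (made up): distances 1, 2, 3, 4 and 20 from the centroid.
points = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (20.0, 0.0)]
m_k, inner, outer = hull_point_sets(points, centroid=(0.0, 0.0))
print(m_k, len(inner), len(outer))
```

In the toy cluster the far point at distance 20 lies beyond 2.5 times the median distance, so it falls outside both areas, much like an outlier beyond the whiskers of a boxplot.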
Shadow Values
Second-closest centroid to $x$:
$$\tilde{c}(x) = \operatorname{argmin}_{c \in C_K \setminus \{c(x)\}} d(x, c)$$
Shadow value:
$$s(x) = \frac{2\, d(x, c(x))}{d(x, c(x)) + d(x, \tilde{c}(x))} \in [0, 1]$$
$s(x) = 0$: centroid
$s(x) = 1$: on border of clusters
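A small sketch of the shadow value, using only the Python standard library. The function name, the Euclidean distance, the handling of the degenerate zero-distance case, and the two toy centroids are assumptions for illustration.

```python
import math

def euclidean(x, y):
    """Euclidean distance, used here as the distance d(x, y)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def shadow_value(x, centroids, dist=euclidean):
    """s(x) = 2 d(x, c(x)) / (d(x, c(x)) + d(x, c~(x))).

    Needs at least two centroids; c(x) is the closest and c~(x) the
    second-closest centroid of x.
    """
    distances = sorted(dist(x, c) for c in centroids)
    d_first, d_second = distances[0], distances[1]
    if d_first + d_second == 0.0:
        # Degenerate case (assumption): two centroids coincide with x.
        return 0.0
    return 2.0 * d_first / (d_first + d_second)

# Toy centroids (made up) on a line.
centroids = [(0.0, 0.0), (4.0, 0.0)]

s_at_centroid = shadow_value((0.0, 0.0), centroids)  # x is the centroid
s_on_border = shadow_value((2.0, 0.0), centroids)    # equally far from both
s_between = shadow_value((1.0, 0.0), centroids)      # distances 1 and 3
print(s_at_centroid, s_on_border, s_between)
```

The three toy points reproduce the two boundary cases of the definition (0 at a centroid, 1 on the border between two clusters) and one value in between.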
Neighborhood Graph
Let
$$A_{ij} = \{\, x_n \mid c(x_n) = c_i,\ \tilde{c}(x_n) = c_j \,\}$$
be the set of all points where $c_i$ is the closest centroid and $c_j$ is second-closest.
Cluster similarity:
$$s_{ij} = \begin{cases} |A_i|^{-1} \sum_{x \in A_{ij}} s(x), & A_{ij} \neq \emptyset \\ 0, & A_{ij} = \emptyset \end{cases}$$
Use $s_{ij} + s_{ji}$ for thickness of line connecting $c_i$ and $c_j$.
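The edge weights of the neighborhood graph can be sketched as follows. The helper names (`cluster_similarities`, `edge_weight`), the Euclidean distance, and the toy data are assumptions; the sketch only mirrors the formulas above and is not taken from flexclust.

```python
import math
from collections import defaultdict

def euclidean(x, y):
    """Euclidean distance, used here as the distance d(x, y)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cluster_similarities(data, centroids, dist=euclidean):
    """Return {(i, j): s_ij} for all non-empty A_ij (others are 0)."""
    cluster_sizes = defaultdict(int)   # |A_i|
    shadow_sums = defaultdict(float)   # sum of s(x) over x in A_ij
    for x in data:
        ranked = sorted(range(len(centroids)),
                        key=lambda k: dist(x, centroids[k]))
        i, j = ranked[0], ranked[1]    # closest and second-closest centroid
        d_i, d_j = dist(x, centroids[i]), dist(x, centroids[j])
        s = 2.0 * d_i / (d_i + d_j) if d_i + d_j > 0.0 else 0.0
        cluster_sizes[i] += 1
        shadow_sums[(i, j)] += s
    return {pair: total / cluster_sizes[pair[0]]
            for pair, total in shadow_sums.items()}

def edge_weight(similarities, i, j):
    """Line thickness s_ij + s_ji between centroids i and j."""
    return similarities.get((i, j), 0.0) + similarities.get((j, i), 0.0)

# Toy data (made up): clusters 0 and 1 are close, cluster 2 is far away.
centroids = [(0.0, 0.0), (4.0, 0.0), (0.0, 100.0)]
data = [(1.0, 0.0), (3.0, 0.0), (0.0, 99.0)]

similarities = cluster_similarities(data, centroids)
w01 = edge_weight(similarities, 0, 1)
w02 = edge_weight(similarities, 0, 2)
w12 = edge_weight(similarities, 1, 2)
print(w01, w02, w12)
```

Pairs of clusters with many points near their common border get a thick connecting line, while well-separated pairs get a thin line or none at all.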
Neighborhood Graph
[Figure: neighborhood graph drawn over the data points, with nodes for clusters 1 to 8.]
Neighborhood Graph
[Figure: neighborhood graph drawn over the data points, with nodes for clusters 1 to 16.]
Volunteer PCA-Projection
[Figure: PCA projection of the volunteer data showing cluster numbers 1 to 8 and the variable labels help.others, give.back, cause, community, example, active, socialise, network.]
Volunteer PCA-Projection
[Figure: PCA projection of the volunteer data with the centroids of clusters 1 to 8 marked (●).]
Nonlinear Arrangement
[Figure: node labels k1, k2, k5, k7, k3, k4, k6, k8.]
Cluster Size
[Figure: node labels k1, k2, k5, k7, k3, k4, k6, k8.]
Age Distribution
[Figure]
Gender Distribution
Nice to Look at
Model-based Ordinal Clustering
Survey data for two fast-food chains (McDonald’s, Subway) from 715 respondents on 10 items (yummy, fattening, greasy, fast, ...).
We are interested in capturing scale usage heterogeneity under the assumption that different involvement with a brand may provoke different scale usage.
Finite mixture model: For each item and group we estimate mean and standard deviation of a latent Gaussian. Assumption of independence between items given group membership, estimation by EM.
For a 3-component model we get 30 means and 30 standard deviations (3 components × 10 items).
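One common way to link a latent Gaussian to ordinal answers is to cut the latent axis at fixed thresholds. The sketch below does that for a single item in a single group. The thresholds, the 7 answer categories (0 to 6), and all names are assumptions chosen for illustration; the talk does not state how its model maps the latent variable to the observed scale, and the EM estimation is not shown.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probabilities(mean, sd, thresholds):
    """P(answer = k) for a latent N(mean, sd^2) cut at increasing thresholds.

    Category k collects the latent mass between thresholds k-1 and k;
    the first and last categories extend to minus and plus infinity.
    """
    cdf = [0.0] + [normal_cdf((t - mean) / sd) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cdf[:-1], cdf[1:])]

# Hypothetical thresholds between the answer categories 0, 1, ..., 6.
thresholds = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]

# One item in one group: latent mean 3, latent standard deviation 1.
probs = category_probabilities(mean=3.0, sd=1.0, thresholds=thresholds)
print([round(p, 3) for p in probs])
```

Under these assumptions a larger latent standard deviation pushes more probability into the extreme categories, which is one way different scale usage between groups could show up.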
Model-based Ordinal Clustering
[Figure: panels of latent Gaussian densities, dnorm(x, mean = m[n], sd = s[n]), plotted over x from −3 to 3 with density values from 0.0 to 0.4.]
Subway: 2 Components
[Figure: grid of panels, one per component (1, 2) and item (yummy, fattening, greasy, fast, cheap, tasty, healthy, disgusting, convenient, spicy); x-axis: answer categories 0 to 6; y-axis: prob, 0.0 to 0.8.]
Subway: 2 Components
[Figure: barchart with one panel per component (1, 2); items: yummy, fattening, greasy, fast, cheap, tasty, healthy, disgusting, convenient, spicy; axis: prob, 0.0 to 1.0.]
Subway: 2 Components, Overall Scale Usage
[Figure: one panel per component (1, 2); x-axis: answer categories 0 to 6; y-axis: prob, 0.0 to 0.4.]
Subway: 2 Components, Scale Usage Corrected
[Figure: grid of panels, one per component (1, 2) and item (yummy, fattening, greasy, fast, cheap, tasty, healthy, disgusting, convenient, spicy); x-axis: answer categories 0 to 6; y-axis: prob ratio, 0 to 6.]
McDonald’s: 3 Components
[Figure: barchart with one panel per component (1, 2, 3); items: yummy, fattening, greasy, fast, cheap, tasty, healthy, disgusting, convenient, spicy; axis: prob, 0.0 to 1.0.]
McDonald’s: 3 Components, Overall Scale Usage
[Figure: one panel per component (1, 2, 3); x-axis: answer categories 0 to 6; y-axis: prob, 0.0 to 0.5.]
McDonald’s: 3 Components, Scale Usage Corrected
[Figure: grid of panels, one per component (1, 2, 3) and item (yummy, fattening, greasy, fast, cheap, tasty, healthy, disgusting, convenient, spicy); x-axis: answer categories 0 to 6; y-axis: prob ratio, 0 to 5.]
Crosstabulation of Clusters
[Figure: crosstabulation of the 2-component against the 3-component clustering, cells shaded by Pearson residuals (legend from −2.93 to 5.87); p-value = 7.7721e−11.]
Only a very small group has an extreme response style for both brands; otherwise the two clusterings are almost independent.
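Pearson residuals for a contingency table follow the standard formula (observed − expected) / sqrt(expected), with expected counts computed under independence. The sketch below uses a made-up 2 × 3 table of counts, not the survey data; the function name is an assumption.

```python
import math

def pearson_residuals(table):
    """Pearson residuals (observed - expected) / sqrt(expected).

    Expected counts are row total * column total / grand total,
    i.e. the counts under independence of rows and columns.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals

# Made-up counts: rows = 2 clusters of one brand, columns = 3 of the other.
counts = [
    [30, 10, 20],
    [10, 30, 20],
]

residuals = pearson_residuals(counts)
# The sum of squared residuals is the Pearson chi-square statistic.
chi_square = sum(r * r for row in residuals for r in row)
print(residuals, chi_square)
```

Cells with large positive residuals hold more respondents than independence would predict, and cells with large negative residuals hold fewer; residuals near zero indicate agreement with independence.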
Software, Papers, Outlook
Packages are available from CRAN and/or R-Forge:
• flexclust: KCCA clustering for arbitrary distances, shaded barcharts, projections, convex hulls, static neighborhood graphs, ...
• gcExplorer: Interactive neighborhood graphs with links to Gene Ontology.
• symbols: Grid versions of boxplot(), symbols(), stars(), ...
• flexmix: Finite mixture models. Model-based clustering for various distributions and mixtures of generalized linear models.
Public availability of features shown in this talk depends on release version of packages.
Papers available at http://www.statistik.lmu.de/~leisch
Big 2do: Redo most with iPlots Extreme ...