Motivation Fuzzy Additive Spectral Clustering Competitiveness Conclusion

Ideal Type Model and Associated Method for Relational Fuzzy Clustering

Susana Nascimento (1), Boris Mirkin (2)

(1) Dep. of Computer Science and NOVA Laboratory for Computer Science and Informatics, FCT-Universidade Nova de Lisboa, Portugal
(2) National Research University Higher School of Economics, Moscow, Russian Federation; Birkbeck University of London, United Kingdom

Fuzzy-IEEE 2017, Napoli, July 11

Outline

1. Motivation
2. Fuzzy Additive Spectral Clustering for Ideal Types
3. Competitiveness
4. Conclusion


Motivation

Ideal types are extreme points that synthesize data representing “pure individual” types. They are characterized by the most extreme features of the data, which facilitates human understanding and interpretation.

Real world applications include:
– Fraud detection analysis
– Benchmark analysis
– Talent analysis (e.g. athletes in sports, researchers in science)
– Machine Learning applications dealing with high-dimensional data (e.g. collaborative filtering, neuroimaging, computer vision)


Ideal Types Model (Mirkin & Satarov, 1990)

Observed: a pre-processed $N \times V$ data matrix $Y = [y_{iv}]$, $i = 1, \dots, N$; $v = 1, \dots, V$.

Assume data entities are convex combinations of a collection of ideal types, $C$:

$$Y = UC$$

– $C$ is a $K \times V$ matrix whose rows, $c_k$ ($k = 1, \dots, K$), represent the ideal types
– $U$ is an $N \times K$ matrix whose rows $u_i = (u_{ik})$ are fuzzy membership vectors
– $K$, the number of ideal types, is to be found
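The model Y = UC can be illustrated with a small synthetic example. This is only a sketch: the sizes, the values in `C`, and the use of Dirichlet draws to obtain convex weights are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: N entities, V features, K ideal types.
N, V, K = 6, 3, 2

# C: K x V matrix, one ideal type per row (made-up values).
C = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])

# U: N x K membership matrix. Dirichlet draws are non-negative and sum
# to 1 per row, so each row is a valid vector of convex weights.
U = rng.dirichlet(np.ones(K), size=N)

# Y = UC: every entity (row of Y) is a convex combination of the ideal types.
Y = U @ C

print(np.round(Y, 3))
```

Because the weights are convex, every row of `Y` lies between the ideal types coordinate by coordinate, which is why the ideal types act as extreme points of the data.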


Archetypal Analysis (Cutler & Breiman, 1994)

Observed: an $N \times V$ data matrix $X = [x_{iv}]$.

Assume data entities are convex combinations of $K$ archetypes:

$$X = AC$$

– $C$ is a $K \times V$ matrix whose rows, $c_k$ ($k = 1, \dots, K$), represent the archetypes
– $A$ is an $N \times K$ matrix whose rows, $a_i = (a_{ik})$, are fuzzy membership vectors

Assume archetypes $c_k$ are convex combinations of entities:

$$C = BX$$

– $B$ is a $K \times N$ matrix whose rows, $b_k = (b_{ki})$, are fuzzy membership vectors


Proportional Membership Fuzzy Clustering (Nascimento & Mirkin, 2003)

Observed: a pre-processed $N \times V$ data matrix $Y = [y_{iv}]$.

Assume data entities share a proportion of each of $K$ prototypes, $c_k$ ($k = 1, \dots, K$), to be found:

$$Y = u_k c_k$$


Fuzzy Additive Clustering Model

Extending Ideal Type Model to Relational Data

Taking the Ideal Type Model $Y = UC$, assume orthogonality of prototypes $c_k$ ($k = 1, \dots, K$) and multiply the model by its transpose:

$$YY' = UCC'U'$$

Additive Similarity Model:

$$A = U \Lambda U'$$

where $A = (a_{ij})$ is an $N \times N$ similarity data matrix and $\Lambda$ is a $K \times K$ diagonal matrix with entries $\lambda_k = \langle c_k, c_k \rangle$ ($k = 1, 2, \dots, K$).
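The step from YY' = UCC'U' to A = UΛU' relies on CC' being diagonal when the prototypes are mutually orthogonal. A minimal numerical check, using made-up matrices `C` and `U` chosen only for illustration:

```python
import numpy as np

# Hypothetical ideal types with mutually orthogonal rows (K = 2, V = 3).
C = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])

# Hypothetical memberships for N = 3 entities.
U = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.2, 0.8]])

Y = U @ C                                # ideal type model Y = UC
A = Y @ Y.T                              # relational data: A = YY'
Lam = np.diag(np.sum(C * C, axis=1))     # lambda_k = <c_k, c_k>

# Orthogonal rows make CC' diagonal, so YY' = U Lam U'.
print(np.allclose(C @ C.T, Lam))
print(np.allclose(A, U @ Lam @ U.T))
```

Here the diagonal of `Lam` holds the squared distances of the two prototypes from the origin, 4 and 2.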


Method: Sequential Extraction of Clusters

Least-squares fitting, finding one cluster at a time.

One-cluster criterion:

$$\min_{u,\lambda} \sum_{i,j=1}^{N} (a_{ij} - \lambda u_i u_j)^2 \qquad (1)$$

– with respect to unknown $\lambda > 0$ and fuzzy membership vector $u = (u_i)$, given residual similarity values $a_{ij}$;
– under the constraints $\sum_{i=1}^{N} u_{ik}^2 = 1$, $u_{ik} \ge 0$.


Fitting Fuzzy ADDItive Clustering with Spectral Method (1)

Find $\lambda$ by minimizing (1) for arbitrary $u$:

$$E(\lambda, \hat{u}) = \min_{\lambda} \sum_{i,j=1}^{N} (a_{ij} - \lambda u_i u_j)^2 \qquad (2)$$

First-order optimality condition:

$$\lambda = \frac{u'Au}{(u'u)^2} \qquad (3)$$

which is non-negative if matrix $A$ is positive semidefinite.
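The optimality condition (3) can be checked numerically: for a fixed u, the value u'Au / (u'u)^2 should give a smaller criterion (1) than nearby values of λ. The matrix `A`, the vector `u`, and the function names below are illustrative assumptions.

```python
import numpy as np


def optimal_lambda(A, u):
    """Least-squares lambda for a fixed u: u'Au / (u'u)^2, as in (3)."""
    return float(u @ A @ u) / float(u @ u) ** 2


def criterion(A, u, lam):
    """One-cluster criterion (1): sum over i, j of (a_ij - lam*u_i*u_j)^2."""
    return float(np.sum((A - lam * np.outer(u, u)) ** 2))


# Hypothetical symmetric similarity matrix and membership vector.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
u = np.array([0.5, 1.0, 0.5])

lam = optimal_lambda(A, u)
print(lam, criterion(A, u, lam))
```

Since the criterion is a convex quadratic in λ for fixed u, the first-order condition identifies its unique minimizer.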


Fitting Fuzzy ADDItive Clustering with Spectral Method (2)

Find $u$ minimizing clustering criterion (1) for the derived $\lambda$:

$$E(\hat{\lambda}, u) = \sum_{i,j=1}^{N} a_{ij}^2 - \lambda^2 \sum_{i=1}^{N} u_i^2 \sum_{j=1}^{N} u_j^2 = T(A) - \lambda^2 (u'u)^2$$

where $T(A) = \sum_{i,j=1}^{N} a_{ij}^2$ is the relational data scatter, and

$$G(u) = \lambda^2 (u'u)^2 = \left( \frac{u'Au}{u'u} \right)^2 \qquad (4)$$
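The intermediate step between criterion (1) and the decomposition into data scatter minus contribution is not spelled out on the slide. One way to write it, using only the symbols already defined and the optimality condition (3):

```latex
\begin{align*}
E(\lambda, u)
  &= \sum_{i,j=1}^{N} (a_{ij} - \lambda u_i u_j)^2 \\
  &= \sum_{i,j=1}^{N} a_{ij}^2
     - 2\lambda \sum_{i,j=1}^{N} a_{ij} u_i u_j
     + \lambda^2 \sum_{i=1}^{N} u_i^2 \sum_{j=1}^{N} u_j^2 \\
  &= T(A) - 2\lambda\, u'Au + \lambda^2 (u'u)^2 .
\end{align*}
% Condition (3) gives u'Au = \lambda (u'u)^2, hence
\begin{align*}
E(\hat{\lambda}, u)
  &= T(A) - 2\lambda^2 (u'u)^2 + \lambda^2 (u'u)^2
   = T(A) - \lambda^2 (u'u)^2
   = T(A) - G(u), \\
G(u) &= \lambda^2 (u'u)^2
      = \frac{(u'Au)^2}{(u'u)^2}
      = \left( \frac{u'Au}{u'u} \right)^2 .
\end{align*}
```

Since T(A) does not depend on u, minimizing E over u amounts to maximizing G(u).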


Fitting Fuzzy ADDItive Clustering with Spectral Method (3)

Minimize $E$, or maximize $G(u)$, or maximize its square root, the Rayleigh quotient:

$$\max_{\|u\| \neq 0} \sqrt{G(u)} = \max_{\|u\| \neq 0} \frac{u'Au}{u'u} \qquad (5)$$

By the Rayleigh-Ritz theorem, the maximum value is the maximum eigenvalue of matrix $A$, which is reached at the corresponding eigenvector.
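The Rayleigh-Ritz statement can be illustrated numerically. The sketch below uses `numpy.linalg.eigh` on a small made-up symmetric matrix and compares the Rayleigh quotient at the top eigenvector with its value at random vectors; the matrix and sample size are illustrative assumptions.

```python
import numpy as np


def rayleigh(A, u):
    """Rayleigh quotient u'Au / u'u for a non-zero vector u."""
    return float(u @ A @ u) / float(u @ u)


# Hypothetical symmetric matrix; its eigenvalues are 2 - sqrt(2), 2, 2 + sqrt(2).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns eigenvalues in ascending order for symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(A)
lam_max = eigvals[-1]
z = eigvecs[:, -1]

# No random direction should beat the top eigenvector.
rng = np.random.default_rng(1)
best_random = max(rayleigh(A, u) for u in rng.normal(size=(1000, 3)))

print(lam_max, rayleigh(A, z), best_random)
```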


Fitting Fuzzy ADDItive Clustering with Spectral Method (4)

Spectral Clustering approach: $\Lambda(A) = [\lambda, z]$
– $\lambda$: maximum eigenvalue of $A$
– $z$: corresponding normed eigenvector of $A$

Projection $P(z)$:
– $u^* = \max(z, 0)$
– $u = u^* / \|u^*\|$
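The projection P(z) is a two-step operation: zero out the negative components, then rescale to unit Euclidean norm. A minimal sketch follows; the function name `project` and the handling of an all-zero result are assumptions added for illustration.

```python
import numpy as np


def project(z):
    """P(z): u* = max(z, 0) componentwise, then u = u* / ||u*||."""
    u_star = np.maximum(z, 0.0)
    norm = np.linalg.norm(u_star)
    # Assumption: if z has no positive component, return the zero vector.
    return u_star / norm if norm > 0 else u_star


z = np.array([0.6, -0.8, 0.0])
print(project(z))    # only the first component survives
print(project(-z))   # only the second component survives
```

Since an eigenvector is defined up to sign, z and -z give different projections, which is why both are tried in the algorithm.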


Fuzzy ADDItive Spectral Clustering Algorithm (1)

One-by-one Fuzzy ADDItive Spectral cluster to model Ideal Type (FADDIS-IT)

Input: symmetric similarity matrix $A$; threshold of the contribution of an individual cluster, $\epsilon > 0$.

Output:
– $K$: number of clusters
– $u_1, u_2, \dots, u_K$: fuzzy membership vectors
– $\lambda_1, \lambda_2, \dots, \lambda_K$: squared distances of ideal types $c_k$ ($k = 1, \dots, K$) from the origin
– $G_1(u_1), G_2(u_2), \dots, G_K(u_K)$: clusters' contributions, computed at different residual similarity matrices $A$.


Fuzzy ADDItive Spectral Clustering Algorithm (2)

1. Initialization: Set $k = 1$ and compute $T = \sum_{i,j=1}^{N} a_{ij}^2$.
2. Spectral: Find the set of all positive eigenvalues $\Lambda = \{\lambda\}$ and corresponding normed eigenvectors $Z = \{z\}$ for matrix $A$.
3. Stop-condition: If $\Lambda$ is empty, the computation stops and outputs the found clusters.
4. Fuzzy cluster projection: Take eigenvectors $z$ and $-z$ corresponding to the maximum $\lambda \in \Lambda$, compute their fuzzified projections, and take the one that maximizes the contribution, $G(u)$, as $u_k$, along with the corresponding $\lambda_k = (u_k' A u_k)^2$ and contribution $G(u_k)$.
5. Stop-condition: If $G(u_k) < \epsilon$, the computation stops, with $k$ the number of found clusters. Otherwise, add 1 to $k$, set $A$ to its residual, $A - \lambda_k u_k u_k'$, and go to step 2.
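The five steps can be sketched in a few lines of NumPy. This is an illustrative reading, not the authors' implementation, and it rests on several assumptions: λ_k is computed from condition (3), which for a normed u reduces to u'Au (the squared expression printed in step 4 coincides with the contribution G(u_k) under (4)); the threshold is applied to the relative contribution G/T; a cluster falling below the threshold is discarded rather than kept; and extraction also stops if λ_k is not positive. The names `faddis_it`, `fuzzy_projection`, `eps`, and `max_clusters` are hypothetical.

```python
import numpy as np


def fuzzy_projection(z):
    """Zero out negative components of z and rescale to unit Euclidean norm."""
    u = np.maximum(z, 0.0)
    norm = np.linalg.norm(u)
    return u / norm if norm > 0 else u


def faddis_it(A, eps=0.01, max_clusters=50):
    """Sequentially extract fuzzy clusters from a symmetric similarity matrix."""
    A = np.array(A, dtype=float)
    T = float(np.sum(A ** 2))                     # step 1: data scatter
    clusters = []
    while len(clusters) < max_clusters:
        eigvals, eigvecs = np.linalg.eigh(A)      # step 2: ascending eigenvalues
        if eigvals[-1] <= 0:                      # step 3: no positive eigenvalue
            break
        z = eigvecs[:, -1]
        best = None
        for u in (fuzzy_projection(z), fuzzy_projection(-z)):   # step 4
            if not u.any():
                continue
            lam = float(u @ A @ u)                # (3) with u'u = 1 (assumed reading)
            g = lam ** 2                          # (4) with u'u = 1
            if best is None or g > best[2]:
                best = (u, lam, g)
        u, lam, g = best
        if lam <= 0 or g / T < eps:               # step 5 (assumed: relative to T)
            break
        clusters.append((u, lam, g / T))
        A = A - lam * np.outer(u, u)              # residual similarity matrix
    return clusters


# Toy similarity matrix with two non-overlapping groups of sizes 3 and 2.
A_demo = np.zeros((5, 5))
A_demo[:3, :3] = 1.0
A_demo[3:, 3:] = 1.0
clusters = faddis_it(A_demo)
for u, lam, share in clusters:
    print(np.round(u, 3), round(lam, 3), round(share, 3))
```

On this toy matrix the two blocks are recovered one after the other, and their relative contributions add up to the whole data scatter.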


Stop Conditions of Sequential Extraction of Fuzzy Clusters

s1. The eigenvalue $\lambda$ for the spectral fuzzy cluster is negative (not possible since $\lambda = (u_k' A u_k)^2$).

s2. The contribution $G(u) = \lambda^2$ of the current cluster has reached a pre-specified proportion, $\epsilon$.


Analysis of well-structured dissimilarity data

Table: Data of dissimilarity index between eleven objects (Windham, 1985).

Object    1    2    3    4    5    6    7    8    9   10   11
1         0    6    3    6   11   25   44   72   69   72  100
2         6    0    3   11    6   14   28   56   47   44   72
3         3    3    0    3    3   11   25   47   44   47   69
4         6   11    3    0    6   14   28   44   47   56   72
5        11    6    3    6    0    3   11   28   25   28   44
6        25   14   11   14    3    0    3   14   11   14   25
7        44   28   25   28   11    3    0    6    3    6   11
8        72   56   47   44   28   14    6    0    3   11    6
9        69   47   44   47   25   11    3    3    0    3    3
10       72   44   47   56   28   14    6   11    3    0    6
11      100   72   69   72   44   25   11    6    3    6    0


FADDIS-IT Results (1)

Table: FADDIS-IT results for data objects of Table 1 for Gaussian kernel transformation.

Objects   I        II       III      IV
1         0.2460   0        0.4553   0
2         0.2662   0        0.4322   0
3         0.3490   0        0.5464   0
4         0.2662   0        0.4322   0
5         0.3567   0        0.3473   0.3947
6         0.3120   0        0        0.7606
7         0.3567   0.3924   0        0.5155
8         0.2662   0.4266   0        0
9         0.3490   0.5447   0        0
10        0.2662   0.4266   0        0
11        0.2460   0.4305   0        0
G(u_k)    0.4875   0.1076   0.1094   0.0233
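The caption mentions a Gaussian kernel transformation of the dissimilarity table, but the slides give neither its formula nor its bandwidth. The sketch below therefore assumes the form a_ij = exp(-d_ij / (2σ²)), treating the tabulated dissimilarities as squared distances, and picks a hypothetical bandwidth. It is not expected to reproduce the membership values in the table.

```python
import numpy as np

# Dissimilarity index between eleven objects, copied from the table
# (Windham, 1985).
D = np.array([
    [0, 6, 3, 6, 11, 25, 44, 72, 69, 72, 100],
    [6, 0, 3, 11, 6, 14, 28, 56, 47, 44, 72],
    [3, 3, 0, 3, 3, 11, 25, 47, 44, 47, 69],
    [6, 11, 3, 0, 6, 14, 28, 44, 47, 56, 72],
    [11, 6, 3, 6, 0, 3, 11, 28, 25, 28, 44],
    [25, 14, 11, 14, 3, 0, 3, 14, 11, 14, 25],
    [44, 28, 25, 28, 11, 3, 0, 6, 3, 6, 11],
    [72, 56, 47, 44, 28, 14, 6, 0, 3, 11, 6],
    [69, 47, 44, 47, 25, 11, 3, 3, 0, 3, 3],
    [72, 44, 47, 56, 28, 14, 6, 11, 3, 0, 6],
    [100, 72, 69, 72, 44, 25, 11, 6, 3, 6, 0],
], dtype=float)


def gaussian_kernel(D, sigma):
    """Assumed kernel form a_ij = exp(-d_ij / (2*sigma^2)); not from the slides."""
    return np.exp(-D / (2.0 * sigma ** 2))


sigma = np.sqrt(D.mean())   # hypothetical bandwidth, for illustration only
A = gaussian_kernel(D, sigma)

print(np.round(A[:3, :3], 3))
```

The result is a symmetric similarity matrix with ones on the diagonal, which is the kind of input the FADDIS-IT algorithm takes.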


Analysis of Complex Shape Data Sets

[Figure: Four artificial bivariate datasets of complex shapes. Panels: D1/3 (N=622; K=3), D2/5 (N=622; K=5), D3/4 (N=512; K=4), D4/3 (N=299; K=3).]


FADDIS-IT Results Complex Shape Data Sets

[Figure: scatter plots of clustering results, with columns FADDIS-IT, FMFCM and NERFCM. Panel labels, in the order extracted: out K=3, ARI=1; in K=3, ARI=0.47; out K=5, ARI=1; in K=3, ARI=0.47; in K=5, ARI=0.62; in K=5, ARI=0.69.]


FADDIS-IT Results Complex Shape Data Sets

[Figure: Clustering results by the current version of FADDIS-IT, ... Columns: FADDIS-IT, FMFCM and NERFCM. Panel labels, in the order extracted: out K=4, ARI=1; in K=4, ARI=0.32; out K=3, ARI=0.39; in K=4, ARI=0.47; in K=3, ARI=0.35; in K=3, ARI=0.01.]
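The panels report agreement with the reference partition as ARI, the adjusted Rand index. The slides do not show how it was computed; the sketch below is the standard pair-counting formula written with the Python standard library, applied to made-up label lists. For fuzzy results a crisp labelling would first have to be derived, for example by assigning each entity to its largest membership, which is an assumption and not stated in the slides.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two crisp labellings of the same entities."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))   # contingency table cells
    rows = Counter(labels_true)                      # row sums
    cols = Counter(labels_pred)                      # column sums

    sum_cells = sum(comb(c, 2) for c in pairs.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Degenerate case, e.g. both labellings put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: identical partitions up to renaming, then a poor match.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
poor = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, poor)
```

An ARI of 1 means the two partitions coincide up to a renaming of clusters, while values near 0 indicate agreement no better than chance.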


Conclusion

A competitive spectral fuzzy clustering method to model Ideal Types for relational data:
– Additive approach where the products of entity membership and the ‘extent of anomaly’ of clusters contribute towards the similarity between entities;
– Anomaly extent of clusters;
– Model-based cluster extracting stop-conditions based on SEFIT (Mirkin, 1990).


Future Work

– To guarantee the property of mutual orthogonality among the Ideal types.
– Application to real world data.
