Data Clustering using the self-organizing properties of magnetic systems
Marcelo Blatt, Shai Wiseman and Eytan Domany
Department of Physics of Complex Systems, The Weizmann Institute of Science, Rehovot, Israel ∗

∗ October 21, 2003. Presented by Alejandro Murua and Larissa Stanberry at the "Unsupervised Learning Workshop", Department of Statistics, University of Washington, Seattle.
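Potts Model
Points $v_i$, $i = 1, \dots, N$, reside on some lattice. Each $v_i$ carries a spin $s_i \in \{1, \dots, q\}$. $J_{ij}$ denotes the interaction between spins $i$ and $j$. The configuration of the system is $S = \{s_i\}_{i=1}^{N}$. The energy of the system $H$ (the "cost function") is
$$H(S) = \sum_{\langle i,j \rangle} J_{ij} \, (1 - \delta_{s_i, s_j}).$$
Aligned spins, $s_i$ ⇈ $s_j$ (that is, $s_i = s_j$), do not contribute to the energy sum. Misaligned spins, $s_i$ ↑↓ $s_j$, contribute a positive interaction $J_{ij} > 0$.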
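The energy sum can be sketched in a few lines of Python. This is only an illustration: the function name `potts_energy` and the dictionary layout for the couplings are assumptions made here, not part of the original slides.

```python
def potts_energy(spins, couplings):
    """H(S) = sum over interacting pairs of J_ij * (1 - delta(s_i, s_j)).

    spins: list of spin values, one per point.
    couplings: dict mapping a pair (i, j), i < j, to J_ij (hypothetical layout).
    """
    # Only pairs with different spin values contribute J_ij.
    return sum(J for (i, j), J in couplings.items() if spins[i] != spins[j])


# Toy example: three points on a chain, both bonds with J = 1.
couplings = {(0, 1): 1.0, (1, 2): 1.0}
aligned = potts_energy([2, 2, 2], couplings)    # no misaligned pair
one_flip = potts_energy([2, 2, 1], couplings)   # bond (1, 2) is misaligned
```

In the toy chain the fully aligned configuration has zero energy, and flipping the last spin costs exactly the coupling of the one broken bond.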
The Boltzmann distribution at fixed $T$,
$$P(S) = \frac{1}{Z} \exp\left(-\frac{H(S)}{T}\right),$$
gives the weight of the configuration $S$, where
$$Z = \sum_{S} \exp\left(-\frac{H(S)}{T}\right)$$
is a normalizing constant.
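For a very small system the Boltzmann weights can be computed by brute-force enumeration, which makes the role of the normalizing constant concrete. The sketch below assumes a dictionary of couplings keyed by index pairs; the helper name `boltzmann_distribution` is hypothetical.

```python
import itertools
import math


def boltzmann_distribution(couplings, n, q, T):
    """Return {S: P(S)} over all q**n configurations (feasible only for tiny n)."""
    weights = {}
    for S in itertools.product(range(1, q + 1), repeat=n):
        # Potts energy: sum of J_ij over misaligned interacting pairs.
        H = sum(J for (i, j), J in couplings.items() if S[i] != S[j])
        weights[S] = math.exp(-H / T)
    Z = sum(weights.values())          # normalizing constant
    return {S: w / Z for S, w in weights.items()}


# Three spins on a chain, q = 2 states, low temperature.
P = boltzmann_distribution({(0, 1): 1.0, (1, 2): 1.0}, n=3, q=2, T=0.5)
```

At low $T$ the aligned configurations dominate: `P[(1, 1, 1)]` is much larger than `P[(1, 2, 1)]`, whose two broken bonds are penalized by $\exp(-2/T)$.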
The ordering properties of the system
• Magnetization $E\,m(S)$, with
$$m(S) = \frac{q\,N_{\max}(S) - N}{(q-1)\,N},$$
where $N_{\max}(S) = \max_k \{N_k(S)\}$ and $N_k(S) = \sum_i \delta_{s_i,k}$ is the number of spins in state $k$ (∼ cluster size).
• Spin-spin correlation $G_{ij} = E\,\delta_{s_i,s_j} = P(s_i = s_j)$.
• Susceptibility $\chi$ (a.k.a. variance):
$$\chi = \frac{N}{T}\left(E\,m^2 - (E\,m)^2\right) = \frac{N}{T}\,\mathrm{var}(m).$$
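As an illustration, the magnetization and susceptibility can be estimated from a list of sampled configurations. The function names and the list-of-lists layout below are assumptions for this sketch.

```python
from collections import Counter


def magnetization(spins, q):
    """m(S) = (q * N_max - N) / ((q - 1) * N), N_max = size of the largest spin state."""
    N = len(spins)
    n_max = max(Counter(spins).values())
    return (q * n_max - N) / ((q - 1) * N)


def susceptibility(samples, q, T):
    """chi = (N / T) * (E[m^2] - (E[m])^2), estimated from sampled configurations."""
    N = len(samples[0])
    ms = [magnetization(S, q) for S in samples]
    mean = sum(ms) / len(ms)
    mean_sq = sum(m * m for m in ms) / len(ms)
    return N / T * (mean_sq - mean ** 2)
```

A fully aligned configuration gives $m = 1$, and a configuration with all $q$ states equally populated gives $m = 0$, which matches the two limits of the formula.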
Homogeneous System
All spins have constant interactions $J_{ij} = J$.
• Ferromagnetic ($T$ low): $E\,m = 1$, $G_{ij} \approx 1$.
• Paramagnetic ($T$ high): $E\,m = 0$, $G_{ij} \approx \frac{1}{q}$ ∗
∗ Spin-spin correlation $G_{ij} = E\,\delta_{s_i,s_j} = P(s_i = s_j)$.
Inhomogeneous Potts Model
Spins form magnetic "grains" characterized by strong coupling within the grain and weak coupling between the grains.
• Ferromagnetic ($T$ low): spins aligned; $G_{ij} > 1 - \frac{2}{q} + O\left(\frac{1}{q^2}\right)$.
• Super-paramagnetic (intermediate $T$): strongly coupled spins ⇈, weakly coupled spins ↑↓; $G_{ij} \approx 1 - \frac{2}{q} + O\left(\frac{1}{q^2}\right)$.
• Paramagnetic ($T$ high): spins disordered; $G_{ij} = \frac{1}{q}$ $\forall i, j$ (independent spins).
Monte-Carlo Simulations to compute thermal averages
The thermodynamic average is calculated as
$$E A = \sum_{S} A(S)\,P(S),$$
where $P(S) = \frac{1}{Z} \exp\left(-\frac{H(S)}{T}\right)$ is the Boltzmann factor and $Z = \sum_{S} \exp\left(-\frac{H(S)}{T}\right)$.
The number of possible system configurations is $q^N$. Solution:
• Generate $\{S_1, \dots, S_M\}$ ∼ Boltzmann distribution.
• Use it as a statistical sample.
• Approximate thermal averages by
$$E A \approx \frac{1}{M} \sum_{i=1}^{M} A(S_i), \qquad M \ll N.$$
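The sample-average idea can be checked on a toy system where the exact sum over all $q^N$ configurations is still affordable. In the sketch below, direct sampling from the enumerated distribution stands in for a Markov-chain sampler, which is only possible because the system is tiny. The function name, the fixed seed and the chosen observable are assumptions for illustration.

```python
import itertools
import math
import random


def compare_averages(A, couplings, n, q, T, M, seed=0):
    """Return (exact thermal average of A, Monte-Carlo estimate from M samples)."""
    configs = list(itertools.product(range(1, q + 1), repeat=n))
    weights = [
        math.exp(-sum(J for (i, j), J in couplings.items() if S[i] != S[j]) / T)
        for S in configs
    ]
    Z = sum(weights)
    exact = sum(A(S) * w for S, w in zip(configs, weights)) / Z

    # Draw M configurations with probability proportional to the Boltzmann weight.
    rng = random.Random(seed)
    sample = rng.choices(configs, weights=weights, k=M)
    estimate = sum(A(S) for S in sample) / M
    return exact, estimate


# Observable: indicator that the two end spins of a 3-spin chain agree.
exact, estimate = compare_averages(
    lambda S: float(S[0] == S[2]),
    {(0, 1): 1.0, (1, 2): 1.0}, n=3, q=3, T=1.0, M=20000,
)
```

With $M = 20000$ samples the estimate should agree with the exact value to roughly two decimal places. For realistic $N$ the enumeration is impossible and only the sampled estimate remains available.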
Swendsen-Wang Algorithm
The algorithm changes the value of an entire cluster, rather than a single spin.
• Assign the first configuration $s_1, \dots, s_N$ at random.
• Visit all pairs of spins with positive interactions $J_{ij} > 0$.
• The bond between the spins is frozen with probability $p_{ij} = 1 - \exp\left(-\frac{J_{ij}}{T}\,\delta_{s_i,s_j}\right)$.
• SW-cluster = {spins connected by frozen bonds}.
• Assign a spin value at random to each SW-cluster.
• Iterate.
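One Swendsen-Wang update can be sketched as follows. The union-find bookkeeping, the function name `swendsen_wang_step` and the coupling dictionary are choices made for this example, not details taken from the slides.

```python
import math
import random


def swendsen_wang_step(spins, couplings, q, T, rng=random):
    """One SW update. Returns (new_spins, cluster_labels)."""
    n = len(spins)
    parent = list(range(n))              # union-find forest over the points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    # Freeze a bond between equal spins with probability 1 - exp(-J_ij / T);
    # for unequal spins the probability is 0, so those pairs are skipped.
    for (i, j), J in couplings.items():
        if J > 0 and spins[i] == spins[j]:
            if rng.random() < 1.0 - math.exp(-J / T):
                parent[find(i)] = find(j)

    # Spins connected by frozen bonds form one SW-cluster; draw one new
    # spin value per cluster.
    labels = [find(i) for i in range(n)]
    new_value = {root: rng.randint(1, q) for root in set(labels)}
    return [new_value[r] for r in labels], labels
```

Iterating this step yields the sequence of configurations. The returned cluster labels tell which points shared an SW-cluster in the current configuration.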
Clustering Data
1. Consider data points $x_1, \dots, x_n \in \mathbb{R}^D$.
2. Define the number of spin states $q$.
  • The number of spin states $q$ is not related to the number of clusters!
3. Define a neighborhood.
  • All pairs $(i, j)$: $N^2$ interactions.
  • $D \le 3$: use Delaunay triangulation.
  • $K$-nearest neighbors: $x_i$ is a $K$-nn of $x_j$ ⇐⇒ $x_j$ is a $K$-nn of $x_i$. The outcome is a connected graph. But $D$ ↗ ⇒ $K$ ↗.
  • For $D > 100$ use $K$-nn ∘ MST.
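4. Define interactions $J_{ij} = f(d_{ij})$, e.g.
$$J_{ij} = \begin{cases} \dfrac{1}{\hat K} \exp\left(-\dfrac{d_{ij}^2}{2a^2}\right), & v_i, v_j \text{ are } K\text{-nn neighbors}, \\ 0, & \text{otherwise.} \end{cases}$$
  • $\hat K$: average number of neighbors.
  • $a$: "local length scale" over which $J_{ij}$ decays:
    • defined by the high-density regions;
    • smaller than the average distance in low-density regions;
    • $a = \bar d_{ij}$, averaged over neighbors $v_i, v_j$.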
5. Generate $M$ configurations $S_1, \dots, S_M$.
6. For every $S$ calculate
$$m(S) = \frac{q\,N_{\max}(S) - N}{(q-1)\,N}, \qquad N_{\max}(S) = \max_k \{N_k(S)\},$$
where $N_k(S) = \sum_i \delta_{s_i,k}$ is the number of spins in the state $k$ (∼ cluster size).
7. For each spin configuration calculate an indicator function
$$c_{ij} = \begin{cases} 1, & v_i, v_j \text{ belong to the same SW-cluster}, \\ 0, & \text{otherwise.} \end{cases}$$
8. Calculate the magnetization $E\,m \approx \frac{1}{M} \sum_{k=1}^{M} m(S_k)$.
9. Calculate the variance $\mathrm{var}(m) = E\,m^2 - (E\,m)^2$.
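10. The transition from the super-paramagnetic to the paramagnetic phase occurs at
$$T_c \approx \frac{1}{4 \log(1 + \sqrt{q})} \exp\left(-\frac{\langle\langle d_{ij}^2 \rangle\rangle}{2a^2}\right),$$
where $\langle\langle \cdot \rangle\rangle$ is the average over all neighbors.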
11. Identify super-paramagnetic phase.
12. Select one T for all subphases.
13. $G_{ij} = \dfrac{(q-1)\,C_{ij} + 1}{q}$, where $C_{ij} = E\,c_{ij} = P(v_i, v_j \in SW_k)$.
14. Link $v_i, v_j$ if $G_{ij} > 0.5$, with $\frac{1}{q} < \text{threshold} < 1 - \frac{2}{q}$.
15. Connected subgraphs are cluster cores.
16. Remaining points are linked to the neighbor with max Gij .
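The graph construction (step 3), the couplings (step 4) and the final linking (steps 13 to 16) can be sketched together in plain Python. Everything named here (`mutual_knn_edges`, `interactions`, `clusters_from_samples`, the `label_samples` layout) is a hypothetical illustration under the stated formulas, not the authors' implementation. It uses a brute-force neighbor search and does not add the MST edges.

```python
import math


def mutual_knn_edges(points, K):
    """Step 3: pairs (i, j), i < j, where each point is a K-nn of the other."""
    n = len(points)
    knn = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(order[:K]))
    return {(i, j) for i in range(n) for j in knn[i] if i < j and i in knn[j]}


def interactions(points, edges):
    """Step 4: J_ij = exp(-d_ij^2 / (2 a^2)) / K_hat on neighbor pairs."""
    d = {(i, j): math.dist(points[i], points[j]) for (i, j) in edges}
    a = sum(d.values()) / len(d)            # a = mean neighbor distance
    k_hat = 2 * len(d) / len(points)        # average number of neighbors
    return {e: math.exp(-dij ** 2 / (2 * a ** 2)) / k_hat for e, dij in d.items()}


def clusters_from_samples(label_samples, edges, q, threshold=0.5):
    """Steps 13-16. label_samples[k][i] is the SW-cluster label of point i in
    configuration k (assumed layout). Returns (cluster id per point, G)."""
    n, M = len(label_samples[0]), len(label_samples)
    G = {}
    for (i, j) in edges:
        C = sum(lab[i] == lab[j] for lab in label_samples) / M   # C_ij = E c_ij
        G[(i, j)] = ((q - 1) * C + 1) / q                        # step 13

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    best = {}                                # point -> (max G_ij, that neighbor)
    for (i, j), g in G.items():
        if g > threshold:                    # step 14: link correlated pairs
            parent[find(i)] = find(j)
        for u, v in ((i, j), (j, i)):
            if u not in best or g > best[u][0]:
                best[u] = (g, v)
    for u, (g, v) in best.items():           # step 16: attach leftover points
        if g <= threshold:
            parent[find(u)] = find(v)
    return [find(i) for i in range(n)], G
```

In a small test with an outlying point, the mutual $K$-nn rule leaves that point without any edge. This shows how such a graph can fail to be connected unless further edges (for example from a spanning tree) are added.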
Complexity
• Neighborhood definition is the most time-consuming part.
• An SW step requires $O(N)$ ∼ 0.12 CPU time.
• $M$ steps, $M \approx 1000$.
• LANDSAT, ISOLET ∼ 1 hour of CPU time.
• ISOLET ∼ 1 week with the Projection Pursuit method.
• Weakly depends on the dimensionality $D$ (only the $K$-nn part).
Comments
• Based on a dissimilarity measure. No need for metric conversion.
• $K$-nn ∘ MST.
• Different features of the data set are uncovered at different $T$ ∼ multiresolution approach.
• The final outcome is just a sample.
Random-Cluster (RC) Problem
Consider data points $v_i$. Define a bond variable
$$n_{ij} = \begin{cases} 1, & \text{bond is 'occupied'}, \\ 0, & \text{bond is 'vacant'.} \end{cases}$$
The system configuration is given by $\mathcal{N} = \{n_{ij}\}$. Random clusters are defined as the vertices of connected components of the occupied bonds. The Random-Cluster model is defined by
$$W(\mathcal{N}) = \frac{q^{C(\mathcal{N})}}{Z} \prod_{\langle i,j \rangle} p_{ij}^{\,n_{ij}} (1 - p_{ij})^{(1 - n_{ij})},$$
where $C(\mathcal{N})$ is the number of clusters in $\mathcal{N}$, $0 \le p_{ij} \le 1$, and $Z$ = const.
For independent bonds, $q = 1$. RC model ⇔ Potts model.
Joint probability distribution of spin and bond variables in the Potts model:
$$P(S, \mathcal{N}) = \frac{1}{Z} \prod_{\langle i,j \rangle} \left[(1 - p_{ij})(1 - n_{ij}) + p_{ij}\, n_{ij}\, \delta_{s_i, s_j}\right].$$
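The link between the joint distribution and the random-cluster weight can be checked numerically on a tiny graph. Summing the unnormalized joint weight over all spin configurations should reproduce $q^{C} \prod p^{n}(1-p)^{1-n}$ for every bond configuration. The sketch assumes one common bond probability `p` for all edges, purely to keep the example short, and all function names are made up for the illustration.

```python
import itertools


def joint_weight(S, bonds, edges, p):
    """Unnormalized P(S, N): product over edges of (1-p)(1-n) + p*n*delta(s_i, s_j)."""
    w = 1.0
    for (i, j), n in zip(edges, bonds):
        w *= (1 - p) * (1 - n) + p * n * (S[i] == S[j])
    return w


def count_clusters(n_points, edges, bonds):
    """Number of connected components formed by the occupied bonds."""
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for (i, j), n in zip(edges, bonds):
        if n:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(n_points)})


def marginal_matches_rc(n_points=3, edges=((0, 1), (1, 2), (0, 2)), q=3, p=0.4):
    """Check sum_S P(S, N) == q^C(N) * prod p^n (1-p)^(1-n) for every N."""
    for bonds in itertools.product((0, 1), repeat=len(edges)):
        marginal = sum(joint_weight(S, bonds, edges, p)
                       for S in itertools.product(range(q), repeat=n_points))
        occupied = sum(bonds)
        rc = (q ** count_clusters(n_points, edges, bonds)
              * p ** occupied * (1 - p) ** (len(edges) - occupied))
        if abs(marginal - rc) > 1e-12:
            return False
    return True
```

The check works because an occupied bond forces its two spins to agree, so the number of spin configurations compatible with a bond configuration is $q$ raised to the number of clusters.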
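Summing over all spin configurations $S$ gives the random-cluster model:
$$\sum_{S} P(S, \mathcal{N}) = W(\mathcal{N}).$$
Let $p_{ij} = 1 - \exp\left(-\frac{J_{ij}}{T}\right)$; then summing over all bond configurations $\mathcal{N}$ gives the Potts model:
$$\sum_{\mathcal{N}} P(S, \mathcal{N}) = P(S),$$
where $P(S) = \frac{1}{Z} \exp\left(-\frac{H(S)}{T}\right)$ is the Boltzmann distribution.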
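The bond sum can be worked out explicitly. The short derivation below is a sketch that uses only the joint distribution, the choice $p_{ij} = 1 - \exp(-J_{ij}/T)$ and the Potts energy as defined earlier. Each bond variable takes the values 0 and 1 independently, so the sum factorizes over the pairs.

```latex
\begin{align*}
\sum_{\mathcal{N}} P(S, \mathcal{N})
  &= \frac{1}{Z} \prod_{\langle i,j \rangle} \sum_{n_{ij} \in \{0,1\}}
     \left[ (1 - p_{ij})(1 - n_{ij}) + p_{ij}\, n_{ij}\, \delta_{s_i,s_j} \right] \\
  &= \frac{1}{Z} \prod_{\langle i,j \rangle}
     \left[ 1 - p_{ij}\,(1 - \delta_{s_i,s_j}) \right] \\
  &= \frac{1}{Z} \prod_{\langle i,j \rangle}
     \exp\left( -\frac{J_{ij}}{T}\,(1 - \delta_{s_i,s_j}) \right)
   = \frac{1}{Z} \exp\left( -\frac{H(S)}{T} \right).
\end{align*}
```

The third line uses that the bracket equals 1 when $s_i = s_j$ and $1 - p_{ij} = \exp(-J_{ij}/T)$ otherwise.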