Etowah. Fayette. Franklin. Geneva. GreeneHale. Henry. Houston. Jackson. Jefferson. Lamar ..... Avery. Beaufort. Bertie. Bladen. Brunswick. Buncombe Burke. Cabarrus ... McKenzie. McLean. Mercer. Morton. Mountrail. Nelson. Oliver. Pembina.
$
'
Clustering Relational Data Anuˇska Ferligoj Faculty of Social Sciences University of Ljubljana
&
% version: January 20, 2009 / 08 : 35
A. Ferligoj: Clustering relational data
2
'
$
Outline 1 2
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cluster Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1 2
3
Clustering Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
9
Benefits from the Optimizational Approach . . . . . . . . . . . . . . . . . . . .
9
10
Blockmodeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
41
Clustering with Relational Constraints . . . . . . . . . . . . . . . . . . . . . . .
41
47 48
Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
47 48
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
1
'
$
Introduction A large class of clustering problems can be formulated as an optimizational problem in which the best clustering is searched among all feasible clusterings according to a selected criterion function. This clustering approach can be applied to a variety of very interesting clustering problems, as it is possible to adapt it to a concrete clustering problem by an appropriate specification of the criterion function and/or by the definition of the set of feasible clusterings. Both, the blockmodeling problem (clustering of the relational data) and the clustering with relational constraint problem (clustering of the attribute and relational data) can be very successfully treated by this approach.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
2
'
$
Cluster Analysis Grouping units into clusters so that those within a cluster are as similar to each other as possible, while units in different clusters as dissimilar as possible, is a very old problem. Although the clustering problem is intuitively simple and understandable, providing solution(s) remains a very exciting activity. The field of cluster analysis • has its society, the International Federation of Classification Societies, formed in 1985 from several national classification societies; • organizes every second year its conference; • publishes two journals: the Journal of Classification (from 1984) and the journal Advances in Data Analysis and Classification (from 2007).
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
3
'
$
Clustering Problem Cluster analysis (known also as classification and taxonomy) deals mainly with the following general problem: given a set of units, U, determine subsets, called clusters, C, which are homogeneous and/or well separated according to the measured variables. The set of clusters forms a clustering. This problem can be formulated as an optimization problem: Determine the clustering C∗ for which P (C∗ ) = min P (C) C∈Φ
where C is a clustering of a given set of units, U, Φ is the set of all feasible clusterings and P : Φ → R is a criterion function.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
4
'
$
Clustering There are several types of clusterings, e.g., partition, hierarchy, pyramid, fuzzy clustering, clustering with overlaping clusters. The most frequently used clusterings are partitions and hierarchies. A clustering C = {C1 , C2 , ...Ck } is a partition of the set of units U if [ Ci = U i
i 6= j ⇒ Ci ∩ Cj = ∅ A clustering H = {C1 , C2 , ...Ck } is a hierarchy if for each pair of clusters Ci and Cj from H it holds Ci ∩ Cj ∈ {Ci , Cj , ∅}
s
s
s
s
s
s
,
and it is a complete hierarchy if for each unit x it holds {x} ∈ H, and U ∈ H. & % y l y * 6
A. Ferligoj: Clustering relational data
5
'
$
Clustering Criterion Function Clustering criterion functions can be constructed indirectly, e.g., as a function of a suitable (dis)similarity measure between pairs of units (e.g., euclidean distance) or directly. For partitions into k clusters, the Ward criterion function X X P (C) = d(x, tC ) C∈C x∈C
is usually used, where tC is the center of the cluster C and is defined as tC = (u1C , u2C , ..., umC ) where uiC is the average of the variable Ui , i = 1, ...m, for the units from the cluster C. d is the squared euclidean distance.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
6
'
$
As the set of feasible clusterings is finite a solution of the clustering problem always exists. Since this set is usually very large it is not easy to find an optimal solution. In general, most of the clustering problems are NP-hard. For this reason, different efficient heuristic algorithms are used. Among these, the agglomerative (hierarchical) and the relocation approach are most often used.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
7
'
$
Agglomerative Approach The agglomerative clustering approach usually assumes that all relevant information on the relationships between the n units from the set U is summarized by a symmetric pairwise dissimilarity matrix D = [dij ]. Each unit is a cluster: Ci = {xi }, xi ∈ U, i = 1, 2, . . . , n; repeat while there exist at least two clusters: determine the nearest pair of clusters Cp and Cq : d(Cp , Cq ) = minu,v d(Cu , Cv ) ; fuse the clusters Cp and Cq to form a new cluster Cr = Cp ∪ Cq ; replace Cp and Cq by the cluster Cr ; determine the dissimilarities between the cluster Cr and other clusters. The result is a hierarchy that is usually presented by a clustering tree – a dendrogram.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
8
'
$
Relocation Approach This approach assumes that the user can specify the number of clusters in the partition. Determine the initial clustering C; while there exists C0 such that P (C0 ) ≤ P (C), where C0 is obtained by moving a unit xi from cluster Cp to cluster Cq , or by interchanging units xi and xj between two clusters in the clustering C; repeat: substitute C0 for C .
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
9
'
$
Benefits from the Optimizational Approach The optimizational approch to clustering problem offers two possibilities to adapt to a concrete clustering problem: the definition of the criterion function P and the specification of the set of feasible clusterings Φ. Blockmodeling is searching for a clustering according to the relational data only and the solution can be obtained by an appropriatelly defined criterion function. For clustering with relational constraint an appropriately defined set of feasible clusterings is used.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
10
'
$
Blockmodeling The goal of blockmodeling is to reduce a large, potentially incoherent network to a smaller comprehensible structure that can be interpreted more readily. Blockmodeling, as an empirical procedure, is based on the idea that units in a network can be grouped according to the extent to which they are equivalent, according to some meaningful definition of equivalence.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
11
'
$
Cluster, Clustering, Blocks One of the main procedural goals of blockmodeling is to identify, in a given network N = (U, R), R ⊆ U × U, clusters (classes) of units that share structural characteristics defined in terms of R. The units within a cluster have the same or similar connection patterns to other units. They form a clustering C = {C1 , C2 , . . . , Ck } which is a partition of the set U. Each partition determines an equivalence relation (and vice versa). Let us denote by ∼ the relation determined by partition C. A clustering C partitions also the relation R into blocks R(Ci , Cj ) = R ∩ Ci × Cj Each such block consists of units belonging to clusters Ci and Cj and all arcs leading from cluster Ci to cluster Cj . If i = j, a block R(Ci , Ci ) is called a diagonal block. s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
12
'
$
The Everett Network a b c d e f g h i j
a 0 1 1 1 0 0 0 0 0 0
b 1 0 1 0 1 0 0 0 0 0
c 1 1 0 1 0 0 0 0 0 0
d 1 0 1 0 1 0 0 0 0 0
e 0 1 0 1 0 1 0 0 0 0
f 0 0 0 0 1 0 1 0 1 0
g 0 0 0 0 0 1 0 1 0 1
h 0 0 0 0 0 0 1 0 1 1
i 0 0 0 0 0 1 0 1 0 1
a 0 1 0 0 1 1 0 0 0 0
c 1 0 0 0 1 1 0 0 0 0
j 0 0 0 0 0 0 1 1 1 0
a c h j b d g i e f
A
B
C
A
1
1
0
B
1
0
1
C
0
1
1
h 0 0 0 1 0 0 1 1 0 0
j 0 0 1 0 0 0 1 1 0 0
b 1 1 0 0 0 0 0 0 1 0
d 1 1 0 0 0 0 0 0 1 0
g 0 0 1 1 0 0 0 0 0 1
i 0 0 1 1 0 0 0 0 0 1
e 0 0 0 0 1 1 0 0 0 1
f 0 0 0 0 0 0 1 1 1 0
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
13
'
$
Equivalences Regardless of the definition of equivalence used, there are two basic approaches to the equivalence of units in a given network (compare Faust, 1988): • the equivalent units have the same connection pattern to the same neighbors; • the equivalent units have the same or similar connection pattern to (possibly) different neighbors. The first type of equivalence is formalized by the notion of structural equivalence and the second by the notion of regular equivalence with the latter a generalization of the former.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
14
'
$
Structural Equivalence Units are equivalent if they are connected to the rest of the network in identical ways (Lorrain and White, 1971). Such units are said to be structurally equivalent. In other words, X and Y are structurally equivalent iff: s1. s2.
XRY ⇔ YRX XRX ⇔ YRY
s3. s4.
∀Z ∈ U \ {X, Y} : (XRZ ⇔ YRZ) ∀Z ∈ U \ {X, Y} : (ZRX ⇔ ZRY)
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
15
'
$
. . . Structural Equivalence The blocks for structural equivalence are null or complete with variations on diagonal in diagonal blocks.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
16
'
$
Regular Equivalence Integral to all attempts to generalize structural equivalence is the idea that units are equivalent if they link in equivalent ways to other units that are also equivalent. White and Reitz (1983): The equivalence relation ≈ on U is a regular equivalence on network N = (U, R) if and only if for all X, Y, Z ∈ U, X ≈ Y implies both R1. R2.
XRZ ⇒ ∃W ∈ U : (YRW ∧ W ≈ Z) ZRX ⇒ ∃W ∈ U : (WRY ∧ W ≈ Z)
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
17
'
$
. . . Regular Equivalence Theorem 1 (Batagelj, Doreian, Ferligoj, 1992) Let C = {Ci } be a partition corresponding to a regular equivalence ≈ on the network N = (U, R). Then each block R(Cu , Cv ) is either null or it has the property that there is at least one 1 in each of its rows and in each of its columns. Conversely, if for a given clustering C, each block has this property then the corresponding equivalence relation is a regular equivalence. The blocks for regular equivalence are null or 1-covered blocks.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
18
'
$
Establishing Blockmodels The problem of establishing a partition of units in a network in terms of a selected type of equivalence is a special case of clustering problem that can be formulated as an optimization problem (Φ, P ) as follows: Determine the clustering C? ∈ Φ for which P (C? ) = min P (C) C∈Φ
where Φ is the set of feasible clusterings and P is a criterion function.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
19
'
$
Criterion Function Criterion functions can be constructed • indirectly as a function of a compatible (dis)similarity measure between pairs of units, or • directly as a function measuring the fit of a clustering to an ideal one with perfect relations within each cluster and between clusters according to the considered types of connections (equivalence).
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
20
'
$
Indirect Approach RELATION
DISSIMILARITY MATRIX
STANDARD CLUSTERING ALGORITHMS
− → −→ − − −−→ −→
DESCRIPTIONS OF UNITS
R Q
original relation path matrix triads orbits
D
hierarchical algorithms, relocation algorithm, leader algorithm, etc.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
21
'
$
Dissimilarities The dissimilarity measure d is compatible with a considered equivalence ∼ if for each pair of units holds Xi ∼ Xj ⇔ d(Xi , Xj ) = 0 Not all dissimilarity measures typically used are compatible with structural equivalence. For example, the corrected Euclidean-like dissimilarity v u n X u 2 2 ((ris − rjs )2 + (rsi − rsj )2 ) d(Xi , Xj ) = u t(rii − rjj ) + (rij − rji ) + s=1 s6=i,j
is compatible with structural equivalence.
s
s
s
s
s
s
,
The indirect clustering approach does not seem suitable for establishing clusterings in terms of regular equivalence since there is no evident way how to construct a compatible (dis)similarity measure. & % y l y * 6
A. Ferligoj: Clustering relational data
22
'
$
Example: Support Network among Informatics Students The analyzed network consists of social support exchange relation among fifteen students of the Social Science Informatics fourth year class (2002/2003) at the Faculty of Social Sciences, University of Ljubljana. Interviews were conducted in October 2002. Support relation among students was identified by the following question: Introduction: You have done several exams since you are in the second class now. Students usually borrow studying material from their colleagues. Enumerate (list) the names of your colleagues that you have most often borrowed studying material from. (The number of listed persons is not limited.)
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
23
'
$
Class Network - Graph g12
g63
b96
class.net
b85 b02 g09 g42
g07 g22
g24
g28 g10 b03
Vertices represent students in the class; circles – girls, squares – boys. Reciprocated arcs are represented by edges.
b51
b89
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
24
'
$
Class Network – Matrix Pajek - shadow [0.00,1.00] b02 b03 g07 g09 g10 g12 g22 g24 g28 g42 b51 g63 b85 b89
b96
b89
b85
g63
b51
g42
g28
g24
g22
g12
g10
g09
g07
b03
b02
b96
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
25
'
$
Pajek - Ward [0.00,7.81]
Indirect Approach b51 b89 b02 b96 b03 b85 g10 g24 g09 g63 g12 g07 g28 g22
Using Corrected Euclidean-like dissimilarity and Ward clustering method we obtain the following dendrogram. From it we can determine the number of clusters: ‘Natural’ clusterings correspond to clear ‘jumps’ in the dendrogram. If we select 3 clusters we get the partition C.
g42
C = {{b51, b89, b02, b96, b03, b85, g10, g24}, {g09, g63, g12}, {g07, g28, g22, g42}} s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
26
'
$
Partition into Three Clusters (Indirect Approach) g12
g63
b96 b85 b02 g09 g42
g07 g22
g24
g28 g10
On the picture, vertices in the same cluster are of the same color.
b03
b51
b89
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
27
'
$
Matrix Pajek - shadow [0.00,1.00] b02 b03 g10 g24
C1
The partition can be used also to reorder rows and columns of the matrix representing the network. Clusters are divided using blue vertical and horizontal lines.
b51 b85 b89 b96 g07 g22
C2
g28 g42 g09 g12
C3
g63
C3 g12
g09
g42
g28
C2 g22
g07
b96
b89
b85
b51
g24
g10
b03
b02
C1
g63
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
28
'
$
Direct Approach The second possibility for solving the blockmodeling problem is to construct an appropriate criterion function directly and then use a local optimization algorithm to obtain a ‘good’ clustering solution. Criterion function P (C) has to be sensitive to considered equivalence: P (C) = 0 ⇔ C defines considered equivalence.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
29
'
$
Criterion Function One of the possible ways of constructing a criterion function that directly reflects the considered equivalence is to measure the fit of a clustering to an ideal one with perfect relations within each cluster and between clusters according to the considered equivalence. Given a clustering C = {C1 , C2 , . . . , Ck }, let B(Cu , Cv ) denote the set of all ideal blocks corresponding to block R(Cu , Cv ). Then the global error of clustering C can be expressed as X P (C) = min d(R(Cu , Cv ), B) Cu ,Cv ∈C
B∈B(Cu ,Cv )
where the term d(R(Cu , Cv ), B) measures the difference (error) between the block R(Cu , Cv ) and the ideal block B. d is constructed on the basis of characterizations of types of blocks. The function d has to be compatible with the selected type of equivalence. s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
30
'
$
Empirical blocks
a b c d e f g
a 0 1 1 1 1 1 0
b 1 0 1 1 1 1 1
c 1 1 0 1 1 1 1
Number of inconsistencies for each block
A B
A 0 1
B 1 2
d 0 0 0 0 0 0 0
e 1 0 0 0 0 1 0
Ideal blocks
f 0 0 0 0 0 0 0
g 0 0 0 0 0 1 0
a b c d e f g
a 0 1 1 1 1 1 1
b 1 0 1 1 1 1 1
c 1 1 0 1 1 1 1
d 0 0 0 0 0 0 0
e 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0
g 0 0 0 0 0 0 0
The value of the criterion function is the sum of all inconsistencies P = 4.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
31
'
$
Local Optimization For solving the blockmodeling problem we use the relocation algorithm: Determine the initial clustering C; repeat: if in the neighborhood of the current clustering C there exists a clustering C 0 such that P (C 0 ) < P (C) then move to clustering C 0 . The neighborhood in this local optimization procedure is determined by the following two transformations: • moving a unit Xk from cluster Cp to cluster Cq (transition); • interchanging units Xu and Xv from different clusters Cp and Cq (transposition).
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
32
'
$
Partition into Three Clusters: Direct Solution (Unique) g12
g63
b96
b85
b02 g09 g42
g07
g22
g24
g28 g10
This is the same partition and has the number of inconsistencies.
b03
b51
b89
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
33
'
$
Generalized Blockmodeling 1
1
1
1
1
1
0
0
1
1
1
1
0
1
0
1
1
1
1
1
0
0
1
0
1
1
1
1
1
0
0
0
0
0
0
0
0
1
1
1
0
0
0
0
1
0
1
1
0
0
0
0
1
1
0
1
0
0
0
0
1
1
1
0
C1
C2
C1
complete
regular
C2
null
complete
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
34
'
$
Generalized Equivalence / Block Types Y 1 1 1 1 X 1 1 1 1 1 1 1 1 1 1 1 1 complete Y 0 1 0 X 1 0 1 0 0 1 1 1 0 regular
0 X 0 0 0
Y 0 0 0 0 0 0 0 0 null
0 1 0 0
0 0 0 0
1 1 1 1
Y 0 1 0 0 X 1 1 1 1 0 0 0 0 0 0 0 1 row-dominant
0 0 1 0
Y 0 1 0 0 X 0 1 1 0 1 0 1 0 0 1 0 0 row-regular
0 0 0 0
Y 0 0 0 1 X 0 0 1 0 1 0 0 0 0 0 0 1 row-functional
0 1 0 0
Y 0 0 1 0 X 0 0 1 1 1 1 1 0 0 0 1 0 col-dominant
0 0 0 1
0 0 0 1
Y 0 1 0 1 X 1 0 1 0 1 1 0 1 0 0 0 0 col-regular
0 0 1 0
0 0 0 0
Y 1 0 0 0 0 1 0 0 X 0 0 1 0 0 0 0 0 0 0 0 1 col-functional
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
35
'
$
Pre-specified Blockmodeling In the previous slides the inductive approaches for establishing blockmodels for a set of social relations defined over a set of units were discussed. Some form of equivalence is specified and clusterings are sought that are consistent with a specified equivalence. Another view of blockmodeling is deductive in the sense of starting with a blockmodel that is specified in terms of substance prior to an analysis. In this case given a network, set of types of ideal blocks, and a reduced model, a solution (a clustering) can be determined which minimizes the criterion function.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
36
'
$
Pre-Specified Blockmodels The pre-specified blockmodeling starts with a blockmodel specified, in terms of substance, prior to an analysis. Given a network, a set of ideal blocks is selected, a family of reduced models is formulated, and partitions are established by minimizing the criterion function. The basic types of models are: *
*
*
*
0
0
*
0
0
*
0
0
*
*
0
0
*
0
*
0
0
?
*
*
0
0
*
center -
hierarchy
clustering
periphery
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
37
'
$
Prespecified Blockmodeling Example We expect that center-periphery model exists in the network: some students having good studying material, some not. Prespecified blockmodel: (com/complete, reg/regular, -/null block) 1
2
1
[com reg]
-
2
[com reg]
-
Using local optimization we get the partition: C = {{b02, b03, b51, b85, b89, b96, g09}, {g07, g10, g12, g22, g24, g28, g42, g63}}
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
38
'
$
2 Clusters Solution g12
g63
b96 b85 b02 g09 g42
g07 g22
g24
g28 g10 b03
b51
b89
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
39
'
$
Model Pajek - shadow [0.00,1.00] g07 g10 g12 g22
Image and Error Matrices:
g24 g28
1
2
1
reg
-
2
reg
-
g42 g63 b85 b02 b03 g09
1
2
1
0
3
2
0
2
b51
Total error = 5 center-periphery
b89
b96
b89
b51
g09
b03
b02
b85
g63
g42
g28
g24
g22
g12
g10
g07
b96
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
40
'
$
Blockmodeling of Multi-Way Network It is also possible to formulate a generalized blockmodeling problem where the network is defined by several sets of units and ties between them. Therefore, several partitions – for each set of units a partition has to be determined. The generalized blockmodeling approach was adapted for 2way networks, and only for structural equivalence and the indirect approach for 3-way networks.
Blockmodeling of Valued Networks Till now we were treating only binary networks. Another interesting problem is the development of generalized blockmodeling of valued ˇ networks. Ziberna (2007) proposed several approaches to generalized blockmodeling of valued networks, where values of the ties are assumed to be measured on at least interval scale.
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
41
'
$
Clustering with Relational Constraints To group given territorial units into regions such that units inside the region will be similar according to selected variables (attributes) and form contiguous part of the territory was the motivation to develop clustering with relational constraints approach (Ferligoj and Batagelj, 1982 and 1983). Departements and regions of France
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
42
'
$
. . . Clustering with Relational Constraint In the case of clustering with the relational constraint, the problem is to find clusterings as similar as possible according to attribute data and also considering the ties from a relation R. The constrained clustering problem can be expressed as clustering problem where the constraints are considered in the definition of the feasible clusterings. The clustering with constraints problem seeks to determine the clustering C∗ for which the criterion function P has the minimal value among all clusterings from the set of feasible clusterings C ∈ Φ, where Φ is determined by the constraints. Set of feasible clusterings for relational type of constraint can be defined as: Φ(R) = {C : C is a partition of U and each cluster C ∈ C is a subgraph (C , R ∩ C × C) in the graph (U, R) with the required type of connectedness} s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
43
'
$
. . . Clustering with Relational Constraints We can define different types of sets of feasible clusterings for the same relation R. Some examples of types of relational constraint Φ i (R) are type of clusterings
type of connectedness
Φ 1 (R)
weakly connected units
Φ 2 (R)
weakly connected units that contain at most one center
Φ 3 (R)
strongly connected units
Φ 4 (R)
clique
Φ 5 (R)
the existence of a trail containing all the units of the cluster
Trail – all arcs are distinct. A set of units L ⊆ C is a center of cluster C in the clustering of type Φ 2 (R) iff the subgraph induced by L is strongly connected and R(L)∩(C \L) = ∅. s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
44
'
$
Some Graphs of Different Types 3s 4s A A A A As s 1 2 a clique 4s 5s A A A A A A A A U A Us A s - s 1 2 3 weakly connected units
4s 5s A A A A A A A A AU s U A s s 1 2 3 strongly connected units 4s 5s A A A A AU s s s 1 2 3 weakly connected units with a center {1, 2, 4}
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
45
'
$
Solving Clustering with Relational Constraints Problem The agglomerative hierarchical and the relocation approach can be adapted for solving relational constrained clustering problems (Ferligoj and Batagelj, 1982 and 1983).
Clustering with Relational Constraints for Large Data Recently Batagelj, Ferligoj and Mrvar (2007, 2008) adapted the clustering with relational constraint approach for very large networks. It is available in program Pajek .
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
46
'
$
Example: US Counties US Census 2000: V1 – Area, V2 – Population, V47 – Percent of White, V125 – Educational attainment 1990, V126 – Household income; standardized Whatcom Toole
Liberty
Pend Oreille
Okanogan
Ferry
Burke
Sheridan
Glacier Skagit
Divide
Daniels
Boundary San Juan
Bottineau
Renville
Rolette
Cavalier
Pembina
Kittson
Roseau
Towner
Lake of the Woods
Hill
Lincoln
Stevens
Bonner
Blaine
Flathead
Valley
Pondera
Walsh
Ward
Williams
Roosevelt
Phillips
McHenry
Island
Clallam
Marshall Koochiching
Ramsey
Pierce
Mountrail
Benson
Pennington
Snohomish Nelson
Chelan Jefferson Douglas
Kitsap
Richland
Sanders
Kootenai
Spokane
Lincoln
Red Lake Polk
Lake
McKenzie
Cascade
Benewah
Lewis and Clark
Thurston
Petroleum
Golden Valley
Granite
Garfield Walla WallaColumbia Asotin
Benton
Wahkiakum
Custer
Broadwater
Multnomah Hood River
Washington
Park
Carter
Hyde
Lawrence
Boise
Teton
Brule
Aurora
Jackson
Custer
Mellette
Davison Hanson McCook
Fall River
Douglas Gregory Charles Mix
Todd
Bennett
Hutchinson Turner
Gooding
Converse
Sublette Power
Jerome
Boyd
Keya Paha
Caribou Sioux
Lincoln Cassia
Knox Cherry
Sheridan
Brown
Platte
Thomas
Carbon
Cache
Loup
Arthur
McPhersonLogan
Valley
Greeley
ShermanHoward
Keith Deuel
Platte
Polk Merrick
Shasta Salt Lake Tooele
Douglas Saunders Sarpy
Chase
Hayes
Dundy
Hitchcock Red WillowFurnas
Frontier Gosper Phelps KearneyAdams Clay
Pottawattamie Cass
Juab
Carbon
Churchill Storey
Sanpete
White Pine
Carson Lyon
Douglas
Sevier
Alpine
Yolo Napa
Sonoma
Emery
Millard
Grand
El Dorado
Sacramento
Mineral
Amador
Beaver Solano
Piute
Wayne
Calaveras Marin
Tuolumne
Mono
Contra Costa San Joaquin San Francisco Alameda
Garfield
Iron
Lincoln
Esmeralda Stanislaus
San Juan
Nye
Mariposa San Mateo
Washington
Madera Santa Clara
Kane
Merced
Santa Cruz
Fresno Inyo
San Benito
Monterey Tulare Kings Clark
Mohave
San Luis Obispo
Coconino
Kern
Navajo
Apache
San Bernardino Santa Barbara
Yavapai Ventura Los Angeles
Riverside
Gila La Paz
Orange
Maricopa Greenlee San Diego
Imperial
Graham
Pinal Yuma
Pima Cochise
Santa Cruz
Adair
Poweshiek Iowa
Barry
Eaton
Scott
Ontario Seneca Livingston Wyoming Yates
Clarke Lucas
Will Henry
Warren Elkhart St. Joseph La Porte Lake Porter
Bureau Putnam
La Salle Grundy Kankakee
Lagrange Steuben Fulton Lucas Williams
Ottawa
Noble De Kalb Wood SanduskyErie Henry Starke Marshall Defiance Kosciusko Whitley Seneca Huron Paulding Allen Putnam Hancock
Ashtabula
Knox Livingston
Merrimack Strafford
Bennington Rockingham Windham Cheshire Hillsborough
Rensselaer
Essex
Albany Schoharie Franklin
Chenango
Berkshire Hampshire Greene
Tioga Chemung
Broome
Wabash Van Wert Huntington Miami Allen Wells Adams Cass
Potter
Geauga Cuyahoga
Middlesex Worcester Suffolk
Columbia
Delaware
Tioga
Susquehanna
Bradford
Hampden
Forest
Sullivan Wyoming
Elk
Clarion
Sullivan Lycoming
Cameron
Mercer
Clinton Clearfield
Armstrong
Centre
Lackawanna Pike Luzerne
Jefferson
MahoningLawrence Butler Stark
Ulster
Wayne
Venango Trumbull
Portage MedinaSummit
Crawford Wyandot AshlandWayne Richland
Mc Kean
Crawford
Lorain
Jasper Pulaski Fulton Newton
Marshall DesHenderson Moines Warren
Saratoga
Montgomery Schenectady Otsego
Cortland
Steuben
St. Joseph Branch Hillsdale Lenawee Monroe Lake
Rock Island
Mercer
Monroe Wapello Jefferson Henry
Chautauqua Cattaraugus Allegany
WashtenawWayne
Madison
Norfolk
Erie Cass
Lee Kendall
Fulton Cayuga Onondaga
St. Clair
Oakland Macomb Ingham Livingston
Jackson
Cumberland
Belknap
Sullivan
Oneida
Wayne
Monroe
Tompkins Schuyler
Cook DuPage
Grafton
York Washington
Berrien Kane DeKalb
Whiteside
Johnson
MadisonWarren Marion MahaskaKeokuk Washington Louisa
Montgomery Adams Union
Fremont Page
Clear Creek
Yuma
Davis RinggoldDecatur Wayne Appanoose
Worth
Nodaway
Decatur
Norton
Phillips
Smith
Jewell
Republic
Brown Washington Marshall Nemaha
Van Buren
Putnam Schuyler ScotlandClark
Union
Columbia Montour Carbon
Orange
Sussex Monroe
Northumberland
Plymouth Providence Windham Tolland Bristol Dutchess LitchfieldHartford Kent Bristol Newport New London Washington Middlesex Putnam New Haven Dukes Fairfield
Barnstable
Nantucket
Rockland Westchester
Passaic Bergen
Bronx Warren Morris EssexNew York Nassau Northampton Hudson Schuylkill Queens Union Kings Lehigh Richmond Somerset Hunterdon
Suffolk
Peoria
Macon
Woodford
Tazewell
Schuyler
Lewis
Adams
Daviess
Linn Livingston
McDonough Fulton
Ford
McLean
Mason
Knox
Grundy
Doniphan
Hancock
Sullivan Adair
Richardson Holt
DeKalb
Cheyenne Rawlins
Mercer Harrison
Gentry
Pawnee
Andrew
Adams
Denver
Taylor
Lee
Atchison JohnsonNemaha
Washington
Yuba Nevada
Allegan
Ogle
Orange
Rutland Windsor
Warren
Herkimer
Orleans
Genesee Erie
Racine Walworth Kenosha
Rock
Calhoun Van BurenKalamazoo Carroll
Stark Mills
Fillmore Saline
Gage Harlan Franklin Webster NuckollsThayer Jefferson
Boulder
Gilpin
Sierra
Jasper
Otoe
Phillips
Weld
Grand
Garfield
Placer
Polk
Cass
Seward
Logan
Larimer
Morgan
Uintah
Hamilton York
Hall
Rio Blanco
Sutter
Lafayette Green
Knox Lincoln Sagadahoc
Hamilton
Lapeer Genesee Clinton Shiawassee
Jo Daviess Stephenson Winnebago Boone McHenryLake Jones
Benton Linn
Kennebec Androscoggin
Snyder Columbiana Iroquois White Beaver Hardin Indiana Benton Mifflin Carroll Carroll Marion Holmes Mercer Auglaize Morrow Grant Hancock Blair Cambria Juniata Blackford Howard Jay Tuscarawas Allegheny Middlesex Knox Huntingdon Perry Dauphin Tippecanoe Logan Jefferson LebanonBerks Westmoreland Shelby Bucks Union Delaware Warren Clinton Tipton Coshocton Harrison Brooke Mercer Monmouth Delaware Washington Montgomery VermilionFountain Madison Randolph Champaign Champaign Darke Cumberland Ohio Licking Boone Hamilton Montgomery Miami GuernseyBelmont Lancaster Chester Franklin Philadelphia Somerset Bedford York Franklin Muskingum Fulton Delaware Henry Fayette Clark Madison Adams Vermillion Marshall Greene Burlington Wayne Ocean Hancock Camden Parke Marion Noble Hendricks Fairfield Preble Montgomery Perry Monroe Morgan Greene Gloucester New Castle Edgar Buchanan Putnam Fayette Caldwell Scott Arapahoe Summit Pickaway Eagle Union Pike Rush Morgan Clinton Christian Moultrie Cecil Salem Allegany Monongalia Wetzel Fayette Morgan Washington Carroll Atchison Coles Jefferson Shelby Marion Hocking Harford Atlantic Monroe Ralls MorganJohnson Cloud Frederick Chariton Garrett Preston Baltimore Tyler Butler Mineral Berkeley Jackson Shelby Vigo Washington Randolph WarrenClinton Carroll Franklin Mitchell Clay Pleasants Cumberland Pike Athens Sherman Thomas Sheridan Graham Rooks Osborne Clay Douglas Pottawatomie Platte Taylor Ross Ray Clark Greene Owen Jefferson Hampshire Kit Carson Riley Decatur Baltimore Harrison Clay Cumberland Doddridge MacoupinMontgomery Frederick Howard Jefferson Lake Wood Ritchie Kent Elbert Vinton Audrain Highland BrownBartholomew Leavenworth Pitkin Monroe Saline Howard Hamilton Winchester Montgomery Cape May Calhoun Park Ottawa Barbour Tucker Dearborn Clarke Ripley Lafayette Wyandotte Kent Jersey Loudoun Sullivan Pike Grant Effingham Queen Anne Lincoln Lincoln Geary Shawnee Lincoln Wirt Jackson Greene Jackson Meigs Clermont Jasper Crawford Jennings Hardy Wabaunsee Fayette Anne Arundel OhioBoone CampbellBrown Boone Lewis Kenton MesaDelta Logan Wallace Gove Trego Ellis Russell Montgomery Gilmer Douglas District of Columbia Johnson Upshur Jackson Warren Teller Caroline Bond Falls Church Arlington Dickinson Cooper Shenandoah Fairfax Lawrence Fairfax Prince George Cheyenne Madison El Paso Adams Scioto Jackson Calhoun Alexandria Callaway Switzerland Gallia Saline Manassas Park Warren Talbot Clay Mason Randolph St. Charles Jefferson Gallatin Johnson Pettis Manassas Chaffee Roane Prince William Fauquier Richland Lawrence Daviess Braxton Sussex Gunnison Martin Ellsworth Morris Scott Rappahannock Bracken Pendleton Moniteau Carroll GrantPendleton Knox Osage St. Louis Marion Cass St. Louis Lawrence Page Mason Washington Trimble Clinton Franklin Miami Greenup Calvert Orange Cole Osage Owen Rockingham Rush Robertson Lewis Culpeper Webster Greeley Wichita Scott Lane Ness Fremont Barton Charles Putnam Henry Clay Dorchester Lyon Franklin Kiowa Morgan St. Clair Harrisonburg ClarkOldham Gasconade Wayne Edwards Harrison Madison Stafford Cabell Montrose Wabash Pike Wicomico McPherson Pocahontas Highland Henry Washington Dubois Gibson Marion Fleming Rice Carter Boyd Greene Nicholas Crowley Nicholas Chase Monroe Jefferson Scott Benton Kanawha Fredericksburg St. Mary King George Crawford Floyd Orange Bates Jefferson Coffey Anderson Franklin Miller Worcester Bourbon Linn Rowan HarrisonJeffersonShelby Spotsylvania Augusta Pawnee Staunton Maries Bath Elliott WayneLincoln Pueblo Westmoreland Ouray Somerset Saguache Hodgeman Perry Hamilton Custer White Lawrence Waynesboro Warrick Bath Perry Montgomery Stafford Harvey Spencer Woodford Fayette Charlottesville Camden St. Clair Caroline Randolph AndersonFayette HamiltonKearny Finney PoseyVanderburgh Meade Albemarle Boone Franklin Spencer Greenbrier Clark San Miguel Bullitt Richmond Bent Prowers Otero Reno CrawfordWashington Essex Louisa Hickory MenifeeMorgan Edwards Ste. Genevieve Greenwood Woodson Allen Bourbon Vernon Phelps Jessamine St. Francois Northumberland Johnson Fluvanna Logan Hancock Clifton Forge Mercer Powell Martin Pulaski Rockbridge Henderson San Juan Breckinridge Nelson Hinsdale Butler Covington AlleghanyLexington Washington Hanover Nelson Saline Gallatin Raleigh Daviess Mingo Wolfe Gray Cedar Accomack Perry Jackson Williamson Goochland Buena Vista Hardin Madison King William King andLancaster Queen Estill Sedgwick Ford Magoffin Dolores Union Mineral Dallas Laclede Huerfano Pratt Summers Middlesex Polk BoyleGarrard Rio Grande Dent Lee Wyoming Amherst Larue Marion Stanton Grant Haskell BuckinghamPowhatan Henrico Wilson Neosho Kiowa Kingman Alamosa Botetourt Monroe McLean Cumberland RichmondNew Kent Breathitt Floyd Crawford Barton Iron Madison Hardin Webster Craig Ohio Grayson Pike Union Johnson Elk Lincoln Dade Gloucester Mathews Jackson Mercer Las Animas Lynchburg Owsley Chesterfield Appomattox McDowell Cape Girardeau Charles’ Knott Rockcastle Amelia CrittendenHopkins Taylor Casey Bedford Northampton Pope Costilla Texas Reynolds MontezumaLa Plata Baca Bollinger Giles James’ Bedford Hart Green Buchanan Salem Hopewell Williamsburg Webster Roanoke Wright Roanoke Colonial HeightsYork Greene Sumner Meade Clark Cowley Barber Muhlenberg Campbell Prince Edward Perry Archuleta Petersburg Livingston PulaskiMassac Conejos Butler Edmonson Comanche Harper Montgomery Labette CherokeeJasper Morton Stevens Seward Montgomery Caldwell Shannon Prince George Poquoson Chautauqua Alexander Clay Nottoway Tazewell Bland Wayne Radford Adair Letcher Dickenson Pulaski Laurel Lawrence Leslie Surry Newport News McCracken Ballard Dinwiddie Pulaski Scott Hampton Lyon Franklin Wise Metcalfe Russell Warren CharlotteLunenburg Carter Christian Barren Douglas Norton Russell Floyd Wythe Sussex Marshall Newton Norfolk Christian Isle of Wight Stoddard Knox Carlisle Todd Logan Harlan Pittsylvania Ottawa Smyth Portsmouth Mississippi Trigg Halifax Nowata Wayne Harper Grant Kay Brunswick Southampton Allen Whitley Bell Howell Cumberland McCreary Washington Simpson Craig Cimarron Beaver Texas Carroll Lee Scott Virginia Beach Suffolk Chesapeake Alfalfa Monroe Clinton Graves Greensville Mecklenburg Emporia Barry Stone Washington South Boston Patrick Martinsville Henry Woods Oregon Ripley Butler Grayson Franklin Galax Taney Ozark Hickman McDonald Calloway Bristol Colfax New Madrid Danville Osage Fulton Macon Clay Pickett Taos Stewart Montgomery Sullivan San Juan Robertson Woodward Claiborne Hancock Alleghany Rio Arriba Sumner Johnson Gates Hawkins Delaware Ashe Surry Stokes Rockingham Hertford Trousdale Caswell Person Scott Campbell Garfield Noble Fulton Union Jackson Pawnee Camden Obion VanceWarren Northampton Clay OvertonFentress Rogers Major Pasquotank Dunklin Lake Carroll Boone Marion Currituck Randolph Carter Benton Union Mayes Weakley Henry Granville Dallam Sherman Hansford Ochiltree Lipscomb Houston Washington Halifax Smith Grainger Watauga Cheatham Baxter Perquimans Ellis Wilkes Pemiscot Hamblen Sharp DavidsonWilson Chowan Greene Yadkin Dickson Putnam Morgan Tulsa Forsyth Greene Unicoi Avery Anderson Payne Benton GuilfordAlamance Lawrence Izard Orange Durham Franklin Dyer Jefferson Madison Bertie Humphreys Mora Gibson Washington Mitchell Dewey Knox DeKalb Carroll Nash Wagoner Cumberland Cocke KingfisherLogan Caldwell White Newton Searcy Stone Edgecombe Alexander Davie Creek Cherokee Adair Yancey Williamson Madison Martin Washington Los Alamos Blaine Rutherford Craighead Roane Hutchinson Hartley Moore Roberts Hemphill Crockett Cannon Harding Hickman Sevier Wake Tyrrell Dare Iredell Burke Davidson Loudon Lauderdale McDowell Independence Wilson Sandoval Roger Mills Randolph Chatham Lincoln Mississippi Catawba Rowan WarrenVan Buren Blount Custer Perry Maury Jackson Buncombe Okmulgee Rhea Decatur HaywoodMadisonHenderson Pitt Van Buren Poinsett Bledsoe Beaufort Muskogee Crawford Johnson Haywood Johnston Cleburne Lewis Canadian Oklahoma Santa Fe Meigs Hyde Tipton BedfordCoffee Greene Swain Franklin Lincoln McKinley Marshall Lee Okfuskee Sequoyah San Miguel McMinnMonroe Pope Chester Oldham Carson Gray Potter Wheeler Rutherford Harnett Grundy Cleveland Wayne Sequatchie McIntosh Cabarrus Graham Henderson Stanly Montgomery Cross Polk GastonMecklenburg Washita Moore White Beckham Conway Moore Jackson Haskell Wayne Lawrence Logan Transylvania Giles Hamilton Lenoir Craven Caddo Shelby Fayette Hardeman Sebastian McNairyHardin Woodruff Lincoln Franklin Marion Quay Bradley Seminole ClevelandPottawatomie Pamlico Cherokee Macon Crittenden Polk Faulkner McClain Clay Cumberland Bernalillo Grady Yell Hughes Jones Cherokee Union AnsonRichmondHoke St. Francis York Deaf Smith Randall ArmstrongDonley Collingsworth Cibola Pittsburg Le Flore Perry SampsonDuplin Greer Towns Rabun Carteret Spartanburg Scott Alcorn Catoosa Latimer Lauderdale Kiowa Pickens DeSoto Greenville Scotland Union Dade Whitfield Tippah Fannin Benton Prairie Walker Murray Guadalupe Limestone Lee Harmon Lancaster Oconee Pulaski Lonoke Marshall Pontotoc Madison Jackson Tishomingo Colbert Onslow Gilmer Valencia Garvin Union Chester Monroe Tunica Tate White Torrance Saline Curry Comanche Habersham Robeson Bladen Prentiss Jackson Coal Chesterfield Garland Lumpkin Stephens Lawrence Marlboro Montgomery Parmer Castro Swisher Briscoe Hall Childress DeKalb Pender Anderson Laurens Gordon Morgan Stephens Murray Union Dawson Pickens Polk Franklin Chattooga Dillon Atoka Pushmataha DeBaca Phillips Tillman Franklin Fairfield Kershaw Panola Lafayette Hart Marshall Banks Floyd Hot Spring Hardeman Johnston Darlington Hall Newberry Quitman Lee Itawamba Carter Cotton Grant JeffersonArkansas Coahoma Abbeville Bartow Cherokee Cherokee Pontotoc New Hanover Columbus Forsyth Greenwood Lee Wilbarger Pike JacksonMadison Elbert Marion Winston Cullman Brunswick Roosevelt Bailey Lamb Motley Cottle Floyd Hale Jefferson McCurtain Clark Howard Florence Yalobusha Socorro Wichita Etowah Choctaw Blount Barrow Richland Polk Saluda Marshall Marion Foard Lincoln Dallas Sevier Gwinnett Clarke Calhoun Horry Tallahatchie Paulding Lexington Bryan Cobb Chickasaw Sumter Love Catron Cleveland Monroe Oglethorpe McCormick Fulton Oconee Bolivar Desha Clay WilkesLincoln Lamar DeKalb Walton Grenada Calhoun Haralson Walker Edgefield Hempstead Lincoln Little River Fayette Cleburne MontagueCooke Nevada St. Clair Douglas Grayson Calhoun Clarendon Lamar Clay Rockdale Morgan Red River Sunflower Carroll Knox Baylor Archer Fannin Cochran Hockley Lubbock Crosby Dickens King Webster Williamsburg Drew Ouachita Taliaferro Greene Newton Jefferson Calhoun Columbia Aiken Leflore Lowndes Clayton McDuffie Montgomery Bowie Henry Georgetown Carroll Orangeburg Bradley Warren Oktibbeha Fayette Talladega Richmond Delta Coweta Choctaw Miller ButtsJasperPutnam Tuscaloosa Shelby Chaves Randolph Clay Barnwell ChicotWashington Heard Spalding Pickens Hancock Lafayette Columbia Jack Bamberg Glascock Wise Denton Collin Ashley Union Titus Garza Kent StonewallHaskell Throckmorton Lynn Yoakum Terry Young Franklin Hopkins Berkeley Holmes Humphreys Hunt Morris Cass Winston Noxubee PikeLamar Attala Dorchester Baldwin Troup Meriwether Sierra Bibb Jefferson Burke Monroe Jones Washington Camp Allendale Coosa Chambers Rockwall Sharkey Chilton Rains Upson Colleton Greene Bibb Claiborne Union Tallapoosa Wilkinson WestEast Carroll Lea Marion Jenkins Parker Tarrant Dallas Wood Carroll Screven Hampton Yazoo Morehouse Charleston Leake NeshobaKemper Dawson Borden Scurry Fisher Hale Gaines Jones Shackelford Stephens Palo Pinto Harris Talbot Webster Issaquena Upshur Otero Crawford Twiggs Johnson Sumter Madison Bossier Caddo Lincoln Kaufman Van Zandt Beaufort Perry Elmore Lee Emanuel Harrison Taylor Peach Grant Gregg Autauga Ouachita Muscogee Houston Eddy Scott Newton Lauderdale BleckleyLaurens Dona Ana Macon Smith Richland Madison Treutlen Candler Bulloch Jasper Hood Dallas Chattahoochee Bienville Warren Marion Macon Effingham Johnson Jackson Hinds Rankin Martin Howard Mitchell Nolan Andrews Taylor Ellis Callahan Eastland Schley Russell Marengo Pulaski Montgomery Erath Somervell Luna Panola Lowndes Henderson Dodge Dooly MontgomeryEvans Franklin Rusk WheelerToombs Caldwell De Soto Bullock Stewart Webster Navarro Red River Hidalgo Sumter Tattnall Wilcox Chatham Claiborne Smith Jasper Clarke Choctaw Tensas Wilcox Telfair Hill Bryan Comanche Winn Crisp Simpson Bosque Coke BarbourQuitman Winkler Ector Midland Glasscock Copiah Liberty Pike Runnels Loving Sterling Jeff Davis AndersonCherokee TerrellLee Randolph Long Shelby Ben Hill Natchitoches Coleman Brown Butler Crenshaw Jefferson Hamilton Appling Freestone Turner El Paso Clay Jones Wayne Irwin Coffee Covington Catahoula Grant La Salle Clarke Jefferson Davis Worth Wayne Bacon McLennan Limestone Henry Nacogdoches Lincoln Lawrence CalhounDougherty Monroe McIntosh Sabine Mills Adams Franklin Conecuh Tift Ward Washington Coffee Dale Tom Green Coryell Reeves San Augustine Sabine Crane Upton Concordia Pierce Reagan Concho Culberson Hudspeth Leon Irion Atkinson Early Angelina Baker Covington Houston Mitchell Falls Berrien Glynn McCulloch San SabaLampasas MarionLamar Greene Rapides Brantley Wilkinson Amite Pike Walthall Cook Colquitt Forrest Perry Houston Miller Escambia Vernon Trinity Geneva Avoyelles Bell Robertson Ware Lanier Madison Camden Schleicher Menard Seminole DecaturGradyThomasBrooks Clinch George Holmes Washington Lowndes West Feliciana East Feliciana St. Helena Polk Tyler Milam Walker Jackson Stone Jeff Davis Pointe Coupee Evangeline Mobile Baldwin Pearl River Echols Burnet Mason Newton Charlton Pecos Llano Jasper Okaloosa Crockett Santa Rosa Allen Williamson Brazos Escambia Beauregard Tangipahoa Walton Washington San Jacinto St. Landry Nassau Gadsden Grimes Jackson Baton Rouge Burleson Sutton Kimble WestEast Baton Rouge Harrison Leon Madison Hamilton St. Tammany Livingston Hancock Calhoun Duval Jefferson Baker Montgomery Travis Gillespie Hardin Bay Lee Liberty Calcasieu St. Martin JeffersonAcadia Davis Iberville Blanco Washington Terrell Columbia Lafayette Ascension Liberty St. John the Baptist Suwannee Wakulla Orange Bastrop Orleans Hays Lafayette Kerr Union St. James Waller Taylor Edwards Presidio Vermilion Iberia BradfordClay St. Johns Kendall Gulf Fayette Jefferson Cameron Franklin Real Caldwell Assumption St. Charles Austin Harris Brewster Comal St. Bernard Val Verde Gilchrist St. Mary Bandera Jefferson Chambers Alachua Guadalupe Colorado Putnam Dixie Fort Bend Lafourche Plaquemines Flagler Gonzales Terrebonne Bexar Medina Galveston Uvalde Kinney Lavaca Wharton Levy Marion Brazoria Wilson DeWitt Volusia Jackson Atascosa Karnes Matagorda Zavala Frio Citrus Lake Victoria Seminole Goliad Maverick Sumter Hernando Orange Calhoun Dimmit Bee Live Oak La Salle McMullen Pasco Refugio Brevard Jackson
Routt
Duchesne
Wasatch Utah Eureka
Buffalo
Ionia
Ottawa
Clayton Grant
Bremer
Tama
Waldo
Caledonia
Lewis
Oswego
Sanilac
Saginaw Niagara
Clinton
Lancaster
Moffat
Washoe
Dawson
Perkins
Montcalm Gratiot
Kent
Black Hawk Buchanan Delaware Dubuque
Marshall
Colfax Dodge Washington
Butler
Lincoln Sedgwick
Daggett
Colusa
Franklin Butler
HamiltonHardin Grundy
Addison
Carroll
Huron
Tuscola Muskegon
Muscatine
Cheyenne
Summit
Glenn
Webster
Gladwin
Jefferson
Mecosta Isabella Midland Newaygo
Washington Ozaukee
JeffersonWaukesha Milwaukee
Jackson
Nance
Custer Kimball Morgan Davis
Pershing
Calhoun
Crawford Carroll Greene Boone Story
Harrison Shelby Audubon Guthrie Dallas
Laramie
Uinta
Weber Elko
Butte
Sac
Osceola Clare
Arenac
Bay Oceana
Dodge
Dane
Iowa
Cedar
Banner
Box Elder
Humboldt
Humboldt CherokeeBuena Vista Pocahontas Wright
Oxford
Chittenden
Marquette Green Lake Fond du Lac Sheboygan
Richland Sauk
Burt
Hancock
EssexCoos
Essex Mason Lake
Juneau Adams
Houston
Thurston
MadisonStantonCuming
Garfield Wheeler Boone
Garden
Rich
Siskiyou
Blaine
Morrill
Albany
Sweetwater
Modoc
St. Lawrence
Kewaunee
Manitowoc Winnebago Calumet
Winneshiek Allamakee Crawford Chickasaw Fayette
Plymouth
Dakota Woodbury Ida
Monona Hooker
Scotts Bluff Del Norte
Lake
Fillmore
Mitchell Howard
Hancock Cerro Gordo Floyd
Washington
Orleans
Lamoille
Franklin
Washington
Monroe La Crosse
Freeborn Mower
Winnebago Worth
Wayne
Goshen Grant
Mendocino
Faribault
Palo Alto
Dixon
Pierce Antelope
Box Butte
Franklin
OBrien’ Clay
Franklin Grand Isle Clinton
Alcona
Manistee Wexford Missaukee Roscommon Ogemaw Iosco
Cedar
Holt
Rock
Bear Lake
Oneida
Lander
Martin
Osceola DickinsonEmmet
Sioux
Kossuth
Union
Dawes
Bannock
Twin Falls
Plumas
Jackson
Lyon
Yankton Bon Homme Clay
Natrona Minidoka
Owyhee
Jackson
Tehama
Nobles
Waupaca Brown Outagamie
Waushara
Lincoln
Niobrara Lincoln
Josephine
Lassen
MinnehahaRock
Portage
Columbia Bingham Fremont
Trinity
Wood
Winona
Vernon Tripp
Shannon
Bonneville
Blaine
Lake
Pepin Jackson Buffalo Trempealeau Wabasha
Olmsted
Franklin
Alpena Otsego Montmorency
Grand Traverse KalkaskaCrawfordOscoda
Benzie
Goodhue Le SueurRice
Brown Nicollet
Penobscot
Presque Isle
LeelanauAntrim
Shawano
Blue Earth Cottonwood Waseca Steele Dodge Watonwan
Moody PipestoneMurray
Somerset Cheboygan
Charlevoix Door
Marathon Eau Claire Clark
Lyman
Weston
Washakie Hot Springs
Elmore
Malheur Harney
Humboldt
Lake
Emmet
Marinette
Oconto Menominee
Dunn
Pierce
Dakota
Redwood
Lincoln Lyon Brookings
Kingsbury
Sanborn Miner
Piscataquis
Langlade
Taylor Chippewa
St. Croix Washington Hennepin Ramsey
Scott
Camas Ada
Douglas Coos
Klamath
Jerauld
Delta
Rusk
Barron
Anoka
Wright
Mackinac
Florence Forest
Oneida
Lincoln
Jones
Jefferson Madison Teton
Butte
Buffalo
Sawyer
Menominee
Chisago Polk
Sherburne
Hand
Hughes
Washburn
McLeod Carver Renville Sibley
Beadle
Haakon
Pennington
Payette
Isanti
Stearns
Kandiyohi Meeker
Yellow Medicine
Campbell Johnson
Canyon
Pope
Lac qui Parle Chippewa
Deuel Hamlin
Stanley Park
Lane
Stevens
Grant Codington Clark
Gem
Burnett
Price
Swift
Fremont
Clark
Chippewa Schoolcraft Dickinson
Kanabec Mille Lacs
Big Stone
Day
Faulk Spink
Alger
Iron Vilas
Pine
Morrison
Douglas
Benton Roberts
Edmunds
Potter
Sully Meade
Aroostook
Luce
Marquette
Gogebic Ashland Iron
Todd
Brown Walworth
Dewey Ziebach
Crook
Washington
Deschutes
Otter Tail
Perkins
Butte
Custer
Baraga
Bayfield Douglas
Traverse
Marshall
Harding
Powder River
Sheridan Big Horn
Grant
Sargent
McPherson
Carbon
Yellowstone National
Curry
Campbell
Valley
Crook
Houghton Carlton
Aitkin
Wilkin Richland McIntosh
Beaverhead Lemhi
Linn
Dickey
Emmons
Grant
Big Horn Gallatin
Madison
Cass
Sioux
Corson
Adams Baker
Keweenaw
Becker
Clay
Crow Wing
Grant
Adams
Sweet Grass Stillwater
Union
Wheeler
Ransom
Rosebud
Yellowstone
Wallowa
Umatilla
Morrow
ShermanGilliam
Clackamas Marion
Jefferson
LaMoure
Silver Bow
Wasco
Yamhill
Polk
Benton
Hettinger
Fallon Bowman
Deer Lodge
Idaho
Tillamook
Cass
Ontonagon
Klickitat
Lincoln
Barnes
Wadena Logan
Treasure
Jefferson Ravalli Skamania
Clark
Stutsman
Stark
Slope
Musselshell
Nez Perce Lewis
Cowlitz Columbia
Kidder
Morton Wheatland
Lake
Mahnomen Hubbard
Burleigh
Meagher
Clearwater
St. Louis Itasca
Traill Norman
Billings Golden Valley
Prairie
Latah
Yakima Franklin
Clatsop
Wibaux
Powell Whitman
Lewis Pacific
Steele
Oliver
Judith Basin
Adams
Griggs
Mercer
Fergus
Missoula
Mineral Pierce
Clearwater Foster
Dunn Dawson
Kittitas
Grays Harbor
Wells
Sheridan McLean
Garfield Grant
Cook
Eddy
McCone
Shoshone King Mason
Beltrami
Grand Forks
Chouteau
Teton
Shelby
Marion
Brown Cass
Logan
De Witt
Piatt
Menard
Macon
Sangamon
Douglas
Aransas Polk
San Patricio
Osceola
PinellasHillsborough Nueces
Webb Duval
Indian River
Jim Wells Hardee Kleberg
Manatee
St. Lucie Highlands Okeechobee
Sarasota DeSoto Martin
Jim Hogg Brooks Zapata
Kenedy
Glades Charlotte
Hendry
Palm Beach
Lee
Starr Hidalgo
Willacy
Cameron
Collier
Broward
Dade Monroe
Pajek
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
47
'
$
Software Most of described clustering procedures are implemented in Pajek – program for analysis and visualization of large networks (Batagelj and Mrvar, 1998). It is freely available, for noncommercial use, at: http://pajek.imfm.si
s
s
l y
s
y
s
s
,
s
&
% * 6
A. Ferligoj: Clustering relational data
48
'
$
Conclusion The optimizational approach to the clustering problem can be applied to a variety of very interesting clustering problems, as it alows possible adaptations of a concrete clustering problem by an appropriate specification of the criterion function and by the definition of the set of feasible clusterings. Both the blockmodeling problem and the clustering with relational constraint problem are such cases. There are several possible further developments in blockmodeling, e.g., efficient direct approach for 3-way blockmodeling, blockmodeling for large networks, and dynamic blockmodels.
s
s
l y
s
y
s
s
,
s
&
% * 6