
Pattern Recognition, Vol. 27, No. 3, pp. 429-437, 1994. Elsevier Science Ltd. Copyright © 1994 Pattern Recognition Society. Printed in Great Britain. All rights reserved. 0031-3203/94 $6.00+.00


NERF c-MEANS: NON-EUCLIDEAN RELATIONAL FUZZY CLUSTERING

RICHARD J. HATHAWAY† and JAMES C. BEZDEK‡§

† Mathematics and Computer Science Department, Georgia Southern University, Statesboro, GA 30460, U.S.A.
‡ Department of Computer Science, University of West Florida, Pensacola, FL 32514, U.S.A.
§ Author to whom all correspondence should be addressed.

(Received 2 March 1993; in revised form 21 September 1993; received for publication 5 October 1993)

Abstract--The relational fuzzy c-means (RFCM) algorithm can be used to cluster a set of n objects described by pair-wise dissimilarity values if (and only if) there exist n points in R^{n-1} whose squared Euclidean distances precisely match the given dissimilarity data. This strong restriction on the dissimilarity data renders RFCM inapplicable to most relational clustering problems. This paper substantially improves RFCM by generalizing it to the case of arbitrary (symmetric) dissimilarity data. The generalization is obtained using a computationally efficient modification of the existing algorithm that is equivalent to applying a "spreading" transformation to the dissimilarity data. While the method given applies specifically to dissimilarity data, a simple transformation can be used to convert similarity relations into dissimilarity data, so the method is applicable to any numerical relational data that are positive, reflexive (or anti-reflexive) and symmetric. Numerical examples illustrate and compare the present approach to problems that can be studied with alternatives such as the linkage algorithms.

Cluster analysis   Dissimilarity measure   Non-Euclidean data   Relational data   Fuzzy sets   Multidimensional scaling   Relational fuzzy c-means

1. INTRODUCTION

The problem considered here is that of clustering a set of n objects O = {o_1, ..., o_n} into self-similar groups on the basis of numerical data. The data may be object data X = {x_1, ..., x_n} ⊂ R^s, with feature vector x_k corresponding to (and possibly generated by measurements on) object o_k ∈ O, k = 1, 2, ..., n. That is, x_k is an s-tuple of numerical features of the kth object o_k, such as height, weight, etc. The other form numerical data may take is an n × n relational data matrix R = [R_jk], in which case R_jk measures the relationship between o_j and o_k. R may be either a similarity or dissimilarity relation; in the sequel we use S for the former, and D for the latter type of relation. We assume throughout that the dissimilarity data matrix D satisfies the following three conditions:

    D_jj = 0       for j = 1, ..., n;                          (1a)
    D_jk ≥ 0       for k = 1, ..., n and j = 1, ..., n; and    (1b)
    D_jk = D_kj    for k = 1, ..., n and j = 1, ..., n.        (1c)

Of main concern here is cluster analysis using relational data, but some background discussion of the c-means clustering algorithms for numerical object data is necessary. We begin by describing convenient notational devices for representing hard (non-fuzzy, crisp) and fuzzy c-partitions of n objects. The set M_cn of all non-degenerate hard c-partitions of n objects is the set of all matrices U = [U_ik] ∈ R^{c×n} satisfying:

    U_ik ∈ {0, 1}              for 1 ≤ i ≤ c and 1 ≤ k ≤ n;    (2a)
    U_1k + ... + U_ck = 1      for 1 ≤ k ≤ n; and              (2b)
    U_i1 + ... + U_in > 0      for 1 ≤ i ≤ c.                  (2c)

U is seen to be a crisp c-partition by interpreting U_ik as the membership of o_k (or x_k) in hard subset i, 1 ≤ i ≤ c and 1 ≤ k ≤ n. The use of hard partitions forces either total (U_ik = 1) or no (U_ik = 0) association of object o_k with cluster i. For example, the hard partition

    U = | 1 0 0 0 1 |
        | 0 1 1 0 0 |
        | 0 0 0 1 0 |

represents the clustering of {o_1, ..., o_5} into the three (hard) clusters {o_1, o_5}, {o_2, o_3}, and {o_4}. Keeping constraints (2b) and (2c), but relaxing constraint (2a) to 0 ≤ U_ik ≤ 1, gives the larger set M_fcn of non-degenerate fuzzy c-partitions of the n objects.

It is an easy matter to convert object data into dissimilarity data, since we can define the jkth dissimilarity D_jk to be some chosen measure of distance on R^s × R^s from feature vector x_j to feature vector x_k. The reverse question, finding object data that correspond to a given set of dissimilarity data, is the difficult problem of multidimensional scaling.(1)
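A minimal sketch (our names, not the paper's) of the partition constraints just described; hard partitions satisfy (2a)-(2c), while fuzzy ones relax (2a) to 0 ≤ U_ik ≤ 1:

```python
import numpy as np

def is_fuzzy_partition(U):
    return (np.all((U >= 0) & (U <= 1))        # relaxed (2a)
            and np.allclose(U.sum(axis=0), 1)  # (2b): columns sum to 1
            and np.all(U.sum(axis=1) > 0))     # (2c): no empty cluster

def is_hard_partition(U):
    return is_fuzzy_partition(U) and np.all(np.isin(U, (0, 1)))  # (2a)

U = np.array([[1, 0, 0, 0, 1],
              [0, 1, 1, 0, 0],
              [0, 0, 0, 1, 0]])   # the example partition above
assert is_hard_partition(U)
```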

Fig. 2. Minimum realization dimension for D_β when D (≠ α(M - I)) satisfies conditions (1).

is 2. If the eigenvalues had been, for example, {0, -4, 1}, then D is not Euclidean, and no (Euclidean) realization exists in any dimension.) Several useful and easily verified results are gathered without proof in the following proposition linking the eigenstructures of PDP and PD_βP.

Proposition 1. Let D ∈ R^{n×n} satisfy (1), and let M, P, and D_β be the matrices from (7), (8), and (12). Then:

(a) PD_βP = P(D - βI)P;
(b) (1, 1, ..., 1)^T ∈ R^n is an eigenvector, with corresponding eigenvalue 0, for both PDP and PD_βP;
(c) w is an eigenvector of PDP if and only if it is an eigenvector of PD_βP;
(d) if w is an eigenvector of PDP and PD_βP other than a multiple of (1, 1, ..., 1)^T, then the respective corresponding eigenvalues λ and λ_β satisfy λ - β = λ_β.

The important consequence of this proposition is that adding β to the off-diagonal elements of a matrix D satisfying (1) effects a shift of -β in all the eigenvalues of PDP, except the zero eigenvalue corresponding to the eigenvector (1, 1, ..., 1)^T, which is left unchanged. Now, let a given non-Euclidean D satisfy (1) and let λ be the largest eigenvalue of PDP. We must have λ > 0 by (13), since D is non-Euclidean; and it therefore follows by (13) and Proposition 1 that D_β will be Euclidean for all choices of β ≥ λ. It is not difficult to show that Fig. 2 represents the general case for any D ∈ R^{n×n} satisfying (1) with D ≠ α(M - I) (i.e. not a scalar multiple of M - I). As shown in Fig. 2, there is a value β* for which D_{β*} is Euclidean and realizable by points {x_1, ..., x_n} ⊂ R^s, for some s satisfying 1 ≤ s ≤ n - 2. Furthermore, D_β is non-Euclidean for any β < β*, and is Euclidean for any β > β*, but realizable only for {x_1, ..., x_n} ⊂ R^s for s ≥ n - 1. In the example of Fig. 1, the cutoff value is β* = 0, where D = D_0 is realizable in R^1. For any choice of β > 0, the realization requires n - 1 = 2 dimensions, and for any β < 0, no realization exists and D_β is non-Euclidean. It is interesting to note that rows 2 and 3 of Table 1 correspond to cases where RFCM worked even though D_β was non-Euclidean.
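Proposition 1 and the cutoff β* are easy to probe numerically. The sketch below is our own illustration (NumPy assumed; names ours); following the discussion around (13), it treats D_β as Euclidean exactly when PD_βP has no positive eigenvalue:

```python
import numpy as np

def beta_star(D):
    """Cutoff shift: the largest eigenvalue of PDP (beta >= beta* makes
    D_beta Euclidean, per (13) and Proposition 1)."""
    n = D.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    return float(np.max(np.linalg.eigvalsh(P @ D @ P)))

# Demo of Proposition 1(d): shifting the off-diagonal of D by beta shifts
# every eigenvalue of PDP by -beta, except the 0 belonging to (1,...,1)^T.
rng = np.random.default_rng(1)
A = rng.random((4, 4))
D = A + A.T
np.fill_diagonal(D, 0.0)                           # D satisfies (1a)-(1c)
n = D.shape[0]
P = np.eye(n) - np.ones((n, n)) / n
beta = beta_star(D)
D_beta = D + beta * (np.ones((n, n)) - np.eye(n))  # D_beta = D + beta*(M - I)
print(np.linalg.eigvalsh(P @ D @ P))       # largest entry is beta*
print(np.linalg.eigvalsh(P @ D_beta @ P))  # all entries now <= 0: Euclidean
```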

3. THE NERF c-MEANS ALGORITHM

One approach to using (7) with RFCM is to simply compute (numerically) the largest non-negative eigenvalue λ (= β* in Fig. 2) of PDP, and to then cluster the Euclidean matrix D_β using RFCM. Instead of doing unnecessarily costly eigenvalue computations, we use an alternate approach that dynamically estimates, in a computationally efficient way, the β-spread needed to continue RFCM. The reason our approach is efficient is that it depends primarily on by-products of the original RFCM iteration. We call the new algorithm non-Euclidean relational fuzzy (NERF) c-means clustering. While this acronym does not emphasize the important duality between NERF and OFCM, it is just too tempting to resist! (Perhaps the most descriptive terminology is NERF c-means.) Below, e_k denotes the kth column of the identity matrix and ||·|| denotes the Euclidean norm. (A detailed rationale for the algorithmic choices is given immediately after the statement of NERF.)

The Non-Euclidean Relational Fuzzy (NERF) c-Means Algorithm

NERF-1. Given relational data D satisfying (1), fix c, 2 ≤ c < n, and m > 1, and initialize β = 0 and U^(0) ∈ M_fcn. Then for r = 0, 1, 2, ...:

NERF-2. Calculate the c mean vectors v_i = v_i^(r) using U = U^(r) and the equations, for 1 ≤ i ≤ c:

    v_i = (U_i1^m, U_i2^m, ..., U_in^m)^T / (U_i1^m + U_i2^m + ... + U_in^m).   (14)

NERF-3. Calculate

    d_ik = (D_β v_i)_k - (v_i^T D_β v_i)/2,  for 1 ≤ i ≤ c and 1 ≤ k ≤ n.   (15)

If d_ik < 0 for any i and k, then:

calculate

    Δβ = max{ -2 d_ik / ||v_i - e_k||^2 };   (16a)

update

    d_ik ← d_ik + (Δβ/2) ||v_i - e_k||^2,  for 1 ≤ i ≤ c and 1 ≤ k ≤ n;   (16b)

update

    β ← β + Δβ.   (16c)

NERF-4. Update U^(r) to U = U^(r+1) ∈ M_fcn to satisfy, for each k = 1, ..., n: if d_ik > 0 for all i, then

    U_ik = 1 / ( (d_ik/d_1k)^{1/(m-1)} + (d_ik/d_2k)^{1/(m-1)} + ... + (d_ik/d_ck)^{1/(m-1)} );   (17a)

otherwise:

    U_ik = 0 if d_ik > 0, with U_ik ∈ [0, 1] and U_1k + ... + U_ck = 1.   (17b)

NERF-5. Check for convergence using any convenient matrix norm || ||: if ||U^(r+1) - U^(r)|| ≤ ε, then stop; otherwise, set r = r + 1 and return to NERF-2.

Comparing NERF to RFCM, we see that they are identical except for the modification in (16), which is active whenever some negative value of d_ik is encountered. We now look closely at this modification. The duality theory(8) asserts that the d_ik values correspond to certain squared Euclidean distances (between object data x_k and object prototypes v_i) if an object-data realization of D_β exists. It follows that a negative value of d_ik signals the non-existence of a realization of D_β, which indicates that the current value of β should be incremented by some Δβ > 0 so that the basic (RFCM) iteration can be continued for the new shift value β + Δβ. We claim that the definition of the increment Δβ in (16a) is reasonable in that it provides a meaningful lower bound on the minimum increment Δβ needed to make the new D_β Euclidean. To see this, we rewrite the formula for d_ik in (15) as

    d_ik = -(v_i - e_k)^T D_β (v_i - e_k)/2,   (18)

so that (16a) becomes

    Δβ = max{ (v_i - e_k)^T D_β (v_i - e_k) / ||v_i - e_k||^2 }.   (19)

Each vector (v_i - e_k) is orthogonal to (1, ..., 1)^T, so that P(v_i - e_k) = (v_i - e_k), where P is the projector in (12), and the definition of Δβ in (19) is therefore equivalent to

    Δβ = max{ ((v_i - e_k)^T PD_βP (v_i - e_k)) / ((v_i - e_k)^T (v_i - e_k)) }.   (20)
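As a numerical aid (names ours; NumPy assumed), (19)/(20) can be computed directly; since each vector v_i - e_k sums to zero, each term below is a Rayleigh quotient of PD_βP, a point the next paragraph exploits:

```python
import numpy as np

def delta_beta(D_beta, V):
    """Eq. (19)/(20): max over i, k of the quotient at w = v_i - e_k.
    Each w sums to zero, so Pw = w and w'D_beta w / w'w equals the
    Rayleigh quotient of P D_beta P at w."""
    c, n = V.shape
    I = np.eye(n)
    return max((V[i] - I[k]) @ D_beta @ (V[i] - I[k])
               / ((V[i] - I[k]) @ (V[i] - I[k]))
               for i in range(c) for k in range(n))
```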

In (20), we see that Δβ is the maximum of cn Rayleigh quotients involving PD_βP. The eigenvalue approximation properties of Rayleigh quotients are known to be good (e.g. Stewart(12)), and Δβ defined as the maximum of the Rayleigh quotients will provide a useful underestimate of the largest eigenvalue of PD_βP. Underestimation is important, as we want to avoid excessive β-spreading, which could adversely affect both the computational complexity of the algorithm and the interpretability of clustering outputs. While this discussion provides some justification for (16a), it is still necessary to verify that the updated d_ik values in (16b) are non-negative and correspond to the d_ik values for the newly updated β in (16c). The updated d_ik values are non-negative if and only if the right-hand side of (16b) satisfies

    d_ik + (Δβ/2) ||v_i - e_k||^2 ≥ 0,  for 1 ≤ i ≤ c and 1 ≤ k ≤ n.   (21)

Solving for Δβ in (21) gives the equivalent condition

    Δβ ≥ -2 d_ik / ||v_i - e_k||^2,  for 1 ≤ i ≤ c and 1 ≤ k ≤ n,   (22)

and so the non-negativity of the updated d_ik values follows from (16a) and (22). Finally, we verify that the updated d_ik in (16b) are consistent with (15) for the updated shift β + Δβ in (16c). Letting d_ik(γ) denote the d_ik in (15) corresponding to D_γ, we must show that

    d_ik(β + Δβ) = d_ik(β) + (Δβ/2) ||v_i - e_k||^2.   (23)

Starting with the left side,

    d_ik(β + Δβ) = -(v_i - e_k)^T D_{β+Δβ} (v_i - e_k)/2                                  (24)
                 = -(v_i - e_k)^T (D_β + Δβ(M - I)) (v_i - e_k)/2                          (25)
                 = -(v_i - e_k)^T D_β (v_i - e_k)/2 - Δβ (v_i - e_k)^T (M - I)(v_i - e_k)/2  (26)
                 = d_ik(β) + (Δβ/2) ||v_i - e_k||^2,                                       (27)

since M(v_i - e_k) = (0, ..., 0)^T ∈ R^n. In summary, modification (16) of the original RFCM algorithm calculates a reasonable (under-)estimate of the minimum shift required to transform the current D_β into a Euclidean matrix, and then implements this shift by updating the current d_ik values and the value of β. There are two points concerning computational efficiency that should be made. First of all, the quantities used to determine the shift are the original d_ik values and the values of ||v_i - e_k||^2. Since these are exactly the quantities needed to perform the updating of the d_ik, there is no wasted computation done in determining the new increment Δβ. Secondly, whenever an increment in the shift is not needed, which in our experience has been the large majority of iterations, the work requirements for that particular iteration of NERF are no greater than the work requirements for an RFCM iteration, except for the additional d_ik negativity checks, which are negligible in cost.
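To make the statement of NERF-1 through NERF-5 concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' MATLAB program: the function name is ours, the equal-membership tie rule used for (17b) is one common convention, and the small guard on ||v_i - e_k||^2 is ours.

```python
import numpy as np

def nerf_c_means(D, c, m=2.0, eps=1e-4, max_iter=100, U0=None, seed=0):
    """Cluster an n x n dissimilarity matrix D satisfying (1); returns the
    terminal fuzzy c-partition U, the accumulated beta-spread, and the
    number of iterations used."""
    n = D.shape[0]
    if U0 is None:
        U = np.random.default_rng(seed).random((c, n))
        U /= U.sum(axis=0)                     # columns sum to 1, per (2b)
    else:
        U = np.asarray(U0, dtype=float).copy()
    beta = 0.0
    Dbeta = np.asarray(D, dtype=float).copy()  # D_beta = D + beta*(M - I)
    E = np.eye(n)
    for r in range(max_iter):
        # NERF-2, eq. (14): row i of V is the mean vector v_i.
        Um = U ** m
        V = Um / Um.sum(axis=1, keepdims=True)
        # NERF-3, eq. (15): d_ik = (D_beta v_i)_k - (v_i' D_beta v_i)/2.
        DV = V @ Dbeta                         # D symmetric: row i is (D_beta v_i)'
        d = DV - 0.5 * np.sum(V * DV, axis=1, keepdims=True)
        if (d < 0).any():
            # ||v_i - e_k||^2 for all i, k (guarded against v_i == e_k).
            sq = np.array([[max(np.sum((V[i] - E[k]) ** 2), 1e-12)
                            for k in range(n)] for i in range(c)])
            dbeta = np.max(-2.0 * d / sq)      # eq. (16a)
            d += 0.5 * dbeta * sq              # eq. (16b)
            beta += dbeta                      # eq. (16c)
            Dbeta += dbeta * (1.0 - E)         # keep D_beta current
        # NERF-4, eq. (17): FCM-style membership update.
        Unew = np.zeros_like(U)
        for k in range(n):
            dk = d[:, k]
            if (dk > 0).all():                 # eq. (17a)
                Unew[:, k] = 1.0 / np.sum((dk[:, None] / dk[None, :])
                                          ** (1.0 / (m - 1.0)), axis=1)
            else:                              # eq. (17b); equal split is
                zero = dk <= 0                 # one common convention
                Unew[zero, k] = 1.0 / zero.sum()
        # NERF-5: convergence in a convenient matrix norm (max-abs here).
        if np.max(np.abs(Unew - U)) <= eps:
            return Unew, beta, r + 1
        U = Unew
    return U, beta, max_iter
```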

4. THE NUMERICAL TESTS

All test results were obtained using a MATLAB program written by and available from the first named author. In the examples reported below we used m = 2 (in (14) and (17a)) and a stopping criterion value of ε = 0.0001. As a first example, we use NERF to finish the calculations for the top row of Table 1, where RFCM failed at β = -1. The first row in Table 1 corresponds to subtracting one from each off-diagonal element of the Euclidean matrix D in (9) and clustering the matrix

    D_{-1} = | 0 8 9 |
             | 8 0 0 |
             | 9 0 0 |.   (28)
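As a quick usage sketch (hypothetical; it relies on the nerf_c_means sketch given at the end of Section 3 and on the matrix in (28)):

```python
import numpy as np

# D_{-1} from (28); x2 and x3 have dissimilarity 0 and should be grouped.
D_minus1 = np.array([[0., 8., 9.],
                     [8., 0., 0.],
                     [9., 0., 0.]])
# Hard initialization with cluster 1 = {x1}:
U0 = np.array([[1., 0., 0.],
               [0., 1., 1.]])
U, beta, iters = nerf_c_means(D_minus1, c=2, m=2.0, U0=U0)
# Expect x2 and x3 together in one cluster, with a terminal beta below
# 1.0, the exact shift that restores (28) to Euclidean.
```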

The results for the three possible (hard) initial partitions are given in Table 2. Note that x_2 and x_3 are grouped together in all three cases, as we would expect, but that the terminal membership values vary slightly for different initializations. The variation is explained by noting that the final shift values of β were different for each of the initializations. This is not surprising, since a different initialization gives a different set of Rayleigh quotients in (16a). The shift required to restore (28) to Euclidean is exactly 1.0, and the shift values obtained for the three initializations are all less than this, as seen in the right column of Table 2. In all cases, the final β is essentially the minimum adjustment needed to insure all generated partitions U ∈ M_fcn.

Our second example begins with the artificial object data set X from reference (10) that is plotted in Fig. 3 and listed in Table 3. One might expect a good (fuzzy) clustering of X to identify: x_1-x_5 as one cluster; x_7-x_11 as another cluster; and x_6 as an in-between or bridge point. We applied NERF to this problem (with c = 2) using three dissimilarity relation matrices derived from X: namely, (1) squared Euclidean distances, (L_2)^2; (2) squared L_1 norm distances, (L_1)^2; and (3) squared L_1 norm distances with an off-diagonal shift of 48.0, (L_1)^2 + 48. (Recall that the L_1 norm measures the size of a vector as the sum of the absolute magnitudes of its components.)

Table 2. NERF results for D_{-1} given at (28)

    Initial cluster 1 values      Terminal cluster 1 values     # of     Final
    U1(x1)  U1(x2)  U1(x3)        U1(x1)  U1(x2)  U1(x3)        iter.    β
    1.0000  1.0000  0.0000        1.0000  0.0014  0.0012        4        0.4595
    1.0000  0.0000  0.0000        1.0000  0.0000  0.0000        1        0.0000
    1.0000  0.0000  1.0000        1.0000  0.0024  0.0019        4        0.7662

Table 3. Coordinates of the object data X

    Point            1     2     3     4     5     6     7     8     9     10    11
    x coordinate   -5.0  -3.0  -3.0  -3.0  -1.0   0.0   1.0   3.0   3.0   3.0   5.0
    y coordinate    0.0   2.0   0.0  -2.0   0.0   0.0   0.0   2.0   0.0  -2.0   0.0
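A sketch (NumPy assumed; names ours) building the three dissimilarity matrices of this example from the Table 3 coordinates and computing the eigenvalues of PDP, i.e. the quantities reported in Table 4 below. The eigenvalues printed for the first matrix should match column 1 of Table 4 (nine zeros, -32, -212):

```python
import numpy as np

X = np.array([[-5., 0.], [-3., 2.], [-3., 0.], [-3., -2.], [-1., 0.],
              [0., 0.],
              [1., 0.], [3., 2.], [3., 0.], [3., -2.], [5., 0.]])
n = len(X)
diff = X[:, None, :] - X[None, :, :]
D_l2sq = np.sum(diff ** 2, axis=2)                        # (L2)^2
D_l1sq = np.sum(np.abs(diff), axis=2) ** 2                # (L1)^2
D_l1sq48 = D_l1sq + 48.0 * (np.ones((n, n)) - np.eye(n))  # (L1)^2 + 48

P = np.eye(n) - np.ones((n, n)) / n
for name, D in (("(L2)^2", D_l2sq), ("(L1)^2", D_l1sq),
                ("(L1)^2+48", D_l1sq48)):
    print(name, np.round(np.sort(np.linalg.eigvalsh(P @ D @ P))[::-1], 4))
```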

Fig. 3. Plot of the object data X in Table 3.

The third choice seems odd at first, but is partly motivated by the eigenvalues of PDP, all three sets of which are displayed in Table 4. Recalling our discussion in connection with equation (13), we note that the pair of negative eigenvalues of PDP when D is the squared Euclidean distance matrix (shown as column 1 in Table 4) implies that this D admits a two-dimensional object-data realization. Table 4 also indicates that using the L_1 norm gives a non-Euclidean D (no surprise), as indicated by three positive eigenvalues, the largest of which is 48.0000. Apparently the L_1 dissimilarity matrix can be made Euclidean using a β-spread with β ≥ 48.0000. Using the minimum value (β = 48.0000) gives a Euclidean D which admits a nine-dimensional object-data realization.

Table 4. Eigenvalues of PDP for the three dissimilarity matrices derived from X of Table 3

      (L_2)^2      (L_1)^2    (L_1)^2 + 48
       0.0000      48.0000        0.0000
       0.0000      22.1124        0.0000
       0.0000       4.9390      -25.8876
       0.0000       0.0000      -43.0610
       0.0000       0.0000      -48.0000
       0.0000       0.0000      -48.0000
       0.0000       0.0000      -48.0000
       0.0000      -3.3221      -51.3221
       0.0000     -47.1208      -95.1208
     -32.0000     -80.0000     -128.0000
    -212.0000    -278.7903     -326.7903

The initial partition used in the examples concerned with this data in reference (10), and also here, was

    U^(0) = | 0.75 0.75 0.75 0.75 0.75 0.25 0.25 0.25 0.25 0.25 0.25 |
            | 0.25 0.25 0.25 0.25 0.25 0.75 0.75 0.75 0.75 0.75 0.75 |

The results of our tests are given in Table 5. Note that RFCM (without some modification) will fail during iteration using the (L_1)^2 distances. The (L_1)^2 + 48 data requires two fewer iterations, but typically a positive off-diagonal shift increases the number of iterations. The (L_1)^2 data produces membership values very close to those obtained using the (L_2)^2 data, while the shifted (L_1)^2 + 48 data (corresponding to Euclidean distances for some object data set in R^9) is a good deal fuzzier, as expected. The shift needed for the (L_1)^2 data was only 3.56, much less than the 48 required to have actual Euclidean dissimilarities. Note that bridge point x_6 receives equal memberships in both clusters in all three cases; this is both expected and desirable.

Our last example uses a small set of real data from Gowda and Diday(13), obtained by applying a similarity measure to 5-dimensional object data originally from Ichino(14), which has four quantitative and one nominal qualitative feature values for each of eight different types of oil. The similarity data matrix S taken from reference (13) is given in Table 6. We generate dissimilarity matrices D from the similarity matrix S in two simple ways. Since the diagonal of S is not specified, we first set D_ii = 0 for 1 ≤ i ≤ 8.

Table 5. Terminal NERF membership values for the points in Fig. 3

    Off-diag.      # of     Final    Cluster 1 membership values for:
    D_jk           iter.    β        x1    x2    x3    x4    x5    x6    x7    x8    x9    x10   x11
    (L_2)^2        9        0        0.93  0.91  1.00  0.91  0.81  0.50  0.19  0.09  0.00  0.09  0.07
    (L_1)^2                 3.56     0.90  0.89  1.00  0.89  0.76  0.50  0.24  0.11  0.00  0.11  0.10
    (L_1)^2 + 48            0        0.75  0.73  0.77  0.73  0.62  0.50  0.38  0.27  0.23  0.27  0.25

Table 6. Similarity matrix S for fat-oil data after Gowda and Diday(13)

    Oil type           o1    o2    o3    o4    o5    o6    o7    o8
    o1: Linseed oil     -   4.98  3.66  3.77  3.84  3.24  0.86  1.22
    o2: Perilla oil   4.98    -   5.70  5.88  4.70  5.30  2.78  3.08
    o3: Cotton-seed   3.66  5.70    -   7.00  6.25  6.68  4.11  4.44
    o4: Sesame oil    3.77  5.88  7.00    -   5.90  6.37  3.61  3.97
    o5: Camelia       3.84  4.70  6.25  5.90    -   6.24  3.48  3.89
    o6: Olive oil     3.24  5.30  6.68  6.37  6.24    -   4.28  4.68
    o7: Beef-tallow   0.86  2.78  4.11  3.61  3.48  4.28    -   6.74
    o8: Lard          1.22  3.08  4.44  3.97  3.89  4.68  6.74    -

Table 7. NERF results for the oils data of Gowda and Diday(13)

    Off-diagonal D_ij          # of     Final    Cluster 1 membership values for:
                               iter.    β        o1     o2     o3     o4     o5     o6     o7     o8
    (1/S_ij) - min{1/S_rt}     22       0.090    0.888  0.811  0.631  0.696  0.663  0.539  0.087  0.096
    max{S_rt} - S_ij           16       0.000    0.704  0.818  0.935  0.924  0.816  0.834  0.036  0.028

The two transformations are

    D_ij = (1/S_ij) - min_{r≠t} {1/S_rt}   for i ≠ j;   (29)

and

    D_ij = max_{r≠t} {S_rt} - S_ij   for i ≠ j.   (30)

Equations (29) and (30) both yield an off-diagonal value of 0, so that each is, in a sense, on the boundary of the set of dissimilarity matrices satisfying the original conditions at (1). One expected advantage of this fact is that the β-spread is minimal, so cluster structure should be more easily found by NERF, which can always spread the data more, if necessary. The hard clustering selected as best by the method in reference (13) is represented by {{o1, o2, o3, o4, o5, o6}, {o7, o8}}, which is consistent with the single-linkage dendrogram at c = 2 given in reference (14). The initial (hard) 2-partition used in our tests on this data corresponded to the clustering {{o1, o3, o5, o7}, {o2, o4, o6, o8}}. Table 7 exhibits the terminal memberships found by NERF on the dissimilarity matrices produced by (29) and (30). If we convert the final partition obtained in each case to a hard one, based on the rule of maximum membership, then our results agree exactly with those of both Gowda and Diday(13) and Ichino.(14) It is interesting to see how the first dissimilarity measure sees o6 as a much more ambiguous oil than does the second measure.
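A sketch of the conversions (29) and (30) applied to the S of Table 6 (NumPy assumed; array names ours). Both produce a zero off-diagonal minimum, matching the boundary remark above:

```python
import numpy as np

S = np.array([
    [0.00, 4.98, 3.66, 3.77, 3.84, 3.24, 0.86, 1.22],
    [4.98, 0.00, 5.70, 5.88, 4.70, 5.30, 2.78, 3.08],
    [3.66, 5.70, 0.00, 7.00, 6.25, 6.68, 4.11, 4.44],
    [3.77, 5.88, 7.00, 0.00, 5.90, 6.37, 3.61, 3.97],
    [3.84, 4.70, 6.25, 5.90, 0.00, 6.24, 3.48, 3.89],
    [3.24, 5.30, 6.68, 6.37, 6.24, 0.00, 4.28, 4.68],
    [0.86, 2.78, 4.11, 3.61, 3.48, 4.28, 0.00, 6.74],
    [1.22, 3.08, 4.44, 3.97, 3.89, 4.68, 6.74, 0.00]])
off = ~np.eye(8, dtype=bool)          # diagonal of S is unspecified

# Eq. (29): D_ij = 1/S_ij - min over r != t of 1/S_rt, for i != j.
D29 = np.zeros_like(S)
D29[off] = 1.0 / S[off] - np.min(1.0 / S[off])

# Eq. (30): D_ij = max over r != t of S_rt, minus S_ij, for i != j.
D30 = np.zeros_like(S)
D30[off] = np.max(S[off]) - S[off]
```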

5. DISCUSSION

The original RFCM algorithm had a serious defect in that it usually failed for non-Euclidean data; and most dissimilarity data are non-Euclidean. The NERF c-means algorithm introduced here shares all the good properties of RFCM while not suffering from this defect. In fact, if negative d_ik values are not encountered, then NERF is RFCM; and when they are encountered, a simple modification adjusts the "spread" of (implicit realizations of) the data just enough to keep the iteration sequence {U^(r)} in M_fcn. Furthermore, the NERF modification has a reasonable geometric interpretation, and it can be implemented in a computationally efficient way. The examples discussed here and elsewhere generally demonstrate that a wide range of β-spread values can produce quite similar clustering results, even though the dimensions of the corresponding realizations are quite different. The issue of dimension is at the heart of multidimensional scaling, but these techniques depend somewhat robustly on the values of the pair-wise dissimilarities. Examples like that associated with Table 1 indicate that greatly increased β-spreads result in an increase in the computational work needed to find a good terminal partition. For this reason, one might be tempted to routinely "pre-shift" a dissimilarity matrix D by the negative of its smallest off-diagonal element, producing a new "low-spread" dissimilarity matrix on the boundary of the set of matrices satisfying (1). In general, this probably will help the convergence properties of iteration sequences and should be examined further. One also wonders if pre-shifting as described gives cluster membership values that are more interpretable in an absolute sense than those obtained by RFCM.

Several interesting problems remain for future work. The first concerns imbedding NERF in a scheme that also determines and/or validates the number of clusters c. Secondly, a careful convergence analysis of the NERF clustering scheme should be done. Finally, the numerical examples discussed in this paper were chosen to be small and simple enough so that some basic conclusions about the approach could be drawn. Our results seem to warrant a comparison of NERF with other relational clustering methods, such as the linkage family and Windham's AP algorithm, on larger data sets; we hope to do this in the near future.

Acknowledgments--This research was supported by the Faculty Research Committee of Georgia Southern University (RH) and NSF grant IRI-9003252 (JB).

REFERENCES

1. M. L. Davison, Multidimensional Scaling. Wiley, New York (1983).


2. E. Ruspini, A new approach to clustering, Inf. Control 15, 22-32 (1969).
3. J. C. Bezdek and S. K. Pal, Fuzzy Models for Pattern Recognition. IEEE Press, Piscataway, New Jersey (1992).
4. G. Ball and D. Hall, A clustering technique for summarizing multivariate data, Behavioral Sci. 12, 153-155 (1967).
5. J. C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact, well separated clusters, J. Cybern. 3, 32-57 (1974).
6. J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York (1981).
7. R. J. Hathaway and J. C. Bezdek, Grouped coordinate minimization using Newton's method for inexact minimization in one vector coordinate, J. Optimiz. Theory Applic. 71, 503-516 (1991).
8. R. J. Hathaway, J. W. Davenport and J. C. Bezdek, Relational duals of the c-means algorithms, Pattern Recognition 22, 205-212 (1989).
9. M. P. Windham, Numerical classification of proximity data with assignment measures, J. Classification 2, 157-172 (1985).
10. J. C. Bezdek, R. J. Hathaway and M. P. Windham, Numerical comparison of the RFCM and AP algorithms for clustering relational data, Pattern Recognition 24, 783-791 (1991).
11. K. V. Mardia, J. T. Kent and J. M. Bibby, Multivariate Analysis. Academic Press, New York (1979).
12. G. W. Stewart, Introduction to Matrix Computations. Academic Press, New York (1973).
13. K. C. Gowda and E. Diday, Symbolic clustering using a new similarity measure, IEEE Trans. Syst. Man Cybern. 22, 368-378 (1992).
14. M. Ichino, General metrics for mixed features: the Cartesian space theory for pattern recognition, Proc. IEEE 1988 Int. Conf. on Systems, Man, and Cybernetics (1988).

About the Author--RICK HATHAWAY received the B.S. degree in applied mathematics from the University

of Georgia in 1979, and the Ph.D. degree in mathematical sciences from Rice University in 1983. He is an Associate Professor in the Mathematics and Computer Science Department of Georgia Southern University. His research interests include pattern recognition, statistical computing, and non-linear optimization.

About the Author--JIM BEZDEK received the B.S. degree in civil engineering from the University of Nevada (Reno) in 1969, and the Ph.D. in applied mathematics from Cornell University in 1973. He is a Professor in the Computer Science Department at the University of West Florida. His research interests include pattern recognition, image processing, computational neural networks, and medical applications. Jim is an IEEE fellow.