0031-3203/86 $3.00 + .00 Pergamon Journals Ltd. Pattern Recognition Society
Pattern Recognition, Vol. 19, No. 6, pp. 477-480, 1986. Printed in Great Britain.
LOCAL CONVERGENCE OF THE FUZZY C-MEANS ALGORITHMS*

RICHARD J. HATHAWAY
Department of Statistics, University of South Carolina, Columbia, SC 29208, U.S.A.

and

JAMES C. BEZDEK†
Department of Computer Science, University of South Carolina, Columbia, SC 29208, U.S.A.

(Received 10 September 1985; received for publication 5 December 1985)

Abstract: Much understanding has recently been gained concerning global convergence properties of the fuzzy c-Means (FCM) family of clustering algorithms. These global convergence properties, which hold for all iteration sequences, guarantee that every FCM iteration sequence converges, at least along a subsequence, to a stationary point of an FCM objective function. In this paper we prove a local convergence property, that is, a property pertaining to iteration sequences started near a solution. Specifically, a simple result is proved which shows that whenever an FCM algorithm is started sufficiently near a minimizer of the corresponding objective function, then the iteration sequence must converge to that particular minimizer. The result guarantees that once captured by the local neighborhood of a minimizer, the succeeding iterate sequence will not escape; thus, infinite oscillation of such a sequence cannot occur. The rate of convergence of the sequence to such a point is also discussed.

Cluster analysis    Fuzzy c-Means    Local convergence    Pattern recognition

* This research was supported by NSF Grant No. IST-8407860.
† To whom correspondence should be addressed.

1. INTRODUCTION

This note concerns itself with a convergence property of the fuzzy c-Means (FCM) clustering algorithms. The FCM algorithms are an infinite family of techniques based on iterative optimization of the generalized least squared error functional defined in Section 2 below. These algorithms, discussed at length in Ref. (1), are a generalization of the Hard c-Means (HCM) or "basic ISODATA" algorithm described in Duda and Hart.(2) Reference (1) discusses many early applications of the FCM technique to various problems in pattern recognition (e.g. clustering, feature selection and classifier design). Application areas include such diverse fields as medical diagnosis, numerical taxonomy, irrigation engineering, chemistry, geology, shape analysis and image processing.

In practice scant attention is paid to theoretical questions about convergence of clustering algorithms. The main concerns of the application community are whether an iterative algorithm "stops" (successive iterates stabilize at an apparent fixed point of the process up to some margin of error); and, even more importantly, when an algorithm does stop, whether the terminal iterate is a plausible and useful solution of the data analysis problem being worked upon. On this second point we digress: all objective-function algorithms are driven to (at least local) optimizers of their penalty functions on the presumption that the functional measures a "property" of the data that is most desirable at an extremum (in our case a minimizer). However, the property just alluded to is mathematical, i.e. it is an idealization of a physical property that may or may not be captured in the numerical data as represented by the functional. To cite a specific example, the FCM algorithm may converge to the global minimum of its penalty function, and yet this solution may provide a poor explanation of the visually apparent substructure in the data [see Ref. (3)]. Nonetheless, a growing number of successful applications indicate that FCM provides reliable explanations of data substructures across a wide variety of processes.(4)

The extant theory of convergence of FCM and extensions thereof is (to our knowledge) completely delineated in Refs. (5-9). For related work, see Refs. (10-12), which contain results pertaining to various aspects of the HCM and sequential k-Means algorithms. All of the FCM papers, however, deal with global convergence, viz., convergence of the algorithm from an arbitrary initial guess in the domain of the objective function to a fixed point of the iteration operator. We can sum up the global theory in one sentence: every iterate sequence of FCM converges (or has a subsequence which does) to either a local minimizer or saddle point of the objective function. While this result provides a "weak" form of global convergence, there has been no analysis of
either the local capture region for a given fixed point, or the rate of convergence of FCM to such a point. This note provides answers for these two questions: (i) if the initial guess is close enough to a fixed point of FCM, does the iterate sequence converge to it? (yes); and (ii) what is the rate of convergence in (i)? (linear). We describe the FCM algorithms in Section 2; in Section 3 our results are stated and proven; and we conclude with a short discussion of the implication of these theorems in Section 4.
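As an informal numerical illustration of questions (i) and (ii), the following sketch iterates the standard FCM updates from a start near a fixed point and records the distance of the cluster centers from that fixed point. It is a minimal sketch under stated assumptions, not the authors' code: the Euclidean distance, the toy data, the fuzzifier value m = 2, the perturbation size, and the helper names `G`, `F`, `T` and `iterate` are all choices made here for illustration.

```python
import numpy as np

def G(U, X, m):
    """Center update: each v_i is the mean of the data weighted by u_ik^m."""
    W = U ** m                                      # shape (c, n)
    return (W @ X) / W.sum(axis=1, keepdims=True)   # shape (c, s)

def F(V, X, m):
    """Membership update; assumes every distance d_ik is strictly positive."""
    # d[i, k] = Euclidean distance between center v_i and data point x_k
    d = np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2)
    # ratio[i, j, k] = (d_ik / d_jk)^(2 / (m - 1))
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

def T(U, X, m):
    """One FCM step: new centers from U, then new memberships from the centers."""
    V = G(U, X, m)
    return F(V, X, m), V

def iterate(U0, X, m, steps):
    """Apply T repeatedly; return the final U and the list of centers visited."""
    U, centers = U0, []
    for _ in range(steps):
        U, V = T(U, X, m)
        centers.append(V)
    return U, centers

# Toy data (an assumption for illustration): two separated groups in R^2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(4.0, 0.5, (20, 2))])
m = 2.0

# Reference fixed point: iterate long enough that the centers stop changing.
U_init = np.vstack([np.linspace(0.9, 0.1, 40), np.linspace(0.1, 0.9, 40)])
U_star, ref_centers = iterate(U_init, X, m, 500)
V_star = ref_centers[-1]

# Start near the fixed point: perturb U_star, then renormalize each column.
U0 = np.clip(U_star + 0.05 * rng.uniform(-1.0, 1.0, U_star.shape), 1e-6, None)
U0 /= U0.sum(axis=0, keepdims=True)

_, centers = iterate(U0, X, m, 6)
errors = [float(np.linalg.norm(V - V_star)) for V in centers]
print(errors)  # distance of the centers from V_star after each step
```

If the convergence is linear, successive entries of `errors` should shrink by a roughly constant factor until they reach floating-point precision. A single run on toy data is only suggestive and does not substitute for the proof.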
2. THE FCM ALGORITHMS
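Let $X = \{x_1, x_2, \ldots, x_n\}$ be a data set in $\mathbb{R}^s$ and let $c$ be an integer, $1 < c < n$. Fuzzy $c$-partitions of $X$ are $(c \times n)$ matrices $U = [u_{ik}] \in \mathbb{R}^{cn}$ that satisfy:

$$u_{ik} \in [0, 1] \quad \forall\, i = 1, 2, \ldots, c;\ k = 1, \ldots, n \tag{1a}$$

$$\sum_{i=1}^{c} u_{ik} = 1 \quad \forall\, k = 1, 2, \ldots, n \tag{1b}$$

$$0 < \sum_{k=1}^{n} u_{ik} < n \quad \forall\, i = 1, 2, \ldots, c. \tag{1c}$$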
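Conditions (1a)-(1c) can be checked mechanically for a candidate membership matrix. The following is a minimal sketch; the function name `is_fuzzy_c_partition` and the tolerance `tol` are hypothetical choices made for illustration and do not appear in the paper.

```python
import numpy as np

def is_fuzzy_c_partition(U, tol=1e-9):
    """Return True if the c-by-n matrix U satisfies (1a), (1b) and (1c)."""
    U = np.asarray(U, dtype=float)
    n = U.shape[1]
    # (1a): every membership lies in [0, 1] (up to the tolerance)
    in_unit_interval = np.all((U >= -tol) & (U <= 1.0 + tol))
    # (1b): the memberships of each data point (each column) sum to 1
    columns_sum_to_one = np.allclose(U.sum(axis=0), 1.0, rtol=0.0, atol=tol)
    # (1c): no cluster (row) is empty, and none absorbs all of the data
    row_sums = U.sum(axis=1)
    rows_nontrivial = np.all((row_sums > 0.0) & (row_sums < n))
    return bool(in_unit_interval and columns_sum_to_one and rows_nontrivial)
```

For example, a matrix whose second row is identically zero satisfies (1a) and (1b) but fails (1c), so the check rejects it.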
Iterating through necessary conditions (3) for a minimizer of $J_m$ produces the family of FCM algorithms given next.(1) Throughout, we assume that the set $X$ contains at least $c < n$ distinct points. (This condition is added to ensure that the range of $F$, defined below, is in $M_{fc}$.) Let $G: M_{fc} \to \mathbb{R}^{cs}$ be defined by $G(U) = v$, where for $1 \le i \le c$,

$$v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m x_k}{\sum_{k=1}^{n} (u_{ik})^m}. \tag{3a}$$

Let $F(v)$ be the function defined on the subset $D = \{v \in \mathbb{R}^{cs} : d_{ik} > 0\}$ by $F(v) = U$ in $M_{fc}$, where the entries of $U$ satisfy

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right]^{-1} \quad \text{for } 1 \le i \le c,\ 1 \le k \le n. \tag{3b}$$

The family of FCM algorithms can be represented by the iteration operator $T_m: M_{fc} \times D \to M_{fc} \times \mathbb{R}^{cs}$ defined by $T_m(U, v) = (F(G(U)), G(U))$. Note that $T_m$ is continuous in a neighborhood of every point in its domain. An FCM iteration sequence is obtained by choosing $(U^0, v^0)$ and computing $(U^{r+1}, v^{r+1}) = T_m(U^r, v^r)$ for $r = 0, 1, \ldots$. We implicitly assume that $d_{ik} > 0$ for all $i$ and $k$, although this assumption is not essential in obtaining certain convergence results. It is worth noting that the global results in Ref. (7) do not require that $d_{ik} > 0$.

A local result is a guarantee that an algorithm will converge to a particular solution (in this case a minimizer of $J_m$) if it is started close enough to that solution. In the next section, we state and prove a local result for FCM.

3. A LOCAL CONVERGENCE RESULT FOR FCM

$\ldots$ 0, where $\rho' = \min \ldots \sum_{k=1} (u_{ik})^2 \ldots$
The set $S(\rho)$ is compact for any choice of $\rho$ satisfying $0 < \rho < \rho'$. To see this, we first note that for these choices of $\rho$ the set $S(\rho)$ is clearly bounded. Also, if $(U, v)$ is a limit point of a sequence in $S(\rho)$, then $\|(U, v) - (U^*, v^*)\| \le \rho$ with $(U, v) \in \mathrm{closure}(M_{fc}) \times \mathbb{R}^{cs}$. Since $\rho < \rho'$, it must be the case that $(U, v)$ is in $M_{fc} \times \mathbb{R}^{cs}$, which implies that the limit point must be in $S(\rho)$. This establishes that $S(\rho)$ is also closed. Hence $S(\rho)$ is compact.

Let $\rho''$ be any fixed positive number less than $\rho'$ satisfying the following: (i) $T_m$ is continuous on $S(\rho'')$, and (ii) the Hessian of $J_m$ is positive definite relative to all feasible directions on $S(\rho'')$. It is easily seen that (i) can be satisfied, and the fact that (ii) can be satisfied follows from the continuity and positive definiteness of the Hessian of $J_m$ at the minimizer. Using (i), (ii), the convexity of the set $S(\rho'')$, and the fact that $(U^*, v^*)$ can be a minimizer of $J_m$ only if it is a fixed point of $T_m$, i.e. $T_m(U^*, v^*) = (U^*, v^*)$, we know that there exists a $\delta > 0$ such that whenever $(U, v)$ is in $N(\delta)$, where

$$N(\delta) = \{ (U, v) \in S(\rho'') : |J_m(U^*, v^*) - J_m(U, v)| \le \delta \},$$

then $T_m(U, v) \in S(\rho')$. Since $J_m(T_m(U, v))$