POLARIMETRIC FUZZY K-MEANS CLASSIFICATION WITH CONSIDERATION OF SPATIAL CONTEXT
A. Reigber¹, M. Jäger¹, M. Neumann²,¹ and L. Ferro-Famil²
¹ Berlin University of Technology, Computer Vision and Remote Sensing. Franklinstr. 28/29 FR3-1, D-10587 Berlin, Germany. Email: [email protected]
² University of Rennes 1, Institute of Electronics and Telecommunications of Rennes. Campus de Beaulieu, Bat 11D, 263 Avenue General Leclerc, F-35042 Rennes, France
1 ABSTRACT
This paper proposes a new unsupervised classification approach for polarimetric SAR data. Assuming Wishart-distributed polarimetric covariance matrices, it combines spectral clustering based on the covariance matrices themselves with spatial clustering by statistical analysis of local neighbourhoods. Instead of a fixed assignment of samples to class centres, a soft decision rule is used in which each pixel is assigned, with different probabilities, to all class centres in both the spectral and the spatial domain. The local neighbourhood is taken into account by modifying the probabilities of class membership with a neighbourhood function, obtained from normalised compatibility coefficients that describe cluster sizes and the mutual tolerance between classes. In this way, robust and homogeneous classification results can be obtained even in the presence of strong speckle noise. The performance of the proposed algorithm is evaluated on fully polarimetric SAR data acquired by DLR’s E-SAR sensor at L-band.
2 INTRODUCTION
Unsupervised classification is an important technique for the automatic analysis of synthetic aperture radar (SAR) data. Polarimetric SAR (PolSAR) data are particularly appealing for this purpose, since they allow sophisticated classification based on the analysis of multiple polarimetric channels. Many unsupervised classification approaches for polarimetric SAR data have been proposed in the literature. Basically, there are two types of algorithms. One type is based on the analysis of physical scattering properties, which has the advantage that some information about the class types is available [1–3]. The other type is based purely on statistical clustering [4–8]. In addition, several interesting combinations of these two types of approaches have been proposed [3, 9].
However, one disadvantage of all these algorithms is that each pixel is treated independently of its neighbours; spatial context is only considered indirectly, during speckle filtering. Yet the local neighbourhood has a significant influence on a pixel’s class membership: when a region has already been classified with high confidence as belonging to a single class, it becomes comparatively unlikely that a pixel in this region belongs to another class. The much more likely scenario is a misestimation of its covariance matrix due to speckle noise. Because of the inherently high noise level of SAR data, including the local neighbourhood in the statistical decision about class membership helps to obtain homogeneous classification results.
3 POLARIMETRIC CLASSIFICATION INCLUDING SPATIAL CONTEXT
3.1 K-MEANS CLUSTERING
For a homogeneous region $\omega_i$ with Gaussian backscattering, characterised by a polarimetric covariance matrix $\Sigma_i = E(C \mid C \in \omega_i)$, it is known that the measured covariance matrices $C$ follow the complex Wishart distribution [5, 10]
$$p(C|\Sigma_i) = \frac{n^{qn}\,|C|^{n-q}\,\exp\left[-n\,\mathrm{Tr}(\Sigma_i^{-1} C)\right]}{\pi^{q(q-1)/2}\,|\Sigma_i|^{n}\,\prod_{j=1}^{q}\Gamma(n-j+1)} \tag{1}$$
with $n$ denoting the number of looks, $q$ the dimensionality of $C$ (here 3), $\mathrm{Tr}(\cdot)$ the trace of a matrix, $|\cdot|$ the determinant and $\Gamma(\cdot)$ the Gamma function. In classification problems, a dataset is considered to consist of several classes, each having a different class centre $\Sigma_i$. In the maximum-likelihood sense, a measured covariance matrix is assigned to the class for which it has the highest probability, i.e.
$$C \in \omega_i \quad \text{if} \quad p(C|\Sigma_i) > p(C|\Sigma_j) \quad \forall j \neq i \tag{2}$$
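Eq. 1 and the decision rule of Eq. 2 can be checked numerically. The following is a minimal sketch, assuming NumPy and the Python standard library; the function names `wishart_log_pdf` and `ml_class` are hypothetical and not part of the original work. It evaluates the logarithm of the density, since the factor $n^{qn}$ overflows quickly for large $n$, and applies Eq. 2 by picking the class with the largest log-density.

```python
import math

import numpy as np


def wishart_log_pdf(C, Sigma, n):
    """Natural logarithm of the complex Wishart density p(C|Sigma) of Eq. (1).

    C     -- q x q Hermitian positive-definite sample covariance matrix (n looks)
    Sigma -- q x q Hermitian positive-definite class centre
    n     -- number of looks (assumed to be an integer >= q)
    """
    q = C.shape[0]
    # For Hermitian positive-definite matrices the determinant is real and
    # positive, so the log-magnitude returned by slogdet is the log-determinant.
    logdet_C = np.linalg.slogdet(C)[1]
    logdet_S = np.linalg.slogdet(Sigma)[1]
    # Tr(Sigma^{-1} C) via a linear solve instead of an explicit inverse
    trace_term = np.trace(np.linalg.solve(Sigma, C)).real
    # log of the normalising constant pi^{q(q-1)/2} * prod_j Gamma(n - j + 1)
    log_norm = 0.5 * q * (q - 1) * math.log(math.pi) + sum(
        math.lgamma(n - j + 1) for j in range(1, q + 1))
    return (q * n * math.log(n) + (n - q) * logdet_C - n * trace_term
            - n * logdet_S - log_norm)


def ml_class(C, centres, n):
    """Decision rule of Eq. (2): index of the most likely class centre."""
    return int(np.argmax([wishart_log_pdf(C, S, n) for S in centres]))
```

For $q = 1$ the expression reduces to the gamma density of an $n$-look intensity with mean $\Sigma$, which gives a simple way to test such an implementation.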
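For $p(C_i|\Sigma_j)$, the probability of the covariance matrix $C_i$ of pixel $i$ given the class centre $\Sigma_j$, the Wishart distribution of Eq. 1 can be used, although it is sufficient to ignore all constant and class-independent terms and to use
$$p(C_i|\Sigma_j) = \frac{\exp\left[-\mathrm{Tr}(\Sigma_j^{-1} C_i)\right]}{|\Sigma_j|} \tag{3}$$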
instead. Eq. 3 also omits the number of looks $n$: the class-dependent part of Eq. 1 is the $n$-th power of Eq. 3, and since taking the $n$-th root preserves the ordering, the decision of Eq. 2 is unchanged. However, in an unsupervised classification approach, the set of class centres $\Sigma = (\Sigma_1, \ldots, \Sigma_M)$ is unknown. Therefore, in k-means clustering, an arbitrary set of initial class centres is assumed first and each pixel is assigned to its most likely class. Then, updated class centres are derived from all pixels in each class and a new class assignment is performed. This iterative process is repeated until the class memberships converge. Wishart k-means clustering has been applied successfully to polarimetric SAR data [3, 9] and is nowadays considered to be the standard approach for the unsupervised classification of polarimetric SAR data.
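The k-means iteration described above can be sketched in a few lines. This is an illustrative sketch, assuming NumPy and a flat array of per-pixel covariance matrices; `wishart_distance` and `wishart_kmeans` are hypothetical names. It minimises the negative logarithm of Eq. 3, which is equivalent to maximising Eq. 3, and stops as soon as no pixel changes its class.

```python
import numpy as np


def wishart_distance(C, Sigma):
    """Negative logarithm of Eq. (3): ln|Sigma| + Tr(Sigma^{-1} C)."""
    return np.linalg.slogdet(Sigma)[1] + np.trace(np.linalg.solve(Sigma, C)).real


def wishart_kmeans(covs, labels, n_classes, max_iter=50):
    """Hard Wishart k-means clustering.

    covs      -- array of shape (N, q, q), one covariance matrix per pixel
    labels    -- initial class index per pixel (random or from a pre-classification)
    n_classes -- number of classes M; assumes no class ever becomes empty
    """
    labels = np.asarray(labels).copy()
    for _ in range(max_iter):
        # Update: each class centre is the mean covariance of its current members
        centres = [covs[labels == j].mean(axis=0) for j in range(n_classes)]
        # Assignment: most likely class per pixel, i.e. smallest Wishart distance (Eq. 2)
        dist = np.array([[wishart_distance(C, S) for S in centres] for C in covs])
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # class memberships have converged
        labels = new_labels
    return labels, centres
```

An empty class, which can occur after a poor initialisation, would produce an undefined centre; a practical implementation would have to re-seed it, which is not shown here.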
3.2 CLASSIFICATION VIA EXPECTATION MAXIMISATION
In expectation maximisation (EM), each pixel is assigned, with different probabilities, to all possible classes. Given a fixed set of $M$ classes, this “fuzzy” assignment can be described by normalised probabilities of class membership:
$$p(C \in \omega_i) = \frac{p(C|\Sigma_i)}{\sum_{j=1}^{M} p(C|\Sigma_j)} \tag{4}$$
This is significantly different from unsupervised classification in the classical k-means sense, where each pixel is assigned to only one class, namely the most likely one. In the literature, weighted averaging with the respective probabilities instead of fixed assignments is often also referred to as fuzzy decision [11]; in fact, expectation maximisation is very closely related to fuzzy classification approaches [6, 8]. In general, it has the advantage that no binary decisions about class membership have to be made and all possible assignments can be considered in parallel [12].
The EM algorithm starts with an initial guess of a non-optimal set of class centres $\Sigma^{(0)}$. To obtain it, mean covariance matrices
$$\Sigma_j^{(0)} = E(C \mid C \in \omega_j) = \frac{1}{N_j}\sum_{i \in \omega_j} C_i \quad \text{with} \quad N_j = \text{number of pixels in } \omega_j \tag{5}$$
are estimated over $M$ regions $\omega_j$, $1 \le j \le M$, in the scene. Several possibilities exist to determine the initial regions: they can be defined manually in the form of training sets, automatically by an Entropy/Alpha pre-classification [9], or even by a random assignment of pixels to one of the $M$ classes. Once initial class centres are known, one estimates, in the so-called expectation step, the a-posteriori probabilities $p(C \in \omega_j \mid C, \Sigma)$ for each pixel and class, i.e. the probability that a pixel belongs to class $j$, given its covariance $C$ and a set of class centres $\Sigma$. In the case of polarimetric data, this can be achieved by substituting Eq. 3 into Eq. 4:
$$p_{ij}^{(k)} = \frac{p(C_i|\Sigma_j^{(k)})}{\sum_{l=1}^{M} p(C_i|\Sigma_l^{(k)})} = \frac{\exp\left[-\mathrm{Tr}\left((\Sigma_j^{(k)})^{-1} C_i\right)\right] / |\Sigma_j^{(k)}|}{\sum_{l=1}^{M} \exp\left[-\mathrm{Tr}\left((\Sigma_l^{(k)})^{-1} C_i\right)\right] / |\Sigma_l^{(k)}|} \tag{6}$$
The superscript $k$ indicates the current iteration step; after initialisation, $k = 0$ and $\Sigma^{(0)} = \{\Sigma_1^{(0)}, \ldots, \Sigma_M^{(0)}\}$. The definition of $p_{ij}$ according to Eq. 4 implies that
$$p_{ij} \in [0, 1] \quad \forall i, j \qquad \text{and} \qquad \sum_{j=1}^{M} p_{ij} = 1 \quad \forall i \tag{7}$$
Using the values obtained for $p_{ij}$, an updated set of class centres can be calculated in the subsequent maximisation step:
$$\Sigma_j^{(k+1)} = \frac{\sum_{i=1}^{N} p_{ij}^{(k)} C_i}{\sum_{i=1}^{N} p_{ij}^{(k)}} \tag{8}$$
The expectation and maximisation steps are carried out iteratively until a certain termination criterion is met. This could be the convergence of $p_{ij}$ between two subsequent iterations [8], i.e.
$$\sqrt{\frac{1}{N}\sum_{i,j}\left(p_{ij}^{(k)} - p_{ij}^{(k-1)}\right)^2} < \text{threshold} \tag{9}$$
where $N$ is the number of pixels. Alternatives are the convergence of the class centres themselves [6], the percentage of pixels changing their most likely class between iterations falling below a certain threshold, or simply a fixed number of iterations.
Fig. 1 shows a visual comparison of the expectation maximisation classification technique with threshold classification of the Entropy/Alpha feature space [2] and with Wishart k-means classification [9]. For averaging the covariance matrices, a mean filter with only 9 independent samples was applied, which is usually not sufficient for speckle reduction. As expected, all classification results appear very noisy due to the insufficient speckle filtering. Generally, EM and k-means provide quite similar results.
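The expectation and maximisation steps of Eqs. 5, 6, 8 and 9 can be sketched as follows. This is a minimal sketch, assuming NumPy and an $(N, q, q)$ array of per-pixel covariance matrices; `log_likelihoods` and `wishart_em` are hypothetical names. Eq. 6 is evaluated in the log domain and shifted by the per-pixel maximum before exponentiation, which leaves the normalised probabilities unchanged but avoids underflow when the exponents are large.

```python
import numpy as np


def log_likelihoods(covs, centres):
    """Logarithm of Eq. (3) for every pixel i and class j:
    -Tr(Sigma_j^{-1} C_i) - ln|Sigma_j|, returned as an (N, M) array."""
    out = np.empty((len(covs), len(centres)))
    for j, S in enumerate(centres):
        S_inv = np.linalg.inv(S)
        # Tr(S_inv @ C_i) = sum_ab S_inv[a, b] * C_i[b, a], for all pixels at once
        trace = np.einsum("ab,iba->i", S_inv, covs).real
        out[:, j] = -trace - np.linalg.slogdet(S)[1]
    return out


def wishart_em(covs, labels, n_classes, threshold=1e-4, max_iter=100):
    """Soft (EM-style) Wishart clustering of an (N, q, q) covariance array."""
    labels = np.asarray(labels)
    # Eq. (5): initial centres = mean covariance over each initial region
    centres = [covs[labels == j].mean(axis=0) for j in range(n_classes)]
    p_old = None
    for _ in range(max_iter):
        # Expectation step, Eq. (6); subtracting the row maximum leaves the
        # normalised probabilities unchanged but prevents underflow
        ll = log_likelihoods(covs, centres)
        p = np.exp(ll - ll.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        # Maximisation step, Eq. (8): membership-weighted mean covariance
        centres = [np.einsum("i,iab->ab", p[:, j], covs) / p[:, j].sum()
                   for j in range(n_classes)]
        # Termination, Eq. (9): RMS change of the memberships per pixel
        if p_old is not None and np.sqrt(np.sum((p - p_old) ** 2) / len(covs)) < threshold:
            break
        p_old = p
    return p, centres
```

Taking `p.argmax(axis=1)` of the result yields a hard classification comparable to that of k-means, while `p` itself retains the fuzzy memberships of Eq. 7.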
Figure 1: a) Image amplitude in Pauli decomposition (HH=red, VV=green, HV=blue, 9 looks). b) Entropy/Alpha classification. c) Classification into 8 classes with Wishart k-means. d) Classification into 8 classes with Wishart expectation maximisation.
3.3 PROBABILISTIC LABEL RELAXATION
In a neighbourhood-supported classification, the classification result obtained in a pixel’s surroundings is used to resolve uncertainties in the class membership derived from the pixel’s local covariance matrix. This idea is based on the assumption that two neighbouring pixels are not entirely statistically independent: in reality, spatially random classification results are not very likely; instead, contiguous areas of certain sizes are to be expected. If the direct surroundings of a pixel have already been classified, with high confidence, into a certain class $\omega_j$, it becomes more likely that the observed pixel also falls into class $\omega_j$. This is the basis of the probabilistic label relaxation technique. The starting point is the introduction of so-called a-priori compatibility coefficients $p(n, \omega_i \mid m, \omega_j)$: the conditional probability that a pixel $n$ falls into class $\omega_i$ if a neighbouring pixel $m$ belongs to class $\omega_j$. In general, $M$ class assignments are possible for each neighbour; furthermore, a larger neighbourhood consisting of $L$ pixels can be incorporated. Based on this, a neighbourhood function
$$q(n, \omega_i) = \sum_{m=1}^{L}\sum_{j=1}^{M} p(n, \omega_i \mid m, \omega_j)\, p(m, \omega_j) \tag{10}$$
can be defined, which describes the total joint probability, over all neighbours and their class assignments, that pixel $n$ falls into class $\omega_i$. The probability $q(n, \omega_i)$ gives information about the class membership of pixel $n$ solely by examination of its neighbourhood and without considering the content of the pixel itself.
After the expectation maximisation procedure, the class probabilities based on the polarimetric covariance matrices (according to Eq. 3) are known. This allows Eq. 10 to be evaluated and results in two kinds of class probabilities: one, $q()$, based only on spatial context, and another, $p()$, based only on the polarimetric covariances. By multiplying both, and with the initialisation $p_q^{(1)}(n, \omega_i) = p(C_n|\Sigma_i)$ as derived in the expectation step of the EM procedure, an alternative probability measure for determining class membership can be defined:
$$p_q^{(h+1)}(n, \omega_i) = \frac{p_q^{(h)}(n, \omega_i)\, q^{(h)}(n, \omega_i)}{\sum_{j=1}^{M} p_q^{(h)}(n, \omega_j)\, q^{(h)}(n, \omega_j)} \tag{11}$$
with $h$ indicating the current iteration step. Eq. 11 considers polarimetric information through the Wishart distribution of $C$ and neighbourhood relations through the neighbourhood functions $q()$. The updated conditional probabilities $p_q^{(h+1)}(n, \omega_i)$ computed according to Eq. 11 in turn lead to updated neighbourhood functions
$$q^{(h+1)}(n, \omega_i) = \sum_{m=1}^{L}\sum_{j=1}^{M} p(n, \omega_i \mid m, \omega_j)\, p_q^{(h+1)}(m, \omega_j) \tag{12}$$
The iteration over Eqs. 11 and 12 is repeated $H$ times until an acceptable convergence is reached. The question remains how the compatibility coefficients are to be determined. Ideally, a spatial model of the area under investigation is known, e.g. derived from a geographic information system. In general, however, it has to be assumed that the compatibility coefficients are unknown. In this case it is reasonable to use only two different coefficients: a higher one for $\omega_i = \omega_j$ and a lower one for $\omega_i \neq \omega_j$. Their ratio quantifies how much more probable it is for neighbouring pixels to belong to the same class than to different classes.
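The relaxation iterations of Eqs. 10 to 12 can be sketched on an image grid as follows. This is a minimal sketch, assuming NumPy; for simplicity it uses a 4-connected neighbourhood with equal weights (rather than, e.g., a Gaussian-weighted window) and the two-valued compatibility coefficients described above. The pixel itself is excluded from its own neighbourhood, as required by the definition of $q$. The names `neighbour_sum`, `compatibility_matrix` and `label_relaxation` are hypothetical.

```python
import numpy as np


def neighbour_sum(a):
    """Sum of a (rows, cols, M) array over the 4-connected neighbours of each pixel.
    Pixels outside the image contribute nothing; the centre pixel is excluded."""
    padded = np.pad(a, ((1, 1), (1, 1), (0, 0)))  # zero padding
    return (padded[:-2, 1:-1] + padded[2:, 1:-1]      # upper + lower neighbour
            + padded[1:-1, :-2] + padded[1:-1, 2:])   # left + right neighbour


def compatibility_matrix(n_classes, ratio):
    """Two-valued compatibility coefficients: `ratio` for equal classes, 1 otherwise."""
    return np.ones((n_classes, n_classes)) + (ratio - 1.0) * np.eye(n_classes)


def label_relaxation(p, compat, n_iter=5):
    """Probabilistic label relaxation, Eqs. (10)-(12).

    p      -- (rows, cols, M) class probabilities from the expectation step
    compat -- (M, M) matrix with compat[i, j] = p(n, w_i | m, w_j)
    n_iter -- number of relaxation iterations H
    """
    p = p.copy()
    for _ in range(n_iter):
        # Eqs. (10)/(12): q(n, w_i) = sum_m sum_j compat[i, j] * p(m, w_j)
        q = neighbour_sum(p) @ compat.T
        # Eq. (11): combine polarimetric and spatial evidence, then renormalise
        p = p * q
        p /= p.sum(axis=2, keepdims=True)
    return p
```

Because Eq. 11 renormalises over the classes, multiplying all compatibility coefficients by a common factor does not change the result; only their ratio matters. For example, a pixel with probabilities (0.4, 0.6) surrounded by four neighbours with (0.9, 0.1) is moved to roughly (0.69, 0.31) after a single iteration with a ratio of 5.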
3.4 COMBINED APPROACH
In the context of classifying polarimetric SAR data, applying probabilistic label relaxation as a post-processing step to a result like those in Fig. 1 is not very effective. The poor classification quality of classifiers like expectation maximisation, caused by the inherent speckle effect of SAR data, leads to an inaccurately estimated set of class covariances $\Sigma$. A subsequent relaxation process would indeed homogenise the classification result by incorporating spatial context, but it is not able to correct the suboptimal set of class centres. Therefore, it is advisable to integrate the described relaxation process into the expectation maximisation iterations in order to improve the estimation of the class covariances $\Sigma_j^{(k)}$ themselves.
A reasonable neighbourhood-supported classification using expectation maximisation and probabilistic label relaxation starts with an initial estimation of mean covariance matrices according to Eq. 5. Subsequently, the a-posteriori probabilities of class membership are estimated for each pixel and each class according to Eq. 6. After this step, a probabilistic label relaxation process is used to iteratively update the conditional probabilities $p_q$ for each pixel and class using Eqs. 10 to 12, with a fixed number of $H$ iterations (e.g. 5). Eq. 8 can then be applied to obtain an updated set of class centres from the values of $p_q^{(k)}$:
$$\Sigma_j^{(k+1)} = \frac{\sum_{i=1}^{N} p_q^{(k)}(i, \omega_j)\, C_i}{\sum_{i=1}^{N} p_q^{(k)}(i, \omega_j)} \tag{13}$$
after which the expectation maximisation iteration resumes. This procedure is continued, as in normal expectation maximisation, until a certain termination criterion is met. For example, after each iteration, a decision about the most likely class can be made:
$$C_n \in \omega_i \quad \text{if} \quad p_q(n, \omega_i) > p_q(n, \omega_j) \quad \forall j \neq i \tag{14}$$
The iterations are stopped when a termination criterion like in Eq. 9 is reached.
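Putting the pieces together, the combined scheme of this section can be sketched as below. This is a hypothetical, simplified sketch and not the authors’ implementation: it assumes NumPy, an image of covariance matrices with shape (rows, cols, q, q), a 4-connected neighbourhood with equal weights instead of the Gaussian window used in the experiments, and a termination test in the form of Eq. 9 applied to $p_q$. The function names are illustrative only.

```python
import numpy as np


def log_likelihoods(covs, centres):
    """Log of Eq. (3) per pixel and class; covs has shape (rows, cols, q, q)."""
    out = np.empty(covs.shape[:2] + (len(centres),))
    for j, S in enumerate(centres):
        trace = np.einsum("ab,xyba->xy", np.linalg.inv(S), covs).real
        out[..., j] = -trace - np.linalg.slogdet(S)[1]
    return out


def neighbour_sum(a):
    """Sum over 4-connected neighbours, excluding the pixel itself (zero outside)."""
    padded = np.pad(a, ((1, 1), (1, 1), (0, 0)))
    return padded[:-2, 1:-1] + padded[2:, 1:-1] + padded[1:-1, :-2] + padded[1:-1, 2:]


def em_with_relaxation(covs, labels, n_classes, ratio=5.0, H=5,
                       threshold=0.01, max_iter=50):
    """EM with H label-relaxation steps inside every iteration (Section 3.4)."""
    compat = np.ones((n_classes, n_classes)) + (ratio - 1.0) * np.eye(n_classes)
    # Eq. (5): initial centres from the initial label image
    centres = [covs[labels == j].mean(axis=0) for j in range(n_classes)]
    flat = covs.reshape((-1,) + covs.shape[2:])
    p_old = None
    for _ in range(max_iter):
        # Expectation step, Eq. (6), evaluated in the log domain
        ll = log_likelihoods(covs, centres)
        p = np.exp(ll - ll.max(axis=2, keepdims=True))
        p /= p.sum(axis=2, keepdims=True)
        # H relaxation iterations, Eqs. (10)-(12)
        for _h in range(H):
            p = p * (neighbour_sum(p) @ compat.T)
            p /= p.sum(axis=2, keepdims=True)
        # Maximisation step with the relaxed probabilities p_q, Eq. (13)
        w = p.reshape(-1, n_classes)
        centres = [np.einsum("i,iab->ab", w[:, j], flat) / w[:, j].sum()
                   for j in range(n_classes)]
        # Termination on the RMS change of p_q per pixel, in the form of Eq. (9)
        if p_old is not None and np.sqrt(np.mean(np.sum((p - p_old) ** 2, axis=2))) < threshold:
            break
        p_old = p
    # Eq. (14): most likely class per pixel
    return p.argmax(axis=2), centres
```

Setting `ratio=1.0` makes all compatibility coefficients equal; the relaxation step then leaves $p_q$ unchanged and the sketch reduces to plain expectation maximisation, which corresponds to the situation of Fig. 2b.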
4 EXPERIMENTAL RESULTS
In order to evaluate the characteristics and potential of the proposed approach, different classifications of an agricultural scene with relatively large homogeneous fields have been produced. The data were acquired over the test site of Alling, Germany, by DLR’s experimental SAR sensor E-SAR at L-band. The image amplitude in Pauli decomposition is shown in Fig. 2a. For averaging the covariance matrices, a 3×3 boxcar filter has been used, resulting in input data with only 9 looks. Such slight averaging preserves a high spatial resolution, but yields poorly estimated covariances that are strongly affected by speckle. This is intended here, since the aim is to demonstrate how noise effects can be controlled by spatial homogenisation with probabilistic label relaxation. To initialise the expectation maximisation process, a random assignment to 8 classes has been used. For estimating the neighbourhood function according to Eq. 10, a Gaussian neighbourhood of size 5×5 has been used in all examples shown. With smaller neighbourhoods, such as a 3×3 Gaussian or simple 4-connectedness, very similar results are achieved, because iterating on the neighbourhood function includes pixels farther away step by step. As termination criterion, the convergence of the conditional probabilities, $\|p_q^{(k)} - p_q^{(k-1)}\|$, has been used. If this value falls below 1%, the iteration process is stopped and each pixel is assigned to its most likely class based on the current $p_q$.
In Fig. 2, classification results using different values for the compatibility coefficients and different numbers of iterations on the neighbourhood function are shown. In Fig. 2b, identical probabilities have been used for a transition from one class to the same class and for a transition from one class to another class. In this case, the neighbourhood function loses its significance and the result corresponds to that of expectation maximisation alone. As expected, the result appears relatively noisy due to the low number of looks of the input data. In Fig. 2c, the compatibility coefficient for equal classes is five times higher than that for different classes, and $H$ has been set to 5, i.e. 5 iterations on the neighbourhood function have been performed within each complete iteration cycle. The classification result becomes more homogeneous and fewer isolated pixels without relevant information appear. Increasing the ratio between the compatibility coefficients to 100 (Fig. 2d) achieves a much stronger homogenisation. Figs. 2e and 2f demonstrate the effect of changing the number of iterations on the neighbourhood function: in Fig. 2e, $H = 5$ iterations have been used, and in Fig. 2f, $H = 20$, both with a ratio between the compatibility coefficients of 10. Again, a stronger homogenisation can be achieved by increasing $H$, since the size of the effectively considered neighbourhood increases. Despite the low number of looks of the data, qualitatively appealing, high-resolution classification results can be achieved with the proposed method. For example, several small point targets persist as small points in the classification result, even when a very strong homogenisation is used (Fig. 2f).
Figure 2: Classification results of neighbourhood-supported expectation maximisation with different ratios of compatibility coefficients. a) Image amplitude, b) EM classification, c) equal class 5 times more probable than unequal class, H = 5, d) 100 times more probable, H = 5, e) 10 times more probable, H = 5, f) 10 times more probable, H = 20.
5 DISCUSSION & CONCLUSIONS
This paper proposes a new unsupervised classification approach for polarimetric SAR data, with the goal of achieving very homogeneous classification results. Up to now, this has mainly been attempted by smart speckle filtering of the covariance matrices, while the actual classification process consisted of standard clustering techniques applied on a pixel-by-pixel basis using the statistical characteristics of SAR data. In the proposed approach, spatial context is instead embraced directly in the statistical classification process. This is achieved by combining an iterative expectation maximisation approach with the principle of probabilistic label relaxation. Together, both techniques provide spatially homogeneous and polarimetrically motivated classification results even in the presence of strong speckle noise. As has been shown, the proposed approach of a neighbourhood-supported classification is superior to known standard techniques for unsupervised polarimetric classification, particularly in the case of a low number of looks. This is appealing because only slight averaging of the covariance matrices causes distinct speckle effects but also preserves a high spatial resolution. Of course, probabilistic label relaxation results in a strong smoothing of homogeneous areas; however, class boundaries and point targets are preserved at full resolution. In other words, adaptive speckle filtering is to some extent replaced by adaptive classification. The disadvantages of the proposed approach are typical of all unsupervised classification methods: the chosen number of classes may have a strong influence on the classification result, and the same holds for the chosen iteration strategy. In general, it can be concluded that the proposed technique is attractive when homogeneous classification results rather than point-wise assignments are desired, as is usually the case in agricultural or forested areas. In these cases, the proposed approach generates results of higher quality than conventional approaches based only on the analysis of the covariance matrices.
6 ACKNOWLEDGEMENT
The authors wish to thank QinetiQ Corporation and the German Aerospace Center (DLR) for providing the SAR data in the frame of the S.P.A.R.C. (Surface Parameter Retrieval Collaboration) project. This work was supported by the German Research Foundation (DFG) under project number RE 1698/2.
References
[1] J. J. van Zyl, “Unsupervised classification of scattering mechanisms using radar polarimetry data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 27, pp. 36–45, 1989.
[2] S. R. Cloude and E. Pottier, “An entropy based classification scheme for land applications of polarimetric SAR,” IEEE Transactions on Geoscience and Remote Sensing, vol. 35, no. 1, pp. 68–78, 1997.
[3] L. Ferro-Famil, E. Pottier, and J.-S. Lee, “Unsupervised classification of multifrequency and fully polarimetric SAR images based on the H/A/Alpha-Wishart classifier,” IEEE Transactions on Geoscience and Remote Sensing, vol. 39, no. 11, pp. 2332–2342, 2001.
[4] G. H. Ball and J. David, ISODATA, a Novel Method of Data Analysis and Pattern Classification. Stanford Research Institute, 1965.
[5] J.-S. Lee, M. R. Grunes, and R. Kwok, “Classification of multi-look polarimetric SAR imagery based on the complex Wishart distribution,” International Journal of Remote Sensing, vol. 15, no. 11, pp. 2299–2311, 1994.
[6] L. J. Du and J.-S. Lee, “Fuzzy classification of earth terrain covers using multi-look polarimetric SAR image data,” International Journal of Remote Sensing, vol. 17, no. 4, pp. 809–826, 1996.
[7] G. Davidson, K. Ouchi, G. Saito, N. Ishitsuka, N. Mohri, and S. Uratsuka, “Polarimetric classification using expectation methods,” Polarimetric and Interferometric SAR Workshop, Communications Research Laboratory, Tokyo, 2002.
[8] C.-T. Chen, K.-S. Chen, and J.-S. Lee, “The use of fully polarimetric information for the fuzzy neural classification of SAR images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 9, pp. 2089–2100, 2003.
[9] J.-S. Lee, M. R. Grunes, T. L. Ainsworth, L.-J. Du, D. L. Schuler, and S. R. Cloude, “Unsupervised classification using polarimetric decomposition and the complex Wishart classifier,” IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 5, pp. 2249–2258, 1999.
[10] J.-S. Lee, K. W. Hoppel, S. A. Mango, and A. R. Miller, “Intensity and phase statistics of multilook polarimetric and interferometric imagery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 5, pp. 1017–1028, 1994.
[11] L. A. Zadeh, Fuzzy sets and their applications to cognitive and decision processes. Academic Press, New York, 1975.
[12] J. C. Bezdek, Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, 1981.