
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING


A Self-Initializing PolInSAR Classifier Using Interferometric Phase Differences

Marc Jäger, Student Member, IEEE, Maxim Neumann, Student Member, IEEE, Stéphane Guillaso and Andreas Reigber, Member, IEEE

Abstract: This paper describes an unsupervised classifier for polarimetric interferometric SAR (PolInSAR) data. Expectation maximization is used to estimate class parameters that maximize the likelihood of observations in an input dataset for a given number of classes. Polarimetric information, in the form of coherency matrices, and interferometric information, in the form of complex coherences, are taken into account. Differences in interferometric phase across different polarization states are explicitly modeled to make the classifier sensitive to the vertical structure of the scene under observation, and the distribution over such phase differences is introduced. The classifier is self-initializing, in that it does not rely on decompositions or thresholds. Classification results obtained for real polarimetric interferometric data are presented and discussed.

Index Terms: Synthetic aperture radar, Radar polarimetry, Interferometry, Radar target classification

M. Jäger and A. Reigber are with the Berlin University of Technology, Computer Vision and Remote Sensing Group, Franklinstr. 28/29 (FR3-1), D-10587 Berlin, Germany. Email: [email protected]
M. Neumann is with the University of Rennes 1, Institute of Electronics and Telecommunications of Rennes, Campus de Beaulieu, Bât 11D, 263 Avenue Général Leclerc, 35042 Rennes, France.
S. Guillaso is with the Laboratoire de Géologie, ENS, CNRS, 24 Rue Lhomond, 75005 Paris, France.

October 5, 2007

DRAFT

I. INTRODUCTION

Unsupervised classification is often essential in the automated analysis of SAR remote sensing data. Classification results make data easier to interpret for users who may not be familiar with SAR data, and can serve as a starting point for more sophisticated automated analyses that apply to homogeneous regions of a particular type of land cover, such as the model-driven inference of forest or crop parameters.

The classification of PolInSAR data is, in view of potential applications, of particular interest. PolInSAR data contain a wealth of information: polarimetric information, which relates to material and local geometric properties, is complemented by interferometric information, which primarily describes topography. PolInSAR data also contain information that is not inherent in either polarimetric or interferometric measurements alone. The combination of polarimetric and interferometric information makes it possible to resolve the vertical structure of the observed scene when scatterers within the same resolution cell and with different polarimetric signatures are separated along the sensor's line of sight. The ability to characterize the vertical structure of media has led to a number of model inversion techniques that infer structural parameters of forest (for example [1]), urban areas [3] and agricultural areas (for example [2] [4]). The automated classification of PolInSAR data is therefore potentially useful in a wide range of remote sensing applications.

Polarimetric and polarimetric interferometric SAR data are closely related in terms of structure and their underlying statistical properties. As a consequence, advances in the classification of PolInSAR data can build on a large body of work concerning the unsupervised classification of polarimetric SAR data. The most widely studied, and perhaps the most successful, class of algorithms uses statistical clustering techniques to iteratively estimate a set of class parameters that characterize the homogeneous regions constituting a given dataset. These algorithms exploit the fact that polarimetric data, in the form of covariance or coherency matrices, are known to follow the complex Wishart distribution to define a measure of distance between observed samples and classes. A given initial classification can then be iteratively refined by alternately assigning samples to classes and updating class parameters to reflect changes in assignment until convergence is reached. This approach to classification was pioneered in [5], where the H/α/A decomposition in conjunction with fixed thresholds was used to derive an initial classification. Later refinements include the use of alternative decompositions for initialization and physically motivated constraints during the iterated parameter estimation [6] as well as the use of more sophisticated clustering algorithms such as expectation maximization [7].
This general framework for classification has been extended and adapted to PolInSAR data, for example in [8] and [9]. These classifiers have been applied to the problem of forest classification, where they are able to distinguish different types of forest with a high degree of accuracy. The results obtained clearly demonstrate that interferometric information, in the form of (optimized) interferometric coherences, is essential for classifying objects with a complex vertical structure. In addition, interferometric coherence is also known to enhance the discriminability of other types of land cover, e.g. man-made structures, which can be correctly classified even when polarimetric information alone is ambiguous [10] [11].

Since PolInSAR covariance or coherency matrices follow the complex Wishart distribution, any polarimetric classifier based on this distribution can, in principle, be almost trivially extended to process PolInSAR data by replacing 3 × 3 polarimetric covariance matrices with the larger 6 × 6 PolInSAR covariance matrices. Although straightforward, using classifiers designed for polarimetric data to classify PolInSAR data can lead to undesirable results. Firstly, there is, at present, no decomposition for PolInSAR data that represents both polarimetric and interferometric information content. The decomposition approach used to determine a suitable set of initial class parameters is therefore generally difficult to extend to PolInSAR data, although it should be noted that excellent initialization strategies do exist for more restricted problem domains (for example the use of optimized coherences in the context of forest classification [8]). The quality of initialization is important, as the iterated estimation of class parameters converges on a sub-optimal solution if the initial parameters do not capture the structure of a dataset sufficiently well. This issue is especially relevant for PolInSAR data, where the parameter space is comparatively large. Secondly, PolInSAR data contain information related to topography in the form of absolute interferometric phases. Although topography related information can be used in conjunction with prior knowledge to resolve ambiguities in certain applications, it is undesirable in a fully unsupervised classifier where no such knowledge is available. Previous approaches to PolInSAR classification avoid this issue by using the degree of optimized coherence [8] or by explicitly subtracting the topographic phase prior to classification [9].

The classifier presented in this paper represents a non-trivial extension to the statistical classification algorithms developed for polarimetric SAR data, and is intended to address the issues outlined above. Class parameters are estimated using expectation maximization applied to a statistical model that takes into account the polarimetric signature, absolute degrees of interferometric coherence, and differences in interferometric phase across polarization states. Interferometric phase differences, directly related to the vertical structure of observed media, are incorporated in a fashion that minimizes the impact of phase noise. Furthermore, the iterated estimation of class parameters is self-initializing and does not rely on decompositions or thresholds. Finally, the structure of the classifier is inherently modular, allowing the user to specify what types of information are to be taken into account. Indeed, classification results can be based purely on information relevant to the application at hand.

The paper is structured as follows. Section II introduces the distributions relevant to the classifier, and section III describes how these distributions can be used for classification by combining them in an expectation maximization framework. Section IV motivates and describes the process of self-initialization. In section V, classification results for real PolInSAR data are presented and discussed, and section VI contains conclusions.
II. DATA AND DISTRIBUTIONS

PolInSAR data are obtained by coherently combining measurements from two polarimetric SAR acquisitions. The raw data of each polarimetric acquisition is processed to determine the scattering matrix in each resolution cell of a dataset. This matrix contains the complex co- and cross-polar responses of targets within the resolution cell, and can, in the monostatic case and in the Pauli representation, be represented as a vector $\mathbf{k}_i$:

$$\mathbf{k}_i = \frac{1}{\sqrt{2}} \left[ S_{HH,i} + S_{VV,i},\; S_{HH,i} - S_{VV,i},\; 2 S_{HV,i} \right]^T$$

The vectors $\mathbf{k}_1$ and $\mathbf{k}_2$, one vector from each polarimetric measurement, are concatenated to obtain the PolInSAR scattering vector $\mathbf{k}$, which is then used to compute the polarimetric interferometric coherency matrix $\mathbf{T}$:

$$\mathbf{T} = \left\langle \mathbf{k}\mathbf{k}^{\dagger} \right\rangle = \begin{bmatrix} \mathbf{T}_{11} & \mathbf{T}_{12} \\ \mathbf{T}_{12}^{\dagger} & \mathbf{T}_{22} \end{bmatrix} \quad \text{where} \quad \mathbf{k} = \begin{bmatrix} \mathbf{k}_1 \\ \mathbf{k}_2 \end{bmatrix}$$

where $\langle \ldots \rangle$ denotes averaging and may be accomplished by multilooking or spatial averaging using a low-pass filter. Over a homogeneous region with complex Gaussian statistics of the backscattered fields, $\mathbf{T}$ is known to follow the complex Wishart distribution $W_q(\Sigma, L)$ [13]:

$$\mathbf{T} \sim W_q(\Sigma, L)$$

$$p(\mathbf{T}|\Sigma, L) = \frac{L^{Lq}\, |\mathbf{T}|^{L-q}\, |\Sigma|^{-L} \exp\left(-L\,\mathrm{Tr}(\Sigma^{-1}\mathbf{T})\right)}{\pi^{q(q-1)/2} \prod_{j=1}^{q} \Gamma(L-j+1)} \tag{1}$$

where $\Sigma = E[\mathbf{T}]$ is the true coherency matrix, $L$ is the number of looks, $q$ is the dimensionality of $\mathbf{k}$ ($q = 6$ in the monostatic case), $\Gamma$ denotes the gamma function, and $\mathrm{Tr}$ denotes the trace operation.

As noted in the introduction, using the full $\mathbf{T}$ matrix for classification directly, without appropriate pre-processing, is problematic due to the presence of unwanted topographic information in the phase of the elements of $\mathbf{T}_{12}$. In anticipation of the classification process, it is desirable to isolate the relevant information in $\mathbf{T}$, namely the polarimetric signatures, degrees of interferometric coherence and interferometric phase differences. The remainder of this section is concerned with the distributions over these quantities, which will be central to the statistical clustering scheme detailed in subsequent sections.

The relevant distributions are obtained by marginalizing the complex Wishart distribution of $\mathbf{T}$ to obtain separate distributions over the quantities deemed of importance for the purpose of classification. These distributions are then, by design, independent of unwanted information. The distribution over polarimetric information, in the form of $\mathbf{T}_{11}$ (or $\mathbf{T}_{22}$), is obtained by marginalizing (1) over $\mathbf{T}_{12}$ and $\mathbf{T}_{22}$ (or $\mathbf{T}_{11}$). The associated integral is trivial, since a subset of the elements in $\mathbf{k}$ also follows the complex Gaussian distribution, and shows that the matrices $\mathbf{T}_{xx}$ are, as expected, themselves Wishart distributed:

$$\mathbf{T}_{xx} \sim W_{q/2}(\Sigma_{xx}, L) \tag{2}$$

where $\Sigma_{xx} = E[\mathbf{T}_{xx}]$. The matrix $\mathbf{T}_{12}$ fully describes the interferometric properties of the observed data in all polarization bases. The elements of $\mathbf{T}_{12}$ are directly related to the complex interferometric coherences $\gamma_{ij}$, which are given by

$$\gamma_{ij} = d_{ij} \exp(\imath \phi_{ij}) = \frac{\mathbf{T}_{12}(i,j)}{\sqrt{\mathbf{T}_{11}(i,i)\, \mathbf{T}_{22}(j,j)}}, \qquad 1 \le i, j \le q/2 \tag{3}$$

where $d_{ij}$ is the degree of coherence between polarimetric channels $i$ and $j$ of acquisitions 1 and 2, respectively, and $\phi_{ij}$ is the associated interferometric phase. The distribution over $d_{ij}$ is obtained by marginalizing (1), as in equation (2), to ascertain that the coherency matrix $\Gamma_{ij}$

$$\Gamma_{ij} = \begin{bmatrix} \mathbf{T}_{11}(i,i) & I\, d_{ij} \exp(\imath \phi_{ij}) \\ I\, d_{ij} \exp(-\imath \phi_{ij}) & \mathbf{T}_{22}(j,j) \end{bmatrix} \tag{4}$$

where $I = \sqrt{\mathbf{T}_{11}(i,i)\, \mathbf{T}_{22}(j,j)}$ and $I\, d_{ij} \exp(\imath \phi_{ij}) = \mathbf{T}_{12}(i,j)$, is Wishart distributed. Following the approach in [12], the two dimensional Wishart distribution over $\Gamma_{ij}$ can be marginalized further by integrating over $\mathbf{T}_{11}(i,i)$ and $\mathbf{T}_{22}(j,j)$ in the range $[0, \infty]$ and $\phi_{ij}$ in the range $[0, 2\pi]$ to obtain the distribution over the degree of interferometric coherence $d_{ij}$:

$$p(d_{ij}|D_{ij}, L) = 2(L-1)\left(1 - D_{ij}^2\right)^L d_{ij} \left(1 - d_{ij}^2\right)^{L-2}\; {}_2F_1\!\left(L, L; 1; D_{ij}^2 d_{ij}^2\right) \tag{5}$$

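where $D_{ij}$ is the true degree of coherence and ${}_pF_q$ denotes the hypergeometric function. Touzi et al. showed in [12] that the mean coherence $\langle d_{ij} \rangle$ over a homogeneous area is a biased estimator of the true coherence $D_{ij}$:

$$\langle d_{ij} \rangle = \frac{\Gamma(L)\,\Gamma\!\left(\tfrac{3}{2}\right)}{\Gamma\!\left(L + \tfrac{1}{2}\right)} \times {}_3F_2\!\left(\tfrac{3}{2}, L, L;\; L + \tfrac{1}{2}, 1;\; D_{ij}^2\right) \left(1 - D_{ij}^2\right)^L \tag{6}$$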
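The bias expressed by equation (6) can be checked numerically. The following is a minimal sketch rather than the authors' implementation: it assumes an integer number of looks $L \ge 2$, evaluates the hypergeometric function in (5) through Euler's transformation (which turns it into a finite sum for integer $L$), replaces the closed form (6) by midpoint integration of $d\,p(d)$, and inverts the mean by bisection under the assumption that $\langle d \rangle$ increases with $D$. The function names and the step counts are illustrative choices.

```python
import math

def coherence_pdf(d, D, L):
    """Equation (5) for integer L >= 2, using the Euler transformation
    2F1(L, L; 1; x) = (1 - x)**(1 - 2L) * sum_n C(L-1, n)**2 * x**n."""
    x = (D * d) ** 2
    poly = sum(math.comb(L - 1, n) ** 2 * x ** n for n in range(L))
    hyp = (1.0 - x) ** (1 - 2 * L) * poly
    return (2.0 * (L - 1) * (1.0 - D * D) ** L
            * d * (1.0 - d * d) ** (L - 2) * hyp)

def mean_coherence(D, L, steps=2000):
    """<d> of equation (6), obtained here by midpoint integration of d * p(d)."""
    h = 1.0 / steps
    mids = ((k + 0.5) * h for k in range(steps))
    return sum(d * coherence_pdf(d, D, L) for d in mids) * h

def invert_mean(d_mean, L, tol=1e-6):
    """Bisection on D; assumes <d> grows monotonically with D."""
    lo, hi = 0.0, 0.999
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_coherence(mid, L) < d_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $L = 8$, for instance, `mean_coherence(0.0, 8)` is roughly 0.32 even though the true coherence is zero, which illustrates why the sample mean has to be corrected before it is used as a class parameter.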

As will be discussed in more detail in section III, the proposed classifier estimates the true coherence $D_{ij}$ of a homogeneous region by averaging sample coherence values $d_{ij}$. It is therefore important to note that equation (6) can be inverted to obtain an unbiased estimate of the true coherence $D_{ij}$.

The distribution over the interferometric phase $\phi_{ij}$ can be derived in a similar fashion, by marginalizing the distribution of $\Gamma_{ij}$ over $\mathbf{T}_{11}(i,i)$, $\mathbf{T}_{22}(j,j)$ and $d_{ij}$, where $d_{ij}$ is integrated in the range $[0, 1]$ (see [14]). After simplification, the expression is equivalent to that derived in [15]:

$$p(\phi_{ij}|\alpha_{ij}, D_{ij}, L) = \frac{\left(1 - D_{ij}^2\right)^L}{2\pi} \left[ {}_2F_1\!\left(1, L; \tfrac{1}{2}; z^2\right) + \frac{\Gamma\!\left(\tfrac{1}{2}\right) \Gamma\!\left(L + \tfrac{1}{2}\right)}{\Gamma(L)}\, z \left(1 - z^2\right)^{-(L + 1/2)} \right] \tag{7}$$

where $\alpha_{ij}$ is the true interferometric phase and $z = D_{ij} \cos(\phi_{ij} - \alpha_{ij})$. The interferometric phase $\phi_{ij}$ is not immediately useful in the context of unsupervised classification, since it is, to a large extent, determined by the topography of the scene under observation. This is not the case for interferometric phase differences of the form

$$\Delta\phi_{ij,kl} = \phi_{ij} - \phi_{kl}, \qquad 1 \le i, j, k, l \le q/2 \tag{8}$$

which represent the change in interferometric phase across different interferometric channel combinations. A nonzero phase difference $\Delta\phi_{ii,jj}$ indicates that scatterers with different polarimetric signatures are vertically separated within a resolution cell. The distribution of phase differences $\Delta\phi_{ij,kl}$ is obtained by considering the joint distribution of interferometric phases $\phi_{ij}$ and $\phi_{kl}$:

$$\begin{aligned} p(\phi_{ij}, \phi_{kl}|\alpha_{ij}, \alpha_{kl}, D_{ij}, D_{kl}, L) &= \frac{1}{Z}\, p(\phi_{ij}|\alpha_{ij}, D_{ij}, L)\; p(\phi_{ij} - \Delta\phi_{ij,kl}|\alpha_{kl}, D_{kl}, L) \\ &= p(\phi_{ij}, \Delta\phi_{ij,kl}|\alpha_{ij}, \alpha_{kl}, D_{ij}, D_{kl}, L) \end{aligned} \tag{9}$$

where $Z$ is a normalization factor. The distribution over $\Delta\phi_{ij,kl}$ is then given by marginalizing $p(\phi_{ij}, \Delta\phi_{ij,kl}|\cdot)$ over $\phi_{ij}$. The associated integral is difficult, and a closed form solution remains to be found. Fortunately, however, the integral over $\phi_{ij}$ can be recast as a convolution:

$$\begin{aligned} p(\Delta\phi_{ij,kl}|\cdot) &= \frac{1}{Z} \int_{-\pi}^{\pi} p(\phi_{ij}|\alpha_{ij}, \cdot)\; p(\phi_{ij} - \Delta\phi_{ij,kl}|\alpha_{kl}, \cdot)\, d\phi_{ij} \\ &= \frac{1}{Z} \int_{-\pi}^{\pi} p(\phi_{ij}|0, \cdot)\; p(\Delta\phi_{ij,kl} - \phi_{ij}|\alpha_{ij} - \alpha_{kl}, \cdot)\, d\phi_{ij} \\ &= \frac{1}{Z}\, p(\phi_{ij}|0, \cdot) \otimes p(\phi_{kl}|\Delta\alpha_{ij,kl}, \cdot) \end{aligned} \tag{10}$$

where $\Delta\alpha_{ij,kl} = \alpha_{ij} - \alpha_{kl}$ is the true difference between the interferometric phases relating channels $\{i, j\}$ and $\{k, l\}$ of the two interferometric acquisitions.
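Because no closed form is available for (10), the convolution has to be evaluated numerically. The sketch below is one possible way of doing so, not the authors' code: it evaluates the phase density (7) with a plain power series for ${}_2F_1$ (valid for $|z| < 1$), samples both densities on a regular grid over $[-\pi, \pi)$ and forms their circular convolution directly. The helper names and the grid size of 200 bins are assumptions made for illustration.

```python
import math

def hyp2f1_series(a, b, c, x, tol=1e-15, max_terms=2000):
    """Gauss series for 2F1(a, b; c; x), valid for |x| < 1."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * x
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def phase_pdf(phi, alpha, D, L):
    """Equation (7) with z = D * cos(phi - alpha)."""
    z = D * math.cos(phi - alpha)
    lead = (1.0 - D * D) ** L / (2.0 * math.pi)
    ratio = math.gamma(0.5) * math.gamma(L + 0.5) / math.gamma(L)
    return lead * (hyp2f1_series(1.0, L, 0.5, z * z)
                   + ratio * z * (1.0 - z * z) ** (-(L + 0.5)))

def phase_difference_pdf(D_ij, D_kl, delta_alpha, L, bins=200):
    """Tabulate equation (10) on `bins` grid points covering [-pi, pi)."""
    h = 2.0 * math.pi / bins
    grid = [-math.pi + m * h for m in range(bins)]
    # p(phi_ij | 0, .) sampled on the grid
    g1 = [phase_pdf(phi, 0.0, D_ij, L) for phi in grid]
    # p(. | delta_alpha, .) sampled at the grid offsets j * h (2*pi periodic)
    g2 = [phase_pdf(j * h, delta_alpha, D_kl, L) for j in range(bins)]
    conv = [sum(g1[k] * g2[(m - k) % bins] for k in range(bins))
            for m in range(bins)]
    norm = sum(conv) * h  # plays the role of the normalization factor Z
    return grid, [c / norm for c in conv]
```

Calling `phase_difference_pdf(0.4, 0.75, 0.6 * math.pi, 8)` tabulates the distribution for the parameters quoted for Figure 1 with $L = 8$; the direct $O(N^2)$ sum is kept for clarity and could be replaced by an FFT-based convolution for finer grids.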

Even without a closed form solution to equation (10), the distribution of interferometric phase differentials can be used in the context of classification. Since the range of the distribution $p(\Delta\phi_{ij,kl}|\Delta\alpha_{ij,kl}, \cdot)$ is finite (it is restricted to the interval $[0, 2\pi]$), the distribution can be tabulated and normalized efficiently by numerically convolving the distributions $p(\phi_{ij}|\cdot)$ and $p(\phi_{kl}|\cdot)$. It should be noted that, since the functions being convolved are symmetric and uni-modal, the distribution of $\Delta\phi_{ij,kl}$ is also symmetric and uni-modal. The mode of $p(\Delta\phi_{ij,kl}|\cdot)$ must lie at $\Delta\phi_{ij,kl} = \Delta\alpha_{ij,kl}$, and it follows that $\langle \Delta\phi_{ij,kl} \rangle$ is an unbiased estimator of $\Delta\alpha_{ij,kl}$. Figure 1 shows the distribution of $\Delta\phi_{ij,kl}$ with degrees of coherence $D_{ij} = 0.4$ and $D_{kl} = 0.75$ and a true interferometric phase differential $\Delta\alpha_{ij,kl} = 0.6\pi$ for several values of $L$.

III. EXPECTATION MAXIMIZATION

Expectation maximization (EM) is a general framework for inference in statistical modelling problems; for a more general and rigorous treatment than will be given here, see [16]. The EM method becomes applicable to the problem under the assumption that the contents of a given dataset can be treated as a set of $N_X$ samples $X = \{x_1, \ldots, x_{N_X}\}$ that were independently generated by a distribution over samples $p(x|\Theta)$, where $\Theta$ is a set of model parameters. The functional form of $p(x|\Theta)$ is fixed, and must be chosen such that the distribution of any conceivable set of observed samples $X$ can be adequately described by specifying appropriate parameters $\Theta$. Under these conditions, the EM method is used to determine the parameters $\Theta$ appropriate to a given set of observed samples $X$. In concrete terms, when no a priori information concerning $\Theta$ is available (as is the case for unsupervised classification), the EM method maximizes the posterior probability of the parameters $\Theta$

$$p(\Theta|X) \propto p(X|\Theta) = \prod_{s=1}^{N_X} p(x_s|\Theta) \tag{11}$$

for a given set of samples $X$ by selecting a suitable set of parameters $\Theta$.

In the context of classification, the most appropriate functional form for $p(x|\Theta)$ is the so-called mixture model [17], which describes the overall distribution of samples as a sum of component densities. All component densities have the same functional form, but are parameterized independently to give a distribution over samples of the form

$$p(x|\Theta) = \frac{1}{N_C} \sum_{c=1}^{N_C} p(x|\theta_c) \tag{12}$$

where $\Theta$ is now defined as the set $\{\theta_1, \ldots, \theta_{N_C}\}$ and $\theta_c$ denotes the parameters of the $c$th component density. For the present purposes, each component density is taken to represent the distribution of samples within a class: $p(x|\theta_c)$ is the distribution of samples within a homogeneous region characterized by parameters $\theta_c$, and $N_C$ denotes the total number of classes.

The component density $p(x|\theta)$, to be defined in section III-A, is of a form that makes estimating an optimal $\theta$ relatively simple, given a set of samples $X$ from a single homogeneous region. In other words, class parameters $\theta_c$ are easily estimated once the assignment of samples to classes (i.e. the classification result) is known. Conversely, once the class parameters are known, it is easy to obtain the classification result by assigning each sample $x_s$ to the class $c$ for which

$$\forall e \neq c :\; p(x_s|\theta_c) > p(x_s|\theta_e) \tag{13}$$
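The relationship between equations (11), (12) and (13) can be made concrete with a toy example. The sketch below deliberately uses a one-dimensional Gaussian as a stand-in for the component density, which is not the density used by the classifier (that density is defined in section III-A); the function names and the parameter tuples are hypothetical.

```python
import math

def gaussian_pdf(x, theta):
    """Stand-in component density p(x | theta_c); theta = (mean, std)."""
    mean, std = theta
    return (math.exp(-0.5 * ((x - mean) / std) ** 2)
            / (std * math.sqrt(2.0 * math.pi)))

def mixture_pdf(x, thetas, density=gaussian_pdf):
    """Equation (12): equally weighted sum of the component densities."""
    return sum(density(x, th) for th in thetas) / len(thetas)

def log_likelihood(samples, thetas, density=gaussian_pdf):
    """Logarithm of the product in equation (11)."""
    return sum(math.log(mixture_pdf(x, thetas, density)) for x in samples)

def classify(x, thetas, density=gaussian_pdf):
    """Equation (13): index of the component with the largest density."""
    return max(range(len(thetas)), key=lambda c: density(x, thetas[c]))
```

For samples clustered around 0 and 5, the parameter set `[(0.0, 1.0), (5.0, 1.0)]` yields a higher `log_likelihood` than `[(2.0, 1.0), (3.0, 1.0)]`, and `classify` then separates the two clusters; working with the logarithm of (11) avoids underflow when many samples are multiplied together.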

The difficulty of maximizing equation (11) lies in the fact that neither component parameters nor the classification result are known. Because direct maximization of (11) is (typically) analytically intractable, EM is implemented as an approximate method in which an initial set of component parameters $\Theta^0$ is iteratively refined to give $\Theta^0, \Theta^1, \ldots$ until a convergence criterion is met. Each iteration consists of two steps, the so-called E- and M-Steps to be described in sections III-B and III-C, respectively. A simple condition for terminating the iterated estimation ensures that the combined sample likelihood of equation (11) increases in subsequent iterations:

$$p\left(X|\Theta^{(t+1)}\right) > p\left(X|\Theta^{(t)}\right) \tag{14}$$

A. Component Densities

Before the process of iterative parameter refinement can be described in more detail, it is necessary to specify the component densities used. In this classifier, a homogeneous region is characterized by its polarimetric signature $\Sigma_{pp}$, degrees of interferometric coherence $D_{ij}$ and interferometric phase differences $\Delta\alpha_{ij,kl}$. The parameters of each component density therefore constitute a set of the form

$$\theta_c = \left\{ \Sigma_{pp}^{(c)},\;\; \forall (ij) \in I_D \,.\, D_{ij}^{(c)},\;\; \forall (ij, kl) \in I_{\Delta\alpha}^{(c)} \,.\, \Delta\alpha_{ij,kl}^{(c)} \right\} \tag{15}$$

where the sets $I_D$ and $I_{\Delta\alpha}^{(c)}$ specify which combinations of channels are used to characterize the component, or class, in terms of degrees of coherence and interferometric phase differences, respectively. The contents of these sets are the subject of section III-D. The component density $p(x_s|\theta_c)$ can then be defined in terms of the distributions introduced in section II:

$$p(x_s|\theta_c) = p\left(\mathbf{T}_{pp}^{(s)}|\Sigma_{pp}^{(c)}, L\right) \left[ \prod_{(ij) \in I_D} p\left(d_{ij}^{(s)}|D_{ij}^{(c)}, L\right) \right] \times \left[ \prod_{(ij,kl) \in I_{\Delta\alpha}^{(c)}} p\left(\Delta\phi_{ij,kl}^{(s)}|\Delta\alpha_{ij,kl}^{(c)}, D_{ij}^{(c)}, D_{kl}^{(c)}, L\right) \right] \tag{16}$$

where $\mathbf{T}_{pp}^{(s)}$, $d_{ij}^{(s)}$ and $\Delta\phi_{ij,kl}^{(s)}$ denote the polarimetric signature, the degrees of coherence and differences in interferometric phase associated with sample $x_s$, respectively.

It is important to note that the assumptions of independence inherent in the component density defined above are not in line with the Gaussian hypothesis. Although the correlations between the quantities involved have been suppressed by marginalization, the Wishart distribution still dictates that they exist. As an example of correlations that are no longer enforced, consider the case $d_{ij} = d_{jk} = 1$. Clearly, the degree of coherence $d_{ik}$ is not independently distributed with respect to $d_{ij}$ and $d_{jk}$, since $d_{ik} = 1$ must hold. In the process of classification, this circumstance is not necessarily a disadvantage. Firstly, detailed investigations have shown that classifiers based on marginal distributions tend to perform well, even in the case of strong correlations [18]. Secondly, and more importantly, the proposed component distribution essentially relaxes the Gaussian hypothesis, which can even be considered an advantage when classifying real SAR data featuring complex media and textured surfaces.

B. E-Step

The refinement of a set of given component parameters $\Theta^t = \{\theta_1^t, \ldots, \theta_{N_C}^t\}$ begins with the assignment of samples $x_s$ to components on the basis of $\Theta^t$. In contrast to the k-means approach usually employed in iterative classifiers, this assignment is not binary. Rather, each sample $x_s$ has a degree of membership $\zeta_c^s$ in each component $c$. As the prior distribution over components is flat, the degree of membership $\zeta_c^s$, which is formally defined as the posterior $p(c|x_s)$, is the likelihood of $x_s$ normalized over all components:

$$\zeta_c^s = \frac{p(x_s|\theta_c^t)}{\sum_{e=1}^{N_C} p(x_s|\theta_e^t)} \tag{17}$$
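Equation (17) is simple to state, but the component densities of (16) are products of many factors raised to the power $L$ and can underflow in floating point arithmetic. A common remedy, shown here as an implementation suggestion that the paper itself does not prescribe, is to work with log-densities and subtract the largest one before exponentiating. The function name is illustrative.

```python
import math

def memberships(log_densities):
    """Equation (17) for one sample.

    log_densities[c] is assumed to hold log p(x_s | theta_c). Subtracting the
    maximum leaves the normalized ratios unchanged but keeps exp() in range.
    """
    top = max(log_densities)
    weights = [math.exp(v - top) for v in log_densities]
    total = sum(weights)
    return [w / total for w in weights]
```

For log-densities of -1000 and -1001, a direct evaluation of (17) would divide zero by zero, whereas `memberships([-1000.0, -1001.0])` returns memberships of approximately 0.73 and 0.27.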

C. M-Step

The degrees of membership $\zeta_c^s$ can subsequently be used to refine the parameter set $\Theta^t$ to produce new estimates $\Theta^{t+1}$ that increase the combined sample likelihood. Roughly speaking, component parameters are updated taking all samples into account, while using the degree of membership defined above to weigh the influence of each sample. More precisely, updated parameters are obtained by maximizing the expectation of the combined likelihood of equation (11) with respect to all possible assignments of samples to components (taking $p(c|x_s)$ into account) by gradient ascent on $\Theta^{t+1}$. The updated component parameters are then given by:

$$\theta_c^{t+1} \leftarrow \begin{cases} \Sigma_{xx}^{(c)} = \dfrac{1}{W_c} \displaystyle\sum_{s=1}^{N_X} \zeta_c^s\, \mathbf{T}_{xx}^{(s)} \\[2ex] D_{ij}^{(c)} = \mathrm{inv}\!\left[ \dfrac{1}{W_c} \displaystyle\sum_{s=1}^{N_X} \zeta_c^s\, d_{ij}^{(s)} \right] \\[2ex] \Delta\alpha_{ij,kl}^{(c)} = \arg\!\left[ \displaystyle\sum_{s=1}^{N_X} \zeta_c^s \exp\!\left(\imath \Delta\phi_{ij,kl}^{(s)}\right) \right] \end{cases} \tag{18}$$

where $W_c = \sum_{s=1}^{N_X} \zeta_c^s$, $\arg[\ldots]$ denotes the phase of the operand, and the operator $\mathrm{inv}[\ldots]$ indicates that the true degree of coherence is obtained from the mean by numerically inverting equation (6).

D. The Selection of Coherence Channels

The last step in formulating an operational classifier is to specify the contents of sets $I_D$ and $I_{\Delta\alpha}^{(c)}$, introduced in (15), to determine which degrees of coherence $d_{ij}$ and phase differentials $\Delta\phi_{ij,kl}$ to consider in component densities. Information regarding coherence and interferometric phase is contained in the $\frac{q}{2} \times \frac{q}{2}$ matrix $\mathbf{T}_{12}$. Although the elements of this matrix are treated as independent in equation (16), elements $\mathbf{T}_{12}(i,j)$ and $\mathbf{T}_{12}(j,i)$ are, in practice, highly correlated due to the similarity of the matrices $\mathbf{T}_{11}$ and $\mathbf{T}_{22}$. The degrees of coherence and interferometric phase differences considered are therefore restricted to the $\frac{q}{4}\left(\frac{q}{2} + 1\right)$ elements on and above the diagonal of $\mathbf{T}_{12}$. For the degree of coherence, all of these elements are potentially informative and therefore constitute the contents of $I_D$.
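The contents of $I_D$ can be enumerated directly. The short sketch below assumes 1-based channel indices and reads "on and above the diagonal" as row index $i \le$ column index $j$; both are conventions chosen for illustration, and the function name is hypothetical.

```python
def coherence_channels(q):
    """Index pairs (i, j) on and above the diagonal of the q/2 x q/2 matrix T12.

    Assumes 1-based indices with row i <= column j.
    """
    n = q // 2
    return [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
```

In the monostatic case, `coherence_channels(6)` lists the six pairs (1, 1), (1, 2), (1, 3), (2, 2), (2, 3) and (3, 3), in agreement with $\frac{q}{4}\left(\frac{q}{2} + 1\right) = 6$.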


In terms of the interferometric phase $\phi_{ij}$, $\mathbf{T}_{12}$ has, again, $\frac{q}{4}\left(\frac{q}{2} + 1\right)$ degrees of freedom. In principle, interferometric phase differences can be defined for all possible pairs $(\phi_{ij}, \phi_{kl})$ on and above the diagonal of $\mathbf{T}_{12}$. Using all possible pairings for classification would, however, introduce significant amounts of redundant information, since the $\frac{q}{4}\left(\frac{q}{2} + 1\right)$ interferometric phases can be reconstructed from a single absolute phase and $\frac{q}{4}\left(\frac{q}{2} + 1\right) - 1$ differences $\Delta\phi_{ij,kl}$.

To single out a subset of all possible phase differences for classification, it is helpful to consider the degrees of coherence $D_{ij}^{(c)}$ in a certain class $c$. The pair of channels $a$ and $b$ of the Pauli representation for which $D_{ab}^{(c)}$ is maximal provides a stable reference phase $\phi_{ab}$: high coherence implies a low level of phase noise. To minimize the impact of phase noise, the $\frac{q}{4}\left(\frac{q}{2} + 1\right) - 1$ non-redundant interferometric phase differences considered for classification are given by the difference between the reference phase and all other interferometric phases on and above the diagonal of $\mathbf{T}_{12}$:

$$\Delta\phi_{ij,ab} = \phi_{ij} - \phi_{ab} \quad \text{for} \quad 1 \le i, j \le \frac{q}{2},\; i \ge j,\; (i, j) \neq (a, b) \tag{19}$$
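The choice of reference channel in (19) and the phase update in the last row of (18) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the class coherences are held in a dictionary keyed by channel pair, the phase differences of the samples are given in radians, and all names are hypothetical.

```python
import cmath

def reference_channel(D):
    """Channel pair (a, b) with maximal class coherence; D maps (i, j) -> D_ij."""
    return max(D, key=D.get)

def phase_difference_channels(D):
    """Pairs ((i, j), (a, b)) of equation (19) for a single class."""
    ref = reference_channel(D)
    return [(ch, ref) for ch in D if ch != ref]

def update_delta_alpha(zeta, dphi):
    """Last row of equation (18): phase of the membership-weighted phasor sum.

    zeta[s] is the membership of sample s, dphi[s] its phase difference.
    """
    return cmath.phase(sum(z * cmath.exp(1j * p) for z, p in zip(zeta, dphi)))
```

Averaging phasors rather than angles is what makes the update insensitive to phase wrapping: two samples at 3.1 and -3.1 radians yield an estimate near $\pm\pi$, whereas the arithmetic mean of the two angles would be 0.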

Consequently, the set $I_{\Delta\alpha}^{(c)}$ then contains all pairs $(ij, ab)$ in the expression above. The set of phase differences considered is, by this definition, class $c$ dependent. The determination of $(a, b)$ is therefore part of the process of updating class parameters during the M-Step of the EM iteration, and takes place after the degrees of coherence $D_{ij}^{(c)}$ have been updated.

IV. SELF-INITIALIZATION

Iterated statistical classifiers typically use decompositions and, in some cases, fixed thresholds to initialize class parameters before the refinement process can begin. The iterated refinement of class parameters using the EM algorithm, or the closely related k-means procedure, is a non-convex optimization problem. Thus, if the initial values of class parameters fail to capture the structure of the dataset sufficiently well, the iterated refinement of class parameters will converge to a local, sub-optimal maximum of the combined sample likelihood. This type of problem is generally to be expected when a class straddles one or more of the thresholds used for initialization.

The alternative approach, developed in this section, is to begin the classification process with a single class, and subsequently introduce new classes sequentially. It should be noted that the very first class can be initialized with almost arbitrary parameters, since, for a single class, the EM algorithm will converge to the globally optimal parameters in a single iteration. A possible set of initial parameters is the following.

$$\theta_1^1 \leftarrow \begin{cases} \Sigma_{xx}^{(1)} = \mathbf{I}_{q/2 \times q/2} \\ D_{ij}^{(1)} = 0.5 \\ \Delta\alpha_{ij,kl}^{(1)} = 0 \end{cases} \tag{20}$$

where $\mathbf{I}_{q/2 \times q/2}$ denotes the $q/2 \times q/2$ identity matrix. As illustrated in figure 2, the EM algorithm iterates until the current collection of $N$ classes has converged to a stable configuration before initializing class $N + 1$.

The strategy of gradually introducing classes throughout the classification process has two principal advantages over established methods of initialization. Firstly, the problem of initialization is reduced from simultaneously determining suitable initialization parameters for all classes $1 \ldots N_C$ to determining suitable parameters for a single class. Secondly, the initialization of class $N + 1$ can take into account the current classification state, including sample likelihoods and degrees of membership, for classes $1 \ldots N$. This information can be used to perform an initialization that is consistent with the underlying statistical model and the structure of the dataset.

The initialization of a new class $N + 1$ is accomplished by identifying a class $w$ that appears weak, in the sense that it appears to span more than one homogeneous region. This class is then split into two classes, and the process of parameter refinement continues. For the present purpose, the weakness, or strength, of a class should reflect the extent to which its parameters are representative of its contents. The approach adopted here is based on using likelihood ratios to measure whether or not samples within the same class are drawn from the same distribution. A class in which pairs of samples often appear to come from different distributions is then considered weak.

A. Likelihood Ratios

The likelihood ratio used was introduced in [19] to test for the equality of complex Wishart matrices. For two samples $x$ and $y$ it takes the general form

$$Q_{xy} = \frac{\max_{\theta_{xy}} \left( p(x|\theta_{xy})\, p(y|\theta_{xy}) \right)}{\max_{\theta_x} \left( p(x|\theta_x) \right)\, \max_{\theta_y} \left( p(y|\theta_y) \right)} \tag{21}$$

and lies in the range $[0, 1]$. Small values indicate that samples $x$ and $y$ are not drawn from the same distribution. Although the ratio was originally formulated for the complex Wishart distribution, it can be extended to the component distribution of equation (16) easily, since the constituent densities can be treated independently. For the Wishart distribution, denoted $p\left(\mathbf{T}_{pp}^{(s)}|\Sigma_{pp}^{(c)}, L\right)$ in (16), the likelihood ratio $Q^T$, after discarding constant factors, was obtained in [19] as:

$$Q_{xy}^{T} = \frac{\left|\mathbf{T}_{pp}^{(x)}\right|^{L} \left|\mathbf{T}_{pp}^{(y)}\right|^{L}}{\left|\mathbf{T}_{pp}^{(x)} + \mathbf{T}_{pp}^{(y)}\right|^{2L}} \tag{22}$$
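Equation (22) is most conveniently evaluated in the log domain. The sketch below makes two choices that are not taken from the paper: the determinant is computed by plain Gaussian elimination on nested lists, and the constant factor $2^{2Lp}$ (with $p$ the matrix dimension) that (22) discards is restored, so that two identical matrices give $\log Q^T = 0$. Function names are illustrative, and the input matrices are assumed to be Hermitian and positive definite.

```python
import math

def determinant(m):
    """Determinant of a square matrix by Gaussian elimination with pivoting."""
    a = [[complex(v) for v in row] for row in m]
    n, det = len(a), 1.0 + 0.0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det.real  # the determinant of a Hermitian matrix is real

def log_wishart_ratio(Tx, Ty, L):
    """log Q^T of equation (22), with the constant 2^(2Lp) restored."""
    p = len(Tx)
    Tsum = [[Tx[i][j] + Ty[i][j] for j in range(p)] for i in range(p)]
    return (2 * L * p * math.log(2.0)
            + L * (math.log(determinant(Tx)) + math.log(determinant(Ty)))
            - 2 * L * math.log(determinant(Tsum)))
```

The value is zero when both matrices coincide and becomes increasingly negative as they differ, which is the behaviour the class strength of section IV-B relies on.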

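For degrees of interferometric coherence $d_{ij}$, distributed according to $p\left(d_{ij}^{(s)}|D_{ij}^{(c)}, L\right)$, the likelihood ratio of equation (21) becomes

$$Q_{xy}^{d} = \frac{\max_{D_{xy}} \left[ p\left(d_{ij}^{(x)}|D_{xy}, L\right) p\left(d_{ij}^{(y)}|D_{xy}, L\right) \right]}{\max_{D_x} p\left(d_{ij}^{(x)}|D_x, L\right)\; \max_{D_y} p\left(d_{ij}^{(y)}|D_y, L\right)} \tag{23}$$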

Unfortunately, the maximization in the numerator of (23) cannot be carried out analytically. To be practical in the context of this classifier, $Q_{xy}^{d}$ must therefore be tabulated over possible values of $d_{ij}^{(x)}$ and $d_{ij}^{(y)}$ (both of which lie in the range $[0, 1]$). The likelihood ratio for the distribution of interferometric phase differences $\Delta\phi_{ij,kl}$ is, in principle, obtained in the same manner. Since the inclusion of this ratio would, however, come at an enormous computational expense, it will not be considered. The likelihood ratio used in the definition of class strength below therefore takes into account polarimetric information and degrees of interferometric coherence, and is given by

$$Q_{xy} = Q_{xy}^{T}\, Q_{xy}^{d} \tag{24}$$
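One way of tabulating $Q_{xy}^{d}$ is a brute-force search over a grid of candidate coherences. The sketch below is an assumption about how such a table could be built, not a description of the authors' implementation: it supposes an integer $L \ge 2$, evaluates (5) via Euler's transformation, limits the candidate coherences to $[0, 0.99]$ and samples $d$ at bin centres. The grid sizes and function names are illustrative.

```python
import math

def coherence_pdf(d, D, L):
    """Equation (5) for integer L >= 2 (finite sum via Euler's transformation)."""
    x = (D * d) ** 2
    poly = sum(math.comb(L - 1, n) ** 2 * x ** n for n in range(L))
    hyp = (1.0 - x) ** (1 - 2 * L) * poly
    return (2.0 * (L - 1) * (1.0 - D * D) ** L
            * d * (1.0 - d * d) ** (L - 2) * hyp)

def coherence_ratio(dx, dy, L, grid=200):
    """Q^d of equation (23), maximizing over a grid of candidate coherences."""
    Ds = [0.99 * k / grid for k in range(grid + 1)]
    px = [coherence_pdf(dx, D, L) for D in Ds]
    py = [coherence_pdf(dy, D, L) for D in Ds]
    joint = max(a * b for a, b in zip(px, py))
    return joint / (max(px) * max(py))

def tabulate_ratio(L, bins=16):
    """Lookup table of Q^d over bin centres of d^(x) and d^(y) in (0, 1)."""
    ds = [(k + 0.5) / bins for k in range(bins)]
    return [[coherence_ratio(dx, dy, L) for dy in ds] for dx in ds]
```

The table is symmetric with ones on its diagonal, so only half of it needs to be computed and stored in practice; at classification time the ratio for a pair of samples is then a simple lookup.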

B. Initialization

The process of initializing a new class begins with selecting a representative sample in all existing classes. In each existing class $c$, the representative $r_c$ is the most likely sample:

$$\forall x \in X,\; x \neq r_c \,.\; p(r_c|\theta_c) > p(x|\theta_c) \tag{25}$$

The strength $S_c$ of a class can then be defined as the expected logarithm of the likelihood ratio $Q_{r_c x}$ over all samples in $X$, with contributions weighted by the class membership $\zeta_c^s$:

$$S_c = \frac{1}{W_c} \sum_{s=1}^{N_X} \zeta_c^s \log\left(Q_{r_c x_s}\right) \tag{26}$$

where $W_c = \sum_{s=1}^{N_X} \zeta_c^s$.
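Equations (25) and (26) translate into a few lines of code. The sketch assumes that the log-densities $\log p(x_s|\theta_c)$ and the log-ratios $\log Q_{r_c x_s}$ have already been computed for every sample; the function names are hypothetical.

```python
def representative(log_densities_c):
    """Equation (25): index of the most likely sample under class c."""
    return max(range(len(log_densities_c)), key=log_densities_c.__getitem__)

def class_strength(zeta_c, log_q):
    """Equation (26): membership-weighted mean of log Q between r_c and x_s."""
    W_c = sum(zeta_c)
    return sum(z * q for z, q in zip(zeta_c, log_q)) / W_c
```

Since $Q$ lies in $[0, 1]$, every term $\log Q$ is non-positive, so $S_c \le 0$ and values closer to zero indicate a more compact class.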

By this definition, classes that span several distinct homogeneous areas will be associated with small values of $S_c$: samples that are dissimilar to the representative $r_c$ but have a high degree of class membership $\zeta_c^s$ will make large negative contributions to the sum. The class $w$ to be split is associated with the smallest value of $S_c$. Splitting consists of reassigning samples that were closer than average to the representative $r_w$ to the new class $N + 1$. Reassignment, in turn, is accomplished by modifying the degrees of class membership $\zeta$ to obtain $\zeta'$:

$$\begin{aligned} \zeta_{N+1}^{s\prime} &= \begin{cases} \zeta_w^s & \log\left(Q_{r_w x_s}\right) > S_w \\ 0 & \log\left(Q_{r_w x_s}\right) \le S_w \end{cases} \\ \zeta_w^{s\prime} &= \begin{cases} 0 & \log\left(Q_{r_w x_s}\right) > S_w \\ \zeta_w^s & \log\left(Q_{r_w x_s}\right) \le S_w \end{cases} \\ \zeta_c^{s\prime} &= \zeta_c^s \qquad \forall c \neq w,\; c \neq N + 1 \end{aligned} \tag{27}$$
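The reassignment of equation (27) can be sketched as a single pass over the samples. The function name and the list-based representation of the memberships are assumptions made for illustration; memberships of all classes other than $w$ and $N + 1$ are left untouched and are therefore not passed in.

```python
def split_class(zeta_w, log_q, S_w):
    """Equation (27): returns the new memberships for class w and class N + 1.

    zeta_w[s] is the membership of sample s in class w, log_q[s] is
    log Q between the representative r_w and sample s, S_w is the class strength.
    """
    new_w, new_n1 = [], []
    for z, q in zip(zeta_w, log_q):
        if q > S_w:   # closer than average to r_w: moves to class N + 1
            new_w.append(0.0)
            new_n1.append(z)
        else:         # stays in class w
            new_w.append(z)
            new_n1.append(0.0)
    return new_w, new_n1
```

The membership mass of each sample is preserved by the split, so the subsequent M-Step can be applied to both classes without renormalizing.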

The initialization of the new class $N + 1$ is completed after using the M-Step of section III-C to obtain new parameters $\theta_w$ and $\theta_{N+1}$ based on $\zeta'$.

C. Non-Gaussian Backscattering

The procedure for incremental initialization is based on the statistics of complex Gaussian backscattering; however, samples that do not conform to this model often arise in practice (e.g. the bright point targets associated with man-made structures). Classes that contain predominantly non-conforming samples are, by definition, associated with low likelihood ratios. Nevertheless, such classes should not be split, since the in-class distribution of samples simply cannot be adequately described by the component densities of equation (16). To avoid essentially futile divisions, each class $c$ is associated with a threshold $\tau_c$. If $S_c < \tau_c$, class $c$ will not be split, even when $S_c$ is small compared to other classes. The initial class, which spans the entire dataset, is assigned a threshold $\tau_1 = -\infty$ and will be split. When a class $w$ of strength $S_w$ is divided to give classes $w$ and $N + 1$, the new classes are assigned thresholds $\tau_w = \tau_{N+1} = S_w$, thereby preventing further division when a class becomes weaker after initialization.
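The threshold rule can be summarized in code. The sketch assumes that class strengths and thresholds are kept in parallel lists indexed by class; the function names and the convention of returning `None` when no class may be split are illustrative choices, not part of the paper.

```python
def choose_class_to_split(strengths, thresholds):
    """Weakest class among those with S_c >= tau_c, or None if none qualifies."""
    eligible = [c for c in range(len(strengths))
                if strengths[c] >= thresholds[c]]
    if not eligible:
        return None
    return min(eligible, key=lambda c: strengths[c])

def thresholds_after_split(thresholds, w, S_w):
    """Both halves of a split inherit tau = S_w, the strength before the split."""
    updated = list(thresholds)
    updated[w] = S_w
    updated.append(S_w)  # threshold of the new class N + 1
    return updated
```

Starting from a single class with threshold `float("-inf")`, the first call always selects that class; after each split, a class whose strength falls below the strength of its parent is excluded from further division.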


For a class consisting of samples that do and others that do not conform to the Gaussian backscattering model, equation (27) will tend to assign conforming and non-conforming samples to distinct classes. The threshold $\tau_w$ then ensures that only the predominantly conforming class can be further subdivided. If, on the other hand, both resulting classes are predominantly conforming, the in-class dispersion of samples decreases (the classes become more compact), and both will be considered for further division.

V. RESULTS AND DISCUSSION

The theory outlined in previous sections describes an integrated framework for the classification of PolInSAR data. This section shows how the theoretical building blocks can be recombined to give a range of classifiers, each of which may be particularly appropriate depending on the application considered. A comparison of the classification results obtained serves as the basis for discussing the merits and shortcomings of the proposed approach, and an assessment, for the most part qualitative, of classifier performance is given. A rigorous, quantitative evaluation with respect to one or several potential PolInSAR applications using ground truth data should be the subject of a comparative study in future.

Figure 3 introduces the PolInSAR dataset used as a basis for the results discussed in the remainder of this section. Figure 3a shows an optical image of the test area as a reference for interpreting the SAR data and classification results, and 3b indicates the regions of interest (ROIs) that will be referred to in the following. The PolInSAR dataset shown in 3c was acquired over Oberpfaffenhofen, Germany, by DLR's E-SAR sensor at L-Band and in repeat-pass mode. The average baseline across the scene is approximately 26 m, which corresponds to a vertical wavenumber $k_z = 0.45$ and a relatively high degree of volume decorrelation. The decorrelation in the forested areas (for example region A) is clearly evident in Figure 3d.

The data, in the form of coherency matrices, were range spectral filtered and multilooked using the IDAN region growing speckle filter [20]. The latter filter was chosen because it achieves homogenization without causing significant loss in spatial resolution. It should be noted that the IDAN filter is not optimal in the context of the proposed classifier, since the underlying region growing algorithm considers only intensity information in the stopping criterion: the filter may therefore miss, and fail to preserve, any non-stationarity in the signal that is not reflected in the backscattered intensity. This is a drawback in the analysis of PolInSAR data, especially in the case of forestry applications, where the backscattered intensity is often saturated. Unfortunately there is, at present, no filter that is adaptive with respect to all types of information considered here. Since all PolInSAR applications rely on second order statistics (such as coherence), it is to be hoped that specialized filters will be developed in future. Experiments on the dataset used in this section have shown that the IDAN filter is, on the whole, more effective at preserving signal characteristics than other established filtering algorithms. The dataset of Figure 3c has been IDAN filtered to give 200 looks before classification.


A. Class Labels and Post-Processing

In classification, the assignment of class labels to observed samples is followed by a post-processing step in which classes are grouped or associated with semantic information on the basis of their contents. This process is highly specific to the application under consideration, as the parameters of interest vary from case to case. In the classification results of figures 4 and 5 (excepting 5b), class contents were analyzed using the results of a Freeman-Durden decomposition [21] to assign each class to one of the three elementary categories surface, double bounce and volume. This procedure represents a simple strategy for extracting elementary semantic information from the polarimetric signatures in a classified dataset. The categorization proceeds by averaging the decomposition vectors obtained from the $T_{11}$ matrices of all samples in a given class. To do so, coherency matrices are first transformed to covariance matrices, after which the Freeman-Durden decomposition yields, for each covariance matrix, a three dimensional vector indicating the backscattering power for each elementary scattering type. The category assigned to each class is then reflected in the color palette: predominantly surface classes are assigned a brown hue, while double bounce and volume classes correspond to the colors red and green, respectively.

Of course, this simple strategy by no means exhausts the potential for post-classification, thematic labeling. Most importantly, it is clearly desirable to also take interferometric information into account when associating semantic information with class contents. For instance, the approach described in [10] could be used to more effectively separate natural from man-made targets by regrouping the classes obtained into these broad categories. Alternatively, various PolInSAR model inversion techniques (see e.g. [1], [2]) could provide more salient physical parameters concerning class contents.
This type of sophisticated labeling has not been attempted here, as an optimal approach must be determined on the basis of the concrete application being considered. In this sense, the labeling given is primarily intended to make the classification results easier to interpret and compare. The potential for assigning more meaningful semantic labels, with respect to concrete PolInSAR applications, is a central part of the discussion in the remainder of this section. The result of 5b is not post-processed using the Freeman-Durden decomposition and does not use the corresponding palette, as it is too different from other results in this section to make a direct comparison meaningful. Instead, the colors were chosen to maximize the contrast and make class boundaries easy to discern. Class colors are, in this case, not determined by class contents.

B. Comparing Polarimetric and Polarimetric Interferometric Classification

Figure 4a shows the classification result obtained using the full PolInSAR model described in previous sections, while 5a is based on a simplified model that takes into account only polarimetric information. The simplified model is obtained by discarding all terms related to the complex interferometric coherence from the component distribution and the likelihood ratio used for self-initialization. The reduced component density then becomes

$$p\left(x_s \mid \theta_c\right) = p\left(T_{pp}^{(s)} \mid \Sigma_{pp}^{(c)}, L\right) \qquad (28)$$

while the likelihood ratio is given by

$$Q_{xy} = Q^T_{xy} = \frac{\left|T_{pp}^{(x)}\right|^{L} \left|T_{pp}^{(y)}\right|^{L}}{\left|T_{pp}^{(x)} + T_{pp}^{(y)}\right|^{2L}} \qquad (29)$$
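For illustration, the polarimetric likelihood ratio can be evaluated directly in the log domain, which sidesteps the overflow that raising determinants to the power $L$ would cause for large numbers of looks. The sketch below assumes the ratio has the form $|T_x|^L |T_y|^L / |T_x + T_y|^{2L}$, with $|\cdot|$ denoting the determinant, and that both matrices are Hermitian and positive definite. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def log_q_wishart(t_x, t_y, looks):
    """log of |T_x|^L |T_y|^L / |T_x + T_y|^(2L), computed without forming determinants.

    np.linalg.slogdet returns (sign, log|det|); for Hermitian positive definite
    input the sign is 1, so only the log-magnitude is used here.
    """
    _, logdet_x = np.linalg.slogdet(t_x)
    _, logdet_y = np.linalg.slogdet(t_y)
    _, logdet_sum = np.linalg.slogdet(t_x + t_y)
    return looks * (logdet_x + logdet_y - 2.0 * logdet_sum)

# Identical matrices give the largest value this form can take for 3 x 3 input.
t_a = np.eye(3)
t_b = np.diag([1.0, 2.0, 4.0])
print(log_q_wishart(t_a, t_a, looks=10))  # equals -60 * log(2)
print(log_q_wishart(t_a, t_b, looks=10))  # smaller: the matrices differ
```

Under this assumed form the log ratio is bounded above by its value for identical matrices, so more dissimilar samples yield more negative values.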

The two most apparent differences between the results are that the full PolInSAR classifier appears more successful at identifying the man-made structures (buildings) in the scene, and that the polarimetric classification shows more finely delimited structures in the forested areas. In addition, the polarimetric classifier shows a higher sensitivity over agricultural areas, which tend to be under-segmented in the PolInSAR result of 4a. The ability of PolInSAR classifiers to accurately identify man-made structures, even when their polarimetric signature is ambiguous, has been noted before in [10] and [11]. The polarimetric signature of buildings that are not parallel to the sensor trajectory is often indistinguishable from that of a volume scatterer. Interferometric information resolves this ambiguity, since man-made structures are typically associated with high degrees of coherence, while volume scatterers are associated with low coherence due to volume decorrelation. The improved separation of manmade structures is particularly evident for the buildings in region C1 and for the smaller structures immediately to the left of region D. Although interferometric information brings a clear improvement, the result shows that false-negatives, where low-coherent structures remain grouped with regions of volumetric backscattering, and to a lesser extent false-positives, where surfaces are grouped with man-made structures (e.g. two patches near the right edge of ROI A), remain. In general, it is highly unlikely that the broad category of man-made structures, associated with a wide range of polarimetric and interferometric characteristics, can be completely characterized by local scattering behavior alone. 
Further improvements on the basis of unsupervised classification results would require per-segment analysis, taking into account scattering behavior as well as application specific prior knowledge such as shape, context or, for instance, the presence of bright, coherent, deterministic scatterers.

The higher sensitivity of the polarimetric classifier over the forested areas (see ROI A) is due to the high levels of volume decorrelation. In the PolInSAR model, low degrees of coherence imply noisy interferometric information. During class initialization, a comparatively high dispersion of samples in low-coherent classes is attributed to noise, such that regions with low interferometric coherence tend to be adequately described by few classes. A smaller baseline, with lower volume decorrelation, would encourage the detection of structures in the forest area. In the case of the polarimetric model, on the other hand, it can be assumed that polarimetric information is saturated over the forested area. The structures detected are thus primarily distinguished by their intensity, and are of little relevance in most PolInSAR applications (e.g. current model inversion methods such as [1]).

The under-segmentation observed in the PolInSAR classification, evident in region D, is a symptom of the fact that both results use the same number of classes. Since the PolInSAR model has a significantly larger parameter space, more classes are needed to describe the observed samples adequately. The result of 4b, in which a larger number of classes has led to more success in separating agricultural fields, demonstrates this clearly. In addition, some surface classes only become separated when interferometric information is taken into account. For example, both results of figure 4 differentiate between the two types of surface constituting the runway of regions B1 and B2, whereas the polarimetric classifier does not. Although these surfaces are almost indistinguishable in 3c, the degrees of coherence of 3d and the optical image 3a show that they are distinct.

Figure 6 presents a more detailed analysis of the contents of these two classes separated in 4a. The plots in the first row show the dispersion of intensities on the diagonal of the $T_{11}$ matrix. The dispersion observed is consistent with the multiplicative speckle model and, as expected, samples in both classes are associated predominantly with surface backscattering (the first channel in the Pauli representation). Importantly, the plots reveal that the polarimetric signatures of samples in the two classes are very similar in terms of intensity and coefficients of correlation: the plot 6a, in which samples of both classes are shown combined, suggests a single cluster and not two. Furthermore, plots 6b and 6c, in which samples are separated according to the classification in 4a, show that the classes are highly non-separable, making it no surprise that the polarimetric classifier fails to distinguish between them. The second row of plots concerns the degrees of coherence in both classes of 4a. The histograms of $\gamma_{ii}$, in gray, show that the two classes become separable when interferometric information is taken into account. In addition, the plots show the distributions over $\gamma$ used in the component density of each class. The model generally approximates the empirical distributions acceptably well. In particular, the mode of the distribution is estimated accurately at low coherence due to the removal of bias in (18). The distributions of 6d match the class contents less well, as they fail to describe the long tails towards lower degrees of coherence.
This mismatch is presumably due to the presence of additive thermal noise in dark areas of the dataset and, whatever the cause, serves as a reminder that the hypothesis of complex Gaussian backscattering is not always valid in practice.

A more specific observation concerns classes that appear on the boundaries of some homogeneous regions in the dataset, particularly evident between forest and fields in the concave part of ROI A. The class corresponds largely to radar shadow, but appears exaggerated in some of the classification results (e.g. in figure 4). A detailed inspection of the dataset of 3c shows that this artefact is introduced by the speckle filter, which apparently fails to detect the non-stationarity on the edge of the shadow region. On the one hand, this can be interpreted as a weakness of the filter. Accumulating evidence from the second order statistics could conceivably improve the detection of such non-stationarities. On the other hand, the problem in some sense lies with the proposed approach and indeed with the nature of PolInSAR data itself. It is well known that a large number of looks is necessary to accurately estimate the second order statistics in the PolInSAR coherency matrix (see the discussion of figure 8 below). Since the perfect speckle filter does not exist, and will in all probability never exist, such artefacts are to some extent an inherent feature of real PolInSAR data.

C. Classification Based on the Complex Coherence

Maps 5b to 5d show classification results obtained based solely on information related to complex interferometric coherence. The result of 5c uses a model that takes into account the degrees of coherence and interferometric phase differences. The corresponding component density is, in this case,

$$p\left(x_s \mid \theta_c\right) = \left[ \prod_{(ij) \in I_D} p\left(d_{ij}^{(s)} \mid D_{ij}^{(c)}, L\right) \right] \times \left[ \prod_{(ij,kl) \in I_{\Delta\alpha}^{(c)}} p\left(\Delta\phi_{ij,kl}^{(s)} \mid \Delta\alpha_{ij,kl}^{(c)}, D_{ij}^{(c)}, D_{kl}^{(c)}, L\right) \right]$$

and the likelihood ratio for class initialization becomes

$$Q_{xy} = Q^d_{xy} = \frac{\max_{D_{xy}} \; p\left(d_{ij}^{(x)} \mid D_{xy}, L\right) p\left(d_{ij}^{(y)} \mid D_{xy}, L\right)}{\max_{D_y} p\left(d_{ij}^{(y)} \mid D_y, L\right) \; \max_{D_x} p\left(d_{ij}^{(x)} \mid D_x, L\right)} \qquad (30)$$

The classification result, compared to those of figure 4 and 5a, is noisier since the degree of coherence and especially the interferometric phase differences are, statistically, more strongly affected by speckle. The result shows a high sensitivity with respect to surface classes: the classifier, for example, correctly separates the surfaces on the runway (ROI B1 and B2) and discriminates well between fields in ROI D. As in the case of the full PolInSAR model, highly coherent man-made structures tend to be identified as such (see ROI C1). Interestingly, however, a new ambiguity emerges. Highly coherent man-made structures appear indistinguishable from highly coherent surfaces (see ROI D) when the backscattered intensity is not considered. In this light, the full PolInSAR model is attractive since polarimetric ambiguities are resolved by interferometric information and vice versa. The sensitivity over the forest areas is low, since the classification no longer takes into account the intensity information that distinguished the finer structures evident in figure 4. In general, the result shows that complex interferometric coherences are sufficient to support a large number of seemingly meaningful classes.

As a benchmark result based on interferometric information, illustration 5d is the classification obtained using k-means clustering based on the Euclidean distance between the degrees of coherence associated with each sample (a vector of $\frac{q}{2}(q + 1) = 6$ elements $\in [0, 1]$ per resolution cell). The simple k-means algorithm delivers results which, upon qualitative visual inspection, do not appear significantly inferior to those of 5c. A noticeable difference is that the proposed approach allocates fewer classes to low-coherence areas (such as ROI A) and more to highly coherent areas (such as ROI D).
In accordance with the underlying statistical model, the proposed approach tends to attribute variations in low-coherence areas to in-class variability, while even small differences at high levels of coherence are considered significant. From the statistical point of view, therefore, simple clustering algorithms such as k-means tend to over-segment where coherence is low and under-segment where coherence is high. A further difference between the approaches is that the distributions employed in the proposed approach become strongly peaked as the number of looks increases. The number of looks, therefore, represents strong prior knowledge concerning the dispersion of samples within a cluster, or class, that is ignored in the k-means iteration. In terms of iterative clustering, strongly peaked distributions mean that a class center is less likely to converge on a point in between clusters in feature space. In practice, if the number of looks truly reflects the dispersion within homogeneous regions, this encourages homogeneous classification results. In terms of the results presented, the fact that k-means fails to take the number of looks into account is most probably responsible for the noise in the interior of ROI B1 as well as the increased confusion between the buildings of C1 and the highly coherent surfaces of D. Similarly, many of the buildings immediately below ROI A fall into a class that is primarily associated with forested areas.
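To make the benchmark concrete, a plain k-means iteration on coherence vectors might look as follows. This is a generic sketch of Lloyd's algorithm, not the code used for the results in the paper: the function name, the random initialization and the synthetic six-element feature vectors are assumptions made for illustration. Note that the Euclidean distance treats all coherence levels alike and that the number of looks does not appear anywhere, which is the limitation discussed above.

```python
import numpy as np

def kmeans(features, n_classes, n_iter=50, seed=0):
    """Plain k-means with Euclidean distance on (n_samples, n_features) data."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with randomly chosen, distinct samples.
    centers = features[rng.choice(len(features), n_classes, replace=False)]
    for _ in range(n_iter):
        # Distance of every sample to every center, shape (n_samples, n_classes).
        dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_classes):
            members = features[labels == k]
            if len(members):  # keep the previous center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return labels, centers

# Synthetic example: five low-coherence and five high-coherence cells,
# each described by six degrees of coherence in [0, 1].
features = np.vstack([np.full((5, 6), 0.1), np.full((5, 6), 0.9)])
labels, centers = kmeans(features, n_classes=2)
print(labels)
```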

As an aside, very similar arguments apply when k-means is used to cluster the intensities on the diagonal of the $T_{11}$ matrix. Although the corresponding results are not shown, k-means will tend to over-segment where intensities are high and under-segment where intensities are low, since the multiplicative nature of speckle noise is not taken into account. Once again, the number of looks as a measure of in-class dispersion is ignored and the consequences are as described.

Part b of figure 5 shows a classification obtained using only interferometric phase differences. The corresponding component density is

$$p\left(x_s \mid \theta_c\right) = \prod_{(ij,kl) \in I_{\Delta\alpha}^{(c)}} p\left(\Delta\phi_{ij,kl}^{(s)} \mid \Delta\alpha_{ij,kl}^{(c)}, D_{ij}^{(c)}, D_{kl}^{(c)}, L\right) \qquad (31)$$

The likelihood ratio for initialization is based on the degree of coherence, and is given by $Q^d_{xy}$ as defined in (30). Although the classification result bears less resemblance to the structures evident in figure 3 than other results presented, it is perhaps the most interesting in view of potential PolInSAR applications. Interferometric phase differences are the basis for PolInSAR model inversion, and it is therefore encouraging to find that a number of agricultural fields are clearly characterized by phase differences alone. Also, classes in the forested area of region A (especially those colored green and purple) appear to correlate with vegetation characteristics (perhaps density or even height) suggested by 3a and 3d. The legend for 5b associates each class with an angle equal to the mean absolute interferometric phase difference between elements on the diagonal of the $T_{12}$ matrix:

$$\left\langle \left|\Delta\alpha_{ii,jj}\right| \right\rangle_{ij}, \qquad 1 \le i, j \le 3 \qquad (32)$$
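As a small worked example, the legend angle of a class could be computed from its phase difference parameters as sketched below. The function name is hypothetical, and two choices are assumptions rather than statements from the text: the average is taken over the three distinct pairs $i < j$ only, and each phase difference is first wrapped to the interval $(-\pi, \pi]$.

```python
import numpy as np

def mean_abs_phase_difference(delta_alpha):
    """Mean absolute phase difference (radians) over the supplied channel pairs.

    delta_alpha : phase differences for the distinct pairs (i, j), i < j.
    Each value is wrapped to (-pi, pi] before the absolute value is taken.
    """
    wrapped = np.angle(np.exp(1j * np.asarray(delta_alpha, dtype=float)))
    return float(np.mean(np.abs(wrapped)))

# Three pairs of diagonal elements; the last value is 0.3 plus a full cycle.
angles = [0.3, -0.6, 2.0 * np.pi + 0.3]
print(mean_abs_phase_difference(angles))  # approximately 0.4 rad
```

Wrapping before averaging avoids treating a difference of almost a full cycle as a large vertical separation.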

As non-zero phase differences $\Delta\alpha_{ii,jj}$ indicate the existence of vertically displaced phase centers, large angles correspond to media with a complex vertical structure while small angles correspond to surfaces. As shown, the classes that feature prominently in the forested area of region A (first and third from the left in the legend) are associated with larger phase differences than most other classes. Figure 7 illustrates the dispersion of phase differences in this pair of classes. The labels F1 and F2 used in the figure correspond to the first and the third class from the left in the legend of figure 5, respectively. The graphs show the distributions over phase differences used in the component density of each class, as well as empirical densities for class F1. The distributions confirm that the classes are associated with distinct characteristic phase differences and the model distribution in class F1 is reasonably close to the empirical distribution. All distributions, however, are seen to be very wide due to the noise associated with low levels of coherence. In practice, noise impairs the accuracy of class parameter estimation and means that fewer classes can be identified due to decreasing separability. It is therefore desirable to combine phase differences with other features, as described above, and to select baselines associated with appropriate levels of volume decorrelation.

The connection between phase differences and the vertical structure of media is also suggested in figure 8, which shows two-class results based on phase differences alone. The two classes generally separate media with a complex vertical structure, such as the forested area of region A, from surfaces.

In addition, figure 8 illustrates the sensitivity of the classifier to speckle noise. The clarity with which structures are delineated increases steadily from 20 look data to 100 look data to 220 look data. Classifications based on interferometric phase differences are the most suitable when investigating noise sensitivity, as phase is most affected by speckle in practice. The results indicate that the full PolInSAR model requires data with at least 100 looks, preferably more.

D. Self-Initialization

Parts a and b of figure 9 show preliminary classifications obtained in the process of self-initialization for the full PolInSAR model and the model of absolute coherences and phase differences, respectively. Each column contains the classification state for $N_C = 2, 4, 6, 8$ classes after the EM algorithm has converged. In addition, part a shows, for a given class number $N$, the samples used to initialize class $N + 1$ in white. The sequence of initialization is evidently different for both models, with the full PolInSAR model saturating the information inherent in the intensity before other factors become dominant (the complex Wishart distribution initially makes the largest contribution, and is strongly affected by intensity). The initialization masks indicate the samples where $\log\left(Q_{r_w s}\right) > S_w$ (see equation (27)).

Table I gives a quantitative comparison of alternative initialization strategies. The table compares the logarithm of the combined sample likelihood $\log\left(p\left(X \mid \Theta\right)\right)$ of (11) obtained after EM convergence based on a variety of initial conditions. A higher combined likelihood indicates that the initialization strategy has been more successful (i.e. the estimated class parameters are closer to the global optimum). The initialization strategies compared are the H/α pre-segmentation for 8 classes described in [22], the H/α/A pre-segmentation for 16 classes described in [23], a random initialization in which samples are randomly assigned to one of 8 or 16 classes, and the self-initialization approach described in the preceding sections using 8 and 16 classes. In all cases, the underlying model takes into account only the polarimetric information in $T_{11}$, as in (28), and the figures for the random initialization were averaged over a dozen trials. For both 8 and 16 classes, the self-initialization approach outperforms alternative initialization strategies. Significantly, the advantage of self-initialization becomes more pronounced as the number of classes increases and initialization becomes more difficult. It is also interesting to note that the random initialization outperforms H/α for 8 classes. This result shows that the EM iteration on its own is effective in selecting an appropriate partitioning. This, however, is only true when the number of classes is small: as the size of the parameter space increases, the performance deteriorates and the need for effective initialization strategies is evident.

E. Implementation and Performance

The proposed classifier has a higher computational complexity than other iterative classifiers proposed previously, mainly because classes are introduced sequentially, and the EM algorithm iterates until convergence after each introduction. The proposed method of self-initialization helps to keep the number of iterations required relatively low, since the initial class centers are already close to the local optimum in the data likelihood and previously
initialized classes have already converged. In practice, the EM iteration was observed to converge in two to three iterations per class. The iterations themselves are, in terms of computational complexity, practically identical to iterations based on the Wishart distribution alone (provided the relevant distributions over degrees of coherence and interferometric phase differences are tabulated once per EM iteration). In summary, the classification process is more computationally intensive than in previously proposed classifiers, but not by orders of magnitude. In terms of implementation, it should be noted that numerical accuracy is essential. In particular, it is advisable to work with the logarithms of probability densities and likelihood ratios throughout instead of computing their values directly. This avoids numerical issues that are otherwise inevitable due to the strict limits on numerical accuracy encountered in practice.

VI. CONCLUSION

The preceding sections describe a mathematical framework for the unsupervised classification of polarimetric interferometric SAR data. Herein, expectation maximization is used to derive a classification result that maximizes the likelihood of observations in an input dataset for a given number of classes. The statistical model underlying the expectation maximization process is based on the previously established distributions of the polarimetric coherency matrix and the degrees of interferometric coherence. In addition, the model takes into account differences in interferometric phase across polarization states, which are known to describe the vertical structure of an observed scene. To integrate this type of information into the classification process, the distribution over interferometric phase differences is introduced. The classifier described is modular, and can therefore be tailored to particular applications where only certain types of information are relevant.
A preliminary evaluation based on theoretical considerations and the discussion of classification results for real polarimetric interferometric SAR data suggests that using the full PolInSAR feature space resolves ambiguities associated with the partial information in the polarimetric signature and interferometric coherence. A rigorous, quantitative validation with respect to concrete PolInSAR applications should be the subject of a comparative study in future.

REFERENCES

[1] K.P. Papathanassiou and S.R. Cloude, “Single Baseline Polarimetric SAR Interferometry,” IEEE Transactions on Geoscience and Remote Sensing, vol. 39 (11), pp. 2352–2363, 2001.
[2] R. Treuhaft and P. Siqueira, “Vertical structure of vegetated land surfaces from interferometric and polarimetric radar,” Radio Science, vol. 35 (1), pp. 141–178, 2000.
[3] R. Zandoná Schneider, K.P. Papathanassiou, I. Hajnsek, and A. Moreira, “Polarimetric and Interferometric Characterization of Coherent Scatterers in Urban Areas,” IEEE Transactions on Geoscience and Remote Sensing, vol. 44 (4), pp. 971–984, 2006.
[4] J.D. Ballester-Berman, J.M. Lopez Sanchez, and J. Fortuny Guasch, “Retrieval of Biophysical Parameters of Agricultural Crops Using Polarimetric SAR Interferometry,” IEEE Transactions on Geoscience and Remote Sensing, vol. 43 (4), pp. 683–694, 2005.
[5] J. Lee, M. Grunes, T. Ainsworth, L.-J. Du, D. Schuler, and S. Cloude, “Unsupervised classification using polarimetric decomposition and the complex Wishart classifier,” IEEE Transactions on Geoscience and Remote Sensing, vol. 37 (5), pp. 2249–2258, 1999.
[6] J. Lee, M. Grunes, E. Pottier, and L. Ferro-Famil, “Unsupervised terrain classification preserving polarimetric scattering characteristics,” IEEE Transactions on Geoscience and Remote Sensing, vol. 42 (4), pp. 722–731, 2004.
[7] P.R. Kersten, J. Lee, and T.L. Ainsworth, “Unsupervised classification of polarimetric synthetic aperture radar images using fuzzy clustering and EM clustering,” IEEE Transactions on Geoscience and Remote Sensing, vol. 43 (3), pp. 519–527, 2005.
[8] L. Ferro-Famil, F. Kugler, E. Pottier, and J. Lee, “Forest mapping and classification at L-Band using Pol-inSAR Optimal Coherence Set Statistics,” in Proc. EUSAR 2006, 2006.
[9] J. Lee, T. Ainsworth, K. Papathanassiou, T. Mette, I. Hajnsek, and L. Ferro-Famil, “Forest classification based on Multi-Baseline Interferometric and Polarimetric E-SAR Data,” in Proc. EUSAR 2006, 2006.
[10] L. Ferro-Famil, E. Pottier, and J. Lee, “Classification and Interpretation of Polarimetric Interferometric SAR Data,” in Proc. IGARSS 2002, 2002.
[11] S. Guillaso, L. Ferro-Famil, A. Reigber, and E. Pottier, “Building characterization using L-Band polarimetric interferometric SAR data,” IEEE Geosci. and Remote Sensing Letters, vol. 2 (3), pp. 347–351, 2005.
[12] R. Touzi and A. Lopes, “Statistics of the Stokes parameters and of the complex coherence parameters in one-look and multilook speckle fields,” IEEE Trans. Geosci. and Remote Sensing, vol. 34 (2), pp. 519–531, 1996.
[13] N. Goodman, “Statistical analysis based on a certain complex Gaussian distribution (an introduction),” Ann. Math. Statist., vol. 34 (1), pp. 152–180, 1963.
[14] A. Lopes, E. Mougin, A. Beaudoin, S. Goze, E. Nezry, R. Touzi, M. Karam, and A. Fung, “Phase difference statistics related to sensor and forest parameters,” in Proc. IGARSS’92 Symp., 1992.
[15] J.S. Lee, K.W. Hoppel, S.A. Mango, and A.R. Miller, “Intensity and Phase Statistics of Multi-look Polarimetric and Interferometric SAR Imagery,” IEEE Trans. Geosci. and Remote Sensing, vol. 32 (5), pp. 1017–1028, 1994.
[16] A.P. Dempster, N.M. Laird, and D.B. Rubin, “Maximum-Likelihood from incomplete data via the expectation maximisation algorithm,” Journal of Royal Statistical Society B, vol. 39, 1977.
[17] J. Kittler, M. Hatef, R.P.W. Duin, and J. Matas, “On Combining Classifiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20 (3), pp. 226–239, 1998.
[18] D.J. Hand and K.Y. Yu, “Idiot’s Bayes–Not So Stupid After All?,” Int. Statistical Review, vol. 69 (3), pp. 385–398, 2001.
[19] K. Conradsen, A. Nielsen, J. Schou, and H. Skriver, “A test statistic in the complex Wishart distribution and its application to change detection in polarimetric SAR data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41 (1), pp. 4–19, 2003.
[20] G. Vasile, E. Trouvé, J. Lee, and V. Buzuloiu, “Intensity-Driven Adaptive-Neighborhood Technique for Polarimetric and Interferometric SAR Parameters Estimation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 44 (6), pp. 1609–1621, 2006.
[21] A. Freeman and S.L. Durden, “A three-component scattering model for polarimetric SAR data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 36, pp. 963–973, May 1998.
[22] S.R. Cloude and E. Pottier, “An entropy based classification scheme for land applications of polarimetric SAR,” IEEE Transactions on Geoscience and Remote Sensing, vol. 35, pp. 68–78, 1997.
[23] E. Pottier, “Unsupervised classification scheme and topography derivation of POLSAR data based on the H/α/A polarimetric decomposition theorem,” in Proc. 4th International Workshop on Radar Polarimetry, Nantes, France, pp. 535–548, 1998.


TABLE CAPTIONS

Table I: A comparison of initialization strategies.


F IGURE CAPTIONS

Fig. 1: The distribution of interferometric phase ∆φij,kl differentials for ∆αij,kl = 0.6π.

Fig. 2: An overview of the self-initializing classification process and the expectation maximization iteration.

Fig. 3: a) An optical image of the test area taken from Google Earth™. b) The corresponding SAR intensity image, with annotations indicating regions of interest. c) The PolInSAR dataset used for evaluation, acquired by DLR’s E-SAR sensor over Oberpfaffenhofen, Germany, at L-Band. The data was speckle filtered and the intensities on the diagonal of the coherency matrix are shown. d) The corresponding degrees of coherence d11 in red, d22 in blue and d33 in green.

Fig. 4: Classification results obtained using the full PolInSAR classifier. a) With NC = 16. b) With NC = 24.

Fig. 5: Classification results based on partial information. a) Using polarimetric information only (NC = 16). b) Using interferometric phase differences only (NC = 10). c) Using degrees of coherence and phase differences (NC = 16). d) Using k-means clustering on the degrees of coherence (NC = 16).

Fig. 6: A characterization of the contents of classes separating the surfaces of regions B1 and B2. a) to c) illustrate the dispersion of intensities on the diagonal of T11. White triangles indicate the class centers. d) to f) show the histograms of coherence values corresponding to the diagonal elements of T12, in gray, along with the model distributions.

Fig. 7: Two classes separated by interferometric phase differences. From left to right: a map of the two classes concerned (classes F1 and F2 in gray and black, respectively) followed by plots of the interferometric phase differences among elements on the diagonal of the T12 matrix. Each plot shows a histogram of phase differences in class F1, as well as the model distribution for both classes.

Fig. 8: The effects of noise on the classification of interferometric phase differences with NC = 2: a) 11 × 11 boxcar filter (220 Looks), b) 7 × 7 boxcar filter (100 Looks), c) 3 × 3 boxcar filter (20 Looks).

Fig. 9: The classification state in the process of iterative self-initialization. a) For the full PolInSAR model (NC = 2, 4, 6, 8 from left to right), with initialization masks in the second row. b) For the complex coherence model (NC = 2, 4, 6, 8 from left to right).


TABLE I
A COMPARISON OF INITIALIZATION STRATEGIES

Strategy      NC    log (p (X|Θ))
Random        8     -1.71e8
H/α           8     -1.76e8
Self Init.    8     -1.68e8
Random        16    -1.69e8
H/α/A         16    -1.68e8
Self Init.    16    -1.63e8
