IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 42, NO. 4, AUGUST 2012


Collective Network of Binary Classifier Framework for Polarimetric SAR Image Classification: An Evolutionary Approach

Serkan Kiranyaz, Turker Ince, Stefan Uhlmann, Student Member, IEEE, and Moncef Gabbouj, Member, IEEE

Abstract—Terrain classification over polarimetric synthetic aperture radar (SAR) images has been an active research field, where several features and classifiers have been proposed to date. However, some key questions remain unanswered, e.g., 1) how to select features that achieve the highest discrimination over certain classes; 2) how to combine them in the most effective way; 3) which distance metric to apply; 4) how to find the optimal classifier configuration for the classification problem at hand; 5) how to scale/adapt the classifier if a large number of classes/features are present; and, finally, 6) how to train the classifier efficiently to maximize the classification accuracy. In this paper, we propose a collective network of (evolutionary) binary classifiers (CNBC) framework to address all these problems and to achieve a high classification performance. The CNBC framework adopts a "Divide and Conquer" type of approach by allocating a network of binary classifiers (NBC) to discriminate each class and performs an evolutionary search to find the optimal binary classifier (BC) in each NBC. In such an (incremental) evolution session, the CNBC body can further dynamically adapt itself to each new incoming class/feature set without a full-scale retraining or reconfiguration. Both visual and numerical performance evaluations of the proposed framework over two benchmark SAR images demonstrate its superiority and a significant performance gap against several major classifiers in this field.

Index Terms—Evolutionary classifiers, multidimensional particle swarm optimization (MD-PSO), polarimetric synthetic aperture radar (SAR).

I. INTRODUCTION

THE ACCURATE classification of polarimetric synthetic aperture radar (SAR) images is a major and challenging task, and it has been an active research area for the last three decades. Various SAR image types have been used in many applications, such as land cover classification using Airborne SAR (AIRSAR) [49] and Advanced Land Observing Satellite Phased Array L-band SAR (ALOS-PALSAR) [31], oil-spill observation using ALOS-PALSAR [32] and TerraSAR-X [10], and building height retrieval using high-resolution (HR) SAR images [15]. The efforts can mainly be divided into three groups. The first type is classification based on the physical scattering mechanisms inherent in the data, such as the pioneering works [34], [51]. The second type is based on the statistical characteristics of the data [30], [45], and the most recent attempts belong to techniques based on image processing methods [39], [46]. There are also some hybrid approaches [29], [34], which combine some of the aforementioned types. In general, they can further be divided into supervised and unsupervised methods; their performance and suitability usually depend on the application and the availability of ground truth data (GTD). The supervised methods usually achieve higher classification accuracy than the unsupervised ones; however, providing reliable GTD might be cumbersome. Van Zyl [51] proposed the first unsupervised classification algorithm by comparing the polarization properties of each pixel in an image to those of simple scattering classes, such as even number of reflections, odd number of reflections, and diffuse scattering. The pioneering works applied the Wishart distribution [29] to describe the statistics of the covariance matrix, using a maximum likelihood (ML) classifier. Cloude's decomposition [8] was another major attempt, later used to provide the basis for direct classification of polarimetric data [7]. Similar to [21] and [30], many clustering techniques have been developed based on statistical modeling of polarimetric SAR (PolSAR) data, such as [4], [9], [18], [22], [35], and [42]. Additionally, recent works incorporate a spatial proximity [11] or spatial context model [48] on top of the initial statistical modeling and segmentation to improve the clustering and classification.

Manuscript received March 17, 2011; revised October 15, 2011, October 29, 2011, and January 12, 2012; accepted February 2, 2012. Date of publication March 22, 2012; date of current version July 13, 2012. This work was supported by the Academy of Finland, Project 213462 (Finnish Centre of Excellence Program, 2006–2011). This paper was recommended by Associate Editor E. Santos Jr. S. Kiranyaz, S. Uhlmann, and M. Gabbouj are with the Department of Signal Processing, Tampere University of Technology, 33101 Tampere, Finland (e-mail: [email protected]; [email protected]; [email protected]). T. Ince is with the Faculty of Engineering and Computer Sciences, Izmir University of Economics, Balcova 35330, Turkey (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSMCB.2012.2187891
Including such spatial models results in visually smoother and cleaner classification maps than simpler clustering techniques or pixel-based classification approaches, mainly due to the presence of speckle noise in SAR images. Unfortunately, most general clustering methods previously applied to PolSAR images exhibit some common problems: 1) inability to perform a global search and find the global optimum; 2) inability to determine the optimum number of clusters; and 3) sensitivity to initialization and parameters. In time, it has been realized that no single feature (or feature set) is sufficient and reliable for an accurate classification of every terrain type. Some features may discriminate certain types well but may not be so useful for others. For example, a recent study on the classification of agricultural fields over the Flevoland data demonstrates that the statistics are not

1083-4419/$31.00 © 2012 IEEE


necessarily well described by the Wishart distribution [16]. This problem is mainly due to the a priori assumption of a particular distance metric in the feature space: unless the feature space is separable by this metric, the classifier will not perform optimally. Such deficiencies drew the focus onto supervised adaptive techniques using classifiers with nonlinear operators, such as artificial neural networks (ANNs), since, for instance, they do not make the aforementioned assumption. Instead, they determine the distance metric during the training phase according to the distribution of the data; hence, it is not surprising that recently proposed ANN-based methods [2], [13], [45], [49] for the classification of SAR or high-resolution images have been shown to outperform the other aforementioned techniques. Yet, designing an optimal ANN for the problem at hand is a crucial and challenging task. For instance, an ANN with no or too few hidden nodes may not differentiate among complex patterns and instead yields only a linear estimation of such a possibly nonlinear problem. In contrast, if the ANN has too many nodes/layers, it might be affected severely by the noise in the data due to overparameterization, which eventually leads to poor generalization or training. On such complex networks, proper training may be infeasible and/or highly time consuming. The optimum number of hidden nodes/layers may depend on the input/output vector sizes, the training and test data sizes, and, more importantly, the characteristics of the problem, e.g., its nonlinearity, dynamic nature, etc. In those works where a single (fixed) classifier is used, the overall performance directly depends on the choice of the classifier and its parameters. Furthermore, the feature set and the number of classes are usually kept as limited as possible so as not to cause the aforementioned feasibility problems in the training process due to the increased complexity and the well-known "curse of dimensionality" phenomenon.
For this purpose, it is common to select only a certain subset of features while discarding the others, or to apply feature dimension reduction techniques such as principal component analysis (PCA). It is also worth mentioning that such systems are static; that is, any update on either the input or output layer (insertion of a new class or feature) will render the classifier useless and require setting up a new classifier from scratch. To address these problems and hence to maximize the classification accuracy, in this paper, we propose a global framework, which is designed to seek the optimal classifier architecture for each distinct class type and feature set while utilizing a large set of major features. Specifically, this approach targets the following objectives.

I. Evolutionary Search: Seeking the optimum network architecture among a collection of configurations (the so-called architecture space, AS).
II. Evolutionary Update in the AS: Keeping only "the best" individual configuration in the AS over an indefinite number of evolution runs.
III. Feature Scalability: Support for a varying number of features. Any feature can be dynamically integrated into the framework without requiring a full-scale setup and re-evolution.

IV. Class Scalability: Support for a varying number of classes. Any class can be dynamically inserted into the framework without requiring a full-scale setup and re-evolution.
V. High Efficiency for the Evolution (or Training) Process: Using classifiers in the AS that are as compact and simple as possible.
VI. Online (Incremental) Evolution: Continuous online/incremental training (or evolution) sessions can be performed to improve the classification accuracy.
VII. Parallel Processing: Classifiers can be evolved using several processors working in parallel.

In order to achieve all these objectives, we adopt a divide-and-conquer type of approach, based on a novel framework encapsulating networks of (evolutionary) binary classifiers (NBCs). Each NBC is devoted to a unique class and further encapsulates a set of evolutionary BCs, each of which is optimally chosen within the AS to discriminate the class of the NBC with a unique feature set (or subfeature). The optimality criterion therein can be user defined. In this paper, two common ANN types, multilayer perceptrons (MLPs) and radial basis function (RBF) networks, are used as the BCs; however, any other classifier type, such as support vector machines or random forests, can also be used within the CNBC framework as the BC type, as long as it can be evolved incrementally. A basic CNBC topology was first introduced in [23] for macroinvertebrate classification using MLPs as the BC type. In addition to exhaustive search via numerous runs of the back-propagation (BP) method, the recently proposed multidimensional particle swarm optimization (MD-PSO) [24], [25] is used as the primary evolution technique. The CNBC framework is developed over a dedicated application with a proper user interface (UI), where the user can define new classes or update existing ones while specifying the GTD over a SAR image.
Once the evolution process is completed for all individual BCs in all NBCs, CNBC can then be used to classify the entire SAR image with the predefined classes. The rest of the paper is organized as follows. Section II presents evolutionary ANNs. Section III makes a brief introduction to the basic theory of PolSAR, presents the major features with their proper normalization and the formation of the training (ground-truth) data with a dedicated application. The proposed CNBC framework along with the evolutionary update mechanism is explained in detail in Section IV. Section V provides classification results over two benchmark SAR images and performs comparative evaluations with major classifiers. Finally, Section VI concludes the paper and discusses topics for future work.

II. EVOLUTIONARY NEURAL NETWORKS

In this section, we shall discuss the methodology for achieving the first objective, that is, the evolutionary search for the optimal classifier configuration. First, the evolutionary technique, MD-PSO, will be briefly explained, and then its application to two major ANN types, MLPs and RBF networks, shall be introduced.
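As a rough, self-contained sketch of the two interleaved PSO processes (positional and dimensional) described in this section, consider the following toy implementation. This is our own simplification, not the authors' code: the inertia/acceleration coefficients, the dimensional-move rule, and the toy fitness used in the usage note are all assumptions; the paper's full update rules are given in (1) and (2) and in [24].

```python
import random

def md_pso(fitness, d_min, d_max, n_particles=20, n_iters=200, seed=7):
    """Minimal MD-PSO sketch: each particle keeps an independent positional
    state (position, velocity, personal best) PER dimension it may visit,
    plus a dimensional state that lets it jump between dimensions."""
    rng = random.Random(seed)
    dims = range(d_min, d_max + 1)
    pos = [{d: [rng.uniform(-1, 1) for _ in range(d)] for d in dims}
           for _ in range(n_particles)]
    vel = [{d: [0.0] * d for d in dims} for _ in range(n_particles)]
    pbest = [{d: list(pos[a][d]) for d in dims} for a in range(n_particles)]
    pbest_f = [{d: fitness(pos[a][d]) for d in dims} for a in range(n_particles)]
    xd = [rng.randrange(d_min, d_max + 1) for _ in range(n_particles)]
    pbest_d = list(xd)                      # personal best dimension per particle
    gbest = {d: None for d in dims}         # swarm's best position per dimension
    gbest_f = {d: float('inf') for d in dims}
    dbest = xd[0]                           # global best dimension so far
    for _ in range(n_iters):
        for a in range(n_particles):
            d = xd[a]
            f = fitness(pos[a][d])
            if f < pbest_f[a][d]:           # per-dimension personal best, cf. (1)
                pbest_f[a][d], pbest[a][d] = f, list(pos[a][d])
            if pbest_f[a][d] < pbest_f[a][pbest_d[a]]:  # personal best dim, cf. (2)
                pbest_d[a] = d
            if f < gbest_f[d]:              # per-dimension global best and dbest
                gbest_f[d], gbest[d] = f, list(pos[a][d])
                if gbest_f[d] < gbest_f[dbest]:
                    dbest = d
            # regular positional PSO update in the particle's current dimension
            for j in range(d):
                r1, r2 = rng.random(), rng.random()
                vel[a][d][j] = (0.7 * vel[a][d][j]
                                + 1.5 * r1 * (pbest[a][d][j] - pos[a][d][j])
                                + 1.5 * r2 * ((gbest[d] or pos[a][d])[j]
                                              - pos[a][d][j]))
                pos[a][d][j] += vel[a][d][j]
            # simplified dimensional PSO update: random drift biased toward the
            # particle's personal best dimension and the swarm's dbest
            step = rng.choice([-1, 0, 1]) + (pbest_d[a] - d) + (dbest - d)
            xd[a] = max(d_min, min(d_max, d + (1 if step > 0 else
                                               -1 if step < 0 else 0)))
    return dbest, gbest[dbest], gbest_f[dbest]
```

For example, with a toy fitness whose continuous part is minimized in every dimension but which penalizes dimensions away from 3, the swarm searches both the position and the dimension jointly; `md_pso(f, 2, 6)` then returns the best dimension found, the best position in it, and its fitness.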

A. Multidimensional Particle Swarm Optimization

As the evolutionary method, we shall use the multidimensional (MD) extension of the basic PSO (bPSO) method, MD-PSO, recently proposed in [24]. Instead of operating at a fixed dimension $N$, the MD-PSO algorithm is designed to seek both positional and dimensional optima within a dimension range, $\{D_{\min}, D_{\max}\}$. To accomplish this, each particle has two sets of components, each of which is subjected to one of two independent and consecutive processes. The first is a regular positional PSO, i.e., the traditional velocity updates and due positional shifts in the $N$-dimensional search (solution) space. The second is a dimensional PSO, which allows the particle to navigate through dimensions. Accordingly, each particle keeps track of its last position, velocity, and personal best position (pbest) in each dimension visited so that, when it revisits the same dimension later, it can perform its regular "positional" update using this information. The dimensional PSO process of each particle may then move the particle to another dimension, where it will remember its positional status and will be updated within the positional PSO process at that dimension, and so on. The swarm, on the other hand, keeps track of the gbest particle in each dimension, indicating the best (global) position achieved so far. Similarly, the dimensional PSO process of each particle uses its personal best dimension, in which the personal best fitness score has so far been achieved. Finally, the swarm keeps track of the global best dimension, dbest, among all the personal best dimensions. The gbest particle in the dbest dimension represents the optimum solution and dimension, respectively.

In an MD-PSO process at time (iteration) $t$, each particle $a$ in the swarm with $S$ particles, $\xi = \{x_1, \ldots, x_a, \ldots, x_S\}$, is represented by the following characteristics:

- $xx_{a,j}^{xd_a(t)}(t)$: $j$th component (dimension) of the position of particle $a$, in dimension $xd_a(t)$;
- $vx_{a,j}^{xd_a(t)}(t)$: $j$th component (dimension) of the velocity of particle $a$, in dimension $xd_a(t)$;
- $xy_{a,j}^{xd_a(t)}(t)$: $j$th component (dimension) of the personal best position of particle $a$, in dimension $xd_a(t)$;
- $gbest(d)$: global best particle index in dimension $d$;
- $\hat{xy}_j^d(t)$: $j$th component (dimension) of the global best position of the swarm, in dimension $d$;
- $xd_a(t)$: dimension component of particle $a$;
- $vd_a(t)$: velocity component of the dimension of particle $a$;
- $\tilde{xd}_a(t)$: personal best dimension component of particle $a$.

Let $f$ denote the fitness function to be optimized within a certain dimension range, $\{D_{\min}, D_{\max}\}$. Without loss of generality, assume that the objective is to find the minimum of $f$ at the optimum dimension within the MD search space. Assume that particle $a$ visits (back) the same dimension after $T$ iterations, i.e., $xd_a(t) = xd_a(t+T)$; then, its personal best position is updated at iteration $t+T$ as

$$
xy_{a,j}^{xd_a(t+T)}(t+T) =
\begin{cases}
xy_{a,j}^{xd_a(t)}(t), & \text{if } f\!\left(xx_a^{xd_a(t+T)}(t+T)\right) > f\!\left(xy_a^{xd_a(t)}(t)\right) \\
xx_{a,j}^{xd_a(t+T)}(t+T), & \text{else}
\end{cases}
\qquad j = 1, 2, \ldots, xd_a(t+T). \tag{1}
$$

Furthermore, the personal best dimension of particle $a$ is updated at iteration $t+1$ as

$$
\tilde{xd}_a(t+1) =
\begin{cases}
\tilde{xd}_a(t), & \text{if } f\!\left(xx_a^{xd_a(t+1)}(t+1)\right) > f\!\left(xy_a^{\tilde{xd}_a(t)}(t)\right) \\
xd_a(t+1), & \text{else.}
\end{cases} \tag{2}
$$

Further details and the pseudocode of MD-PSO can be found in [24].

B. MD-PSO for Evolving MLPs

As a stochastic search process in the MD search space, MD-PSO seeks (near-) optimal networks in an AS, which can be defined over any type of ANN with any properties. All network configurations in the AS are enumerated into a hash table with a proper hash function, which ranks the networks with respect to their complexity, associating higher hash indices with networks of higher complexity. MD-PSO can then use each index as a unique dimension of the search space, where particles can make interdimensional navigations to seek an optimum dimension (dbest) and the optimum solution in that dimension, $\hat{xy}^{dbest}$. The former corresponds to the optimal architecture, and the latter encapsulates the optimum network parameters (connections, weights, and biases). Suppose, for the sake of simplicity, that a range is defined for the minimum and maximum number of layers, $\{L_{\min}, L_{\max}\}$, and for the number of neurons of hidden layer $l$, $\{N_{\min}^l, N_{\max}^l\}$. The sizes of both the input and output layers are determined by the problem and hence are fixed. The AS can then be defined by only two range arrays, $R_{\min} = \{N_i, N_{\min}^1, \ldots, N_{\min}^{L_{\max}-1}, N_o\}$ and $R_{\max} = \{N_i, N_{\max}^1, \ldots, N_{\max}^{L_{\max}-1}, N_o\}$, one for the minimum and the other for the maximum number of neurons allowed in each layer of an MLP. The size of both arrays is naturally $L_{\max}+1$, where the corresponding entries define the range of the $l$th hidden layer for all those MLPs that can have an $l$th hidden layer. The size of the input and output layers, $\{N_i, N_o\}$, is fixed and is the same for all configurations in the AS. $L_{\min} \geq 1$, and $L_{\max}$ can be set to any value meaningful for the


problem encountered. The hash function then enumerates all potential MLP configurations into hash indices, starting from the simplest MLP with $L_{\min}-1$ hidden layers, each of which has the minimum number of neurons given by $R_{\min}$, to the most complex network with $L_{\max}-1$ hidden layers, each of which has the maximum number of neurons given by $R_{\max}$. Let $N_h^l$ be the number of hidden neurons in layer $l$ of an MLP with input and output layer sizes $N_i$ and $N_o$, respectively. The input neurons are merely fan-out units, since no processing takes place in them. Let $F$ be the activation function applied over the weighted inputs plus a bias, as follows:

$$
y_k^{p,l} = F\!\left(s_k^{p,l}\right), \quad \text{where } s_k^{p,l} = \sum_j w_{jk}^{l-1} y_j^{p,l-1} + \theta_k^l \tag{3}
$$

where $y_k^{p,l}$ is the output of the $k$th neuron of the $l$th hidden/output layer when pattern $p$ is fed, $w_{jk}^{l-1}$ is the weight from the $j$th neuron in layer $l-1$ to the $k$th neuron in layer $l$, and $\theta_k^l$ is the bias value of the $k$th neuron of the $l$th hidden/output layer. The training mean square error, MSE, is formulated as

$$
\mathrm{MSE} = \frac{1}{2 P N_o} \sum_{p \in T} \sum_{k=1}^{N_o} \left(t_k^p - y_k^{p,o}\right)^2 \tag{4}
$$

where $t_k^p$ is the target (desired) output and $y_k^{p,o}$ is the actual output from the $k$th neuron in the output layer, $l = o$, for pattern $p$ in the training data set $T$ with size $P$. At a time $t$, suppose that particle $a$ has the positional component formed as $xx_a^{xd_a(t)}(t) = \Psi_{xd_a(t)} = \{\{w_{jk}^0\}, \{w_{jk}^1\}, \{\theta_k^1\}, \{w_{jk}^2\}, \{\theta_k^2\}, \ldots, \{w_{jk}^{o-1}\}, \{\theta_k^{o-1}\}, \{\theta_k^o\}\}$, where $\{w_{jk}^l\}$ and $\{\theta_k^l\}$ represent the sets of weights and biases of layer $l$ of the MLP configuration $\Psi_{xd_a(t)}$. Note that the input layer ($l = 0$) contains only weights, whereas the output layer ($l = o$) has only biases. By means of such a direct encoding scheme, particle $a$ represents all potential network parameters of the MLP architecture at the dimension (hash index) $xd_a(t)$. As mentioned earlier, the dimension range, $\{D_{\min}, D_{\max}\}$, where MD-PSO particles can make interdimensional jumps, is determined by the AS defined. Apart from the regular limits, such as the (positional) velocity range, $\{V_{\min}, V_{\max}\}$, and the dimensional velocity range, $\{VD_{\min}, VD_{\max}\}$, the data space can also be limited to some practical range, i.e., $X_{\min} < xx_a^{xd_a(t)}(t) < X_{\max}$. Setting the MSE in (4) as the fitness function enables MD-PSO to perform evolution of both network parameters and architectures within its native process. Further details and an extensive set of experiments demonstrating the optimality of the evolved networks with respect to several benchmark problems can be found in [17] and [25].

C. MD-PSO for Evolving RBFs

Another popular type of feedforward ANN is the RBF network [32], which always has two layers in addition to the passive input layer: a hidden layer of RBF units and a linear output layer. Only the output layer has connection weights and biases. The activation function of the $k$th RBF unit is defined as

$$
y_k = \varphi\!\left(\frac{\|X - \mu_k\|}{\sigma_k}\right) \tag{5}
$$

where $\varphi$ is an RBF, i.e., a strictly positive radially symmetric function, which has a unique maximum at the $N$-dimensional center $\mu_k$ and whose value drops rapidly to zero away from the center. $\sigma_k$ is the width of the peak around the center $\mu_k$. The activation function takes noteworthy values only when the distance between the $N$-dimensional input $X$ and the center $\mu_k$, $\|X - \mu_k\|$, is smaller than the width $\sigma_k$. The most commonly used activation function in RBF networks is the Gaussian basis function, defined as

$$
y_k = \exp\!\left(-\frac{\|X - \mu_k\|^2}{2\sigma_k^2}\right) \tag{6}
$$

where $\mu_k$ and $\sigma_k$ are the mean and standard deviation, respectively, and $\|\cdot\|$ denotes the Euclidean norm. MD-PSO is used to dynamically cluster the training data to extract the optimal number of clusters, from which the cluster centroids and variances ($\mu_k$ and $\sigma_k$) of the Gaussian neurons can be found. At time $t$, particle $a$ has the positional component formed as $xx_a^{xd_a(t)}(t) = \{\mu_{a,1}, \ldots, \mu_{a,j}, \ldots, \mu_{a,xd_a(t)}\}$, where the $j$th component, $xx_{a,j}^{xd_a(t)}(t) = \mu_{a,j}$, represents a potential solution in the form of the cluster centroids for $xd_a(t)$ clusters. The fitness (clustering validity index) function $f$ to be optimized is formed with respect to two widely used criteria in clustering:

- Compactness: Data items in one cluster should be similar or close to each other in the $N$-dimensional space and different or far away from items belonging to other clusters.
- Separation: Clusters and their respective centroids should be distinct and well separated from each other.

The following validity index is used to obtain computational simplicity with minimal or no parameter dependency:

$$
f\!\left(xx_a^{xd_a(t)}, Z\right) = Q_e\!\left(xx_a^{xd_a(t)}\right)\left(xd_a(t)\right)^{\tau}, \quad \text{where }
Q_e\!\left(xx_a^{xd_a(t)}\right) = \frac{1}{xd_a(t)} \sum_{j=1}^{xd_a(t)} \frac{\displaystyle\sum_{\forall z_p \in xx_{a,j}^{xd_a(t)}} \left\| xx_{a,j}^{xd_a(t)} - z_p \right\|}{\left\| xx_{a,j}^{xd_a(t)} \right\|} \tag{7}
$$

where $Q_e$ is the quantization error (the average intracluster distance) serving as the Compactness term, and $(xd_a(t))^{\tau}$ is the Separation term, which simply penalizes higher cluster numbers with an exponent $\tau > 0$. For $\tau = 1$, the validity index takes its simplest form and becomes entirely parameter free. Once the optimum number of clusters, which is equal to the number of Gaussian neurons, and their centroids, $\mu_k$, are found, then the computation of the variance of each Gaussian


neuron, $\sigma_k$, is straightforward. Finally, the weights ($w$) and biases ($\theta$) of the RBF network can be computed by the BP technique [5], [36], [41], which is skipped here due to space limitations.

III. POLARIMETRIC SAR DATA PROCESSING

The first part of this section describes the wide variety of PolSAR features used in this work and discusses how they are normalized and combined to form the feature vectors (FVs) that are used for evolving the CNBC framework. The second part then presents several data manipulations performed by a dedicated application, such as SAR data visualization, user GTD retrieval, and finally the formation of the training data set.

A. Feature Extraction and Normalization

PolSAR features can generally be divided into two categories. The first group comprises the features extracted directly from the PolSAR data and its different transforms, such as the scattering matrix, from which the Stokes matrix, the covariance matrix, and the coherency matrix can be derived. The second group is based on the polarimetric target decomposition theorems, which are used for information extraction in PolSAR. The coherent decomposition theorems, such as the Pauli decomposition [8], the Krogager decomposition [26], the Cameron decomposition [3], and the sphere-diplane-helix decomposition [27], aim to express the scattering matrix measured by the radar as a combination of the scattering responses of coherent scatterers. In the other category, the incoherent decompositions, such as the Freeman decomposition [14], the Huynen decomposition [19], and the Cloude-Pottier (eigenvector-eigenvalue or H/α/A) decomposition [8], employ the second-order polarimetric representations of PolSAR data (such as the covariance or coherency matrix) to characterize distributed scatterers.
Each feature has its own strengths and weaknesses for discriminating different SAR class types, and this fact has been confirmed by several recent studies, such as [37] and [38], which concluded that employing multiple features and their different combinations can significantly improve SAR image classification. In particular, a recent study [50] has shown that multifeature combination using a randomized clustering forest classifier can improve the classification performance by up to 8%. Accordingly, we shall use a large set of features from both groups, since feature scalability and optimal feature selection are major objectives of the proposed classification framework. PolSAR systems often measure the complex scattering matrix, $[S]$, produced by a target under study, with the objective of inferring its physical properties. Assuming linear horizontal and vertical polarizations for transmitting and receiving, $[S]$ can be expressed as

$$
[S] = \begin{bmatrix} S_{hh} & S_{hv} \\ S_{vh} & S_{vv} \end{bmatrix}. \tag{8}
$$

The reciprocity theorem applies in a monostatic system configuration, i.e., $S_{hv} = S_{vh}$. For coherent scatterers only, decompositions of the measured scattering matrix $[S]$ can be employed to characterize the scattering mechanisms of such targets. One


way to analyze coherent targets is the Pauli decomposition [8], which expresses [S] in the so-called Pauli basis as

$$
[S] = \begin{bmatrix} S_{hh} & S_{hv} \\ S_{vh} & S_{vv} \end{bmatrix}
= \alpha [S]_a + \beta [S]_b + \gamma [S]_c, \quad \text{where}
$$

$$
[S]_a = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
[S]_b = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \quad
[S]_c = \frac{1}{\sqrt{2}}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \tag{9}
$$

and $\alpha = (S_{hh} + S_{vv})/\sqrt{2}$, $\beta = (S_{hh} - S_{vv})/\sqrt{2}$, and $\gamma = \sqrt{2}\, S_{hv}$. Hence, by means of the Pauli decomposition, all polarimetric information in $[S]$ can be represented by combining the intensities $|\alpha|^2$, $|\beta|^2$, and $|\gamma|^2$, which determine the power scattered by different types of scatterers, such as single- or odd-bounce scattering, double- or even-bounce scattering, and orthogonal polarization by volume scattering. Alternatively, the second-order polarimetric descriptors, the $3 \times 3$ average polarimetric covariance $[C]$ and coherency $[T]$ matrices, can be derived from $[S]$ and employed to extract physical information from the observed scattering process. Owing to the presence of speckle noise and random vector scattering from surfaces or volumes, PolSAR data are often multilook processed by averaging $n$ neighboring pixels. Using the Pauli-based scattering vector for pixel $i$, $k_i = [S_{hh} + S_{vv},\; S_{hh} - S_{vv},\; 2 S_{hv}]^T / \sqrt{2}$, the multilook coherency matrix $[T]$ can be written as

$$
[T] = \frac{1}{n} \sum_{i=1}^{n} k_i k_i^{*T}. \tag{10}
$$
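As a concrete illustration, the Pauli components of (9) and the multilook coherency matrix of (10) can be sketched as follows; this is our own sketch, and the scattering-matrix values in the test are illustrative only.

```python
import numpy as np

def pauli_components(S):
    """Pauli decomposition of a 2x2 complex scattering matrix S
    (monostatic case, S_hv = S_vh), following (9)."""
    a = (S[0, 0] + S[1, 1]) / np.sqrt(2)   # alpha: single-/odd-bounce
    b = (S[0, 0] - S[1, 1]) / np.sqrt(2)   # beta: double-/even-bounce
    g = np.sqrt(2) * S[0, 1]               # gamma: cross-pol / volume
    return a, b, g

def multilook_coherency(S_list):
    """n-look coherency matrix [T] = (1/n) * sum_i k_i k_i^{*T}, per (10),
    built from the Pauli scattering vectors k_i of neighboring pixels."""
    n = len(S_list)
    T = np.zeros((3, 3), dtype=complex)
    for S in S_list:
        k = np.array([S[0, 0] + S[1, 1],
                      S[0, 0] - S[1, 1],
                      2 * S[0, 1]]) / np.sqrt(2)
        T += np.outer(k, k.conj())
    return T / n
```

A useful sanity check is power conservation: the Pauli intensities sum to the span, i.e., $|\alpha|^2 + |\beta|^2 + |\gamma|^2 = |S_{hh}|^2 + 2|S_{hv}|^2 + |S_{vv}|^2$, and the resulting $[T]$ is Hermitian with this span as its trace.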

Both the coherency $[T]$ and covariance $[C]$ matrices are $3 \times 3$ Hermitian positive semidefinite matrices, and since they can be converted into one another by a linear transform, both are equivalent representations of the target polarimetric information. The Cloude-Pottier decomposition [8] is based on the eigen-analysis of the polarimetric coherency matrix, $[T]$:

$$
[T] = \lambda_1 e_1 e_1^{*T} + \lambda_2 e_2 e_2^{*T} + \lambda_3 e_3 e_3^{*T} \tag{11}
$$

where $\lambda_1 > \lambda_2 > \lambda_3 \geq 0$ are the real eigenvalues, and the corresponding orthonormal eigenvectors $e_i$ (representing three scattering mechanisms) are

$$
e_i = e^{i\phi_i}\left[\cos\alpha_i,\; \sin\alpha_i \cos\beta_i\, e^{i\delta_i},\; \sin\alpha_i \sin\beta_i\, e^{i\gamma_i}\right]^T. \tag{12}
$$

Cloude and Pottier defined the entropy $H$, the averages of the set of four angles $\alpha$, $\beta$, $\delta$, and $\gamma$, and the anisotropy $A$ for the analysis of the physical information related to the scattering characteristics of a medium:

$$
H = -\sum_{i=1}^{3} p_i \log_3 p_i, \quad \text{where } p_i = \frac{\lambda_i}{\sum_{i=1}^{3} \lambda_i} \tag{13}
$$

$$
\bar{\alpha} = \sum_{i=1}^{3} p_i \alpha_i, \quad
\bar{\beta} = \sum_{i=1}^{3} p_i \beta_i, \quad
\bar{\delta} = \sum_{i=1}^{3} p_i \delta_i, \quad
\bar{\gamma} = \sum_{i=1}^{3} p_i \gamma_i \tag{14}
$$

$$
A = \frac{p_2 - p_3}{p_2 + p_3}. \tag{15}
$$


Fig. 1. Visualization of different feature components (entropy, H, anisotropy, A, alpha, α) that are discriminative for certain land class(es) such as water, urban/flat zone and mountain.

For a multilook coherency matrix, the entropy, $0 \leq H \leq 1$, represents the randomness of a scattering medium between isotropic scattering ($H = 0$) and fully random scattering ($H = 1$), while the average alpha angle ($\bar{\alpha}$) can be related to the average target scattering mechanism, from single-bounce (or surface) scattering ($\bar{\alpha} \approx 0$) to dipole (or volume) scattering ($\bar{\alpha} \approx \pi/4$) to double-bounce scattering ($\bar{\alpha} \approx \pi/2$). Owing to the basis invariance of the target decomposition, $H$ and $\bar{\alpha}$ are roll invariant; hence, they do not depend on the orientation of the target about the radar line of sight. Additionally, the target's total backscattered power can be determined by the Span:

$$
\mathrm{Span} = \sum_{i=1}^{3} \lambda_i. \tag{16}
$$
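The quantities in (11)-(16) can be computed from a coherency matrix via a standard eigen-decomposition. The following sketch is our own, not the authors' code; it recovers each $\alpha_i$ from the magnitude of the first eigenvector component, per the parameterization in (12).

```python
import numpy as np

def halpha_features(T):
    """Entropy H, mean alpha angle, anisotropy A, and Span from a 3x3
    Hermitian coherency matrix, per (11)-(16)."""
    lam, E = np.linalg.eigh(T)                 # eigenvalues in ascending order
    lam = np.clip(lam[::-1].real, 0, None)     # descending, numerically >= 0
    E = E[:, ::-1]                             # reorder eigenvectors to match
    span = lam.sum()                           # total power, per (16)
    p = lam / span                             # pseudo-probabilities, per (13)
    # entropy with log base 3, using the convention 0 * log 0 = 0
    H = -sum(pi * np.log(pi) / np.log(3) for pi in p if pi > 0)
    # alpha_i = arccos(|first eigenvector component|), per (12)
    alphas = np.arccos(np.clip(np.abs(E[0, :]), 0, 1))
    alpha_mean = float(np.dot(p, alphas))      # weighted mean, per (14)
    A = (p[1] - p[2]) / (p[1] + p[2]) if (p[1] + p[2]) > 0 else 0.0  # (15)
    return H, alpha_mean, A, span
```

For instance, the fully random case $[T] = I/3$ (three equal eigenvalues) yields $H = 1$ and $A = 0$, while a rank-1 coherency matrix (a single pure scattering mechanism) yields $H = 0$.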

The entropy ($H$), the estimate of the average alpha angle ($\bar{\alpha}$), and the Span calculated by the above incoherent target decomposition method have been commonly used as polarimetric features of a scatterer in many target classification schemes [28], [29]. Additionally, there are other measures derived from PolSAR data, such as the three complex correlation coefficients ($\rho_{12}$, $\rho_{13}$, $\rho_{23}$) between the scattering matrix terms, that can be included in the FV used as the input to the CNBC framework. As a result, we formed the following three sets of FVs ($FV_n$), which are the input (sub)features for the proposed network of BCs detailed in Section IV. Each FV has the following components selected from the aforementioned features:

$$
FV_1 = \left[T_{11}, T_{22}, T_{33}, C_{11}, |C_{12}|, \angle C_{12}, |C_{13}|, \angle C_{13}, C_{22}, |C_{23}|, \angle C_{23}, C_{33}\right] \tag{17}
$$

$$
FV_2 = \left[\mathrm{Span}, H, A, \bar{\alpha}, \bar{\beta}, \bar{\delta}, \bar{\gamma}, \lambda_1, \lambda_2, \lambda_3\right] \tag{18}
$$

$$
FV_3 = \left[|\rho_{12}|, \angle\rho_{12}, |\rho_{13}|, \angle\rho_{13}, |\rho_{23}|, \angle\rho_{23}\right]. \tag{19}
$$

For the purpose of normalizing and scaling each FV, the logarithms of the magnitude features in $FV_1$ and of the Span and the three eigenvalues in $FV_2$ are first taken, before all features are linearly scaled into the $[-1, 1]$ interval. Note that, while calculating the logarithm, features with zero values are replaced with their respective nonzero minimum. Finally, the histogram-equalized FVs, which were found to be more effective using the FV visualization application explained next, can then be presented to the input layer of the proposed evolutionary classifier framework.
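The per-feature normalization just described (logarithm on magnitude-type features, with zeros replaced by the nonzero minimum first, then linear scaling into $[-1, 1]$) can be sketched as follows; the function name and the `take_log` flag marking magnitude-type components are our own.

```python
import numpy as np

def normalize_feature(x, take_log=False):
    """Normalize one feature component across all pixels: optional log on
    magnitude-type features (zeros replaced by the smallest nonzero value
    first, as described above), then linear scaling into [-1, 1]."""
    x = np.asarray(x, dtype=float).copy()
    if take_log:
        nz_min = x[x > 0].min()      # smallest nonzero value of this feature
        x[x <= 0] = nz_min           # avoid log(0)
        x = np.log(x)
    lo, hi = x.min(), x.max()
    if hi == lo:                     # constant feature: map to zero
        return np.zeros_like(x)
    return 2 * (x - lo) / (hi - lo) - 1
```

For example, the values `[0, 1, 10, 100]` with `take_log=True` first become `[1, 1, 10, 100]`, then their logarithms, and finally land in $[-1, 1]$ with the two smallest values coinciding at $-1$.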

Fig. 2. CNBC test-bed application GUI showing a sample user-defined ground-truth set over San Francisco Bay area. Each point (pixel) selected per class is marked with a “+” and zoomed in the image for a better visualization.

B. Formation of the Training Data Set

The proposed framework is developed over a dedicated application, which, first of all, provides an appropriate "data visualization" of the SAR image, where any feature component selected by the user is properly scaled and mapped to one of the three main color channels: red, green, and blue. Since each feature may have a certain discrimination ability for some SAR class type, the user thus has the opportunity to visualize several SAR images formed with different feature components and to set the GTD with greater accuracy. This is clearly visible in the example given in Fig. 1, where three feature components (the second, third, and fourth components of $FV_2$, corresponding to the $H/A/\alpha$ features) discriminate one (or some) class(es) quite well. This is basically the advantage of using as many features as possible: the proposed framework can exploit the individual discrimination power of each feature while compensating its weaknesses with the others, and thus it can further improve the overall classification performance with the incremental contribution of each feature. As for the GTD, the user can define any number of classes, to which (s)he can register points (series of pixels) in a SAR image. Recall that this is one of the major objectives of the proposed classification framework: making classification possible on any SAR image, particularly when the GTD is not readily available or is subjective. Fig. 2 shows


Fig. 3. Evolutionary update in a sample architecture space for MLP configuration arrays Rmin = {15, 1, 2} and Rmax = {15, 4, 2} where NR = 3 and NC = 5. The best runs for each configuration are highlighted and the best configuration in each run is tagged with a “∗.”

the GUI of this application, where the SAR image is formed using the first three components of the aforementioned SAR feature from the coherency matrix, T11, T22, and T33, each of which is properly scaled and mapped to the R, G, and B channels. In a short time, the user can register such points or rectangular areas, any of which defines a set of positive samples for a class. Since this is a uniclass data set, where one sample can belong to only one class, each positive sample of an individual class can be used as a negative sample for all others. Yet, if the user defines a large number of classes, the unbalanced numbers of positive and negative samples per class may cause a bias problem: for every positive sample, there will be a large number of negative samples, which may bias the classifier. To prevent this, negative sample selection is performed in such a way that, for each positive sample, the number of negative samples is limited according to a predetermined positive-to-negative ratio (PNR). The negative samples are selected in closest proximity to the positive sample, so that the classifier can be evolved by discriminating those negative samples that have the highest potential for false positives. Therefore, if properly trained, the classifier can draw the "best possible" boundary between the positive and (the PNR number of) negative samples, which shall in turn improve the classification accuracy. The features of the selected points and their classes form the FVs of the training data set, over which the CNBC body can be created (if not already) and evolved.

IV. SAR IMAGE CLASSIFICATION FRAMEWORK

This section describes in detail the proposed framework: the collective network of binary classifiers, the CNBC, which uses the training data set to configure its internal structure and to evolve its BCs individually.
Before going into the details of the CNBC, the evolutionary update mechanism, which keeps only the best networks within the AS of each BC, will be introduced next.

A. Evolutionary Update in the Architecture Space

Since the evolutionary technique, MD-PSO, is a stochastic optimization method, in order to improve the probability of convergence to the global optimum, several evolutionary runs

can be performed. Let NR be the number of runs and NC the number of configurations in the AS. For each run, the objective is to find the optimal (best) classifier within the AS with respect to a predefined criterion. Note that, along with the best classifier, all other configurations in the AS are also subject to evolution and are therefore continuously (re)trained with each run. Hence, during this ongoing process, between any two consecutive runs, any network configuration can replace the current best one in the AS if it surpasses it. In addition to the MD-PSO evolutionary search, this also holds for the exhaustive search, where each network configuration in the AS is trained by NR BP runs and the same evolutionary update rule applies. Fig. 3 demonstrates an evolutionary update operation over a sample AS containing five MLP configurations. The table shows the training MSE, which is the criterion used to select the optimal configuration at each run. The best runs for each configuration are highlighted, and the best configuration in each run is tagged with "∗". Note that at the end of the three runs, the overall best network, with MSE = 0.10, has the configuration 15 × 2 × 2 and is thus used as the classifier for any classification task until another configuration surpasses it in a future run. In this way, each BC configuration in the AS can only evolve to a better state, which is the main purpose of the proposed evolutionary update mechanism.

B. Collective Network of Binary Classifiers

1) Topology: To achieve the third and fourth objectives mentioned earlier, scalability with respect to a varying number of classes and features, a novel framework encapsulating a network of BCs (NBC) per class is developed, where the NBCs can evolve continuously over the ongoing evolution sessions using the user-supplied GTD.
Each NBC corresponds to a unique SAR class and contains a varying number of evolutionary BCs in the input layer, where each BC performs binary classification using a single (sub)feature. Therefore, whenever a new feature is extracted, its corresponding BC will be created, evolved (using the GTD logs available so far), and inserted into each NBC, while keeping each of the other BCs "as is." On the other hand, whenever an existing feature is removed, the corresponding BC is simply removed from each NBC in the system. In this way, scalability with respect to any number of features is


Fig. 4. Topology of the proposed CNBC framework with C classes and N features (FVs).

achieved, and the overall system can avoid re-evolving from scratch. Each NBC has a "fuser" BC in the output layer, which collects and fuses the binary outputs of all BCs in the input layer and generates a single binary output, indicating the relevancy of each FV to the NBC's corresponding class. Furthermore, the CNBC is also scalable to any number of classes, since whenever a new class is defined by the user, a new NBC can simply be created (and evolved) only for this class, without requiring any change or update to the other NBCs. In this way, the overall system dynamically adapts to user demands for a varying number of SAR classes. As shown in Fig. 4, the main idea in this approach is to use as large a number of classifiers as necessary, so as to divide a massive learning problem into many NBC units along with the BCs within them. This prevents the need for complex classifiers, as the performance of both the training and evolution processes degrades significantly as complexity rises, due to the curse of dimensionality. A major benefit of our approach with respect to an efficient training and evolution process is that the configurations in the AS can be kept as compact as possible, avoiding infeasibly large storage and training time requirements. This is a significant advantage, particularly for training methods performing a local search, such as BP, since the number of deceiving local minima in the error space is significantly lower for such simple and compact ANNs. Furthermore, when BP is applied exhaustively, the probability of finding the optimum solution is significantly increased. In order to maximize the classification accuracy, we applied a dedicated class selection technique for the CNBC. We use the 1-of-n encoding scheme in all BCs; therefore, the output layer size of all BCs is always two. Let CVc,1 and CVc,2 be the first and second outputs of the cth BC's class vector (CV).
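The topology just described, one NBC per class with per-feature input-layer BCs and a fuser BC that each emit a two-element CV, can be sketched as follows. This is a structural illustration only; `BinaryClassifier` is a toy placeholder, not the evolutionary MLP/RBF classifier of the paper.

```python
class BinaryClassifier:
    """Placeholder for an evolutionary BC (MLP or RBF); predict() returns
    the two-element class vector (CV_1, CV_2) of the 1-of-n encoding."""
    def predict(self, fv):
        s = sum(fv) / max(len(fv), 1)    # toy stand-in for a trained network
        return (-s, s)

class NBC:
    """One network of BCs per class: one input-layer BC per feature
    vector, plus a fuser BC that merges their outputs into a single CV."""
    def __init__(self, num_features):
        self.bcs = [BinaryClassifier() for _ in range(num_features)]
        self.fuser = BinaryClassifier()
    def forward(self, fvs):
        cvs = [bc.predict(fv) for bc, fv in zip(self.bcs, fvs)]
        fused_input = [v for cv in cvs for v in cv]   # concatenated BC CVs
        return self.fuser.predict(fused_input)

class CNBC:
    """Collective of NBCs: adding a class appends one NBC; adding a feature
    appends one BC to every NBC, leaving all other BCs untouched."""
    def __init__(self, num_classes, num_features):
        self.nbcs = [NBC(num_features) for _ in range(num_classes)]
    def forward(self, fvs):
        return [nbc.forward(fvs) for nbc in self.nbcs]
```

For the SF Bay setup below (5 classes, 3 FVs), `CNBC(5, 3)` would hold the 4 × 5 = 20 classifiers mentioned in Section V.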
The class selection in the 1-of-n encoding scheme can simply be performed by comparing the individual outputs: e.g., the output is positive if CVc,2 > CVc,1, and negative otherwise. This also holds for the fuser BC, whose output constitutes the output of its NBC. The FVs of each data set item are fed to each NBC in the CNBC. Each FV drives through (via forward propagation) its corresponding BC in the input layer of the NBC. The outputs of

these BCs are then fed to the fuser BC of each NBC to produce all CVs. The class selection block shown in Fig. 4 collects them and selects the positive class(es) of the CNBC as the final outcome. This selection scheme depends, first of all, on the data set class type: a data set is called "uniclass" if an item can belong to only one class, and "multiclass" otherwise. Therefore, in a uniclass data set, exactly one class, c∗, is selected as the positive outcome, whereas in a multiclass data set there can be one or more NBCs, {c∗}, with a positive outcome. In the class selection scheme, the winner-takes-all strategy is utilized. Assume, without loss of generality, that a CV of {0, 1} or {−1, 1} corresponds to a positive outcome where CVc,2 − CVc,1 is maximum. Therefore, for uniclass data sets, the positive class index, c∗ ("the winner"), is determined as follows:

c∗ = arg max_{c∈[0,C−1]} (CVc,2 − CVc,1).  (20)
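The winner-takes-all rule of (20) for uniclass data sets amounts to a one-line maximization over the per-NBC CV margins (a minimal sketch; the function name is illustrative):

```python
def select_uniclass(cvs):
    """cvs: list of (CV_c1, CV_c2) pairs, one per NBC.
    Returns the index c* maximizing CV_c2 - CV_c1, per (20)."""
    return max(range(len(cvs)), key=lambda c: cvs[c][1] - cvs[c][0])

# Example: the third NBC has the largest positive margin.
assert select_uniclass([(0.9, 0.1), (0.4, 0.6), (0.1, 0.9)]) == 2
```

Note that the rule still returns a single winner even when no NBC (or more than one NBC) yields a positive outcome, which is exactly how the erroneous cases are handled.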

In this way, the erroneous cases (false negatives and false positives), where no NBC or more than one NBC yields a positive outcome, can be properly handled. However, for multiclass data sets, the winner-takes-all strategy can only be applied when no NBC yields a positive outcome, i.e., CVc,2 ≤ CVc,1 ∀c ∈ [0, C − 1]; otherwise, for an input set of FVs belonging to a data set item, multiple NBCs with a positive outcome may indicate multiple true positives and hence cannot be further pruned. As a result, for a multiclass data set, the (set of) positive class indices, {c∗}, is selected as follows:

{c∗} = { arg max_{c} (CVc,2 − CVc,1)   if CVc,2 ≤ CVc,1 ∀c ∈ [0, C − 1]
       { {arg(CVc,2 > CVc,1)}          otherwise.  (21)

2) Evolution of the CNBC: The evolution of a subset of the NBCs, or of the entire CNBC, is performed for each NBC individually in a two-phase operation, as shown in Fig. 5. As explained earlier, using the FVs and the target CVs of the training data set, the evolution process of each BC in an NBC is performed within the current AS to find the best (optimal) BC configuration with respect to a given criterion (e.g., training/validation MSE or classification error, CE). During the


Fig. 5. Illustration of the two-phase evolution session over BCs’ architecture spaces in each NBC. At the top (phase 1), the evolution of the BCs with the feature and class vectors and at the bottom (phase 2), the evolution of the fuser BCs with the actual BC outputs and the class vectors are illustrated.

evolution, only the NBCs associated with the classes represented in the training data set are evolved. If the training data set contains new classes that do not yet have a corresponding NBC, a new NBC is created for each and evolved using the training data set. In Phase 1 (see the top of Fig. 5), the BCs of each NBC are first evolved given an input set of FVs and a target CV. Recall that each CV is associated with a unique NBC and that the fuser BCs are not used in this phase. Once an evolution session is over, the AS of each BC is recorded to be used for potential (incremental) evolution sessions in the future. Recall also that each evolution process may contain several runs and that, according to the aforementioned evolutionary update rule, the best configuration achieved will be used as the classifier. Hence, once the evolution process is completed for all BCs in the input layer (Phase 1), the best BC configurations are used to forward propagate all FVs of the items in the training data set, composing the FV for the fuser BC from their output CVs, so as to evolve the fuser BC in the second phase. Apart from this difference in the generation of the FVs, the evolutionary method (and update rule) of the fuser BC is the same as for any other BC in the input layer. In this phase, the fuser BC learns the significance of each individual BC (and its feature) for the discrimination of the particular class. This can be viewed as the adaptation of the entire feature space to discriminate a specific class in a large data set. In other words, it is a crucial way of applying an efficient feature selection scheme: some FVs may be quite discriminative for some classes whereas others may not, and the fuser, if properly evolved and trained, can "weigh" each BC (with its FV) accordingly. In this way, the usage of each feature (and its BC) shall be optimally "fused" according to its discrimination power for each class. Similarly, each BC in the first

layer shall learn in time the significance of the individual feature components of the corresponding FV for the discrimination of its class. In short, the CNBC, if properly evolved, shall learn the significance (or the discrimination power) of each FV and of its individual components.

3) Incremental Evolution of the CNBC: To accomplish the major objective IV, online evolution, the proposed CNBC framework is designed for continuous "incremental" evolution sessions, where each session may further improve the classification performance of each BC using the advantage of the "evolutionary updates." The main difference between the initial and the subsequent evolution sessions is the initialization of the evolution process: the former uses random initialization, whereas the latter starts from the last AS parameters of each classifier in each BC. Note that the training data set used for the incremental evolution sessions may differ from the previous ones, and each session may contain several runs. Thus, the evolutionary update rule compares the performance of the last recorded and the current (after the run) network over the current training data set. Consequently, the proposed MD-PSO evolutionary technique used for evolving MLP or RBF networks is initialized with the current AS parameters of the network. That is, the swarm particles are randomly initialized (as in the initial evolutionary step), except that one of the particles (without loss of generality, we assume the first particle, with a = 0) has its personal best set to the optimal solution found in the previous evolutionary session. For MD-PSO evolution over MLPs, this can be expressed as

xy^0_d(0) ← Ψ_d({w^0_jk, θ^1_k}, {w^1_jk, θ^2_k}, . . . , {w^(o−1)_jk, θ^o_k}), ∀d ∈ [1, NC]  (22)


TABLE I
CLASSIFIER AND EVOLUTION TYPES USED THROUGHOUT THE EXPERIMENTS

where {w^l_jk} and {θ^l_k} represent the sets of weights and biases of layer l of the MLP network Ψ_d, which is the dth (MLP) configuration retrieved from the last AS record. For RBF networks, a similar approach is adopted within the dynamic clustering scheme; that is, the personal best position of the first swarm particle is set to the last recorded centroids:

TABLE II
ARCHITECTURE SPACE, OF SIZE NC = 17, USED FOR MLPS WITH ONE SINGLE-LAYER AND 16 MULTILAYER PERCEPTRONS (ONE HIDDEN LAYER)

xy^0_d(0) ← {µ_0,1, . . . , µ_0,j, . . . , µ_0,d} ⇒ xy^0_{d,j}(0) = µ_{a,j}, ∀d ∈ [1, NC].  (23)
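The seeding of the incremental MD-PSO session expressed in (22) and (23), randomly initializing the swarm but setting the first particle's personal best in every dimension to the previously recorded solution, can be sketched as follows. This is an illustrative fragment; the argument `prev_best` stands in for the parameter vectors retrieved from the last AS record, and the function name is hypothetical.

```python
import random

def init_incremental_swarm(swarm_size, dims, prev_best, lo=-1.0, hi=1.0):
    """Randomly initialize particle positions and personal bests (as in the
    initial evolution), then seed particle a = 0's personal best in every
    dimension d with the previous session's solution, per (22)-(23).

    dims[d]      -- parameter-vector length of configuration d in the AS
    prev_best[d] -- recorded parameter vector of configuration d
    """
    positions = [[[random.uniform(lo, hi) for _ in range(d)] for d in dims]
                 for _ in range(swarm_size)]
    pbests = [[list(p) for p in particle] for particle in positions]
    for d_idx in range(len(dims)):
        pbests[0][d_idx] = list(prev_best[d_idx])   # seed particle a = 0
    return positions, pbests
```

Because only one particle is seeded, the swarm is guided toward the last solution early in the run while the search otherwise remains unconstrained, as discussed next.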

It is expected that, particularly at the early stages of the MD-PSO run, the first particle is likely to be the gbest particle in every dimension, guiding the swarm toward the last solution while otherwise keeping the process independent and unconstrained. Particularly if the training data set is considerably different in the incremental evolution sessions, it is quite probable that MD-PSO converges to a new solution while taking the past solution (experience) into account. For the alternative technique, the exhaustive search via repetitive BP training of each network in the AS, the first step of an incremental training is simply the initialization of the weights w^l_jk and biases θ^l_k with the parameters retrieved from the last record of the AS of that BC. The same also applies to the Gaussian centroids, µk, and sigmas, σk, of the RBF networks. Starting from this initial point, and using the current training data set with the target CVs, the BP algorithm can then perform its gradient descent in the error space.

V. EXPERIMENTAL RESULTS

In this section, we first detail the experimental setup and the benchmark PolSAR images used for the classification experiments. We then perform comparative performance evaluations, both visually and statistically, with in-depth analysis. Finally, we investigate the performance gain/loss from feature combination and evaluate the performance gain achieved by incremental evolutions.

A. Experimental Setup

Two benchmark PolSAR images were used for qualitative and numerical performance evaluations. The first one is the NASA/Jet Propulsion Laboratory AIRSAR L-band image of the San Francisco Bay (SF Bay) [11]. The original four-look fully PolSAR data of the SF Bay, with a dimension of 900 × 1024 pixels, provides good coverage of both natural targets (sea, mountains, forests, etc.) and man-made targets (buildings, streets, parks, golf course, etc.).
We defined five distinct classes covering both natural targets (water: sea; mountain: cliffs; forest: trees; flat zones: beach, grass, etc.) and the urban area (buildings, streets, roads, etc.), which has a more complex inner structure. Aerial photographs of this area, which can be used as ground truth, are provided by the TerraServer Web site [42]. Furthermore, for numerical evaluation, the AIRSAR

L-band image of Flevoland, The Netherlands [11], is used to perform crop and land classification. This image has a size of 1024 × 750 pixels and was collected in mid-August 1989 during the MAESTRO-1 campaign. There are 12 ground-truth classes in this image: water, forest, stem beans, lucerne, roads, bare soil, grass, peas, rapeseed, beet, potatoes, and wheat [1]. Over both images, the speckle filter suggested by Lee et al. [27] is applied within a 5 × 5 window. The CNBC created and evolved for each SAR image contains a number of NBCs equal to the number of predefined classes (i.e., 5 for the SF Bay and 12 for the Flevoland data). Recall that each NBC in both CNBCs contains a certain number of BCs in the input layer, equal to the number of FV sets. Therefore, each NBC has four BCs (three in the input layer for the applied FVs plus one fuser BC to merge the feature BC outputs). Thus, a total of 4 × 5 = 20 classifiers for the SF Bay and 4 × 12 = 48 classifiers for the Flevoland data are individually evolved using two major ANN types (RBFs and MLPs) and two evolutionary techniques (MD-PSO and exhaustive BP). In this paper, we used the three classifier and evolution types (CETs) presented in Table I. Furthermore, note that for each BC, the input size is determined by the size of its FV, i.e., 12, 10, and 6 for the input-layer BCs of all NBCs and, naturally, 3 × 2 = 6 for all fuser BCs. We used a PNR of 9 for negative sample selection for all BCs. The evolution (and training) parameters and internal settings of the BCs are as follows. For MD-PSO, the termination criterion is the combination of the maximum number of iterations allowed (iterNo = 1000) and the cutoff error (εC = 10^−4). Other parameters were empirically set as: swarm size S = 100, Vmax = xmax/5 = 0.2, and VDmax = 10. For MLPs, the BP learning parameter is λ = 0.002 and the iteration number is 1500.
We use the typical activation function, the hyperbolic tangent, tanh(x) = (e^x − e^−x)/(e^x + e^−x). For RBFs, the increase and decrease factors of SuperSAB BP [41] are set to 1.05 and 0.25, respectively. Unless stated otherwise, these parameters are used in all experiments presented in this section. For both the MLP and RBF ASs, we used the simplest configurations with the following range arrays: Rmin = {Ni, 1, 2}


Fig. 6. SF Bay image overlaid with training ground truth (top-left) and CNBC classification results from the three CETs.

and Rmax = {Ni, 16, 2}, which indicate that, in addition to the single-layer perceptron (SLP), all MLPs (and, naturally, RBF networks) have only a single hidden layer, Lmax = 2, with no more than 16 hidden neurons. In addition to the SLP, the hash function enumerates all MLP configurations in the AS, as shown in Table II. Since there is only one hidden layer for RBF networks, the number of Gaussian neurons can directly be used as the hash index in the AS. Accordingly, we used an AS consisting of RBF networks containing 1 to 16 hidden (Gaussian) neurons. Finally, for both evolutionary techniques, exhaustive BP and MD-PSO, NR = 10 independent runs are performed. Note that for exhaustive BP, this corresponds to ten runs for each configuration in the AS.

B. Performance Evaluation and Analysis

For the first benchmark SAR image, SF Bay, the GTD and the classification results with the three CETs are shown in Fig. 6. The GTD, which equals the training data set used, contains only 951 pixels, corresponding to just ∼0.1% of the entire SAR data, and its accuracy cannot be 100% guaranteed. For instance, due to the low resolution, when selecting training patches it may happen that pixels are accidentally assigned erroneously due to their close proximity (e.g., a patch of grass in forest might not be clearly recognizable visually). The urban class may also cover trees (planted alongside roads or in the gardens of houses); thus, classification is performed by taking the majority terrain type into account. We can, at best, assume that the majority of those points belong to the classes they are assigned to; as in the urban case, some training samples might fall onto pixels with more grass- or tree-like characteristics. From the figure, it is clear that all three CETs achieved quite similar classification results, yet perhaps the CET-1 (MLPs evolved with exhaustive BP) classification is a bit noisy or fractured compared to the other two. However, all CETs in

Fig. 7. Visual evaluation of the SF Bay classification result with an aerial photo taken from [42].

general managed to draw the major boundaries between distinct classes, such as sea–beach, urban–forest, and sea–urban. Since the ground truth of SF Bay is not entirely available, we therefore evaluate our classification results visually, using the result from CET-1 and the aerial photo snapshots taken from [42]. Some parts are purposefully enlarged for visual clarity and compared with the CET-1 classification result, as shown in Fig. 7. It is quite visible that some large structures, such as the bridge, the stadium, and the island, and even some small areas such as the lighthouse and its island, are all classified correctly. Nevertheless, the classification is not pixel accurate, as some deformations are visible over some known geometrical constructions such as the bridge and the stadium. Note that the SAR data is quite noisy and speckle filtering was initially performed; thus, accurate pixel-based classification is not expected in this particular case. For comparative evaluations, classification results from some of the major techniques, A) the Wishart H/A/α clustering [28], B) EM clustering [20], C) a three-layer ANN [48], D) Wishart ML (WML) [48], E) optimization of polarimetric contrast enhancement (OPCE) [46], and F) WML [46] based classifiers, are shown in Fig. 8. The same sets of features are used for A–B, C–D, and E–F. For the pioneering works A and B, which are both unsupervised, it is obvious that there are multiple clusters assigned to a single class type, due to the individual scattering schemes received from each area. This can be clearly seen for the sea zone, which is divided into at least four clusters due to the effect of incidence angle variations from 10◦ to 60◦. Similar differences in classes are also seen in the urban and forest areas. Large sections of erroneous classification are also visible, such as the


Fig. 8. Classification results taken from (a) Wishart H/A/α clustering [28], (b) EM clustering [20], (c) Wishart ML [48], (d) ANN-based [48], (e) OPCE [46] and (f) Wishart ML classifiers [46]. A and B are examples of unsupervised classification based on clustering.

large sea–water sections in the mountain zone, and the majority of the island is misclassified as sea by both methods. The rest of the competing methods are all supervised. C and D are applied only over a 600 × 600 pixel subarea of the SAR image with only three classes, sea, urban, and vegetation, corresponding to the colors red, green, and blue in Fig. 8. They both used more than 26 000 pixels for training, with a 19-D FV that was reduced to 10-D by PCA. Despite the minimal number of classes used and the massive training data set, it is evident that both methods have large misclassified sections, particularly in the sea and mountain (or vegetation) terrains. E and F are likewise applied over a subarea, of 700 × 900 pixels, of the SAR image with four classes, sea, urban, forest (or woods), and quasi-natural (what we called the flat zones), with the color coding blue, pink, green, and brown, respectively, in Fig. 8. The OPCE classifier (E) performed better than the WML classifier; however, it still suffered from excessive noise and misclassified sections in the sea and mountain (or forest) terrains. The second benchmark SAR image, the Flevoland data, allows us to perform numerical evaluations of the classification performance of the proposed CNBC framework for each of the three CETs individually, since it has a publicly available GTD with 12 classes [1]. The GTD contains 164 278 pixels, 962 of

which were randomly selected as the training data set, and the classification results from each CET are presented in Fig. 9. Table III presents the size (coverage area) of each GTD class along with the classification performances achieved with each CET for both the training and test data sets. Accordingly, some important remarks can be made. First of all, CET-1 (BCs evolved with exhaustive BP) achieved the best classification performances on the training data sets of all class types. This is an expected outcome, since BP, which is basically a gradient descent method in the error space, has better local convergence ability than PSO (or any other evolutionary technique). However, the opposite is true for the test sets: since PSO, as a global technique, can generalize better, the highest accuracies in the majority of classes were achieved either by CET-2 or CET-3, both of which are evolved by MD-PSO. Nevertheless, the performance gap is in general quite narrow, which is also true for the overall classification accuracies. As presented in Table IV, the proposed CNBC framework can distinguish each individual class quite well in the training data set and reasonably well in the test data set, except perhaps for a few class pairs such as potatoes–forest (and vice versa), roads–grass, and particularly wheat–rapeseed (and vice versa). The reason for this is the lack of discrimination between those classes offered by the current features; hence, new feature(s) that better discriminate these classes are needed to improve the overall classification performance significantly. For comparative evaluations, we compare the overall classification accuracy with the results of seven state-of-the-art methods over the same (Flevoland) data, reported recently in [6]. Although it has the same GTD, there are three major differences.
First, those methods used only a limited part of the available GTD, 47 037 pixels for training and (only) 41 278 pixels for test, a mere total of 88 315, whereas our GTD size is roughly double this figure. Second, all competing classifiers are trained with a massive portion of the overall GTD (47 037 pixels), whereas we used only ∼2% of this (962 pixels) to classify a test region roughly four times larger than theirs. Finally, in that work, only ten classes were used, excluding roads and merging peas and beet into a single class, whereas we treat them separately, as recommended in [1]. Table V presents the overall accuracies achieved by each classifier (using the respective features therein), where the classification accuracies vary between 32.5% and 81.3%. Although it is unfair to make a direct performance comparison between the proposed CNBC classifier and these competing methods, due to the unbalanced train–test data set sizes and the smaller number of classes they used, the proposed technique still achieved a far superior classification accuracy to most of the competing classifiers presented in the table. Even the best competing classifier, the ECHO-(6I+3P), falls ∼6% short of the CNBC's classification accuracy, and it is an expected outcome that the performance gap would widen significantly if the comparative evaluations were performed under equal terms. The primary reason for this poor performance is that they used a limited set of features and thus cannot benefit from the discrimination power of many features as used in the proposed CNBC classifier. As discussed earlier, when a single classifier is used, the motivation for such limited feature usage is obviously the


Fig. 9. Flevoland image overlaid with ground truth (top-left) and CNBC classification results from the three CETs.

TABLE III
FLEVOLAND GTD CLASSES AND CLASSIFICATION ACCURACIES ACHIEVED BY THE THREE CETs. THE BEST PERFORMANCES ARE HIGHLIGHTED

desire to avoid the "curse of dimensionality," as it is a known fact that performance degrades with increased dimensionality. In addition, the proposed framework has the advantage of searching for the optimal classifiers, which "learn" and "discriminate" the best (sub)features for each class. Moreover, the CNBC can also perform an optimal feature selection and thus "combines" and "scales" the discrimination ability of each feature used, so as to maximize the classification performance. Obviously, such important abilities cannot be fully achieved with the single classifier used in each competing method. In the next section, we shall demonstrate the severe

degradation of the classification performance that eventually occurs due to the lack of some discriminative features, and we then evaluate the performance gain from the incremental evolutions. Table VI demonstrates classification results when the classes and features used are identical between the CNBC and the competing methods (ten classes), yet the CNBCs are evolved with only 2% of the training set used by the competitors. As presented in the table, the CNBC with the same number of classes (10) and the same features achieves a significantly better (test) classification performance than the competitors, whereas an even higher classification accuracy (92.6%) is achieved when


TABLE IV
CONFUSION MATRICES OVER THE FLEVOLAND IMAGE FOR TRAINING (TOP) AND TEST (BOTTOM) FROM CET-2. THE HIGHEST CLASSIFICATION ERRORS ARE HIGHLIGHTED IN THE TEST DATA SET

TABLE V
OVERALL ACCURACIES OVER THE FLEVOLAND IMAGE FOR THREE CLASSIFICATION METHODS USING DIFFERENT FEATURES

TABLE VI
OVERALL ACCURACIES OVER THE FLEVOLAND IMAGE FOR THREE CLASSIFICATION METHODS OVER A TEN-CLASS TEST DATA SET

the class number is reduced from 12 to 10, which is an expected outcome since interclass confusions are reduced with fewer classes.
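The overall accuracies reported in Tables III, V, and VI follow from the confusion matrices of Table IV in the standard way: the trace divided by the total count. A generic helper (not code from the paper) makes the computation explicit:

```python
def overall_accuracy(confusion):
    """confusion[i][j] = number of class-i samples labeled as class j.
    Overall accuracy is the sum of the diagonal over the grand total."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Toy two-class example: 8 + 9 = 17 correct out of 20 samples.
assert abs(overall_accuracy([[8, 2], [1, 9]]) - 0.85) < 1e-12
```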

C. Evaluation of Feature Combination and Incremental Evolution

We want to demonstrate the performance gain obtained by combining a large set of features, which have varying discrimination power for any class, or, on the contrary, the performance loss that occurs due to the lack of such features. While keeping the same experimental setup (same GTD and BP parameters) and using the SAR data from SF Bay, we evolved a CNBC by extracting a single FV composed of only the covariance matrix elements:

FV = [C11, |C12|, ∠C12, |C13|, ∠C13, C22, |C23|, ∠C23, C33].  (24)


Fig. 10. CNBC classification result from CET-1 using a 9-D FV.

Since only a single FV is used, note that each NBC contains only a single BC, resulting in only 5 × 1 = 5 classifiers in the CNBC, as opposed to the 4 × 5 = 20 classifiers used before. In this scheme, there will naturally be no feature selection either, since fuser BCs are not used at all. Fig. 10 shows the CNBC classification result with CET-1, and a visual comparison of this figure with the corresponding classification result in Fig. 6 clearly reveals how severe the classification degradation is. Not only does the image contain a significant level of noise, but large blobs of erroneous classification are also visible, e.g., large water (sea) areas in the top-left (mountain) section of the image. This confirms why several methods cannot exceed a certain level of classification accuracy even though they use significantly larger training data sets and fewer classes.

Finally, we shall show a sample utilization of the incremental evolution of the CNBC and evaluate the (incremental) performance gain each time it is performed. We used the same experimental setup as before and only changed the GTD for the SF Bay image. Initially, the GTD contained only 150 points, 30 points per class. The BCs, having the same AS as before, are first evolved with exhaustive BP (CET-1). Then the user updates the GTD with only 28 new entries (15 for water, 10 for flat, and 3 for mountain zones) over the erroneous zones and performs a new (incremental) evolution session, this time with MD-PSO using 500 iterations and five runs. Recall that the MD-PSO swarm was not fully randomized this time; instead, it "learned" from the previous experience of the CNBC while taking the new training data into account. Fig. 11 shows the classification results of the initial and incremental evolutions, by which the CNBC is able to correct the majority of the classification errors. An important observation worth mentioning here is that the classification result obtained by the GTD with only 178 points

is just slightly worse than the one obtained with 951 points (see Fig. 6). This shows that the CNBC can scale quite well with the size of the GTD as long as it does not lack discriminative features for evolution.

D. Parallel Processing and Computational Complexity Analysis

In order to achieve the last objective (VII: parallel processing), the entire framework has been developed in harmony with a distributed computing scheme, the Techila Grid [40]. Computational problems can be divided into two main categories: parallel problems and embarrassingly parallel problems. The CNBC evolution process is an example of the latter; this makes it an ideal workload for distribution within the Techila Grid, which contains a massive number of computers (i.e., the so-called "workers" within the grid). In the current scheme, each NBC evolution is assigned to an individual worker, but in theory, each BC evolution can even be parallelized with a proper semaphore implementation so as to secure the order of evolutionary phases 1 and 2 (e.g., see Fig. 5). In this case, the computational complexity of an (incremental) CNBC evolution will only be proportional to the evolution of a single BC, regardless of the results of the verification test. The computational complexity analysis of the evolutionary ANNs was presented in [24], and, as stated earlier, since the CNBC design enables us to use a limited AS with a few compact MLP configurations, the computational complexity can be significantly reduced. As a result, regardless of the database size and the number of classes and features extracted, the CNBC evolution, batch or incremental, is performed in parallel within a reasonably short time.
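Since each NBC's evolution is independent of the others, the task decomposition described above can be sketched in a few lines. Here a thread pool stands in for the grid's workers, and `evolve_nbc` is a hypothetical placeholder for one NBC's evolutionary search; neither name comes from the paper:

```python
from concurrent.futures import ThreadPoolExecutor

def evolve_nbc(class_id):
    # Hypothetical stand-in for one NBC's evolutionary search
    # (e.g., MD-PSO over the architecture space); returns the
    # class index and a dummy "best fitness" value.
    return class_id, 0.90 + 0.01 * class_id

def evolve_cnbc_parallel(num_classes, workers=5):
    # The CNBC evolution is embarrassingly parallel: one independent
    # task per class, so wall-clock time approaches that of a single
    # NBC evolution when enough workers are available.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(evolve_nbc, range(num_classes)))
```

In the actual framework each task would run on a separate grid worker; the pool here merely illustrates the one-task-per-class decomposition.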

Fig. 11. Classification results from two distinct evolutions. The second (incremental) evolution corrects most of the misclassification errors.

VI. CONCLUSIONS

In this paper, a novel CNBC framework is introduced to address the PolSAR image classification problem with the primary objective of maximizing accuracy and efficiency. The proposed framework mainly adopts a "Divide and Conquer" type of approach so as to handle efficiently an indefinite number of SAR features and classes, which otherwise turns out to be a difficult, if not infeasible, problem for a single classifier due to the well-known "curse of dimensionality" phenomenon. In the proposed approach, compact classifiers, which can be evolved and trained much more efficiently than a single but complex classifier, can conveniently be used in the AS, in which the optimum classifier for the classification problem in hand can be searched for with the proposed evolutionary techniques. At a given time, this makes it possible to create a dedicated classifier (BC) for discriminating a certain class type from the others using a single (sub)feature. Each (incremental) evolution session "learns" from the current best classifier and can improve it further, possibly with another configuration in the AS. Moreover, with each incremental evolution, new classes/features can also be introduced, which signals the CNBC to create new corresponding NBCs and BCs to adapt dynamically to the change. In this way, the CNBC can dynamically adapt itself to a classification problem while striving to maximize the classification accuracy.
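Assuming the standard one-versus-all decision implied by this architecture, in which each NBC is dedicated to one class and the class whose NBC responds most strongly claims the pixel, the final class assignment can be sketched as follows (the function name is illustrative, not from the paper):

```python
import numpy as np

def cnbc_predict(nbc_scores):
    # One-vs-all decision: each entry of nbc_scores is the
    # positive-class output of one NBC for the pixel's feature
    # vector(s); the maximally responding NBC wins the pixel.
    return int(np.argmax(nbc_scores))

# Example: five NBC outputs for one pixel
# cnbc_predict([0.1, 0.8, 0.3, 0.2, 0.05]) -> class 1
```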

An extensive set of experimental results, first of all, confirms the superiority of the CNBC framework in terms of classification accuracy, even though the GTD may not be 100% accurate and the training data set may contain only a mere fraction (0.1% or even less) of the available SAR data. The former shows robustness against training data set errors, and the latter indicates a high level of learning and generalization ability with the least amount of data. The comparative evaluations against several single-classifier methods show that a significant performance gap occurs despite the fact that they have the advantage of classifying a smaller number of classes, while having much smaller test sets yet significantly larger training data sets. Although the results indicate that all the aforementioned objectives have been successfully fulfilled, even higher accuracy levels can still be expected from the CNBC framework with the addition of new powerful SAR features and spatial descriptors extracted by image processing techniques (i.e., color, texture, edge). With such high accuracy levels, we can further foretell that a CNBC evolved for one SAR image can then be used "as is," or perhaps with minimal incremental evolution, to classify another SAR image with similar terrain classes. This can even be further extended into a retrieval framework for SAR image databases, where SAR image(s) with certain class type(s) can be queried and retrieved. These are all subject to our future work.

ACKNOWLEDGMENT

The authors would like to thank the ESA/ESRIN and NASA/JPL-Caltech for making the data sets used in this work available as part of the Polarimetric SAR Data Processing and Educational Tool (PolSARpro) [12].

REFERENCES

[1] T. L. Ainsworth, J. P. Kelly, and J.-S. Lee, "Classification comparisons between dual-pol, compact polarimetric and quad-pol SAR imagery," ISPRS J. Photogramm. Remote Sens., vol. 64, no. 5, pp. 464–471, Sep. 2009. [2] L. Bruzzone, M. Marconcini, U. Wegmuller, and A. Wiesmann, "An advanced system for the automatic classification of multitemporal SAR images," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 6, pp. 1321–1334, Jun. 2004. [3] W. L. Cameron, N. N. Youssef, and L. K. Leung, "Feature motivated polarization scattering matrix decomposition," in Proc. IEEE Int. Radar Conf., Arlington, VA, May 1990, pp. 549–557. [4] F. Cao, W. Hong, Y. Wu, and E. Pottier, "An unsupervised segmentation with an adaptive number of clusters using the Span/H/alpha/A space and the complex Wishart clustering for fully polarimetric SAR data analysis," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 11, pp. 3454–3466, Nov. 2007. [5] Y. Chauvin and D. E. Rumelhart, Back Propagation: Theory, Architectures, Applications. Hove, U.K.: Lawrence Erlbaum Assoc. Publ., 1995. [6] E. Chen, Z. Li, Y. Pang, and X. Tian, "Quantitative evaluation of polarimetric classification for agricultural crop mapping," Photogramm. Eng. Remote Sens., vol. 73, no. 3, pp. 279–284, Mar. 2007. [7] S. R. Cloude and E. Pottier, "An entropy based classification scheme for land applications of polarimetric SAR," IEEE Trans. Geosci. Remote Sens., vol. 35, no. 1, pp. 68–78, Jan. 1997. [8] S. R. Cloude and E. Pottier, "A review of target decomposition theorems in radar polarimetry," IEEE Trans. Geosci. Remote Sens., vol. 34, no. 2, pp. 498–518, Mar. 1996. [9] L. J. Du and J. S. Lee, "Fuzzy classification of earth terrain covers using multi-look polarimetric SAR image data," Int. J. Remote Sens., vol.
17, no. 4, pp. 809–826, 1996. [10] K. Duk-jin, W. M. Moon, and K. Youn-Soo, “Application of TerraSARX data for emergent oil-spill monitoring,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 2, pp. 852–863, Feb. 2010. [11] K. Ersahin, I. G. Cumming, and R. K. Ward, “Segmentation and classification of polarimetric SAR data using spectral graph partitioning,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 1, pp. 164–174, Jan. 2010. [12] ESA PolSARPro, Sample Datasets. [Online]. Available: http://earth.eo. esa.int/polsarpro/datasets.html [13] F. del Frate, F. Pacifici, G. Schiavon, and C. Solimini, “Use of neural networks for automatic classification from high-resolution images,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 800–809, Apr. 2007. [14] A. Freeman and S. L. Durden, “A three-component scattering model for polarimetric SAR data,” IEEE Trans. Geosci. Remote Sens., vol. 36, no. 3, pp. 963–973, May 1998. [15] R. Guida, A. Iodice, and D. Riccio, “Height retrieval of isolated buildings from single high-resolution SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 7, pp. 2967–2979, Jul. 2010. [16] D. H. Hoekman and M. A. M. Vissers, “A new polarimetric classification approach evaluated for agricultural crops,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 12, pp. 2881–2889, Dec. 2003. [17] T. Ince, S. Kiranyaz, and M. Gabbouj, “A generic and robust system for automated patient-specific classification of electrocardiogram signals,” IEEE Trans. Biomed. Eng., vol. 56, no. 5, pp. 1415–1426, May 2009. [18] M. M. Horta, N. D. A. Mascarenhas, and A. C. Frery, “A comparison of clustering fully polarimetric SAR images using SEM algorithm and G0P mixture model with different initializations,” in Proc. Int. Conf. Pattern Recog., 2008, pp. 1–4. [19] J. R. Huynen, “Phenomenological theory of radar targets,” Ph.D. dissertation, Tech. Univ. Delft, Delft, The Netherlands, 1970. [20] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proc. IEEE Int. 
Conf. Neural Netw., Perth, Australia, 1995, vol. 4, pp. 1942–1948. [21] P. Kersten, J. S. Lee, and T. Ainsworth, “Unsupervised classification of polarimetric synthetic aperture radar images using fuzzy clustering and EM clustering,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 519–527, Mar. 2005. [22] K. U. Khan, J. Yang, and W. Zhang, “Unsupervised classification of polarimetric SAR images by EM algorithm,” IEICE Trans. Commun., vol. 90, no. 12, pp. 3632–3642, Dec. 2007.

[23] S. Kiranyaz, M. Gabbouj, J. Pulkkinen, T. Ince, and K. Meissner, “Network of evolutionary binary classifiers for classification and retrieval in macroinvertebrate databases,” in Proc. IEEE ICIP, Hong Kong, Sep. 26–29, 2010, pp. 2257–2260. [24] S. Kiranyaz, T. Ince, A. Yildirim, and M. Gabbouj, “Fractional particle swarm optimization in multi-dimensional search space,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 40, no. 2, pp. 298–319, Apr. 2010. [25] S. Kiranyaz, T. Ince, A. Yildirim, and M. Gabbouj, “Evolutionary artificial neural networks by multi-dimensional particle swarm optimization,” Neural Netw., vol. 22, no. 10, pp. 1448–1462, Dec. 2009. [26] E. Krogager, “A new decomposition of the radar target scattering matrix,” Electron. Lett., vol. 26, no. 18, pp. 1525–1527, Aug. 1990. [27] E. Krogager and Z. H. Czyz, “Properties of the sphere, diplane, helix (target scattering matrix decomposition),” in Proc. JIPR-3, J. Saillard et al., Ed., Nantes, France, Mar. 1995, pp. 106–114. [28] J. S. Lee, M. R. Grunes, and R. Kwok, “Classification of multi-look polarimetric SAR imagery based on complex Wishart distribution,” Int. J. Remote Sens., vol. 15, no. 11, pp. 2299–2311, 1994. [29] J. S. Lee, M. R. Grunes, T. Ainsworth, L.-J. Du, D. Schuler, and S. R. Cloude, “Unsupervised classification using polarimetric decomposition and the complex Wishart classifier,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 5, pp. 2249–2258, Sep. 1999. [30] G. Liu, H. Xiong, and S. Huang, “Study on segmentation and interpretation of multi-look polarimetric SAR images,” Int. J. Remote Sens., vol. 21, no. 8, pp. 1675–1691, May 2000. [31] A. Lönnqvist, Y. Rauste, M. Molinier, and T. Haäme, “Polarimetric SAR data in land cover mapping in boreal zone,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 10, pp. 3652–3662, Oct. 2010. [32] M. Migliaccio, A. Gambardella, F. Nunziata, M. Shimada, and O. Isoguchi, “The PALSAR polarimetric mode for sea oil slick observation,” IEEE Trans. Geosci. 
Remote Sens., vol. 47, no. 12, pp. 4032–4041, Dec. 2009. [33] T. Poggio and F. Girosi, “A theory of networks for approximation and learning,” MIT, AI Lab., Cambridge, MA, A.I. Memo 1140, 1989. [34] E. Pottier and J. S. Lee, “Unsupervised classification scheme of POLSAR images based on the complex Wishart distribution and the H/A/alphaPolarimetric decomposition theorem,” in Proc. 3rd EUSAR Conf., Munich, Germany, May 2000. [35] E. Rignot, R. Chellappa, and P. Dubois, “Unsupervised segmentation of polarimetric SAR data using the covariance matrix,” IEEE Trans. Geosci. Remote Sens., vol. 30, no. 4, pp. 697–705, Jul. 1992. [36] D. Rumelhart, G. Hinton, and R. Williams, “Learning internal representation by error propagation,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. Cambridge, MA: MIT Press, 1986, pp. 318–362. [37] X. L. She, J. Yang, and W. J. Zhang, “The boosting algorithm with application to polarimetric SAR image classification,” in Proc. 1st APSAR, Huangshan, China, Nov. 2007, pp. 779–783. [38] M. Shimoni, D. Borghys, R. Heremans, C. Perneel, and M. Acheroy, “Fusion of PolSAR and PolInSAR data for land cover classification,” Int. J. Appl. Earth Obs. Geoinf., vol. 11, no. 3, pp. 169–180, Jun. 2009. [39] C. P. Tan, K. S. Lim, and H. T. Ewe, “Image processing in polarimetric SAR images using a hybrid entropy decomposition and maximum likelihood (EDML),” in Proc. ISPA, Sep. 2007, pp. 418–422. [40] Techila Grid. [Online]. Available: http://www.techila.fi/technology/ [41] T. Tollenaere, “SuperSAB: Fast adaptive back propagation with good scaling properties,” Neural Netw., vol. 3, no. 5, pp. 561–573, 1990. [42] T. N. Tran, R. Wehrens, D. H. Hoekman, and L. M. C. Buydens, “Initialization of Markov random field clustering of large remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 8, pp. 1912– 1919, Aug. 2005. [43] U.S. Geological Survey Images. [Online]. Available: http://terraserverusa.com [44] Y. 
Wu, K. Ji, W. Yu, and Y. Su, “Region-based classification of polarimetric SAR images using Wishart MRF,” IEEE Geosci. Remote Sens. Lett., vol. 5, no. 4, pp. 668–672, Oct. 2008. [45] S. Y. Yang, M. Wang, and L. C. Jiao, “Radar target recognition using contourlet packet transform and neural network approach,” Signal Process., vol. 89, no. 4, pp. 394–409, Apr. 2009. [46] Z. Ye and C.-C. Lu, “Wavelet-based unsupervised SAR image segmentation using hidden markov tree models,” in Proc. 16th ICPR, 2002, vol. 2, pp. 729–732. [47] J. J. Yin, J. Yang, and Y. Yamaguchi, “A new method for polarimetric SAR image classification,” in Proc. 2nd APSAR, Oct. 26–30, 2009, pp. 733–737. [48] P. Yu, A. K. Qin, and D. A. Clausi, “Unsupervised Polarimetric SAR image segmentation using region growing with edge penalty,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 4, pp. 1302–1317, Apr. 2012.

[49] Y.-D. Zhang, L. Wu, and G. Wei, “A new classifier for polarimetric sar images,” Progress Electromagn. Res., vol. PIER 94, pp. 83–104, 2009, DOI: 10.2528/PIER09041905. [50] T. Zou, W. Yang, D. Dai, and H. Sun, “Polarimetric SAR image classification using multifeatures combination and extremely randomized clustering forests,” EURASIP J. Adv. Signal Process., vol. 2010, pp. 1–10, Jan. 2010, DOI: 10.1155/2010/465612. [51] J. J. van Zyl, “Unsupervised classification of scattering mechanisms using radar polarimetry data,” IEEE Trans. Geosci. Remote Sens., vol. 27, no. 1, pp. 36–45, Jan. 1989.

Serkan Kiranyaz was born in Turkey in 1972. He received the B.S. degree from the Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey, in 1994, and the M.S. degree in signal and video processing from the same university in 1996. He worked as a Researcher at the Nokia Research Center and later at Nokia Mobile Phones, Tampere, Finland. He received the Ph.D. degree and his Docency from the Institute of Signal Processing, Tampere University of Technology, Tampere, Finland, in 2005 and 2007, respectively, where he is currently working as a Professor. He is the architect and principal developer of the ongoing content-based multimedia indexing and retrieval framework, MUVIS. His research interests include multidimensional optimization, evolutionary neural networks, content-based multimedia indexing, browsing and retrieval algorithms, audio analysis and audio-based multimedia retrieval, object extraction, motion estimation, and VLBR video coding.

Turker Ince received the B.S. degree from Bilkent University, Ankara, Turkey, in 1994, the M.S. degree from the Middle East Technical University, Ankara, Turkey, in 1996, and the Ph.D. degree from the University of Massachusetts Amherst (UMass-Amherst), Amherst, in 2001, all in electrical engineering. From 1996 to 2001, he was a Research Assistant at the Microwave Remote Sensing Laboratory, UMass-Amherst. He worked as a Design Engineer at Aware, Inc., Boston, from 2001 to 2004, and at Texas Instruments, Inc., Dallas, from 2004 to 2006. In 2006, he joined the Faculty of Engineering and Computer Sciences at Izmir University of Economics, Balcova, Turkey, where he is currently an Associate Professor. His research interests include electromagnetic remote sensing, radar systems and signal processing, neural networks, and global optimization techniques.

Stefan Uhlmann (S’10) received the M.S.(Tech.) degree in computing and electrical engineering from the Tampere University of Technology, Tampere, Finland, in 2007, where he is currently working toward the doctoral degree in the Department of Signal Processing. He was a Visiting Researcher at the Kodak European Research, Cambridge, U.K., during August 2008–June 2009, and he worked as a Research Scientist at Deep Visuals Ltd., Cambridge, U.K., during July 2009–December 2009. His research interests are in the areas of remote sensing, image classification, global optimization techniques, pattern recognition, and machine learning. Mr. Uhlmann is a student member of IEEE SP, EURASIP, SPIE, and Gesellschaft für Informatik e.V. (GI).

Moncef Gabbouj (M'03) received the B.S. degree in electrical engineering from Oklahoma State University, Stillwater, in 1985, and the M.S. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, in 1986 and 1989, respectively. He has been an Academy Professor with the Academy of Finland since January 2011. He is currently on sabbatical leave at the School of Electrical Engineering, Purdue University, West Lafayette, IN (August–December 2011), and the Viterbi School of Engineering, University of Southern California (January–June 2012). He was a Professor at the Department of Signal Processing, Tampere University of Technology, Tampere, Finland, and was Head of the Department during 2002–2007. He was a Visiting Professor at the American University of Sharjah, UAE, in 2007–2008, and a Senior Research Fellow of the Academy of Finland in 1997–1998 and 2007–2008. His research interests include multimedia content-based analysis, indexing and retrieval, nonlinear signal and image processing (NSIP) and analysis, voice conversion, and video processing and coding. Dr. Gabbouj is an Honorary Guest Professor of Jilin University, China (2005–2010). Dr. Gabbouj is a Fellow of the IEEE. He served as a Distinguished Lecturer for the IEEE Circuits and Systems Society in 2004–2005 and is Past Chairman of the IEEE-EURASIP NSIP Board. He was Chairman of the Algorithm Group of the EC COST 211quat. He served as Associate Editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING and was Guest Editor of Multimedia Tools and Applications and the European journal Applied Signal Processing. He is the past Chairman of the IEEE Finland Section, the IEEE Circuits and Systems Society Technical Committee on Digital Signal Processing, and the IEEE SP/CAS Finland Chapter. He was also Chairman of CBMI 2005 and WIAMIS 2001, the TPC Chair of ISCCSP 2012, 2006, and 2004, CBMI 2003, EUSIPCO 2000, and NORSIG 1996, and the DSP track cochair of the 2012, 2011, and 1996 IEEE ISCAS.
He is also member of EURASIP Advisory Board and past member of AdCom. He also served as Publication Chair and Publicity Chair of IEEE ICIP 2005 and IEEE ICASSP 2006, respectively, and the Innovation Chair of ICIP 2011. Dr. Gabbouj was the Director of the International University Programs in Information Technology (1991–2007) and vice member of the Council of the Department of Information Technology at Tampere University of Technology. He is also the Vice-Director of the Academy of Finland Center of Excellence SPAG, Secretary of the International Advisory Board of Tampere International Center of Signal Processing, TICSP, and member of the Board of the Digital Media Institute. He served as Tutoring Professor for Nokia Mobile Phones Leading Science Program (2005–2007 and 1998–2001). He is a member of IEEE SP and CAS societies. Dr. Gabbouj was the supervisor of the main author receiving the Best Student Paper Award from IEEE International Symposium on Multimedia, ISM 2011. He was recipient of the 2012 Nokia Foundation Visiting Professor Award, the 2005 Nokia Foundation Recognition Award, and corecipient of the Myril B. Reed Best Paper Award from the 32nd Midwest Symposium on Circuits and Systems and the NORSIG Best Paper Award from the 1994 Nordic Signal Processing Symposium. He is coauthor of 480 publications. He has been involved in several past and current EU Research and education projects and programs, including ESPRIT, HCM, IST, COST, Tempus and Erasmus. He also served as Evaluator of IST proposals, and Auditor of a number of ACTS and IST projects on multimedia security, augmented and virtual reality, image and video signal processing.
