Developing an Object-based Hyperspatial Image Classifier with a Case Study Using WorldView-2 Data

AAG Remote Sensing Specialty Group 2013 Award Winner¹

Harini Sridharan and Fang Qiu

Abstract

Recent advancements in remote sensing technology have provided a plethora of very high spatial resolution images. Image processing has shifted from pixel-based methods designed for low spatial resolution data towards object-based analysis in order to adapt to the hyperspatial nature of currently available remote sensing data. However, standard object-based classifiers work with only object-level summary statistics of the reflectance values and do not sufficiently exploit within-object reflectance patterns. In this research, a novel approach of utilizing the object-level distribution of reflectance values is presented. A fuzzy Kolmogorov-Smirnov based classifier is proposed to provide an object-to-object matching of the empirical distribution of the reflectance values of each object and to derive a fuzzy membership grade for each class. This object-based classifier is tested for urban object recognition using WorldView-2 data. Results indicate at least a 10 percent increase in overall classification accuracy using the proposed classifier in comparison to various popular object- and pixel-based classifiers.

Introduction

Remote sensing technology has seen remarkable growth ever since its formally recognized inception in the 1800s. One prominent achievement is the escalating improvement in the spatial resolution of remotely sensed images. During the early days of remote sensing technologies, only coarse spatial resolution (>20 meters) sensors, such as Landsat MSS, Landsat TM, AVHRR, and LISS, were available. In the 1990s, satellites such as SPOT and ASTER, with moderate spatial resolutions (10 to 20 meters), were launched (Jensen, 2000).

¹ In recognition of the 100th Anniversary of the Association of American Geographers (AAG) in 2004, the AAG Remote Sensing Specialty Group (RSSG) established a competition to recognize exemplary research scholarship in remote sensing by post-doctoral students and faculty in Geography and allied fields. Dr. Harini Sridharan and Dr. Fang Qiu submitted this paper, which was selected as the 2013 winner. Harini Sridharan is with the Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, and formerly with the Geospatial Information Sciences Program, University of Texas at Dallas, Dallas, TX 75080. Fang Qiu is with the Geospatial Information Sciences Program, University of Texas at Dallas, TX 75080 ([email protected]).


A great leap in remote sensing came with the launch of satellites offering high spatial resolution (≤10 meters) in the panchromatic band, e.g., CARTOSAT-1 and SPOT5 (Jensen, 2000). Further advancements in sensor technology have continued to refine the pixel size of satellite data, resulting in a transition from high spatial resolution to hyperspatial resolution. Hyperspatial sensors can be defined as those that provide very high spatial resolution data (≤2 meters), usually having pixel sizes much smaller than individual real-world objects (such as buildings, roads, trees, and grass) (Greenberg et al., 2009). Popular examples of such satellites include WorldView-2, GeoEye-1, QuickBird, and Ikonos. With the increasing availability of this new generation of sensors, many new applications, particularly in the urban domain, have emerged and are growing rapidly (Blaschke, 2010). Beyond broad land-cover mapping, such as the distinction between built-up and non-built-up areas, hyperspatial imagery has made it possible to identify and delineate individual real-world objects such as buildings, roads, and trees. Owing to the tradeoff between the spectral sensitivity of the sensors and the size of the instantaneous field of view (IFOV), hyperspatial images were until recently typically restricted to four bands. In 2009, DigitalGlobe introduced WorldView-2, a new generation of hyperspatial satellite sensor with improved spectral resolution. In addition to the refinement in spatial resolution from 50 cm to 46 cm in the panchromatic band and from 2 meters to 1.84 meters in the color bands, the most notable enhancement of the sensor is the improvement in spectral resolution from the original four bands (blue, green, red, and near-IR-1) to eight, by adding coastal blue, yellow, red-edge, and near-IR-2 bands. With the launch of the WorldView-2 satellite, DigitalGlobe initiated the eight-band data research challenge in 2010. Participating projects of the challenge demonstrated various interesting applications of the eight-band data, including bathymetric mapping (Bramante et al., 2010), vegetation and tree species mapping (Borel, 2010; Omar, 2010; Chen, 2010), urban vegetation cover and forests (Cavayas, 2010; Sridharan, 2010), and developing land-use/land-cover (LULC) databases (Ruiz, 2010). These studies have demonstrated the promising potential of this satellite for mapping various urban objects and natural resources. While the improved spatial resolutions have opened new avenues of applications, they have also invited a commensurate

degree of complexity in the relationship between image pixels and real-world objects (Blaschke et al., 2004). With coarse to moderate spatial resolution imagery, the size of an individual pixel is large and may encompass a number of different land-use/land-cover (LULC) entities, which leads to a considerable degree of between-class confusion. The key issue is, therefore, mixed pixels, with solutions focusing on disintegrating the different components of an individual pixel. Every pixel is treated as an individual unit and is processed without any contextual relationship to its neighboring pixels. Such classification procedures are commonly termed a pixel-based approach. Pixel-based classification typically compares the spectral measures of each pixel with those of the training classes and assigns it to the class with the closest similarity (Casals-Carrasco et al., 2000). Some of the well-established and popularly employed pixel-based classifiers for multispectral data include minimum distance to mean, parallelepiped, k-nearest neighbor, and maximum likelihood classifiers (Richards and Jia, 2006). In order to keep up with the parallel growth in the spectral resolution of satellite imagery, numerous improved pixel-based classifiers suitable for hyperspectral data have also been developed. Examples include endmember-based methods such as the Spectral Angle Mapper (SAM), Spectral Information Divergence (SID), and Binary Encoding (BE) classifiers, as well as other advanced classifiers such as neural networks (Heermann and Khazenie, 1992; Benediktsson et al., 1990; Yoshida and Omatu, 1994) and the Support Vector Machine (SVM) (Melgani and Bruzzone, 2004). While they have proven highly successful with low to moderate spatial resolution data, these pixel-based classifiers have produced unsatisfactory classification accuracies with hyperspatial data. The improvement in the spatial resolution of the data has caused a paradigm shift in the level of classification. For low to moderate spatial resolution imagery, the focus of classification was on general LULC classes such as vegetation, residential, and transportation, each of which is a collection of the real-world objects belonging to that class. With hyperspatial data, classification is moving towards the recognition of the objects themselves, such as individual trees, buildings, and roads. The issue faced by the classification algorithms has also changed from the mixed pixel problem to within-object spectral heterogeneity. With hyperspatial data, it is rare for one pixel to contain mixed spectral responses, but one real-world object commonly includes pixels with different spectral responses (Yu et al., 2006). A single pixel, therefore, no longer completely characterizes the target class, and contextual information becomes essential for defining an object. Without considering contextual information, the pixel-based approach often leads to salt-and-pepper noise in the classification results (Richards and Jia, 2006; Campbell, 2002). Furthermore, the within-object spectral heterogeneity problem may also lead to a multimodal probability distribution of the pixel values of a single class, violating the Gaussian distribution assumption of statistics-based classifiers such as maximum likelihood. A widely accepted solution for processing high spatial resolution data, therefore, has been an object-based approach.
Object-based image classification is a recently developed technique, which involves the delineation of homogeneous image regions called segments and classifies them as individual objects by utilizing object-level spectral information as well as additional ancillary measures (Blaschke and Hay, 2001; Hay et al., 2003; Benz et al., 2004; Navulur, 2007). While the process of obtaining homogeneous image segments (termed segmentation) is not new, the use of these segments for thematic information extraction based on an object-based approach is fairly recent. The comprehensive literature review provided by Blaschke (2010) shows that there has been very rapid growth in the use of object-based image classification.

Ever since its documented introduction in the late 1990s and early 2000s, the application of object-based image classification has been recorded in more than 820 articles. The first step in object-based image classification is the grouping of spatially contiguous pixels with similar spectral characteristics into meaningful regions, a process often termed segmentation. Once the segments are formed, the next step is object classification, which assigns an appropriate object category to these segments using a suitable classifier. Summary statistics (such as the mean vector) of the pixel values within a segment are often used to represent its spectral characteristics during object classification. A review of the object-based classification literature in remote sensing suggests that the rule-based classifier and the standard nearest neighbor classifier are among the most commonly employed object classifiers, popularized by the availability of commercial software such as eCognition and ENVI Zoom (Blaschke, 2010; Galli and Malinverni, 2005). The rule-based classifier assigns objects to a particular class using a set of rules derived from human knowledge, based on their spatial and spectral characteristics summarized at the object level (Exelis, 2009). The rule-based classifier has been successfully utilized by a number of researchers (Baraldi and Parmiggiani, 1996; Tremeau and Borel, 1997; Hay et al., 2001; Guo et al., 2007), who found that it generated superior performance over traditional pixel-level classifiers such as maximum likelihood. However, in a complex urban scene, as the number of object classes and the degree of within-class variability increase, human-based decisions and rules become cumbersome to formulate. Additionally, human-derived knowledge is very subjective and can vary from user to user, leading to varying classification accuracies. The standard nearest neighbor (SNN) classifier is analogous to the pixel-based k-NN classifier (Duda and Hart, 1973): each object is analogous to an individual pixel, and the summary statistics computed at the object level are treated similarly to the pixel values used in the nearest neighbor classifier. Apart from these two popular approaches, a growing trend in object-based classification has been the similar extension of other existing pixel-based classifiers (such as CART, ANN, and SVM) to object classification (Yu et al., 2006; Navulur, 2007). Most current object-based classification approaches rely on object-level summary statistics, such as the mean and standard deviation of the reflectance values within each segment. Such summary measures provide one value per band for each segment to describe the central tendency or dispersion of its data. The summary statistics are truly representative of a segment only if its pixel values ideally follow a unimodal distribution. However, with increasing spatial resolution, the pixel values within an object become more variable. For example, a single building object in an urban scene is an integration of various components, such as different materials (such as glass panels), additional roof structures (for example, a chimney), shadow, and other macro-site conditions (Zhou and Troy, 2008). In hyperspatial imagery, these components become visible, thereby causing different reflectance values within a single building object, which consequently increases the heterogeneity within the object class (Blaschke et al., 2004).
In such cases, the within-object pixel value distribution may no longer conform to the assumed unimodal Gaussianity but will rather have multiple peaks. Also, the computation of summary measures is heavily influenced by the presence of outliers. Therefore, the summary statistics may become misrepresentative of the spectral nature of the image segments and may not sufficiently characterize the objects. As pointed out by Blaschke (2010), the massive amount of data from the improved sensors poses increasing complexities to a degree that existing object-based methods may be insufficient to cope with, and there is a strong need for new

methods to efficiently utilize these improved data. Addressing this concern, we propose to improve object-based image classification by explicitly utilizing the within-object spectral heterogeneity presented in hyperspatial imagery. The within-object spectral heterogeneity is characterized by an empirical cumulative distribution function, which is used as a representation of the segment's reflectance pattern in the subsequent object classification. To achieve this, an innovative non-parametric fuzzy object classifier is proposed in this research, which exploits the intricacies of the pixel values within an object provided by the hyperspatial nature of the data. The proposed approach is tested for its performance in object-based image classification using eight-band imagery acquired by the newly launched WorldView-2 satellite, and is compared against selected standard pixel- and object-based approaches.

Study Area and Data

The selected study area is located between 32°49'N, 96°48'W and 32°47'N, 96°47'W along the Turtle Creek corridor in the north of the Dallas Metroplex, Texas. Turtle Creek is a tributary of the Great Trinity River, and the corridor encompasses over 80 city blocks and 90 acres of green space. Covering three square kilometers, this area has a complex assortment of various object classes (buildings, roads, trees, and water bodies) and a diverse arrangement of subclasses (varied roof materials and different tree species). The complex mixture of geographic objects in the study area poses a challenge for individual object mapping using current remote sensing techniques. The WorldView-2 imagery with eight bands for the study area was collected on 29 April 2010. These data were awarded as part of the research challenge by DigitalGlobe to encourage the development of novel image analysis and classification methods specific to the eight-band data. The very high spatial resolution of this sensor provided the opportunity for a detailed object-based classification in which individual objects such as buildings, roads, tree canopies, and shrubs can be extracted. Since the satellite was recently launched, applications of its data are still in a nascent stage, and its unique characteristics demand innovative algorithms to realize its promise in object mapping, particularly for complex urban areas. Corrections for internal sensor geometry, optical distortions, scan distortions, line-rate variations, and band registration were performed by the vendor. Regarding the radiometric correction, WorldView-2 products are delivered to customers as radiometrically corrected digital numbers, q(pixel, band) (DigitalGlobe, 2010). These digital numbers were then atmospherically corrected to obtain the percentage of light reflected from the surface (termed at-surface reflectance) by implementing the correction procedures provided in the WorldView-2 documentation catalogue as a MATLAB routine; a minimal sketch of the underlying conversion is given below.
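The DigitalGlobe technical note cited above describes the conversion from digital numbers to at-sensor (top-of-atmosphere) reflectance, which precedes the full atmospheric correction. The following numpy sketch illustrates that conversion only; the function name and all parameter values are illustrative placeholders, since in practice the absolute calibration factor, effective bandwidth, band solar irradiance, Earth-Sun distance, and solar zenith angle are read from the image metadata, and the additional atmospheric correction steps are not shown.

```python
import numpy as np

def dn_to_toa_reflectance(dn, abs_cal_factor, effective_bandwidth,
                          esun, earth_sun_dist_au, solar_zenith_deg):
    """Sketch of the WorldView-2 DN-to-reflectance conversion.

    All calibration parameters are taken from the image metadata in
    practice; the values used below are illustrative only.
    """
    # Band-averaged at-sensor spectral radiance (W m^-2 sr^-1 um^-1).
    radiance = abs_cal_factor * dn.astype(np.float64) / effective_bandwidth
    # At-sensor (top-of-atmosphere) reflectance.
    theta = np.deg2rad(solar_zenith_deg)
    return (radiance * np.pi * earth_sun_dist_au ** 2) / (esun * np.cos(theta))

# Hypothetical example for one band (parameter values illustrative only):
band = np.random.randint(0, 2047, size=(512, 512))
rho = dn_to_toa_reflectance(band, abs_cal_factor=9.3e-3,
                            effective_bandwidth=5.4e-2, esun=1580.8,
                            earth_sun_dist_au=1.005, solar_zenith_deg=30.0)
```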

The atmospherically corrected multispectral bands were pan-sharpened to increase their spatial resolution from 1.8 meters to the 0.5 meters of the panchromatic band. Based on preliminary experiments with the WorldView-2 imagery, the Gram-Schmidt algorithm was observed to produce superior pan-sharpening results (Padwick et al., 2010) and was used in this study. The Gram-Schmidt algorithm first creates a lower resolution panchromatic band by computing a weighted average of the multispectral bands. The lower resolution panchromatic and multispectral bands are then decorrelated by Gram-Schmidt orthogonalization, which orthogonally transforms each multispectral band as one multidimensional vector while keeping the panchromatic band untransformed. The low resolution panchromatic band is then replaced by the original high-resolution panchromatic band, and all bands are back-transformed to obtain hyperspatial resolution multispectral data (a toy sketch of this procedure follows below). The hyperspatial resolution of the WorldView-2 data also permitted us to collect reference data for training and testing the classifiers through visual interpretation. Other auxiliary data, including land use, buildings, and road network data from the North Central Texas Council of Governments (http://www.nctcog.org), were consulted for identifying the reference data. Since the primary focus of this research is on object-based classifiers that operate on properties of objects, the training samples were selected as object polygons. Based on visual evaluation, six urban object classes — building, grass, road, tree, water, and shadow — were determined for the scene, and 10 to 15 training objects per class were chosen. For the pixel-based classifiers, the individual pixels within each training object were selected as training samples. For testing the classification, a spatially random selection of 685 reference samples was performed. Careful review ensured that the testing samples did not fall inside the selected training objects, to keep them independent. Since we were comparing an object-based classifier with pixel-based classifiers, accuracy assessment was done at the pixel level; individual pixels were chosen such that no two testing pixels fell inside the same image object. The exact numbers of training objects and the corresponding numbers of pixels per class, along with their testing samples, are shown in Table 1.
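For illustration, here is a toy numpy sketch of the Gram-Schmidt substitution described above. It assumes the multispectral bands have already been resampled to the panchromatic grid, reduces the histogram matching to a mean/standard-deviation adjustment, and defaults to equal band weights; it sketches the general scheme, not the vendor's implementation.

```python
import numpy as np

def gram_schmidt_pansharpen(ms, pan, weights=None):
    """Toy Gram-Schmidt pan-sharpening sketch.

    ms:  (bands, H, W) multispectral data, upsampled to the pan grid.
    pan: (H, W) high-resolution panchromatic band.
    """
    n_bands, h, w = ms.shape
    if weights is None:
        weights = np.full(n_bands, 1.0 / n_bands)
    X = ms.reshape(n_bands, -1).astype(np.float64)
    # Step 1: simulate a low-resolution pan band as a weighted band average.
    sim_pan = weights @ X
    # Step 2: sequential Gram-Schmidt orthogonalization; GS1 is the
    # mean-centered simulated pan band.
    gs = [sim_pan - sim_pan.mean()]
    means, coeffs = [], []
    for k in range(n_bands):
        mu = X[k].mean()
        resid = X[k] - mu
        phi = []
        for g in gs:
            c = (resid @ g) / (g @ g)   # projection coefficient
            phi.append(c)
            resid = resid - c * g
        means.append(mu)
        coeffs.append(phi)
        gs.append(resid)                 # orthogonal component for band k
    # Step 3: match the real pan band to GS1 (mean/std adjustment only).
    p = pan.reshape(-1).astype(np.float64)
    p = (p - p.mean()) / p.std() * gs[0].std()
    # Step 4: substitute the high-res pan for GS1 and invert the transform.
    gs[0] = p
    out = np.empty_like(X)
    for k in range(n_bands):
        band = gs[k + 1].copy()
        for c, g in zip(coeffs[k], gs[:k + 1]):
            band += c * g
        out[k] = band + means[k]
    return out.reshape(n_bands, h, w)
```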

Methodology

A general workflow similar to most object-based image classification procedures was adopted in this research. After preprocessing the imagery for geometric and radiometric corrections, homogeneous regions of pixels were segmented. Object-level information for the unknown image segments was derived and passed to the proposed classifier, which classified them by comparison against the training objects to obtain the final urban object map. Finally, accuracy assessment was performed using the testing samples, with a comparison against the results from other standard classifiers.

TABLE 1. TRAINING AND TESTING SAMPLE COUNTS

            Training Samples                        Testing Samples
Class       Number of Objects   Number of Pixels    Number of Pixels
Building          15                22,275               200
Grass             13                14,078                50
Road              11                 9,776               100
Shadow            11                 9,681               100
Tree              13                 9,391               200
Water              6                 6,202                35

Image Segmentation

In general, image segmentation is a process that delineates an image into regions of contiguous pixels that share similar characteristics. The expectation for a segmentation algorithm is that it will arrange the pixels into homogeneous and semantically related groups, called "object candidates," which can be further processed and classified into meaningful objects (Blaschke, 2010). In the context of remote sensing, the characteristics that define the segments depend on the type of data available. For instance, they could be the spectral reflectance from different bands, derivatives such as vegetation indices and texture measures, or a combination of both. Not only the spectral information but also the spatial structure and arrangement information derived from the spectral values can be utilized in the segmentation process. Each of these segments, when rightly formed with both spatial and spectral considerations, corresponds to only one real-world object, such as a building or a road. Compared to pixel-based classification, the object-based approach therefore delivers a closer and more direct one-to-one correspondence with physical urban structures. In order to derive such segments, a number of techniques have been developed, which can be broadly categorized into pixel/point-, edge-, and region-based methods (Pekkarinen, 2002). For this research, a multi-resolution Fractal Net Evolution based region-growing approach was adopted. This segmentation is a bottom-up region-growing technique that begins with individual pixels as seed points and gradually grows segments by accumulating neighboring pixels with similar spectral properties. The initial sets of segments formed from the pixels are further grouped, and the process is iterated. The assimilation of segments depends on the average spectral homogeneity of the segment formed, weighted by its size and shape measures (a toy sketch of this merge criterion is given at the end of this subsection). The assimilation process stops when user-defined thresholds for spectral homogeneity, average size, and shape measures of the segments are reached (Benz et al., 2004; Baatz and Schape, 2000). The multi-resolution Fractal Net Evolution based segmentation of the WorldView-2 image was performed using the ENVI Zoom software. The software module requires two parameters for the segmentation process: the scale and merge levels. The scale parameter determines the size and shape of the objects, and the merge parameter determines the threshold for merging two segments based on their spectral homogeneity. A low scale parameter results in small segments and vice versa, while a low merge parameter provides a low threshold for merging two segments and vice versa. Currently, there are no automatic methods for determining these parameters, and they are often decided through multiple trials and visual examination of the segmentation results. Consequently, a scale parameter of 35 and a merge parameter of 86.2 were chosen for the WorldView-2 image, which resulted in a total of 63,786 objects (Figure 1). As seen from Figure 1, the chosen scale and merge parameters have resulted in a close correspondence between the image segments and the real-world urban objects.
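As a rough illustration of the merge logic described above (not the actual eCognition/ENVI Zoom implementation), the following sketch evaluates whether two adjacent segments should merge by comparing the increase in size-weighted spectral heterogeneity against the squared scale parameter. The shape terms of the full Fractal Net Evolution criterion are omitted, and the heterogeneity measure is a simplified stand-in.

```python
import numpy as np

def merge_cost(seg_a, seg_b):
    """Size-weighted increase in spectral heterogeneity if two segments merge.

    A simplified stand-in for the Fractal Net Evolution homogeneity
    criterion: only the spectral (color) term is shown.
    seg_a, seg_b: (n_pixels, n_bands) arrays of pixel values.
    """
    merged = np.vstack([seg_a, seg_b])
    # Heterogeneity = segment size times the sum of per-band standard deviations.
    h = lambda seg: len(seg) * seg.std(axis=0).sum()
    return h(merged) - (h(seg_a) + h(seg_b))

def should_merge(seg_a, seg_b, scale):
    # Merge only while the heterogeneity increase stays below the squared
    # scale parameter, as in multiresolution region growing.
    return merge_cost(seg_a, seg_b) < scale ** 2
```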
Object-level Empirical Distribution

With hyperspatial data, objects of different classes demonstrate unique within-object distribution patterns of pixel values, which form their spectral heterogeneity — an object-level property that has been largely ignored by conventional object-based classifiers. Conventional object-based classifiers often operate by identifying representative values for each segment, typically computed as summary statistics; the most popularly used are the mean and standard deviation of the pixel values at each band. These statistics may be adequately representative of the segment if the pixel values are relatively homogeneous with a unimodal Gaussian distribution.


Figure 1. Zoomed view of segmentation results for a portion of the WorldView-2 image.

However, that may no longer be the case with hyperspatial data, because their pixel sizes are fine enough to allow minute details within a real-world object to be seen. For instance, within a building object, details like projecting structures (e.g., a chimney), sloping roofs, and the shadows cast by these structures become visible. In such cases, an image segment that encompasses the building details may no longer have a unimodal distribution of pixel values. Often the within-building distribution of pixel values is multimodal, with a number of peaks corresponding to each sub-part of the building top. Similar patterns of within-object pixel value distributions can also be observed with tree objects: a tree canopy may have a spectral pattern defined not only by leaf reflectance but also by that of the background soil. These different reflectance patterns of various image objects presented in hyperspatial resolution data have long been used as a key element of visual image interpretation, but they are not yet fully utilized by automated object-based classifiers. How to capture the spectral reflectance pattern of the pixel values in an image segment is therefore critical for an enhanced object-based image classification. The spectral reflectance patterns of image objects have previously been modeled using semi-variograms. However, as with the object-level statistical summaries, only a few measures derived from the fitted model, such as range, sill, and nugget, are used to represent the full spatial distribution of pixel values inside each image object (Shanmugam et al., 2012). Alternatively, the within-object spectral reflectance pattern at a given band can be represented using the relative frequencies of all pixel values of the object, referred to as the object-level empirical probability distribution. The object-level empirical distributions for objects selected from different classes were computed for the NIR band, as shown in Figure 2. This plot delivers two important observations. First, it can be observed that the probability distributions of the within-object reflectance values are often not normal and are sometimes multi-peaked.

Figure 2. Object-level Empirical Probability Distribution of the objects selected from different classes at the NIR band. One object per class was chosen and their corresponding within-object distributions were plotted.

This implies that the object-level summary statistics (particularly the most commonly used mean values) can be a misrepresentation of the spectral reflectance pattern of an object. Second, each object class demonstrates a unique pattern of within-object empirical distribution. Such an empirical distribution representation of within-object spectral values implicitly mirrors the physical structure of the object being analyzed. For instance, the building object in Figure 2 shows two peaks, with one peak at a lower reflectance value, probably caused by a portion of the building in shade, and the second peak at a higher reflectance value produced by the rest of the pixels in the well-lit region. In comparison to a tree object, the peak values of the building object are located at the lower end of the reflectance values, which is understandable as buildings reflect less in the NIR region than trees. The road object also reflects less in the NIR band and has fairly uniform reflectance, as shown by the single peak at a low reflectance value. Similarly, the water object has minimum reflectance in the NIR, indicated by a narrow distribution with a sharp peak close to zero. The tree object in Figure 2 has a unimodal but non-Gaussian distribution of reflectance values; the peak is formed by high reflectance in the NIR band from the majority of the leaves of the tree object, while the negatively skewed portion of the distribution at lower reflectance values may be caused by background soil. The grass object, like the tree object, reflects more in the NIR. It has two distinct peaks, one at a higher value likely from the portion completely covered with grass, and the other at a lower value corresponding to low-density grass cover.

Fuzzy Kolmogorov-Smirnov Classifier

The empirical distribution of each object is unique and is a function of the class it belongs to. Therefore, it should be incorporated into the object-based classifier when assigning a segment to a class. If a set of training objects with known class information can be selected from the imagery, we can obtain their empirical distributions and use them as references against which an unknown segment can be compared, in order to classify it into one of the object classes. For such an object-to-object distribution comparison, visual matching of two empirical distributions is not only subjective but also impractical when multiple spectral bands are involved. Therefore, an objective and quantitative evaluation of the object-level comparison in the form of numerical measures is desired. To this end, the two-sample Kolmogorov-Smirnov (KS) test was employed.


The Kolmogorov-Smirnov (KS) test is a non-parametric statistical test that evaluates the similarity between two sample sets and determines whether they come from the same population by comparing their empirical cumulative distribution functions (ECDFs) (Burt and Barber, 1996). The ECDF of a segment is defined by the empirical probability distribution function of the pixel values within the segment for a given band. Let {x₁, x₂, ..., xₙ} be the values of the n pixels within an object X at Band 1. The pixels are sorted in ascending order, and the ECDF at each pixel value at Band 1 is computed as:

$$F(x) = \frac{\#\{x_i \mid x_i \le x\}}{n}, \quad i = 1, 2, \ldots, n \qquad (1)$$

where the cumulative function F(x) at a reflectance value x is defined as the ratio between the number of pixels with reflectance values no greater than x and the total number of pixels (n). Since objects from different classes have unique empirical probability distribution functions, their ECDFs are consequently also different from each other. Figure 3 shows an example of the empirical cumulative distribution functions of two objects corresponding to the building and tree classes, respectively, measured at the red-edge band (Band 5) of the WorldView-2 image. The ECDFs of these two objects are quite different from each other and reflect the distinctive spectral patterns of their respective objects. The ECDF of an unknown segment is then added to Figure 3 to match against these two ECDFs.

Figure 3. ECDF plots of two known objects (tree and building) along with that of an unknown object.


It is observed that the unknown object has an ECDF closer to that of the tree object than to that of the building object, and therefore has a greater possibility of being a tree object. To perform an object-to-object comparison, the KS test statistic (D), measuring the maximum absolute deviation between the ECDFs of the two samples, was employed:

$$D = \max_x \left| F_1(x) - F_2(x) \right| \qquad (2)$$

where F₁ and F₂ are the empirical cumulative distributions of the two objects. In order to assess whether the difference between two samples is due to chance fluctuations or because the two samples are from different populations, a probability value (p-value) can be obtained to reject or accept the null hypothesis that the distributions of the two samples are identical (Burt and Barber, 1996). If two sample distributions come from the same population, a small KS distance (D) will be derived, and the p-value will be high (close to 1), which indicates that the chance of error in rejecting the null hypothesis is high. When the KS distance (D) is large, a low p-value will be obtained, suggesting that the probability of error in rejecting the null hypothesis is low; in other words, the two samples are likely from different populations (Lehmann, 1975). Therefore, the p-value is also a degree of matching between the two sample distributions. The KS test is sensitive to both the shape and the position of the empirical distributions of the two samples. The KS test was chosen not only for its simplicity as a goodness-of-fit measure, but also for its ability to work with small sample sizes (i.e., small segments with few pixels) and to compare two objects with varying sample sizes (i.e., different numbers of pixels). By utilizing the maximum absolute deviation between two distributions, the KS test maximizes the differences between the two samples and avoids large differences being compensated by small ones, as in root mean square error (RMSE). The KS test is suitable for comparing two image objects at a single band. To assess object-to-object matching for multispectral imagery, a Fuzzy Kolmogorov-Smirnov (FKS) classifier is developed, which compares the ECDFs of two objects at multiple bands of the imagery simultaneously. As shown in Figure 4, an unknown segment is compared against training objects with known classes, and the KS test is conducted for each band to derive a p-value for that band, which can be considered a fuzzy membership degree for the unknown object belonging to a specific class (a minimal sketch of this per-band comparison is given below).
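A minimal sketch of this per-band comparison is shown below, computing the ECDF of Equation 1 with numpy and the two-sample KS test with scipy (a library choice of convenience; the paper does not prescribe an implementation):

```python
import numpy as np
from scipy.stats import ks_2samp

def ecdf(values):
    """Empirical cumulative distribution function of a 1D sample (Equation 1)."""
    x = np.sort(values)
    f = np.arange(1, len(x) + 1) / len(x)   # F(x_i) = i / n
    return x, f

def band_membership(unknown_pixels, training_pixels):
    """Fuzzy membership of an unknown segment to a class at one band.

    The two-sample KS test compares the ECDFs of the two pixel samples;
    its p-value serves as the fuzzy membership grade for this band.
    """
    d_stat, p_value = ks_2samp(unknown_pixels, training_pixels)
    return p_value

# Hypothetical example: two segments with different numbers of pixels.
rng = np.random.default_rng(0)
unknown = rng.normal(0.35, 0.05, size=180)    # NIR reflectance of a segment
tree_ref = rng.normal(0.36, 0.05, size=420)   # a training tree object
print(band_membership(unknown, tree_ref))     # high p-value -> strong match
```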

The fuzzy membership ω_ij (where i denotes the class and j denotes the band) determined from the KS test on one band can be combined with those of the other bands to determine the overall membership of an object belonging to a class, because the object-level match should be considered at all spectral bands and their derivatives. The overall membership α_i of an object belonging to a given class i is obtained using a fuzzy and-or operator that computes the geometric mean of the membership grades of all bands (Equation 3):

$$\alpha_i = \left( \prod_{j=1}^{N} \omega_{ij} \right)^{\frac{1}{N}} \qquad (3)$$

where N is the total number of bands (including all spectral and spatial features). This operator is idempotent and lies between a fuzzy intersection and a fuzzy union operator. The geometric mean allows a high membership at one band to be balanced by a low membership at another band, so that the overall grade is not dominated by extreme values due to noise in the data. The output of this step is a fuzzy membership grade for each class: a decimal value ranging from 0 (indicating no membership) to 1 (indicating perfect membership in a given class). To obtain a crisp output, these fuzzy grades are defuzzified using a maximum function that assigns the object to the class with the maximum membership grade (Equation 4). In case of ties, the object can be left unclassified or randomly assigned to one of the tied classes; for this research, the former approach was adopted.

$$\text{Class} = \text{Class}\left(\max_i \alpha_i\right) \qquad (4)$$

In summary, the Fuzzy Kolmogorov-Smirnov classifier developed in this research classified the image after segmentation using individual segments as the processing units. At each band, the ECDF of the unknown segment was compared against the ECDF of a training object of a given class using the KS test, and a probability value was obtained for that band. After the KS test was performed for all bands, the fuzzy membership grade of the unknown segment for a given class was obtained as the geometric mean of the probability values of all the individual bands. The unknown segment was finally assigned to the class with the maximum overall membership grade.

Figure 4. Fuzzy Kolmogorov-Smirnov Classifier.
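Putting the pieces together, the following sketch implements the FKS decision rule of Equations 3 and 4. For brevity it assumes a single training object per class; the paper uses 10 to 15 training objects per class, and since the aggregation of their memberships is not spelled out here, that step is omitted from the sketch.

```python
import numpy as np
from scipy.stats import ks_2samp

def fks_classify(segment_bands, training_objects):
    """Assign a segment to the class with the maximum overall fuzzy membership.

    segment_bands: list of 1D arrays, pixel values of the unknown segment
                   per band (spectral bands and any derived features).
    training_objects: dict mapping class name -> list of 1D arrays (one per
                      band) of pixel values from a training object.
    Returns (class label, dict of overall memberships alpha_i).
    """
    alphas = {}
    for cls, ref_bands in training_objects.items():
        # Per-band fuzzy memberships: p-values of the two-sample KS test.
        p_values = [ks_2samp(seg, ref).pvalue
                    for seg, ref in zip(segment_bands, ref_bands)]
        # Equation 3: geometric mean across all N bands.
        alphas[cls] = float(np.prod(p_values) ** (1.0 / len(p_values)))
    # Equation 4: defuzzify by taking the maximum membership grade.
    best = max(alphas, key=alphas.get)
    return best, alphas
```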



Performance Assessment

The classification accuracy of the FKS classifier was assessed using the confusion matrix (Story and Congalton, 1986) derived from the testing samples. The performance of the proposed classifier was compared against some of the most common object- and pixel-based classifiers using percent accuracy. The conventional object-based classifiers chosen for comparison include the standard nearest neighbor and Support Vector Machine (SVM) approaches. For the conventional object-based classification, the same segments used for the FKS classifier were employed and were classified based on summary statistics (mean and standard deviation) of the pixel values computed at each band. For pixel-based classification, four popular classifiers were employed: the Spectral Angle Mapper, Maximum Likelihood, Minimum Distance to Mean, and Parallelepiped classifiers.
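For reference, the accuracy measures reported below can be derived from a confusion matrix as in the following sketch (conventional formulas, not code from the original study; rows are taken here as reference classes and columns as classified classes, a convention that varies between publications):

```python
import numpy as np

def accuracy_metrics(cm):
    """Standard accuracy measures from a confusion matrix.

    cm[i, j]: number of testing samples of reference class i assigned to
    class j (rows = reference, columns = classification).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    overall = diag.sum() / n
    producers = diag / cm.sum(axis=1)   # per reference class (omission error)
    users = diag / cm.sum(axis=0)       # per mapped class (commission error)
    # Kappa: agreement beyond chance, from the row/column marginals.
    chance = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2
    kappa = (overall - chance) / (1 - chance)
    return overall, producers, users, kappa

# Example using the FKS confusion matrix counts from Table 3:
cm = np.array([[173, 24, 0, 3, 0, 0],
               [9, 87, 0, 3, 0, 1],
               [0, 0, 33, 2, 0, 0],
               [2, 0, 3, 94, 1, 0],
               [0, 0, 0, 0, 189, 11],
               [1, 0, 0, 0, 5, 44]])
print(accuracy_metrics(cm)[0])   # overall accuracy ~ 0.9051
```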

Results and Discussion

The accuracy assessment of the FKS classifier for the WorldView-2 imagery is given in Table 2. The overall accuracy of 90.51 percent for the FKS classifier suggests very good performance in mapping the six object classes (buildings, roads, trees, grass, water, and shadow). Scrutinizing the individual accuracies of each class, trees were identified with the highest user's accuracy (94.5 percent) and producer's accuracy (96.9 percent). It can be observed from both the confusion matrix of the FKS classifier (Table 3) and its classification map (Plate 1a) that the building and road classes demonstrate the largest confusion with each other. These

two classes were observed to be spectrally similar, as both can be constructed from the same material (such as asphalt). A similar misclassification occurs between grass objects and tree objects. The FKS test is insensitive to the size of the segments, but the accuracy of the segmentation does have an impact on the subsequent object classification. The Turtle Creek study area is a complex scene consisting of objects of different sizes, which demand varying segmentation scales. Individual tree and building objects are usually small in size, requiring a fine segmentation scale parameter, while grass and road objects are often large, needing a coarser scale parameter. Since only a single segmentation scale can be employed to fit all the different types of objects, some of the grass and road objects were over-segmented. The pattern of the pixel values inside an object is a function of its internal structure, and so is the empirical distribution that models the within-object pattern. When objects are not correctly segmented, their internal structure may not be completely enclosed, which may make them spectrally different from their true distributions. This may cause a road to be confused with buildings that have flat roofs. A similar problem exists with the grass objects, which, when over-segmented into smaller segments, may be confused with tree objects with thick canopy cover. Based on the comparison of the accuracies of all the classifiers (Table 2), a general pattern of accuracy is observed. The FKS classifier has performed substantially better than all

TABLE 2. ACCURACY ASSESSMENT OF URBAN SCENE CLASSIFICATIONS FOR WORLDVIEW-2 IMAGERY (UA = USER'S ACCURACY; PA = PRODUCER'S ACCURACY, IN PERCENT). FKS, KNN, AND SVM RBF ARE OBJECT-BASED; ML, MD, PPD, AND SAM ARE PIXEL-BASED.

           FKS           KNN           SVM RBF       ML            MD            PPD           SAM
Class      UA     PA     UA     PA     UA     PA     UA     PA     UA     PA     UA     PA     UA     PA
Building   86.5   93.5   58.1   84.5   92.0   92.5   66.7   94.0   73.3   57.5    0.0    0.0   89.4   67.5
Road       87.0   78.4   48.1   38.0   87.4   66.0   66.7   26.0   43.3   58.0    0.0    0.0   52.9   45.0
Water      94.3   91.7   69.6   45.7   74.6   82.0   87.5   60.0   88.6   88.6    0.0    0.0   38.9   42.0
Shadow     94.0   92.2   86.2   25.3   56.5   91.0   82.9   68.7    0.0    0.0    0.0    0.0   42.9    9.1
Trees      94.5   96.9   86.6   93.5   73.9   97.1   85.3   99.0    0.0    0.0   28.3   95.5   38.9   40.0
Grass      88.0   78.6   65.2   60.0   92.9   65.7   76.0   38.0    2.0    6.0    0.0    0.0   63.7   46.5
Overall    90.51         67.98         80.117        76.02         30.26         27.924        46.345
Kappa       0.88          0.58          0.75          0.68          0.15          0.013          0.36

TABLE 3. CONFUSION MATRIX OF THE FKS CLASSIFIER (ROWS: REFERENCE INFORMATION; COLUMNS: FKS CLASSIFICATION)

Class            Building   Road   Water   Shadow   Trees   Grass   Row Total   Producer's Accuracy
Building            173       24      0       3        0       0       200            93.5%
Road                  9       87      0       3        0       1       100            78.4%
Water                 0        0     33       2        0       0        35            91.7%
Shadow                2        0      3      94        1       0       100            92.2%
Trees                 0        0      0       0      189      11       200            96.9%
Grass                 1        0      0       0        5      44        50            78.6%
Column Total        185      111     36     102      195      56       685
User's Accuracy    86.5%     87%   94.3%    94%    94.5%     88%              Overall Accuracy: 90.51%

other classifiers. The overall accuracy of the FKS classifier is higher than those of the other classifiers by at least 10 percent, with the largest difference in accuracy being 63 percent. These results are encouraging and affirm the strength of the FKS classifier owing to its use of the object-level empirical distribution. The classification maps in Plate 1 and Plate 2 also indicate that the FKS classifier has produced less noisy

and more accurate object class maps in comparison to all the other classifiers. To statistically assess the performance of the FKS classifier, its error matrix is compared with those from the other classifiers, and the difference between the kappa values is computed. Based on the standard normal deviate, this test computes a Z-score that determines whether the difference in kappa values is statistically significant (Congalton, 1991; Stehman, 1996; Congalton and Green, 2009).
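A standard form of this pairwise test (following Congalton and Green, 2009) compares two independent kappa estimates using their estimated variances; the exact variance estimator used in the study is not reproduced here:

$$Z = \frac{\left| \hat{\kappa}_1 - \hat{\kappa}_2 \right|}{\sqrt{\widehat{\operatorname{var}}(\hat{\kappa}_1) + \widehat{\operatorname{var}}(\hat{\kappa}_2)}}$$

At the 95 percent confidence level, the difference is significant when Z exceeds 1.96.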

Plate 1. Urban scene classification maps of the WorldView-2 image from object-based classifiers: (a) FKS classifier, (b) KNN classifier, and (c) SVM-RBF classifier.

Plate 2. Urban scene classification maps of the WorldView-2 image from pixel-based classifiers: (a) ML classifier, (b) MD classifier, (c) PPD classifier, and (d) SAM classifier.



TABLE 4. KAPPA ANALYSIS FOR TEST OF SIGNIFICANT DIFFERENCE WITH THE FKS CLASSIFIER

Classifier     KNN     SVM RBF    ML       MD      PPD      SAM
Z Statistic    9.479   3.548      20.849   4.714   25.957   14.554
Result*        S       S          S        S       S        S

*At a confidence level of 95 percent; S = significant.

Table 4 shows the Z-statistics of the various classifiers against the FKS classifier. Based on these results, it can be observed that the FKS classifier has produced significantly better results than all the other classifiers under consideration. For the object-based KNN classifier, the difference in overall accuracy relative to the FKS classifier is approximately 23 percent, with all classes having substantially higher accuracies with the FKS classifier. Plate 1b indicates that with the KNN classifier, almost all the road objects were misclassified as buildings; there is also a high degree of confusion between trees and grass. The object-based SVM classifier also produced a lower overall accuracy than the FKS classifier, with a difference of 10 percent. The SVM classifier misclassified many roads as buildings, as with the KNN classifier, and most of the shadow and grass objects were also misclassified (Plate 1c). These results demonstrate the possible pitfalls of using object-level statistics, owing to their possible misrepresentation of the within-object empirical distribution. For hyperspatial resolution data, the within-object heterogeneity of the pixel values may produce a multimodal distribution for an object; utilizing only the summary statistics (particularly the mean and standard deviation) fails to capture this within-object heterogeneity, thereby leading to poor accuracy. Compared with the pixel-based classifiers, the FKS classifier outperformed the Maximum Likelihood (ML) classifier by approximately 14 percent. The classification map of the ML classifier (Plate 2a) shows that a number of tree objects have been misclassified as grass objects. The confusion occurs due to the lack of spatial/contextual information in the pixel-based classification. The ML classifier is a parametric classifier based on the assumption of a multi-band Gaussian distribution for each class; for complex data such as hyperspatial imagery, violation of this assumption is common, and the ML classifier therefore performs poorly. The FKS classifier, on the other hand, is non-parametric and does not need to conform to any distribution assumption, which leads to its better performance. The Minimum Distance to Mean (MD) and Parallelepiped (PPD) classifiers performed the poorest among all the classifiers compared, with differences in accuracy as large as 60 percent. As in the case of the ML classifier, the MD classifier also failed to distinguish between trees and grass. The PPD classifier could not correctly identify almost any of the classes and misclassified most pixels into the building class (Plate 2c). On examining the training samples for the Turtle Creek area, it can be seen that a large number of samples for the building class were provided to capture the training variance of this class. This caused the parallelepiped for this class to span a wide range of spectral values, which in turn led to many pixels being misclassified into this class. Finally, for the SAM classifier, the difference in accuracy relative to the FKS classifier is 44 percent, with most of the road objects misclassified as buildings (Plate 2d). With a complex urban scene, hyperspatial imagery results in a large degree of heterogeneity within broad object classes such as buildings, roads, and trees. In such cases, the collection


of endmember spectra for an object is difficult, because the endmember pixels usually correspond to a single constituent material of an object. Due to the strong dependency of the SAM classifier on the quality of the endmembers, its performance is poor. The FKS classifier, however, is free from such dependency and is therefore well-suited for hyperspatial data. Compared to the FKS classifier, all the pixel-based classifiers produce much noisier classification maps with salt-and-pepper speckles.

Conclusions

A novel non-parametric fuzzy object-based classification algorithm was developed in this research for classifying hyperspatial data. The important implication of this research lies in the use of all original pixel values, in the form of the empirical distribution, rather than the conventional summary statistics of the pixel values at the object level. The within-object empirical distributions are strongly affected by the physical structure of the objects and can be very useful for differentiating them from each other. The Fuzzy Kolmogorov-Smirnov classifier exploits this property and performs classification based on an object-to-object comparison of the within-object empirical distributions. In general, the object-based classifiers produced less noisy and more accurate classification maps. As a non-parametric approach, the FKS classifier is insensitive to distribution assumptions and can capture large within-class variations that are not necessarily Gaussian in nature. However, as an object-based approach, the FKS classifier is dependent on the quality of the segmentation that derives the objects, which may lead to confusion among spectrally similar but spatially different classes such as buildings and roads. One possible solution to this problem could be the use of elevation information provided by lidar data to differentiate roads from buildings. The within-object empirical distribution of the elevation data can also be extracted and integrated with the spectral values for an improved classification; the FKS classifier can be applied to the elevation information in the same fashion as to the spectral data.

Acknowledgments

The authors would like to acknowledge DigitalGlobe for providing the WorldView-2 data as an award for the 8-Band Challenge. This paper was prepared by Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, Tennessee 37831-6285, managed by UT-Battelle, LLC for the United States Department of Energy under Contract Number DE-AC05-00OR22725.

References

Baatz, M., and A. Schape, 2000. Multiresolution segmentation: An optimization approach for high quality multi-scale image segmentation, XII Angewandte Geographische Informationsverarbeitung, Heidelberg: Wichmann-Verlag.

Baraldi, A., and F. Parmiggiani, 1996. Single linkage region growing algorithms based on the vector degree of match, IEEE Transactions on Geoscience and Remote Sensing, 34:137–148.

Benediktsson, J., P. Swain, and O. Ersoy, 1990. Neural network approaches versus statistical methods in classification of multisource remote sensing data, IEEE Transactions on Geoscience and Remote Sensing, 28:540–552.

Benz, U., P. Hofmann, G. Willhauck, I. Lingenfelder, and M. Heynen, 2004. Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information, ISPRS Journal of Photogrammetry and Remote Sensing, 58:239–258.

Blaschke, T., 2010. Object-based image analysis for remote sensing, ISPRS Journal of Photogrammetry and Remote Sensing, 65(1):2–16.


Blaschke, T., C. Burnett, and A. Pekkarinen, 2004. New contextual approaches using image segmentation for object-based classification, Remote Sensing Image Analysis: Including the Spatial Domain (F. De Meer and S. De Jong, editors), Dordrecht: Kluwer Academic Publishers, pp. 211–236.

Blaschke, T., and G. Hay, 2001. Object-oriented image analysis and scale-space: Theory and methods for modeling and evaluating multi-scale landscape structure, International Archives of Photogrammetry and Remote Sensing, 34:22–29.

Borel, C., 2010. Vegetative canopy parameter retrieval using 8-band data, 8-Band Research Challenge, DigitalGlobe, Inc.

Bramante, J., D.K. Raju, and S. Tsai Min, 2010. Derivation of bathymetry from multispectral imagery in the highly turbid waters of Singapore's south islands: A comparative study, 8-Band Research Challenge, DigitalGlobe, Inc.

Burt, J., and G. Barber, 1996. Elementary Statistics for Geographers, New York: The Guilford Press.

Campbell, J., 2002. Introduction to Remote Sensing, Third edition, New York: Guilford Press.

Casals-Carrasco, P., S. Kubo, and B. Madhavan, 2000. Application of spectral mixture analysis for terrain evaluation studies, International Journal of Remote Sensing, 21(16):3039–3055.

Cavayas, F., 2010. Urban vegetation cover inventory update and monitoring from space using WorldView-2 imagery: The case of the Montreal metropolitan community territory, 8-Band Research Challenge, DigitalGlobe, Inc.

Chen, Q., 2010. Comparison of WorldView-2 and IKONOS-2 imagery for identifying tree species in the habitat of an endangered bird species in Hawaii, 8-Band Research Challenge, DigitalGlobe, Inc.

Congalton, R., and K. Green, 2009. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, Second edition, CRC/Taylor & Francis, Boca Raton, Florida.

Congalton, R.G., 1991. A review of assessing the accuracy of classifications of remotely sensed data, Remote Sensing of Environment, 37:35–46.

DigitalGlobe, Inc., 2010. Radiometric Correction for WorldView-2, URL: http://www.digitalglobe.com/downloads/Radiometric_Use_of_WorldView-2_Imagery.pdf (last date accessed: 13 August 2013).

Duda, R., and P. Hart, 1973. Pattern Classification and Scene Analysis, New York: Wiley.

Exelis, 2009. ENVI Tutorial, URL: https://www.exelisvis.com/portals/0/pdfs/envi/QuickStartENVI5.pdf (last date accessed: 13 August 2013).

Galli, A., and E. Malinverni, 2005. Hyperspectral data classification by object-oriented approach for the management of urban landscape, Commission on Mapping from Satellite Imagery, International Cartographic Association.

Greenberg, J., S. Dobrowski, and V. Vanderbilt, 2009. Limitations on maximum tree density using hyperspatial remote sensing and environmental gradient analysis, Remote Sensing of Environment, 113:94–101.

Guo, Q., M. Kelly, P. Gong, and D. Liu, 2007. An object-based classification approach in mapping tree mortality using high spatial resolution imagery, GIScience & Remote Sensing, 44(1):24–47.

Hay, G., T. Blaschke, D. Marceau, and A. Bouchard, 2003. A comparison of three image-object methods for the multiscale analysis of landscape structure, ISPRS Journal of Photogrammetry and Remote Sensing, 57(5–6):327–345.


Hay, G., D. Marceau, P. Dube, and A. Bouchard, 2001. A multiscale framework for landscape analysis: Object-specific analysis and upscaling, Landscape Ecology, 16:471–490.

Heermann, P., and N. Khazenie, 1992. Classification of multispectral remote sensing data using a back-propagation neural network, IEEE Transactions on Geoscience and Remote Sensing, 30:81–88.

Jensen, J., 2000. Remote Sensing of the Environment: An Earth Resource Perspective, Prentice-Hall, Inc., Upper Saddle River, New Jersey.

Lehmann, E., 1975. Nonparametrics: Statistical Methods Based on Ranks, Holden-Day, San Francisco, California.

Melgani, F., and L. Bruzzone, 2004. Classification of hyperspectral remote sensing images with support vector machines, IEEE Transactions on Geoscience and Remote Sensing, 42:1778–1790.

Navulur, K., 2007. Multispectral Image Analysis Using the Object-Oriented Paradigm, CRC Press, Boca Raton, Florida.

Omar, H., 2010. Commercial timber tree species identification using multispectral WorldView-2 data, 8-Band Research Challenge, DigitalGlobe, Inc.

Padwick, C., M. Deskevich, F. Pacifici, and S. Smallwood, 2010. WorldView-2 pan-sharpening, Proceedings of the ASPRS Annual Conference, San Diego, California.

Pekkarinen, A., 2002. A method for the segmentation of very high spatial resolution images of forested landscapes, International Journal of Remote Sensing, 23(14):2817–2836.

Richards, J., and X. Jia, 2006. Remote Sensing Digital Image Analysis: An Introduction, Third edition, Springer-Verlag, Berlin.

Ruiz, L., 2010. A multi-approach and object-oriented strategy for updating LU/LC geo-databases based on WorldView-2 imagery, 8-Band Research Challenge, DigitalGlobe, Inc.

Shanmugam, M., N. Sivagnanam, and M. Ramalingam, 2012. Combined variograms of spectral and texture bands in urban land use classification using high resolution satellite data, European Journal of Scientific Research, 75(4):611–632.

Sridharan, H., 2010. Multi-level urban forest classification using the WorldView-2 8-band hyperspatial imagery, 8-Band Research Challenge, DigitalGlobe, Inc.

Stehman, S., 1996. Estimating the kappa coefficient and its variance under stratified random sampling, Photogrammetric Engineering & Remote Sensing, 62(4):401–407.

Story, M., and R.G. Congalton, 1986. Accuracy assessment: A user's perspective, Photogrammetric Engineering & Remote Sensing, 52(3):397–399.

Tremeau, A., and N. Borel, 1997. A region growing and merging algorithm to color segmentation, Pattern Recognition, 30:1191–1203.

Yoshida, T., and S. Omatu, 1994. Neural network approach to land cover mapping, IEEE Transactions on Geoscience and Remote Sensing, 32(5):1103–1109.

Yu, Q., P. Gong, N. Clinton, G. Biging, M. Kelly, and D. Schirokauer, 2006. Object-based detailed vegetation classification with airborne high spatial resolution remote sensing imagery, Photogrammetric Engineering & Remote Sensing, 72(7):799–811.

Zhou, W., and A. Troy, 2008. An object-oriented approach for analysing and characterizing urban landscape at the parcel level, International Journal of Remote Sensing, 29(11):3119–3135.

