Comparison of Gray-Level Reduction and Different Texture Spectrum Encoding Methods for Land-Use Classification Using a Panchromatic Ikonos Image

Bing Xu, Peng Gong, Edmund Seto, and Robert Spear

Abstract
In this paper, we evaluate the potential of a frequency-based contextual classifier (FBC) for land-use classification with a panchromatic Ikonos image. To capture the spatial arrangement of image gray-level values and use such information in image classification, we applied texture spectrum (TS) directly in the FBC. The effects of several data preprocessing and reduction methods on the performance of the FBC are also evaluated. The methods include four gray-level reduction (GLR) techniques and several modifications to the TS technique. The purpose of data reduction is to improve the classification efficiency of the FBC. The GLR schemes were min-max linear compression (LC), gray-level binning (BN), histogram equalization (HE), and piece-wise nonlinear compression (PC). Instead of using the texture measures derived from the texture spectrum, we directly applied texture spectra of various sizes in the classification. We modified the encoding algorithm in the TS and were able to reduce the number of texture units from its original 6561 to 256, 81, and 16, reducing the computation by a factor of as much as 410. The original image and GLR images were subsequently classified with the FBC. We compared the classification accuracies and found that the GLR methods resulted in accuracies similar to that of the original image (within 0.03 kappa value). There was little difference in classification accuracy (within 0.03 kappa value) among the three modified TS methods, which were all outperformed by the original TS method. All TS methods performed considerably better than the use of the original image and the GLR methods.

B. Xu and P. Gong are with the Center for Assessment and Monitoring of Forest and Environmental Resources (CAMFER), 151 Hilgard Hall, University of California, Berkeley, CA 94720-3110 ([email protected]; [email protected]). E. Seto and R. Spear are with the School of Public Health, 151 Hilgard Hall, University of California, Berkeley, CA 94720-3110. P. Gong is also with the International Institute for Earth System Science, Nanjing University, China.

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING

Introduction
In land-cover and land-use classification of high-resolution images, spatial context and texture features are often used to take advantage of the rich amount of spatial information. Pre-classification spatial feature extraction, post-classification processing, and direct contextual classification are three general approaches to incorporate spectral properties of the surrounding pixels in image classification (Gong and Howarth, 1992). Among these, the preprocessing techniques are most commonly used. They involve use of low-pass and high-pass filters (Cushnie and Atkinson, 1985; Richards and Jia, 1999, pp. 118–122, 195–201), structural information (e.g., Gong and Howarth, 1990), gray-level co-occurrence matrix (GLCM) (Haralick, 1979), and texture spectrum (TS) (Wang and He, 1990) methods. All of these preprocessing techniques involve the extraction of texture feature images and the combined use of spectral and textural images in commonly used per-pixel classifiers. Comparing a number of contextual classification approaches in classifying 14 land-use classes in a rural-urban fringe area using 20-m resolution SPOT multispectral data, Gong (1990) concluded that a frequency-based contextual classifier (FBC) was most effective. Instead of using texture measures, the FBC uses histograms extracted from pixel windows directly in image classification. The computational requirement of the FBC has a direct linear relationship with the number of gray-level values. Therefore, it is desirable to reduce the number of gray-level values to improve efficiency. Gong and Howarth (1992) proposed an effective method to reduce gray-level numbers in multispectral space before the classification is undertaken. However, the method degrades to linear compression when applied to images with a single band. Although an increasing amount of higher spatial resolution satellite data is becoming available in the form of a single panchromatic band (e.g., 10-m SPOT, 15-m Landsat Enhanced Thematic Mapper Plus, 6-m IRS, 1-m Ikonos, and 0.6-m Quickbird), the potential of the FBC in classifying such single-band images has not been assessed.
In addition, the FBC has only been tested using histograms extracted from pixel windows of the original image in the past. Higher order histograms that can capture the spatial arrangements of image brightness have not been evaluated with the FBC. All these led us to ask the following research questions:
● What is the potential of the FBC when applied to a single-band high spatial resolution image without any preprocessing?
● How well can the FBC perform when applied to preprocessed images that contain information about spatial arrangements of pixel gray-level values?
● What might be lost if we attempt to improve the computational efficiency of the FBC by reducing the gray-level values?

Photogrammetric Engineering & Remote Sensing Vol. 69, No. 5, May 2003, pp. 529–536. 0099-1112/03/6905–529$3.00/0 © 2003 American Society for Photogrammetry and Remote Sensing

To answer these questions, we experimented with a panchromatic Ikonos image. Nine subimages, each representing one land-use type from the Ikonos image, were mosaicked and subsequently classified by the FBC. With the FBC, we tested histograms of the original panchromatic image, histograms of a gray-level reduced version of the original image, and a texture spectrum (TS) that was calculated from pixel windows to capture the local spatial arrangement of gray-level values. We modified the encoding schemes of the original TS to reduce its size to improve the efficiency of the FBC. The rest of the paper presents our study site, methods, and result analysis.

Study Site and Data Description
Surrounding a lake at the edge of Xichang City, Sichuan Province, our study site is in the valley of a mountainous area of western China with an elevation range of 1500 to 2500 m. A panchromatic Ikonos image of the study site was acquired in December 2000, covering an area of 137 km² (Figure 1). The spatial resolution of the image is 1 m (Space Imaging, 1999). The Ikonos image has been georeferenced to the UTM projection based on the 1984 World Geodetic System. For the purpose of this study, we selected from the original image one image chip of the same size (128 by 128 pixels) for each of nine typical land-use types. The image chips were mosaicked into one image (384 by 384 pixels) (Figure 2). The land-use types included residential, clear water, lowland crop, forest, silt land, fish pond, terrace, upland crop, and shrub, each of which was assigned a number from 1 to 9 in the order listed. The observed spatial characteristics of each land-use class are described in Table 1.

Figure 2. Mosaicked CARTERRA image with nine sampled typical land-use classes. (1) Residential. (2) Clear lake water. (3) Lowland crop. (4) Forest. (5) Silt land. (6) Fish pond. (7) Terrace. (8) Upland crop. (9) Shrub.

Texture Spectrum Construction and Its Simplification
The original TS algorithm was designed to consider the relationship of spectral properties of a pixel with its neighboring pixels within a 3 by 3 window. Each pixel has eight neighbors along eight directions (starting at the upper left: 135°, 90°, 45°, 0°, 315°, 270°, 225°, and 180°) coded from 1 to 8 (Wang and He, 1990). The gray-level value of the target pixel at the center is compared with that of each neighbor to produce one of three logical relationships, “smaller,” “equal,” or “greater,” coded as “0,” “1,” or “2,” respectively. A “texture unit” number, $N_{TU}$, is calculated for the central pixel according to Equation 1, where $TU_i$ is the code for the $i$th neighbor:

$$N_{TU} = \sum_{i=1}^{8} TU_i \cdot 3^{i-1} \qquad (1)$$

Calculating the texture unit for each pixel in an image will result in a texture unit image with a total of $3^8$ (6561) possible texture units. A histogram based on $N_{TU}$, called the texture spectrum (TS), can be built for a pixel window of a given size from the texture unit image.

Figure 1. Study site in Xichang, Sichuan, China in a panchromatic display of the Ikonos CARTERRA imagery in December 2000.


TABLE 1. LAND-USE CLASSES, CODINGS, AND THEIR DESCRIPTION

| Code/Gray Tone | Land-Use Class | Description |
|---|---|---|
| 1 | Residential | Regularly patterned residential areas where roof tops, bare fields, and trees dominate |
| 2 | Clear water | Clear lake water surfaces |
| 3 | Lowland crop | Crop fields at lower-elevation flood plains, typically having straight ditches in the field (or straight linear features) |
| 4 | Forest | Conifer forest on steep slopes and higher elevations |
| 5 | Silt land | Flood fans along streams in uplands, typically silt |
| 6 | Fish pond | Mostly distributed along lake edges, having rectangle shapes with different sizes |
| 7 | Terrace | Landscape on hilly slopes above lowland crop fields, typically having irregularly stepped features, mostly growing vegetables |
| 8 | Upland crop | Crop fields at upper-elevation plains, typically having curved ditches in the field (or curved linear features) |
| 9 | Shrub | Shrub land, sometimes mixed with grassland and slopes |

Figure 3. Image (right) generated by using the revised texture spectrum algorithm (left, the target pixel was compared with its four neighbors) with 16 gray levels (texture units).

Because the TS is based on a comparison of the gray-level value of the center pixel with those of its surrounding pixels, it is not as sensitive to noise as is the GLCM, which uses gray values directly. Wang and He (1990) did not apply the TS directly in image classification. They proposed three texture measures that can be calculated from a TS and then tested the resultant texture images in a classification. A comparison study conducted in a rural-urban fringe environment showed that the texture images extracted from the TS performed more poorly than other texture extraction techniques such as GLCM and SST (simple statistical transformation) (Gong et al., 1992). In this study, the TS was directly used in the FBC. However, the large dimension of 6561 is a computational burden. One way to simplify this is to combine the logical comparisons of “greater” and “equal” into one “greater and equal” (Gong et al., 1992). This dramatically reduces the total number of texture units to 256 ($2^8$). Because the eight directions involve a double comparison (e.g., a “greater” at 45° from one pixel location is the “smaller” at the other side of the pixel at 225°), we can avoid the double comparison by using any four consecutive directions. We used the 135°, 90°, 45°, and 0° directions to save some computational time (Equation 2, Figure 3). The four-direction encoding would include almost all the information of an eight-direction encoding, but all pixel pairs are compared only once instead of twice. The only difference is at the left column of each pixel window, where the 0° comparison will be lost with the four-direction method. The gain for the four-direction comparison is the further reduction of the total number of texture units from $3^8$ to $3^4$ (81) or $2^4$ (16), depending on the number of logical comparisons between each pixel pair. Each value ($N_{TU}$) is a unique representation of spatial arrangement within one distance along four directions (Equation 3). The resulting 16 texture-unit image using the two logical comparison case is shown in Figure 3 (right).

$$\text{If } V_0 > V_i,\ i = 1, 2, 3, 4, \text{ then } TU_i = 0; \text{ else } TU_i = 1 \qquad (2)$$

where $V_0$ is the gray value of the target pixel and $V_i$ is the gray value of the $i$th neighboring pixel.

$$N_{TU} = \sum_{i=1}^{4} TU_i \cdot 2^{i-1} \qquad (3)$$

This significantly improves the computational efficiency of the FBC algorithm. The modified TS approach ($N_{TU}$ = 16) uses about one-fifth of the computation time of that having 81 texture units, one-sixteenth of that having 256 texture units, and only 1/410 of that of the original TS approach ($N_{TU}$ = 6561). At the same time, we hope that the simplification of the TS will not lose much power in characterizing textures in a pixel neighborhood.
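The encodings described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `texture_unit_image`, the offset tables, and the NumPy dependency are assumptions made for illustration, and the two-level case assumes that a neighbor smaller than the center is coded 0 and anything else is coded 1. Border pixels are simply dropped.

```python
import numpy as np

# Neighbor offsets (row, col) in the order given in the text, starting at the
# upper left: 135°, 90°, 45°, 0°, 315°, 270°, 225°, 180°.
OFFSETS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
OFFSETS_4 = OFFSETS_8[:4]  # 135°, 90°, 45°, 0° only


def texture_unit_image(image, offsets, levels):
    """Return the texture-unit number N_TU for every interior pixel.

    levels=3: neighbor < center -> 0, equal -> 1, greater -> 2.
    levels=2: neighbor < center -> 0, otherwise 1 (assumed direction).
    """
    img = np.asarray(image, dtype=np.int64)
    rows, cols = img.shape
    center = img[1:-1, 1:-1]
    ntu = np.zeros_like(center)
    for i, (dr, dc) in enumerate(offsets):
        # Shifted view holding the i-th neighbor of every interior pixel.
        neighbor = img[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        if levels == 3:
            code = np.where(neighbor < center, 0,
                            np.where(neighbor == center, 1, 2))
        else:
            code = (neighbor >= center).astype(np.int64)
        ntu += code * levels ** i  # weight levels^(i-1) for i counted from 1
    return ntu
```

Calling `texture_unit_image(img, OFFSETS_8, 3)` corresponds to the original 6561-unit encoding, `(OFFSETS_8, 2)` to 256 units, `(OFFSETS_4, 3)` to 81 units, and `(OFFSETS_4, 2)` to 16 units. The ratio 6561/16 is about 410, which is where the quoted saving comes from when the cost of the classifier grows linearly with the number of histogram entries.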

Gray-Level Reduction Schemes
Although the original image was encoded in 11 bits, the actual effective data range in our study is only 371 (from 102 to 472). To allow for direct comparison, we used four methods to reduce the gray-level range of the mosaic image to 256 (equivalent to 8 bits), 81, and 16 (equivalent to 4 bits). Because image classification is a data generalization process, a preprocessing step that reduces the data variability to some extent should not seriously influence the classification accuracy. According to Narayanan et al. (2000), reducing the data down to 4 bits would still preserve more than 90 percent of the information content. We applied four gray-level reduction (GLR) schemes: min-max linear compression (LC), gray-level binning (BN), histogram equalization (HE), and piece-wise nonlinear compression (PC) (Jensen, 1996). LC reduces gray levels linearly within the specified minimum and maximum range. BN bins every $2^7$ consecutive gray levels of the original $2^{11}$ gray levels into one gray level to achieve a 4-bit image. The resulting histogram is similar to that of LC; for simplicity, we may drop BN from our plots later on. HE is designed in such a way that the histogram is theoretically adjusted to have the same frequency for each gray level. For the discrete case, this cannot be perfectly achieved, but the histogram is flattened (Jensen, 1996, p. 150). Our HE algorithm first equalizes the original histogram and then reduces the total number of gray-level values from that of the original image to a specified smaller number such as 16, 81, or 256 to achieve the gray-level reduction effect. The problem that might occur with HE is that it favors high-occurrence gray values while consecutive low-occurrence gray values are combined into one. However, relatively low-occurrence gray values might contain useful information for classification. PC was purposely applied in this study to preserve the relatively low-occurrence parts while equalizing the high-occurrence parts of the original histogram. The original histogram was first segmented into five intervals according to its shape. The left tail and right tail, having skewed distributions, were empirically segmented into two individual intervals. LC was applied to each of the two tail segments while the remaining three middle intervals, having relatively small-variance normal distributions, were equalized. PC was intended to combine the strengths of both LC and HE, but the decision on the cutting points for the two tail segments could be arbitrary.
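As a rough illustration of three of the four schemes, the following Python sketch shows one plausible way to implement LC, BN, and HE with NumPy. The function names and the exact rounding rules are assumptions, not the procedures used in the study. PC is left out because its interval cut points were chosen empirically and are not given in the text.

```python
import numpy as np


def linear_compression(image, n_levels):
    """Min-max linear compression (LC) to n_levels output levels."""
    img = np.asarray(image, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=np.int64)
    scaled = (img - lo) / (hi - lo) * n_levels
    # The maximum input maps to n_levels; fold it into the top level.
    return np.minimum(scaled.astype(np.int64), n_levels - 1)


def binning(image, bin_width):
    """Gray-level binning (BN): merge every bin_width consecutive levels."""
    return np.asarray(image, dtype=np.int64) // bin_width


def histogram_equalization(image, n_levels):
    """Histogram equalization (HE) followed by reduction to n_levels."""
    img = np.asarray(image, dtype=np.int64)
    _, inverse, counts = np.unique(img.ravel(), return_inverse=True,
                                   return_counts=True)
    # Cumulative fraction of pixels at or below each distinct gray value.
    cdf = np.cumsum(counts) / img.size
    levels = np.ceil(cdf * n_levels).astype(np.int64) - 1
    levels = np.clip(levels, 0, n_levels - 1)
    return levels[inverse].reshape(img.shape)
```

For example, `binning(image, 2**7)` maps an 11-bit image onto 16 levels, and `linear_compression(image, 16)` stretches the occupied range (102 to 472 in this study) over the same 16 levels.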

Supervised Contextual Classifier
The Contextual Classifier
A supervised procedure is adopted in the FBC. A land-use map can be obtained by applying a training procedure to get a frequency table for each land-use class, and then classifying the image using the FBC. For each pixel, we calculated a frequency table within a pixel window of a specific size. To decide the class of a pixel, we used a minimum-distance classifier with the city block metric (Gonzalez and Wintz, 1987; Gong and Howarth, 1992): i.e.,

$$d_C = \sum_{v=0}^{N-1} \left| f(C, v) - f(i, j, v) \right|, \qquad C = 1, 2, \ldots, 9 \qquad (4)$$

where $d_C$ is the summed absolute difference, over all gray levels $v$, between the frequency table of the target pixel, $f(i,j)$, and that of class $C$, $f(C)$. $N$ is the total number of effective gray-level values in an image. For the raw image in this study, $N$ = 371. For the texture unit image constructed using the original TS method, $N$ = 6561. For a GLR image or a texture unit image from the modified TS algorithm, $N$ is the total number of gray-level values or of possible texture units, respectively. The classifier compares all the class distances and assigns pixel $(i,j)$ to the class that has the minimum distance to $f(i,j)$.

There are a number of factors affecting the land-use classification accuracy of the FBC. The training area must be representative and of a reasonable size to capture the spatial structure of any land-use type in an image. Pixel-window size determines the amount of spatial information that can be included in the classification. Because the optimal pixel window varies with the individual class and image resolution, it is usually difficult to determine before image classification. Therefore, an appropriate window size is usually determined empirically. It is also recognized that a kernel-based approach has a common drawback, the window effect or edge effect between classes. A similarity thresholding and a region-growing strategy were proposed to remove the boundary effect (Gong, 1994). An adaptive windowing procedure was used to improve accuracy (Ryherd and Woodcock, 1996). In this study, boundaries between different land-use types are artifacts of the mosaicking. Because the focus of this study is not on achieving the best classification accuracy, we did not attempt to search for the optimal window sizes or to deal with the boundary effects. Instead, we concentrated on the data reduction issue and the direct use of the TS in land-use classification with the FBC.

Due to the limit of the subimage size and the training block size, we experimented with 11 window sizes from 5 by 5 to 65 by 65 with an increment of 6. For convenience, we included the 65 by 65 window size, which is 1 row and 1 column wider than the training block of 64 by 64 (described in the next section). At most, this would cause a discrepancy of approximately 5 percent between the training statistics and those of any pixel window. Such a small discrepancy will not seriously affect the classification results.
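The decision rule of Equation 4 can be sketched in a few lines of Python. This is an illustrative sketch only: `frequency_table` and `classify_pixel` are hypothetical names, frequencies are normalized to fractions, and the window is assumed to lie entirely inside the image (no border handling).

```python
import numpy as np


def frequency_table(window, n_levels):
    """Normalized histogram (fractions summing to 1) of integer codes."""
    counts = np.bincount(np.asarray(window).ravel(), minlength=n_levels)
    return counts / counts.sum()


def classify_pixel(image, row, col, half, class_tables):
    """Assign pixel (row, col) to the class with minimum city-block distance.

    class_tables has shape (n_classes, n_levels), one training frequency
    table per class; the window is (2 * half + 1) pixels on a side.
    """
    n_levels = class_tables.shape[1]
    window = image[row - half:row + half + 1, col - half:col + half + 1]
    f_pixel = frequency_table(window, n_levels)
    # Summed absolute difference between each class table and the window table.
    distances = np.abs(class_tables - f_pixel).sum(axis=1)
    return int(np.argmin(distances)) + 1  # classes are numbered from 1
```

The distance for each class sums over all $N$ table entries, so the work per pixel grows linearly with $N$. This is why reducing 371 gray levels or 6561 texture units to 16 entries matters for efficiency.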


Training and Test Strategy and Procedure
To capture the spatial structure and brightness variation of a certain land-use type, a block-training strategy was applied. The shape and size of the training blocks contain clues for selecting the appropriate pixel window size to be used in generating frequency tables at the classification stage (Gong and Howarth, 1992). A small training area may not reflect and capture the spatial pattern of the class, while a relatively large and representative training area will only cause multiple counting of the pattern, which can be eliminated by normalizing the frequencies into percentages. In consideration of the subimage size used in this study, we chose one 64 by 64 pixel block as the training area for each class. The training blocks were placed arbitrarily because we consider each subimage a pure land-use class. To obtain a training histogram for each class with a specific window size, we compared the averaged histograms, with frequencies in percentage, generated from different window sizes ranging from 5 by 5 to 65 by 65 running through each particular training block. It was found that the average histogram of a specific land-use class does not vary with window size; in other words, the histograms obtained with different window sizes were almost the same. This is logical, although not quite intuitive, because calculating the frequency once from the whole training block is nearly the same as generating a frequency table for each pixel within a smaller window and averaging the tables. It suggests that we should choose a representative area for the particular class and generate a histogram from the whole training block rather than calculating frequency tables of a specific window size and subsequently averaging them. As for the TS approach, a histogram, i.e., the texture spectrum, was obtained for each class from the texture unit image (Figure 4a). We noticed the roughly symmetric behavior of the texture spectra. Our assumption was that the symmetric pattern was caused by the double counting in the use of eight directions and that it would disappear when only four directions were used. This proved to be incorrect because the TS calculated from the four-direction texture unit images also shows symmetric patterns. We now suspect that it is due to the gray-level value comparison between the central pixel and its neighbors. This will be further explored elsewhere. The training histograms resulting from the LC, HE, and PC images are shown in Figures 4b, 4c, and 4d.

For test sample selection, to avoid using the training pixels in the accuracy assessment, we chose a sample area of approximately 2000 pixels for each land-use subimage that did not overlap with the training block. Test pixels were chosen at the center portion of each subimage to make sure that pixels subject to boundary effects were excluded from accuracy assessments. The same set of test pixels was used for all pixel window sizes tested in this study. The kappa coefficient and conditional kappa were used to assess the overall classification accuracy and per-class accuracy, respectively.
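For readers unfamiliar with the two accuracy measures, a minimal Python sketch is given below. It uses the standard textbook formulas computed from a confusion matrix; the text does not state which marginal the conditional kappa was conditioned on, so conditioning on the row class here is an assumption, as are the function names.

```python
import numpy as np


def kappa(confusion):
    """Kappa coefficient from a square confusion matrix."""
    m = np.asarray(confusion, dtype=np.float64)
    n = m.sum()
    observed = np.trace(m) / n
    # Chance agreement from the products of the row and column totals.
    expected = (m.sum(axis=1) * m.sum(axis=0)).sum() / n ** 2
    return (observed - expected) / (1.0 - expected)


def conditional_kappa(confusion):
    """Per-class conditional kappa, conditioned on the row class.

    Assumes every row total is positive and smaller than the grand total.
    """
    m = np.asarray(confusion, dtype=np.float64)
    n = m.sum()
    row = m.sum(axis=1)
    col = m.sum(axis=0)
    return (n * np.diag(m) - row * col) / (n * row - row * col)
```

With nine classes and roughly 2000 test pixels per class, the confusion matrix passed to these functions would be 9 by 9 with about 18,000 entries in total.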

Results and Discussion
The FBC was applied to the original image, gray-level reduced images created with different reduction methods and with different numbers of gray-level values, the texture-unit image created with the original TS method, and texture-unit images created with the modified TS algorithms. To examine the overall classification results, we summarize the kappa coefficients obtained from each type of image with 11 pixel window sizes (Table 2). For the original image, even though the radiometric quantization of the image is 11 bits, the actual effective gray-level range is only 371. The four gray-level reduction algorithms were applied to reduce the original image gray-level range from 371 to 256, 81, and 16, respectively. Shown in Table 2 are only the accuracies obtained from those GLR images with 16 gray levels. With 256 and 81 gray levels, the classification results are very similar. It is interesting to note that all the GLR methods produced similar results when compared among themselves and with those of the original image. The results seem to agree with the findings in Narayanan et al. (2000) that a 4-bit image does not lose much of the information for image classification. The highest accuracy level for the original image is 0.74 while the best accuracies for the GLR images range between 0.71 and 0.73. For a total of nine land-use classes, we consider a kappa coefficient of over 0.7 to be quite satisfactory. This may represent the optimistic situation because no boundary effects have been considered in this study. When examining the kappa coefficient against the window size, we see a similar pattern of better classification accuracy with increasing window size and then a leveling off at a rather early stage among all five gray-level images in Table 2. We also compared the conditional kappa coefficients for each individual class produced from different gray-level images and found similar patterns for each class. Because there is no consistent difference among these results, we attribute a 0.03 difference in kappa coefficient among the five images to noise. Therefore, we conclude that the different GLR methods do not cause much difference in the final classification. In addition, GLR does not seem to affect the classification accuracy by much even if the gray levels of the original image are reduced down to only 16. In fact, from Figure 4b, we can see that the effective range of gray levels used in the linear compression was only nine.

Figure 4. Histogram of each training block of nine land-use classes generated by (a) TS and the GLR schemes of (b) LC, (c) HE, and (d) PC.

TABLE 2. SOME OF THE CLASSIFICATION RESULTS MEASURED BY KAPPA COEFFICIENTS*

| Window Size | Original Image | LC16 | BN16 | HE16 | PC16 | Original TS | TU256 | TU81 | TU16 |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 0.33 | 0.41 | 0.45 | 0.44 | 0.46 | 0.21 | 0.21 | 0.13 | 0.13 |
| 11 | 0.55 | 0.55 | 0.58 | 0.59 | 0.60 | 0.41 | 0.47 | 0.33 | 0.31 |
| 17 | 0.63 | 0.61 | 0.62 | 0.66 | 0.65 | 0.58 | 0.62 | 0.49 | 0.47 |
| 23 | 0.68 | 0.65 | 0.65 | 0.68 | 0.69 | 0.70 | 0.70 | 0.59 | 0.58 |
| 29 | 0.68 | 0.66 | 0.67 | 0.67 | 0.68 | 0.77 | 0.73 | 0.65 | 0.65 |
| 35 | 0.68 | 0.68 | 0.69 | 0.67 | 0.69 | 0.80 | 0.77 | 0.69 | 0.70 |
| 41 | 0.69 | 0.69 | 0.70 | 0.69 | 0.70 | 0.82 | 0.78 | 0.72 | 0.73 |
| 47 | 0.70 | 0.70 | 0.70 | 0.70 | 0.71 | 0.84 | 0.79 | 0.76 | 0.77 |
| 53 | 0.71 | 0.71 | 0.71 | 0.71 | 0.71 | 0.87 | 0.79 | 0.77 | 0.79 |
| 59 | 0.72 | 0.72 | 0.72 | 0.71 | 0.70 | 0.90 | 0.80 | 0.78 | 0.81 |
| 65 | 0.74 | 0.72 | 0.73 | 0.71 | 0.72 | 0.92 | 0.81 | 0.79 | 0.82 |

*Original image means all 371 gray-level values were used as entries in the frequency table; LC16, BN16, HE16, and PC16 indicate that the 371 gray-level values were reduced to 16 levels with linear compression, binning, histogram equalization, and piece-wise compression, respectively. Original TS means the use of all 6561 texture units as entries in the frequency table. TU256 represents the simplification of the original logical comparison from three to two for all eight directions in texture unit calculation. TU81 represents the use of only four directions for logical comparisons. TU16 represents the use of only four directions and reduction of the original three logical comparisons to two.

The original TS method produced a texture-unit image with 6561 spatial arrangements, while the modified TS algorithms produced images with 256, 81, and 16 texture units, respectively. From Table 2, it can be seen that the highest kappa coefficient is 0.92 for the original TS method, a 0.18 improvement over the best accuracy obtained with the gray-level image. It seems clear that the inclusion of spatial arrangement information of gray-level values in a pixel neighborhood can considerably improve the performance of the FBC, as expected by Gong and Howarth (1992). We had expected that the original TS method would result in classification accuracies similar to those from the revised TS methods proposed here. However, the best accuracies achieved by the modified TS methods were only between 0.79 and 0.82. While they perform considerably better than the use of gray-level images, they are considerably poorer than the results of the original TS method. From Table 2, we can also see that the gray-level images work better than the TS-based methods at relatively smaller window sizes, but gradually reach their limits, perhaps due to their inherent inability to consider the spatial arrangement of pixels in a larger neighborhood. Figure 5 shows the best classification results for each type of classification with the FBC.
It can be seen from all classification results that pixels at the edges of different land-use images are mostly misclassified. At the center of each land-use type, most classes are correctly classified. For the gray-level images, the confusion is primarily located among the last three classes. There is a tendency for large tracts of terrace and upland cropland to be misclassified as shrub. For the modified TS methods, the confusion seems to be caused by a misclassification of terraces into upland crop and upland crop into lowland crop. Therefore, the error patterns of the modified TS and the gray-level images are different. All these problems seem to be reduced in the results from the original TS method. Plotting the conditional kappa against pixel window size for the results obtained from the original gray-level image, the original TS method, and the modified TS method with 16 texture units, we can clearly see that most of the classes reach 1 or close to 1 at either a small or a large window size, except for three classes (Figure 6). The first exception is the poor performance of the original image in classifying the fish pond, with a conditional kappa of less than 0.4 for all window sizes. The second is the relatively poor performance of all three methods in classifying terrace and upland crop. Nevertheless, the original TS method seems to do a much better job in classifying the terrace and the upland crop than the other methods, which seems to be the primary reason for its high overall accuracy. It is interesting to note that the best accuracies are all associated with the largest window size. Because the subimage for each land-use type is 128 by 128 and the training block size was chosen to be 64 by 64, it is reasonable to expect that the best accuracies are obtained with the final window size of 65 by 65. Optimal window size is related to the actual size of various land-use patches and is often chosen based on a compromise between land-use patch size and boundary effects. As stated earlier, the issue of optimal window size is beyond the scope of this study.


Figure 5. The best classification results. First column: the original image (371 gray levels) and the gray-level reductions (PC256, PC81, and PC16). Second column: the original TS (6561 texture units) and the modified TS methods (TU256, TU81, and TU16). All results are at a pixel window size of 65.

Figure 6. Conditional kappa coefficients obtained from the original image, the original TS, and the modified TS with 16 texture units.

To further test our conjectures on the different functionality and behavior of the TS approach and the gray-level images, we plotted the variogram for each class from the original image (Figure 7). Among the nine land-use classes, clear lake water, forest, and residential have relatively short ranges compared to the other classes. For these three classes, the variance stabilizes within a short distance, and there does not seem to be a first-order trend. A repetitive pattern of small-scale waves can be seen in the variogram of the residential class. With these classes, the gray-level images show better performance in terms of reaching a high conditional kappa at smaller window sizes. It seems to us that the gray-level images are more sensitive to local gray-level variances, thus better capturing second-order effects under the assumption of first-order stationarity (Isaaks and Srivastava, 1989). The TS algorithms, on the other hand, first consider the spatial relationship in a small neighborhood to generate a texture unit image and then further enlarge the spatial context through histogram extraction from a larger kernel. From the variograms we can see that fish pond, terrace, upland crop, lowland crop, and shrub do not reach a stationarity of variance or have relatively large ranges. It seems to us that the TS algorithm has a better capability of capturing first-order trends or irregular patterns at a relatively larger scale, because the classification results from the TS approach appear better for these classes.
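The text does not say how the variograms were computed. As a hedged illustration of the general idea, the sketch below computes an empirical semivariogram along the row direction only, using the usual definition of semivariance as half the mean squared difference between pixel pairs a given lag apart. The function name and the single-direction choice are assumptions.

```python
import numpy as np


def semivariogram_rows(image, max_lag):
    """Empirical semivariance along image rows for lags 1..max_lag.

    gamma(h) = 0.5 * mean((z(x + h) - z(x))^2) over all pixel pairs that are
    h columns apart in the same row.
    """
    img = np.asarray(image, dtype=np.float64)
    gammas = []
    for h in range(1, max_lag + 1):
        diff = img[:, h:] - img[:, :-h]
        gammas.append(0.5 * np.mean(diff ** 2))
    return np.array(gammas)
```

For a subimage with a steady brightness ramp, the semivariance keeps growing with lag instead of leveling off at a sill, which is the kind of behavior described above for classes that do not reach a stationarity of variance.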

Summary and Conclusions
To make better use of high spatial-resolution images, such as Ikonos images, one should not only consider spectral reflectance, but also explore and analyze the spatial relationship and context of neighboring pixels during the process of image classification. We tested the use of a frequency-based contextual classifier (FBC) with samples from the Ikonos image, their gray-level reduced versions, and texture-unit images extracted from the original image. The texture-unit images were generated with a texture spectrum (TS) algorithm and its modified versions to reduce data complexity. Based on the experimental results and exploratory analysis, we conclude that
● The FBC has considerable potential in classifying panchromatic high spatial resolution satellite images such as the Ikonos 1-m data. With nine different types of land-use patterns in a mountainous rural area of China, the FBC can reach a kappa coefficient of over 0.7.
● A reduction in gray-level values to 16 from the original Ikonos image does not appreciably affect the accuracy of the FBC. Not much difference in classification accuracy is observed among the four gray-level reduction schemes.
● The direct use of the texture spectrum (TS) in the FBC can largely improve the land-use classification accuracy over the use of gray-level images. An increase of about 0.2 in the kappa coefficient can be achieved. However, the original texture unit calculation that considers all eight neighbors produces a large spectrum with 6561 possible values for each pixel window.
● The TS algorithms modified in this study can considerably cut down the size of the texture spectrum to 256, 81, and 16 possible values and therefore improve classification efficiency. The results obtained from the modified TS methods are similar to one another, with best accuracies around a kappa coefficient of 0.82, but within the range of pixel window sizes tested they are lower than the kappa coefficient of 0.92 achieved with the original TS algorithm.
● Overall, the TS algorithms performed much better in the FBC than the use of gray-level images. The TS methods reach their maximum level of accuracy at a larger window size than did the gray-level images.

Figure 7. Variograms of each land-use class having 128 by 128 pixels from the original image.

Because we did not fully evaluate the algorithms with respect to both the boundary effect and the optimal window size, it is premature to conclude that the modified TS methods are inferior to the original TS algorithm. Because the modified TS methods could cut the computation by a factor of up to about 410, they merit further testing in the classification of high-resolution images. Because the classification errors made by the FBC with the gray-level reduction images differ from those made with the TS methods, it may be beneficial to combine the images obtained from gray-level reduction with those from the modified TS.
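For reference, gray-level reduction by min-max linear compression can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: pixel values are taken to be non-negative integers (the example uses an 11-bit range, 0 to 2047), the mapping is an equal-width linear rescaling, and the function names `linear_compress` and `frequency_table` are hypothetical. The second function shows the kind of per-window gray-level frequency table that a frequency-based classifier would operate on, which is where fewer levels mean shorter tables.

```python
# Sketch of min-max linear gray-level compression (illustrative only).
# Assumes integer pixel values; function names are hypothetical.


def linear_compress(pixels, n_levels=16):
    """Map integer gray values linearly from [min, max] onto 0 .. n_levels - 1."""
    lo, hi = min(pixels), max(pixels)
    width = hi - lo + 1  # number of distinct input values in the range
    # Integer arithmetic keeps the result strictly below n_levels.
    return [(p - lo) * n_levels // width for p in pixels]


def frequency_table(reduced, n_levels=16):
    """Count how often each reduced gray level occurs in a pixel window."""
    table = [0] * n_levels
    for g in reduced:
        table[g] += 1
    return table


if __name__ == "__main__":
    window = [0, 127, 128, 2047]  # made-up values spanning an 11-bit range
    reduced = linear_compress(window)
    print(reduced)
    print(frequency_table(reduced))
```

A table of 16 counts per window is far shorter than one indexed by every original gray value, which is the efficiency motive behind the reduction.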

Acknowledgments

This research was partially supported by a National Basic Science Project (2001CB309404) of China and by NIAID (RO1-AI4396-02) in the United States.

References

Cushnie, J.L., and P. Atkinson, 1985. Effect of spatial filtering on scene noise and boundary detail in TM imagery, Photogrammetric Engineering & Remote Sensing, 51(9):1183–1193.
Gong, P., 1990. Improving Accuracies in Land-Use Classification with High Spatial Resolution Satellite Data: A Contextual Classification Approach, PhD dissertation, University of Waterloo, Ontario, Canada, 181 p.
Gong, P., 1994. Reducing boundary effects in a kernel-based classifier, International Journal of Remote Sensing, 15(5):1131–1139.
Gong, P., and P.J. Howarth, 1990. The use of structural information for improving land-cover classification accuracies at the rural-urban fringe, Photogrammetric Engineering & Remote Sensing, 56(1):67–73.
Gong, P., and P.J. Howarth, 1992. Frequency-based contextual classification and gray-level vector reduction for land-use identification, Photogrammetric Engineering & Remote Sensing, 58(4):423–437.


May 2003

Gong, P., D.J. Marceau, and P.J. Howarth, 1992. A comparison of spatial feature extraction algorithms for land-use classification with SPOT HRV data, Remote Sensing of Environment, 40:137–151.
Gonzalez, R.C., and P. Wintz, 1987. Digital Image Processing, Second Edition, Addison-Wesley Publishing Company, Reading, Massachusetts, 503 p.
Haralick, R.M., 1979. Statistical and structural approaches to texture, Proceedings of the IEEE, 67(5):786–803.
Isaaks, E.H., and R.M. Srivastava, 1989. An Introduction to Applied Geostatistics, Oxford University Press, New York, N.Y., 561 p.
Jensen, J.R., 1996. Introductory Digital Image Processing: A Remote Sensing Perspective, Prentice Hall, Englewood Cliffs, New Jersey, 318 p.
Narayanan, R.M., T.S. Sankaravadivelu, and S.E. Reichenbach, 2000. Dependence of image information content on gray-scale resolution, Proceedings of IGARSS, 24–28 July, Honolulu, Hawaii, 1:153–155.
Richards, J.A., and X. Jia, 1999. Remote Sensing Digital Image Analysis: An Introduction, Springer-Verlag, Berlin, Germany, 363 p.
Ryherd, S., and C. Woodcock, 1996. Combining spectral and texture data in the segmentation of remotely sensed images, Photogrammetric Engineering & Remote Sensing, 62(2):181–194.
Space Imaging, 1999. Space Imaging Catalogue of Products and Services, p. 14, www.spaceimaging.com (last accessed 25 October 2002).
Wang, L., and D.C. He, 1990. A new statistical approach for texture analysis, Photogrammetric Engineering & Remote Sensing, 56(1):61–66.

(Received 29 November 2001; accepted 08 August 2002; revised 16 September 2002)

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING
