An Experimental Study on Content-based Image Classification for Satellite Image Databases

Francisco Artigas,¹ Richard Holowczak,² Soon Ae Chun,¹ JuneSuh Cho,¹ and Harold Stone³

¹ Rutgers University, Center for Information Management, Integration and Connectivity (CIMIC), 180 University Avenue, Newark, NJ 07102
² Baruch College, City University of New York
³ NEC Research Institute, Princeton, NJ
Abstract

Current art uses metadata associated with satellite images to facilitate their retrieval from image repositories. Typical metadata are geographic location, time, and data type. Because the metadata do not indicate which regions within an image are obscured by clouds, retrieval with such metadata may produce an image within which the region of interest (ROI) for the user is not visible. We report a system that can automatically determine whether an ROI is visible in an image, and can incorporate this information into the metadata for individual images to enhance searching capability. The goal is to annotate each image with metadata regarding a number of ROIs. An experiment with the system annotated 236 AVHRR images of the North Atlantic from a 5-month viewing period with descriptors that expressed the visibility of an ROI centered on Long Island. For ground truth, we used the classifications of three human subjects to determine visibility of the same region of interest, and labeled the ROI with the majority decision of the three subjects. Partial cloud cover made the human determination subjective and resulted in disagreements among the subjects. Using randomly selected
training subsets of the images, we found the two images whose regions were most like those in images for which the Long Island region was visible. For the training subsets, the descriptors derived from the two best images produced average Recall and Precision retrieval results jointly in the 75 to 80 percent region. Descriptors derived from those same two images, applied to the test subsets, also produced average Recall and Precision results that jointly fell in the 75 to 80 percent region.
1 Introduction
Traditional search and retrieval systems for remotely sensed data provide facilities to query by sensor characteristics (sensor type, channel or band wavelength), temporal characteristics (when the data were captured), and location characteristics (the region of the Earth represented in the data). For example, the NOAA Satellite Active Archive web interface allows queries using criteria such as temporal range, spatial extent (lat/lon), data type (LAC, GAC or HRPT), satellite name, receiving station, and distribution site.¹ For this repository, after retrieving a collection of records in response to a query, each record can be expanded one at a time to display the thumbnail from bands 2 and 4 of the AVHRR image. Similar examples include the NASA Earth Resources Observation Systems (EROS) Data Center's searchable collection of North America AVHRR 10-day composite data² and the NASA Earth Observing System (EOS) Data Gateway,³ which provides a search interface for data sets from a variety of spaceborne and airborne instruments.

¹ cf. http://www.saa.noaa.gov/new-bin/WWWdisplay
² cf. http://edcdaac.usgs.gov/1KM/namericacomp10d.html
³ cf. http://edcimswww.cr.usgs.gov/pub/imswelcome/
Additional efforts in image classification have provided methods for classifying individual image pixels or regions of image pixels according to land type, vegetative cover, and atmospheric characteristics such as cloud cover [1, 2, 3]. Given these metadata, users can pose queries to retrieve candidate images of interest based on histograms of the classes of pixels contained within each image. However, manual inspection of the images themselves (often via a thumbnail representation) is required in many cases to determine whether an image is suitable for the user's future processing tasks. Retrieval by pixel class is most useful when studying properties of the particular class, but this approach is less useful for studying changes at a particular geographic area if the pixels at that area change in time because of seasonal or climatic variations. Candidate-selection methods that rely on classified pixel counts are prone to misclassify images from the perspective of a given researcher. For example, a search for nearly cloud-free images may not retrieve images that are cloudy only in regions other than the user's region of interest (ROI), while virtually cloud-free images in which the user's ROI is obscured are incorrectly included.

In this paper, we consider a situation in which a user (not necessarily a domain or remote-sensing expert) requires cloud-free images of a particular ROI. Such "good" images must be selected from a collection in which the ROI is partly or fully obscured in some images. Our goal is to develop a robust system for classifying images according to a user-defined ROI and the characteristic of "obscured" (i.e., cloudy, hazy) or "visible". We then incorporate this information as metadata into the database and use it to enhance retrieval.

Prior art in the general area of content-based image search has mostly focused on techniques for retrieving images from descriptors based on features such as color, texture, and shape [4, 5, 6]. In remote sensing, references [1, 2, 3] use content descriptors based on classified pixels and cover type, as mentioned above. Our search method correlates regions of interest with given templates of those regions, using fast correlation techniques and wavelet representations as reported in [7]. Although the search techniques can be used with cloud masks to reduce the effects of cloud cover and thereby enhance the ability to use partly cloudy data [8], cloud masks were not available for the images in this collection.

The main result is an experiment on 236 AVHRR images of the North Atlantic captured in the latter half of 1999. The experiment demonstrated that two images were sufficient to classify all images as to whether a specific ROI was visible or not. Using this metadata for retrieval produces average Recall and Precision that jointly exceed 75 percent for the experimental data. This suggests that the system can produce metadata about multiple ROIs at the point of image capture, and that the metadata provides a powerful tool for subsequent data retrieval. The metadata for a large image repository is quite small compared to the size of the repository. It can be searched quickly, and it can also be distributed to user sites where searches can be done locally. When done locally, the metadata searches can produce URL pointers to the physical images at a remote data archive.

The remainder of this paper is organized as follows: In Section 2 we discuss the current operating procedure for ingesting and processing NOAA AVHRR data at the Rutgers-CIMIC Regional Applications Center. In Section 3 we discuss the design of an experiment using our system. In Section 4 we present the results of the experiment, a discussion of the nature of our data, and the steps used to process it. Section 5 describes visualization tools built around the system. Finally, in Section 6 we discuss our future work.
2 Image Capture and Ingest
This section gives the background on the facilities used to capture and process the imagery that forms the database at the Rutgers Center for Information Management, Integration and Connectivity.
2.1 Satellite Image Database
The NASA Regional Applications Center (RAC) at the Center for Information Management, Integration and Connectivity (CIMIC) on the Rutgers University Newark campus is a joint project between Rutgers – CIMIC, NASA Goddard Space Flight Center (GSFC) and the Hackensack Meadowlands Development Commission (HMDC) of New Jersey. The RAC Program was initiated by GSFC's Applied Information Sciences Branch, Code 935, in 1997. Its goal is to promote timely access to satellite data products that can be easily combined with other resource management applications already in use by the user community. The CIMIC-RAC down-links NOAA satellite data on a daily basis, and manages satellite imagery metadata that is accessible widely over the Internet. As part of its role as an image archive, CIMIC also creates data products useful to the public, particularly for government, industry, research, and education. In addition to its image archiving activities, CIMIC has been collaborating with the Hackensack Meadowlands Development Commission (HMDC) to initiate a comprehensive environmental monitoring program. The space and land-based deployment of an environmental monitoring system addresses the HMDC's primary areas of interest, including flood monitoring, water resource management, vegetation patterns and land use for urban planning. The data from these monitors are fused with satellite imagery as part of the ongoing
environmental studies.

The CIMIC-RAC currently stores and manages satellite imagery from various sources:

• Direct downloads of AVHRR data from polar-orbiting satellites, such as NOAA 12, NOAA 14 and NOAA 15, over the Northeast region of the US, including New York and New Jersey;
• LANDSAT and RADAR data obtained from NASA archives;
• Hyperspectral images from the Airborne Imaging Spectrometer for Applications (AISA) sensor;
• Value-added products, such as AVHRR NDVI biweekly composites from the NASA EROS data center;
• Aerial ortho-photographs provided by various private companies; and
• Value-added products generated by various experts.

The study in this paper is based on the AVHRR data from NOAA 12 and NOAA 14, which are widely used for remote sensing measurements in the civilian sector. Teillet et al. [9] describe the original specifications of these data and an evaluation of the product as deployed since 1992. NASA centers have collected over 100,000 images from April 1992 to date, with almost continuous coverage except for a break of 17 months starting in late 1993. One of the data products contains temporally composited images formed by selecting, at each pixel location, the image pixel with the highest value of the normalized difference vegetation index (NDVI) within a ten-day compositing period. A composite image contains a cloudy pixel at a given location if the corresponding pixel is under cloud cover throughout the entire ten-day compositing period; for certain areas the result is a low-quality composite due to continuous overcast. As of July 1999, a total of 87 global 10-day maximum-NDVI composites had been produced. There is currently no automatic way to search for a particular composite of the global 1-km AVHRR land dataset that has clear skies over a specific area. Our content-based image search tool has the potential to fulfill this need, and thereby enhance and simplify access to images within the database.
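To make the compositing rule concrete, the following sketch selects, at each pixel location, the NDVI value from the day with the highest NDVI in the window. It is an illustration of the maximum-value compositing described above, not the EROS production code; the array names and shapes are our assumptions.

    import numpy as np

    def ndvi(red, nir):
        # Normalized difference vegetation index; epsilon avoids division by zero.
        return (nir - red) / (nir + red + 1e-9)

    def max_ndvi_composite(red_stack, nir_stack):
        # red_stack, nir_stack: co-registered daily bands, shape (days, rows, cols).
        ndvi_stack = ndvi(red_stack, nir_stack)
        best_day = np.argmax(ndvi_stack, axis=0)          # winning day per pixel
        rows, cols = np.indices(best_day.shape)
        return ndvi_stack[best_day, rows, cols], best_day

A pixel that is cloudy on every day of the window keeps a cloudy (low-NDVI) value, which is exactly the failure mode noted above.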
2.2 Satellite Image Processing
The flow of data from initial capture to image archive is a complex process. It starts with capture of data from the NOAA TIROS series of polar-orbiting satellites, performed by the Quorum Communications HRPT Data Capture Engine running on a standard PC. Each of the NOAA 12, NOAA 14 and NOAA 15 satellites passes over the Northeast region, including New Jersey, at least twice daily. During each pass, the satellite down-link generates approximately 50 megabytes of raw image data containing 5 bands ranging from the visible to the thermal infrared, each of which captures different earth-surface characteristics. After capture, the raw image data move to a UNIX workstation where the NASA RAC software converts the raw file into 5 raw band files and into a level1B format with georeference information, and creates thumbnail images for the 5 channels. At this point, traditional image metadata such as sensor characteristics (sensor type, channel or band wavelength) and temporal characteristics are extracted and recorded in a relational database. Since many images show cloud cover, we can process the level1B data of each image further and classify whether specific areas of interest are covered by clouds. For this paper, we chose a region of interest centered on Long Island.
The process used in this paper to distinguish "visible" ROIs from "obscured" ROIs involves (1) geo-registration and geo-correction of level1B images, (2) remapping ("cropping") to ROIs, (3) creating GIF images of each remapped band, and (4) classifying "visible" ROIs versus "obscured" ROIs within an image. Steps (1) through (3) are done with NASA's Rapid Application Tool (RAT) software. The RAT viewer is a UNIX application that allows one to load, display and analyze images. It provides algorithms to register an image with a World Database Map, to remap segments of the image into different projections (geo-registration), and to create NDVI composite images. The resulting geo-registered and remapped GIF images from RAT are the input data for NEC's Compressed Domain Search (CDS) system [8], which performs Step 4. The goodness of match between a pattern area and an area in an image is measured by the normalized correlation coefficient of pixel intensity values between the two, computed as a function of their relative positions. The maximum correlation value tends to occur at the relative offset at which the ROI of one image is most similar to an area of equal size in the other image; the system therefore does not require precise co-registration of images. For computational efficiency, the CDS tool uses both wavelet representations and Fourier transforms [8]. (See the Appendix for further details.) An image whose correlation value falls below a specified threshold is deemed to contain clouds or haze that obscure the region of interest. Once the visibility classification is complete, the image metadata are inserted into the image metadata database. Image metadata include a unique ID, capture and ingest dates, satellite information, image information (row and column numbers, resolution level), GIF thumbnails, the region-of-interest descriptor data (x and y coordinates and extent of the ROI), and the correlation value.
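As a simplified rendering of Step 4 (a sketch of the idea, not the CDS tool itself; the function names and the threshold value are assumptions), the classifier slides the pattern over the candidate image, records the maximum normalized correlation, and compares it with the threshold:

    import numpy as np

    def max_normalized_correlation(image, pattern):
        # Brute-force scan over offsets; the CDS tool instead uses wavelets
        # and FFTs for efficiency (see the Appendix).
        ph, pw = pattern.shape
        p = (pattern - pattern.mean()) / (pattern.std() + 1e-9)
        best = -1.0
        for i in range(image.shape[0] - ph + 1):
            for j in range(image.shape[1] - pw + 1):
                w = image[i:i + ph, j:j + pw]
                c = np.mean((w - w.mean()) / (w.std() + 1e-9) * p)
                best = max(best, c)
        return best

    def classify_roi(image, pattern, threshold=0.8):      # threshold is illustrative
        score = max_normalized_correlation(image, pattern)
        return ("visible" if score >= threshold else "obscured"), score

The returned label and score are what get recorded alongside the rest of the image metadata.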
3 Experimental Design
In this section, we describe the experimental design. The goal of the experiment was to determine a set of pattern images that provide the best match for good images. For our region of interest, we defined a good image as one that is cloud free over Long Island (Figure 1); a bad image is one in which Long Island is partially or completely occluded (Figures 2 and 3).
3.1 Experimental Dataset
The data set was drawn from daily NOAA 12 and NOAA 14 satellite passes over the Northeastern US between August and December 1999. Over this time frame, major seasonal changes took place that led to rather large variations in the imagery, which added to the challenge of recognizing a particular region by comparing it to a few example templates. In some cases, a satellite pass produced unusable data because of an oblique satellite trajectory, lack of nighttime visibility, or data-transmission interference. As a general rule, a full pass results in a file between 20 and 60 megabytes in size; an oblique pass may capture only a small region and produce a much smaller file. We discarded files smaller than 20 MB. Manual inspection was then used to eliminate data sets that did not contain our region of interest. The data set that survived these first two steps has 236 images. A manual classification step was then performed to flag each image as good or bad according to the visibility of the Long Island ROI. Three undergraduate students were instructed to inspect each image and describe our ROI as either visible or obscured. Each spent about 30 minutes looking through and classifying the images. Of the 236 images, the students agreed on the classification in 145
cases. For the remaining 91 cases, we classified by majority rule. The result of this human classification step revealed 99 images in which the ROI is visible (42%) and 137 in which it is not (58%). We used the human classification to drive an experiment that measured the average Precision and Recall of our automatic metadata-generation process. For the experiment, we created bitmapped images rendered as 256 levels of gray, using the band 3 (visible) AVHRR brightness values as recorded by the satellite. The NEC tool used the low-pass Haar wavelet in the horizontal and vertical directions for its correlations; resolution in this representation is 1/4 of the original resolution.
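The labeling step is simple enough to state in code. A minimal sketch, assuming each image's three human votes are stored as a list of strings:

    from collections import Counter

    def majority_label(votes):
        # e.g. ['visible', 'visible', 'obscured'] -> 'visible'
        return Counter(votes).most_common(1)[0][0]

    # Illustrative check against the counts reported above (data layout assumed):
    # labels = [majority_label(v) for v in all_votes]
    # labels.count('visible')  -> 99 of 236 images (42%)
    # labels.count('obscured') -> 137 of 236 images (58%)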
3.2 N × N experiment (N = 236)
In the NEC tool, a set of images is compared with a specific pattern, typically extracted from a similar image and much smaller than a typical full image in the database. The tool allows the user to create a pattern from an image in the database. Candidate images that match the pattern are those that exhibit a high correlation with the pattern. In our experiment, we extracted a pattern from each of the 236 images and correlated each such pattern with all 236 images in the database. One expects a pattern whose ROI is obscured by clouds to have low correlations with all images except its source image, because clouds tend to have unique visual characteristics. Patterns that correlate highly with many images are likely to have highly visible ROIs, and the images with which they correlate well are also likely to have visible ROIs. The result of this phase was an N × N matrix whose (i, j)th entry is the correlation of pattern i with image j. The correlation coefficients along the diagonal of the matrix are near
unity, but not exactly equal to unity, because the wavelet transform of the pattern for image i may have a 0 origin that is offset from the 0 origin of the wavelet of the full image. Nor is the correlation matrix symmetric: a pattern from image i is smaller than the full image, so comparing it to image j at many different positions involves different pairs of pixels than comparing the pattern from image j to different positions within image i.

We next partitioned the images into a training set and a testing set. The training set was used to choose the set of pattern images that characterized the region of interest. We then applied those patterns to the images in the testing set to evaluate the search strategy. Patterns were chosen by means of a greedy strategy: the first pattern chosen matched the most images in which the ROI was deemed visible; each subsequent pattern was the one that matched the greatest number of visible-ROI images not yet matched by the previously selected pattern(s). In pseudo-language the selection process is:

    Algorithm [Select Patterns]
    Input:  SetOfImages
    Output: SetOfPatterns
    begin
        SetOfPatterns = ∅
        loop until no additional images with visible ROI can be matched
            choose the pattern image that matches the most images with visible ROI
                not already matched by the patterns in SetOfPatterns
            add this pattern image to SetOfPatterns
        end loop
    end
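For concreteness, a runnable rendering of the greedy selection in Python (our sketch; the threshold-based match predicate and the data layout are assumptions). It builds in the early-termination rule discussed next:

    def select_patterns(corr, visible, threshold):
        # corr[i][j]: correlation of pattern i with image j (the N x N matrix).
        # visible:    indices of training images whose ROI is visible.
        n = len(corr)
        matches = {i: {j for j in visible if corr[i][j] >= threshold}
                   for i in range(n)}
        patterns, covered = [], set()
        while True:
            best = max(matches, key=lambda i: len(matches[i] - covered))
            if len(matches[best] - covered) <= 1:   # newest pattern adds only itself
                break
            patterns.append(best)
            covered |= matches[best]
        return patterns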
In practice, the loop can be terminated when the number of additional matches produced by the newest selection drops to unity. The sole match is likely to be due to the uniqueness of the image, which is often caused by cloud cover over the ROI. We calculated the Recall and
Precision metrics for each pattern over a series of correlation-coefficient thresholds, and we did this for pattern sets ranging from a single pattern up to three patterns. Experimental data in the next section show that two patterns were sufficient to cover our ROI. We also examined the effects of different splits of the data into training and testing sets. Because random sampling occasionally produces unrepresentative answers, we validated our results by repeating the experiment five times for each split and averaging the results. Table 1 summarizes the training and testing sets for the different splits, and also gives the number of images in common among the five experiments for each split. Within a given split and within a given experiment, the training and testing sets were disjoint.

Table 1: Summary of Training and Testing sets

             Training               Testing
    Split    Number   In Common     Number   In Common
    50/50     118         9          118         5
    60/40     142        26           94         4
    70/30     165        32           71         0
    80/20     189        84           47         0
    90/10     212       141           24         0
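A sketch of the evaluation loop that produced the operating points on the curves in the next section (names are assumptions; scores holds each test image's best correlation against the selected pattern set, and truth holds the human majority labels):

    def precision_recall_points(scores, truth, thresholds):
        # truth[j] is True where the human majority labeled image j 'visible'.
        points = []
        for t in thresholds:
            retrieved = [s >= t for s in scores]
            tp = sum(r and g for r, g in zip(retrieved, truth))
            precision = tp / max(1, sum(retrieved))
            recall = tp / max(1, sum(truth))
            points.append((t, precision, recall))
        return points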
4 Experimental Results
Figures 1 to 3 show, respectively, examples with Long Island cloud free, partially obscured, and fully obscured. Figures 4 and 5 show two patterns discovered by the algorithm above during a training cycle. Note that both patterns show a cloud free Long Island, but Long Island looks quite different in the two images. Apparently, this diversity is wide enough to capture a great majority of the images in which Long Island is not obscured by clouds. Figures 6 through 10 show the receiver operating characteristic (ROC) curves for one to
three patterns similar to the patterns in Figures 4 and 5. In each pair, the (a) figure plots the ROC curve for the training set of images, and the (b) figure plots the ROC curve for trials conducted on the testing set of images with the patterns obtained from the training set. Each point on a curve corresponds to a specific value of the detection threshold that determines whether an image matches a pattern. The 80/20 split is used most frequently in the database literature and appears in Figure 9. Note that the training and trial data shown there both have an operating point on the ROC curve at which average Recall and Precision are 75 to 80 percent or above. The figure also shows that the third pattern produces negligible change in the ROC curve, so two patterns are adequate. Because the training and test sets are disjoint, the results for these test sets tend to be a lower bound on Recall and Precision over the collection of equal-size test sets that overlap the training set.

All plots in Figures 6 through 10 indicate that two patterns suffice for these data. Recall and Precision are remarkably high in all cases, given that only two patterns cover five months of image data. Over a full calendar year the variations are likely to be larger than for the five-month test period, so more patterns may be necessary to obtain high Precision and Recall. Nevertheless, the data suggest that relatively few patterns will suffice, because the normalized correlation coefficient can absorb substantial variation. Note also that the ROC curves are not sensitive to the relative sizes of the training and test sets for the data presented.

The ROI in this study apparently tends to appear in one of two ways when it is visible, producing two clusters of images within the database. The training process produces two patterns, one from each cluster, when the training set is large enough to contain multiple examples of each cluster. For this ROI, the cluster sizes were about 80 and 20 images, respectively, or about 34% and 8% of the full database. To observe
several samples of the smaller set requires only about 1/3 of the data. Hence, a small split, such as 33/67, has a good chance of yielding one pattern from each cluster. All of the splits we studied were large enough to contain many representatives of each cluster. Other ROIs may behave differently depending on the range of variation in the ROI over a period of time. Perhaps a single pattern will suffice in some cases, and several patterns will be necessary in other cases. Nevertheless, the fact that two patterns suffice for the Long Island ROI in our data set indicates that the retrieval scheme is efficient and useful.
5 Visualization tools
Several tools were developed to aid in the visualization and interpretation of the experimental results. A particularly useful tool displays the correlation matrix of the 236 test images as an array of clickable cells; when the user clicks on a cell, the corresponding image pair pops up into the foreground (Figure 11). Similar tools were developed to select the candidate patterns, to compute the ROC curves from local summary data, and to show the results by producing clickable URLs in the output summary. This demonstrates the feasibility of producing a downloadable index for distributing access to the repository. We believe that an interested user could download the index file and index-search application code (in this case a collection of HTML links) relatively quickly, because the total size may be on the order of a few megabytes. The user could then perform a search locally and compute URL results; clicking on a URL fetches the corresponding image from the central repository to the local machine for display. The advantage of this search structure is that it avoids creating a computational bottleneck at a
centralized image server.
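A minimal sketch of how such a downloadable index could be generated (the file layout and URL scheme are assumptions, not the tool we built): each cell of the correlation matrix becomes a link into the central repository, so the index itself stays small while the pixels stay on the server.

    def write_matrix_page(corr, image_urls, out_path="matrix.html"):
        # corr: N x N correlation matrix; image_urls: URL per archive image.
        rows = []
        for i, row in enumerate(corr):
            cells = "".join(
                f'<td title="pattern {i}, image {j}">'
                f'<a href="{image_urls[j]}">{c:.2f}</a></td>'
                for j, c in enumerate(row)
            )
            rows.append(f"<tr>{cells}</tr>")
        with open(out_path, "w") as f:
            f.write("<html><body><table border='1'>"
                    + "".join(rows) + "</table></body></html>")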
6 Conclusions and Future Work
We have combined off-the-shelf software and custom software co-developed with NASA and with NEC Research Institute to create a robust system that can accurately and automatically classify images according to specific regions of interest. The visibility of the regions of interest can be incorporated into the retrieval metadata so that users can easily obtain images in which specific regions are visible, with a high degree of certainty. This enables users to do retrospective studies over long periods of time by selecting visible images of the same ROI at chosen points in the time span, and it eliminates the tedium of looking at many images in order to find the few of interest.

The Recall and Precision depend on the manual classification of the images into two sets: those in which the ROI is visible and those in which it is not. Unfortunately, when images are partly cloudy, the decision is not a sharp one; our three human classifiers did not concur unanimously on roughly 40 percent of the images, a large fraction of the database. As part of our future effort, we plan to analyze images for clouds and mark pixels according to whether or not they are obscured. The tool from NEC Research Institute can treat cloud pixels as invalid, and can exclude invalid pixels from the calculation of the correlation coefficient. With cloud detection in place, we can in principle produce higher Recall and Precision, because many of the partly obscured images of an ROI will then become usable.

We expect to extend the work reported here by developing a web interface within which
users can draw their own region of interest and have the system automatically process the images to determine an optimal set of pattern images. The system would then use that set of pattern images to identify a set of unobscured images from a database.
7 Acknowledgements
This work was supported in part by a grant for the HMDC/CIMIC-Rutgers NASA Regional Application Center. The authors are deeply indebted to Robert Wolpov and Stephen Martucci of NEC for their assistance in enhancing the image database software during this research project and to Gene Shaffer of NASA Code 935 for assistance with the Rapid Applications Tool. The authors also thank the anonymous referees for their helpful suggestions.
Appendix

This appendix contains a brief review of the Fourier techniques used within the search algorithm. The criterion used to compare images is normalized correlation, and Fourier techniques greatly reduce the cost of its computation at different relative shifts. Wavelet representations, or other techniques that create faithful low-resolution representations, can be used in conjunction with the Fourier techniques to achieve greater speedups. (See [7, 8] for more details.) The derivations assume one-dimensional images for simplicity, but extend naturally to two dimensions.

Given an N-vector image x = (x_0, x_1, ..., x_{N-1}) and an M-vector pattern y = (y_0, y_1, ..., y_{M-1}), M < N, we wish to find which M-component subvector of x is most like y. The normalized correlation coefficient C(x, y) is a vector of length N − M + 1 whose ith component is given by

$$C(x,y)_i = \frac{\sum_{k=0}^{M-1} x_{k+i}\,y_k - \frac{1}{M}\Big(\sum_{k=0}^{M-1} x_{k+i}\Big)\Big(\sum_{k=0}^{M-1} y_k\Big)}{\sqrt{\Big[\sum_{k=0}^{M-1} x_{k+i}^2 - \frac{1}{M}\Big(\sum_{k=0}^{M-1} x_{k+i}\Big)^{2}\Big]\Big[\sum_{k=0}^{M-1} y_k^2 - \frac{1}{M}\Big(\sum_{k=0}^{M-1} y_k\Big)^{2}\Big]}} \tag{1}$$
The index i at which this is maximum is deemed the point of best match. Normalized correlation is independent of uniform illumination changes in which each intensity value v in image x is mapped to an intensity αv + β in image y for some constants α and β.

Fourier-domain operations can speed up the calculation of Eq. (1) significantly in the following way. Define the vector correlation $x \otimes y$ to be the vector whose ith component is

$$(x \otimes y)_i = \sum_{k=0}^{M-1} x_{k+i}\,y_k \tag{2}$$

for 0 ≤ i ≤ N − M. Extend y to length N by appending N − M zeros. Let $\mathcal{F}(x)$ denote the discrete Fourier transform (DFT) of x. Then it follows from the Convolution Theorem [10] that

$$(x \otimes y)_i = \mathcal{F}^{-1}\big(\mathcal{F}(x)\; .\!\ast\; \hat{\mathcal{F}}(y)\big)_i \tag{3}$$

where $\hat{\mathcal{F}}$ is the complex conjugate of $\mathcal{F}$ and ".*" denotes component-by-component multiplication of equal-length vectors.
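A numpy sketch of Eqs. (2) and (3) (our illustration; numpy's FFT plays the role of $\mathcal{F}$):

    import numpy as np

    def vector_correlation(x, y):
        # (x ⊗ y)_i = sum_k x_{k+i} y_k, computed via Eq. (3).
        N, M = len(x), len(y)
        y_ext = np.concatenate([y, np.zeros(N - M)])      # zero-pad y to length N
        corr = np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(y_ext))).real
        return corr[:N - M + 1]                           # non-wrapped offsets only

    # Sanity check against the pixel-domain definition, Eq. (2):
    # x, y = np.random.rand(16), np.random.rand(5)
    # assert np.allclose(vector_correlation(x, y),
    #                    [np.dot(x[i:i+5], y) for i in range(12)])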
where Fˆ is the complex conjugate of F and “.*” denotes component-by-component multiplication of equal-length vectors. The two forward and one inverse Fourier transform in Eq. (2) require O(N log N ) operations, and the pointwise multiplication of transforms requires only O(N ) operations. Pixel-domain evaluation of x
y requires O(N M ) operations, which is one
or two order magnitudes higher for typical image-search applications. To cast Eq. (1) into a form for Fourier domain processing, we introduce a mask vector m of length N whose first M components are 1 and whose remaining components are 0. This mask indicates which components of the N -vector extension of y are valid, and which are not. Let x(2) denote the vector whose ith component is x2i . Rewriting Eq. (1) in terms of vector 17
correlations yields C(x, y)i =
(x
x(2)
m i−
y)i −
1 M
(x
1 M
(x
m)2i
m)i
M −1 k=0 yk
M −1 2 k=0 yk
−
1 M
2 M −1 y k k=0
.
(4)
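Assembling Eq. (4) from three vector correlations is then only a few lines (a sketch reusing vector_correlation above; the pairing of transforms described next is omitted here):

    def masked_normalized_correlation(x, y):
        x = np.asarray(x, float); y = np.asarray(y, float)
        M = len(y)
        ones = np.ones(M)                     # the M valid components of the mask m
        xy  = vector_correlation(x, y)        # (x ⊗ y)_i
        xm  = vector_correlation(x, ones)     # sliding sums of x over each window
        x2m = vector_correlation(x * x, ones) # sliding sums of x^2
        sy, sy2 = y.sum(), (y * y).sum()      # pattern constants, independent of i
        num = xy - xm * sy / M
        den = np.sqrt((x2m - xm ** 2 / M) * (sy2 - sy ** 2 / M))
        return num / den                      # NaN where a window is constant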
The sums that involve components of y are independent of i and can be treated as constants determined by the pattern. This equation requires direct transforms of x, $x^{(2)}$, y, and m. It also requires inverse transforms of the vectors that produce $x \otimes y$, $x \otimes m$, and $x^{(2)} \otimes m$. It is well known that one can do two real Fourier transforms for the cost of one complex transform, because the discrete Fourier transform of a real vector is conjugate symmetric [10]. That is, for N a power of 2, $\mathcal{F}(\omega) = \hat{\mathcal{F}}(N - \omega)$ for 1 ≤ ω < N/2, and $\mathcal{F}(0)$ and $\mathcal{F}(N/2)$ are real. To transform x and y together, take the transform of x + jy. Then

$$\mathcal{F}(x)_\omega = \big(\mathcal{F}(x+jy)_\omega + \hat{\mathcal{F}}(x+jy)_{N-\omega}\big)/2,$$
$$\mathcal{F}(y)_\omega = \big(\mathcal{F}(x+jy)_\omega - \hat{\mathcal{F}}(x+jy)_{N-\omega}\big)/2j. \tag{5}$$
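The packing trick of Eq. (5) in numpy form (a sketch; production code would fold this into the correlation routine):

    def two_real_ffts(x, y):
        # One complex FFT of x + jy yields both real transforms, per Eq. (5).
        N = len(x)
        F = np.fft.fft(np.asarray(x) + 1j * np.asarray(y))
        F_rev = np.conj(F[(-np.arange(N)) % N])   # conjugate of F at index N - w
        return (F + F_rev) / 2, (F - F_rev) / 2j  # F(x), F(y)

    # assert np.allclose(two_real_ffts(x, y)[0], np.fft.fft(x))   # sanity check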
In the inverse direction, to invert $\mathcal{F}(x)$ and $\mathcal{F}(y)$ where both x and y are real, take the Fourier inverse of $\mathcal{F}(x) + j\mathcal{F}(y)$; the real part of the result is x and the imaginary part is y. This reduces the dominant cost of the computation of Eq. (4) to two direct and two inverse Fourier transforms.

When cloud masks are available for both the pattern y and the test image x, Eq. (1) can be modified easily to eliminate the terms that depend on occluded pixels. Set the occluded pixels of y to 0, and set the corresponding components of its cloud mask m to 0. Let h be the cloud mask for x, and set all of the occluded pixels of x and of its mask h to zero. Then the vector-correlation form of Eq. (1) is

$$C(x,y)_i = \frac{(x \otimes y)_i - \frac{1}{(h \otimes m)_i}\,(x \otimes m)_i\,(h \otimes y)_i}{\sqrt{\Big[(x^{(2)} \otimes m)_i - \frac{1}{(h \otimes m)_i}\,(x \otimes m)_i^2\Big]\Big[(h \otimes y^{(2)})_i - \frac{1}{(h \otimes m)_i}\,(h \otimes y)_i^2\Big]}} \tag{6}$$
By doing pairs of real transforms together, the equation requires three forward and three inverse complex Fourier transforms.
References

[1] K. Seidel, M. Schröder, H. Rehrauer, G. Schwarz, and M. Datcu, "Query by image content from remote sensing archives," in IGARSS '98, July 1998, pp. 393–396.

[2] M. Datcu, K. Seidel, and M. Walessa, "Spatial information retrieval from remote-sensing images—Part I: Information theoretical perspective," IEEE Transactions on Geoscience and Remote Sensing, vol. 36, no. 5, pp. 1431–1445, September 1998.

[3] M. Schröder, H. Rehrauer, K. Seidel, and M. Datcu, "Spatial information retrieval from remote-sensing images—Part II: Gibbs-Markov random fields," IEEE Transactions on Geoscience and Remote Sensing, vol. 36, no. 5, pp. 1446–1455, September 1998.

[4] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, "Query by image and video content: The QBIC system," IEEE Computer, vol. 28, no. 9, pp. 23–32, September 1995.

[5] J. R. Smith and S.-F. Chang, "VisualSEEk: A fully automated content-based image query system," in Proceedings of ACM Multimedia 1996, November 1996, p. 87.
[6] R. Jain, "Content-based multimedia information management," in Proceedings of the 14th International Conference on Data Engineering, February 1998, pp. 252–253.

[7] H. S. Stone, "Progressive wavelet correlation using Fourier methods," IEEE Transactions on Signal Processing, vol. 47, no. 1, pp. 97–107, January 1999.

[8] H. S. Stone and T. Shamoon, "The use of image content to control image retrieval and image processing," International Journal on Digital Libraries, vol. 1, no. 4, pp. 329–343, December 1997.

[9] P. M. Teillet, N. El Saleous, M. C. Hansen, J. C. Eidenshink, C. O. Justice, and J. R. G. Townshend, "An evaluation of the global 1-km AVHRR land dataset," International Journal of Remote Sensing, vol. 21, no. 10, pp. 1987–2021, 2000.

[10] R. E. Blahut, Fast Algorithms for Digital Signal Processing, Addison-Wesley, Reading, MA, 1985.
Figure 1: A typical AVHRR image of the North Atlantic in which Long Island is completely visible.
Figure 2: An AVHRR image of the North Atlantic in which Long Island is partly obscured by clouds.
Figure 3: An AVHRR image of the North Atlantic in which Long Island is completely obscured by clouds.
Figure 4: One of two patterns used for retrieval.
Figure 5: The second of two patterns used for retrieval.
[ROC plots omitted: Precision (percent) versus Recall (percent) for one, two, and three patterns.]
Figure 6: The receiver-operating characteristic for a 50/50 split of the database, average of five trials. (a) Training data performance (118 images). (b) Trial set performance (118 images).
[ROC plots omitted: Precision (percent) versus Recall (percent) for one, two, and three patterns.]
Figure 7: The receiver-operating characteristic for a 60/40 split of the database, average of five trials. (a) Training data performance (142 images). (b) Trial set performance (94 images).
[ROC plots omitted: Precision (percent) versus Recall (percent) for one, two, and three patterns.]
Figure 8: The receiver-operating characteristic for a 70/30 split of the database, average of five trials. (a) Training data performance (165 images). (b) Trial set performance (71 images).
[ROC plots omitted: Precision (percent) versus Recall (percent) for one, two, and three patterns.]
Figure 9: The receiver-operating characteristic for an 80/20 split of the database, average of five trials. (a) Training data performance (189 images). (b) Trial set performance (47 images).
[ROC plots omitted: Precision (percent) versus Recall (percent) for one, two, and three patterns.]
Figure 10: The receiver-operating characteristic for a 90/10 split of the database, average of five trials. (a) Training data performance (212 images). (b) Trial set performance (24 images).
Figure 11: Visualization tool showing a pattern and an example image.