IMAGE CLASSIFICATION USING DATA COMPRESSION BASED TECHNIQUES
Daniele Cerra 1 and Mihai Datcu 1
1 German Aerospace Center DLR, Remote Sensing Technology Institute IMF, 82234 Wessling, Germany
2 Télécom Paris, 46 rue Barrault, 75634 Paris, France

1. INTRODUCTION
Earth Observation applications can seldom be used across different types of data, since they depend strongly on the characteristics of the sensor used (i.e. the spatial, spectral and radiometric resolutions of the data), on the models adopted and on a priori assumptions. We propose a parameter-free, model-independent methodology based on data compression to perform image classification and indexing: data compression, whose major advantage is its universal applicability, can be a powerful and reliable instrument for discovering similarities between heterogeneous kinds of data. Image classification can be performed in the compression step, by retrieving patterns within the data and matching them with the characteristics of the classes of interest. While previous work by other authors concentrated on string coding with representative dictionaries, we propose to inject into the workflow a well-known similarity metric based on data compression: it is computed between alternative string pairs representing the current data and the existing classes, from which we select the best combinations. Without increasing the required run time, this approach yields a more precise estimate of the characteristics of each image segment, because the similarity indices take into account the individual complexities of each class and dataset and are normalized accordingly. We tested our approach on a number of optical satellite images: in all cases, the classification results were reliable for typical remote sensing data.

2. PROPOSED METHODOLOGY
2.1. Pattern Recognition using Data Compression
Watanabe et al. introduce in [1] a methodology for the classification of general data, which they name Pattern Recognition using Data Compression (PRDC). In the case of satellite imagery, image classification is performed by encoding the local gradients of an image and arranging them as edges of an undirected graph, in which each pixel of the image is a node.
By removing steep gradients one preserves homogeneous areas that can be segmented [2] while keeping part of the spatial features. For classification, the segments can then be encoded into text strings, which are compressed with dictionaries extracted from the various classes of interest.

2.2. Normalized Compression Distance
Vitányi et al. [3] propose a general similarity metric based on data compression, regarded as a way of estimating approximate complexities, which they call Normalized Compression Distance (NCD). The resulting similarity indices are stored in a distance matrix, which can be used for classification by applying hierarchical clustering to it.

2.3. The Proposed Approach
In [4] it is shown that the PRDC methodology and the NCD metric rely on the same principles to estimate the similarity indices, with NCD being more robust and accurate. We propose to perform image classification by injecting the NCD similarity metric into the PRDC workflow (the resulting processing chain is sketched in Fig. 1). This results in a new, more robust methodology: while PRDC performs image classification using simple compression ratios, NCD takes into account the individual complexities of each dataset and normalizes the similarity index accordingly.
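
For reference, the NCD is usually written as below. The notation is an assumption taken from the cited work of Li et al. rather than from this text: C(x) is the compressed size of x under a chosen compressor, and xy is the concatenation of x and y.

```latex
\[
\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}}
\]
```

The terms C(x) and C(y) are the per-dataset complexities by which the index is normalized. Values near 0 indicate similar objects and values near 1 dissimilar ones; with a real compressor the value can slightly exceed 1.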
[Figure: Region 1, Region 2, ..., Region n are matched against Class 1, Class 2, ..., Class n; the minimum NCD is taken for each region/class.]
Fig. 1. Workflow for the proposed methodology. After image segmentation, each region of the image is matched with representative datasets for each class using the similarity metric NCD. Each region is considered to belong to the class which minimizes the NCD.
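
The matching step in Fig. 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: zlib stands in for the compressor (the text does not name one), the NCD follows the standard definition of Li et al., and `toy_string` is a hypothetical helper that fabricates byte strings in place of real encoded image segments.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Approximate the complexity C(data) by the zlib-compressed length."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify_regions(regions, classes):
    """Assign each region to the class whose reference string minimizes NCD."""
    return {
        name: min(classes, key=lambda label: ncd(data, classes[label]))
        for name, data in regions.items()
    }


def toy_string(seed: int, length: int, order_seed: int) -> bytes:
    """Hypothetical stand-in for an encoded segment: eight class-specific
    32-byte motifs (fixed by `seed`) joined in an order fixed by `order_seed`."""
    motif_rng = random.Random(seed)
    motifs = [bytes(motif_rng.getrandbits(8) for _ in range(32)) for _ in range(8)]
    order_rng = random.Random(order_seed)
    return b"".join(order_rng.choice(motifs) for _ in range(length))


# Toy data: the class names and seeds are illustrative assumptions.
classes = {"water": toy_string(1, 60, 10), "urban": toy_string(2, 60, 20)}
regions = {"region_1": toy_string(2, 30, 30), "region_2": toy_string(1, 30, 40)}

labels = classify_regions(regions, classes)
for name, label in labels.items():
    print(name, "->", label)
```

In this toy setup each region shares its motifs with exactly one class, so concatenating the two compresses well and that class yields the smallest NCD. With real imagery the outcome depends on how the segments are encoded into strings and on the compressor used.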
This methodology can be used to perform image classification for all kinds of optical satellite imagery, captured with any sensor at any spatial, spectral and radiometric resolution.

3. SELECTED REFERENCES
[1] T. Watanabe, K. Sugawara, and H. Sugihara, “A New Pattern Representation Scheme Using Data Compression”, IEEE Trans Pattern Analysis Machine Intelligence, 24:579-590, 2002.
[2] J. R. Lersch, A. E. Iverson, B. N. Webb, and K. F. West, “Segmentation of Multiband Imagery using Minimum Spanning Trees”, Proc. SPIE, vol. 2758, pp. 10-18, 1996.
[3] M. Li, X. Chen, X. Li, B. Ma and P.M.B. Vitányi, “The Similarity Metric”, IEEE Trans Inf Theory, 50(12):3250-3264, 2004.
[4] D. Cerra and M. Datcu, “Model Conditioned Data Compression Based Similarity Measure”, to be published in Proc. DCC, 2008.