A Self-organizing Principle for Segmenting and Super-resolving ISAR Images Frank M. Candocia
[email protected]
Jose C. Principe
[email protected]
Computational NeuroEngineering Laboratory University of Florida, Gainesville, FL. 32611 http://www.cnel.ufl.edu
Abstract We present and illustrate the use of a bottleneck system for the segmentation and super-resolution of ISAR targets. The system is composed of three basic subsystems: a compressing transformation, a bottleneck processor, and a decompressing transformation. We describe each subsystem and discuss the processing responsible for segmentation and super-resolution within this framework. Results using this network are assessed and issues regarding performance are introduced.
1. Introduction Feature extraction is critical in many signal and image processing applications, but our ability to automatically extract features from data is very limited. In preselecting features, we rely too much and too often on our a priori knowledge of the problem. This methodology can be problematic when such a priori knowledge is scarce, and it also hinders our ability to quantify the quality of the features chosen. In our opinion, feature extraction should be based on self-organizing methods, because the signal's samples are the only available information source. Such a methodology is encountered in prediction, where the input signal is cleverly utilized as a desired response. Prediction is a way of self-organizing a system for time signals, but it is much more difficult to apply to images. The other known principle of self-organization with an implicit desired response is auto-association with a bottleneck layer [Bourlard and Kamp, 1988]. This can be thought of as the equivalent of prediction for images and also gives us a model for the intrinsic structure of the data.
In essence, we seek to model image data for a given class of imagery that is "independent" of its scale. Such an approach has been proposed in [Candocia and Principe, 1997]. The idea of bottleneck processing is not new. This type of processing has had much success in the areas of image compression [Jain, 1989] and subspace pattern recognition [Oja, 1983]. In image compression, the saving of a few transform coefficients to represent data in a compressed form can be formulated as a bottleneck process. In subspace pattern recognition, the reduction in dimensionality of a signal is an important practical step toward obtaining discriminant functions. This type of processing, though, is not restricted to auto-association. It has seen use in hetero-association via non-symmetric PCA [Kung, 1993]. The work presented here makes use of such processing in a more general, non-traditional fashion.
2. The Bottleneck System The bottleneck system (BNS) as an auto-associator is composed of three basic components: (1) a compressing transformation, which is responsible for producing a reduced representation of the input signal; (2) a bottleneck processor, which further processes the compressed signal space; and (3) a decompressing transformation, which is responsible for reconstructing the input signal. This is illustrated in fig. 1. The input to the BNS is given by the vector x and the reconstructed output is denoted x̂. The first and last blocks of this processing are the projections that constitute the forward and inverse transforms of a signal, respectively. These transforms need not be linear; the only constraint is that the dimensionality of the input space be reduced. The compressed input space is denoted y and feeds into the bottleneck processor (BNP). The BNP generates information z regarding y which can aid in the reconstruction of x; it can also receive additional information q as input to aid in the generation of z. The information z and compressed input space y then feed into the decompressor.
Figure 1. Block diagram of the BNS. Commonly used compression techniques employ linear transforms in the first and last blocks of the BNS, and the BNP is simply an identity transformer f(y) = y with no q or z. In this case a vector x ∈ ℜ^N is transformed to a vector y ∈ ℜ^M via y = Wx, where M < N. The approximate reconstruction of x is given by x̂ = W^H y, where H denotes the Hermitian of W. In PCA, W is a matrix whose rows are the M largest-eigenvalued eigenvectors of the covariance matrix of x, and y is the vector of principal components of x. In transforms such as the DCT or DFT, the rows of W constitute sinusoidal basis functions (real and complex, respectively) onto which x is projected, and y holds the corresponding transform coefficients. The basis functions retained are those that yield the largest transform coefficients.
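As a concrete illustration, the PCA instance of this linear bottleneck can be sketched as follows. The toy data and dimensions are ours, not from the paper, and W is real here, so the Hermitian reduces to a transpose:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 16, 4                        # input dimension N, bottleneck size M < N

# Toy data with correlated components, so a few eigenvectors capture most energy
X = rng.standard_normal((500, N)) @ rng.standard_normal((N, N))

C = np.cov(X, rowvar=False)         # covariance matrix of x
vals, vecs = np.linalg.eigh(C)      # eigenvalues in ascending order
W = vecs[:, -M:].T                  # rows: the M largest-eigenvalued eigenvectors

x = X[0]
y = W @ x                           # compression: y = W x, y in R^M
x_hat = W.T @ y                     # reconstruction: x_hat = W^H y (W real here)
```

Since the rows of W are orthonormal, compressing the reconstruction recovers y exactly, while x itself is recovered only approximately.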
3. Pre-processing the Input This paper addresses the ability of a BNS to (1) segment target vs. background and (2) super-resolve a 1m × 1m resolution ISAR data set to 1ft × 1ft resolution. These two problems are very different: the first is a clustering (classification) problem and the second is a regression problem. To aid in tackling these problems with the BNS, it is important to consider what pre-processing of the input, if any, should be performed. The pre-processing step should reflect any characteristic of the data that alleviates the complexity associated with the problem. Note that any reference to an ISAR image here refers to the PWF-transformed ISAR target; an ISAR image is thus real valued. The 1ft × 1ft resolution training and test images are illustrated in fig. 2. It is evident that segmenting target versus background in an ISAR image requires information about the brightness of a pixel and, eventually, texture. Here we work simply with brightness. This brightness is directly proportional to the amount of backscatter received by the radar, which is usually large for metallic objects relative to non-metallic ones. As such, the segmentation problem is local in nature and should make use of a local brightness measure. This is done by transforming local neighborhoods of our ISAR images into spherical coordinates. More specifically, each H × H neighborhood of our ISAR images is regarded as a vector (in Cartesian coordinates) in an H²-dimensional vector space. Our set of vectors, or ISAR image neighborhoods, is then transformed to multi-dimensional spherical coordinates. One of the coordinates in this representation is the length, or norm, of the vector; this quantity is descriptive of the brightness of an ISAR image neighborhood. This is the pre-processing performed for the segmentation problem.
Figure 2. High resolution ISAR images. (left 8) training (right 8) testing. The preprocessing for super-resolving an ISAR image is different from that just described. The backscatter at various points on targets can be quite similar - even across a set of different targets situated at varying aspect angles relative to the radar. This also suggests a local approach to the super-resolution problem. What is not clear at present is which set of descriptors (and for that matter, pre-processor) retains the most information about a class of images across resolutions. In our pre-processing, we have decided to normalize each vector (ISAR image neighborhood) to unit length. The effect of this operation will be discussed in the next section.
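The two pre-processing steps above, the norm coordinate used for segmentation and the unit-length normalization used for super-resolution, can be sketched as follows. The non-overlapping tiling and the toy image are our assumptions:

```python
import numpy as np

H = 5
rng = np.random.default_rng(1)
img = rng.random((15, 15))              # toy "ISAR" image

def neighborhoods(img, H):
    """Column-stack each non-overlapping H x H neighborhood into a vector y_p."""
    rows, cols = img.shape
    vecs = []
    for r in range(0, rows - H + 1, H):
        for c in range(0, cols - H + 1, H):
            vecs.append(img[r:r+H, c:c+H].flatten(order='F'))
    return np.array(vecs)

Y = neighborhoods(img, H)               # one row per neighborhood
norms = np.linalg.norm(Y, axis=1)       # spherical "length" coordinate: local brightness
Y_unit = Y / norms[:, None]             # unit-length vectors for super-resolution
```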
4. Defining the Blocks of the BNS The framework for the processing of ISAR images is given by the BNS. Here we motivate and define the processes contained within each of the blocks pictured in fig. 1.
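One operation used repeatedly in what follows is decimation by a factor of 3 × 3. A minimal sketch is given below; the paper does not specify its lowpass filter, so the 3 × 3 box average here is an assumption:

```python
import numpy as np

def decimate3(img):
    """3 x 3 decimation: lowpass filtering (here a simple 3 x 3 block average --
    an assumption, as the paper's exact filter is unspecified) followed by
    subsampling by 3 in each direction."""
    rows, cols = img.shape
    rows -= rows % 3
    cols -= cols % 3                     # crop to a multiple of 3
    blocks = img[:rows, :cols].reshape(rows // 3, 3, cols // 3, 3)
    return blocks.mean(axis=(1, 3))      # one output pixel per 3 x 3 block

hi = np.arange(36.0).reshape(6, 6)       # toy high resolution image
lo = decimate3(hi)                       # coarse 2 x 2 version
```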
4.1 The Compressing Transformation
The process of super-resolving ISAR/SAR information involves increasing its resolution, a process akin to interpolation in images. Here we synthesize the lower resolution 1m × 1m ISAR set to be super-resolved by decimating our original 1ft × 1ft ISAR images by a factor of 3 × 3. These images are illustrated in fig. 3. Note that we now have two versions of the same imagery at different resolutions with which to train a model; later, the model can be used on new low resolution data to enhance it.
Figure 3. Synthesized low resolution ISAR images. (left 8) training (right 8) testing. Decimation is the process of appropriate lowpass filtering followed by subsampling [Crochiere and Rabiner, 1981]. Notice that this is a non-invertible, non-linear transformation which yields a coarse representation of our input images. Neighborhoods are then extracted from the decimated images. The set of these neighborhoods is denoted y = {y_p}, where each y_p is a distinct H × H neighborhood from the compressed training images that has been converted to a vector by stacking the columns of the square neighborhood (or matrix) one on top of the other. These neighborhoods are samples of the compressed input space y alluded to in fig. 1. Our compressing transformation is thus a decimation by a factor of 3 × 3 followed by an extraction of neighborhoods from the resulting images. This compressing transformation also works well for the segmentation problem. It further reduces speckle in our coarser ISAR images due to the decimation (albeit at the expense of image detail); however, the detail is not significant to this segmentation problem.
4.2 The Bottleneck Processor
It is important to be able to extract features (in a self-organizing manner) from the compressed and pre-processed input space y. These features serve to establish the M most relevant descriptors of this space and are subsequently used to partition it. Here, the feature extraction is accomplished via vector quantization (VQ) of the neighborhoods y_p of the compressed and pre-processed training images in set y. A number of VQ algorithms exist, including k-means, Kohonen's self-organizing feature map [Kohonen, 1990] and the neural gas algorithm [Martinetz et al., 1993]. The codebook vectors, or quantization nodes, q_z, z = 1, …, M, that result from VQ are the intrinsic descriptors of y. We denote the set of quantization nodes by q, i.e. q = {q_1, …, q_M}, and each q_z ∈ ℜ^K where K = H². Our study makes use of the BNP illustrated in fig. 4. There are two separate inputs to this block: q and y, as previously discussed. Clustering neighborhoods y_p based on closest distance to each q_z results in a hard partitioning of y into regions that are most correlated. Specifically, the cluster C_z contains those neighborhoods y_p of y that are closest to q_z in Euclidean distance. This is given in eqn. (1).
C_z = { y_p : ||q_z − y_p||_2 < ||q_a − y_p||_2 }    (1)
where z = 1, …, M and a ≠ z. The single integer output z ∈ {1, …, M} of the BNP represents the cluster C_z to which a neighborhood y_p belongs.
Figure 4. The BNP implemented for this paper. The segmentation problem requires only M=2 quantization nodes: one node theoretically clusters neighborhoods corresponding to targets and the other clusters non-target neighborhoods (no shadows are considered).
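A minimal k-means stand-in for the VQ step (in practice the cited Kohonen map or neural gas would be used) that realizes the hard partitioning of eqn. (1) with M=2 nodes. The toy data, one dim "background" group and one bright "target" group, and the deterministic initialization are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
K, M = 25, 2                                  # K = H^2 = 25, M = 2 nodes

# Toy neighborhoods: 100 dim "background" patches, then 100 bright "target" patches
Y = np.vstack([rng.random((100, K)) * 0.1,
               rng.random((100, K)) * 0.1 + 1.0])

q = Y[[0, -1]].copy()       # init: one node from each toy group (deterministic sketch)
for _ in range(10):
    # eqn. (1): assign each y_p to the nearest node in Euclidean distance
    d = np.linalg.norm(Y[:, None, :] - q[None, :, :], axis=2)
    z = d.argmin(axis=1)
    # update each node to the mean of its cluster C_z (guard against empty clusters)
    q = np.array([Y[z == m].mean(axis=0) if np.any(z == m) else q[m]
                  for m in range(M)])
```

On this well-separated toy data, the two nodes settle onto the background and target groups, mirroring the target/non-target split described above.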
The super-resolution approach makes use of M=30 quantization nodes, i.e. the neighborhoods y_p are partitioned into 30 clusters, each of which will be super-resolved. Note that (the vector for) each y_p clustered is of unit length due to the pre-processing that was performed. This form of pre-processing yields scale-invariant neighborhoods: for two neighborhoods a and b, if a ≈ kb (k a scalar; a ≈ kb is our shorthand notation for ||a − kb||_2 < ε, where ε > 0 and small and a, b are vectors), the two neighborhoods have the same underlying reflectance properties, regardless of the illumination in the scene. This is an assumption made in homomorphic image processing [Dony and Haykin, 1995] which may not be valid for our ISAR images (as alluded to earlier). The question of which pre-processor to use for super-resolution needs further addressing.
4.3 The DeCompressing Transformation The decompressing transformation is not needed for the segmentation problem; the output z of the BNP describes the cluster a neighborhood corresponds to. Each neighborhood in the ISAR image is thus assigned to the target or non-target cluster. The decompressing transformation is obviously needed for the super-resolving of ISAR images. The neighborhoods y_p have been clustered into M=30 groups. Our decompressing transformation is composed of M=30 individual affine transformations {W_z, B_z}, each tailored to the specific information contained in cluster C_z, z = 1, …, M. Here z is essentially a "pointer", indicating which individual transformation to use. The reconstruction of a neighborhood x̂_p is accomplished by:
x̂_p = ||y_p||_2 · uvec(W_z y_p + B_z) ;  y_p ∈ C_z    (2)
where W_z and B_z are the weight matrix and bias vector associated with an affine transformation, uvec(·) undoes the vectorizing operation that was performed on the neighborhoods y_p in set y, and the 2-norm of y_p is used to restore the length of the vector, which was removed during pre-processing. Details concerning the individual transformations are discussed in [Candocia and Principe, 1997].
5. Results The results presented here utilized ISAR targets that were PWF transformed from the TABILS 24 data set. The resolution of this data set is 1ft × 1ft. We chose 8 targets for each of our training and test sets spanning 180° of aspect angles; the difference between target aspect angles in each set was 22.5°. The corresponding 1m × 1m low resolution training/test data was simulated through decimation of the high resolution training/test data as discussed earlier. Neighborhoods of 5 × 5 (H=5) were utilized in the extraction of features both for the segmentation and super-resolution examples. The features automatically found through clustering the low resolution neighborhoods for the purpose of target segmentation are illustrated in fig. 5.
Figure 5. Features extracted for target segmentation. (left) target, (right) non-target. These features have been scaled to visually enhance the structure associated with each feature; the numbers in parentheses in the figure (26 and 10) indicate the 8-bit gray level difference between the brightest and darkest value in each feature. Notice that the extracted feature corresponding to targets has a peaky center. This is consistent with the notion that ISAR targets are characterized by bright point scatterers. The non-target feature is, interestingly enough, an "anti-target" feature: it characterizes local information that is "opposite" that of target information. Fig. 6 illustrates the target vs. non-target segmentation results.
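A shape-level sketch of the affine reconstruction in eqn. (2). The random W_z and B_z stand in for trained transforms (the actual training is detailed in [Candocia and Principe, 1997]), and the 5 × 5 → 15 × 15 sizes mirror the 3 × 3 resolution factor:

```python
import numpy as np

rng = np.random.default_rng(3)
H, S = 5, 15                              # 5 x 5 low-res patch -> 15 x 15 high-res patch

patch = rng.random((H, H)) + 0.1          # toy low resolution neighborhood
length = np.linalg.norm(patch)
y_p = patch.flatten(order='F') / length   # column-stacked, normalized to unit length

# Stand-ins for the trained affine transform of the cluster that y_p falls in
W_z = rng.standard_normal((S * S, H * H)) * 0.1
B_z = rng.standard_normal(S * S) * 0.1

# eqn. (2): affine map, un-vectorize (uvec), then restore the stored 2-norm
x_hat_p = (length * (W_z @ y_p + B_z)).reshape(S, S, order='F')
```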
Figure 6. Segmented low resolution images. (left 8) training (right 8) test.
It is important to note that pre-processing is critical to the types of features extracted with the self-organized clustering scheme. The capacity of the network to interpolate or super-resolve the low resolution representations to the resolution of the original images is illustrated in fig. 7.
Figure 7. Super-resolved images. (left 8) training (right 8) test. The pre-processing of the low resolution images consisted of a simple normalization of the image neighborhoods. Here, M=30 features were extracted from the pre-processed low resolution images. Note that we are attempting to establish a system that recovers 9 times the information presented to it. The cropping effect is due to not super-resolving image portions with low “confidence”. This confidence is directly attributed to the amount of available data about the location to super-resolve.
Comparing the test portion of fig. 3 with fig. 7 shows that the BNS approach is capable of capturing the salient characteristics of a class of images across resolutions.
6. Discussion and Conclusions
The self-organization approach solved the segmentation problem here with little effort. In order to segment, it was important to have a local measure of brightness available to the clustering. In fact, the coordinate largely responsible for the segmenting was the length coordinate in the spherical coordinate transformation utilized; by clustering on this sole coordinate, results comparable to those of fig. 6 were obtained. The bottleneck approach extracts and models the image structure across resolutions in the image set. The derived model can then be applied to new low resolution images to super-resolve them. For instance, a 1m × 1m PWF radar image of targets could be digitally interpolated on the fly to 1ft × 1ft using our method. We are still investigating many issues concerning the ISAR/SAR super-resolution problem. Our research on optical images has shown the existence of highly correlated information across scales and that this information can be exploited for interpolation. There is an analogous relation in the electromagnetic domain of SAR which we also wish to exploit. Very probably, the super-resolution should be performed at both the complex and PWF-transformed image levels. Questions of hard vs. soft partitioning of the low resolution space are also being examined, as well as what pre-processing is "most appropriate" for the super-resolution problem.
Acknowledgements This work was partially supported by DARPA F33615-97-1019.
References
Bourlard H. and Kamp Y. (1988). "Auto-association by the multilayer perceptron and singular value decomposition", Biological Cybernetics, Vol. 59, pp. 291-294.
Candocia F.M. and Principe J.C. (1997). "A Neural Implementation of Interpolation with a Family of Kernels", to appear in Proc. Int. Conf. Neur. Net. (ICNN 97), Houston, TX.
Crochiere R.E. and Rabiner L.R. (1981). "Interpolation and Decimation of Digital Signals - A Tutorial Review", Proc. IEEE, Vol. 69, No. 3, pp. 300-331.
Dony R.D. and Haykin S. (1995). "Optimally Adaptive Transform Coding", IEEE Trans. Image Proc., Vol. 4, No. 10, pp. 1358-1370.
Jain A.K. (1989). Fundamentals of Digital Image Processing, Englewood Cliffs, NJ: Prentice Hall.
Kohonen T. (1990). "The Self-organizing Map", Proc. IEEE, Vol. 78, pp. 1464-1480.
Kung S.Y. (1993). Digital Neural Networks, Englewood Cliffs, NJ: Prentice Hall, Chap. 8.
Martinetz T.M., Berkovich S.G. and Schulten K.J. (1993). "'Neural-gas' network for vector quantization and its application to time-series prediction", IEEE Trans. Neur. Net., Vol. 4, No. 4, pp. 558-569.
Oja E. (1983). Subspace Methods of Pattern Recognition, Letchworth, UK: Research Studies Press.