Proc. of European Conference on Circuit Theory and Design (ECCTD’97) Budapest, Hungary, Aug. 31-Sep. 3, 1997, pp. 634-638
CELLULAR NEURAL NETWORKS FOR IMAGE COMPRESSION BY ADAPTIVE MORPHOLOGICAL SUBBAND CODING

Marco Balsi and Stefano David
Dep. of Electronic Engineering, “La Sapienza” University, via Eudossiana 18, Rome, Italy I-00184
e-mail: [email protected] - fax: +39-6-4742647

Abstract - An “analogic” algorithm is developed for the Cellular Neural Network Universal Machine, that realizes Adaptive Subband Decomposition, a recently proposed algorithm for image compression. In this way, it is possible to obtain very high compression rates with optimized perceptual performance, using real-time mixed analog/digital programmable processors.
I. INTRODUCTION

High compression image coding is needed in many existing and projected transmission and storage systems. In order to obtain high compression rates, distortions allowed on the image should be perceptually acceptable. Images compressed by usual algorithms suffer from two main causes of visually disturbing distortion: the ringing effect and the blocking effect. Blocking can only be avoided by coding the image without dividing it into blocks, while ringing is inherent in linear filtering, but can be avoided by using nonlinear (e.g. morphological) filters.

Recently, Egger et al. [1] have proposed an adaptive algorithm based on a pyramidal decomposition, that selectively employs morphological or linear filtering on different areas of the image, depending on local characteristics. In fact, while morphological filters generally produce a visually pleasant effect, they do not perform as well as linear filters on textured areas. Therefore, an estimation is performed at every pixel location in order to distinguish between textured and nontextured (i.e. quasi-uniform or edge) areas. The result of the estimation does not need to be transmitted, because it can be repeated on the synthesis side by examining the received low-pass image. The authors show that the new algorithm (in the following denoted ASD, for Adaptive Subband Decomposition) performs significantly better than other techniques, such as JPEG or linear subband coding, especially at very high compression rates.

The drawback of such algorithms is the complexity of the computation involved, which requires quite powerful digital machines. It is therefore very interesting to map the compression algorithm on a massively parallel analog machine like the CNN Universal Machine (CNNUM) [2]. In fact, a similar approach was taken by Venetianer and Roska [3] for JPEG, and by Moreira-Tamayo and Pineda de Gyvez [4] for wavelet decomposition. Use of the CNNUM guarantees greatly enhanced speed and simple circuitry, which is also realizable jointly with image sensors and eventually with display devices on a single substrate [5][6].

Figure 1: Adaptive Subband Decomposition

Figure 2: Median filter (for horizontal subsampling). (a) Support: gray pixels are support, black pixel is to be reconstructed. (b) Template:

$$A = 1, \qquad D = \begin{bmatrix} d & 0 & d \\ d & 0 & d \\ d & 0 & d \end{bmatrix}, \qquad d = 0.3\,\mathrm{sign}(\Delta v_{ux})$$
A detailed presentation of the CNNUM model can be found in [3]. Here we may look at it as a programmable nonlinear filter. The core of the machine is composed of an array of cells that are first-order dynamical systems having one state, one input and one output variable, connected with their neighbors and neighbors’ inputs within a short distance. Connection strengths are assigned homogeneously over the whole net, and are described by linear or nonlinear functions; therefore each operational step is defined by a template that is representable as a set of matrices, of numbers or functions, containing the weights (or weighting functions) associated with the connection of a generic cell with neighbor output (feedback matrix A or C) or input (feedforward matrix B or D). The dynamical equation of a generic cell in position $ij$ can be written as follows:

$$\tau \frac{dx_{ij}}{dt} = -x_{ij} + \sum_{kl \in N(ij)} A_{ijkl}(y_{kl}, y_{ij}) + \sum_{kl \in N(ij)} B_{ijkl}(u_{kl}, u_{ij}) + \sum_{kl \in N(ij)} C_{ijkl}(x_{kl}, x_{ij}) + \sum_{kl \in N(ij)} D_{ijkl}(u_{kl}, x_{ij}) + I_{ij}$$

$$y_{ij} = \frac{1}{2}\left( |x_{ij} + 1| - |x_{ij} - 1| \right), \qquad -1 \le u_{ij} \le 1$$
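As a rough numerical illustration of the state equation above, the following sketch integrates it by forward Euler for the special case of linear, space-invariant A and B templates of radius one, with C = D = 0. The function names, the step size, the number of steps, the zero boundary condition and the layout of the template (entry `[di + 1][dj + 1]` weights the neighbor at offset `(di, dj)`) are illustrative assumptions, not part of the CNNUM definition.

```python
def output(x):
    """Piecewise-linear cell output y = (|x + 1| - |x - 1|) / 2."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def weighted_sum(template, img, i, j):
    """Sum of template weights times img over the 3x3 neighborhood of (i, j).

    Cells outside the array are treated as zero (an assumed boundary condition).
    """
    rows, cols = len(img), len(img[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            k, l = i + di, j + dj
            if 0 <= k < rows and 0 <= l < cols:
                total += template[di + 1][dj + 1] * img[k][l]
    return total


def simulate(u, x0, A, B, I, tau=1.0, dt=0.05, steps=400):
    """Forward-Euler integration of tau * dx/dt = -x + A*y + B*u + I.

    Returns the output image y after the last step.
    """
    rows, cols = len(u), len(u[0])
    x = [row[:] for row in x0]
    # The feedforward term is constant in time, so it is computed once.
    bu = [[weighted_sum(B, u, i, j) + I for j in range(cols)]
          for i in range(rows)]
    for _ in range(steps):
        y = [[output(v) for v in row] for row in x]
        x = [[x[i][j] + dt / tau * (-x[i][j] + weighted_sum(A, y, i, j) + bu[i][j])
              for j in range(cols)] for i in range(rows)]
    return [[output(v) for v in row] for row in x]
```

With A = 0 and a B template whose only nonzero entry is a central 1, the output settles to the input image. With a central feedback weight of 2 and no input, each cell is bistable and saturates at +1 or -1 according to the sign of its initial state.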
$N(ij)$ denotes the set of indices of cells belonging to the neighborhood of cell $ij$. Most often connection functions are linear, $A_{ijkl}(y_{kl}, y_{ij}) = A_{i-k,j-l}\, y_{kl}$, or nonlinear functions of the difference between local and non-local variables, $A_{ijkl}(y_{kl}, y_{ij}) = A_{i-k,j-l}(\Delta_{yy})$, with $\Delta_{yy} = y_{kl} - y_{ij}$.

Each pixel of an image is associated with one cell, therefore we can identify the array of cell states X, inputs U and outputs Y with images. Input images may be fed to the initial state $X(0)$, or to the static input U, while the output is generally taken as the steady-state output $Y(\infty)$. It is possible to process selected parts of an image by defining a “fixed-state” binary image F denoting which pixels should actually be processed. Each cell contains analog and logical memory and can perform logical operations locally; a global controller feeds templates to the cells and controls local storage and input/output. Both linear and nonlinear filters can be mapped to the CNNUM architecture, and algorithms composed of several elemental analog and logic operations (“analogic” algorithms), with loops and
branch points can be realized.

In this paper, we will show that all the operations needed to implement ASD can be performed by the CNNUM, and in particular we will develop the texture estimation technique necessary for the choice of the filter to be employed. In section II the basic operations are identified, and linear and morphological CNNUM filters are discussed. In section III the texture detection algorithm is developed. Section IV concludes the paper.

II. ADAPTIVE SUBBAND DECOMPOSITION

Figure 1 describes one step of ASD (i.e. decomposition in two bands), and identifies the basic operations. $M(x)$ denotes a morphological filter, while $G(x)$ is a linear filter. The switch symbol on the left of the filters denotes the decision (texture detection) operation; Q indicates a quantization operation, which actually realizes the compression and is not discussed here, since it is performed using well-known techniques that are beyond the scope of this paper. The boxes with a wire across indicate the identity operation, and sub- and up-sampling by a factor of 2 are denoted by the ↓2 and ↑2 symbols, respectively.

The authors of [1] show that the optimal choice for M is a median half-band filter with a support of six pixels, as symbolized in Figure 2a. This corresponds to a CNNUM template of radius one, and following [7] the median filter can be defined by the template of Fig. 2b.

For the linear filters, Egger et al. choose a very simple three-coefficient one-dimensional lowpass filter. Based on our simulations, and consistently with the choice of the authors of [3], we preferred a two-dimensional filter (which is not more expensive than a one-dimensional one when implemented on the CNNUM). This filter is described by the template of Fig. 3.

$$A = 0, \qquad B = \begin{bmatrix} 0.25 & 0 & 0.25 \\ 0.5 & 0 & 0.5 \\ 0.25 & 0 & 0.25 \end{bmatrix}, \qquad I = 0$$

Figure 3: Linear filter template (for horizontal subsampling)
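One way to see why the template of Fig. 2b behaves as a median filter is the following. With A = 1, the self-feedback term cancels the leakage term $-x_{ij}$ as long as the cell stays in its linear region ($y_{ij} = x_{ij}$), so the state is driven only by the six contributions $0.3\,\mathrm{sign}(u_{kl} - x_{ij})$ from the support pixels. The state rises while more than three support values lie above it and falls while more than three lie below it, so it comes to rest between the third and fourth support values in sorted order. The sketch below checks this numerically. It assumes that $\Delta v_{ux}$ stands for $u_{kl} - x_{ij}$ and that all template entries not listed in Fig. 2b are zero; the function names, step size and iteration count are illustrative choices.

```python
def sign(v):
    """Return -1.0, 0.0 or 1.0 according to the sign of v."""
    return float((v > 0) - (v < 0))


def median_cell(support, x0=0.0, dt=0.01, steps=2000):
    """Integrate dx/dt = sum(0.3 * sign(u - x)) over the support pixels.

    This is the cell equation for the Fig. 2b template with tau = 1,
    assuming the cell stays in its linear region so that -x + A*y
    vanishes for A = 1.
    """
    x = x0
    for _ in range(steps):
        x += dt * sum(0.3 * sign(u - x) for u in support)
    return x
```

Because the support has an even number of pixels, every value between the two middle ones is an equilibrium, so the value reached depends on the initial state; with forward Euler it is located only to within one integration step.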
III. TEXTURE DETECTION

The key operation of the algorithm, and the most computationally expensive, is texture detection. In [1], textures are extracted based on the criterion that local variance is high and almost equal in all directions (while edges have high variance across the edge, but low variance in the parallel direction). We did not exploit this technique directly, because it involves computations that are not particularly suitable for CNNUM processing, and storage of several quantities at each pixel location. Instead, we developed an original algorithm that is described hereafter.

Fig. 4 contains the flow chart of the analogic algorithm. Boxes indicate single operations (templates), while ellipses denote images produced by the processing steps. The input image is processed by the ENH_EDGE template [7] with $\theta = 0.2$, to increase edge contrast while bringing quasi-homogeneous areas to a uniform intermediate gray. Subsequent double thresholding ($\theta_2 = 0.4$) [7] selects just those candidate homogeneous areas by creating a black-and-white mask identifying the areas previously labeled by the gray (zero) value. This mask must be integrated with edge information to yield a first estimate of the mask of pixels to be filtered with the morphological filter. To this purpose the ANISOD (anisotropic diffusion) template from [7] is applied with $K = 0.1$, with the effect of generating uniform regions with sharp contours, which are subsequently extracted using the SR_GRAD (single-response gradient) template [7] with $I = -0.3$. The result of this processing is combined (OR) with the mask of homogeneous areas and inverted, to yield a mask of candidate texture areas. In fact, this image (labeled “texture 1” in Fig. 4) mostly contains isolated pixels: the last part of the algorithm is needed to generate reasonably connected regions. Therefore, a loop is iterated that employs the PATCHMAK (patch maker) template from [8], whose operation is similar to morphological dilation, with $\tau = 6$, followed each time by subtraction of the edge mask, with the purpose of stopping dilation at edges. In fact, it is supposed that positive-valued pixels in “texture 1” are internal to textured regions, and that those regions are delimited by edges. The loop is iterated 7 times (this value was selected after simulation on several images, both highly textured (e.g. “baboon”) and scarcely textured (e.g. “peppers”)). At the end, image “texture 2” contains connected texture regions and several isolated pixels, which are removed by the PATCHEXTR (patch extraction) template, designed for this purpose (Fig. 5). In this way, the texture map is obtained.

Figure 4: Texture extraction algorithm. Boxes (operations): ENH_EDGE, ANISOD, THRESHOLD (θ2), THRESHOLD (-θ2), SR_GRAD, NOT, AND, NOR, PATCHMAK, THRESHOLD (θ = 0), PATCHEXTR, with an END? test (FALSE/TRUE) closing the loop. Ellipses (images): INPUT, ENHANCED EDGES, SMOOTHED IMAGE, HOMOGENEOUS REGIONS, EDGES, TEXTURE 1, TEXTURE 2, TEXTURE MAP.

$$A = 2, \qquad B = \begin{bmatrix} 0.2 & 0.2 & 0.2 \\ 0.2 & 0.3 & 0.2 \\ 0.2 & 0.2 & 0.2 \end{bmatrix}, \qquad I = 0$$

Figure 5: PATCHEXTR template

Basic steps of the algorithm are outlined in Fig. 6, which shows intermediate results of the processing of the “baboon” image. In this and in the other cases examined, the texture maps obtained are very similar to those shown in [1].

Texture detection, as described above, yields a black-and-white image that can be used as a fixed-state mask for the linear filtering. Median filtering is also applied in fixed-state mode, by using the same mask after inversion. In this way, all the basic steps of ASD are successfully mapped to the CNNUM architecture.

Two issues remain to be addressed:

(1) Sub- and up-sampling, which involve a re-organization of data on the CNN array, might be done by dedicated templates; however, we did not discuss this problem because it is probably simpler and faster to employ dedicated data-flow lines. Summation and difference can also be realized very simply by resorting to radius-0 templates.

(2) It should be noticed that pyramidal decomposition cannot be done straightforwardly as usual by decomposing each band separately. Egger et al. discuss how this should be done, and it is apparent that this only complicates the analogic algorithm slightly, while no additional operations are needed.
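The mask logic of the texture detection algorithm (Fig. 4) can be mimicked on ordinary binary arrays. The sketch below takes the homogeneous-region mask and the edge mask as given, forms “texture 1” by NOR, grows it for 7 iterations while removing edge pixels after each growth step, and finally removes weakly supported pixels. Several points are assumptions rather than the templates used in the paper: the double threshold is taken as the strict test $|v| < \theta_2$, PATCHMAK followed by thresholding is replaced by a plain 3×3 binary dilation, and PATCHEXTR is evaluated as the sign of the Fig. 5 feedforward sum, which is what the template yields if every cell starts from a zero state (black = +1, white = -1, white assumed outside the array). All function names are hypothetical.

```python
B_PATCHEXTR = [[0.2, 0.2, 0.2],
               [0.2, 0.3, 0.2],
               [0.2, 0.2, 0.2]]  # feedforward weights of Fig. 5


def homogeneous_mask(enhanced, theta2=0.4):
    """Double threshold: mark pixels whose value lies strictly inside (-theta2, theta2)."""
    return [[-theta2 < v < theta2 for v in row] for row in enhanced]


def window(mask, i, j):
    """Return the 3x3 window around (i, j); pixels outside the array are False."""
    rows, cols = len(mask), len(mask[0])
    return [[bool(0 <= i + di < rows and 0 <= j + dj < cols and mask[i + di][j + dj])
             for dj in (-1, 0, 1)] for di in (-1, 0, 1)]


def dilate(mask):
    """3x3 binary dilation, a stand-in for PATCHMAK followed by thresholding."""
    return [[any(any(row) for row in window(mask, i, j))
             for j in range(len(mask[0]))] for i in range(len(mask))]


def patch_extract(mask):
    """Mark a pixel black when the Fig. 5 weighted sum of +1/-1 inputs is positive."""
    out = []
    for i in range(len(mask)):
        out_row = []
        for j in range(len(mask[0])):
            win = window(mask, i, j)
            total = sum(B_PATCHEXTR[a][b] * (1.0 if win[a][b] else -1.0)
                        for a in range(3) for b in range(3))
            out_row.append(total > 0)
        out.append(out_row)
    return out


def texture_map(homogeneous, edges, iterations=7):
    """Binary-mask version of the region-growing loop of Fig. 4."""
    rows, cols = len(edges), len(edges[0])
    # "texture 1": pixels that are neither homogeneous nor edge (NOR).
    texture = [[not (homogeneous[i][j] or edges[i][j]) for j in range(cols)]
               for i in range(rows)]
    for _ in range(iterations):
        grown = dilate(texture)
        # Subtract the edge mask so that growth stops at edges.
        texture = [[grown[i][j] and not edges[i][j] for j in range(cols)]
                   for i in range(rows)]
    return patch_extract(texture)
```

Under these assumptions a black pixel survives the last step only if at least four of its eight neighbors are black, so an isolated seed pixel is removed, while a seed that has grown into a region bounded by an edge line is kept and does not leak across the edge.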
Figure 6: Texture extraction on “baboon”. Panels: input, enhanced edges, homogeneous regions, smoothed image, single response gradient, texture 1, texture map.

IV. CONCLUSION

Fig. 7 shows the decomposition of the “baboon” image in three subbands (only two decomposition steps were applied for readability of the image). Results on this, and other sample images, are quite consistent with the results of [1]. Quantitative estimation of the performance of CNNUM-based ASD is under way.

In this paper, we have shown how the CNNUM can be successfully applied to image compression. When CNNUM chips are produced, the proposed technique may lead to very cheap and fast coding devices, making image transmission and storage conveniently available to many commercial applications.
References

[1] O. Egger, W. Li, M. Kunt, “High Compression Image Coding using an Adaptive Morphological Subband Decomposition”, Proc. of the IEEE, 83(2), pp. 272-287 (1995).
[2] T. Roska, L.O. Chua, “The CNN Universal Machine: an Analogic Array Computer”, IEEE Trans. Circ. Syst. II 40(3), pp. 163-173 (1993).
[3] P.L. Venetianer, T. Roska, “Image Compression by CNN”, Hungarian Academy of Sciences Computer and Automation Institute (MTA-SzTAKI) rep. DNS-13-1995, Budapest, Aug. 1995.
[4] O. Moreira-Tamayo, J. Pineda de Gyvez, “Wavelet Transform Coding Using Cellular Neural Networks”, Proc. of 1995 Int. Symp. on Nonlinear Theory and its Applications (NOLTA’95), Las Vegas, NV, USA, Dec. 10-14, 1995, pp. 541-544.
[5] S. Espejo, A. Rodríguez-Vázquez, R. Domínguez-Castro, J.L. Huertas, E. Sánchez-Sinencio, “Smart-Pixel Cellular Neural Networks in Analog Current-Mode CMOS Technology”, IEEE J. Solid-State Circ., 29(8), pp. 895-905 (1994).
[6] M. Balsi, V. Cimagalli, F. Galluzzi, “A Proposal to Implement Optoelectronic CNN Systems by Amorphous Silicon Thin-Film Technology”, Int. J. Circ. Th. Appl., 24(1), pp. 121-125 (1996).
[7] Cs. Rekeczky, T. Roska, A. Ushida, “CNN Based Self-Adjusting Nonlinear Filters”, Hungarian Academy of Sciences Computer and Automation Institute (MTA-SzTAKI) rep. DNS4-1996, Budapest, 1996.
[8] Á. Zarándy, F. Werblin, T. Roska, L.O. Chua, “Novel Types of Analogic CNN Algorithms for Recognizing Bank-Notes”, Third IEEE Int. Workshop on Cellular Neural Networks and their Applications (CNNA-94), Rome, Italy, Dec. 18-21, 1994, pp. 273-278.
Figure 7: “Baboon” image, 512×512 pixels, decomposed into three subbands.