Image Coding using a Coloured Pattern Appearance Model

G. Qiu♠
School of Computer Science & IT, The University of Nottingham
Jubilee Campus, Nottingham NG8 1BB, United Kingdom
[email protected]

Abstract

With desktop imaging devices becoming ubiquitous, effectively managing the images in large collections has become a challenge. The requirements for a modern imaging system now demand not only efficient storage (low bit rate coding), but also easy manipulation, indexing and retrieval of images. In this paper, we introduce a new method for colour image coding based on a visual appearance model of local colour image patterns. The visual appearance of small image patterns is characterised by their spatial pattern, colour direction and local energy strength. To encode the local visual appearance, an approach based on vector quantisation (VQ) is introduced, with a separate VQ designed for the spatial pattern and for the colour direction. It is shown that the method not only achieves good image coding results in terms of the rate distortion criterion, but also enables content-based retrieval to be performed in the compressed domain easily and conveniently.

Keywords: colour image coding, vector quantization, colour vision, colour appearance model, image database

1. Introduction

Image coding is a well-studied subject. Up to now, the goal of image coding has been an optimum rate distortion trade-off: coding techniques are judged good if both the bit rates and the distortions are low. Normally, for a given coding technique, the lower the bit rate, the higher the distortion, and vice versa. In terms of the rate distortion criterion, image coding is one of the most successful fields compared with related fields such as computer vision, in the sense that many mature techniques have been developed and are widely used in real-life applications. However, with rapid advances in processor speed, storage device technology and network connections, low bit rate coding is no longer a critical factor in many practical systems; given a suitable trade-off, higher bit rates can be acceptable. On the other hand, with desktop imaging devices becoming ubiquitous, effectively managing the images in large collections has become a challenge. The requirements for an imaging system now demand not only

♠ Part of this work was performed when the author was with the School of Computing, University of Leeds, United Kingdom. To view the images appearing in this paper in colour, a PDF version is available online at http://www.cs.nott.ac.uk/~qiu/Online/Publications.html

efficient storage (low bit rate coding), but also easy manipulation, indexing and retrieval. This has prompted the suggestion of a “4th criterion” in image coding [13]. However, to the best of the author's knowledge, [13] only suggested what would be required for modern image coding techniques to meet the new demand; no serious attempt has been made, by the author of [13] or by others in the community, to find solutions that meet all four criteria. In this paper, we attempt to address the new requirements by developing a method that not only achieves good results in terms of the rate distortion criterion, but also enables content-based retrieval to be performed in the compressed domain easily and conveniently. We have developed a coloured pattern appearance model (CPAM) to characterise the visual appearance of small image patterns. By coding the model we achieve data compression and, simultaneously, the codes can be used directly to index the image, thus enabling content-based image retrieval. The emphasis of this paper is on image coding using the model; its application to image indexing and retrieval is presented in a companion paper [12].

2. CPAM: A Coloured Pattern Appearance Model

There is evidence to suggest that different visual pathways process colour and pattern in the human visual system (HVS). In [1], experiments were carried out using square wave patterns with a range of spatial frequencies, colours and stimulus strengths to measure how colour appearance depends on spatial pattern. The results suggest that the value of one neural image is the product of three terms: one term defines the pathway's sensitivity to the square wave's colour direction, a second defines the pathway's sensitivity to the spatial pattern, and the third defines the pathway's sensitivity to the square wave's stimulus strength. This is the pattern-colour-separable (PCS) model of human colour vision.

There is also physiological evidence for the existence of opponent colour signals in the visual pathway [2]. The opponent colour theory suggests that there are three visual pathways in the human colour vision system. One pathway is sensitive mainly to light-dark variations and has the best spatial resolution; the other two are sensitive to red-green and blue-yellow variations, with the blue-yellow pathway having the worst spatial resolution. In opponent-colour representations, the spatial sharpness of a colour image depends mainly on the sharpness of the light-dark component and very little on the structure of the opponent-colour components.

The pattern-colour-separable model and the opponent colour theory are consistent with one another. In [1], the spatial and spectral tuning characteristics of the pattern, colour and strength pathways were estimated, and one broadband and two opponent colour pathways were inferred. The property of the HVS that different visual pathways have different spatial acuity is well known and has been exploited in colour image processing in the form of colour models (spaces).
The earliest exploitation of this was perhaps the use of the YIQ signal in terrestrial TV broadcasting [3], where the Y component captures the light-dark variation of the TV signal and is transmitted at full bandwidth, whilst the I and Q channels capture the chromatic components of the signal and are transmitted at half the bandwidth. Similar colour models, such as YCbCr [5], Lab [4] and many others [6], were developed in different contexts and applications.

Putting colour models such as YIQ and YCbCr in the context of the pattern-colour-separable framework, they can be roughly interpreted as follows: the spatial patterns are mostly contained in the Y channel and the colours in the I and Q (or Cb and Cr) channels, whilst the strength is the overall energy of all three channels (although Y contains the vast majority of it). Because colours and patterns are separable, coding Y independently from I and Q (or Cb and Cr), plus the strength, should completely capture the visual appearance of an image. We would like to translate the pattern-colour-separable model [1] into a computational system. Colour signals captured by a camera or other input device normally appear in the form of RGB signals, so it is first necessary to convert RGB to an opponent colour space. We use the YCbCr colour model1 in this paper. The relation between YCbCr and the better-known RGB space is as follows:

    Y  =  0.299 R + 0.587 G + 0.114 B
    Cb = -0.169 R - 0.331 G + 0.500 B
    Cr =  0.500 R - 0.419 G - 0.081 B
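As a concrete illustration, the conversion above can be implemented directly as a matrix product. This is a minimal sketch assuming RGB values normalised to [0, 1]; the function name is ours, and Cb and Cr come out signed (centred on zero) rather than offset:

```python
import numpy as np

# RGB -> YCbCr transform matrix, row per output channel,
# using the coefficients from the equation above.
M = np.array([
    [ 0.299,  0.587,  0.114],   # Y
    [-0.169, -0.331,  0.500],   # Cb
    [ 0.500, -0.419, -0.081],   # Cr
])

def rgb_to_ycbcr(rgb):
    """Convert an (..., 3) array of RGB values to YCbCr."""
    return rgb @ M.T

# A pure white pixel carries only luminance: Y = 1, Cb = Cr = 0.
white = rgb_to_ycbcr(np.array([1.0, 1.0, 1.0]))
```

Note that the Y row sums to 1 while the Cb and Cr rows sum to 0, which is why an achromatic input produces zero chroma.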

Y contains the luminance information; Cb and Cr contain mostly chromatic information, together with some luminance information. Because pattern and colour are separable, and the Y component has the highest bandwidth, the spatial patterns are mostly contained in Y, while Cb and Cr together can be roughly interpreted as colour. The stimulus strength of a small image area can be approximated by the mean value of that area in the Y channel alone. The three visual pathways (pattern, colour and strength) for a small image block are now modelled in the coloured pattern appearance model (CPAM), as shown in Fig. 1.

[Fig. 1 block diagram: the Y channel is averaged to give the strength S and divided by S to give the pattern P; the Cb and Cr channels are subsampled by 2 (↓2) in each dimension and divided by S to give the colour C.]

Fig. 1 Coloured Pattern Appearance Model (CPAM). The visual appearance of a small image block is modelled by three components: the stimulus strength (S), the spatial pattern (P) and the colour (C).

For a small image area, the stimulus strength S is approximated by the local mean of the Y component. The pixels in Y, normalised by S, form the spatial pattern. Because Cb and Cr have lower bandwidth, they are subsampled by a factor of 2 in both dimensions; the subsampled pixels of Cb and Cr are normalised by S to form the colour (C) component of the appearance model. Normalising the pattern and colour channels by the strength serves two purposes. First, from a coding point of view, removing the DC component makes the code more efficient [7]. Second, from an image indexing point of view, it removes to a certain extent the effects of lighting conditions, making the visual appearance model somewhat “colour constant” [10], which should improve indexing and retrieval performance, especially when retrieving similar surfaces imaged under different conditions.

1 Other similar colour spaces can also be used.
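The decomposition just described can be sketched in a few lines. This is an illustrative implementation only: the block size (4×4 here), the function name and the small epsilon guard against division by zero are our assumptions, not details taken from the paper:

```python
import numpy as np

def cpam_decompose(Y, Cb, Cr, eps=1e-6):
    """Split one image block into its CPAM components.

    Y, Cb, Cr : 2-D arrays holding the block in YCbCr space.
    Returns (S, P, C): stimulus strength, spatial pattern, colour.
    """
    S = Y.mean()                        # strength: local mean of Y
    P = Y / (S + eps)                   # pattern: Y normalised by S
    Cb_sub = Cb[::2, ::2]               # subsample chroma by 2 ...
    Cr_sub = Cr[::2, ::2]               # ... in both dimensions
    C = np.concatenate([Cb_sub.ravel(), Cr_sub.ravel()]) / (S + eps)
    return S, P, C

# For a uniform achromatic 4x4 block the pattern is flat (all ~1)
# and the colour vector has 2x2 Cb samples + 2x2 Cr samples = 8 entries.
S, P, C = cpam_decompose(np.full((4, 4), 0.5),
                         np.zeros((4, 4)), np.zeros((4, 4)))
```

The vectors P and C produced this way are what the VQ stage in the next section would quantise.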

3. Image Coding based on CPAM

To use the model for image coding and indexing, the S, P and C signals of the model have to be coded properly. Because we intend the code to capture the visual appearance of the image and, simultaneously, to serve as features for indexing in an image database, we design our encoder based on vector quantization [7]. Vector quantization (VQ) is a mature method of lossy signal compression/coding in which statistical techniques are used to optimise distortion/rate trade-offs. A vector quantizer is described by an encoder Q, which maps a k-dimensional input vector X to an index i ∈ I specifying which one of a small collection of reproduction vectors (codewords) in a codebook C = {Ci; i ∈ I} is used for reconstruction, and a decoder Q^-1, which maps the indices back to the reproduction vectors, i.e., X' = Q^-1(Q(X)).

Many methods have been developed for designing VQ codebooks. K-means type algorithms, such as the LBG algorithm [7], and neural network based algorithms, such as the Kohonen feature map [8], are popular tools. In this work, we used a specific neural network training algorithm, the frequency sensitive competitive learning (FSCL) algorithm [9], to design our codebooks. We find that FSCL is insensitive to the initial choice of codewords, and that the codewords designed by FSCL are utilised more efficiently than those designed by methods such as the LBG algorithm. The FSCL method can be briefly described as follows:

1. Initialise the codewords Ci(0), i = 1, 2, ..., I, to random values, and set the counter associated with each codeword to 1: ni(0) = 1.
2. Present a training sample X(t), where t is the sequence index; calculate the distances between X(t) and the codewords, Di(t) = D(X(t), Ci(t)), and modify the distances according to D*i(t) = ni(t)Di(t).
3. Find j such that D*j(t) ≤ D*i(t) for all i ∈ I; update the winning codeword towards the input, Cj(t+1) = Cj(t) + α(t)(X(t) - Cj(t)), where α(t) is a learning rate, increment its counter, nj(t+1) = nj(t) + 1, and repeat from step 2 until the codebook converges.
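The three steps above can be sketched as a short training loop. This is a minimal illustration of the FSCL idea using squared Euclidean distance; the constant learning rate, epoch count and random initialisation are our illustrative choices, not the settings of [9]:

```python
import numpy as np

def train_fscl(samples, num_codewords, epochs=10, lr=0.05, seed=0):
    """Frequency sensitive competitive learning (sketch).

    The winner is chosen by the count-weighted distance n_i * D_i,
    so rarely used codewords become progressively more likely to win,
    which keeps all codewords utilised.
    """
    rng = np.random.default_rng(seed)
    C = rng.random((num_codewords, samples.shape[1]))  # step 1: random init
    n = np.ones(num_codewords)                         # usage counters
    for _ in range(epochs):
        for x in samples:
            d = np.sum((C - x) ** 2, axis=1)           # step 2: distances
            j = np.argmin(n * d)                       # modified distances
            C[j] += lr * (x - C[j])                    # step 3: move winner
            n[j] += 1                                  # update its counter
    return C

def vq_encode(x, C):
    """Encoding uses the plain (unweighted) nearest-codeword rule."""
    return int(np.argmin(np.sum((C - x) ** 2, axis=1)))
```

For example, training a two-codeword codebook on samples drawn from two well-separated clusters should leave one codeword near each cluster, so inputs from different clusters encode to different indices.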
