Rotation-invariant texture retrieval with Gaussianized steerable pyramids

George Tzagkarakis∗¹, Baltasar Beferull-Lozano², Associate Member, IEEE, and Panagiotis Tsakalides¹, Member, IEEE

¹ Department of Computer Science, University of Crete & Institute of Computer Science - FO.R.T.H.
ICS-FORTH, P.O. Box 1385, 711 10 Heraklion, Crete, Greece
Phone: +30 2810 391670, Fax: +30.81 - 391601
E-mail: [email protected], [email protected]

² School of Computer and Comm. Sciences, Audiovisual Communications Laboratory
Swiss Federal Institute of Technology (EPFL), CH1015, Lausanne, Switzerland
E-mail: [email protected]
Submitted to: IEEE Trans. on Image Processing (April 2005) Revised: September 2005 EDICS: 5-OTHE
Abstract This paper presents a novel rotation-invariant image retrieval scheme based on a transformation of the texture information via a steerable pyramid. First, we fit the distribution of the subband coefficients using a joint alpha-stable sub-Gaussian model to capture their non-Gaussian behavior. Then, we apply a normalization process in order to Gaussianize the coefficients. As a result, the feature extraction step consists of estimating the covariances between the normalized pyramid coefficients. The similarity between two distinct texture images is measured by minimizing a rotation-invariant version of the Kullback-Leibler Divergence between their corresponding multivariate Gaussian distributions, where the minimization is performed over a set of rotation angles.
Index Terms Statistical image retrieval, rotation-invariant Kullback-Leibler Divergence, steerable model, Fractional Lower-Order Moments, sub-Gaussian distribution.
I. INTRODUCTION
During the last decades, information has been gathered and stored at an impressive rate in large digital databases. Examples include multimedia databases containing audio, images and video. The search of large digital multimedia libraries, unlike the search of conventional text-based digital databases, cannot be realized by simply searching text annotations. Because of the amount of detail in multimedia data, it is difficult to provide automatic annotation without human support. The design of completely automatic mechanisms that extract meaning from this data and characterize the information content in a compact and meaningful way is a challenging task. Content-based Image Retrieval (CBIR) is a set of techniques for retrieving relevant images from a database on the basis of automatically-derived features, which accurately specify the information content of each image. We can distinguish two major tasks, namely Feature Extraction (FE) and Similarity Measurement (SM). In the FE step, a set of features constituting the so-called image signature is generated after a pre-processing step (image transformation), to accurately represent the content of a given image. This set has to be much smaller in size than the original image, while capturing as much as possible of the image information. During the SM step, a distance function is employed which measures how close each image in the database is to a query image, by comparing their signatures. Typical low-level image features, such as color [1], shape [2] and texture [3], are commonly used in CBIR applications. In this work, we focus on the use of texture information for image retrieval. Loosely
speaking, the class of images that we commonly call texture images includes images that are spatially homogeneous and consist of repeated elements, often subject to some randomization in their location, size, color and orientation. Previously developed texture extraction methods include multi-orientation filter-banks and spatial Gabor filters [4]. The basic assumption for these approaches is that the energy distribution in the frequency domain identifies a texture. These retrieval systems use simple norm-based distances (e.g. Euclidean distance) on the extracted image signatures, as a similarity measure. In this work, we consider the tasks of FE and SM in a joint statistical framework. Thus, in our approach, the FE step becomes a Maximum Likelihood (ML) estimator of the model parameters fitting the given image data, while the SM step employs a statistical measure of similarity, such as the Kullback-Leibler Divergence (KLD) [5], between probability density functions having different model parameters. In this setting, optimal retrieval is asymptotically achieved. Using this statistical approach, a simple extension of the energy-based methods for texture retrieval is to model each texture by the marginal densities of the transform coefficients. This is motivated by the results of recent physiological research on human texture perception, which suggest that two homogeneous textures are often difficult to discriminate if they produce similar marginal distributions of responses from a filter-bank [6]. The development of retrieval models in a transform-domain is based on the observation that often a linear, invertible transform restructures the image, resulting in a set of transform coefficients whose structure is simpler to model. Real-world images are characterized by a set of “features”, such as textures, edges, ridges and lines. 
For such images, the 2-dimensional (2-D) wavelet transform has been shown to be a powerful modeling tool, providing a natural arrangement of the wavelet coefficients into multiscale and oriented subbands representing the horizontal, vertical and diagonal edges [7]. Texture information is modeled using the first or second order statistics of the coefficients obtained via a Gabor wavelet transform [8], or an overcomplete wavelet decomposition constituting a tight frame [9]. On the other hand, considering a statistical framework, texture is modeled by joint probability densities of wavelet subband coefficients. Until recently, wavelet coefficients have been modeled either as independent Gaussian variables or as jointly Gaussian vectors [10]. However, it has been pointed out that the wavelet transforms of real-world images tend to be sparse, resulting in a large number of small coefficients and a small number of large coefficients [11]. This property is in conflict with the Gaussian assumption, giving rise to peaky and heavy-tailed non-Gaussian marginal distributions of the wavelet subband coefficients [11], [12]. Experimental results have proven that the generalized Gaussian density (GGD) is a suitable member of the class of non-Gaussian distributions for modeling the marginal behavior of the wavelet coefficients [11],
[13]. Computationally tractable image retrieval mechanisms based on a combination of overcomplete wavelet-based texture and color features are described in [14], where the similarity measure between the GGD models is based on the Bhattacharyya distance. Recently, the GGD models have also been introduced in a statistical framework for texture retrieval in CBIR applications, by jointly considering the two problems of FE and SM [15]. In recent work, we showed that successful image processing algorithms can achieve both superior noise reduction and feature preservation if they take into consideration the actual heavy-tailed behavior of the signal and noise densities [16], [17]. We demonstrated that successful modeling of subband decompositions of many texture images is achieved by means of symmetric alpha-stable (SαS) distributions [18], [19], which very often provide a better fit to the non-Gaussian heavy-tailed distributions than the generalized Gaussian distribution (GGD), thus motivating their use in our CBIR model. After extracting the SαS model parameters, we analytically derived the KLD between two SαS distributions. Our formulation improved the retrieval performance, resulting in a decreased probability error rate for images with distinct non-Gaussian statistics [20], compared with the GGD model. However, the majority of current approaches do not take into account the important interdependencies between different subbands of a given image, which can be employed in order to provide a more accurate representation of the texture image profile. Huang studied the correlation properties of wavelet transform coefficients at different subbands and resolution levels, applying these properties to an image coding scheme based on neural networks [21]. Portilla and Simoncelli developed an algorithm for synthesizing texture images by setting different constraints on the correlation between the transform coefficients and their magnitudes [22]. 
The theory of Markov random fields has enabled a new generation of statistical texture models, in which the full model is characterized by statistical interactions within local neighborhoods [23]. Recently, a new framework for statistical signal processing based on wavelet-domain hidden Markov models has been proposed [24], [25]. It provides an attractive modeling of both the non-Gaussian statistics and the property of persistence across scales in a wavelet decomposition. In this paper, we proceed by grouping the wavelet subband coefficients and considering them as samples of a multivariate sub-Gaussian random process, which is characterized by the associated fractional lower-order statistics. Within the framework of sub-Gaussian processes, we use the notion of covariation instead of the second-order covariance, in order to extract possible interdependencies between wavelet coefficients at different image orientations and scales. The joint sub-Gaussian modeling preserves the heavy-tailed behavior of the marginal distributions, as well as the strong statistical dependence across orientations and
scales. A desirable property in a CBIR system is rotation invariance. This is a topic that has been previously pursued by various researchers. Greenspan et al. [26] and Haley and Manjunath [27], [28] employed rotation-invariant structural features, using autocorrelation and DFT magnitudes, obtained via multiresolution Gabor filtering. Recently, a rotation-invariant image retrieval system based on steerable pyramids was proposed by Beferull-Lozano et al. [29]. In this system, the correlation matrices between several basic orientation subbands at each level of a wavelet pyramid are chosen as the energy-based texture features. Mao and Jain [30] presented a multiresolution simultaneous autoregressive (MR-SAR) model where a multivariate rotation-invariant SAR (RISAR) model is introduced, which is based on the circular autoregressive (CAR) model. A second category of methods achieving rotation invariance includes the implementation of a Hidden Markov Model (HMM) on the subband coefficients of the transformed image. Do and Vetterli [25] derived a steerable rotation-invariant statistical model by enhancing a recently introduced technique on wavelet-domain HMM [24]. Liu and Picard [31] exploited the effectiveness of the 2-D Wold decomposition of homogeneous random fields, in order to extract features that represent perceptual properties described as “periodicity”, “directionality” and “randomness”. The above-mentioned rotation-invariant CBIR techniques can be classified into two classes. The first class includes techniques where the FE step consists of computing rotation-invariant texture features, while the SM step consists of applying a common similarity function, such as the Euclidean distance or the KLD. The second class includes techniques where the FE step consists of estimating the parameters of a so-called steerable model and the SM step consists of applying a rotation-invariant version of a common similarity function (e.g. KLD). 
In this paper, we describe a novel technique belonging to the second class. First, we design a new steerable model, which is based on the joint sub-Gaussian modeling of the coefficients of a steerable pyramid incorporating dependence across orientations and scales. Then, we apply a Gaussianization procedure on the steerable pyramid coefficients, by jointly considering them as samples of a multivariate sub-Gaussian distribution, viewed as a special case of a Gaussian Scale Mixture (GSM). After the Gaussianization step, we derive an analytical expression for a rotation-invariant version of the KLD between multivariate Gaussian densities (including the rotation angle between textures), avoiding the use of a computationally heavy Monte-Carlo method, usually employed to approximate the KLD in the non-Gaussian case [25]. Our system has several advantages with respect to the HMM-based methods. First, HMMs require
the use of an Expectation-Maximization (EM) algorithm, which in some cases may not converge, for the estimation of the model parameters (hidden state variables and statistics of a Gaussian mixture). On the other hand, our proposed method incorporates dependence across space, orientations and scales, combined with an efficient way of estimating the multipliers of the multivariate sub-Gaussian model, which are necessary to perform the Gaussianization. Besides, by exploiting the statistical dependencies between subbands at adjacent scales, we insert the same first-order Markovian dependence as in HMMs, but in a simpler way. Also, for the heavy-tailed modeling, we use SαS distributions, which are often better than GGDs. The rest of the paper is organized as follows: in section II, we briefly review the probabilistic setting for a CBIR problem. In section III, we justify the choice of the multivariate sub-Gaussian model for the joint modeling of the wavelet coefficients. In section IV, we develop a rotation-invariant CBIR system by applying a Gaussianization procedure on the coefficients of a steerable pyramid. In section V, we apply our scheme to a set of textures and evaluate the retrieval performance. Finally, in section VI, we provide conclusions and directions for future research.

II. STATISTICAL CBIR
Let $\mathcal{F}$ denote the feature space and $\vec{X} = \{\vec{x}_1, \ldots, \vec{x}_N \mid \vec{x}_i \in \mathcal{F},\ i = 1, \ldots, N\}$ be a set of N independent feature vectors associated to a query. Also, let $S = \{1, \ldots, K\}$ be the set of class indicators associated with the image classes in the database. Denote the probability density function (PDF) of the query feature vector space by $p_q(\vec{x})$ and the PDF of class $i \in S$ by $p_i(\vec{x})$. The design of a retrieval system in a probabilistic framework consists of finding an appropriate map $g : \mathcal{F} \mapsto S$. These maps constitute the set of similarity functions. The goal of a probabilistic CBIR system is the minimization of the probability of retrieval error, that is, the probability $P(g(\vec{X}) \neq s)$. Hence, if we provide the system with a set of feature vectors $\vec{X}$ drawn from class s, we want to minimize the probability that the system will return images from a class $g(\vec{X})$ different from s. It can be shown [32] that the optimal similarity function, that is, the one minimizing $P(g(\vec{X}) \neq s)$, is the Bayes or maximum a-posteriori (MAP) classifier
$$ g^*(\vec{X}) = \arg\max_i P(s = i \mid \vec{X}) = \arg\max_i p(\vec{X} \mid s = i)\, P(s = i), \tag{1} $$
where $p(\vec{X} \mid s = i)$ is the likelihood for the i-th class and $P(s = i)$ its prior probability. Under the
assumption that all classes are a-priori equally likely, the MAP classifier reduces to the ML classifier:
$$ g^*(\vec{X}) = \arg\max_i p(\vec{X} \mid s = i) \overset{\text{i.i.d.}}{=} \arg\max_i \frac{1}{N} \sum_{j=1}^{N} \log p(\vec{x}_j \mid s = i). \tag{2} $$
When the number N of feature vectors is large, application of the Weak Law of Large Numbers [33] to (2) results in the following equation:
$$ g^*(\vec{X}) = \arg\min_i \underbrace{\int p_q(\vec{x}) \log \frac{p_q(\vec{x})}{p_i(\vec{x})} \, d\vec{x}}_{D(p_q \| p_i)}, \tag{3} $$
where $D(p_q \| p_i)$ denotes the Kullback-Leibler divergence or relative entropy between the two densities, $p_q(\cdot)$ and $p_i(\cdot)$.
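The step from (2) to (3) can be made explicit. The derivation below is a short sketch, under the assumptions that the query vectors are i.i.d. draws from $p_q$, that $p_i(\vec{x})$ stands for $p(\vec{x} \mid s = i)$, and that the differential entropy $H(p_q) = -\int p_q(\vec{x}) \log p_q(\vec{x})\, d\vec{x}$ is finite (the symbol $H$ is introduced here only for illustration):

```latex
\begin{align*}
\frac{1}{N}\sum_{j=1}^{N} \log p_i(\vec{x}_j)
  \;&\xrightarrow[N \to \infty]{\text{in probability}}\;
  E_{p_q}\{\log p_i(\vec{x})\}
  = \int p_q(\vec{x}) \log p_i(\vec{x})\, d\vec{x} \\
  &= \int p_q(\vec{x}) \log p_q(\vec{x})\, d\vec{x}
   - \int p_q(\vec{x}) \log \frac{p_q(\vec{x})}{p_i(\vec{x})}\, d\vec{x}
   = -H(p_q) - D(p_q \| p_i).
\end{align*}
```

Since $H(p_q)$ does not depend on the class index i, maximizing the average log-likelihood over i is asymptotically the same as minimizing $D(p_q \| p_i)$ over i.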
The problem of retrieving the top M images similar to a given query image can be formulated as a multiple hypothesis problem. The query image $I_q$ is represented by a feature data set, $\vec{X} = \{x_1, \ldots, x_N\}$, obtained after a transformation step, and each image in the database, $I_i$ ($i = 1, \ldots, C$), is assigned a hypothesis $H_i$. Therefore, the problem of retrieving the top M images consists of selecting the M images in the database that are closest, in terms of best hypotheses, to the data $\vec{X}$ of the given query image.
Under the assumption that all hypotheses are a-priori equally likely, the optimum rule resulting in the minimum probability of retrieval error is to select the hypotheses with the highest likelihoods among the C. Thus, the top M matches correspond to the M hypotheses $H_{i_1}, H_{i_2}, \ldots, H_{i_M}$ for which
$$ p(\vec{X} \mid H_{i_1}) \geq \cdots \geq p(\vec{X} \mid H_{i_M}) \geq p(\vec{X} \mid H_i), \quad i \notin \{i_1, \ldots, i_M\}. $$
A computationally efficient implementation of this setting is to adopt a parametric approach. Then, each conditional PDF, $p(\vec{X} \mid H_i)$, is modeled by a member of a family of PDFs, denoted by $p(\vec{X}; \theta_i)$, where $\theta_i$ is a set of model parameters to be specified. In this framework, the extracted signature for the image $I_i$ is the estimated model parameter $\hat{\theta}_i$, computed in the FE step. Then, implementation of (3) gives the optimal rule for retrieving the top M similar images to the given query image $I_q$:
1. Compute the KLDs between the query density $p(\vec{X}; \theta_q)$ and the density $p(\vec{X}; \theta_i)$ associated with image $I_i$ in the database, $\forall\, i = 1, \ldots, C$:
$$ D(p(\vec{X}; \theta_q) \| p(\vec{X}; \theta_i)) = \int p(x; \theta_q) \log \frac{p(x; \theta_q)}{p(x; \theta_i)} \, dx. \tag{4} $$
2. Retrieve the M images corresponding to the M smallest values of the KLD.
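The two-step rule above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: each signature is taken to be the covariance matrix of a zero-mean multivariate Gaussian, for which the KLD has a well-known closed form. The function names `gaussian_kld` and `retrieve_top_m` and the toy database are hypothetical and are not part of the paper.

```python
import numpy as np

def gaussian_kld(cov_q, cov_i):
    """KLD D(N(0, cov_q) || N(0, cov_i)) between two zero-mean Gaussians."""
    n = cov_q.shape[0]
    inv_i = np.linalg.inv(cov_i)
    # slogdet is numerically safer than log(det(.)) for small determinants
    _, logdet_q = np.linalg.slogdet(cov_q)
    _, logdet_i = np.linalg.slogdet(cov_i)
    return 0.5 * (np.trace(inv_i @ cov_q) - n + logdet_i - logdet_q)

def retrieve_top_m(cov_query, database_covs, m):
    """Step 1: KLD to every database signature. Step 2: keep the m smallest."""
    klds = np.array([gaussian_kld(cov_query, c) for c in database_covs])
    order = np.argsort(klds)[:m]
    return order, klds[order]

# Toy database of three signatures (illustrative values only)
database = [np.diag([1.0, 1.0]), np.diag([2.0, 0.5]), np.diag([10.0, 10.0])]
query = np.diag([1.1, 0.9])
top_idx, top_kld = retrieve_top_m(query, database, m=2)
```

Note that the KLD is not symmetric, so the query density must be passed as the first argument, as in (4).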
The KLD in (4) can be computed using consistent estimators $\hat{\theta}_q$ and $\hat{\theta}_i$ for the model parameters. The ML estimator is a consistent estimator [5] and for the query image it gives:
$$ \hat{\theta}_q = \arg\max_{\theta} \log p(\vec{X}; \theta). \tag{5} $$
We can also apply a chain rule [33], in order to combine the KLDs from multiple data sets. This rule states that the KLD between two joint PDFs, $p(\vec{X}, \vec{Y})$ and $q(\vec{X}, \vec{Y})$, where $\vec{X}, \vec{Y}$ are assumed to be independent data sets, is given by
$$ D(p(\vec{X}, \vec{Y}) \| q(\vec{X}, \vec{Y})) = D(p(\vec{X}) \| q(\vec{X})) + D(p(\vec{Y}) \| q(\vec{Y})). \tag{6} $$
III. STATISTICAL MODELING OF WAVELET SUBBAND COEFFICIENTS VIA JOINT SUB-GAUSSIAN DISTRIBUTIONS
In this section, we introduce the family of multivariate sub-Gaussian distributions, justifying this choice in terms of an accurate approximation of the marginal and joint densities of the transform coefficients.

A. The family of multivariate sub-Gaussian distributions
We first give the definition for the family of univariate symmetric alpha-stable (SαS) distributions, before introducing the family of multivariate sub-Gaussian distributions. The SαS distribution is best defined by its characteristic function [34]:
$$ \phi(t) = \exp(\imath \delta t - \gamma^{\alpha} |t|^{\alpha}), \tag{7} $$
where α is the characteristic exponent, taking values 0 < α ≤ 2, δ (−∞ < δ < ∞) is the location parameter, and γ (γ > 0) is the dispersion of the distribution. The characteristic exponent is a shape parameter, which controls the “thickness” of the tails of the density function. The smaller the α, the heavier the tails of the SαS density function. The dispersion parameter determines the spread of the distribution around its location parameter, similar to the variance of the Gaussian. A SαS distribution is called standard if δ = 0 and γ = 1. The notation $X \sim f_{\alpha}(\gamma, \delta)$ means that the random variable X follows a SαS distribution with parameters α, γ, δ. In general, no closed-form expressions exist for most SαS density and distribution functions. Two important special cases of SαS densities with closed-form expressions are the Gaussian (α = 2) and the Cauchy (α = 1). Unlike the Gaussian density which has exponential tails, stable densities have tails following an algebraic rate of decay ($P(X > x) \sim C x^{-\alpha}$, as $x \to \infty$, where C is a constant depending
on the model parameters), hence random variables following SαS distributions with small α values are highly impulsive. An important characteristic of non-Gaussian SαS distributions is the non-existence of second-order moments. Instead, all moments of order p less than α do exist and are called the Fractional Lower Order Moments (FLOMs). In particular, the FLOMs of a SαS random variable $X \sim f_{\alpha}(\gamma, \delta = 0)$ are given by [18]:
$$ E\{|X|^p\} = \big(C(p, \alpha) \cdot \gamma\big)^p, \quad 0 < p < \alpha, \tag{8} $$
where
$$ \big(C(p, \alpha)\big)^p = \frac{2^{p+1}\, \Gamma\!\left(\frac{p+1}{2}\right) \Gamma\!\left(-\frac{p}{\alpha}\right)}{\alpha \sqrt{\pi}\, \Gamma\!\left(-\frac{p}{2}\right)} = \frac{\Gamma\!\left(1 - \frac{p}{\alpha}\right)}{\cos\!\left(\frac{\pi}{2} p\right) \Gamma(1 - p)}. \tag{9} $$
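As a quick numerical sanity check of (8) and (9), the following sketch evaluates both closed forms of $(C(p, \alpha))^p$ with the standard-library gamma function. The function names are hypothetical. The second form is undefined at p = 1 (it involves Γ(0)), so the sketch uses the first form for the moment itself.

```python
import math

def flom_constant_pow(p, alpha):
    """(C(p, alpha))^p, first closed form in (9); requires 0 < p < alpha <= 2."""
    num = 2.0 ** (p + 1) * math.gamma((p + 1) / 2) * math.gamma(-p / alpha)
    den = alpha * math.sqrt(math.pi) * math.gamma(-p / 2)
    return num / den

def flom_constant_pow_alt(p, alpha):
    """(C(p, alpha))^p, second closed form in (9); not usable at p = 1."""
    return math.gamma(1 - p / alpha) / (math.cos(math.pi * p / 2) * math.gamma(1 - p))

def flom(p, alpha, gamma):
    """E{|X|^p} = (C(p, alpha) * gamma)^p for X ~ f_alpha(gamma, 0), as in (8)."""
    return flom_constant_pow(p, alpha) * gamma ** p

# Gaussian special case: alpha = 2, gamma = 1 is N(0, 2), so E|X| = 2 / sqrt(pi)
gaussian_first_moment = flom(1.0, 2.0, 1.0)
```

For α = 2 and γ = 1 the characteristic function (7) is that of a zero-mean Gaussian with variance 2, whose mean absolute value is $2/\sqrt{\pi}$, which gives an independent check of the constant.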
The SαS model parameters (α, γ) can be estimated using the consistent Maximum Likelihood (ML) method described by Nolan [35], which gives reliable estimates and provides the tightest possible confidence intervals. Extending the SαS model to heavy-tailed random vectors leads to the multivariate sub-Gaussian SαS distribution¹ [18].
Definition 1 Any vector $\vec{X}$ distributed as $\vec{X} = A^{1/2} \vec{G}$, where A is a positive $\frac{\alpha}{2}$-stable random variable and $\vec{G} = (G_1, G_2, \ldots, G_n)$ is a zero-mean Gaussian random vector, independent of A, with covariance matrix R, is called a sub-Gaussian SαS random vector (in $\mathbb{R}^n$) with underlying Gaussian vector $\vec{G}$.
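Definition 1 translates directly into a sampling recipe. The sketch below is illustrative only and makes two assumptions that the definition itself does not fix: the positive stable multiplier A is normalized so that its Laplace transform is $\exp(-s^{\alpha/2})$, which corresponds to the $S_{\alpha/2}((\cos\frac{\pi\alpha}{4})^{2/\alpha}, 1, 0)$ law used in the paper's simulations, and A is drawn with Kanter's representation of a positive stable variable. The function names are hypothetical.

```python
import numpy as np

def positive_stable(a, size, rng):
    """Positive a-stable samples (0 < a < 1) with E{exp(-s A)} = exp(-s**a),
    generated with Kanter's representation."""
    u = rng.uniform(0.0, np.pi, size)
    w = rng.exponential(1.0, size)
    return (np.sin(a * u) / np.sin(u) ** (1.0 / a)
            * (np.sin((1.0 - a) * u) / w) ** ((1.0 - a) / a))

def sub_gaussian_samples(alpha, R, size, rng):
    """Draw `size` vectors X = A^{1/2} G with G ~ N(0, R), for 0 < alpha < 2."""
    A = positive_stable(alpha / 2.0, size, rng)
    G = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=size)
    # One multiplier per vector: all components share the same A
    return np.sqrt(A)[:, None] * G

rng = np.random.default_rng(0)
R = np.array([[2.0, 0.6], [0.6, 1.0]])
X = sub_gaussian_samples(1.5, R, 1000, rng)
```

Because every component of a vector is scaled by the same multiplier A, the components are dependent even when R is diagonal, which is the property the joint model relies on.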
¹ In the following, instead of saying sub-Gaussian SαS variable / vector / distribution, we simply use the term sub-Gaussian variable / vector / distribution.

A multivariate sub-Gaussian distribution, with underlying covariance matrix R, is often denoted by α-SG(R). In this work, the transform coefficients at different subbands are grouped into vectors and are assumed to be samples of an α-SG(R) distribution, which can be viewed as a variance mixture of Gaussian processes [36]. It is important to note that covariances do not exist for the family of non-Gaussian SαS random variables, due to the lack of finite variance. Instead, we measure correlation between transform coefficients using a quantity called covariation [18], which plays an analogous role for SαS random variables to the one played by covariance for Gaussian random variables. Let X and Y be jointly SαS random variables with 1 < α ≤ 2, zero location parameters and dispersions $\gamma_X$ and $\gamma_Y$ respectively. Then, for all 1 < p < α, the covariation of X with Y is given by
$$ [X, Y]_{\alpha} = \frac{E\{X Y^{\langle p-1 \rangle}\}}{E\{|Y|^p\}}\, \gamma_Y^{\alpha}, \tag{10} $$
where for any real number z and a ≥ 0 we use the notation
$$ z^{\langle a \rangle} = \begin{cases} z^a, & z > 0 \\ 0, & z = 0 \\ -(-z)^a, & z < 0. \end{cases} $$
The covariation coefficient of X with Y is defined by
$$ \lambda_{XY} = \frac{[X, Y]_{\alpha}}{[Y, Y]_{\alpha}} = \frac{E\{X Y^{\langle p-1 \rangle}\}}{E\{|Y|^p\}}. \tag{11} $$
Note the asymmetric nature of the covariation and the covariation coefficient, as opposed to the usual second-order moments. Consider the sub-Gaussian random vector $\vec{X} = A^{1/2} \vec{G}$, where $\vec{G} = (G_1, G_2, \ldots, G_n)$ is the underlying Gaussian vector with covariance matrix R. Then, the covariations between the components of $\vec{X}$, $[X_i, X_j]_{\alpha}$, $i, j = 1, \ldots, n$, are given by [18]:
$$ c_{ij} = [X_i, X_j]_{\alpha} = 2^{-\frac{\alpha}{2}}\, [R]_{ij}\, [R]_{jj}^{\frac{\alpha - 2}{2}}. \tag{12} $$
Note that, for α < 2 and $[R]_{ij} \neq 0$, $c_{ij} = c_{ji}$ only if $[R]_{ii} = [R]_{jj}$. During the FE step, it is necessary to estimate the covariations from the transform coefficients of the images. In the next section, we describe how this estimation is performed.

B. Estimation of covariations
By applying (8) on Y we have
$$ \gamma_Y = \frac{\big(E\{|Y|^p\}\big)^{1/p}}{C(p, \alpha)}. \tag{13} $$
Let the vectors $\{\vec{X}^1, \vec{X}^2, \ldots, \vec{X}^N\}$ constitute a set of N independent realizations of an α-SG(R) distribution, where $\vec{X}^k = (X_1^k, X_2^k, \ldots, X_n^k)$, $k = 1, \ldots, N$. Now, observe from (10) and (11) that we can obtain an estimate of $[X, Y]_{\alpha}$ by multiplying an estimated value of $\lambda_{XY}$ by the α-th power of the ML estimate of $\gamma_Y$. The value of $\lambda_{XY}$ is estimated via the Fractional Lower Order Moment (FLOM) estimator [37], which is very simple and computationally efficient, in addition to being unbiased and consistent. For two jointly SαS random variables X, Y with α > 1, and a set of independent observations $(X_1, Y_1), \ldots, (X_n, Y_n)$, the FLOM estimator is defined as follows:
$$ \hat{\lambda}_{FLOM} = \frac{\sum_{i=1}^{n} X_i\, |Y_i|^{p-1}\, \mathrm{sign}(Y_i)}{\sum_{i=1}^{n} |Y_i|^p}. \tag{14} $$
Thus, the covariation estimator between the components of a sub-Gaussian vector $\vec{X}$ is given by:
$$ \hat{c}_{ij}^{FLOM} = \frac{\sum_{k=1}^{N} X_i^k\, |X_j^k|^{p-1}\, \mathrm{sign}(X_j^k)}{\sum_{k=1}^{N} |X_j^k|^p}\, \gamma_{X_j}^{\alpha}, \tag{15} $$
where the dispersion $\gamma_{X_j}$ can be estimated using the ML estimator, described in [35]. We define the covariation matrix C as the matrix having as elements the covariations. Then, the estimated covariation matrix $\hat{C}$ is the matrix with elements $[\hat{C}]_{ij} = \hat{c}_{ij}^{FLOM}$. Once these covariations are estimated from the data, we can estimate the elements $[R]_{ij}$ of the underlying covariance matrix, R, using (12):
$$ [\hat{R}]_{jj} = \big(2^{\frac{\alpha}{2}} [\hat{C}]_{jj}\big)^{\frac{2}{\alpha}}, \qquad [\hat{R}]_{ij} = \frac{2^{\frac{\alpha}{2}} [\hat{C}]_{ij}}{[\hat{R}]_{jj}^{\frac{\alpha - 2}{2}}}, \tag{16} $$
which are consistent and asymptotically normal, that is, the distribution of the above estimators tends to a normal distribution, as the number of observations N tends to infinity. Notice that the estimation of covariations and consequently the estimation of the covariation matrices, requires the specification of the parameter p. We compute the optimal p as a function of the characteristic exponent α, by finding the value of p that minimizes the standard deviation of the estimator, for different values of α > 1. For this purpose, we studied the influence of the parameter p on the performance of the covariation estimator given by (15) via Monte-Carlo simulations. We generated two real SαS (1 < α ≤ 2) random variables, X = a1 U + b1 V , Y = a2 U + b2 V , where U and V are independent, standard SαS random variables and {ai , bi , i = 1, 2} are real coefficients. The
true covariation of X with Y is $[X, Y]_{\alpha} = a_1 a_2^{\langle \alpha - 1 \rangle} + b_1 b_2^{\langle \alpha - 1 \rangle}$. We generated N = 5000 independent samples of U and V and calculated the covariation estimator by means of (15) for different values of p in the range (0, 2]. We ran K = 1000 Monte-Carlo simulations for different values of α ∈ (1, 2].
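The experiment just described can be sketched as follows. This is not the authors' code: the standard SαS samples are drawn here with the Chambers-Mallows-Stuck method, the dispersion of Y is set to its known value $(|a_2|^{\alpha} + |b_2|^{\alpha})^{1/\alpha}$ instead of an ML estimate, the number of runs defaults to a smaller value than K = 1000 for speed, and all function names are hypothetical. The coefficient values are the ones reported in the text.

```python
import numpy as np

def standard_sas(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for a standard SaS variable,
    i.e. characteristic function exp(-|t|^alpha), with 1 < alpha <= 2."""
    theta = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * theta) / np.cos(theta) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * theta) / w) ** ((1.0 - alpha) / alpha))

def signed_power(z, a):
    """z^<a> = sign(z) |z|^a."""
    return np.sign(z) * np.abs(z) ** a

def true_covariation(a1, b1, a2, b2, alpha):
    """[X, Y]_alpha for X = a1 U + b1 V, Y = a2 U + b2 V, U and V standard."""
    return a1 * signed_power(a2, alpha - 1) + b1 * signed_power(b2, alpha - 1)

def flom_covariation(x, y, p, alpha, gamma_y):
    """Covariation estimator (15) for one pair of variables."""
    lam = np.sum(x * signed_power(y, p - 1)) / np.sum(np.abs(y) ** p)
    return lam * gamma_y ** alpha

def monte_carlo_std(alpha, p, n_samples=5000, n_runs=100, seed=0):
    """Mean and standard deviation of the estimator over n_runs repetitions."""
    a1, a2, b1, b2 = 0.32, -2.45, -1.7, 0.44
    # Assumption: the true dispersion of Y is used in place of an ML estimate
    gamma_y = (abs(a2) ** alpha + abs(b2) ** alpha) ** (1.0 / alpha)
    rng = np.random.default_rng(seed)
    est = np.empty(n_runs)
    for k in range(n_runs):
        u = standard_sas(alpha, n_samples, rng)
        v = standard_sas(alpha, n_samples, rng)
        est[k] = flom_covariation(a1 * u + b1 * v, a2 * u + b2 * v,
                                  p, alpha, gamma_y)
    return est.mean(), est.std(), true_covariation(a1, b1, a2, b2, alpha)

mean_est, std_est, truth = monte_carlo_std(alpha=1.5, p=0.75, n_runs=50)
```

Sweeping `p` over a grid and recording `std_est` for each value of α gives curves of the kind the text describes, from which the minimizing p can be read off.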
We randomly selected, without loss of generality, the coefficient values to be equal to $a_1 = 0.32$, $a_2 = -2.45$, $b_1 = -1.7$, $b_2 = 0.44$. Fig. 1 displays the standard deviation of the estimator $\hat{c}^{FLOM}(p)$ as a function of the parameter p and for different values of α. Table I shows results on the performance of the estimator. We include the mean of the estimator, the standard deviation in parentheses and the value of p for which the smallest standard deviation is achieved by the estimator. We also note that we obtained similar experimental results for different values of the coefficients $\{a_i, b_i,\ i = 1, 2\}$. In our proposed CBIR system, we need to estimate the covariations between the components of the sub-Gaussian vectors, which are special cases of SαS random variables. We repeated the above Monte-Carlo simulations using two sub-Gaussian random variables, $X = A^{1/2} G_X$, $Y = A^{1/2} G_Y$. By definition, X and Y can be viewed as SαS random variables with dispersion $\gamma_X$ and $\gamma_Y$, respectively. We generate a sample of a sub-Gaussian random variable by first generating a sample A drawn from a $S_{\alpha/2}\big((\cos\frac{\pi\alpha}{4})^{2/\alpha}, 1, 0\big)$ distribution and then by generating a sample G drawn from a zero-mean Gaussian distribution with variance $2\gamma^2$, which is viewed as a $S_2(\gamma, 0, 0)$ variable (with $\gamma = \gamma_X$ or $\gamma = \gamma_Y$ depending on whether the Gaussian part G corresponds to the variable X or Y, respectively). Fig. 2 displays the curves representing the standard deviation of the FLOM covariation estimator as a function of p, for two values of α and 25 pairs of dispersions $(\gamma_X, \gamma_Y)$, with the dispersions ranging in the interval (0, 3.5), which corresponds to the dispersions estimated from the wavelet subbands of some selected images used in our experiments (obtained from the USC-SIPI database², cf. Fig. 7). For each α, we observe that all the curves are minimized in a common interval on the p-axis, and actually the optimal values for p are close to each other. We repeated the procedure for α = 1 : 0.05 : 2 and, for a given $\alpha_i$, we defined the optimal $p_i$ as the mean of the optimal p values of its 25 curves, one for each pair of dispersions. In section V, we use the values, shown in Table II, for the optimal p as a function of α. This table is used as a lookup table in order to find the optimal p for every 1 < α ≤ 2 by linearly interpolating these values.

C. Joint sub-Gaussian modeling of wavelet coefficients
In this section, we justify the selection of the family of sub-Gaussian distributions as a statistical modeling tool for the wavelet coefficients of texture images and, as an example, we show results on modeling data obtained by applying the standard 2-D, orthogonal, discrete wavelet transform (DWT) on real texture images. Similar results are obtained when using other types of wavelet transforms, such as a steerable wavelet transform, which is more convenient to achieve rotation invariance. The 2-D orthogonal DWT expands an image using a certain basis, whose elements are scaled and translated versions of a single prototype filter. 
In particular, the DWT decomposes images in dyadic scales, providing at each resolution level one low-pass subband approximation and three spatially oriented wavelet subbands. There are interesting properties of the wavelet transform [7] that justify its use in CBIR systems: locality (image content is localized in both space and frequency), multiresolution (the image is decomposed at a nested set of dyadic scales), and edge detection (wavelet filters operate as local edge detectors). Because of these properties, the wavelet transforms of real-world images tend to be sparse, resulting in a large number of small magnitude coefficients and a small number of large magnitude coefficients. In our modeling, we employ all the subbands except the low-pass residual, since it does not exhibit this sparsity but is instead an average of the original image. Importantly, the sparsity property is in conflict with the Gaussian assumption, giving rise to peaky and heavy-tailed non-Gaussian marginal distributions of the wavelet subband coefficients, which leads us to use joint sub-Gaussian distributions.

² http://sipi.usc.edu/services/database

In our proposed retrieval scheme, we proceed by using a statistical model that captures both wavelet subband marginal distributions and inter-subband correlations. Various experimental results have shown the importance of the cross-correlation of each subband with other orientations at the same decomposition level in characterizing the texture information [38]. Our joint modeling is performed by grouping the wavelet coefficients at the same or adjacent spatial locations, levels and subbands, to form a sub-Gaussian vector. This modeling of the subband coefficients preserves the heavy-tailed behavior of their marginal distributions. Notice that the components of a sub-Gaussian vector are highly dependent, as illustrated in [18], and this makes the joint sub-Gaussian model appropriate for capturing the cross-dependencies between different subbands, since around features, such as edges and lines, the wavelet coefficients at all subbands are dependent in the sense that they have high probability of being significant. Next, we assess the effectiveness of a SαS density function for the approximation of the empirical density of the subband coefficients, near the mode and on the tails. In our data modeling, the statistical fitting proceeds in two steps: first, we assess whether the data deviate from the normal distribution and we determine if they have heavy tails by employing normal probability plots [39]. Then, we check if the data is in the stable domain of attraction by estimating the characteristic exponent α directly from the data and by providing the related confidence intervals. 
As a further stability diagnostic, we employ the amplitude probability density (APD) curves, $P(|X| > x)$, which give a good indication of whether the SαS fit matches the data near the mode and on the tails of the distribution.
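An empirical APD curve is simply the fraction of coefficients whose magnitude exceeds each threshold. The sketch below computes it with NumPy; the function name `empirical_apd` and the toy data are hypothetical, and the comparison against a fitted SαS or GGD curve is not shown.

```python
import numpy as np

def empirical_apd(samples, thresholds):
    """Empirical P(|X| > x) for each x in `thresholds`."""
    mags = np.sort(np.abs(np.asarray(samples, dtype=float)))
    # searchsorted(..., side="right") counts magnitudes <= x
    counts_above = mags.size - np.searchsorted(mags, thresholds, side="right")
    return counts_above / mags.size

coeffs = np.array([1.0, -2.0, 3.0, -4.0])
apd = empirical_apd(coeffs, np.array([0.0, 2.0, 2.5, 4.0]))
```

Plotting such a curve on logarithmic axes makes the tail behavior visible: an algebraic tail appears approximately linear, whereas an exponentially decaying tail bends downwards.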
Fig. 3 compares the SαS and GGD fits for a selected subband of a certain image. Clearly, the SαS density is superior to the exponentially decaying GGD, following more closely both the mode and the tail of the empirical APD. Table III shows the ML estimates of the characteristic exponent α together with the corresponding 95% confidence intervals, for a set of 10 textures (real-world 512 × 512 natural scene images) obtained from the MIT Vision Texture (VisTex) database, decomposed in 3 levels using Daubechies' 4 ('db4') filters [40]. It can be observed that the confidence intervals depend on the decomposition level. In particular, they become wider as the level increases, since the number of samples used for estimating the SαS parameters gets smaller because of the subsampling that takes place between scales. This table also demonstrates that the coefficients of different subbands and decomposition levels exhibit various degrees of non-Gaussianity, with values of α varying between 0.9 (close to Cauchy) and 2 (close to Gaussian).
IV. ROTATION-INVARIANT CBIR WITH GAUSSIANIZED STEERABLE PYRAMIDS

The property of rotation invariance is very desirable in a texture retrieval system. An important problem with the standard wavelet transform is that it lacks translation and rotation invariance. This results in a mismatch in the retrieval process when the image orientation varies. In fact, the wavelet coefficients of the rotated image will be completely different, in the sense that they will not simply be rotated versions of the wavelet coefficients of the original image. A way to overcome this problem is to replace the standard wavelet transform with a steerable pyramid [41], [42], which is a linear multi-scale, multi-orientation image decomposition produced by a set of orientation filters, generated by a set of basis functions (directional derivative operators). Steerable pyramids are overcomplete and possess the desired properties of rotation invariance and (approximate) translation invariance.

In this section, we design a rotation-invariant CBIR technique based on the joint sub-Gaussian modeling of the steerable pyramid coefficients, incorporating dependence across space, angles and scales. In particular, we construct a steerable model, relating the fractional lower-order statistics of a rotated image with those of its original version, and then apply a Gaussianization process on the steerable model by employing the local statistical behavior of the coefficients, which are grouped into appropriate spatial neighborhoods. The similarity measurement between two images is performed by deriving a rotation-invariant similarity function, which effectively performs angular alignment between the images.

A. Steerability of the pyramid subband coefficients

In the case of a database containing images along with rotated versions of them, we are interested in finding features which are as "steerable" as possible, that is, given the features of an image oriented at an angle $\phi$, we should be able to obtain the features corresponding to the same image rotated at an angle $\theta$, without having to re-extract the features from the rotated image. (Throughout the following sections, we consider counter-clockwise rotation.)

Let $c^l(x_k, \phi)$ represent the value of a transform coefficient at a spatial location $x_k$ ($k = 1, \dots, N$), orientation $\phi$ and level $l$ ($l = 1, \dots, L$). In a steerable pyramid with $J$ basic orientations (subbands), at each level $l$, given the $J$ basic coefficients $X_k^l = [c^l(x_k, \phi_1), c^l(x_k, \phi_2), \dots, c^l(x_k, \phi_J)]^T$, the transform coefficient $c^l(x_k, \phi)$ for any angle $\phi$ is given by [29]:

$$c^l(x_k, \phi) = \sum_{i=1}^{J} f_i(\phi)\, c^l(x_k, \phi_i), \qquad \forall\, \phi, \quad l = 1, \dots, L, \tag{17}$$
where $\{f_1(\phi), f_2(\phi), \dots, f_J(\phi)\}$ is the set of $J$ steering functions. Let $\Sigma^l$ and $\Sigma^l_\theta$ denote the sample correlation matrices, with elements given by the correlations between pairs of subbands (at a given decomposition level $l$) of the original image $I$ and its rotated version $I_\theta$, respectively. The following proposition [29] establishes the relation between $\Sigma^l$ and $\Sigma^l_\theta$.

Proposition 1 ([29]): The matrices $\Sigma^l_\theta$ and $\Sigma^l$ are related as follows:

$$\Sigma^l_\theta = F(\theta)\, \Sigma^l\, F^T(\theta), \tag{18}$$

where

$$F(\theta) = \begin{bmatrix}
f_1(\phi_1 - \theta) & f_2(\phi_1 - \theta) & \cdots & f_J(\phi_1 - \theta) \\
f_1(\phi_2 - \theta) & f_2(\phi_2 - \theta) & \cdots & f_J(\phi_2 - \theta) \\
\vdots & \vdots & \ddots & \vdots \\
f_1(\phi_J - \theta) & f_2(\phi_J - \theta) & \cdots & f_J(\phi_J - \theta)
\end{bmatrix}. \tag{19}$$
Proof: The proof of Proposition 1 follows easily by direct computation, making use of the properties of the steering functions $\{f_i(\theta)\}$. It can be easily shown that the above proposition also holds if $\Sigma^l$ and $\Sigma^l_\theta$ are the covariance matrices, with elements given by the covariances between pairs of subbands at the $l$-th decomposition level of the images $I$ and $I_\theta$, respectively.
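As a numerical illustration of Proposition 1, the sketch below builds $F(\theta)$ from (19) and checks relation (18) on synthetic data. It assumes the two-orientation case used later in the experiments ($\phi_1 = 0$, $\phi_2 = \pi/2$, $f_1 = \cos$, $f_2 = \sin$); the helper name `steering_matrix` and the random stand-in coefficients are illustrative, and the rotated image's basic coefficients are modeled simply as $F(\theta)$ applied to the original ones.

```python
import numpy as np

# Assumed setup: J = 2 basic orientations with steering functions cos and sin.
BASIC_ANGLES = np.array([0.0, np.pi / 2])
STEERING_FUNCS = (np.cos, np.sin)

def steering_matrix(theta):
    """F(theta) with entries [F]_{ij} = f_j(phi_i - theta), cf. (19)."""
    return np.array([[f(phi - theta) for f in STEERING_FUNCS]
                     for phi in BASIC_ANGLES])

rng = np.random.default_rng(1)
theta = np.deg2rad(30.0)
F = steering_matrix(theta)

# Synthetic stand-in for the J basic coefficients at N locations (rows = subbands).
X = rng.standard_normal((2, 5000)) * np.array([[2.0], [0.5]])
X_rot = F @ X                     # modeled basic coefficients of the rotated image

Sigma = np.cov(X)                 # covariance between subbands, original image
Sigma_rot = np.cov(X_rot)         # covariance between subbands, rotated image

print(np.allclose(Sigma_rot, F @ Sigma @ F.T))   # relation (18)
print(np.allclose(F @ F.T, np.eye(2)))           # F(theta) is orthogonal here
```

In this two-orientation setting $F(\theta)$ reduces to a plane rotation matrix, so $F^T(\theta) = F(-\theta)$ can be verified directly as well.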
In our work, the $J$ basic angles are taken to be equispaced, which makes $F(\theta)$ an orthogonal matrix for any $\theta$, i.e., $F^T(\theta) = F^{-1}(\theta)$ $(= F(-\theta))$, and thus, in this case, $\Sigma^l$ and $\Sigma^l_\theta$ become orthogonally equivalent.

Under a joint sub-Gaussian assumption, the coefficients of the $J$ basic orientations (subbands) at a given level $l$ are modeled as joint sub-Gaussian vectors $\alpha$-SG$(R^l)$, with $R^l$ denoting the underlying covariance matrix corresponding to the subbands at the $l$-th level. In particular, the elements of $R^l$ are the covariances between the components of the Gaussian part, $\vec{G}$. Let $R^l_\theta$ denote the underlying covariance matrix corresponding to the $l$-th level subbands of the rotated image $I_\theta$. Then, it is straightforward to verify that Proposition 1 holds by replacing $\Sigma^l_\theta$ and $\Sigma^l$ with $R^l_\theta$ and $R^l$, respectively.

The pyramid coefficients at a given subband are assumed to follow a sub-Gaussian marginal distribution. So, the coefficients corresponding to the basic orientation $\phi_i$ at level $l$ can be expressed as:

$$c^l(x_k, \phi_i) = \sqrt{A}\; c^l_G(x_k, \phi_i), \qquad i = 1, \dots, J, \tag{20}$$

where $c^l_G(x_k, \phi_i)$ is the Gaussian part of the $\alpha$-SG$(R^l)$ vector. From (17), the transform coefficient at level $l$, at any angle $\phi$, is:

$$c^l(x_k, \phi) = \sum_{i=1}^{J} f_i(\phi) \left( \sqrt{A}\; c^l_G(x_k, \phi_i) \right) = \sqrt{A} \sum_{i=1}^{J} f_i(\phi)\, c^l_G(x_k, \phi_i) = \sqrt{A}\; c^l_G(x_k, \phi). \tag{21}$$
Notice that (21) shows that the pyramid subband coefficients of an image rotated by an angle $\phi$ are also sub-Gaussian random variables, with the same characteristic exponent as the corresponding subbands of the original (non-rotated) image, and with a Gaussian part which is the rotated version of the original Gaussian part at the same angle $\phi$. Therefore, if one is able to estimate the multiplier $A$ accurately, it is possible to normalize the coefficients $c^l(x_k, \phi)$ by dividing them by $\sqrt{A}$, and to work with the Gaussianized coefficients $c^l_G(x_k, \phi)$. This is convenient because, as will become clearer later, it is easier to use appropriate and simple (analytical) similarity functions with the Gaussianized coefficients.

In order to accurately estimate the multiplier $A$, we consider dependence across orientations, scales and space, which results in an improved statistical model for natural images. We achieve this by defining an appropriate neighborhood for each coefficient, which is then modeled as a sub-Gaussian random vector. This joint sub-Gaussian modeling is followed by a Gaussianization procedure, which results in a steerable pyramid whose coefficients are jointly Gaussian (Gaussianized steerable pyramid). There are several reasons that justify the Gaussianization step: a) the normalized transform domain can be well modeled statistically, using only second-order covariances between pairs of subbands, b) the similarity measurement can be performed using an analytical expression for the KLD between two multivariate Gaussian distributions, avoiding computationally complex methods, such as the Monte-Carlo method, c) the normalized pyramid makes it easy to perform steering in the feature space.

B. Variance-adaptive local modeling using multivariate sub-Gaussian distributions

The dependencies between the coefficients forming a certain neighborhood, including in general coefficients located in a small spatial region and at different orientations and scales, can be modeled using a homogeneous random field with a spatially changing variance. This requirement can be realized by modulating the vector of coefficients constituting the neighborhood (node of the field) with a hidden scaling random variable (multiplier), as follows:

$$\vec{X} \stackrel{d}{=} \sqrt{A}\; \vec{G}, \tag{22}$$
where $\vec{G}$ is a zero-mean Gaussian random vector and $A$ is a positive scalar variable independent of $\vec{G}$ ($\stackrel{d}{=}$ denotes equality in distribution). A vector $\vec{X}$ that can be written in this way is said to follow a Gaussian Scale Mixture (GSM) distribution [43]. Notice that when the multiplier $A$ is drawn from a SαS distribution, this is exactly the case of a multivariate α-sub-Gaussian model.

Two basic assumptions are made in order to reduce the dimensionality of these models: (i) the probability structure is defined locally; in particular, the probability density of a coefficient, when conditioned on the rest of its neighbors, is independent of the coefficients outside the neighborhood, (ii) all such neighborhoods obey the same distribution (spatial homogeneity). The construction of a global probabilistic model for images, based on these local descriptions, requires the specification of a neighborhood structure for each subband coefficient and of the distribution of the multipliers, which we have already specified to be a member of the family of SαS distributions.

We extract the interdependencies between coefficients at different orientations, levels and spatial positions by utilizing their joint α-sub-Gaussian statistics. Let $\vec{X}_k^{l,i}$ denote a generic $P$-dimensional neighborhood of the coefficient $c^l(x_k, \phi_i)$ at the spatial position $x_k$ ($k = 1, \dots, K$), orientation $\phi_i$ ($i = 1, \dots, J$) and level $l$ ($l = 1, \dots, L$). This neighborhood is assumed to be drawn from an $\alpha$-SG$(R^l)$ random vector.

C. Gaussianization of the multivariate sub-Gaussian model

An important property of a GSM model is that the probability density of a $P$-dimensional GSM vector $\vec{X}$ is Gaussian when conditioned on $A$. Combining this property with (22), it is clear that the normalized vector $\vec{X}/\sqrt{A}$ follows a joint Gaussian distribution. The probability density of $\vec{X}$ conditioned on $A$ is given by:

$$p(\vec{X} \mid A) = \frac{\exp\!\left( -\vec{X}^T (A R)^{-1} \vec{X} / 2 \right)}{(2\pi)^{P/2}\, |A R|^{1/2}}. \tag{23}$$

From (23), it can be seen that the ML estimator for the multiplier $A$ is

$$\hat{A}(\vec{X}) = \frac{\vec{X}^T R^{-1} \vec{X}}{P}, \tag{24}$$

where the estimator is explicitly written as a function of $\vec{X}$ to emphasize the assumption of locality.
This simplifies the computational procedure for the Gaussianization of the steerable pyramid subband
coefficients, as we assume that the multipliers associated with different neighborhoods are estimated independently, even though the neighborhoods are overlapping. In our implementation, we estimate, as explained in section III-B, the underlying covariance matrix Rl, i , corresponding to the basic orientation φi at the lth-level, by employing the neighborhoods of all
coefficients (or a subset of them, which is computationally efficient, at the cost of a reduced estimation accuracy) at the given orientation ($\vec{X}_k^{l,i}$, $k = 1, \dots, N$). This procedure has the advantage of resulting in a computationally efficient way to estimate the hidden multiplier $A$ and normalize the subband coefficients. Also, our technique avoids the use of a Gaussian Mixture Model (GMM), as in other approaches [25], which requires complicated Expectation-Maximization algorithms to estimate the multipliers, nested in a Markovian manner. We must also note that the multipliers in [25] are discrete, whereas in our model they vary in a continuous fashion.

Summarizing, the steps of our Gaussianization method are:

1. Decompose the given image into $L$ levels and $J$ orientations per level, via a steerable pyramid.
2. For each decomposition level $l$, $l = 1, \dots, L$, and for each orientation $\phi_i$, $i = 1, \dots, J$, at the $l$-th level:
   i) Estimate the covariance matrix $R^{l,i}$ using (16).
   ii) For each coefficient $c^l(x_k, \phi_i)$, $k = 1, \dots, K$:
      • Construct the corresponding neighborhood $\vec{X}_k^{l,i}$.
      • Estimate the multiplier $\hat{A}_k^{l,i}(\vec{X}_k^{l,i})$ using (24).
      • Compute the normalized coefficient $\tilde{c}^l(x_k, \phi_i) = c^l(x_k, \phi_i) / \sqrt{\hat{A}_k^{l,i}}$.
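The multiplier estimation and normalization in step 2(ii) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the covariance matrix `R` is taken as given (here the identity, whereas the paper estimates it through (15) and (16)), the multiplier is drawn from a log-normal law purely for demonstration rather than from a stable law, and the function names are hypothetical.

```python
import numpy as np

def estimate_multipliers(neighborhoods, R):
    """ML estimates (24): A_hat = x^T R^{-1} x / P for every row x of an (N, P) array."""
    P = neighborhoods.shape[1]
    solved = np.linalg.solve(R, neighborhoods.T)      # R^{-1} x for all rows, shape (P, N)
    return np.einsum("np,pn->n", neighborhoods, solved) / P

def gaussianize(center_coeffs, neighborhoods, R):
    """Divide each coefficient by the square root of its estimated multiplier.

    An all-zero neighborhood would give A_hat = 0; real data would need a guard.
    """
    return center_coeffs / np.sqrt(estimate_multipliers(neighborhoods, R))

def kurtosis(v):
    """Sample kurtosis (equal to 3 for Gaussian data)."""
    v = v - v.mean()
    return np.mean(v ** 4) / np.mean(v ** 2) ** 2

# Synthetic Gaussian scale mixture X = sqrt(A) * G, cf. (22).
rng = np.random.default_rng(2)
N, P = 20_000, 25                                     # e.g. a 5 x 5 neighborhood
R = np.eye(P)                                         # stand-in covariance (assumption)
G = rng.standard_normal((N, P))
A = rng.lognormal(mean=0.0, sigma=1.0, size=N)        # illustrative multiplier only
X = np.sqrt(A)[:, None] * G

center = X[:, P // 2]                                 # coefficient at the neighborhood centre
normalized = gaussianize(center, X, R)
print(kurtosis(center), kurtosis(normalized))         # heavy-tailed vs. near-Gaussian
```

Solving the linear system instead of forming $R^{-1}$ explicitly is a numerical convenience; for a fixed subband one could equally factor $R$ once and reuse the factorization for all neighborhoods.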
From (24), it is clear that the accuracy of the multiplier estimate depends on the accurate estimation of the underlying covariance matrix $R^{l,i}$ and on the neighborhood structure.

D. Computation of inter-level covariations

The multiplier estimation, as well as the construction of an image signature, which we describe later on, may require the involvement of coefficients, or the computation of covariations between subbands, at different levels. Using the standard pyramid decomposition, we move from level $l$ to the next coarser level $(l+1)$ by subsampling the output of a low-pass filter. As a result, the subbands at the $(l+1)$-th level are one quarter the size of those at the $l$-th level (since we are dealing with images), which is undesirable since the covariation estimation involves summations over vectors of equal length. Subsampling is good for compression and for saving memory and complexity when performing the decomposition. However,
subsampling introduces some aliasing [41], which is not good for extracting features. There are two ways to avoid this:

a) Instead of subsampling the output of the filters, we can upsample the filters and perform the filtering without subsampling their outputs. In addition to avoiding aliasing, this also allows us to keep the same number of coefficients across scales, which favors the computation of covariation matrices across scales.
b) Using Fourier-domain filtering, simply by multiplying the DFT of the image with the DFT of the filter and then taking the inverse DFT to obtain the subband coefficients.

We follow the frequency-domain approach since, although both implementations should give approximately the same values, the frequency-domain implementation is more exact, as it does not suffer from the finite-length constraint that is imposed on the filters for the convolution.

Besides, the estimation of covariation assumes two jointly sub-Gaussian random variables, i.e., with equal characteristic exponent values. The values of α, estimated from the different orientation subbands using a steerable pyramid without subsampling, are close to each other but not equal. This problem is overcome by making use of the asymmetry in the definition of the covariation:

• from (15), we observe that the free parameter $p$ affects the second variable (as an exponent),
• the intuitive idea is to use the estimated characteristic exponent, α, corresponding to the second variable (subband) in order to estimate the covariation: for instance, in order to estimate $[X, Y]_\alpha$ we first estimate α from $Y$ and then assume that $X$ follows a distribution with the same α, while in order to estimate $[Y, X]_\alpha$ we estimate α from $X$ and then assume that $Y$ also follows a distribution with that α. This procedure exploits the differences between the subbands, regarding their distribution.
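The asymmetry exploited above can be illustrated with a small sketch. It assumes that the estimator in (15) has the standard fractional lower-order moment (FLOM) form for the covariation coefficient found in the alpha-stable literature; equation (15) itself is not restated here, so this form, the function name and the choice of $p$ are assumptions. In practice $p$ would be selected from the α estimated on the second argument.

```python
import numpy as np

def flom_covariation_coefficient(x, y, p):
    """Assumed FLOM estimate of the covariation coefficient of x with y.

    Form used: sum(x * |y|^(p-1) * sign(y)) / sum(|y|^p).
    The exponent p acts on the second argument only, hence the asymmetry.
    Exact zeros in y would need special handling when p < 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(x * np.abs(y) ** (p - 1) * np.sign(y)) / np.sum(np.abs(y) ** p)

# Toy example: X is an exact multiple of Y, so the two directions differ.
rng = np.random.default_rng(4)
y = rng.standard_normal(1000)
x = 2.0 * y

lam_xy = flom_covariation_coefficient(x, y, p=0.8)   # exponent applied to y
lam_yx = flom_covariation_coefficient(y, x, p=0.8)   # exponent applied to x
print(lam_xy, lam_yx)
```

Under this assumed form, the covariation itself would follow by scaling the coefficient with the dispersion of the second variable, which is not estimated in this sketch.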
We also use this approach for the estimation of the covariations between subbands at the same decomposition level.

E. Neighborhood construction

There is a trade-off between the computational complexity and the neighborhood size. This can be seen from (24), where the estimated stable multiplier depends on the inverse of the underlying covariance matrix $R$. The computational complexity increases with the neighborhood size $P$, since the cost of estimating $R$ and of calculating its inverse, $R^{-1}$, depends on its dimension ($P \times P$). It is clearly not computationally feasible to construct all possible neighborhoods for each subband coefficient in order to select the optimal one, because of the large number of combinations. In this section, we examine the performance of our Gaussianization procedure with respect to different neighborhoods, also taking into consideration the computational limitations. For this purpose, we
implement the Gaussianization process using the neighborhoods shown in Table IV. For a given subband coefficient $c^l(x_k, \phi_i)$, the formation of some of the above neighborhoods requires the inclusion of coefficients at the same spatial location of the subbands but at the next coarser scale. In order to associate coefficients between adjacent levels, we use the frequency-domain implementation of the steerable pyramid, without subsampling the output of the filters, as described in section IV-D. This results in subbands of equal size at all decomposition levels. Then, the coefficient at the corresponding spatial location of the next coarser subband is simply $c^{(l+1)}(x_k, \phi_i)$.

We tested the performance of our Gaussianization procedure with respect to the neighborhood structure, by implementing it on a set of 5 randomly selected textures of size 512 × 512 from the Brodatz database with the following code numbers: 1. 1.1.01, 2. 1.1.04, 3. 1.1.08, 4. 1.2.08, 5. 1.5.04, along with their rotations at 30, 60, 90 and 120 degrees. We applied a 3-level pyramid decomposition with 4 orientations per level, thus obtaining a total of 300 subbands. After the Gaussianization, at each subband, we calculated the relative entropy ($\Delta H$) between the histogram (with 256 bins) and the Gaussian PDF fitting the normalized subband coefficients, as a fraction of the histogram entropy ($H$):

$$\frac{\Delta H}{H} = \frac{\sum_{k=1}^{256} h(x_k) \log\!\left( \dfrac{h(x_k)}{g(x_k)} \right)}{-\sum_{k=1}^{256} h(x_k) \log\big( h(x_k) \big)}, \tag{25}$$
where $h(x_k)$ is the probability density at the center $x_k$ of the $k$-th bin, as estimated from the histogram, and $g(x_k)$ is the corresponding value of the fitted Gaussian PDF with parameters estimated from the normalized coefficients. The best choice for the neighborhood structure corresponds to the smallest fraction $\Delta H / H$. Fig. 4 displays the histogram of the neighborhood indices (cf. Table IV), for the 300 subbands. The vertical axis is the relative frequency of each neighborhood shape (horizontal axis), whose selection resulted in the smallest fraction (25) for the above subbands. For the given set of textures, the choice of the 10-th neighborhood shape results in the best Gaussianization performance for most of the pyramid subbands. Notice that for a given coefficient $c^l(x_k, \phi_i)$, this neighborhood contains coefficients from the same subband only. Thus, the vector $\vec{X}$ used in the estimation of the multiplier $\hat{A}$ (cf. (24)) exploits only intra-subband dependencies. Gaussianization of pyramid subband coefficients at a given orientation and level is performed by forming the corresponding vectors $\vec{X}_k$, containing their 5 × 5 neighborhood at the same orientation and level. Then, by inserting these vectors in (15) and combining with (16), the underlying covariance matrix, $R$, is estimated for this set of neighborhoods. Finally, for each coefficient at that subband, the corresponding multiplier $\hat{A}_k$ is estimated by substituting its associated vector $\vec{X}_k$ and matrix $R$ in (24).

F. Feature Extraction

After the normalization procedure, the marginal and joint statistics of the coefficients at adjacent spatial positions, orientations and levels are close to the Gaussian distribution. Then, to extract the features, we simply compute the $J \times J$ covariance matrix at each decomposition level. Thus, for a given image $I$, decomposed in $L$ levels, a possible signature $S^G$ is given by the set of the $L$ covariance matrices:

$$I \mapsto S^G = \{ \Sigma_I^1, \Sigma_I^2, \dots, \Sigma_I^L \}, \tag{26}$$
where $\Sigma_I^l$ is the covariance matrix of the $l$-th decomposition level. Due to the symmetry of the covariance matrix, the total size of the above signature equals $\mathrm{size}(S^G) = \frac{J(J+1)}{2} \cdot L$. The signature $S^G$ contains only the across-orientation, second-order dependence at a given decomposition level. Because of the strong dependence across scales and orientations, we may obtain a better (more complete) signature by also considering the across-level dependence, as expressed by the covariance matrices between consecutive levels. In this case, the signature of an image $I$ is the following:

$$I \mapsto S_E^G = \{ \Sigma_I^1, \dots, \Sigma_I^L,\ \Sigma_I^{1 \to 2}, \dots, \Sigma_I^{(L-1) \to L} \}, \tag{27}$$

where $\Sigma_I^{l \to (l+1)}$ denotes the (in general asymmetric) covariance matrix corresponding to the subbands at levels $l$ and $(l+1)$. In particular, the element $[\Sigma_I^{l \to (l+1)}]_{ij}$ is equal to the covariance of the $i$-th subband at level $l$ with the $j$-th subband at the next coarser level $(l+1)$. The enhanced signature $S_E^G$ contains more texture-specific information than the signature $S^G$, at the cost of an increased computational complexity, since its size equals

$$\mathrm{size}(S_E^G) = \frac{J(J+1)}{2} \cdot L + \frac{J(J+1)}{2} \cdot (L-1).$$
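The two signatures can be assembled directly from the Gaussianized subbands. The sketch below is illustrative only: it assumes that each level is available as a $(J, N)$ array of flattened, equally sized subbands (as obtained without subsampling), and the function name `texture_signature` and the random input are hypothetical.

```python
import numpy as np

def texture_signature(levels, enhanced=True):
    """Build S^G (intra-level) and, optionally, the inter-level part of S_E^G.

    levels: list of L arrays, each of shape (J, N), holding the J Gaussianized
    subbands of one level flattened to N samples (same N at every level).
    """
    intra = [np.cov(c) for c in levels]                  # L matrices of size J x J
    inter = []
    if enhanced:
        for fine, coarse in zip(levels[:-1], levels[1:]):
            J = fine.shape[0]
            joint = np.cov(np.vstack([fine, coarse]))    # 2J x 2J joint covariance
            inter.append(joint[:J, J:])                  # entry (i, j): subband i at l vs. j at l+1
    return intra, inter

# Random stand-in for a 3-level, 2-orientation Gaussianized pyramid.
rng = np.random.default_rng(3)
L, J, N = 3, 2, 4096
levels = [rng.standard_normal((J, N)) for _ in range(L)]
intra, inter = texture_signature(levels)
print(len(intra), len(inter))                            # L and L - 1 matrices
```

Only the upper triangle of each symmetric intra-level matrix needs to be stored, which is where the $J(J+1)/2$ factor in the signature size comes from.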
Notice that, although neighborhood 10, which does not include dependencies across levels, was best for Gaussianization, the dependence across orientations and levels is very useful for the design of a retrieval scheme, since it yields a more accurate profile of the texture information.

G. Similarity Measurement

In recent work [29], a Gaussian assumption for the marginal and joint distributions of the steerable pyramid coefficients results in a deterministic rotation-invariant similarity measure, between two images
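$I$ and $Q$. If we use the signature $S_E^G$, this measure takes the following form:

$$D(I, Q) = \min_{\theta} \left[ \sum_{l=1}^{L} \left\| \Sigma_I^l - F(-\theta)\, \Sigma_Q^l\, F^T(-\theta) \right\| + \sum_{l=1}^{L-1} \left\| \Sigma_I^{l \to l+1} - F(-\theta)\, \Sigma_Q^{l \to l+1}\, F^T(-\theta) \right\| \right], \tag{28}$$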
where $\| \cdot \|$ denotes any of the common matrix norms (however, a good choice is the Frobenius norm, which gives an indication of the "matrix amplitude"). If the texture information is represented by the signature $S^G$, the similarity function is modified by omitting the second sum, which corresponds to inter-level dependencies.

In our method, we construct and employ a novel statistical rotation-invariant similarity function. After the Gaussianization procedure has been applied, we model the distribution of each decomposition level in the case of using $S^G$, as well as the joint distribution between consecutive levels in the case of using $S_E^G$, using a multivariate Gaussian density (MvGD). The similarity between two images is measured by
employing the KLD between MvGDs. Consider the case in which the texture information of each image is expressed using the signature $S_E^G$, that is, each decomposition level, as well as each pair of adjacent decomposition levels, is associated with a covariance matrix. Given two images $I$ and $Q$, let $I^l$, $Q^l$ be the sets of orientation subbands at the $l$-th decomposition level and $I^{l,\,l+1}$, $Q^{l,\,l+1}$ be the sets of orientation subbands at two adjacent levels $l$ and $l+1$, following zero-mean MvGDs with covariance matrices $\Sigma_I^l$, $\Sigma_Q^l$, and $\Sigma_I^{l \to l+1}$, $\Sigma_Q^{l \to l+1}$, respectively. The KLD between two corresponding levels is given by [5]:

$$D(I^l \,\|\, Q^l) = \frac{1}{2} \left( \mathrm{tr}\!\left( \Sigma_I^l (\Sigma_Q^l)^{-1} - \mathbf{I} \right) - \ln \left| \Sigma_I^l (\Sigma_Q^l)^{-1} \right| \right). \tag{29}$$

In the same way, the KLD between two corresponding pairs of adjacent levels is given by:

$$D(I^{l,\,l+1} \,\|\, Q^{l,\,l+1}) = \frac{1}{2} \left( \mathrm{tr}\!\left( \Sigma_I^{l \to l+1} (\Sigma_Q^{l \to l+1})^{-1} - \mathbf{I} \right) - \ln \left| \Sigma_I^{l \to l+1} (\Sigma_Q^{l \to l+1})^{-1} \right| \right). \tag{30}$$

We define the overall KLD between images $I$, $Q$ to be equal to the following sum:

$$D(I \,\|\, Q) = \sum_{l=1}^{L} D(I^l \,\|\, Q^l) + \sum_{l=1}^{L-1} D(I^{l \to l+1} \,\|\, Q^{l \to l+1}). \tag{31}$$
In our problem, we deal with image databases which may contain rotated versions of a given image. Notice that from (18), it follows that:

$$\Sigma_{Q_\theta}^l = F(\theta)\, \Sigma_Q^l\, F^T(\theta), \qquad \Sigma_{Q_\theta}^{l \to l+1} = F(\theta)\, \Sigma_Q^{l \to l+1}\, F^T(\theta), \tag{32}$$
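where $\Sigma_{Q_\theta}^l$ is the $l$-th level covariance matrix and $\Sigma_{Q_\theta}^{l \to l+1}$ is the covariance matrix between levels $l$ and $l+1$, of a rotated version of image $Q$ at an angle $\theta$, $Q_\theta$.

Consider $I$ to be the query image and $\tilde{Q} = Q_\phi$ to be a counter-clockwise rotation, by an angle $\phi$, of the original image $Q$ in the database. In a real application, of course, the value of $\phi$ is unknown. Thus, the distance between the $l$-th levels of $I$ and $\tilde{Q}$ ($I^l$ and $\tilde{Q}^l$, respectively) is defined as the minimum KLD between $I^l$ and $\tilde{Q}^l_{-\theta}$, where the minimization is over a set of possible rotations $\Theta$, and thus, it is necessary to perform an angular alignment by finding the optimum $\theta^*$. The notation $\tilde{Q}_{-\theta}$ means a clockwise rotation of image $\tilde{Q}$. By noticing that $\Sigma^l_{\tilde{Q}_{-\theta}} = F(-\theta)\, \Sigma^l_{\tilde{Q}}\, F^T(-\theta)$ and substituting (32) into (29), we obtain that the KLD between the $l$-th level of an image $I$ and a clockwise rotation of image $\tilde{Q}$ by an angle $\theta$ ($I^l$ and $\tilde{Q}^l_{-\theta}$, respectively), is given by:

$$D(I^l \,\|\, \tilde{Q}^l_{-\theta}) = \frac{1}{2} \left[ \mathrm{tr}\!\left( \Sigma_I^l\, F^T(\theta)\, (\Sigma_{\tilde{Q}}^l)^{-1} F(\theta) - \mathbf{I} \right) - \ln\!\left( |\Sigma_I^l|\, |(\Sigma_{\tilde{Q}}^l)^{-1}| \right) \right]. \tag{33}$$

Similarly, we obtain the KLD between two corresponding pairs of adjacent levels, $D(I^{l \to l+1} \,\|\, \tilde{Q}^{l \to l+1}_{-\theta})$, by replacing the covariance matrices $\Sigma_I^l$, $\Sigma_{\tilde{Q}}^l$ with the covariance matrices $\Sigma_I^{l \to l+1}$, $\Sigma_{\tilde{Q}}^{l \to l+1}$, respectively. Finally, the overall KLD between $I$ and $\tilde{Q}$ is defined as:

$$D(I \,\|\, \tilde{Q}) = \min_{\theta \in \Theta} \left( \sum_{l=1}^{L} D(I^l \,\|\, \tilde{Q}^l_{-\theta}) + \sum_{l=1}^{L-1} D(I^{l \to l+1} \,\|\, \tilde{Q}^{l \to l+1}_{-\theta}) \right), \tag{34}$$

which results in the following proposition:

Proposition 2: Let $S_E^G(I)$ and $S_E^G(\tilde{Q})$ be the signatures corresponding to the normalized coefficients of the steerable pyramids for two given homogeneous textures $I$ and $\tilde{Q}$, respectively. The rotation-invariant KLD between the two textures takes the following form:

$$\begin{aligned}
D = D(I \,\|\, \tilde{Q}) = \min_{\theta \in \Theta} \Bigg\{ & \frac{1}{2} \left[ \sum_{l=1}^{L} \mathrm{tr}\!\left( \Sigma_I^l\, F^T(\theta)\, (\Sigma_{\tilde{Q}}^l)^{-1} F(\theta) \right) + \sum_{l=1}^{L-1} \mathrm{tr}\!\left( \Sigma_I^{l \to l+1}\, F^T(\theta)\, (\Sigma_{\tilde{Q}}^{l \to l+1})^{-1} F(\theta) \right) \right] - J \left( L - \frac{1}{2} \right) \\
& - \frac{1}{2} \left[ \sum_{l=1}^{L} \ln\!\left( \left| \Sigma_I^l (\Sigma_{\tilde{Q}}^l)^{-1} \right| \right) + \sum_{l=1}^{L-1} \ln\!\left( \left| \Sigma_I^{l \to l+1} (\Sigma_{\tilde{Q}}^{l \to l+1})^{-1} \right| \right) \right] \Bigg\}.
\end{aligned} \tag{35}$$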
Proof: The proof of Proposition 2 follows by direct computation, after the substitution of (33) in (34). If we employ the signature $S^G$ instead of $S_E^G$, the derivation of the KLD between two distinct textures $I$ and $\tilde{Q}$ is straightforward by applying the above proposition, omitting the terms which contain the covariance matrices $\Sigma_I^{l \to l+1}$, $\Sigma_{\tilde{Q}}^{l \to l+1}$ and replacing $(L - \frac{1}{2})$ with $\frac{L}{2}$.
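The rotation-invariant KLD can be evaluated numerically as sketched below for the signature $S^G$. This is a minimal illustration, not the authors' implementation: it assumes $J = 2$ orientations with $f_1 = \cos$, $f_2 = \sin$, restricts itself to symmetric positive-definite intra-level matrices, and replaces the interval-based search described in the text with a plain grid search over $\Theta$. The function names and the toy covariance matrices are hypothetical.

```python
import numpy as np

def steering_matrix(theta):
    """F(theta) for J = 2 (basic angles 0 and pi/2, f1 = cos, f2 = sin)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rotation_invariant_kld(sig_I, sig_Q, thetas):
    """Grid-search minimum over theta of the summed per-level KLD, cf. (33)-(34).

    sig_I, sig_Q: lists of positive-definite covariance matrices, one per level.
    Returns the minimum divergence and the angle at which it is attained.
    """
    inv_Q = [np.linalg.inv(SQ) for SQ in sig_Q]
    # ln(|Sigma_I| |Sigma_Q^{-1}|) does not depend on theta.
    logdet = sum(np.linalg.slogdet(SI)[1] - np.linalg.slogdet(SQ)[1]
                 for SI, SQ in zip(sig_I, sig_Q))
    best_d, best_theta = np.inf, None
    for theta in thetas:
        F = steering_matrix(theta)
        trace = sum(np.trace(SI @ F.T @ iQ @ F) - SI.shape[0]
                    for SI, iQ in zip(sig_I, inv_Q))
        d = 0.5 * (trace - logdet)
        if d < best_d:
            best_d, best_theta = d, theta
    return best_d, best_theta

# Toy two-level signature and its version rotated by 30 degrees, cf. (32).
Sigma_I = [np.array([[2.0, 0.5], [0.5, 1.0]]), np.array([[1.5, -0.3], [-0.3, 0.7]])]
Fp = steering_matrix(np.deg2rad(30.0))
Sigma_Q = [Fp @ S @ Fp.T for S in Sigma_I]

thetas = np.deg2rad(np.arange(0, 180))
d_min, theta_star = rotation_invariant_kld(Sigma_I, Sigma_Q, thetas)
print(d_min, np.rad2deg(theta_star))
```

For this toy pair the divergence vanishes (up to rounding) at the true relative angle, which is the angular-alignment behaviour the proposition is designed to provide.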
Notice that when $I$ and $\tilde{Q}$ are two rotated versions of the same image, the angle $\theta^*$ for which the minimum is achieved in (35) should be close to the exact relative angle between $I$ and $\tilde{Q}$, that is, the angle one needs to rotate clockwise $I$ in order to get $\tilde{Q}$. Thus, a way to evaluate the performance of the above rotation-invariant KLD is to verify whether the estimated angle $\theta^*$ is actually close to the real relative angle between two physically rotated versions of the same image. Besides, finding this relative angle approximately may also be useful on its own for many practical applications. Fig. 5(b) illustrates this by showing the function $D(I \,\|\, Q)(\theta)$ given by (35) for the Bark texture sample obtained from the Brodatz database. It is also important to note that, in our implementation, the steering functions have only odd harmonics, which oscillate at some finite speed. Thus, the number of local minima of (35), as a function of $\theta$, can be at most equal to twice the number of independent harmonics (which happens to be equal to the number of basic harmonics). In addition, the distance between any two consecutive local minima is lower bounded, making it possible to search for them in a few non-overlapping angular intervals [44], which is useful in order to speed up the search for the optimal angle $\theta^*$.

V. EXPERIMENTAL RESULTS

In this section, we evaluate the efficiency of our overall CBIR system and compare it with the performance of the method presented in [25], which can be considered representative of the rotation-invariant texture retrieval schemes based on wavelet-domain HMMs. In order to evaluate the retrieval effectiveness of our CBIR system and perform the comparison with the HMM-based method, we apply the same experimental setup as in [25]. In particular, the image database consists of 13 Brodatz texture images of size 512 × 512, obtained from the USC SIPI database (cf. Fig. 7). Each of them was physically rotated at 30, 60 and 120 degrees, before being digitized. Then, the texture image dataset is formed by taking 4 non-overlapping 128 × 128 subimages from each of the original images at 0, 30, 60 and 120 degrees. Thus, the dataset used in the retrieval experiments contains 208 images that
come from 13 texture classes. We implemented a 3-level steerable pyramid decomposition with $J = 2$ basic orientations, $\phi_1 = 0$, $\phi_2 = \pi/2$, which means that the steering functions are [42]:

$$f_1(\theta) = \cos(\theta), \qquad f_2(\theta) = f_1\!\left( \frac{\pi}{2} - \theta \right) = \sin(\theta).$$
The histogram of the estimated characteristic exponent values for the 208 textures is shown in Fig. 6. We observe that only 28% of the textures exhibit Gaussian statistics. In the following illustration, the query is any one of the non-overlapping 128 × 128 subimages in the dataset. The relevant images for each query are defined as the other 15 subimages from the same original 512 × 512 image. The retrieval performance is evaluated in terms of the percentage of relevant images
among the top 15 retrieved images. We evaluate the performance of the retrieval scheme, which employs the signatures S G and SEG as the set of extracted features, containing intra- and inter-scale dependencies and the rotation-invariant KLD as the similarity measure. We compare the performance of this retrieval scheme with that obtained by minimizing the Frobenius norm of the differences between the corresponding covariance matrices ([29]) after the Gaussianization procedure, as well as with the performance obtained by minimizing the corresponding Frobenius norm between sample correlation matrices (Gaussian assumption) without applying the Gaussianization step. Finally, we also perform a comparison between the retrieval efficiency of our proposed method and that of the HMM-based retrieval scheme ([25]). Table V shows the performance, in average percentages of retrieving relevant images in the top 15 matches, of our CBIR system and of the two methods that employ the Frobenius norm during the SM step and make a Gaussian and non-Gaussian assumption for the marginal and joint statistics of the pyramid subband coefficients, respectively. Comparing the average retrieval rates corresponding to the first two methods of the table, we conclude that the fractional lower-order statistics provide better approximations of the joint statistics between coefficients at adjacent orientations and scales, than the second order moments. Of course, both methods employ the covariance matrices between pairs of subbands, but in the first scheme (Non-Gaussianized & Frobenius) the sample covariances are estimated using the raw subband coefficients without Gaussianization, while in the second scheme (S G (or SEG ) & Frobenius) the covariances are estimated after the implementation of the Gaussianization procedure, which exploits lower-order moments, since the estimation is based on covariations. 
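The retrieval-rate criterion used in these experiments (percentage of relevant images among the top 15 matches) can be computed as in the following sketch. It assumes a precomputed matrix of pairwise distances, where entry $(i, j)$ holds the distance from query $i$ to database image $j$; the function name and the toy data are illustrative assumptions.

```python
import numpy as np

def average_retrieval_rate(dist, labels, top_k=15):
    """Mean percentage of same-class images among the top_k matches of each query."""
    dist = np.asarray(dist, dtype=float).copy()
    labels = np.asarray(labels)
    np.fill_diagonal(dist, np.inf)                 # never retrieve the query itself
    nearest = np.argsort(dist, axis=1)[:, :top_k]  # indices of the top_k closest images
    hits = labels[nearest] == labels[:, None]
    return 100.0 * hits.mean()

# Toy example: two well-separated classes of three "images" on a line.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
toy_labels = np.array([0, 0, 0, 1, 1, 1])
toy_dist = np.abs(points[:, None] - points[None, :])

rate_top2 = average_retrieval_rate(toy_dist, toy_labels, top_k=2)
rate_top3 = average_retrieval_rate(toy_dist, toy_labels, top_k=3)
print(rate_top2, rate_top3)
```

Since the KLD is not symmetric, the distance matrix is not assumed symmetric either; each row is ranked independently as one query.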
The comparison between the retrieval rates of the second and third methods verifies the fact that a statistical similarity function (KLD) is preferable to a deterministic one (Frobenius norm), for the same set of extracted features.

Notice that, as we have already mentioned, the efficiency and the computational cost of the Gaussianization process strongly depend on the choice of the neighborhood shape. In particular, a neighborhood with a sufficient number of coefficients from different orientations and scales is required to achieve an increased Gaussianization performance, resulting in an increased computational complexity. However, the retrieval rates in Table V reveal that the rotation-invariant KLD still preserves a high retrieval performance even when the neighborhood size is small and thus the Gaussianization is not so accurate, in contrast with the performance of the rotation-invariant Frobenius distance, which decreases in the case of a small neighborhood such as the one corresponding to the neighborhood index 1. Thus, in the case of a CBIR system with limited computational power, we can use a small neighborhood at the cost of a reduced Gaussianization performance, while at the same time maintaining an increased retrieval efficiency by employing the rotation-invariant KLD instead of the Frobenius distance. Besides, as we note in the following section, the computational complexity of the two similarity measures is approximately the same, supporting the choice of the rotation-invariant KLD as the most appropriate function for the implementation of the SM step.

As shown in Table V, the use of the 10-th neighborhood type combined with the enhanced signature $S_E^G$,
results in an improved retrieval performance with respect to the other combinations. This is consistent
with the above analysis, since the 10-th neighborhood type results in the best Gaussianization performance for most of the 300 constructed subbands and since the enhanced signature contains more texture-specific information than the $S^G$ signature. We can also observe that by choosing the 10-th neighborhood type, the average retrieval rate using the rotation-invariant Frobenius norm (28) is very close to the rate corresponding to the rotation-invariant KLD. This is due to the improved Gaussianization performance, compared with that corresponding to the first neighborhood type, which results in an increased performance of the Frobenius norm, which is best suited for Gaussian distributions. However, the statistical similarity function (KLD) remains superior to the deterministic Frobenius norm.

Fig. 8 shows the average percentages of correct retrieval rates for each one of the 13 texture classes, for our CBIR scheme using the enhanced signature $S_E^G$ and for the HMM-based method. It is clear that the proposed method results in a superior retrieval performance for the majority of the texture classes, compared with the performance of the HMM-based method. Notice that there are three classes (4, 5 and 9) for which the HMM-based method gives a higher retrieval rate than our method. The reason for this behavior is that these three classes are exactly those with the greatest portion of characteristic exponent values near or equal to 2. This means that the Gaussian assumption, made by the HMM-based method, is more appropriate for describing the statistical dependencies between the subband coefficients of these classes.
A. Computational complexity

In the following, we compare the computational complexity of our CBIR method with that of the method based on HMMs [25], as well as with that of the method that makes a Gaussian assumption for the distribution of the subband coefficients and uses the Frobenius norm as the similarity function [29]. For this purpose, let $M \times M$ denote the dimension of the original image, which is decomposed via an $L$-level steerable pyramid with $J$ basic orientations (without subsampling when moving from level $l$ to the next coarser level $l + 1$), and let $P$ denote the size of the neighborhood used in the Gaussianization process. In addition, let $S$ denote the number of matrices contained in the selected signature, which varies depending on whether we exploit only intra-level dependencies or both intra- and inter-level dependencies. Our method consists of three main steps, namely i) the Gaussianization process, ii) the Feature Extraction (FE) and iii) the Similarity Measurement (SM), while the other two methods consist only of the FE and SM steps.

Regarding our CBIR scheme, the computational cost of the Gaussianization step is approximately equal to $S J [(M^2/4)(8P^2 + P + 1) + O(P^3) + 2P^2 + P]$, where the constant in the $O(\cdot)$ notation is on the order of 3 or 4, depending on the size of the signature. The cost of the FE step, which consists of the direct computation of covariance matrices, is roughly equal to $S J M^2$. Finally, the similarity measurement is performed by minimizing the rotation-invariant version of the KLD presented in Proposition 2. As mentioned in Section IV-G, in the case of $J = 2$ orientations, the number of local minima in (35) is at most equal to 4. In our implementation, we divided the $[0, \pi]$ interval into 4 non-overlapping sub-intervals of equal length. Since the function inside the min operator is very smooth with respect to $\theta$, we applied Newton's method to minimize it in each interval, using the middle of the interval as the initial point. The algorithm converged in at most 5 steps in each interval, while the cost of each step is on the order of $L \cdot O(J^2 + J)$, where the constant factor is less than 4 and its value depends on whether we employ the standard (SG) or the enhanced (SEG) signature. Thus, the computational complexity of the SM step is quite low, and the main cost of our method is due to the Gaussianization step.

Regarding the third method (Gaussian assumption & Frobenius norm), the computational complexity of the FE step is equal to that of our method, since it also consists of the computation of covariances. Besides, we applied Newton's method for the minimization of (28) during the SM step, resulting in approximately the same number of operations as for the KLD. However, as mentioned above, the KLD yields a higher retrieval performance even when the selected neighborhood does not result in a good Gaussianization of the subband coefficients.
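The SM-step minimization described above (four equal sub-intervals of [0, π], Newton's method started from each midpoint, at most 5 steps) can be sketched as follows. This is only an illustration under stated assumptions: the objective `f` and its derivatives `d1`, `d2` are a hypothetical smooth stand-in, not the rotation-invariant KLD of (35), and the clipping of each iterate to its sub-interval and the curvature check are safeguards added here rather than details taken from the paper.

```python
import math


def newton_minimize(d1, d2, theta0, lo, hi, max_steps=5, tol=1e-10):
    """1-D Newton iteration theta <- theta - f'(theta) / f''(theta) on [lo, hi]."""
    theta = theta0
    for _ in range(max_steps):
        curvature = d2(theta)
        if curvature <= 0:
            # A Newton step would not head towards a minimum; stop here.
            break
        step = d1(theta) / curvature
        # Keep the iterate inside its own sub-interval (added safeguard).
        theta = min(max(theta - step, lo), hi)
        if abs(step) < tol:
            break
    return theta


def minimize_over_angles(f, d1, d2, n_intervals=4):
    """Run Newton from the midpoint of each sub-interval of [0, pi]."""
    width = math.pi / n_intervals
    best_theta, best_value = None, math.inf
    for k in range(n_intervals):
        lo, hi = k * width, (k + 1) * width
        theta = newton_minimize(d1, d2, 0.5 * (lo + hi), lo, hi)
        value = f(theta)
        if value < best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value


# Hypothetical smooth objective with its minimum at theta_true on [0, pi].
theta_true = math.pi / 3
f = lambda t: -math.cos(2.0 * (t - theta_true))
d1 = lambda t: 2.0 * math.sin(2.0 * (t - theta_true))
d2 = lambda t: 4.0 * math.cos(2.0 * (t - theta_true))
best_theta, best_value = minimize_over_angles(f, d1, d2)
```

Restarting from several midpoints is what guards against Newton's method settling in a local rather than the global minimum; the best candidate over all sub-intervals is kept.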
Although the computational complexity of the HMM-based method is difficult to estimate due to the iterative algorithms it employs, we give a rough approximation of it in order to compare it with our method. The FE step of the HMM-based method consists of estimating (2L − 1) hidden states using the Expectation Maximization (EM) algorithm and (2 · L · J) eigenvalues corresponding to the covariance matrices of the Gaussian densities used in the model. Here J is the number of wavelet tree levels. The Expectation step of the EM algorithm is more difficult due to the increased interplay between the states. Besides, for an HMM, the complexity of each EM iteration is linear in the number of observations, that is, the subband coefficients, and this linearity may involve a large multiplicative constant depending on the number of hidden states and the number of iterations required to converge. Thus, the complexity of the EM algorithm is at least as high as that of the Gaussianization step in our method, while the complexity of estimating the covariance matrices and calculating the corresponding eigenvalues is almost equal to the cost of estimating only the covariance matrices in our method. Finally, since there is no closed-form expression for the KLD between HMMs, the method presented in [25] employed Monte-Carlo simulations for computing the integral in the KLD, which, for the same dataset of textures, consists of about 64 iterations. Instead, the smoothness of the rotation-invariant KLD of our CBIR system guarantees a much lower computational cost during the SM step, as we mentioned above.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we studied the design of a rotation-invariant CBIR system based on a multivariate sub-Gaussian (SαS) modeling of the coefficients of a steerable wavelet decomposition.
We exploited the variance adaptation of the coefficients in small regions at different orientation subbands and levels, by applying a Gaussianization procedure. Then, the FE step consists of simply estimating second-order moments between orientation subbands at the same and at adjacent levels. This process also takes into account the actual heavy-tailed behavior of the coefficients, represented by the fractional lower-order statistics (covariations) between pairs of subbands. We achieve rotation invariance by constructing an appropriate rotation-invariant version of the KLD between zero-mean multivariate Gaussian densities. The experimental results showed an increased average retrieval performance in comparison with previous methods based on second-order statistics estimated directly from the original subband coefficients, without the Gaussianization. We also conclude that a statistical similarity function, such as the KLD, is preferable to the deterministic Frobenius norm.

Future research directions, which could further improve the retrieval system and decrease the probability of retrieval error, are the following: first of all, the main assumption throughout the present
work was the stationary behavior of texture content. That is, we assumed that the distribution of the subband coefficients, which is closely related to the texture-specific information, is invariant within each subband. Instead, we could consider a non-stationary approach by permitting a locally adapted distribution, that is, by spatially adapting the characteristic exponent and the dispersion parameters. This can also be used in segmentation applications, since the different “objects” contained in a picture can be viewed as local intensity variations.

Regarding the task of similarity measurement between two distinct images, we weighted the contribution of inter-level dependencies in the same way as the intra-level ones, resulting in an overall similarity function that is written as a sum of partial distances. We could further improve the power of the similarity measure by considering some kind of chain rule for the KLD between two images [5].

In section IV, we evaluated the performance of a Gaussianization procedure applied on the subband coefficients of a steerable pyramid. For this purpose, we preserved the same neighborhood type across subbands, for every image in the database. Obviously, it is computationally infeasible to check the Gaussianization performance for all possible neighborhood formations. However, we expect an improved Gaussianization performance, and consequently a better retrieval performance, from some reduced-complexity adaptation of the optimal neighborhood type across subbands and images.

REFERENCES
[1] J. Ashley, M. Flickner, J. Hafner, D. Lee, W. Niblack, and D. Petkovic, “The query by image content (QBIC) system,” in Proceedings of the 1995 ACM SIGMOD Intl. Conf. on Management of data, (New York, U.S.), pp. 475–, ACM Press, 1995.
[2] S. Belongie, J. Malik, and J. Puzicha, “Matching shapes,” in International Conference on Computer Vision, Vancouver, Canada, 2001.
[3] B. Manjunath, J. R. Ohm, V. Vasudevan, and A. Yamada, “Color and texture descriptors,” IEEE Trans. Circuits and Syst. Video Tech., vol. 11, pp. 703–715, June 2001.
[4] J. Malik and P. Perona, “Preattentive texture discrimination with early vision mechanisms,” Journal of Optical Soc. Am., vol. 7, no. 5.
[5] S. Kullback, Information Theory and Statistics. Dover, 1997.
[6] J. R. Bergen and E. H. Adelson, “Theories of visual texture perception,” in Spatial Vision (D. R. Ed., ed.), CRC press, 1991.
[7] S. Mallat, A Wavelet Tour of Signal Processing. Academic Press, 1998.
[8] P. Wu, B. S. Manjunath, S. Newsam, and H. D. Shin, “A texture descriptor for browsing and similarity retrieval,” Signal Processing: Image Communication, vol. 16, pp. 33–43, 2000.
[9] M. Unser, “Texture classification and segmentation using wavelet frames,” IEEE Trans. on Image Processing, vol. 4, pp. 1549–1560, November 1995.
[10] M. K. Mihcak, I. Kozintsev, K. Ramchandran, and P. Moulin, “Low-complexity image denoising based on statistical modeling of wavelet coefficients,” IEEE Signal Processing Letters, vol. 6, no. 12, pp. 300–303, 1999.
[11] S. G. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 674–692, July 1989.
[12] E. P. Simoncelli, “Statistical models for images: Compression, restoration and synthesis,” in 31st Asilomar Conf on Signals, Systems and Computers, (Pacific Grove, CA), pp. 673–678, IEEE Computer Society, 1997.
[13] S. G. Chang, B. Yu, and M. Vetterli, “Lossy compression and wavelet thresholding for image denoising,” submitted to IEEE Trans. Image Processing.
[14] S. Liapis and G. Tziritas, “Color and texture image retrieval using chromaticity histograms and wavelet frames,” IEEE Trans. on Multimedia, vol. 6, pp. 676–686, Oct. 2004.
[15] M. N. Do and M. Vetterli, “Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance,” IEEE Trans. Image Processing, vol. 11, pp. 146–158, Feb. 2002.
[16] A. Achim, A. Bezerianos, and P. Tsakalides, “Novel Bayesian multiscale method for speckle removal in medical ultrasound images,” IEEE Trans. Med. Imag., vol. 20, pp. 772–783, Aug. 2001.
[17] A. Achim, P. Tsakalides, and A. Bezerianos, “SAR image denoising via Bayesian wavelet shrinkage based on heavy-tailed modeling,” IEEE Trans. Geosc. and Rem. Sens., vol. 41, pp. 1773–1784, Aug. 2003.
[18] G. Samorodnitsky and M. S. Taqqu, Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. New York: Chapman and Hall, 1994.
[19] C. L. Nikias and M. Shao, Signal Processing with Alpha-Stable Distributions and Applications. New York: John Wiley and Sons, 1995.
[20] G. Tzagkarakis and P. Tsakalides, “A statistical approach to texture image retrieval via alpha-stable modeling of wavelet decompositions,” 5th International Workshop on Image Analysis for Multimedia Interactive Services, Lisboa, Portugal, April 21-23 2004.
[21] J. Huang, “Study on the correlation properties of wavelet transform coefficients and the applications in a neural network-based hybrid image coding system,” CISST’03, Las Vegas, USA, June 22-26 2003.
[22] J. Portilla and E. P. Simoncelli, “A parametric texture model based on joint statistics of complex wavelet coefficients,” International Journal of Computer Vision, vol. 40, pp. 49–71, Dec. 2000.
[23] G. Cross and A. Jain, “Markov random field texture models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 5, pp. 25–39, 1983.
[24] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, “Wavelet-based statistical signal processing using hidden Markov models,” IEEE Trans. on Signal Processing (Special issue on Wavelets and Filterbanks), pp. 886–902, April 1998.
[25] M. N. Do and M. Vetterli, “Rotation invariant texture characterization and retrieval using steerable wavelet-domain hidden Markov models,” submitted to IEEE Trans. on Multimedia, April 2001.
[26] H. Greenspan, S. Belongie, R. Goodman, and P. Perona, “Rotation invariant texture recognition using a steerable pyramid,” in Proc. Int. Conf. on Pattern Recognition, pp. 162–167, 1994.
[27] G. M. Haley and B. S. Manjunath, “Rotation invariant texture classification using modified Gabor filters,” in Proc. IEEE Int. Conf. Image Processing, Washington, DC, Oct 1995.
[28] G. M. Haley and B. S. Manjunath, “Rotation-invariant texture classification using a complete space-frequency model,” IEEE Trans. Image Processing, vol. 8, pp. 255–269, Feb 1999.
[29] B. Beferull-Lozano, H. Xie, and A. Ortega, “Rotation-invariant features based on steerable transforms with an application to distributed image classification,” in IEEE International Conference on Image Processing, Barcelona, Spain, 2003.
[30] J. Mao and A. Jain, “Texture classification and segmentation using multiresolution simultaneous autoregressive models,” Pattern Recognition, vol. 25, pp. 173–188, Feb. 1992.
[31] F. Liu and R. W. Picard, “Periodicity, directionality, and randomness: Wold features for image modeling and retrieval,” IEEE Trans. Patt. Anal. Machine Intell., vol. 18, pp. 722–733, July 1996.
[32] K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic Press, 1990.
[33] T. Cover and J. Thomas, Elements of Information Theory. John Wiley, 1991.
[34] J. P. Nolan, “Parameterizations and modes of stable distributions,” Statistics & Probability Letters, no. 38, pp. 187–195, 1998.
[35] J. P. Nolan, “Numerical calculation of stable densities and distribution functions,” Commun. Statist.-Stochastic Models, vol. 13, pp. 759–774, 1997.
[36] S. Cambanis and G. Miller, “Linear problems in pth order and stable processes,” SIAM J. Appl. Math., vol. 41, pp. 43–69, 1981.
[37] M. Shao and C. L. Nikias, “Signal processing with fractional lower order moments: Stable processes and their applications,” Proc. IEEE, vol. 81, pp. 986–1010, 1993.
[38] E. P. Simoncelli and J. Portilla, “Texture characterization via joint statistics of wavelet coefficient magnitudes,” in Proc. of IEEE Int. Conf. on Image Processing, 1998.
[39] J. Chambers, W. Cleveland, B. Kleiner, and P. Tukey, Graphical Methods for Data Analysis. Wadsworth, 1983.
[40] C. K. Chui, An Introduction to Wavelets. Academic Press, 1992.
[41] E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger, “Shiftable multi-scale transforms,” IEEE Trans. Inform. Theory, vol. 38, no. 2, 1992.
[42] W. T. Freeman and E. H. Adelson, “The design and use of steerable filters,” IEEE Trans. Patt. Anal. Mach. Intell., vol. 13, no. 9, pp. 891–906, 1991.
[43] D. Andrews and C. Mallows, “Scale mixtures of normal distributions,” J. Royal Stat. Soc., vol. 36, pp. 99–, 1974.
[44] B. Beferull-Lozano, “Quantization design for structured overcomplete expansions,” Ph. D. thesis, University of Southern California, 2002.
LIST OF TABLES
I    Performance of the covariation estimator.
II   Optimal p parameter as a function of the characteristic exponent α.
III  SαS modeling of wavelet subband coefficients of texture images from the VisTex database, using Daubechies’ 4 filter and 3 decomposition levels. ML parameter estimates and 95% confidence intervals for the characteristic exponent α.
IV   Neighborhood shapes used in the measurement of the Gaussianization performance.
V    Average retrieval rate (%) in the top 15 matches.

LIST OF FIGURES
1  Curves representing the standard deviation of the covariation estimation as a function of the parameter p for the ĉ_FLOM estimator.
2  Curves representing the standard deviation of the covariation estimation as a function of the parameter p for α = 1.2, 1.5 and 25 dispersion pairs (γX, γY), using the ĉ_FLOM estimator.
3  Modeling of the horizontal subband at the first level of decomposition of the Flowers.6 image with the SαS and the GGD.
4  Histogram of the neighborhood indices in terms of resulting in the best Gaussianization performance for a set of 300 subbands.
5  (a) Bark physically rotated at 30 and 120 degrees, (b) D(I‖Q)(θ) for J = 4. Notice that the minimum is achieved for θ∗ = 90 degrees, which is the exact relative angle between the two texture samples.
6  Histogram of the estimated values for the characteristic exponent, α, for the set of 208 texture images of size 128 × 128.
7  Texture images from the VisTex database, from left to right and top to bottom: 1) Bark, 2) Brick, 3) Bubbles, 4) Grass, 5) Leather, 6) Pigskin, 7) Raffia, 8) Sand, 9) Straw, 10) Water, 11) Weave, 12) Wood, 13) Wool.
8  Average percentages (%) of correct retrieval rate for individual texture class.
TABLE I
PERFORMANCE OF THE COVARIATION ESTIMATOR.

α      ĉ_FLOM(p)                       True [X, Y]α
1.1    -1.9158 (0.1397)   (p=0.56)     -1.916
1.2    -1.8203 (0.1390)   (p=0.57)     -1.8254
1.3    -1.755  (0.1481)   (p=0.59)     -1.7476
1.4    -1.6814 (0.1234)   (p=0.66)     -1.6821
1.5    -1.6201 (0.1218)   (p=0.68)     -1.6285
1.7    -1.5533 (0.1062)   (p=0.76)     -1.5561
1.9    -1.5109 (0.0886)   (p=0.91)     -1.5288
2      -1.5301 (0.0668)   (p=2)        -1.532
TABLE II
OPTIMAL p PARAMETER AS A FUNCTION OF THE CHARACTERISTIC EXPONENT α.

α       Optimal p       α       Optimal p
1       0.52            1.5     0.69
1.05    0.54            1.55    0.71
1.1     0.56            1.6     0.72
1.15    0.57            1.65    0.74
1.2     0.58            1.7     0.76
1.25    0.59            1.75    0.79
1.3     0.61            1.8     0.81
1.35    0.62            1.85    0.84
1.4     0.64            1.9     0.88
1.45    0.66            1.95    0.93
                        2       0.8
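As a usage illustration, Table II can be treated as a lookup table: given an estimate of the characteristic exponent α, one reads off the tabulated optimal p. The sketch below additionally interpolates linearly between tabulated values, which is an assumption made here for convenience and not something the table itself prescribes; the name `optimal_p` is likewise hypothetical. The entries are copied from Table II as printed, including the last one.

```python
import numpy as np

# Entries of Table II, copied as printed.
ALPHA_GRID = np.array([
    1.00, 1.05, 1.10, 1.15, 1.20, 1.25, 1.30, 1.35, 1.40, 1.45,
    1.50, 1.55, 1.60, 1.65, 1.70, 1.75, 1.80, 1.85, 1.90, 1.95, 2.00,
])
OPTIMAL_P = np.array([
    0.52, 0.54, 0.56, 0.57, 0.58, 0.59, 0.61, 0.62, 0.64, 0.66,
    0.69, 0.71, 0.72, 0.74, 0.76, 0.79, 0.81, 0.84, 0.88, 0.93, 0.80,
])


def optimal_p(alpha):
    """Look up p for a given alpha, interpolating linearly between rows.

    Linear interpolation is an assumption of this sketch; the table only
    lists p on a grid of alpha values.
    """
    if not ALPHA_GRID[0] <= alpha <= ALPHA_GRID[-1]:
        raise ValueError("alpha must lie in the tabulated range [1, 2]")
    return float(np.interp(alpha, ALPHA_GRID, OPTIMAL_P))


p_tabulated = optimal_p(1.5)      # a grid point of the table
p_between = optimal_p(1.125)      # halfway between the rows 1.1 and 1.15
```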
TABLE III
SαS MODELING OF WAVELET SUBBAND COEFFICIENTS OF TEXTURE IMAGES FROM THE VISTEX DATABASE, USING DAUBECHIES’ 4 FILTER AND 3 DECOMPOSITION LEVELS. ML PARAMETER ESTIMATES AND 95% CONFIDENCE INTERVALS FOR THE CHARACTERISTIC EXPONENT α.

                 Image Subbands
Image            Horizontal         Vertical           Diagonal

Level 1
Bark.10          1.601 ± 0.061      1.684 ± 0.058      1.681 ± 0.057
Brick.1          1.614 ± 0.056      1.577 ± 0.056      1.896 ± 0.031
Buildings.4      1.681 ± 0.039      1.684 ± 0.047      1.601 ± 0.046
Fabric.0         2.000 ± 0.007      1.270 ± 0.053      1.322 ± 0.055
Fabric.10        1.229 ± 0.057      1.367 ± 0.058      1.175 ± 0.054
Flowers.6        1.76 ± 0.041       1.701 ± 0.044      1.986 ± 0.025
Food.9           1.626 ± 0.061      1.339 ± 0.055      1.386 ± 0.056
Grass.1          1.879 ± 0.047      1.853 ± 0.053      1.791 ± 0.055
Metal.4          1.320 ± 0.057      1.228 ± 0.054      1.402 ± 0.059
Stone.3          1.591 ± 0.054      1.688 ± 0.049      1.746 ± 0.050

Level 2
Bark.10          1.855 ± 0.097      1.858 ± 0.110      1.850 ± 0.107
Brick.1          1.311 ± 0.105      1.539 ± 0.103      1.850 ± 0.101
Buildings.4      0.921 ± 0.089      1.186 ± 0.119      1.151 ± 0.098
Fabric.0         2.000 ± 0.006      1.171 ± 0.099      1.349 ± 0.114
Fabric.10        1.677 ± 0.120      1.611 ± 0.115      1.557 ± 0.114
Flowers.6        1.334 ± 0.097      1.424 ± 0.100      1.900 ± 0.057
Food.9           1.990 ± 7.7e-8     1.465 ± 0.107      1.750 ± 0.101
Grass.1          1.921 ± 0.073      1.858 ± 0.099      1.990 ± 0.073
Metal.4          1.680 ± 0.121      1.505 ± 0.119      1.690 ± 0.119
Stone.3          1.869 ± 0.083      1.509 ± 0.103      1.658 ± 0.117

Level 3
Bark.10          1.723 ± 0.225      2.000 ± 0.162      1.792 ± 0.248
Brick.1          1.227 ± 0.242      1.368 ± 0.204      1.990 ± 0.219
Buildings.4      1.220 ± 0.201      2.000 ± 0.109      1.014 ± 0.181
Fabric.0         2.000 ± 0.097      1.498 ± 0.246      1.874 ± 0.167
Fabric.10        1.865 ± 0.205      2.000 ± 0.103      1.904 ± 0.184
Flowers.6        1.643 ± 0.222      1.573 ± 0.210      1.851 ± 0.211
Food.9           2.000 ± 0.005      1.677 ± 0.222      1.990 ± 0.245
Grass.1          2.000 ± 0.375      1.862 ± 0.177      2.000 ± 0.241
Metal.4          1.787 ± 0.217      1.888 ± 0.145      1.863 ± 0.182
Stone.3          2.000 ± 0.096      1.278 ± 0.223      1.500 ± 0.226
TABLE IV
NEIGHBORHOOD SHAPES USED IN THE MEASUREMENT OF THE GAUSSIANIZATION PERFORMANCE.

Index   Neighborhood structure for a given c^l(x_k, φ_i)                                              Size (P)
1       (l, φ_i): (3x3) ∪ {(l, φ_j) | j = 1, ..., J, j ≠ i}: 1 ∪ (l+1, φ_i): 1                        J + 9
2       (l, φ_i): (3x3) ∪ {(l, φ_j) | j = 1, ..., J, j ≠ i}: 1                                        J + 8
3       (l, φ_i): (3x3)                                                                               9
4       (l, φ_i): 4 (cross shape (c.s.))                                                              5
5       (l, φ_i): 4 (c.s.) ∪ {(l, φ_j) | j = 1, ..., J, j ≠ i}: 1 ∪ (l+1, φ_i): 1                     J + 5
6       (l, φ_i): 4 (c.s.) ∪ {(l, φ_j) | j = 1, ..., J, j ≠ i}: 1                                     J + 4
7       (l, φ_i): 4 (c.s.) ∪ {(l, φ_j) | j = 1, ..., J, j ≠ i}: 4 (c.s.) ∪ (l+1, φ_i): 4 (c.s.)       5(J + 1)
8       (l, φ_i): (5x5) ∪ {(l, φ_j) | j = 1, ..., J, j ≠ i}: 1 ∪ (l+1, φ_i): 1                        J + 25
9       (l, φ_i): (5x5) ∪ {(l, φ_j) | j = 1, ..., J, j ≠ i}: 1                                        J + 24
10      (l, φ_i): (5x5)                                                                               25

1) (l, φ_i): X, means that X is the neighborhood at the level l and orientation φ_i, centered at the coefficient c^l(x_k, φ_i).
2) {(l, φ_j) | j = 1, ..., J, j ≠ i}: Y, means that for each one of the rest (J − 1) orientations at level l, Y is the neighborhood centered at the coefficient c^l(x_k, φ_j), j = 1, ..., J, j ≠ i.
3) (l+1, φ_i): Z, means that Z is the neighborhood at the next lower level l+1 and orientation φ_i, centered at the corresponding spatial location.
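The sizes P listed in Table IV follow from counting the coefficients contributed by each part of a neighborhood: the patch in the subband itself, one patch in each of the other J − 1 orientations, and one patch at level l + 1. The sketch below reproduces that count; the encoding of each index as a triple and the names `NEIGHBORHOODS` and `neighborhood_size` are assumptions introduced here, and the cross shape is counted as 5 coefficients (4 neighbors plus the center), which is what the table's size column implies.

```python
# Number of coefficients in each spatial patch shape.
PATCH_SIZE = {"3x3": 9, "cross": 5, "5x5": 25}

# index -> (patch in own subband,
#           coefficients taken from each other orientation at level l,
#           coefficients taken from level l+1)
NEIGHBORHOODS = {
    1: ("3x3", 1, 1),
    2: ("3x3", 1, 0),
    3: ("3x3", 0, 0),
    4: ("cross", 0, 0),
    5: ("cross", 1, 1),
    6: ("cross", 1, 0),
    7: ("cross", 5, 5),
    8: ("5x5", 1, 1),
    9: ("5x5", 1, 0),
    10: ("5x5", 0, 0),
}


def neighborhood_size(index, J):
    """Total neighborhood size P for a Table IV index and J orientations."""
    shape, per_other_orientation, next_level = NEIGHBORHOODS[index]
    return PATCH_SIZE[shape] + (J - 1) * per_other_orientation + next_level


# Sizes for all ten neighborhood types with J = 4 orientations.
sizes_J4 = [neighborhood_size(i, J=4) for i in range(1, 11)]
```

Since the Gaussianization cost grows with P, a count like this gives a quick way to compare how expensive each neighborhood type is for a given number of orientations.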
TABLE V
AVERAGE RETRIEVAL RATE (%) IN THE TOP 15 MATCHES.

                 Methods
Neighborhood     Non-Gaussianized      SG                    SG
Index            & Frobenius (28)      & Frobenius (28)      & KLD (35)
1                85.34                 87.11                 92.72

                 Methods
Neighborhood     Non-Gaussianized      SEG                   SEG
Index            & Frobenius (28)      & Frobenius (28)      & KLD (35)
10               85.34                 93.86                 94.23
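For readers who want to reproduce a figure of merit like the one in Table V, the following is a minimal sketch of an average retrieval rate over the top N matches. It assumes a definition that the table caption does not spell out: each image is used once as a query, the query itself is excluded from the ranking, and the rate is the fraction of the N closest images that share the query's class, averaged over all queries. The function name, the toy features and the labels are hypothetical.

```python
import numpy as np


def average_retrieval_rate(distances, labels, top_n=15):
    """Average percentage of same-class images among the top_n matches.

    distances: (n, n) array, distances[q, i] = distance from query q to image i.
    labels:    length-n sequence of class labels.
    """
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    rates = []
    for q in range(len(labels)):
        d = distances[q].copy()
        d[q] = np.inf  # never retrieve the query itself
        top = np.argsort(d, kind="stable")[:top_n]
        rates.append(np.mean(labels[top] == labels[q]))
    return 100.0 * float(np.mean(rates))


# Toy example: six 1-D "signatures" in two well-separated classes.
features = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
toy_labels = np.array([0, 0, 0, 1, 1, 1])
toy_distances = np.abs(features[:, None] - features[None, :])
rate_top2 = average_retrieval_rate(toy_distances, toy_labels, top_n=2)
rate_top3 = average_retrieval_rate(toy_distances, toy_labels, top_n=3)
```

In the toy example each class has only two other members, so asking for three matches necessarily pulls in one image of the wrong class. If the 208 images are spread evenly over the 13 classes (16 per class, an inference rather than a stated fact), top_n = 15 is exactly the number of other same-class images available per query.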
Fig. 1. Curves representing the standard deviation of the covariation estimation as a function of the parameter p for the ĉ_FLOM estimator. [Plot: standard deviation [dB] versus p from 0 to 2, one curve for each α in {1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 1.9, 2}.]
Fig. 2. Curves representing the standard deviation of the covariation estimation as a function of the parameter p for α = 1.2, 1.5 and 25 dispersion pairs (γX, γY), using the ĉ_FLOM estimator. [Plots: standard deviation [dB] versus p from 0 to 2, in two panels (a) and (b) titled α = 1.2 and α = 1.5.]
Fig. 3. Modeling of the horizontal subband at the first level of decomposition of the Flowers.6 image with the SαS and the GGD depicted in solid and dashed lines, respectively. The estimated parameters for the SαS distribution have the values α = 1.76, γ = 0.08 while the GGD has parameters α = 0.11 and β = 1.02. The dotted line denotes the empirical APD. [Plot: “Amplitude Probability Curves”, P(|X| > x) on a logarithmic axis versus data amplitude x.]
Fig. 4. Histogram of the neighborhood indices in terms of resulting in the best Gaussianization performance for a set of 300 subbands. [Plot: relative frequency versus neighborhood index 1 to 10.]
Fig. 5. (a) Bark physically rotated at 30 and 120 degrees, (b) D(I‖Q)(θ) for J = 4. Notice that the minimum is achieved for θ∗ = 90 degrees, which is the exact relative angle between the two texture samples. [Panel (a): texture samples Bark.030 and Bark.120. Panel (b): rotation-invariant KLD versus angle in degrees, from 0 to 180.]
Fig. 6. Histogram of the estimated values for the characteristic exponent, α, for the set of 208 texture images of size 128 × 128. [Plot: relative frequency versus characteristic exponent from 1 to 2.]
Fig. 7. Texture images from the VisTex database, from left to right and top to bottom: 1) Bark, 2) Brick, 3) Bubbles, 4) Grass, 5) Leather, 6) Pigskin, 7) Raffia, 8) Sand, 9) Straw, 10) Water, 11) Weave, 12) Wood, 13) Wool.
Fig. 8. Average percentages (%) of correct retrieval rate for individual texture class. [Bar chart: average retrieval rate (%) versus texture class 1 to 13, for SEG & KLD and for the HMM-based method.]