Author manuscript, published in "DICTAP 2011, The International Conference on Digital Information and Communication Technology and Its Applications, Dijon : France (2011)"
Image quality assessment based on Intrinsic Mode Function coefficients modeling
Abdelkaher Ait Abdelouahad1, Mohammed El Hassouni2, Hocine Cherifi3, and Driss Aboutajdine1
1 LRIT URAC, University of Mohammed V-Agdal, Morocco
2 DESTEC, FLSHR, University of Mohammed V-Agdal, Morocco
3 Le2i, UMR CNRS 5158, University of Burgundy, Dijon, France
hal-00611416, version 1 - 1 Aug 2011
Abstract. Reduced reference image quality assessment (RRIQA) methods aim to assess the quality of a perceived image using only a reduced cue from its original version, called the "reference image". The main advantage of RR methods is that they are general-purpose. However, most existing RR methods are built upon non-adaptive transform models, which can limit their scope to a small number of distortion types. In this work, we propose a bi-dimensional empirical mode decomposition-based RRIQA method. First, we decompose both the reference and the distorted image into Intrinsic Mode Functions (IMF); then we use the Generalized Gaussian Density (GGD) to model the IMF coefficients. Finally, the distortion measure is computed from the "fitting errors" between the empirical and the theoretical IMF histograms, using the Kullback-Leibler Divergence (KLD). To evaluate the performance of the proposed method, two approaches have been investigated: logistic function-based regression and the well-known Support Vector Machine-based classification. Experimental results show a high correlation between objective and subjective scores. Key words: RRIQA, IMF, GGD, KLD
1
Introduction
Recent years have witnessed a surge of interest in objective image quality measures, due to the enormous growth of digital image processing techniques: lossy compression, watermarking, quantization. These techniques generally transform the original image into an image of lower visual quality. To assess the performance of these techniques, one has to measure the impact of the degradation they induce in terms of perceived visual quality. To do so, subjective measures based essentially on human observer opinions have been introduced. These visual psychophysical judgments (detection, discrimination and preference), made under controlled viewing conditions (fixed lighting, viewing distance, etc.), generate highly reliable and repeatable data, and are used to optimize the design of image processing techniques. The test plan for subjective video quality assessment, including the test procedure and the subjective data analysis, is well specified by the Video Quality Experts Group (VQEG). A popular method for assessing image quality involves asking people to quantify their subjective impressions by selecting one of five classes: Excellent, Good, Fair, Poor, Bad, from the quality
scale (UIT-R [1]); these opinions are then converted into scores, and the average of the scores is computed to obtain the Mean Opinion Score (MOS). Obviously, subjective tests are expensive and inapplicable in a tremendous number of situations, so objective measures, which assess the visual quality of a perceived image automatically using mathematical and computational methods, are needed. Until now there is no single image quality metric that can predict our subjective judgments of image quality, because such judgments are influenced by a multitude of different types of visible signals, each weighted differently depending on the context in which the judgment is made. In other words, a human observer can easily detect anomalies in a distorted image and judge its visual quality with no need to refer to the real scene, whereas a computer cannot. Research on objective visual quality can be classified into three categories depending on the information available. When the reference image is available, the metric belongs to the Full Reference (FR) methods. The simple Peak Signal-to-Noise Ratio (PSNR) and the Mean Structural Similarity Index (MSSIM) are both widely used FR metrics [2]. However, it is not always possible to access the reference image. When reference images are unavailable, No Reference (NR) metrics are involved. NR methods, which aim to quantify the quality of a distorted image without any cue from its original version, are generally conceived for a specific distortion type and cannot be generalized to other distortions [3]. Reduced Reference (RR) methods are typically used when one can send side information relating to the reference along with the processed image. Here, we focus on RR methods, which provide a better tradeoff between quality-prediction accuracy and the information required, as only a small set of features is extracted from the reference image.
Recently, a number of authors have successfully introduced RR methods based on image distortion modeling [4][5], human visual system (HVS) modeling [6][7], or natural image statistics modeling [8]. In [8], Z. Wang et al. introduced an RRIQA measure based on steerable pyramids (a redundant transform of the wavelet family). Although this method has had some success when tested on five types of distortion, it suffers from some weaknesses. First of all, the steerable pyramid is a non-adaptive transform and depends on a basis function. The latter cannot fit all signals; when this happens, a wrong time-frequency representation of the signal is obtained. Consequently, it is not certain that steerable pyramids will achieve the same success for other types of distortion. Furthermore, the wavelet transform provides a linear representation, which cannot reflect the nonlinear masking phenomenon in human visual perception [9]. A novel decomposition method named Empirical Mode Decomposition (EMD) was introduced by Huang et al. [10]. It decomposes non-stationary and non-linear signals into a finite number of components, the Intrinsic Mode Functions (IMF), plus a residue. It was first used in signal analysis, then attracted wider attention. A few years later, Nunes et al. [11] proposed an extension of this decomposition to the 2D case: the Bi-dimensional Empirical Mode Decomposition (BEMD). A number of authors have benefited from the BEMD in several image processing algorithms: image watermarking [12], texture image retrieval [13], and feature extraction [14]. In contrast to wavelets, EMD is a nonlinear and adaptive method; it depends only on the data, since no basis function is needed. Motivated by these advantages of the BEMD, and to remedy the wavelet drawbacks discussed above, we propose here the use of BEMD as a representation domain. Since distortions affect the IMF coefficients and hence their distribution, investigating the marginal distribution of IMF coefficients seems a reasonable choice. In the literature, most RR methods use a logistic function-based regression to predict mean opinion scores from the values given by an objective measure. These predicted scores are then compared, in terms of correlation, with the existing subjective scores: the higher the correlation, the more accurate the objective measure. In addition to the objective measure introduced in this paper, an alternative to the logistic function-based regression is investigated: an SVM-based classification, conducted on each distortion set independently according to the level of visual degradation. The better the classification accuracy, the higher the correlation of the objective measure with human judgment. This paper is organized as follows. Section 2 presents the proposed IQA scheme. The BEMD and its algorithm are presented in Section 3. In Section 4, we describe the distortion measure. Section 5 explains how we conduct the experiments and presents a comparison with existing methods. Finally, we give some concluding remarks.
2
Proposed IQA scheme
In this paper, we propose a new IQA scheme based on the BEMD decomposition. This scheme outputs a distance between a reference image and its distorted version. This distance represents the error between the two images and should be consistent with human judgment.
Fig. 1. The deployment scheme of the proposed RRIQA approach.
The scheme consists of two stages, as shown in Fig. 1. First, a BEMD decomposition is employed to decompose the reference image at the sender side and the distorted
image at the receiver side. Second, features are extracted from the resulting IMFs based on natural image statistics modeling. The idea is that distortions make a degraded image appear unnatural and affect image statistics; measuring this unnaturalness can lead us to quantify the visual quality degradation. One way to do so is to consider the evolution of the marginal distribution of IMF coefficients. This implies the availability of the IMF coefficient histogram of the reference image at the receiver side. Using the histogram as a reduced reference raises the question of the amount of side information to be transmitted: if the bin size is coarse, we obtain poor approximation accuracy but a small data rate, while if the bin size is fine, we get good accuracy but a heavier RR data rate. To avoid this problem, it is more convenient to assume a theoretical distribution for the IMF marginal distribution and to estimate its parameters. In this case the only side information to be transmitted consists of the estimated parameters and possibly an error term between the empirical distribution and the estimated one. The GGD model provides a good approximation of the IMF coefficients histogram using only two parameters (as explained in Section 4). Moreover, we consider the fitting error between the empirical and the estimated IMF distributions. Finally, at the receiver side we use the extracted features to compute the global distance over all IMFs.
3
The Bi-dimensional Empirical Mode Decomposition
The Empirical Mode Decomposition (EMD) has been introduced [10] as a data-driven algorithm, since it is based purely on the properties observed in the data, without predetermined basis functions. The main goal of EMD is to extract the oscillatory modes that represent the highest local frequency in a signal, while the remainder is considered as a residual. These modes are called Intrinsic Mode Functions (IMF). An IMF is a function that satisfies two conditions: (1) the number of extrema and the number of zero crossings must be equal, or differ at most by one; (2) at any point, the mean value of the upper envelope and the lower envelope must be zero. The so-called "sifting process" works iteratively on the signal to extract each IMF. Let x(t) be the input signal; the EMD algorithm is summarized as follows. The sifting process consists in iterating steps 1 to 4 on the detail signal d(t)
Empirical Mode Decomposition Algorithm
1. Identify all extrema of x(t).
2. Interpolate between minima (resp. maxima), obtaining a lower envelope emin(t) (resp. an upper envelope emax(t)).
3. Compute the mean m(t) = (emin(t) + emax(t))/2.
4. Extract the detail d(t) = x(t) − m(t).
5. Iterate on the residual m(t).
until the detail can be considered zero-mean. The resulting signal is designated as an IMF, and the residual is then taken as the input signal for the next IMF. The
algorithm terminates when a stopping criterion is met or a desired number of IMFs is reached. After the IMFs are extracted through the sifting process, the original signal x(t) can be represented as:

x(t) = \sum_{j=1}^{n} \mathrm{IMF}_j(t) + m(t)    (1)
where IMF_j is the j-th extracted IMF and n is the total number of IMFs. In two dimensions (Bi-dimensional Empirical Mode Decomposition, BEMD), the algorithm remains the same as in one dimension, with a few changes: the curve fitting used for extrema interpolation is replaced by surface fitting, which increases the computational complexity of identifying the extrema and, especially, of interpolating them. Several two-dimensional EMD versions have been developed [15][16], each with its own interpolation method. Bhuiyan et al. [17] proposed an interpolation based on order-statistics filters. From a computational cost standpoint, this is a fast implementation, as only one iteration is required per IMF. Fig. 2 illustrates an application of the BEMD to the "Buildings" image:
Fig. 2. The ”Buildings” image decomposition using the BEMD.
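To make the sifting process above concrete, the following minimal 1D sketch can be written in Python. It is an illustration only, not the paper's implementation: it uses cubic-spline envelopes, a fixed number of sifting passes in place of a zero-mean stopping test, and omits the order-statistics-filter speedup of [17].

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    """One sifting pass: subtract the mean of the two envelopes (steps 1-4)."""
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 4 or len(minima) < 4:
        return None  # too few extrema to build envelopes: x is a residual
    upper = CubicSpline(maxima, x[maxima])(t)  # upper envelope e_max(t)
    lower = CubicSpline(minima, x[minima])(t)  # lower envelope e_min(t)
    return x - (upper + lower) / 2.0           # detail d(t) = x(t) - m(t)

def emd(x, n_imfs=3, n_sift=10):
    """Decompose x into at most n_imfs IMFs plus a residual."""
    imfs, residual = [], x.astype(float)
    for _ in range(n_imfs):
        d = residual.copy()
        for _ in range(n_sift):                # fixed-count sifting
            d_new = sift_once(d)
            if d_new is None:
                return imfs, residual          # cannot sift further
            d = d_new
        imfs.append(d)
        residual = residual - d                # step 5: iterate on the residual
    return imfs, residual
```

By construction the decomposition is exact: summing the extracted IMFs and the residual recovers x(t), which is precisely equation (1).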
4
Distortion measure
The IMFs resulting from a BEMD capture the highest frequencies at each decomposition level, and these frequencies decrease as the order of the IMF increases; for example, the first IMF contains higher frequencies than the second. Furthermore, within a particular IMF the coefficient histogram exhibits a non-Gaussian behavior, with a sharp peak at zero and tails heavier than those of the Gaussian distribution, as can be seen in Fig. 3(a). Such a distribution can be well fitted by the two-parameter Generalized Gaussian Density (GGD) model given by:
p(x) = \frac{\beta}{2\alpha\Gamma(1/\beta)} \exp\left(-\left(\frac{|x|}{\alpha}\right)^{\beta}\right)    (2)
where \Gamma(z) = \int_{0}^{\infty} e^{-t} t^{z-1}\,dt, z > 0, is the Gamma function, \alpha is the scale parameter that describes the standard deviation of the density, and \beta is the shape parameter.
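The two GGD parameters can be recovered from sample moments; a minimal sketch of the moment matching estimator [18] used later in the paper, assuming the standard GGD moment relations E|x| = αΓ(2/β)/Γ(1/β) and E[x²] = α²Γ(3/β)/Γ(1/β) (the search bracket [0.05, 10] for β is an illustrative choice):

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

def ggd_ratio(beta):
    """r(beta) = E|x| / sqrt(E[x^2]) for a zero-mean GGD; monotonic in beta."""
    return gamma(2.0 / beta) / np.sqrt(gamma(1.0 / beta) * gamma(3.0 / beta))

def fit_ggd(coeffs):
    """Moment-matching estimate of the GGD scale alpha and shape beta."""
    m1 = np.mean(np.abs(coeffs))   # first absolute moment
    m2 = np.mean(coeffs ** 2)      # second moment
    r = m1 / np.sqrt(m2)
    beta = brentq(lambda b: ggd_ratio(b) - r, 0.05, 10.0)  # invert r(beta)
    alpha = m1 * gamma(1.0 / beta) / gamma(2.0 / beta)
    return alpha, beta
```

For Gaussian data the estimator should return β ≈ 2 and α ≈ √2·σ, since the GGD reduces to a Gaussian at β = 2.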
In designing an RR method, we consider a transmission context, where an image with perfect quality at the sender side has to be transmitted to a receiver side. The RR method consists in extracting relevant features from the reference image and using them as a reduced description. The selection of features, however, is a critical step. On one hand, the extracted features should be sensitive to a wide range of distortion types, to guarantee genericity, as well as to different distortion levels. On the other hand, the extracted features should be as small as possible. Here, we propose a marginal distribution-based RR method, since the marginal distribution of IMF coefficients changes from one distortion type to another, as illustrated in Fig. 3(b), (c) and (d). Let IMFO be an IMF of the original image and IMFD its counterpart in the distorted image. To quantify the quality degradation, we use the Kullback-Leibler Divergence (KLD), which is recognized as a convenient way to compute the divergence between two Probability Density Functions (PDFs). Assuming that p(x) and q(x) are the PDFs of IMFO and IMFD respectively, the KLD between them is defined as:

d(p \| q) = \int p(x) \log \frac{p(x)}{q(x)}\,dx    (3)
For this, the histograms of the original image must be available at the receiver side. Even if we could send the histogram to the receiver side, it would increase the size of the feature set significantly and cause some inconvenience. The GGD model provides an efficient way to recover the coefficient histogram, so that only two parameters need to be transmitted to the receiver side. In the following, we denote by p_m(x) the approximation of p(x) using the two-parameter GGD model. Furthermore, our feature set contains a third characteristic, the prediction error, defined as the KLD between p(x) and p_m(x):

d(p_m \| p) = \int p_m(x) \log \frac{p_m(x)}{p(x)}\,dx    (4)
In practice, this quantity can be computed as follows:

d(p_m \| p) = \sum_{i=1}^{L} P_m(i) \log \frac{P_m(i)}{P(i)}    (5)
where P(i) and P_m(i) are the normalized heights of the i-th histogram bins, and L is the number of bins in the histograms. Unlike the sender side, at the receiver side we first
Fig. 3. Histograms of IMF coefficients under various distortion types. (a) original ”Buildings” image, (b) white noise contaminated image, (c) blurred image, (d) transmission errors distorted image. (Solid curves) : histogram of IMF coefficients. (Dashed curves) : GGD model fitted to the histogram of IMF coefficients in the original image. The horizontal axis represents the IMF coefficients, while the vertical axis represents the frequency of these coefficients.
compute the KLD between q(x) and p_m(x) (equation (6)). We do not fit q(x) with a GGD model because we cannot be sure that the distorted image is still a natural one, and consequently that the GGD model is still adequate. Indeed, the distortion introduced by the processing can greatly modify the marginal distribution of the IMF coefficients; it is therefore more accurate to use the empirical distribution of the processed image.

d(p_m \| q) = \int p_m(x) \log \frac{p_m(x)}{q(x)}\,dx    (6)
Then the KLD between p(x) and q(x) is estimated as:

\hat{d}(p \| q) = d(p_m \| q) - d(p_m \| p)    (7)
Finally, the overall distortion between the original and the distorted image is computed as follows:
D = \log_2\left(1 + \frac{1}{D_0} \sum_{k=1}^{K} \left|\hat{d}^k(p^k \| q^k)\right|\right)    (8)
where K is the number of IMFs, p^k and q^k are the probability density functions of the k-th IMF in the reference and distorted images respectively, \hat{d}^k is the estimated KLD between p^k and q^k, and D_0 is a constant used to control the scale of the distortion measure. The proposed method is a genuinely RR one thanks to the reduced number of features used: the image is decomposed into four IMFs, and from each IMF we extract only three parameters {\alpha, \beta, d(p_m \| p)}, i.e. 12 parameters in total. Increasing the number of IMFs would increase the computational complexity of the algorithm and the size of the feature set. To estimate the parameters (\alpha, \beta) we use the moment matching method [18], and to extract the IMFs we use a fast and adaptive BEMD [17] based on order-statistics filters, which replaces the time-consuming sifting process. To evaluate the performance of the proposed measure, we use logistic function-based regression, which maps the distances to objective scores. An alternative based on an SVM classifier is also proposed. More details about the performance evaluation are given in the next section.
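The receiver-side computation of equations (5)-(8) can be sketched as follows; the histogram inputs are assumed already binned per IMF, and the D0 value is an illustrative placeholder, since the paper does not report the constant it used:

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """Discrete KLD between two histograms, as in equations (5)/(6)."""
    p = np.asarray(p, dtype=float) + eps   # eps guards against empty bins
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def rr_distance(pm_hists, q_hists, fit_errors, D0=0.1):
    """
    pm_hists  : per-IMF histograms of the GGD model p_m (rebuilt from alpha, beta)
    q_hists   : per-IMF histograms of the distorted image
    fit_errors: per-IMF fitting errors d(p_m || p) sent by the sender, eq. (4)
    Returns the overall distortion D of equation (8).
    """
    d_hat = [kld(pm, q) - e                       # equation (7)
             for pm, q, e in zip(pm_hists, q_hists, fit_errors)]
    return float(np.log2(1.0 + np.sum(np.abs(d_hat)) / D0))  # equation (8)
```

When the distorted histogram coincides with the model and the fitting error is zero, D = 0, as expected for an undistorted image.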
5
Experimental results
Our experiments were carried out on the LIVE database [19]. It is constructed from 29 high-resolution images and contains seven sets of distorted and scored images, obtained using five types of distortion at different levels. Sets 1 and 2 are JPEG2000-compressed images, sets 3 and 4 are JPEG-compressed images, and sets 5, 6 and 7 are, respectively, Gaussian blur, white noise and transmission-error distorted images. The 29 reference images, shown in Fig. 4, have very different textural characteristics, various percentages of homogeneous regions, edges and details. To score the images one can use either the MOS or the Differential Mean Opinion Score (DMOS), which is the difference between the "reference" and "processed" Mean
Fig. 4. The 29 reference images of the LIVE database.
Opinion Scores. For the LIVE database, the MOS of the reference images is equal to zero, so the DMOS and the MOS coincide. To illustrate the visual impact of the different distortions, Fig. 5 presents a reference image and distorted versions that have the same subjective visual quality according to the DMOS. As can be seen, the distance between the distorted images and their reference image is of the same order of magnitude for all distortions. In Fig. 6, we show an application of the measure in equation (8) to five white-noise-contaminated images: the distance increases as the distortion level increases, which demonstrates a good consistency with human judgment. Each test consists in choosing a reference image and one of its distorted versions. Both images are fed into the scheme of Fig. 1. After the feature extraction step in the BEMD domain, a global distance between the reference and distorted images is computed as in equation (8). This distance is an objective measure of image quality: it produces a number that needs to be correlated with the subjective MOS. This can be done using two different protocols. Logistic function-based regression. The subjective scores are compared, in terms of correlation, with objective scores computed from the values generated by the objective measure (the global distance in our case) using a nonlinear function, following the Video Quality Experts Group (VQEG) Phase I FR-TV procedure [20]. Here, we use a four-parameter logistic function given by:
Fig. 5. An application of the proposed measure to different distorted images. ((a): white noise, D = 9.36, DMOS =56.68), ((b): Gaussian blur, D= 9.19, DMOS =56.17), ((c): Transmission errors, D= 8.07, DMOS =56.51).
Fig. 6. The proposed measure applied to images contaminated with increasing levels of Gaussian white noise: D = 4.4214 (σ = 0.03), D = 6.4752 (σ = 0.05), D = 9.1075 (σ = 0.28), D = 9.3629 (σ = 0.40), D = 9.7898 (σ = 1.99).
\mathrm{logistic}(\gamma, D) = \frac{\gamma_1 - \gamma_2}{1 + e^{-(D - \gamma_3)/\gamma_4}} + \gamma_2, where \gamma = (\gamma_1, \gamma_2, \gamma_3, \gamma_4). Then DMOS_p = \mathrm{logistic}(\gamma, D).
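A sketch of this nonlinear mapping using scipy's curve_fit; the distances D, the γ values and the starting point p0 below are synthetic, for illustration only:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic4(D, g1, g2, g3, g4):
    """Four-parameter logistic mapping from objective distance to DMOSp."""
    return (g1 - g2) / (1.0 + np.exp(-(D - g3) / g4)) + g2

# synthetic distances and ground-truth parameters (illustration only)
D = np.linspace(2.0, 12.0, 40)
gamma_true = (80.0, 10.0, 7.0, 1.5)
dmos = logistic4(D, *gamma_true)

# fit gamma by nonlinear least squares, then predict DMOSp
gamma_hat, _ = curve_fit(logistic4, D, dmos, p0=(70.0, 20.0, 6.0, 1.0))
dmos_p = logistic4(D, *gamma_hat)
```

In practice the DMOS values are noisy, so the fitted curve only approximates the scatter, as Fig. 7 shows.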
Fig. 7 shows the scatter plot of DMOS versus the model prediction for the JPEG2000, transmission-error, white-noise and Gaussian-blur distorted images. Note how good the fit is, especially for the transmission errors and the white noise distortions.
Fig. 7. Scatter plots of (DMOS) versus the model prediction for the JPEG2000, Transmission errors, White noise and Gaussian blurred distorted images.
Once the nonlinear mapping is achieved, we obtain the predicted objective quality scores (DMOSp). To compare the subjective and objective quality scores, several metrics were introduced by the VQEG. In our study, we compute the correlation coefficient to evaluate prediction accuracy and the rank-order correlation coefficient to evaluate prediction monotonicity. These metrics are defined as follows:
CC = \frac{\sum_{i=1}^{N} (DMOS(i) - \overline{DMOS})(DMOSp(i) - \overline{DMOSp})}{\sqrt{\sum_{i=1}^{N} (DMOS(i) - \overline{DMOS})^2} \sqrt{\sum_{i=1}^{N} (DMOSp(i) - \overline{DMOSp})^2}}    (9)
ROCC = 1 - \frac{6 \sum_{i=1}^{N} (DMOS(i) - DMOSp(i))^2}{N(N^2 - 1)}    (10)
where the index i denotes the image sample and N denotes the number of samples. Table 1. Performance evaluation for the quality measure using LIVE database.
Dataset                                     Noise    Blur     Error
Correlation Coefficient (CC)
  BEMD                                      0.9332   0.8405   0.9176
  Pyramids                                  0.8902   0.8874   0.9221
  PSNR                                      0.9866   0.7742   0.8811
  MSSIM                                     0.9706   0.9361   0.9439
Rank-Order Correlation Coefficient (ROCC)
  BEMD                                      0.9068   0.8349   0.9065
  Pyramids                                  0.8699   0.9147   0.9210
  PSNR                                      0.9855   0.7729   0.8785
  MSSIM                                     0.9718   0.9421   0.9497
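Both evaluation criteria are available in scipy; note that spearmanr applies the formula of equation (10) to the ranks of the scores, which is the standard practice:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate(dmos, dmos_p):
    """Prediction accuracy (CC, eq. 9) and monotonicity (ROCC, eq. 10)."""
    cc = pearsonr(dmos, dmos_p)[0]     # linear correlation coefficient
    rocc = spearmanr(dmos, dmos_p)[0]  # Spearman rank-order correlation
    return cc, rocc
```

A monotone but nonlinear prediction yields ROCC = 1 while CC < 1, which is why both criteria are reported.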
Table 1 shows the final results for three distortion types: white noise, Gaussian blur and transmission errors. We report the results obtained for two RR metrics (BEMD, Pyramids) and two FR metrics (PSNR, MSSIM). Since the FR metrics use more information, we might expect them to outperform the RR metrics. This is true for MSSIM but not for PSNR, which performs poorly compared to the RR metrics for all types of degradation except the noise perturbation. As can be seen, for white noise our method ensures better prediction accuracy (higher correlation coefficients) and better prediction monotonicity (higher Spearman rank-order correlation coefficients) than the steerable pyramids-based method. Compared to PSNR, an FR method, we observe significant improvements for the blur and transmission-error distortions. We also carried out experiments computing the KLD between probability density functions (PDFs) by estimating the GGD parameters at both the sender and the receiver side, but the results were not satisfactory compared to the proposed measure. This can be explained by the strength of the distortion, which makes the reference image lose its naturalness, so that estimating the GGD parameters at the receiver side is not suitable. To go further, we examined how each IMF behaves with respect to a distortion type. For this purpose, we conducted the same experiments as above, but on each IMF separately. Table 2 shows the results. As observed, the sensitivity of an IMF to quality degradation changes depending on the distortion type and the order of the IMF. For instance, the performance decreases for the "Transmission errors" distortion as the order of the IMF increases. Also, some
Table 2. Performance evaluation using IMFs separately.
        White Noise             Gaussian Blur           Transmission errors
IMF1    CC = 0.91 ROCC = 0.90   CC = 0.74 ROCC = 0.75   CC = 0.87 ROCC = 0.87
IMF2    CC = 0.75 ROCC = 0.73   CC = 0.82 ROCC = 0.81   CC = 0.86 ROCC = 0.85
IMF3    CC = 0.85 ROCC = 0.87   CC = 0.77 ROCC = 0.73   CC = 0.75 ROCC = 0.75
IMF4    CC = 0.86 ROCC = 0.89   CC = 0.41 ROCC = 0.66   CC = 0.75 ROCC = 0.74
IMFs are more sensitive to one distortion set than to the others. Weighting the IMFs according to their sensitivity seems a good way to improve the accuracy of the proposed method: the weights are chosen so as to give more importance to the IMFs that yield better correlation values. The weights were tuned experimentally, since no principled combination applies in our case. Taking the "Transmission errors" set as an example, if w1, w2, w3, w4 are the weights for IMF1, IMF2, IMF3, IMF4 respectively, then we should have w1 > w2 > w3 > w4. We varied the values of wi, i = 1, ..., 4 until better results were reached. Some improvement was obtained, but only for the Gaussian blur set, with CC = 0.88 and ROCC = 0.87. This improvement of around 5% is promising, as the weighting procedure is very rough; one can expect further improvement with a more refined combination of the IMFs. Detailed experiments on the weighting factors remain for future work. SVM-based classification. Traditionally, RRIQA methods use logistic function-based regression to obtain objective scores. In the classification approach, one extracts features from images and trains a learning algorithm to classify the images based on the extracted features. The effectiveness of this approach is linked to the choice of discriminative features and the choice of the multiclass classification strategy [21]. M. Saad et al. [22] proposed an NRIQA method that trains a statistical model using an SVM classifier; objective scores are obtained in the test step. Distorted images: we use three sets of distorted images. Set 1: white noise; set 2: Gaussian blur; set 3: fast fading. Each set contains 145 images. The training and testing sets were determined by leave-one-out cross-validation. Let us consider a specific set (e.g. white noise).
Since the DMOS values lie in the interval [0, 100], this interval was divided into five equal sub-intervals ]0,20], ]20,40], ]40,60], ]60,80], ]80,100], corresponding to the quality classes Bad, Poor, Fair, Good, Excellent, respectively. The set of distorted images is thus divided into five subsets according to the DMOS associated with each image. At each iteration we trained a multiclass SVM (five classes) using leave-one-out cross-validation: each iteration uses a single observation from the original sample as the validation data and the remaining observations as the training data, and this is repeated so that each observation is used once as validation data. The Radial Basis Function (RBF) kernel was used, and a selection step was carried out to choose the kernel parameters giving the best classification accuracy. The inputs of the SVM are the distances computed in equation (7): for the i-th distorted image, Xi = [d1, d2, d3, d4] is the feature vector (only four IMFs are used). Table 3 shows the classification accuracy per distortion set. In the worst case (Gaussian blur), only about one image in ten is misclassified.
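The protocol above can be sketched with scikit-learn. The feature vectors below are synthetic stand-ins for the per-IMF distance vectors Xi = [d1, d2, d3, d4]; the class separation, the C value and the class sizes are illustrative assumptions, not the LIVE data:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

# synthetic stand-in: one 4-dim feature vector per image, 5 quality classes
rng = np.random.default_rng(0)
n_per_class, n_classes = 20, 5
X = np.vstack([rng.normal(loc=3.0 * c, scale=0.5, size=(n_per_class, 4))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

# multiclass SVM with an RBF kernel, evaluated by leave-one-out cross-validation
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
accuracy = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
```

On the real data the kernel parameters would be selected by the search step described above rather than fixed as here.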
Table 3. Classification accuracy for each distortion type set.
Distortion type    Classification accuracy
White Noise        96.55%
Gaussian Blur      89.55%
Fast Fading        93.10%
In logistic function-based regression, the best attainable correlation coefficient is 1, indicating full correlation between objective and subjective scores. In the classification case, the classification accuracy can be interpreted as the probability that the objective measure correlates well with human judgment; thus a classification accuracy of 100% is equivalent to a CC of 1. This provides an alternative to logistic function-based regression that requires no predicted DMOS. One may then ask which is preferable: the logistic function-based regression or the SVM-based classification? At first sight, the SVM-based classification seems more powerful. Nevertheless, this gain in performance comes at the price of increased complexity: a costly training step is required before this strategy can be used, although once training is done, classification is straightforward.
6
Conclusion
A new reduced reference method for image quality assessment has been introduced, based on the BEMD; in addition, a classification framework has been proposed as an alternative to logistic function-based regression. The latter produces objective scores whose correlation with subjective scores can be verified, while the classification approach provides accuracy rates that indicate how consistent the proposed measure is with human judgment. Promising results demonstrate the effectiveness of the method, especially for white noise distortion. As future work, we intend to raise the sensitivity of the proposed method for other types of degradation to the level obtained for white noise contamination. We plan to use an alternative model for the marginal distribution of BEMD coefficients; the Gaussian Scale Mixture seems a convenient candidate for this purpose. We also plan to extend this work to other types of distortion using a new image database.
References
1. UIT-R Recommendation BT.500-10, "Méthodologie d'évaluation subjective de la qualité des images de télévision" [Methodology for the subjective assessment of the quality of television pictures], tech. rep., UIT, Geneva, Switzerland, 2000.
2. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
3. Z. Wang, H. R. Sheikh, and A. C. Bovik, "No-reference perceptual quality assessment of JPEG compressed images," IEEE International Conference on Image Processing, pp. 477–480, Sept. 2002.
4. I. P. Gunawan and M. Ghanbari, "Reduced reference picture quality estimation by using local harmonic amplitude information," in Proc. London Commun. Symp., Sep. 2003, pp. 137–140.
5. T. M. Kusuma and H.-J. Zepernick, "A reduced-reference perceptual quality metric for in-service image quality assessment," in Proc. Joint 1st Workshop Mobile Future and Symp. Trends Commun., Oct. 2003, pp. 71–74.
6. M. Carnec, P. Le Callet, and D. Barba, "An image quality assessment method based on perception of structural information," in Proc. IEEE Int. Conf. Image Process., Sep. 2003, vol. 3, pp. 185–188.
7. M. Carnec, P. Le Callet, and D. Barba, "Visual features for image quality assessment with reduced reference," in Proc. IEEE Int. Conf. Image Process., Sep. 2005, vol. 1, pp. 421–424.
8. Z. Wang and E. P. Simoncelli, "Reduced-reference image quality assessment using a wavelet-domain natural image statistic model," in Proc. SPIE Human Vision and Electronic Imaging, pp. 149–159, 2005.
9. J. Foley, "Human luminance pattern mechanisms: Masking experiments require a new model," J. Opt. Soc. Amer. A, vol. 11, no. 6, pp. 1710–1719, 1994.
10. N. E. Huang, Z. Shen, S. R. Long, et al., "The empirical mode decomposition and the Hilbert spectrum for non-linear and non-stationary time series analysis," Proc. Roy. Soc. Lond. A, vol. 454, pp. 903–995, 1998.
11. J. Nunes, Y. Bouaoune, E. Delechelle, O. Niang, and P. Bunel, "Image analysis by bidimensional empirical mode decomposition," Image and Vision Computing, vol. 21, no. 12, pp. 1019–1026, 2003.
12. J. Taghia, M. Doostari, and J. Taghia, "An image watermarking method based on bidimensional empirical mode decomposition," Congress on Image and Signal Processing (CISP 2008), pp. 674–678, 2008.
13. J. Andaloussi, M. Lamard, G. Cazuguel, H. Tairi, M. Meknassi, B. Cochener, and C. Roux, "Content based medical image retrieval: use of Generalized Gaussian Density to model BEMD IMF," World Congress on Medical Physics and Biomedical Engineering, Munich, Germany, vol. 25/4, pp. 1249–1252, 2009.
14. J. Wan, L. Ren, and C. Zhao, "Image feature extraction based on the two-dimensional empirical mode decomposition," Congress on Image and Signal Processing (CISP 2008), vol. 1, pp. 627–631, 2008.
15. A. Linderhed, "Variable sampling of the empirical mode decomposition of two-dimensional signals," Int. J. Wavelets Multiresolution Inform. Process., vol. 3, pp. 435–452, 2005.
16. C. Damerval, S. Meignen, and V. Perrier, "A fast algorithm for bidimensional EMD," IEEE Signal Process. Lett., vol. 12, pp. 701–704, 2005.
17. S. Bhuiyan, R. Adhami, and J. Khan, "A novel approach of fast and adaptive bidimensional empirical mode decomposition," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), pp. 1313–1316, 2008.
18. G. Van de Wouwer, P. Scheunders, and D. Van Dyck, "Statistical texture characterization from discrete wavelet representations," IEEE Transactions on Image Processing, vol. 8, no. 4, pp. 592–598, 1999.
19. H. Sheikh, Z. Wang, L. Cormack, and A. Bovik, LIVE image quality assessment database, 2005. http://live.ece.utexas.edu/research/quality
20. A. Rohaly, J. Libert, P. Corriveau, A. Webster, et al., "Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment," ITU-T Standards Contribution COM, pp. 9–80.
21. C. Demirkesen and H. Cherifi, "A comparison of multiclass SVM methods for real world natural scenes," in Proc. Advanced Concepts for Intelligent Vision Systems, LNCS 5259, pp. 752–763, 2008.
22. M. Saad, A. C. Bovik, and C. Charrier, "A DCT statistics-based blind image quality index," IEEE Signal Processing Letters, pp. 583–586, 2010.