Presented at IEEE ICME 2011, Barcelona, Spain, July 2011.

SALIENCY-PRESERVING VIDEO COMPRESSION

Hadi Hadizadeh and Ivan V. Bajić
School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada

ABSTRACT

In region-of-interest (ROI) video coding, the part of the frame designated as ROI is encoded with higher quality relative to the rest of the frame. At low bit rates, coding artifacts in non-ROI parts of the frame may become salient and draw the user's attention away from the ROI, thereby degrading visual quality. In this paper we propose a saliency-preserving framework for ROI video coding. This approach aims at reducing attention-grabbing visual artifacts in non-ROI parts of the frame in order to keep the user's attention on the ROI. Experimental results indicate that the proposed method is able to improve the visual quality of ROI video at low bit rates.

Index Terms— ROI video coding, visual attention model, visual coding artifacts, saliency

1. INTRODUCTION

Video compression standards such as MPEG-4 and H.26x have been developed to achieve high compression efficiency simultaneously with high perceived visual quality [1], [2]. However, lossy compression techniques may produce various coding artifacts such as blockiness, ringing, and blur, especially at low bit rates [3]. Several methods have been proposed to detect and reduce coding artifacts [4], [5], [6].

Recently, region-of-interest (ROI) coding of video using computational models of visual attention has been recognized as a promising approach to achieve high-performance video compression [7], [8]. The idea behind most of these methods is to encode a small area around the predicted attention-grabbing (salient) regions with higher quality compared to other, less visually important regions. Such spatial prioritization is supported by the fact that only a small region of 2–5° of visual angle around the center of gaze is perceived with high spatial resolution, due to the highly non-uniform distribution of photoreceptors on the human retina [7].

Granting a higher priority to the salient regions, however, may produce severe coding artifacts in areas outside the salient regions, where the image quality is lower. Such artifacts may draw the viewer's attention away from the naturally salient regions, thereby degrading the perceived visual quality. To mitigate this problem, in this paper we introduce the concept of saliency-preserving video compression as a new paradigm for video compression, which attempts to suppress such attention-grabbing artifacts and keep the user's attention on the same regions that were salient before compression. At the same time, some artifacts may be tolerated, so long as they do not draw attention. Using this concept, we propose a novel algorithm for saliency-preserving video compression within a ROI coding framework. In the proposed algorithm, the visibility of potential coding artifacts is predicted by computing the difference between the saliency map of the original raw video frames and the saliency map of the encoded frames. The quantization parameters (QPs) of individual macroblocks (MBs) are then adjusted according to the obtained saliency errors, so that the total saliency error is reduced while the bit rate constraint is satisfied. To achieve this goal, the problem is formulated as a Multiple-Choice Knapsack Problem (MCKP) [9]. Experimental results indicate that the proposed method is able to improve the visual quality of encoded video compared to conventional rate-distortion optimized (RDO) video, as well as conventional ROI-coded video.

Note that a visible artifact is not necessarily salient. A particular artifact may be visible if the user is looking directly at it or at its neighborhood, but may go unnoticed if it is non-salient and the user is looking elsewhere in the frame. As the severity of the artifact increases, it may become salient and draw the user's attention to it. Although several methods have been developed for detecting visible (but not necessarily salient) artifacts [6], in our work the concept of visual saliency is used to minimize salient coding artifacts, i.e., those coding artifacts that may grab the user's attention.

The paper is organized as follows. In Section 2, a recent ROI-coding algorithm is described, followed by a brief summary of the MCKP. The proposed method is presented in Section 3. Experimental results are given in Section 4, and conclusions are drawn in Section 5.

2. PRELIMINARIES

2.1. ROI Video Coding

In [10], a ROI bit allocation scheme was proposed for H.264/AVC. In this scheme, after detecting the ROI, several coding parameters, including the QP, MB coding modes, the number of reference frames, the accuracy of motion vectors, and the search range for motion estimation, are adaptively adjusted at the MB level according to the relative importance of each MB and a given target bit rate. Subsequently, the encoder allocates more resources, such as bits and computational power, to the ROI. In [10], the optimized QP value for each MB is obtained as

$$Q_p[i] = \sqrt{\frac{X_1[i]\,\mathrm{MAD}_{pred,adapt}[i]}{\dfrac{T[i] - (N-i)\,X_2[i]}{\sum_{k=i}^{N} w[k]\,\mathrm{MAD}_{pred,adapt}[k]} \cdot w[i]}} \qquad (1)$$

where $T[i]$ is the number of bits remaining before encoding the i-th MB, $N$ is the total number of MBs in a frame, $X_1[i]$ and $X_2[i]$ are the first-order and zero-order parameters of the RQ model [10], [11], $\mathrm{MAD}_{pred,adapt}[i]$ is the adaptive mean absolute difference (MAD) prediction value, and $w[i]$ is the importance level associated with the i-th MB.

In order to combine this ROI coding approach with the concept of visual attention, we encode the input video based on the saliency maps produced by the Itti-Koch-Niebur (IKN) saliency model from [12]. The saliency map of each frame is first remapped to the range [−0.5, 0.5]. Then, the saliency value of the i-th MB, computed as the average saliency value of its pixels, is used as the importance level $w[i]$ of that MB. Given the target bit rate, the QP value of each MB can be obtained using (1). As in [10], the obtained QP value is further bounded to within ±4 of the QP value of the previously encoded MB in order to maintain visual smoothness and suppress blocking artifacts. Other ROI bit allocation schemes (e.g., [13]) can also be employed to find the initial set of QP values. The proposed method starts with a set of QP values obtained by ROI bit allocation, and then modifies them in a way that minimizes saliency error.
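As a minimal sketch of this saliency-to-importance mapping and the ±4 QP bound, consider the following (assuming a min-max remapping to [−0.5, 0.5], which the paper does not specify; function names are illustrative only):

```python
import numpy as np

def mb_importance(saliency):
    """Remap a pixel-level saliency map to [-0.5, 0.5] and average it over
    16x16 macroblocks to obtain the per-MB importance levels w[i]."""
    s = saliency.astype(np.float64)
    # Assumed min-max remapping; the paper only states the target range.
    s = (s - s.min()) / max(s.max() - s.min(), 1e-12) - 0.5
    H, W = s.shape  # assumes H and W are divisible by 16
    return s.reshape(H // 16, 16, W // 16, 16).mean(axis=(1, 3))

def bound_qp(qp, prev_qp):
    """Keep an MB's QP within +/-4 of the previously encoded MB's QP,
    as in [10], to maintain visual smoothness."""
    return int(np.clip(qp, prev_qp - 4, prev_qp + 4))
```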

2.2. Multiple Choice Knapsack Problem

The Multiple Choice Knapsack Problem (MCKP) is a generalization of the ordinary knapsack problem, where the set of items is partitioned into classes. The binary choice of taking an item is replaced by the selection of exactly one item out of each class [9]. Consider K mutually disjoint classes $C_1, C_2, \dots, C_K$ of items to be packed into a knapsack of capacity $c$. Each item $m \in C_k$ is associated with a profit $p_{km}$ and a weight $w_{km}$. The goal is to choose exactly one item from each class such that the profit sum is maximized while the total weight is kept below the capacity $c$. Let $x_{km}$ be the binary indicator of whether item $m$ is chosen in class $C_k$. The MCKP is formulated as follows [9]:

$$\begin{aligned}
\text{Maximize} \quad & \sum_{k=1}^{K} \sum_{m \in C_k} p_{km}\, x_{km} \\
\text{Subject to} \quad & \sum_{k=1}^{K} \sum_{m \in C_k} w_{km}\, x_{km} \le c, \\
& \sum_{m \in C_k} x_{km} = 1, \quad k = 1, 2, \dots, K, \\
& x_{km} \in \{0, 1\}, \quad k = 1, 2, \dots, K,\; m \in C_k.
\end{aligned} \qquad (2)$$
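To make the formulation concrete, here is a minimal dynamic-programming sketch of MCKP for non-negative integer weights; it is pseudo-polynomial in the capacity, whereas the exact algorithm of [14] used later in this paper is far more efficient:

```python
def solve_mckp(profits, weights, capacity):
    """Choose exactly one item per class, maximizing total profit subject
    to a total-weight budget. profits[k][m], weights[k][m]: profit/weight
    of item m in class k; weights must be non-negative integers. Assumes
    at least one feasible selection exists. Returns (best, choices)."""
    NEG = float("-inf")
    K = len(profits)
    dp = [NEG] * (capacity + 1)   # dp[c]: best profit with total weight c
    dp[0] = 0.0
    back = []                     # back[k][c]: item chosen in class k
    for k in range(K):
        new_dp = [NEG] * (capacity + 1)
        choice = [None] * (capacity + 1)
        for c in range(capacity + 1):
            if dp[c] == NEG:
                continue          # weight c unreachable so far
            for m, (p, w) in enumerate(zip(profits[k], weights[k])):
                if c + w <= capacity and dp[c] + p > new_dp[c + w]:
                    new_dp[c + w] = dp[c] + p
                    choice[c + w] = m
        dp = new_dp
        back.append(choice)
    c = max(range(capacity + 1), key=lambda i: dp[i])
    best = dp[c]
    choices = [0] * K
    for k in reversed(range(K)):  # trace the chosen items back
        choices[k] = back[k][c]
        c -= weights[k][choices[k]]
    return best, choices
```

In the setting of Section 3, each class corresponds to one macroblock, the items in a class are candidate QP offsets, and the capacity is the frame's bit budget.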

MCKP is an NP-hard problem [9]. However, an exact solution can be obtained in reasonable time for the problem sizes encountered in the proposed method using the algorithm proposed in [14].

3. THE PROPOSED METHOD

We now present the proposed algorithm for saliency-preserving video coding. In the sequel, capital bold letters (e.g., X) denote matrices, lowercase bold letters (e.g., x) denote vectors, and italic letters (e.g., x) represent scalars.

Consider an uncompressed video sequence consisting of N frames $\{F_1, F_2, \dots, F_N\}$, where each frame is W MBs wide and H MBs high. Let $B_T$ be the target number of bits we wish to spend on encoding these frames. For each frame $F_i$, a visual saliency map $S_i$ of the same size as $F_i$ is computed by a chosen visual attention model, in which the saliency of each 16×16 block (obtained as the average saliency of the pixels within the block) determines the visual importance of the corresponding MB in $F_i$. The proposed method consists of the following steps.

Step 1) The current frame $F_i$ is first encoded by a ROI encoder (e.g., the one described in Section 2.1) using its original saliency map $S_i$. The encoded frame is then decoded, and the saliency map of the decoded frame, $\tilde{S}_i$, is computed. Let $Q_i$ be the QP matrix of $F_i$ obtained by the ROI encoder, and $B_i$ be the matrix containing the actual number of bits spent on each MB of the encoded $F_i$. Both $Q_i$ and $B_i$ are of size W × H (in MBs): $Q_i(x, y)$ is the QP value of the MB at position (x, y), and $B_i(x, y)$ is the number of bits of the MB at position (x, y). The total number of bits $B_i$ spent on encoding frame $F_i$ is $B_i = \sum_{y=1}^{H} \sum_{x=1}^{W} B_i(x, y)$. Note that $B_T = \sum_{i=1}^{N} B_i$. Due to quantization, $\tilde{S}_i$ is, in general, different from $S_i$. Let $E_i = S_i - \tilde{S}_i$ be the saliency error matrix. We store $E_i$ and $B_i$ for subsequent optimization.

Our goal is to modify some elements of $Q_i$ such that, if $F_i$ is re-encoded with the modified $Q_i$, the $L_1$-norm of its saliency error matrix $E_i$ is decreased. The QP values can change by offsets from the set $O = \{o_1, o_2, \dots, o_M\}$ of size M, whose elements are positive or negative integers. We always set $o_1 = 0$, so that one of the options corresponds to not changing the QP value.
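A minimal sketch of the bookkeeping in Step 1 follows; roi_encode, decode, and compute_mb_saliency are hypothetical stand-ins for the ROI encoder of Section 2.1, an H.264/AVC decoder, and a per-MB saliency model, not real API calls:

```python
import numpy as np

def step1(frame, S, roi_encode, decode, compute_mb_saliency):
    """Encode one frame with the ROI encoder and measure how coding
    changed its saliency. S is the per-MB saliency map of the raw frame;
    the three callables are codec/model stand-ins supplied by the caller."""
    bitstream, Q, B = roi_encode(frame, S)         # per-MB QP and bit matrices
    S_tilde = compute_mb_saliency(decode(bitstream))
    E = S - S_tilde                                # saliency error E_i = S_i - S~_i
    return E, B, Q, int(B.sum())                   # B.sum() = total bits B_i
```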


Due to spatial intra-prediction, the bit rates of neighboring MBs are dependent. Moreover, since saliency is computed over a neighborhood, the saliency value of an MB is affected by the QPs of its neighbors. Modeling such dependence is not an easy task. To overcome this difficulty, we use the following approach. Let P be the set of all binary matrices (i.e., whose elements are either 0 or 1) of size W × H that have the following property: there are exactly two zeros between every two non-zero elements in both the horizontal and vertical directions. In total, there are 9 such matrices, i.e., $P = \{P_1, P_2, \dots, P_9\}$. Let $K_n$ be the number of 1's in $P_n$. Each element of $P_n$ is identified with an MB in a frame. At any one time, we only change the QPs of MBs identified with 1's. Since any two non-zero elements of each $P_n$ are at least two positions apart, the dependence between the MBs corresponding to those non-zero elements is reduced, so we can change their QPs without significantly affecting other MBs selected by the same $P_n$. As an illustration, the binary masks corresponding to $P_n$, n = 1, 2, ..., 9, are shown in Fig. 1 for a QCIF resolution frame, which contains 9 × 11 MBs.

Fig. 1. An illustration of the nine binary masks (three in each row) corresponding to $P_n$, n = 1, 2, ..., 9, for QCIF resolution. Black squares indicate the positions of 1's in $P_n$, while white squares indicate the positions of 0's. For QCIF resolution, the number of black squares is $K_n = 12$ in six cases, and $K_n = 9$ in three cases.

Step 2) The current frame $F_i$ is then re-encoded in the following manner. First, a binary matrix $P_n \in P$ and a QP offset $o_m \in O$ are chosen. A new QP matrix $Q_i^{mn}$ is computed as $Q_i^{mn} = Q_i + o_m P_n$, where the superscript in $Q_i^{mn}$ indicates that the new QP matrix has been obtained with offset $o_m$ and binary mask $P_n$. All elements of $Q_i^{mn}$ are passed through a hard-limiter to ensure that all QP values lie in the range [0, 51], as required by H.264/AVC. $F_i$ is then re-encoded with $Q_i^{mn}$, and its saliency map $\tilde{S}_i^{mn}$, saliency error matrix $E_i^{mn}$, and bit matrix $B_i^{mn}$ are stored. In this step, rate control is disabled to prevent further modification of the QP values by the encoder. Note that since $o_1 = 0$, we have $Q_i^{1n} = Q_i$, and therefore $E_i^{1n} = E_i$ and $B_i^{1n} = B_i$, where $E_i$ and $B_i$ were computed in Step 1. Hence, this procedure does not need to be performed for offset $o_1$, but is applied to all other offsets $o_j$, $j \ge 2$, using the selected $P_n$. At the end of this step, we obtain M different QP values, saliency errors, and actual bit counts for each MB in $F_i$ for which the corresponding element in $P_n$ is non-zero. Let G be the set of locations (x, y) of such MBs in $F_i$. Since there are $K_n$ non-zero elements in $P_n$, G also contains $K_n$ elements.

We now want to find the best QP offset for each MB in G among the M options, such that if the chosen QP offsets are applied to $Q_i$, the $L_1$-norm of the saliency error is minimized while the resulting number of bits of the encoded frame remains at or below $B_i$. To achieve this goal, we model this problem as a Multiple-Choice Knapsack Problem (MCKP) [9]. Here, each class is one MB in G (so we have a total of $K_n$ classes), and each item in a class is a QP offset (hence, M items per class). We then consider a 2D window of size 3 × 3 (in MBs) around the k-th MB in G ($k = 1, 2, \dots, K_n$), and compute the total saliency error $e_{ikmn}^{tot}$ and the total number of bits $b_{ikmn}^{tot}$ within this window as follows:

$$e_{ikmn}^{tot} = \sum_{(x,y) \in N(k)} E_i^{mn}(x, y), \qquad b_{ikmn}^{tot} = \sum_{(x,y) \in N(k)} B_i^{mn}(x, y), \qquad (3)$$

where $N(k)$ denotes the neighborhood around the k-th MB in G, and (x, y) denotes the MB position within $F_i$. Note that, as mentioned earlier, when the QP of an MB is changed, not only do the saliency error and bits of that MB change, but the saliency error and bits of its neighbors may change as well. For this reason, we compute the total saliency error and the actual number of bits of all MBs within a window around the k-th MB, and treat them as a generalized saliency error and generalized bit count of the k-th MB.

The idea here is to cover the whole frame by non-overlapping windows around all MBs in G. Since there are exactly two MBs between each pair of MBs in G, there is no gap between any pair of 3 × 3 windows surrounding the MBs in G. For some MBs in G, parts of the window may fall outside the frame; in such cases, we consider only the parts that are inside the frame. In other cases, the 3 × 3 window around the first or last MB in a row or column might not touch the frame boundary (e.g., the black MB in the bottom-right corner of the first mask in Fig. 1); in such cases, the window is expanded up to the frame boundary. Covering the entire frame by such windows allows us to use $B_i$ as the capacity of the knapsack. Note that $\sum_{k=1}^{K_n} b_{ikmn}^{tot}\big|_{m=1} = B_i$, because for m = 1 we have $o_1 = 0$.

Having computed $e_{ikmn}^{tot}$ and $b_{ikmn}^{tot}$, the negative $e_{ikmn}^{tot}$ is considered as the profit ($p_{km} = -e_{ikmn}^{tot}$ in (2)), and $b_{ikmn}^{tot}$ is regarded as the weight ($w_{km} = b_{ikmn}^{tot}$ in (2)) of the m-th offset for the k-th MB within the i-th frame when using $P_n$. $B_i$ is set as the capacity of the knapsack ($c = B_i$ in (2)), and the MCKP is solved. The MCKP chooses exactly one item (in our case, one offset) per class (in our case, per MB) such that the total profit is maximized (in our case, the saliency error is minimized) while the total weight (in our case, the bit count) remains at or below $B_i$. The obtained QP offsets are then applied to the original QP matrix $Q_i$, and the current frame is re-encoded with the updated QP matrix $Q_i^{*n}$. Finally, the new saliency error matrix of the encoded frame is computed, and its $L_1$-norm $L_1^n$ as well as the obtained QP matrix $Q_i^{*n}$ are stored. This procedure is repeated for each matrix $P_n$, n = 1, 2, ..., 9.

Step 3) At the end of Step 2, we obtain nine saliency error $L_1$-norms $L_1^1, L_1^2, \dots, L_1^9$ and the corresponding QP matrices. The QP matrix whose saliency error $L_1$-norm is the smallest is chosen as the final QP matrix $Q_i^*$ for frame $F_i$. Finally, $F_i$ is encoded using $Q_i^*$, and the encoder moves on to the next frame.

Note that in the proposed algorithm, whenever the QP of one MB is changed, the rate-distortion optimized (RDO) mode decision [2] is employed to obtain the optimal prediction mode and MB type. Therefore, the total number of bits of each MB is computed after the RDO mode decision. Any potential underflow (overflow) in the number of bits is added to (subtracted from) the knapsack capacity of the subsequent frame, thereby preserving the total rate assigned during the initial ROI bit allocation. Algorithm 1 summarizes the proposed method.

  Input: raw frame Fi
  Output: encoded frame Fi
  Encode Fi using the ROI encoder
  Compute Ei, Bi, and Qi
  Lmax = MAXINT
  foreach Pn in P do
      Set E_i^{1n} = Ei and B_i^{1n} = Bi
      foreach om in O \ {o1} do
          Encode Fi using Q_i^{mn} = Qi + om Pn
          Compute E_i^{mn} and B_i^{mn}, and store them
      end
      Run MCKP using E_i^{mn} and B_i^{mn}, m = 1, 2, ..., M
      Encode Fi using Q_i^{*n} obtained by MCKP
      Compute the new Ei
      if L1(Ei) <= Lmax then
          Q_i^* = Q_i^{*n}
          Lmax = L1(Ei)
      end
  end
  Encode Fi using Q_i^*

Algorithm 1: The proposed algorithm for saliency-preserving video coding.

In our current implementation, each video frame is encoded K = (M − 1) × 9 + 1 times, the saliency map of the frame is computed K times, and the MCKP is solved nine times. Hence, in its current implementation, the proposed method is only suitable for offline applications. However, the multiple frame encodings and saliency computations could be avoided by using a suitable model of the relationship between QP values and saliency values; our current work is focused on the development of such a model. As for solving the MCKP, in our simulations the algorithm from [14] takes an average of about 200 ms (on an Intel Core 2 Duo processor at 3.33 GHz with 8 GB RAM) per frame.
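The codec-independent parts of Steps 2 and 3 might be sketched as follows: constructing the nine stride-3 masks $P_n$, accumulating the generalized window sums of (3), and assembling the MCKP inputs. This reuses the solve_mckp sketch from Section 2.2; the boundary-expansion rule is simplified to clipping, and all function names are illustrative:

```python
import numpy as np

def make_masks(W, H):
    """The nine binary matrices P_n: 1's on a stride-3 grid, so that exactly
    two 0's separate consecutive 1's both horizontally and vertically."""
    masks = []
    for dy in range(3):
        for dx in range(3):
            P = np.zeros((H, W), dtype=np.uint8)
            P[dy::3, dx::3] = 1
            masks.append(P)
    return masks

def window_sum(M, x, y):
    """Sum of matrix M over the 3x3 MB window centered at (x, y). Windows
    are clipped at the frame border; the boundary-expansion case described
    in the text is omitted here for brevity."""
    H, W = M.shape
    return M[max(y - 1, 0):min(y + 2, H), max(x - 1, 0):min(x + 2, W)].sum()

def mckp_inputs(E_list, B_list, G):
    """Build MCKP profits/weights for one mask P_n. E_list[m] and B_list[m]
    are the saliency-error and bit matrices obtained with offset o_m
    (m = 0 being the zero offset). Each MB in G is one class; each offset,
    one item: profit = -total window error, weight = total window bits."""
    profits = [[-window_sum(E, x, y) for E in E_list] for (x, y) in G]
    weights = [[int(window_sum(B, x, y)) for B in B_list] for (x, y) in G]
    return profits, weights

# For each mask P_n one would then call, with B_i as the knapsack capacity:
#   best, offset_idx = solve_mckp(profits, weights, B_i)
# re-encode with the chosen offsets, and keep the mask yielding the smallest
# L1-norm of the resulting saliency error, as in Algorithm 1.
```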

4. RESULTS AND DISCUSSION

To evaluate the proposed method, we used three standard CIF sequences (Soccer, Crew, and Bus). All sequences were 100 frames long at 30 frames per second (fps), and were encoded using JM 9.8 [15]. Soccer and Bus were encoded at 50 kbps, and Crew was encoded at 100 kbps. We used these relatively low bit rates to make the coding artifacts more visible for display purposes. The GOP structure was set to IPPPP. The IKN model [12] was utilized to generate saliency maps. In all experiments reported here, only three QP offsets, O = {0, −1, 1}, were employed.

We compare the proposed saliency-preserving ROI (SP-ROI) coding to conventional ROI coding using three metrics. The first metric, $\Delta L_1(E)$, is computed as

$$\Delta L_1(E) = \frac{E^{SP\text{-}ROI} - E^{ROI}}{E^{ROI}}, \qquad (4)$$

where

$$E^{SP\text{-}ROI} = \frac{1}{N}\sum_{i=1}^{N} L_1\!\left(E_i^{SP\text{-}ROI}\right), \qquad E^{ROI} = \frac{1}{N}\sum_{i=1}^{N} L_1\!\left(E_i^{ROI}\right), \qquad (5)$$

N is the total number of frames, and $E_i^{SP\text{-}ROI}$ and $E_i^{ROI}$ are the saliency error maps of the i-th frame encoded by the SP-ROI and ROI coding methods, respectively. This metric indicates how much the total saliency error of the video encoded by the SP-ROI method differs from the total saliency error of the video encoded by the ROI coding method.

To measure the propensity of coding artifacts outside the ROI to draw the user's attention, we first binarize the saliency map of the original raw frames using a specific threshold in order to obtain an estimate of the location of the ROI. This threshold is set to the 75th percentile of the saliency map; all MBs whose saliency is larger than this threshold are considered part of the ROI. We then define two new metrics, $\Delta L_1(E^*)$ and $\Delta J$. $\Delta L_1(E^*)$ is computed as $\Delta L_1(E)$ in (4), except that the saliency errors in (5) are taken only over the MBs outside the ROI. Meanwhile, $\Delta J$ is computed as

$$\Delta J = \frac{J^{SP\text{-}ROI} - J^{ROI}}{J^{ROI}},$$

where $J^{SP\text{-}ROI}$ and $J^{ROI}$ are the fractions of pixels outside the ROI that have absolute quantization error greater than the just-noticeable-difference (JND) threshold [16] of their corresponding pixels in frames encoded by the SP-ROI and ROI coding methods, respectively. To compute the JND thresholds, we employed the spatial JND model proposed in [17] in the luminance (Y) channel. Note that the JND threshold determines the visibility threshold of a quantization error; a quantization error is therefore visible if its magnitude is greater than the JND threshold.
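For concreteness, these metrics might be computed from stored per-frame maps as follows (a sketch assuming the entrywise $L_1$-norm, with saliency-error maps, quantization-error maps, JND maps, and the ROI mask supplied as numpy arrays; restricting the error maps to non-ROI MBs before calling delta_L1 yields $\Delta L_1(E^*)$):

```python
import numpy as np

def delta_L1(errors_sp, errors_roi):
    """Eqs. (4)-(5): relative change in the mean L1-norm of the saliency
    error, SP-ROI vs. ROI (inputs are lists of per-frame error arrays)."""
    e_sp = np.mean([np.abs(E).sum() for E in errors_sp])
    e_roi = np.mean([np.abs(E).sum() for E in errors_roi])
    return (e_sp - e_roi) / e_roi

def delta_J(qerr_sp, qerr_roi, jnd, roi_mask):
    """Relative change in the fraction of non-ROI pixels whose absolute
    quantization error exceeds the per-pixel JND threshold [16], [17];
    shown here for a single frame."""
    outside = ~roi_mask                  # boolean mask of pixels outside ROI
    j_sp = np.mean(np.abs(qerr_sp)[outside] > jnd[outside])
    j_roi = np.mean(np.abs(qerr_roi)[outside] > jnd[outside])
    return (j_sp - j_roi) / j_roi
```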

Table 1. The performance of SP-ROI relative to ROI video coding.

  Sequence   ∆L1(E)     ∆L1(E*)    ∆J        ROI-PSNR
  Soccer     −6.27%     −11.00%    −2.65%    −0.17 dB
  Crew       −1.01%     −2.11%     −5.23%    −0.18 dB
  Bus        −11.57%    −5.96%     −2.83%    −0.09 dB

Table 1 compares the SP-ROI method with the ROI coding method using the three aforementioned metrics, as well as the average peak signal-to-noise ratio (PSNR) of the Y component within the ROI. The values of ∆L1(E) indicate that the saliency error of SP-ROI over the entire frame is lower than that of ROI coding, which was the design goal. ∆L1(E*) shows that outside the ROI, the saliency error with SP-ROI coding is lower, indicating that non-ROI regions are less likely to become salient after encoding. Finally, the values of ∆J show that the percentage of pixels outside the ROI whose quantization error is above the JND threshold is lower with SP-ROI coding than with conventional ROI coding. Overall, the proposed SP-ROI method reduces both the saliency and the visibility of coding artifacts compared to conventional ROI coding. This comes at the cost of slightly reduced PSNR within the ROI, as indicated in the last column of the table.

Table 2. Average PSNR-Y of SP-ROI and ROI coding relative to RDO coding.

  Method    Soccer      Crew        Bus
  ROI       −0.08 dB    −0.22 dB    −0.10 dB
  SP-ROI    −0.09 dB    −0.10 dB    −0.08 dB

In Table 2, the average PSNR performance of the SP-ROI and ROI coding methods is compared against RDO coding. These PSNR values were obtained by averaging the PSNR over all frames of the corresponding sequence. As seen from this table, the average PSNR of both SP-ROI and ROI coding is lower than that of RDO coding, as expected. However, as illustrated in the example below, both SP-ROI and ROI coding provide better visual quality than RDO. All of the above results were obtained after matching the bit rates of the ROI-coded and RDO-coded videos to the actual bit rate of the video encoded by the SP-ROI method, to within ±0.1%.

Table 3. Average SSIM index.

  Method    Soccer    Crew      Bus
  RDO       0.6200    0.7871    0.4864
  ROI       0.6231    0.7897    0.4846
  SP-ROI    0.6378    0.7926    0.5100

Table 4. Average VQM values.

  Method    Soccer     Crew       Bus
  RDO       3.45169    2.55774    7.60255
  ROI       3.47870    2.51918    7.60608
  SP-ROI    3.79243    2.57792    7.71076

Tables 3 and 4 show, respectively, the average structural similarity (SSIM) index [18] and the average Video Quality Metric (VQM) value [19], computed over all frames of each sequence. As seen from these results, the proposed method provides higher visual quality, as measured by both of these metrics, compared to the conventional RDO and ROI methods.

Fig. 2 compares the visual quality of the three methods on a sample frame of Soccer. As seen from these figures, the proposed SP-ROI coding improves the visual quality of the encoded frames by reducing the visibility of the coding artifacts. In particular, note that the coding artifacts around the ball and the player's feet have been reduced compared to both the RDO-coded and the ROI-coded frames. At the same time, the visual quality of the conventional ROI-coded frame is slightly better than that of the RDO-coded frame.

5. CONCLUSION

In this paper, we introduced the concept of saliency-preserving video coding, and proposed a novel ROI coding method that attempts to preserve the saliency of the original video frames. Experimental results were presented using the saliency model from [12], although the proposed method is generic and can utilize any other visual attention model. The results indicate that the proposed method is able to improve the visual quality of encoded video compared to conventional ROI and RDO video coding at low bit rates.

References

[1] M. Ghanbari, Video Coding: An Introduction to Standard Codecs, London, U.K.: Institution of Electrical Engineers, 1999.
[2] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia, NJ: Wiley, 2003.
[3] M. Yuen and H. R. Wu, "A survey of hybrid MC/DPCM/DCT video coding distortions," Signal Process., vol. 70, no. 3, pp. 247–278, 1998.
[4] H. Liu, N. Klomp, and I. Heynderickx, "A perceptually relevant approach to ringing region detection," IEEE Trans. Image Process., vol. 19, no. 6, pp. 1304–1318, Jun. 2010.
[5] M. Shen and C. J. Kuo, "Review of postprocessing techniques for compression artifact removal," J. Vis. Commun. Image Rep., vol. 9, no. 1, pp. 2–14, 1998.
[6] S. Daly, "The visible difference predictor: an algorithm for the assessment of image fidelity," in Digital Images and Human Vision, A. B. Watson, Ed., MIT Press, 1993, pp. 179–206.
[7] L. Itti, "Automatic foveation for video compression using a neurobiological model of visual attention," IEEE Trans. Image Process., vol. 13, no. 10, pp. 1304–1318, 2004.
[8] Z. Chen, K. N. Ngan, and W. Lin, "Perceptual video coding: Challenges and approaches," in Proc. IEEE International Conference on Multimedia and Expo (ICME'10), Jul. 2010, pp. 784–789.
[9] H. Kellerer, U. Pferschy, and D. Pisinger, Knapsack Problems, Springer, 2004.
[10] Y. Liu, Z. G. Li, and Y. C. Soh, "Region-of-interest based resource allocation for conversational video communication of H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 134–139, Jan. 2008.
[11] Y. Liu, Z. G. Li, and Y. C. Soh, "A novel rate control scheme for low delay video communication of H.264/AVC standard," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 1, pp. 67–78, Jan. 2007.
[12] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 1254–1259, Nov. 1998.
[13] J.-C. Chiang, C.-S. Hsieh, G. Chang, F.-D. Jou, and W.-N. Lie, "Region-of-interest based rate control scheme with flexible quality on demand," in Proc. IEEE International Conference on Multimedia and Expo (ICME'10), Jul. 2010, pp. 238–242.
[14] D. Pisinger, "A minimal algorithm for the multiple-choice knapsack problem," European Journal of Operational Research, vol. 83, pp. 394–410, 1994.
[15] "The H.264/AVC JM reference software," [Online]. Available: http://iphome.hhi.de/suehring/tml/
[16] C.-H. Chou and Y.-C. Li, "A perceptually tuned subband image coder based on the measure of just-noticeable-distortion profile," IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 6, pp. 467–476, Dec. 1995.
[17] X. Yang, W. Lin, Z. Lu, E. Ong, and S. Yao, "Motion-compensated residue preprocessing in video coding based on just-noticeable-distortion profile," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 6, pp. 745–752, Jun. 2005.
[18] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[19] M. Pinson and S. Wolf, "A new standardized method for objectively measuring video quality," IEEE Trans. Broadcasting, vol. 50, no. 3, pp. 312–322, Sep. 2004.


Fig. 2. An example of the visual quality of different methods. (a) original frame, (b) saliency map of the original frame, (c) RDO-coded frame, (d) ROI-coded frame, (e) SP-ROI-coded frame, (f) saliency error map of the RDO-coded frame, (g) saliency error map of the ROI-coded frame, (h) saliency error map of the SP-ROI-coded frame.