Image Quality Assessment using an Image Activity Weighting and the HVS Response

David B. Lowe, Athula Ginige
School of Electrical Engineering, University of Technology, Sydney
Ph: +61 2 330 2526, Fax: +61 2 330 2435
E-mail: fdbl, [email protected]

Abstract

An important facet of image coding and analysis is the assessment of image quality. However image quality is very subjective (due predominantly to the very complex interactions within the human visual system). Thus the primary techniques for measuring the quality are either subjective or tend to give poor performance. The subjective techniques (such as CCIR Recommendation 500) are very time-consuming and non-deterministic. The alternatives (such as the mean square error) typically give a poor correlation with the subjective quality. A quantitative deterministic quality assessment method, with a high correlation to the subjective quality, would allow the development of a more systematic approach to image coding. A quantitative scheme is developed based on initially weighting errors in the image according to the human visual system spatial frequency response and then adjusting this according to an image activity measure. The method gives a significantly higher correlation with the subjectively assessed quality than traditional methods. The algorithm is outlined and typical results comparing the performance to existing methods are presented.

1. Introduction

An important facet of image coding is the assessment of image quality. Since image quality is very subjective (due predominantly to the very complex interactions within the Human Visual System (HVS)), the primary techniques for measuring image quality are subjective assessment methods (such as CCIR Rec. 500 [1]). These techniques, however, are very time-consuming and non-deterministic. A quantitative assessment method would have significant benefits. It would allow computer simulation of objective scales of images, and subsequent production of the optimum quality images.

Several such techniques exist in the form of Mean Square Error (MSE) and Signal to Noise Ratio (SNR) calculations. These, however, do not make use of the characteristics of the HVS and its perception of different error types, and consequently can give results largely at variance with the subjectively perceived image quality. Errors around sharp edges are significantly more noticeable than uniform errors within a uniform region. Similarly, errors are much more noticeable in images that have less activity. A reliable and consistent quantitative quality assessment scheme therefore needs to take into account the HVS in weighting the various error types.

Sakrison [2] discusses the importance of a distortion measure in image transmission, as well as the role of the observer. The HVS is discussed with respect to image perturbations. A model of the relevant aspects of the HVS is developed and the method by which this might be incorporated into a distortion measure is discussed. Lukas and Budrikis [3] also develop a picture quality prediction scheme based on the HVS. Miyahara [4] developed the PQS (Picture Quality Scale), which was based on extracting various error types from an image and combining these using a weighted summation to obtain an overall quality rating. This quality rating considered several important factors, the major one being that errors in the vicinity of contours are significantly more noticeable than elsewhere.

An alternative, and much simpler, approach is to weight errors according to the spatial frequency response of the HVS. A technique based on this scheme is outlined and the results analysed by considering the correlation with subjective assessment of various images.

Five-Point Scale
Quality          Impairment
1  Excellent     5  Imperceptible
2  Good          4  Perceptible, not annoying
3  Fair          3  Slightly annoying
4  Poor          2  Annoying
5  Bad           1  Very annoying

Comparison Scale
+3  Much Better
+2  Better
+1  Slightly Better
 0  Same
-1  Slightly Worse
-2  Worse
-3  Much Worse

Table 1: CCIR Recommendation 500 - Subjective assessment of images: Lists the ratings used by observers when assessing the quality and impairment of images.

2. Qualitative Assessment

The HVS is an exceedingly complex system of which we only have a peripheral understanding. It is therefore to be expected that efforts at duplicating the subjective evaluation of images using a deterministic algorithm have met with limited success. To date the primary technique for evaluating the quality of images has been based on subjective assessment. A typical example of this is CCIR Recommendation 500-3 [1]. This details techniques for using observers to assess a series of images. The images are presented to each observer under a set of specified conditions, and rated using one of several scales (refer to Table 1).

The primary difficulties that arise from the use of subjective schemes such as this fall into two categories. Firstly, they are inherently inconsistent due to problems with ensuring complete uniformity of observers and viewing conditions. Secondly, they are non-deterministic, so the quality cannot be calculated. The ability to determine the quality numerically is of significant importance in image processing, as it would allow image coding or image analysis schemes to optimise their performance based on the perceived image quality.

3. Quantitative Assessment

The primary aim of this research was to develop a scheme which allowed the quantitative deterministic assessment of image quality. This assessment should agree as closely as possible with that obtained using subjective analysis. It should be noted that exact correlation will never be achieved due to natural variations in the subjective assessment.

Of the techniques that have been previously developed, the simplest and most widely used are those based on MSE and SNR calculations. The primary limitations of these error criteria arise when they are used as a global measure of image fidelity rather than a local error measure. As mentioned previously these techniques do not make allowance for the HVS responding differently to different types of errors. The principal justification for using these techniques is their simplicity.

Figure 1: Human Visual System spatial frequency response: The HVS response peaks for intermediate frequencies (typically between 3 and 10 cycles/degree), exhibiting a bandpass response. [Plot: Response (0.0 to 1.4) against Spatial Freq (cyc/deg) (0 to 30).]
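The MSE and SNR calculations referred to in this section can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, the images are assumed to be equally sized greyscale arrays, and the SNR is assumed to be defined as the ratio of mean signal power to mean error power expressed in decibels (the paper does not state which SNR definition it uses).

```python
import numpy as np


def mse(original, processed):
    """Mean square error between two equally sized greyscale images."""
    original = np.asarray(original, dtype=np.float64)
    processed = np.asarray(processed, dtype=np.float64)
    if original.shape != processed.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((original - processed) ** 2))


def snr_db(original, processed):
    """SNR in decibels (assumed definition: signal power / error power)."""
    original = np.asarray(original, dtype=np.float64)
    err = mse(original, processed)
    if err == 0.0:
        return float("inf")  # identical images: no error power
    signal_power = float(np.mean(original ** 2))
    return float(10.0 * np.log10(signal_power / err))


# Two very different error patterns on a flat 4x4 image:
flat = np.zeros((4, 4))
uniform_error = flat + 1.0   # every pixel is off by 1
single_error = flat.copy()
single_error[0, 0] = 4.0     # one pixel is off by 4
```

Both `uniform_error` and `single_error` yield the same MSE against `flat`, even though an observer would be unlikely to judge them equally objectionable. This is the sense in which a global measure takes no account of the type or location of the error.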

3.1 The Human Visual System

The Human Visual System (HVS) exhibits a measurable and reasonably consistent spatial frequency response. This response can be approximated by the following equations:

$$H_f(f) = A\left(a + \frac{f}{f_o}\right)\exp\left[-\left(\frac{f}{f_o}\right)^{b}\right] \tag{1}$$

$$f = \sqrt{\omega_1^2 + \omega_2^2} \tag{2}$$

where the coefficients $A$, $a$, $f_o$, and $b$ are constants which are different for each individual. Typical values are $A = 2.6$, $a = 0.0192$, $f_o = 8.772$, and $b = 1.1$. The spatial frequencies $\omega_1$ and $\omega_2$ are usually measured in cycles/degree and the response $H_f$ is usually normalised to decibels. With these values the response peaks at approximately 8 cycles/degree, slightly below $f_o = 8.772$. Figure 1 illustrates this relationship graphically. From this curve we can see that the HVS response diminishes for both low frequencies and high frequencies. It should be noted that this approximation is by no means ideal; for example it does not take into account that the HVS has a greater response to vertical and horizontal lines than to diagonal lines. It is however a good basis for further investigation.

One further aspect of the HVS is worth commenting on. Subjective tests have shown that the eye tends to concentrate upon those areas of a scene or image where there is a high concentration of contours. The general trend appears to be that the HVS will quickly scan an image and then concentrate on particular features [5].
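The response of equations (1) and (2) is straightforward to evaluate numerically. The sketch below uses the typical coefficient values quoted in the text as defaults; the function names `hvs_response` and `radial_frequency` are hypothetical and chosen only for illustration.

```python
import numpy as np


def hvs_response(f, A=2.6, a=0.0192, f_o=8.772, b=1.1):
    """Equation (1): HVS response at radial frequency f (cycles/degree).

    Defaults are the typical coefficient values quoted in the text.
    Expects non-negative frequencies.
    """
    x = np.asarray(f, dtype=np.float64) / f_o
    return A * (a + x) * np.exp(-(x ** b))


def radial_frequency(w1, w2):
    """Equation (2): radial frequency from the two spatial frequencies."""
    return np.hypot(w1, w2)


# Locate the peak of the response on a fine grid from 0 to 30 cycles/degree.
grid = np.linspace(0.0, 30.0, 3001)
peak_frequency = float(grid[np.argmax(hvs_response(grid))])
```

Evaluating `hvs_response` over `grid` reproduces the bandpass shape described for Figure 1, with `peak_frequency` falling inside the 3 to 10 cycles/degree band mentioned in the caption.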

3.2 The HVS Weighted Error

The HVS can be used to weight the errors within an image. The spatial frequencies for the image can be determined by calculating the Fourier transform of the error image. This is then used in conjunction with the HVS spatial frequency response to calculate a weighted error. The error image is given by:

$$e(x, y) = |I(x, y) - I'(x, y)| \tag{3}$$

$$E(\omega_1, \omega_2) = \mathcal{F}[e(x, y)] \tag{4}$$

where $I(x, y)$ is the original image and $I'(x, y)$ is the processed image. $E(\omega_1, \omega_2)$ is the Fourier transform of the error image. The HVS weighted error can then be given by:

$$\mathrm{HVSWE} = \frac{\sum_{\omega_1, \omega_2} E(\omega_1, \omega_2)\, H(\omega_1, \omega_2)}{\sum_{\omega_1, \omega_2} H(\omega_1, \omega_2)} \tag{5}$$
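Equations (3) to (5) can be sketched with a discrete Fourier transform as below. Several details are assumptions rather than statements from the paper: the magnitude of the transform is used so that the result is real, the FFT is left unnormalised, and a hypothetical `pixels_per_degree` parameter converts FFT bin frequencies (cycles/pixel) into cycles/degree for a given viewing distance. The function names are illustrative only.

```python
import numpy as np


def hvs_response(f, A=2.6, a=0.0192, f_o=8.772, b=1.1):
    """Equation (1) with the typical coefficients quoted in Section 3.1."""
    x = np.asarray(f, dtype=np.float64) / f_o
    return A * (a + x) * np.exp(-(x ** b))


def hvswe(original, processed, pixels_per_degree=32.0):
    """HVS weighted error, equations (3)-(5), for two greyscale images.

    pixels_per_degree is an assumed viewing-geometry parameter; the paper
    does not state the value used.
    """
    original = np.asarray(original, dtype=np.float64)
    processed = np.asarray(processed, dtype=np.float64)
    if original.shape != processed.shape:
        raise ValueError("images must have the same shape")

    error = np.abs(original - processed)       # equation (3)
    spectrum = np.abs(np.fft.fft2(error))      # equation (4); magnitude assumed

    rows, cols = error.shape
    # fftfreq returns cycles/pixel; scale to cycles/degree.
    w1 = np.fft.fftfreq(rows) * pixels_per_degree
    w2 = np.fft.fftfreq(cols) * pixels_per_degree
    f = np.hypot(w1[:, None], w2[None, :])     # equation (2) on the FFT grid

    weights = hvs_response(f)
    return float(np.sum(spectrum * weights) / np.sum(weights))  # equation (5)
```

Because the FFT magnitude is linear in the error image, doubling every error doubles the result. The absolute scale therefore depends on the normalisation chosen, so values from this sketch are not directly comparable with the HVSWE axis of Figure 3.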

4. Comparison of Assessment

To illustrate the inadequacy of existing quantitative schemes and the performance of the HVS weighted scheme, a number of images were assessed using a subjective assessment, a mean square error (MSE) and the HVSWE. The correlation between the various methods was considered.

4.1 Subjective Assessment

The subjective quality was determined in accordance with CCIR Recommendation 500 [1]. A series of images of varying complexity, impairment and error type were generated; these images are detailed in Table 2. The images were presented to 25 non-expert observers (see footnote 1). These observers rated the image quality on a scale of 1 to 5 (1 corresponds to "very annoying" impairment, while 5 corresponds to "imperceptible" impairment). Several expert observers were also used, and it was found that in general the results were similar to those obtained from the non-expert observers, except that the expert observers' results were consistently slightly more critical, as would be expected. The overall correlation, however, remained the same. This subjective assessment is referred to as the Mean Opinion Score (MOS).

4.2 Comparison with MSE and HVSWE

The MOS for each image obtained from the subjective tests was compared to the error obtained utilising both the HVS spatial frequency response weighted error (HVSWE) and the Mean Square Error (MSE). A correlation between the qualitative MOS and the quantitative HVSWE and MSE measurements allows the determination of the areas where these techniques are unsatisfactory. This will subsequently allow these techniques to be modified in such a way as to improve their correlation with subjective analyses. Figures 2 and 3 show the correlation between the MOS and the MSE and HVSWE measurements, respectively, for the 15 images. Considering the results that were obtained allowed two significant conclusions to be drawn:

• The MSE is completely inadequate as a measure of the error within images and should be totally abandoned when considering an error measurement for use in image quality assessment.

• The HVSWE technique provides a good correlation with the subjective assessment for single images, but problems arise when trying to correlate various images. It was found that if a single image is degraded by varying amounts then the correlation between the MOS and the HVSWE is reasonably high. However if several different images are compared then for equal MOSs the HVSWE varies considerably. It is noticeable that this problem is most pronounced when the images have significantly differing complexity.

A proposed explanation of the second point relies on the fact the HVS is less capable of discerning errors in an image with a high level of detail (this is partly due to the fact that high detail tends to draw attention away from other areas of the image, including errors). Thus for two images, one with very low detail (i.e. a significant low-frequency component) and the other with a high level of detail (i.e. a significant high-frequency component) the image with low detail will be likely to give a worse subjective MOS. This can only be partly counteracted by including the HVS spatial frequency response into the error calculation, as the HVS's response to errors does not follow the spatial frequency response exactly (this is actually a psychological effect rather than a physiological effect).

No.  Source    Description
1    Head      Inverse gradient filtered (60 iterations, scaling 0.05)
2    Head      Estimation (first-order, 8 levels of quad-tree, error threshold = 5)
3    Head      Estimation (first-order, 9 levels of quad-tree, error threshold = 4)
4    Head      Estimation (first-order, 10 levels of quad-tree, error threshold = 4)
5    Head      Local eight-connected averaging
6    Flag      Estimation (first-order, 7 levels of quad-tree, error threshold = 10)
7    Flag      Unmodified (control)
8    Flag      Estimation (first-order, 9 levels of quad-tree, error threshold = 3)
9    Flag      Subsampled (4:1) and zero-order interpolated to original size
10   Flag      DCT coded and quantised to 0.5 bpp
11   Orange    Inverse gradient filtered (15 iterations, scaling factor 0.05)
12   Orange    Estimation (first-order, 9 levels of quad-tree, error threshold = 4)
13   CntrlTwr  Estimation (first-order, 8 levels of quad-tree, error threshold = 5)
14   CntrlTwr  DCT coded and quantised to 1.0 bpp
15   CntrlTwr  Subsampled (4:1) and spline interpolated back to original size

Table 2: Image quality assessment test images: This lists the images (and the form of the degradation) which were used for image quality assessment. Note: the estimation technique referred to is part of a coding scheme developed by the authors, based around various order flood fills that are limited by an edge map of the image [6, 7].

Footnote 1: For the definition of non-expert and other associated terms refer to Rec. 500.

Figure 2: Correlation between MOS and MSE for image quality assessment. The MSE gives a poor assessment of the image quality as perceived by a human observer. Note: For the MOS, 1 = very annoying impairment, 5 = imperceptible impairment. [Plot: MSE (0 to 180) against Mean Opinion Score (0 to 5).]

Figure 3: Correlation between MOS and quality assessment based on HVS spatial frequency response: A HVS based assessment is a significant improvement over the MSE, but is still quite inaccurate. [Plot: HVSWE (0 to 9) against Mean Opinion Score (0 to 5).]

5. Image Activity Weighted Error

Based upon the above observations, a quality assessment scheme was developed by modifying the HVSWE based on the image complexity. Thus an image complexity measure needed to be determined. Additionally it needed to be specified whether this scaling of the HVSWE needed to be performed on a global basis or a local basis. There are arguments for both cases. In areas of low activity errors will definitely be more noticeable; however it has been found that it is the high activity areas within images which tend to attract the eye's focus, drawing attention away from the low activity areas and hence possible errors in these low activity areas. It was thus decided to initially implement the HVSWE with global activity scaling.

The global activity measure was developed based on the average squared deviation from the local mean. This was calculated as follows:

$$GA = E\left[\left(\mu(i, j) - p(i, j)\right)^2\right] \tag{6}$$

Here $E$ is the mathematical expectation operator, $GA$ is the global activity, $p$ is the pixel intensity and $\mu$ is the local mean defined by:

$$\mu(i, j) = \frac{1}{N} \sum_{k, l} p(i + k, j + l) \tag{7}$$

where $k$ and $l$ define the local neighbourhood to be used (typically $k, l \in \{-2, -1, 0, 1, 2\}$) and $N$ is the number of pixels within this neighbourhood.

In order to incorporate the GA measure into the image quality we needed to determine the effect of image activity on the perceived image quality. To achieve this, 8 images were produced which gave the same HVSWE (error = 2.5). These were then assessed subjectively to obtain the MOS, and the GA of each image was calculated. A relationship for the Image Activity Weighted HVSWE (IAWE) was then determined empirically, to maximise the correlation between MOS and IAWE:

$$\mathrm{IAWE} = 3 + \frac{4}{\pi} \arctan\left[a \log_{10}(GA) + b\,\mathrm{HVSWE} + c\right] \tag{8}$$

Using the available data to determine values for the tuning parameters gives $a = 1.28$, $b = -0.63$, and $c = 0.09$. Since the arctangent is bounded by $\pm\pi/2$, this formula gives results between 1 and 5, which are interpreted in the same fashion as for the MOS (i.e. a low result indicates poor quality and vice versa). The results obtained using this measure are summarised in Figure 4. From this we can see that the correlation between the subjectively assessed quality and the calculated quality is significantly better than for either the MSE or the HVSWE.

A full investigation of this technique should consider whether a local activity measure would be more appropriate than the global measure used, as well as using a significantly broader range of test images. This is suggested as an area for further research.
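The global activity of equations (6) and (7) and the IAWE of equation (8) can be sketched as follows. The 5x5 neighbourhood and the tuning parameters are taken from the text. The function names, the replication of edge pixels at the image border, and the handling of a perfectly flat image (for which the logarithm is undefined) are assumptions made only for this illustration.

```python
import numpy as np


def global_activity(image, radius=2):
    """Equations (6)-(7): mean squared deviation from the local mean.

    radius=2 gives the 5x5 neighbourhood (k, l in -2..2) quoted in the text.
    Border pixels are handled by edge replication, which is an assumption.
    """
    image = np.asarray(image, dtype=np.float64)
    size = 2 * radius + 1
    rows, cols = image.shape
    padded = np.pad(image, radius, mode="edge")

    # Sum the shifted copies of the image to obtain the neighbourhood sum.
    local_sum = np.zeros_like(image)
    for dk in range(size):
        for dl in range(size):
            local_sum += padded[dk:dk + rows, dl:dl + cols]
    local_mean = local_sum / (size * size)          # equation (7)

    return float(np.mean((local_mean - image) ** 2))  # equation (6)


def iawe(ga, hvswe, a=1.28, b=-0.63, c=0.09):
    """Equation (8) with the tuning parameters quoted in the text."""
    if ga <= 0.0:
        # A flat image has GA = 0; the paper does not cover this case.
        raise ValueError("global activity must be positive")
    return float(3.0 + (4.0 / np.pi) * np.arctan(a * np.log10(ga) + b * hvswe + c))


# A busy image (checkerboard) has far more activity than a flat one.
checkerboard = (np.indices((16, 16)).sum(axis=0) % 2) * 255.0
flat_image = np.full((16, 16), 7.0)
```

With the quoted parameters, a larger GA at a fixed HVSWE raises the predicted score, and a larger HVSWE at a fixed GA lowers it. This matches the observation that errors are less noticeable in highly detailed images.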

6. Conclusions

A quantitative deterministic quality assessment technique based on the response of the Human Visual System and an image activity measure has been developed. Test results indicate that this technique gives excellent correlation with subjective assessments carried out in accordance with CCIR Recommendation 500. Additionally the quality assessment is simple to calculate. Further research will focus on techniques of applying the activity measure for modifying the HVS weighted error, and whether a global or local activity measure is more appropriate.

7. Acknowledgements

The authors wish to express their gratitude to both OTC Ltd., for the Telecommunications Student Awards, and the Australian Research Council, for the Post-Graduate Research Awards, both of which assisted in funding this research.

Figure 4: Correlation between MOS and quality assessment based on image activity weighting: This scheme gives an excellent correlation between the calculated error and the rating as perceived by an observer. [Plot: IAWE (0 to 5) against Mean Opinion Score (0 to 5).]

References

[1] C.C.I.R. Recommendation 500-3: Method for the Subjective Assessment of the Quality of Television Pictures, 1986.

[2] D. J. Sakrison. On the role of the observer and a distortion measure in image transmission. IEEE Transactions on Communications, COM-25(11):1251-1267, November 1977.

[3] F. X. J. Lukas and Z. L. Budrikis. Picture quality prediction based on a visual model. IEEE Transactions on Communications, COM-30(7):1679-1692, July 1982.

[4] M. Miyahara. Quality assessments for visual service. IEEE Communications Magazine, pages 51-60, October 1988.

[5] R. L. Kashyap. Analysis and synthesis of image patterns by spatial interaction models. In L. N. Kanal and A. Rosenfeld, editors, Progress in Pattern Recognition, Machine Intelligence and Pattern Recognition, pages 149-186. Elsevier Science Publishers B. V. (North Holland), 1981.

[6] D. B. Lowe. Image Representation via Information Decomposition. PhD thesis, School of Electrical Engineering, University of Technology, Sydney, December 1992.

[7] D. B. Lowe and A. Ginige. A hierarchical structure for spatial domain coding of video images. In The Australian Video Communications Workshop: Melbourne, pages 195-203, July 9-11 1990.