Image quality assessment in multimedia applications

G. Ciocca, F. Marini, R. Schettini
DISCo, Dipartimento di Informatica, Sistemistica e Comunicazione
Università degli Studi di Milano-Bicocca, Viale Sarca 336, 20126 Milano, Italy

ABSTRACT

In the framework of multimedia applications, image quality may have different meanings and interpretations. In this paper, considering the quality of an image as its degree of adequacy to its function/goal within a specific application field, we provide an organized overview of image quality assessment methods, highlighting their applicability and limitations in different application domains. Three scenarios have been chosen, representing three typical applications with different degrees of constraints in their image workflow chains and requiring different image quality assessment methodologies.

Keywords: Image Quality Assessment, Full Reference metrics, Reduced Reference metrics, No Reference metrics, multimedia applications.

1. INTRODUCTION

Image quality is often understood as the subjective impression of how well image content is rendered or reproduced [1]; the integrated set of perceptions of the overall degree of excellence of an image [2]; or an impression of its merits or excellence, as perceived by an observer neither associated with the act of photography, nor closely involved with the subject matter depicted [3]. In these definitions image quality actually refers to the quality of the imaging systems used to acquire or render the images. Although suitable targets and studio scenes are often used for testing, we do not know in advance which objects/subjects will actually be acquired and processed. Depending on the application, both scene contents and imaging conditions may range from completely free (the common use of a consumer digital camera) to strictly controlled.

Quality, in general, has been defined as the "totality of characteristics of a product that bear on its ability to satisfy stated or implied needs" [4]; "fitness for (intended) use" [5]; "conformance to requirements" [6]; and "user satisfaction" [7]. These definitions and their numerous variants could fit digital image quality, as suggested by the Technical Advisory Service for Images: "The quality of an image can only be considered in terms of the proposed use. An image that is perfect for one use may well be inappropriate for another." [8]. According to the International Imaging Industry Association [9], image quality is the perceptually weighted combination of all visually significant attributes of an image when considered in its marketplace or application. We must, in fact, consider the application domain and the expected use of the image data. An image, for example, could be used just as a visual reference to an item in a digital archive; although image quality has not been precisely defined, we can reasonably assume that in this case image quality requirements are low. On the contrary, if the image should "replace" the original, image quality requirements will be high. Another quality definition, which considers an image as input to the vision stage of an interaction process, is formulated by Janssen [10] in terms of the degree to which imposed requirements are satisfied to guarantee the success of this process.

Taking into account that images are not necessarily processed by a human observer, we can consider the quality of an image as its degree of adequacy to its function/goal within a specific application field. Given a specific domain and task, several factors may influence the results and therefore the perceived image quality: scene geometry and lighting conditions, the imaging device (HW and SW), image processing and transmission, the rendering device (HW and SW), the observer's adaptation state and viewing conditions, and the observers' previous experiences, preferences and expectations. In this paper we provide an organized overview of image quality assessment methods, highlighting their applicability and limitations in different application domains.


2. ON THE QUALITY OF DIGITAL IMAGES

2.1. Holistic image quality modeling

Some attempts have been made in the last decade to develop a general, broadly applicable image quality model that regards images not only as signals but also as carriers of visual information, encoding information about the geometry of the scene and the properties of the objects located within it [10][11][12]. The FUN image quality model assumes the existence of three major dimensions determining image quality:

• Fidelity is the degree of apparent match of the acquired/reproduced image with the original. Ideally, an image having the maximum degree of Fidelity should give the same impression to the viewer as the original. As an example, a painting catalogue requires high fidelity with respect to the originals. Genuineness and faithfulness are sometimes used as synonyms of Fidelity [9]. Dozens of books and thousands of papers have been written about image fidelity and image reproduction, e.g. [13].

• Usefulness is the degree of apparent suitability of the acquired/reproduced image with respect to a specific task. In many application domains, such as medical or astronomical imaging, image processing procedures can be applied to increase the image usefulness, e.g. [14]. These processing steps have an obvious impact on Fidelity.

• Naturalness is the degree of apparent match of the acquired/reproduced image with the viewer's internal references. This attribute plays a fundamental role when we have to evaluate the quality of an image without having access to the corresponding original. Examples of images requiring a high degree of naturalness are those downloaded from the web or seen in journals. Naturalness also plays a fundamental role when the image to be evaluated does not exist in reality, as in virtual reality domains.

It should be noted that, in general, the quality dimensions are not independent. The overall image quality can be evaluated as a single number weighting the individual components. These weights depend on the specific image data type and on its function/goal within a specific application field.

2.2. Image quality assessment approaches

Image quality can be assessed either for an image seen in isolation or for an image seen together with a reference one [15]. Image quality assessment is usually done by one or more of the following direct approaches:

• Psychological experiments involving human observers (subjective). Standard psychophysical scaling tools for measuring subjective image quality are now available and described in some standards, such as ITU-R BT.500-11 [16][17][18]. The involvement of real people who view the images to assess their quality requires that all the factors influencing perception are taken into account and that strict protocols are adopted. Notwithstanding the effectiveness of subjective approaches, their efficiency is very low compared to objective ones. All subjective quality issues could be discarded if the image usage does not require user involvement, or if the observer could be substituted by a computational model. This has led research towards the study of objective image quality measures not requiring human interaction.

• Computing suitable metrics directly from the digital image (objective). These image quality metrics can be broadly classified into [19]:

o Full-reference (FR) metrics perform a direct comparison between the image under test and a reference, or "original", in a properly defined image space. Having access to an original is a requirement for the usability of such metrics. Among the quality dimensions previously introduced, only image fidelity can be assessed. Different FR metrics are described and compared in [19]; a minimal sketch is given after this list.

o No-reference (NR) metrics assume that image quality can be determined without a direct comparison between the original and the processed images. In theory, it is possible to measure the quality of any visual content. In practice, some information about the application domain, requirements and users' preferences is required to contextualize the quality measures. Examples of NR metrics are those designed to identify the presence of specific processing distortions such as noise, compression artifacts, etc.

o Reduced-reference (RR) metrics lie between FR and NR metrics. They extract a number of features from both the reference and the image under test. These features are used as surrogates for the full image information, and image comparison is based only on the correspondence of these features. Thus, again, only image fidelity can be assessed.
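To make the FR idea concrete, the following minimal Python sketch (ours, not from the paper) computes PSNR, one of the simplest FR metrics, together with a per-pixel error map; the 8-bit range and the assumption that the two images are aligned and equally sized are ours. Perceptually motivated FR metrics such as SSIM [19] follow the same input/output pattern.

```python
# Minimal full-reference (FR) metric sketch: PSNR between a reference image
# and the image under test. Assumptions of this example: both images are
# aligned, equally sized, 8-bit arrays.
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # images are identical
    return 10.0 * np.log10(peak ** 2 / mse)

def error_map(reference: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Per-pixel absolute error; useful to locate where processing degrades the image."""
    return np.abs(reference.astype(np.float64) - test.astype(np.float64))
```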

Depending on the assessment procedure, i.e. whether the image is seen in isolation or not, the result of these metrics can be a number, a set of numbers, or an image error map. Such a map can then be used to precisely locate where the image processing procedures degrade the image. The aforementioned image quality approaches assess quality by taking into account the properties of the images themselves, in the form of their pixel or feature values. Image quality can also be indirectly assessed by:

• Quantifying the performance of an image-based task (see the sketch after this list). This can be done manually by domain experts and/or automatically by a computational system. For example, in a biometric system an image of a face is of good quality if the person can be reliably recognized. This can be done by manually inspecting each acquired image and evaluating whether the pose satisfies the application constraints (e.g. a non-occluded face) or requirements enforced by law (e.g. open eyes). Image distortions that are irrelevant for the task can therefore go unnoticed or simply be ignored by the observer. Alternatively, the quality evaluation could be done by a face recognition algorithm that automatically processes each image and assesses the fulfillment of the constraints and requirements [20].

• Assessing the performance of the imaging/rendering devices. Using suitable sets of images and one or more direct methods (both objective and subjective), it is possible to assess the quality of the imaging and rendering procedures. In this case image quality is related to some measurable features of imaging/rendering devices, such as spatial resolution, color depth, etc. These features can be quantitatively assessed using standard targets and ad-hoc designed software tools (e.g. Imatest [21]), but these measures alone are not sufficient to fully assess image quality. The Camera Phone Image Quality (CPIQ) Initiative of the International Imaging Industry Association (I3A) suggests both objective and subjective characterization procedures [9].
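As a hedged illustration of task-based assessment, the sketch below scores a batch of images by the success rate of an arbitrary recognition task; `recognize` is a stand-in for any task algorithm (e.g. a face recognizer) and is an assumption of the example, not a function from this paper.

```python
# Indirect, task-based quality assessment: the fraction of images on which
# the task succeeds is used as a proxy for the quality of the whole chain.
from typing import Callable, Sequence

def task_based_quality(images: Sequence, labels: Sequence,
                       recognize: Callable) -> float:
    """Fraction of images for which the task succeeds (0..1)."""
    hits = sum(1 for img, lab in zip(images, labels) if recognize(img) == lab)
    return hits / len(images)
```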

Subjective image quality assessment makes it necessary to take into account the Human Visual System (HVS) characteristics, the image rendering procedure, the subjects' characteristics and the perceptual task [2]. The HVS is specialized and tuned to recognize the features that are most important for human evolution and survival; there are other image features that humans cannot distinguish, or that are easily overlooked. The intrinsic limitations of the HVS relevant to image quality assessment are luminance sensitivity, contrast sensitivity, and texture masking [22]. These limitations make quality assessment highly dependent on the image contents. Subjective experiences and preferences may also influence the human assessment of image quality: for example, it has been shown that the perceived distortions depend on how familiar the test person is with the observed image [21]. Image quality assessment is also affected by the user's task, e.g. [23][20]: passive observation can be reasonably assumed when the observer views a vacation image, but not when a radiograph is examined for medical diagnosis. Cognitive understanding and interactive visual processing, like eye movements, influence the perceived quality of images in a top-down way [19]. If the observer is provided with different instructions when evaluating a given image, he will give different scores to the same image depending on those instructions. Prior information regarding the image contents, or fixation, may therefore affect the evaluation of the image quality.
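As one illustration of how the contrast-sensitivity limitation is modeled computationally, the sketch below evaluates the classical Mannos-Sakrison approximation of the contrast sensitivity function (CSF); the constants follow the commonly cited form of that model and are an assumption of ours, not a result of this paper.

```python
# Mannos-Sakrison approximation of the human contrast sensitivity function:
# relative sensitivity as a function of spatial frequency (cycles/degree).
import numpy as np

def csf_mannos_sakrison(f: np.ndarray) -> np.ndarray:
    """Relative contrast sensitivity at spatial frequency f (cycles/degree)."""
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

freqs = np.linspace(0.1, 60.0, 200)
sensitivity = csf_mannos_sakrison(freqs)
print(freqs[np.argmax(sensitivity)])  # sensitivity peaks near ~8 cycles/degree
```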

3. QUALITY ASSESSMENT AND IMAGE WORKFLOW

Fig. 1 shows a generic image workflow chain. It starts with a real scene that must be captured as a digital image. The scene is acquired by a proper device (e.g. digital camera, scanner, ...) that performs all the processing steps aimed at producing a digital representation of the scene. Examples of these processing steps are geometric transformations, gamma correction, color adjustments, etc. Imaging metadata can be automatically embedded in the image header (e.g. EXIF) by the imaging device and may include information such as maker and model of the camera, device settings and pre-processing, date and time of the original, time zone offset, and GPS information (see the reading sketch below). Other metadata are usually added for both cataloguing and retrieval purposes. These metadata can include textual annotations inserted by cataloguers in the context of the application, or automatically computed image representations (numerical or alphanumerical) of some attributes of the digital images. These representations can be used to derive information about the image contents [24]. They are usually related to visual characteristics, but they may also be related to symbolic, semantic, or emotional image interpretations. The metadata schema is usually set at the beginning of the digitization stage and is based on the application needs and the workflow requirements.

Once the image is acquired, a validation procedure can be applied. This procedure is aimed at an initial assessment of the suitability and/or quality of the image with respect to the application needs. For example, a manual inspection can be performed in order to check whether the whole scene has been correctly acquired or satisfies some constraints. In some cases, the validation step can be automatically performed using suitable algorithms borrowed from the pattern recognition field. Images passing the validation step may have extra ancillary information added to them (e.g. the identity of a subject). If required, the image can be further processed in order to increase its usefulness for the task at hand (e.g. contrast enhancement or binarization) or to allow more efficient transmission and storage. Again, extra information can be added. The image thus obtained can finally be rendered, taking into account both the user's device characteristics and the viewing conditions. These characteristics will not be considered if the images are automatically processed by a computational system. Every element in the workflow chain affects the quality of the resulting images. Image quality can be assessed at the different processing stages using one of the approaches discussed in the previous section.
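As a small sketch of reading the EXIF metadata mentioned above, the following fragment uses the Pillow library; the file name and the selected tags are illustrative assumptions.

```python
# Read EXIF imaging metadata embedded by the acquisition device.
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("photo.jpg")      # illustrative file name
exif = img.getexif()               # mapping of numeric tag id -> value
for tag_id, value in exif.items():
    name = TAGS.get(tag_id, tag_id)  # translate tag id to a readable name
    if name in ("Make", "Model", "DateTime"):
        print(f"{name}: {value}")
```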

[Figure: Real World → Imaging → Digital Image → Validation → Validated Image → Processing (increase usefulness, compression) → Processed Image → Transmission Media → Rendering (human or system), with metadata attached and stored at each stage; the stages are grouped into Data Acquisition, Data Processing & Storage, and Fruition.]

Figure 1. A generic image workflow chain.

In the image quality literature little attention is given to the scene contents. The scene is composed of the contents themselves (a face, for example) and of the viewing/acquisition environment: geometry, lighting and surroundings. We may call scene gap the lack of coincidence between the acquired and the desired scene. The scene gap should be quantified either at the end of the acquisition stage or during the validation stage (if any). The scene gap can be considered recoverable if subsequent processing steps can correct or limit the information loss or corruption in the acquired scene; it is unrecoverable if no suitable procedure exists to recover or restore it. The recoverability of the scene gap is affected by the image domain. When narrow image domains are considered (e.g. medical X-ray images), having limited and predictable variability in the relevant aspects of image appearance, it is easier to devise procedures aimed at automatically detecting or reducing the scene gap. When broad image domains are considered, it is very difficult, and in many cases impossible, to automatically detect, quantify and recover the scene gap.

The characteristics of the imaging devices have an obvious impact on the quality of the acquired images. The hardware (sensors and optics) and software components (processing algorithms) of the device may be very articulated and complex. Their role can be to preserve image fidelity as much as possible, to improve image usefulness or naturalness, or suitable combinations of these quality dimensions. We call device gap the lack of coincidence between the acquired image and the image as acquired by an ideal device, properly defined or chosen and used. The characteristics of the devices to be used must be carefully evaluated in order to make the best cost-performance choice according to what is needed for the application at hand and to how the image must be accessed, processed and used [16].

For the rendering devices it is important to evaluate the artifacts that their processing pipelines may introduce. We call rendering gap the lack of coincidence between the actually rendered image and the image as rendered by an ideal (perfect) device, properly defined or chosen and used for the application at hand. The viewing conditions have a significant influence on the appearance of a rendered image because they can amplify or diminish the visibility of artifacts. This is why all the standards for subjective image quality assessment pay particular attention to this issue.

Finally, observers' previous experiences, preferences and expectations clearly vary and are nearly impossible to standardize. We call observer gap the lack of coincidence between the actual observer and the observer the image creator had in mind for a given scope and application. Proper screening and selection of the panel of observers to be used in the image quality assessment is thus required.

Fig. 2 shows the generic image workflow chain with an indication of where the different image quality assessment approaches can be applied. The FR quality assessment metrics can be applied only when two digital images are available.

[Figure: the workflow chain of Fig. 1 annotated with where each assessment approach applies: reduced reference between the real world and the digital image; full reference, reduced reference and no reference across validation, processing and transmission; psychophysical tests, no reference and task performances at fruition, subject to task constraints.]

Figure 2. Relationship between the image workflow chain and the image quality assessment approaches.

4. SOME APPLICATION SCENARIOS

We illustrate three typical applications with different degrees of constraints in their image workflows, requiring different image quality assessment methodologies.

4.1 Biometric system

A general workflow chain for a biometric system for digital passports is shown in Fig. 3. Since the application needs and constraints are clearly defined and focused, very low variability in the workflow chain is admitted. The source images are usually taken in a controlled environment. Lights and subject pose are managed in order to reduce the variability in the acquisition to a minimum and to focus on the face of the subject. The acquisition devices are usually digital cameras; they must have just enough spatial resolution to record all the relevant facial information. Acquired images pass through a validation phase aimed at discarding those that do not meet the requirements of the application [25]:

• be in sharp focus and clear;
• show skin tones naturally;
• have appropriate brightness and contrast;
• be color neutral;
• show eyes open and clearly visible (no hair across the eyes);
• be taken against a plain, light-colored background;
• be taken with uniform lighting, showing no shadows, flash reflections on the face, or red-eye effect.

This validation phase can be carried out by a domain expert or by an automatic procedure. Such a procedure should include face detection, facial feature detection, and ad-hoc photo composition rules that can be assessed by pattern recognition and computer vision algorithms (a minimal sketch is given below). The validated images can then be stored in a database with auxiliary information such as the identity of the captured subject and other related information. The images can be further processed in order to crop the relevant regions (e.g. only the face) or to increase their usefulness. Additional information such as biometric features (eye location, eye-to-mouth distance) can also be extracted and stored during this stage. In this workflow, image quality is affected by the imaging and processing procedures. For example, the enhancement used to reduce non-uniform lighting, or the compression used, may degrade the biometric feature extraction. Image quality assessment can be carried out exploiting NR methodologies in the acquisition and validation phases and FR methodologies in the processing phase, especially if compression is involved. Finally, by evaluating the face recognition performance of the biometric system, the quality of the acquired images (i.e. of the whole workflow chain) can be indirectly assessed.
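A hedged sketch of such an automatic validation step follows: it checks that exactly one face is detected and that the image is in sharp focus (variance of the Laplacian). The cascade file is a standard one shipped with OpenCV, but the sharpness threshold is an illustrative assumption; a production system would use stronger detectors and calibrated thresholds.

```python
# Automatic validation sketch for a passport-style photo:
# exactly one detectable face and sufficient sharpness.
import cv2

def validate_passport_photo(path: str, sharpness_thresh: float = 100.0) -> bool:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        return False  # unreadable file
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) != 1:  # require exactly one clearly visible face
        return False
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()  # variance of Laplacian
    return sharpness >= sharpness_thresh  # illustrative focus threshold
```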

[Figure: controlled-environment acquisition → validation (face recognizable? law compliant?) with identity metadata → processing (image enhancement, image cropping, biometric feature extraction) → biometric IR system; stages grouped into Data Acquisition, Data Processing & Storage, and Fruition.]

Figure 3. Image workflow chain of a generic biometric system exploiting facial information.

4.2 High-quality image archives

Fig. 4 illustrates an image workflow chain aimed at populating a generic image archive for professional users such as museums, photographic agencies and, in general, any entity responsible for managing and distributing high-quality image archives. In this case, the main scope of the workflow chain is to collect images with maximum fidelity in order to preserve as much as possible the characteristics of the originals. In the case of an art gallery, the environment cannot be excessively tampered with to properly light the artifacts, nor can they be moved to a better place to facilitate the acquisition procedures. Thus the acquisition environment can be considered only partially controlled (for example, paintings must be illuminated with lights that do not harm the colors, and the camera cannot be freely placed in front of the artifacts). The acquisition is often performed by high-end acquisition devices; special devices can also be used to acquire large surfaces at high resolution. Since the fidelity of the acquired image is of paramount importance, color charts are used to calibrate and characterize the acquisition devices [13]. They may also be acquired along with the artifacts, constituting reliable references for subsequent processing steps (see the sketch below). In the validation phase, it should be verified that the whole artifact has been completely and correctly acquired, as in the case of multi-view acquisitions of 3D objects or the surface tessellation of very large paintings. For verified images, the color charts may be removed and only the objects of interest retained. Artifact images are annotated with auxiliary information such as the title of the work, the author, the creation date, etc. Since the collected images may be distributed and used in different ways, the processing phase may include resizing, thumbnail creation, digital image format changes and compression. The collected images can be accessed by browsing or indexed and exploited within the framework of IR systems. Image quality is a function of the acquisition device and the processing procedures. During fruition, the perceived image quality is greatly affected by the rendering device and the viewing conditions. For a faithful reproduction of digital images, the rendering devices must be carefully calibrated and characterized [13]. Image quality assessment can be carried out using NR or RR methodologies, exploiting, for example, the color charts in the acquisition and validation phases, while FR methodologies can be used in the processing phase. Subjective methodologies can be used to evaluate the perceived quality in the fruition environments.
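To illustrate how an acquired color chart can serve as a reduced reference, the sketch below computes the CIE76 color difference (ΔE) between the reference Lab values of the chart patches and those measured in the digital image; the patch values and any acceptance threshold are assumptions of the example.

```python
# Reduced-reference fidelity check via color-chart patches:
# CIE76 Delta E between reference and measured CIELAB values.
import numpy as np

def delta_e76(lab_ref: np.ndarray, lab_meas: np.ndarray) -> np.ndarray:
    """Euclidean distance in CIELAB (Delta E 1976), one value per patch."""
    return np.sqrt(((lab_ref - lab_meas) ** 2).sum(axis=-1))

reference = np.array([[50.0, 0.0, 0.0], [60.0, 20.0, -10.0]])  # chart patches (Lab)
measured = np.array([[51.2, 0.8, -0.5], [58.9, 21.5, -9.0]])   # values from the image
print(delta_e76(reference, measured))  # large mean/max Delta E flags calibration drift
```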

[Figure: fully/partially controlled acquisition with device profiles → validation (fidelity, completeness, semantic rules) → processing (compression, format change, enhancements, rendering rules) with annotations → transmission media → fruition.]

Figure 4. Image workflow chain of a generic multimedia digital archive.

4.3 Consumer collections of personal photos

Personal photo collections cannot be accurately characterized and belong, by definition, to the broad image domain. No constraints can be imposed on the image contents or on the image acquisition procedures. Images can be acquired using a variety of devices (mobile phones, digital cameras, webcams, etc.) with very different technical and functional characteristics. Personal photos are usually taken to record special events, and users usually pay little attention to the acquisition conditions. For example, images taken in very low light are usually very noisy, but they can be considered acceptable if they depict the object of interest. Given the broad image domain, a validation procedure cannot be clearly defined. The EXIF metadata are often incomplete and heterogeneous, since no agreement exists among camera manufacturers. Users want the images to look good (or even funny or conspicuous), disregarding fidelity to the original. In this scenario the perceptual/subjective impression is more important than the objective quality. Consequently, the processing is mainly aimed at enhancing the appeal and naturalness of the images, even if these processing steps may introduce artifacts or distortions such as halos. The processing procedures may also include the application of digital effects in order to make the image more attractive to potential viewers. The problem of storage (compression and image format) is far less important than the enhancement. In this scenario image quality can be partially assessed using NR metrics (a simple example follows) and indirectly estimated from the quality assessment of the imaging devices on proper datasets.
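As one example of a simple NR attribute suitable for consumer photos, the sketch below implements the Hasler-Süsstrunk colorfulness metric, which scores perceived colorfulness without a reference image; the constants follow the commonly cited formulation, and the metric is our illustrative choice rather than one prescribed by the paper.

```python
# No-reference colorfulness metric (Hasler-Suesstrunk) on an RGB array.
import numpy as np

def colorfulness(rgb: np.ndarray) -> float:
    """Higher values indicate a more colorful image; rgb has shape (H, W, 3)."""
    r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
    rg = r - g                      # opponent channel 1
    yb = 0.5 * (r + g) - b          # opponent channel 2
    sigma = np.hypot(rg.std(), yb.std())   # combined spread
    mu = np.hypot(rg.mean(), yb.mean())    # combined mean offset
    return sigma + 0.3 * mu
```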

[Figure: unconstrained acquisition (environment and validation undefined, marked "???") → processing (image enhancement, appeal increase, digital effects) → transmission media → fruition.]

Figure 5. Image workflow chain of a personal photo collection.

ACKNOWLEDGEMENTS The authors would like to thank Carlo Batini who encouraged and inspired this work and Gabriella Pasi and Federico Cabitza for many helpful discussions concerning data quality definitions.

REFERENCES

[1] Yendrikhovskij S., de Ridder H., "Image Quality is FUN: Reflections on Fidelity, Usefulness and Naturalness", SID Symposium Digest of Technical Papers, 33(1), 986-989 (2002).
[2] Engeldrum P.G., [Psychometric Scaling: A Toolkit for Imaging Systems Development], Imcotek Press, (2000).
[3] Keelan B.W., [Handbook of Image Quality], Marcel Dekker, Inc., New York, (2002).
[4] --, "Quality management and quality assurance -- Vocabulary", ISO 8402:1994, (1994).
[5] Juran J.M., [Juran on Planning for Quality], The Free Press, New York, (1988).
[6] Crosby P.B., [Quality is Free], McGraw-Hill, (1979).
[7] Wayne S.R., "Quality control circle and company wide quality control", Quality Progress, 14-17 (1983).
[8] --, Technical Advisory Service for Images, http://www.tasi.ac.uk/advice/creating/quality.html
[9] I3A, "Fundamentals and review of considered test methods", CPIQ Initiative Phase 1 White Paper, (2007).
[10] Janssen R., [Computational Image Quality], SPIE Press, (2001).
[11] Torgerson W.S., [Theory and Methods of Scaling], Wiley, New York, (1958).
[12] Yendrikhovskij S., "Image Quality: Between Science and Fiction", Proc. IS&T PICS, 173-178 (1999).
[13] Sharma G., [Digital Color Imaging Handbook], CRC Press, (2002).
[14] Gonzalez R.C., Woods R.E., [Digital Image Processing], Prentice Hall, (2008).
[15] MacDonald L., Jacobson R.E., "Assessing image quality", in [Digital Heritage: Applying Digital Imaging to Cultural Heritage], Elsevier Butterworth-Heinemann, 351-373 (2006).
[16] --, "Methodology for the Subjective Assessment of the Quality of Television Pictures", ITU-R Rec. BT.500-11, (2002).
[17] Engeldrum P.G., "Psychometric Scaling: Avoiding the Pitfalls and Hazards", IS&T's 2001 PICS Conference Proceedings, 101-107 (2001).
[18] Thurstone L.L., "A law of comparative judgment", Psychological Review, 34, 273-286 (1927).
[19] Wang Z., Bovik A.C., Sheikh H.R., Simoncelli E.P., "Image quality assessment: From error visibility to structural similarity", IEEE Transactions on Image Processing, 13(4), 600-612 (2004).
[20] Lundström C., "Technical report: Measuring digital image quality", Linköping University, Department of Science and Technology, (2006).
[21] --, Imatest - Digital Image Quality Testing, www.imatest.com/
[22] Watson A.B., Borthwick R., Taylor M., "Image quality and entropy masking", Proc. SPIE Human Vision and Electronic Imaging Conference, 3016, 2-12 (1997).
[23] Eckert M.P., Bradley A.P., "Perceptual quality metrics applied to still image compression", Signal Processing, 70, 177-200 (1998).
[24] Ciocca G., Gagliardi I., Schettini R., "Quicklook2: An Integrated Multimedia System", International Journal of Visual Languages and Computing, Special Issue on Querying Multiple Data Sources, 12, 81-103 (2001).
[25] --, "Biometrics Deployment of Machine Readable Travel Documents", ISO/IEC 19794-5, (2005).
