This is a pre-print working version of the paper published in: Journal of Visual Communication and Image Representation, Vol. 16 (6), pp. 621-642, 2005.
A Model for the Assessment of Watermark Quality with regard to Fidelity
Michalis Xenos*, Hellenic Open University, School of Sciences and Technology, 23 Saxtouri Str, Patras, GR 26223, Greece & Computer Technology Institute, 61 Riga Feraiou Str, Patras GR 26221, Greece
Tel.: +30 2610 367405, Fax: +30 2610 367427
Email: [email protected]
Katerina Hantzara, Evanthia Mitsou, Ioannis Kostopoulos, Patras University, Computer Engineering and Informatics Dept., Rion, GR 26500, Greece. Email: {hantzara,emitsou}@ceid.upatras.gr, [email protected]
Proposed running head: Watermark Quality Assessment
* Corresponding author.
Abstract
This paper presents a model for assessing watermark quality with regard to fidelity. The model can be applied for assessing the quality of watermarks as well as for investigating the effect of attacks on watermarked images. The proposed model is based on image properties as perceived by the human eye and attempts to provide algorithmic measurements that correspond to human-perceived watermark fidelity. Emphasis has been placed on the analysis of the model and the determination of the weights used to derive the final result. Experiments are presented to illustrate the applicability of the model on still images, using examples of watermarked images as well as examples of attacks on watermarked images.
Keywords: Information Hiding, Watermarks, Watermark Quality
Biographies
Dr Michalis Xenos received the Diploma degree in Computer Engineering & Informatics in 1991 and the PhD degree in 1996 from the University of Patras, Greece. Since 1991 he has held teaching-assistant, teaching and research positions at the Departments of Computer Engineering & Informatics and Mathematics of the University of Patras, Greece. He is a Faculty Member of the Computer Science Department of the School of Sciences and Technology of the Hellenic Open University and a Research Associate at the Research Academic Computer Technology Institute of Patras. He has published 6 books and over 50 papers in international books, journals and conferences, and he serves as a reviewer for numerous journals and conferences. He has been a member of the Technical Chamber of Greece since 1991 and a member of the IEEE since 1998. His research area includes, inter alia, Digital Watermarking and Image Quality Assessment.
Katerina Hantzara obtained her Diploma from the Computer Engineering and Informatics Department of the University of Patras in 2001. She is currently an M.S. candidate at the same university, attending the postgraduate program “Computer Science and Technology”, while working as a research and development computer engineer for the R.A. Computer Technology Institute of Patras. Her research interests include Information Hiding and Web-based applications.
Evanthia Mitsou graduated in 2001 from the Computer Engineering and Informatics Department of the University of Patras, Greece. She is currently attending the postgraduate program “Computer Technology and Science” in the same department, while also working as a researcher. Her research interests include image processing, digital watermarking and steganography.
Ioannis Kostopoulos was born in Athens, Greece, in 1972. He received his bachelor's degree in Mathematics from the University of Patras in 1995. He is currently pursuing a Ph.D. in the Computer Engineering and Informatics Department of the University of Patras (expected graduation: Spring 2003). He is a member of the Database Laboratory of the Computer Engineering and Informatics Department and a member of the Multimedia Coding and Watermarking Group. He is also an IEEE Student Member (Circuits and Systems Society and Signal Processing Society). His main research interests include digital watermarking of still images and video, data hiding and cryptography. He speaks English fluently and has 13 publications in international journals and conferences.
1. INTRODUCTION

Digital still images are an inseparable part of the user's everyday life. The constantly increasing number of multimedia applications and the expanding use of digital data have certainly aided towards this. As the number of commercial vendors offering still images through the network increases, more and more problems arise relating to copyright protection, copy control, authentication of such images, etc. Digital watermarking offered a solution to such problems, but also brought up a number of other issues that need to be addressed, such as watermark robustness, watermark fidelity and, in short, watermark quality.

A watermark is extra information embedded in the data of an image which, depending on the application, may or may not be meaningful. Its nature can vary depending on whether it represents bit sequences, single bits, binary logos or even part of the image content. As far as still images are concerned, there are three basic methods of embedding a digital watermark: methods applied on the spatial domain [1,2], methods applied on the frequency domain
[3], and hybrid methods [4]. A general overview of such methods can be found in [5]. The deployment of different kinds of watermarking algorithms is necessary, given the large number of applications and media objects in today's digital world, each requiring specific treatment. The variety of existing watermarking methods raises the question of appropriateness, which in turn leads to the issue of watermark quality. Watermark quality does not deal with the problem of the 'best' method; it deals with determining the appropriate method for a particular case.

Watermark quality is mainly affected by three factors that are closely inter-related: watermark capacity, watermark robustness and watermark fidelity [5]. Capacity has to do with the amount of data embedded into the image. Robustness has to do with the ability of a watermark to survive image modifications. Such modifications may be malicious (aiming at destroying the watermark) or innocuous (made for specific causes, e.g. image compression, but also affecting the watermark). All image modifications, regardless of the modifier's aim, will be referred to as attacks hereinafter. Finally, fidelity has to do with watermark imperceptibility and refers to the perceptual similarity between the original image and the watermarked one. These three factors are closely inter-related in any watermarking system, in the sense that any increase in the amount of data embedded in an image leads to a decrease in both the quality of the watermarked image and the robustness of the method [6].

The first two factors affecting watermark quality, namely the capacity and robustness of a method, can be easily addressed and assessed. For example, capacity, as already mentioned, deals with the ability of watermarking models to embed large volumes of data into a cover image. In such cases, specific constraints that can be defined mathematically are set in order to measure the capacity of an image to carry extra information. An information-theoretic analysis of capacity with Mean Square Error fidelity constraints is described in [7]. In a similar manner, watermark robustness can be measured with automated methods and tools, such as Stirmark, CheckMark or Optimark, which are widely used today by the research community and watermarking companies as benchmarking platforms.

This paper deals with the third element affecting watermark quality: fidelity. A model is presented whose goal is to describe in computable terms the way in which fidelity is perceived by the end-user of a still image and how this human perception can be computed algorithmically. It should be mentioned that the presented model can also be applied in the case of attacks on watermarked images, since an attack is a modification of the original image, just as the watermark is. An attack that radically modifies the original image will probably destroy any watermark, but it will also significantly alter the image content. Thus, a malicious attack aiming at destroying a watermark also aims at being imperceptible, so as not to destroy the image content. The proposed model can therefore be used for assessing the effects of attacks on watermarked images as well. This has a lot to do with the issue of watermark quality, since one expects a watermark to be robust against attacks, but at the same time one should also be able to measure the extent to which an image was altered by the attack; i.e. no one expects any watermark to survive an attack that completely destroys or alters the image content.

In the following section, a literature review of the methods used to evaluate watermark fidelity is presented and the need for a model such as the one presented here is documented. Section 3 analyzes the proposed model, presenting its structure and the image properties on which it is based, while Section 4 presents examples of the application of the model. Finally, Section 5 discusses the conclusions of the paper.
2. THE IMPORTANCE OF THE MODEL

A number of studies on watermarks have used a rather limited set of typical images so as to prove the imperceptibility of each method and to conclude regarding watermark fidelity. In the articles reporting the results of these cases, typical images such as 'Lena' [8] appear printed (original versus watermarked image) so as to 'prove' the watermark's fidelity. Naturally, such printed examples cannot be used as proof of watermark fidelity, since the visual degradation caused by the printing procedure has already removed any perceivable evidence of the watermark in the watermarked image.

Studies assessing watermark fidelity have been based either on the use of human observers or on algorithmic methods. For example, works on digital watermarking methods applied on color or gray-scale images present both the original and the watermarked image to a number of viewers for examination. Such methods, called subjective testing, can be used to obtain an accurate estimate of image fidelity. However, these methods suffer from the problem of subjectivity. In addition, the practicality of such tests is limited, since they are time-consuming, expensive, difficult to repeat, and require specialized viewing conditions [9]; therefore they are ineffective in cases where automation is required. Algorithmic methods, on the other hand, are fully automated and lead to immediate results [5]. Simple objective metrics such as Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), or Weighted PSNR are commonly used thanks to their simplicity and the absence of any standardized or widely accepted alternative. However, it is well known that such simple metrics do not correlate well with human perception of fidelity or viewer opinion [10]. Work on the comparison and refinement of perceptual image quality metrics based on optimally synthesized distorted test images has been presented by Wang and Simoncelli [11]. The problem with such methods is that they are not based on image properties as these are perceived by the human eye.

Although this paper argues against using printed images to make a case regarding fidelity, an exception is made here, since this method has not been presented yet and examples have to be presented to the reader in some way. Three versions of 'Lena' are presented in figures 1 and 2. Image A is the original one, while images B and C have been modified using a commercial image-processing tool (Adobe Photoshop 6.0).
FIG. 1. Original (image A) and imperceptibly modified (image B) image
Surprisingly enough, the PSNR measurements for the two modified images illustrated are PSNR(A,B) = 26.55 dB and PSNR(A,C) = 40.55 dB. These measurements imply that image C (the one in which Lena has a moustache) is 'better' than image B! Furthermore, the measured value of 40 dB would have been considered a strong indication of an 'imperceptible'
modification in many cases, since measurements higher than 35 dB have been reported by many methods as proof of successful practices. On the other hand, measurements close to 25 dB are typically considered indications of easily perceived modifications. Actually, there is no trick in this example: image B is the same as image A, altered by shifting all pixels of the original image one pixel to the left (the line left blank on the left side of the modified image was filled using the corresponding line of image A).
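For readers who wish to reproduce the kind of measurement discussed above, the following Python sketch (assuming NumPy and Pillow are installed, and using 'lena.png' as a placeholder file name) computes PSNR and applies a one-pixel horizontal shift; it illustrates how a visually imperceptible change can yield a low PSNR, although it is not the exact procedure used to create the figures.

```python
# Sketch: PSNR between an image and a one-pixel horizontal shift of itself.
# The file name and the refill rule are illustrative assumptions.
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two equally sized images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

original = np.asarray(Image.open("lena.png").convert("RGB"))

# Shift all pixels one position to the left and refill the freed column
# with the corresponding column of the original image.
shifted = np.roll(original, -1, axis=1)
shifted[:, -1, :] = original[:, -1, :]

print(f"PSNR(original, shifted) = {psnr(original, shifted):.2f} dB")
```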
FIG. 2. Original Lena (image A) and Lena with moustache (image C)
Having stressed the importance of algorithmic methods, one may wonder why subjective opinions are still part of image quality assessment methods aiming at obtaining better results. The main reason behind using human evaluators as a means of assessing image quality is the underlying inadequacy of the relevant methods to simulate human perceptibility accurately enough. It serves no purpose to model image quality without taking into account the operation of the Human Visual System (HVS), its features and properties. As stressed by Osberger [12], many attempts towards HVS-based objective quality metrics, e.g. [13, 14, 15], are promising, but much work still remains to be done before these metrics can be widely adopted by the image processing community. Previous methods have only modeled low-level properties of the HVS and have been based on data obtained from experiments using simple, artificial stimuli. The underlying concept of the proposed model, however, is the achievement of a hierarchical organization and the incorporation of both low-level and higher-level perceptual factors.

In conclusion, this section's discussion has pointed out a need: the need for an automated, and therefore algorithmic, model that is nevertheless based on properties of the human eye to assess watermark fidelity. Recent similar work on JPEG and JPEG2000 compressed images based on the degradation of structural information, presented by Wang et al. [16], has produced excellent results in comparison to subjective ratings. This is what the proposed model aims to achieve. It should be stressed that the model does not address all three factors of watermark quality; it aims to address watermark quality with regard to fidelity and to serve as a contribution towards the standardization of a method for assessing watermark fidelity. Moreover, the model is based on watermark methods and properties and therefore any 'expansion' of its use may prove to be invalid. The tool used for the automation of the model can be made available on request, provided that proper reference to this journal is made.
3. THE MODEL

The proposed work focuses mainly on quantifying human vision properties through methods that model as many of the visual processes as feasible. The complex structure of natural images is one of the facts taken into consideration when determining appropriate parameters for the different stages of the presented human-vision-based model used for image quality evaluation. This is necessary, since the sensitivity of the human eye can vary considerably depending on the nature of the visual stimulus being used. First, the HVS and how its operation affects watermarked image fidelity is briefly discussed, followed by a presentation of the model, a discussion of its structure and a presentation of the model's higher-level factors. Finally, the presentation of the proposed model is completed with a discussion of the low-level criteria and metrics used for its implementation.
3.1. The Human Visual System

Extensive psychological and physiological experimentation on the operation of the primate visual system has resulted in a good understanding of the HVS. Although much work still remains to be done, the development of vision models which successfully mimic a number of human vision tasks, e.g. [15,17], underlines the relative maturity of this field. Osberger [18] classifies vision processes into two levels: early vision processes and higher vision processes. Two of the most important features of the early vision processes that need to be considered by any HVS quality model are a) sensitivity to contrast changes rather than luminance changes (known as the Mach phenomenon) and b) masking, which refers to the HVS's reduced ability to detect a stimulus on a spatially or temporally complex background. Given these features of the early vision processes, changes in watermarked images are less visible along strong edges, on textured areas, or immediately after a scene change. The amount of masking caused by a background depends not only on the background's contrast, but also on the level of uncertainty created by the background and stimulus [19]. Areas of high uncertainty (e.g. complex areas of a scene) induce higher masking than areas of the same contrast with lower uncertainty; i.e. the human eye is more sensitive to noise in flat colored regions or around edges than in textured regions, and less sensitive to noise in very dark or very light regions.

As regards higher-level processes, a strong relation exists between eye movements and attention. Studies of viewer eye movement patterns – for general viewing conditions and when viewing an image in the same context – have shown these patterns to be very similar for different observers [20,21]. Fixations are generally not distributed evenly over a scene; instead, there tend to be a few regions in a scene which receive a disproportionately large number of fixations (even if unlimited time is given), while other areas are not viewed at all [22]. This suggests that our perception of overall picture quality is heavily influenced by the quality in these areas of interest, while peripheral regions can undergo significant degradation without strongly affecting overall quality [23,24]. Therefore, in order to determine the importance of the different regions in an image automatically, one needs to determine the factors that influence human visual attention. Research indicates that human attention is controlled by both high- and low-level factors [25]. Low-level factors that have been found to influence visual attention include contrast, size, shape, color and motion. Several high-level factors have also been determined, such as location, foreground and background, people and context.
3.2. The Structure of the Proposed Model

The concept of the proposed model is based on a traditional quality model used in software quality, namely FCM [26]. The structure of the proposed model is organized in a hierarchical way, with abstract factors in the higher levels, which are decomposed into more specific characteristics in the middle levels and lead to measurable quantities in the lower levels.
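To make this weighted, hierarchical aggregation concrete, the following Python sketch shows how factor-criterion-metric scores could be combined; the tree fragment, the weights and the leaf scores are hypothetical placeholders and not the weights derived later in the paper.

```python
# Minimal sketch of FCM-style aggregation: leaf metrics are combined with
# weights into criteria, and criteria into factors. All weights and scores
# below are hypothetical placeholders.

def aggregate(node):
    """Recursively combine weighted child scores into a single [0, 1] score."""
    if "score" in node:                       # leaf: a directly measured metric
        return node["score"]
    total_weight = sum(weight for weight, _ in node["children"])
    return sum(weight * aggregate(child)
               for weight, child in node["children"]) / total_weight

# Hypothetical fragment of the hierarchy (factors -> criteria -> metrics).
fidelity_model = {
    "children": [
        (0.25, {"children": [(0.5, {"score": 1.00}),     # dimensions unchanged
                             (0.5, {"score": 1.00})]}),  # image type unchanged
        (0.40, {"children": [(0.6, {"score": 0.92}),     # main color similarity
                             (0.4, {"score": 0.88})]}),  # other basic colors
        (0.35, {"children": [(0.5, {"score": 0.95}),     # lightness similarity
                             (0.5, {"score": 0.90})]}),  # hue similarity
    ]
}

print(f"overall fidelity score: {aggregate(fidelity_model):.3f}")
```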
FIG. 3. The structure of the proposed model (the factors Properties, Color, Features and Regions are decomposed into criteria such as dimensions, type, depth, basic colors, brightness, saturation, hue, and region size, shape, location, features and color)
As shown in figure 3, the highest level of the model's hierarchy consists of the basic image characteristics that are directly perceived by the human visual system. These are called quality factors and are grouped into four major categories: properties, color, features and regions. Since quality factors cannot be measured directly, they are broken down into lower-level features called quality criteria. These criteria correspond to the "inner" characteristics of an image and directly affect its "outer" characteristics, namely the ones perceived during image observation and comparison. In most cases, an extra level of refinement is necessary, where each quality criterion is mapped to even more fine-grained criteria, which in turn can receive specific values (metrics). The four main factors of the model are briefly presented hereinafter.

The properties factor consists of two criteria: image dimensions and image type. Image dimensions can be further broken down into width and height and are among the first elements captured by the human eye when observing two or more images in order to estimate image distortion. Changes in dimensions can occur either as part of a malicious attack or when the real owner of the watermarked image wants to resize it in order to use it in electronic applications, for example for publication on the Internet. Image type refers to image content. Based on this content, images are classified under the following categories (values): Portrait, Graphics, Object, Text, and Other (when the image does not fall into any of the above-mentioned categories). Each of these categories has its own properties, and one can exploit this image-dependent information in evaluating the visual degradation of a watermarked image in order to obtain more efficient results. Unfortunately, designing an algorithmic method for determining the image type is a very difficult task. An alternative approach is to ask the human viewers of the image, through a user-friendly interface, to classify the original image under one of the above categories. In this case, despite the fact that human interference is involved in the assessment procedure, subjectivity is minimized, since characterizing an image's type based on a detailed listing of all possible image types is in most cases an easy, clear and specific task for any human evaluator.

Color, on the other hand, is one of the most important factors affecting image quality. Any visual degradation of an image that is not related to changes in the image's dimensions is perceived as color modification. The color specification system for image coding adopted for the application of the presented model was linear RGB, because it is widely known and used by almost all computer applications. According to this model, each pixel contains three numerical values (Red-Green-Blue) that define a color. The more bits used for the representation of each color component, the more colors can compose the image. The overall number of bits used is referred to as color depth, and it is easy to understand that decreasing this number automatically leads to degradation of the image quality. Any change in the basic colors, namely those colors that appear most often in the pixels' values, can negatively affect the overall image quality. Of course, not all basic colors affect the final result in the same way. A change in the most dominant color will affect image quality more than the same proportional change in the second most dominant color.
Since the purpose of examining an image's basic colors is to capture only large distortions, it is adequate for the implementation of the proposed model to take into account the first three basic colors of the image. For even more satisfactory results, it is suggested to expand the number of basic colors considered to between 10 and 20. In our implementation, the number that offered optimum results – and was finally used – is 15. In this way, misleading cases can be avoided; for instance, cases where color X is more perceivable than color Y even though it corresponds to fewer pixels, simply because color X is contained in neighboring pixels, giving the impression of a flat color, whereas color Y is contained in scattered pixels.
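As an illustration of how basic colors could be extracted algorithmically, the following Python sketch (assuming NumPy and Pillow, with a placeholder file name) simply counts exact RGB values and keeps the 15 most frequent ones; the model's actual implementation may rank and weight them differently.

```python
# Sketch: extract an image's "basic" (most frequent) colors by counting
# exact RGB triples. N = 15 follows the value reported in the text.
from collections import Counter
import numpy as np
from PIL import Image

def basic_colors(path: str, n: int = 15):
    """Return the n most frequent (R, G, B) triples with their pixel counts."""
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    return Counter(map(tuple, pixels)).most_common(n)

# Placeholder file name; prints the three most dominant colors.
for color, count in basic_colors("watermarked.png")[:3]:
    print(f"color {color}: {count} pixels")
```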
Features refer to perceivable image attributes such as brightness, saturation and hue. Changes in these attributes are reflected in image quality. Brightness is defined as the attribute relating to the amount of light that a given area appears to emit. Because brightness perception is very complex and subjective, luminance is used in this model instead. The luminance of each pixel is measured using equation (1). Since human vision reacts in a non-linear way to perceived luminance, an alternative parameter that measures the perceptual response is also used in the proposed model. This parameter is called lightness and is measured using equation (2), where Yn is the luminance of the white color. Lightness, as an image attribute, expresses the perceived luminance of a color relative to white and, like luminance, it takes values from 0% up to 100%.
Y = 0.299 · R + 0.587 · G + 0.114 · B    (1)

L* = 116 · (Y / Yn)^(1/3) − 16,    for Y / Yn > 0.008856    (2)
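A small Python sketch of the per-pixel luminance and lightness computations defined by equations (1) and (2) follows; it assumes 8-bit RGB values with the white-point luminance Yn taken as 255 and, for values below the 0.008856 threshold (which equation (2) does not cover), it falls back to the standard CIE linear branch as an assumption.

```python
# Sketch of equations (1) and (2) for 8-bit RGB pixels (Yn assumed to be 255).
import numpy as np

def luminance(rgb: np.ndarray) -> np.ndarray:
    """Equation (1): Y = 0.299 R + 0.587 G + 0.114 B."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def lightness(y: np.ndarray, y_n: float = 255.0) -> np.ndarray:
    """Equation (2): L* = 116 (Y/Yn)^(1/3) - 16 for Y/Yn > 0.008856.
    Below the threshold the standard CIE linear branch (903.3 * Y/Yn) is
    used here as an assumption, since the text only gives the main branch."""
    ratio = y / y_n
    return np.where(ratio > 0.008856,
                    116.0 * np.cbrt(ratio) - 16.0,
                    903.3 * ratio)

pixel = np.array([[120.0, 200.0, 80.0]])
y = luminance(pixel)
print(f"Y = {y[0]:.1f}, L* = {lightness(y)[0]:.1f}")
```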
Hue is expressed as an angle on the color wheel, taking values from 0 to 360 degrees. It starts and ends at the red color (Red = 0° or 360°), passing through the yellow color (Yellow = 60°), the green color (Green = 120°), the blue color (Blue = 240°) and all the colors in between. The hue of each pixel is measured as shown in Process 1.

Process 1: Hue of each pixel
Let Min[R,G,B] be the chromatic component with the minimum value and Max[R,G,B] the component with the maximum value. Then:
If the color is white, black or a gray-scale value, Hue is not defined.
In any other case:
If R = Max[R,G,B] then: Hue = 60.0 · (G − B) / (Max[R,G,B] − Min[R,G,B])    (3)
If G = Max[R,G,B] then: Hue = 120 + 60.0 · (B − R) / (Max[R,G,B] − Min[R,G,B])    (4)
If B = Max[R,G,B] then: Hue = 240 + 60.0 · (R − G) / (Max[R,G,B] − Min[R,G,B])    (5)
If after the Hue calculation the result is negative, 360° is added to it, so that the final Hue value always lies within the range [0°, 360°).
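The following Python sketch implements Process 1 for a single pixel, including the wrap-around of negative results; the treatment of achromatic pixels (returning None) is an illustrative choice.

```python
# Sketch of Process 1: hue angle in degrees for one RGB pixel, with the
# negative results of equation (3) wrapped back into [0, 360).
def hue(r: float, g: float, b: float):
    """Return the hue in degrees, or None for white/black/gray pixels."""
    mx, mn = max(r, g, b), min(r, g, b)
    if mx == mn:                                  # achromatic: hue not defined
        return None
    if r == mx:
        h = 60.0 * (g - b) / (mx - mn)            # equation (3)
    elif g == mx:
        h = 120.0 + 60.0 * (b - r) / (mx - mn)    # equation (4)
    else:
        h = 240.0 + 60.0 * (r - g) / (mx - mn)    # equation (5)
    return h + 360.0 if h < 0 else h              # wrap negative results

print(hue(255, 0, 0))    # pure red        -> 0.0
print(hue(0, 0, 255))    # pure blue       -> 240.0
print(hue(255, 0, 128))  # reddish magenta -> ~330
```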