Content Aware Image Enhancement
Gianluigi Ciocca, Claudio Cusano, Francesca Gasparini, and Raimondo Schettini
DISCo (Dipartimento di Informatica, Sistemistica e Comunicazione), Università degli Studi di Milano-Bicocca, Via Bicocca degli Arcimboldi 8, 20126 Milano, Italy
[ciocca, cusano, gasparini, schettini]@disco.unimib.it
Abstract. We present our approach, integrating imaging and vision, for content-aware enhancement and processing of digital photographs. The overall quality of images is improved by a modular procedure automatically driven by the image class and content.
1 Introduction
The quality of real-world photographs can often be considerably improved by digital image processing. Since interactive processing may prove difficult and tedious, especially for amateur users, an automatic image enhancement tool would be most desirable. There are a number of techniques for image enhancement and processing, including global and local corrections for color balancing [1–3], contrast enhancement [4, 5] and edge sharpening [6, 7]. Other techniques merge color and contrast corrections, such as all the Retinex-like algorithms [8–11]. Traditional enhancement algorithms available in the literature are, however, rarely driven by the content of the images [12]. Our interest lies in the design of content-aware image enhancement for amateur digital photographs. The underlying idea is that global and/or local image classification makes it possible to select the most appropriate image enhancement strategy according to the content of the photograph. To this end, we have pragmatically designed a modular enhancement procedure integrating imaging and vision techniques. Each module can be considered an autonomous element related to color, contrast, sharpness or defect removal. These modules can be combined in a completely unsupervised manner to improve the overall quality, according to image and defect categories. The proposed method is modular so that each step can be replaced with a more efficient one in future work, without changing the main structure. The method can also be improved by simply inserting new modules. The initial global image classification makes it possible to further refine the localization of the color regions requiring different types of color and sharpness corrections. The subsequent color, contrast, and edge enhancement modules may exploit the image annotation, together with further image analysis statistics (in some cases locally adaptive as well). As a final step we also propose a self-adaptive image cropping module exploiting both visual and semantic information. This module can be useful for users looking at photographs
on small displays that require a quick preview of the relevant image area. Figure 1 shows the overall schema of our content aware image enhancement and processing strategy. Processing blocks responsible for the image content annotation are at the left of the dashed line while those driven by it are at the right of the line. The single modules are described in the following sections.
Fig. 1. The processing blocks on the left of the dashed line identify the content of the input image and produce a semantically annotated image. The blocks on the right perform several processing steps exploiting the given annotation.
2 Semantic Image Classification
Automatic image classification allows the application of the most appropriate enhancement strategy according to the content of the photograph. To this end we designed an image classification strategy [13] based on the analysis of low-level features that can be computed automatically without any prior knowledge of the content of the image. We considered the classes outdoor, indoor, and close-up, which correspond to categories of images that may require different enhancement approaches. The indoor class includes photographs of rooms, groups of people, and details in which the context indicates that the photograph was taken inside. The outdoor class includes natural landscapes, buildings, city shots and details in which the context indicates that the photograph was taken outside. The close-up class includes portraits and photos of people and objects in which the context provides little or no information regarding where the photo was taken. The features we used are related to color (moments of inertia of the color channels in the HSV color space, and skin color distribution), texture and edge (statistics on wavelet decomposition and on edge and texture distributions), and composition of the image (in terms of fragmentation and symmetry). We used the CART methodology [14] to train tree classifiers. Briefly, tree classifiers are produced by recursively partitioning the space of the features which
describe the images. Each split is formed by conditions related to the values of the features. Once a tree has been constructed, a class is assigned to each of the terminal nodes, and when a new case is processed by the tree, its predicted class is the class associated with the terminal node into which the case finally moves on the basis of the values of the features. The construction process is based on a training set of cases of known class. A function of impurity of the nodes, i(t), is introduced, and the decrease in its value produced by a split is taken as a measure of the goodness of the split itself. For each node, all the possible splits on all the features are considered and the split which minimizes the average impurity of the two subnodes is selected. The function of node impurity we have used is the Gini diversity index:

i(t) = 1 − Σ_{c=1}^{C} p(c|t)²,   (1)
where p(c|t) is the resubstitution estimate of the conditional probability of class c (c = 1, ..., C) in node t, that is, the probability that a case found in node t is a case of class c. When the difference in impurity between a node and its best subnodes is below a threshold, the node is considered terminal. The class assigned to a terminal node t is the class c* for which:

p(c*|t) = max_{c=1,...,C} p(c|t).   (2)
In the CART methodology the size of a tree is treated as a tuning parameter, and the optimal size is adaptively chosen from the data. A very large tree is grown and then pruned using a cost-complexity criterion which governs the trade-off between size and accuracy. Decision forests can be used to overcome the instability of decision trees while improving, at the same time, generalization accuracy. The trees of a decision forest are generated by running the training process on bootstrap replicates of the training set. The classification results produced by the single trees are combined by a majority vote. Each decision tree has been trained on bootstrap replicates of a training set composed of about 4500 photographs manually annotated with the correct class. Evaluating the decision forest on a test set of 4500 images, we obtained a classification accuracy of about 89%. To further improve the accuracy of the classifier and to avoid doubtful decisions, we introduced an ambiguity rejection option in the classification process: an image is "rejected" if the confidence in the classification result is below a tunable threshold. For instance, setting the threshold in such a way that 30% of the test set was rejected, we obtained a classification accuracy of about 96% on the remaining images.
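The following minimal sketch illustrates the decision-forest scheme just described (bootstrap replicates, majority vote, and a confidence-based rejection option). It uses scikit-learn's DecisionTreeClassifier with the Gini criterion; the feature extraction step is omitted, and the number of trees and the rejection threshold are illustrative values, not those used in [13].

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap replicate of the training set."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))       # bootstrap sample
        tree = DecisionTreeClassifier(criterion="gini")  # Gini diversity index, Eq. (1)
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest

def classify_with_rejection(forest, x, threshold=0.7):
    """Majority vote over the trees; reject when the winning class is not confident enough."""
    votes = np.array([tree.predict(x.reshape(1, -1))[0] for tree in forest])
    labels, counts = np.unique(votes, return_counts=True)
    best = counts.argmax()
    confidence = counts[best] / len(forest)
    if confidence < threshold:
        return "rejected", confidence
    return labels[best], confidence
```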
3 Image Content Annotation using Intermediate Features

3.1 Outdoor Image Annotation
For the detection of semantically meaningful regions in outdoor photographs, we designed a strategy for the automatic segmentation of images. Segmented
regions are assigned to seven classes: sky, skin, vegetation, snow, water, ground, and buildings [15]. The process works as follows: for each pixel, a fixed number of partially overlapping image subdivisions (tiles) containing it is considered. Each tile is described by a joint histogram which combines color distribution with gradient statistics. Histograms are classified by seven Support Vector Machines (SVMs) [16]. The i-th SVM defines a discrimination function fi(x) trained to distinguish between the tiles of class i and those of the other six classes:
fi(x) = bi + ( Σ_{j=1}^{N} αij k(x, xj) ) ( Σ_{j=1}^{N} Σ_{k=1}^{N} αij αik k(xj, xk) )^(−1/2),   (3)
where xj (j = 1, ..., N) are the training patterns. The parameters bi and αij are estimated by solving a suitable optimization problem. The discrimination function which assumes the maximum value determines the class assigned to a tile. In our method we used a Gaussian kernel function k:

k(x1, x2) = exp(−p ‖x1 − x2‖²).   (4)
The final label assigned to a pixel is determined by a majority vote on the classification results obtained on the tiles which contain the pixel. When the majority vote is below a tunable threshold, the class of the pixel is considered as "unknown". Our strategy achieved a classification accuracy of about 90% on a test set of 10500 tiles (1500 per class) randomly extracted from 200 images.
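As an illustration of the tile-based pipeline, the sketch below pairs a one-vs-rest SVM with a Gaussian (RBF) kernel, playing the role of Eqs. (3)-(4), with the per-pixel majority vote described above. The class names come from the text; the library calls, parameter values and vote threshold are assumptions for the sketch, not the exact setup of [15].

```python
import numpy as np
from sklearn.svm import SVC

CLASSES = ["sky", "skin", "vegetation", "snow", "water", "ground", "buildings"]

# One-vs-rest SVMs with a Gaussian (RBF) kernel; gamma plays the role of p in Eq. (4).
tile_classifier = SVC(kernel="rbf", gamma=0.5, decision_function_shape="ovr")
# tile_classifier.fit(tile_histograms, tile_labels)  # joint color/gradient histograms

def label_pixel(tile_predictions, vote_threshold=0.6):
    """Majority vote over the labels predicted for the tiles covering a pixel;
    the pixel is left 'unknown' when the vote is not strong enough."""
    labels, counts = np.unique(tile_predictions, return_counts=True)
    best = counts.argmax()
    if counts[best] / len(tile_predictions) < vote_threshold:
        return "unknown"
    return labels[best]
```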
3.2 Indoor and Close-up Image Annotation by Skin Detection
Many different methods for discriminating between skin and non-skin pixels are available. The simplest and most often applied method is to build an "explicit skin cluster" which expressly defines the boundaries of the skin cluster in certain color spaces. The underlying hypothesis of methods based on explicit skin clustering is that skin pixels exhibit similar color coordinates in a properly chosen color space. This type of binary method is very popular since it is easy to implement and does not require a training phase. The main difficulty in achieving high skin recognition rates while producing the smallest possible number of false positive pixels is that of defining accurate cluster boundaries through simple, often heuristically chosen, decision rules. We approached the problem of determining the boundaries of the skin clusters in multiple color spaces by applying a genetic algorithm. A good classifier should have both high recall and high precision, but typically, as recall increases, precision decreases. Consequently, we adopted a weighted sum of precision and recall as the fitness of the genetic algorithm. Keeping in mind that different applications can have sharply different requirements, the weighting coefficients can be chosen to offer high recall, high precision, or a reasonable trade-off between these two scores according to application demands
[17]. In the following applications addressing image enhancement, we adopted the boundaries evaluated for recall-oriented strategies, leading to a recall of about 89% and a precision of about 40%.
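For illustration, an explicit skin cluster reduces to a handful of per-pixel threshold rules. The YCbCr bounds below are common textbook values used as placeholders, not the boundaries actually found by the genetic algorithm, and the fitness function simply mirrors the weighted precision/recall trade-off described above.

```python
import numpy as np

def skin_mask_ycbcr(img_ycbcr, cb_range=(77, 127), cr_range=(133, 173)):
    """Explicit skin cluster: a pixel is skin when its chroma components fall
    inside fixed Cb/Cr bounds (illustrative values, not the optimized ones)."""
    cb, cr = img_ycbcr[..., 1], img_ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))

def fitness(precision, recall, w=0.3):
    """Weighted sum used as the genetic algorithm's fitness; a small w favours
    recall, as in the recall-oriented setting adopted here."""
    return w * precision + (1 - w) * recall
```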
3.3 Indoor and Close-up Image Annotation by Face Detection
Face detection in a single image is a challenging task because the overall appearance of faces ranges widely in scale, location, orientation and pose, as well as in facial expression and lighting conditions [18, 19]. Our objective therefore was not to determine whether or not there are any faces, but rather to evaluate the likelihood that a facial region is present. To do so, we trained an auto-associative neural network [20] to output a score map reflecting the confidence of the presence of faces in the input image. It is a three-layer linear network in which each pattern of the training set is presented to both the input and the output layers; the whole network was trained with a back-propagation sum-of-squared-errors criterion on a training set of more than 150 images, considering not only face images (frontal faces) but non-face images as well. The network processes only the intensity image, so the results are color independent. To locate faces of different sizes, the input image is repeatedly scaled down by a factor of 15%, generating a pyramid of sub-sampled images. The face detector achieves a true positive rate of 95% with a false positive rate of 27%.
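A sketch of how such a detector can be applied is given below: an auto-associative (reconstruction) score for each intensity window, evaluated over a pyramid of progressively sub-sampled images. The projection matrix W stands in for the trained hidden layer; the window normalization, scale factor and minimum size are assumptions for the sketch rather than the actual configuration of [20].

```python
import numpy as np
from skimage.transform import rescale

def face_score(window, W):
    """Auto-associative score for one intensity window: project onto the hidden
    layer (rows of W) and back; a low reconstruction error suggests a face."""
    x = window.ravel().astype(float)
    x = (x - x.mean()) / (x.std() + 1e-6)            # intensity only, normalized
    reconstruction = W.T @ (W @ x)                   # encode and decode
    return -np.linalg.norm(x - reconstruction)       # higher score = more face-like

def image_pyramid(image, factor=0.85, min_size=32):
    """Pyramid of sub-sampled images, each one 15% smaller than the previous."""
    level = image
    while min(level.shape[:2]) >= min_size:
        yield level
        level = rescale(level, factor, anti_aliasing=True)
```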
4 Image Content Annotation using Low Level Features

4.1 Saliency Regions Detection
Visual attention models simulate the human visual system to process and analyze images in order to identify conspicuous regions within the images themselves. These regions of interest (ROIs) can be used to guide the analysis of the images. In what follows we describe the Itti and Koch model [21], which is inspired by the behavior and the neuronal architecture of the early primate visual system. Input is provided in the form of a color image. Nine images of descending spatial scale are created from the original image using dyadic Gaussian pyramids which progressively low-pass filter and sub-sample the input image. Each feature is computed with a set of linear "center-surround" operations akin to visual receptive fields, where visual neurons are most sensitive in a small region of the visual space (center), while stimuli presented in a broader region concentric to the center (surround) inhibit the neuronal response. The difference between images at fine (center) and coarse (surround) scales is used to mimic this behavior. A set of feature maps is obtained by varying the scale. Three visual features are exploited: color, intensity and orientation. In what follows, Θ indicates the center-surround difference operation between a center image c and a surround image s. A feature map for the intensity feature is computed as: I(c, s) = |I(c) Θ I(s)|.
(5)
The second set of maps is computed for the two channel pairs Red/Green and Blue/Yellow as follows: RG(c, s) = |(R(c) − G(c)) Θ (G(s) − R(s))|,
(6)
BY (c, s) = |(B(c) − Y (c))Θ(B(s) − Y (s))|.
(7)
Gabor pyramids O(σ, θ), with σ representing the scale and θ representing the orientation, are used to compute the orientation maps. O(c, s, θ) = |O(c, θ)ΘO(s, θ)|.
(8)
In total, 42 feature maps are computed: 6 for intensity, 12 for color and 24 for orientation. Before computing the final saliency map, the feature maps are normalized in order to globally promote maps in which a small number of strong peaks of activity (conspicuous locations) is present, while globally suppressing maps which contain numerous comparable peak responses. After the normalization process, all the sets of normalized maps are combined and averaged into the final saliency map S.
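The center-surround mechanism of Eq. (5) for the intensity channel can be sketched as follows; the dyadic Gaussian pyramid and the across-scale subtraction follow the structure described above, while the specific filter width, interpolation order and scale pairs are illustrative choices rather than the exact ones of [21].

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(intensity, levels=9):
    """Dyadic Gaussian pyramid: progressively low-pass filter and sub-sample."""
    pyramid = [intensity.astype(float)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(blurred[::2, ::2])
    return pyramid

def intensity_feature_map(pyramid, c, s):
    """I(c, s) = |I(c) Θ I(s)|: interpolate the coarse (surround) level to the
    fine (center) resolution and take the absolute difference, as in Eq. (5)."""
    fine, coarse = pyramid[c], pyramid[s]
    factors = (fine.shape[0] / coarse.shape[0], fine.shape[1] / coarse.shape[1])
    return np.abs(fine - zoom(coarse, factors, order=1))

# Example: centers c in {2, 3, 4} and surrounds s = c + d with d in {3, 4}
# give the 6 intensity maps mentioned above.
```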
4.2 Image Dominant Color Detection and Classification
The basic idea of our solution is that by analyzing the color distribution of the image in a suitable color space with simple statistical tools, it is possible not only to evaluate whether or not a dominant color is present, but also to classify it. The color gamut of the image is first mapped into the CIELab color space, where the chromatic components and lightness are separate. Intuitively, the 2-dimensional color histogram F(a, b) depends on the color properties of the image: for a multicolor image without a cast this histogram will show several peaks distributed over the whole ab-plane, while for a single-color image it will present a single peak, or a few peaks in a limited region. The more concentrated the histogram and the farther from the neutral axis, the more intense the cast. To study the histogram distribution we use the mean values (µa, µb) and the variances (σa², σb²) of the histogram projections along the two chromatic axes a and b. We can then associate with each histogram an Equivalent Circle (EC) with its center in C = (µa, µb) and a radius σ = sqrt(σa² + σb²), defining in this way a distance D = µ − σ, where µ = sqrt(µa² + µb²); D is a measure of how far the whole histogram (identified by its equivalent circle) lies from the neutral axis (a = 0, b = 0). To characterize the EC quantitatively, we introduce the ratio Dσ = D/σ as a useful parameter for measuring the intensity of the cast: the greater Dσ, the stronger the cast. The cast detection step first analyzes the color histogram distribution in the ab chromatic plane, examining its EC and evaluating the statistical values σ, µ, and Dσ. If the histogram is concentrated (small value of σ) and far from the neutral axis (high values of µ and Dσ), the colors of the image are confined to a small region. This means that there is either a strong cast (to be removed), or what we have called an intrinsic cast (to be
preserved). The latter is usually found in the presence of widespread areas of vegetation, skin, sky, or sea. To distinguish between a true cast and a predominant color, a statistical classifier exploiting both color and spatial information is used to identify regions that probably correspond to skin, sky, sea, or vegetation [22]. If the classifier considers one of these regions significant, an intrinsic cast is likely, and the image is not subjected to cast removal. Otherwise, the image is identified as a cast image and processed for color correction. All images without a clearly concentrated color histogram are analyzed with a subsequent procedure based on the criterion that a cast has a greater influence on a neutral region than on objects with colors of high chroma. The color distribution of what we call Near Neutral Objects (NNO) [23, 24] is studied with the same statistical tools described above. A pixel of the image belongs to the NNO region if its chroma is less than an initial fixed value and if it is not isolated, but has some neighbors that present similar chromatic characteristics. If the percentage of pixels that satisfies these requisites is less than a predefined percentage of the whole image, the minimum radius of the neutral region is extended, and the region recursively evaluated. When there is a cast, we may expect the NNO color histogram to show a single small spot, or at most a few spots in a limited region. NNO colors spread around the neutral axis, or distributed in several distinct spots, indicate instead that there is no cast. The statistical analysis that we have performed on the whole histogram is now applied to the NNO histogram, allowing us to distinguish among three cases: cast images, no-cast images, and ambiguous ones. Defining, with obvious meaning, σ_NNO, µ_NNO, and Dσ_NNO, NNO histograms that are well defined and concentrated far from the neutral axis will have a positive Dσ_NNO: the image has a cast. With negative values of Dσ_NNO the histogram presents an almost uniformly spread distribution around the neutral axis, indicating no cast. Based on experiments on a large number of images, we set the thresholds at Dσ_NNO = 0.5 for cast images and Dσ_NNO = −0.5 for no-cast images. Cases falling in the interval between −0.5 and 0.5 are considered ambiguous.
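The Equivalent Circle statistics and the three-way NNO decision translate directly into a few lines of code. The sketch below assumes the ab chromatic components (and, for the NNO case, the value of Dσ computed on the near-neutral pixels) are already available, and uses the ±0.5 thresholds mentioned above.

```python
import numpy as np

def equivalent_circle_stats(a, b):
    """Statistics of the ab color distribution: EC center distance mu, radius
    sigma, distance D from the neutral axis, and the cast ratio D_sigma."""
    sigma = np.sqrt(a.var() + b.var())        # EC radius
    mu = np.hypot(a.mean(), b.mean())         # distance of the EC center from (0, 0)
    D = mu - sigma
    return mu, sigma, D, D / sigma            # D_sigma = D / sigma

def nno_cast_decision(d_sigma_nno, low=-0.5, high=0.5):
    """Three-way decision on the Near Neutral Objects histogram."""
    if d_sigma_nno > high:
        return "cast"
    if d_sigma_nno < low:
        return "no cast"
    return "ambiguous"
```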
5 Image Processing and Enhancement

5.1 Color Balancing
To recover the original appearance of the scene under different lighting and display conditions, the measured RGB values must be transformed. Many algorithms have been developed for color balancing based on the Von Kries assumption that each sensor channel is transformed independently. The gray world and the white balance algorithms, which correspond respectively to average-to-gray and normalize-to-white, are perhaps the most commonly used [3, 25]. The gray world algorithm assumes that, given an image with a sufficient amount of color variation, the mean values of the R, G, B components of the image will average out to a common gray value. Once the common gray value has been chosen, each color component is scaled by the averages of the three RGB
channels. There are several versions of the white balance algorithm: the basic concept is to set to white a point or a region that appears reasonably white in the real scene. Once it has been chosen, each color component is scaled by the averages of the three RGB channels in this region. However, when one of these color balance adjustments is performed automatically, with no knowledge of the image source, the mood of the original scene is often altered in the output image. The cast removal method we propose is applied only to those images classified as having a clear cast or an ambiguous cast, in order to avoid problems such as the mistaken removal of a predominant color or the distortion of the image's chromatic balance. The main peculiarity of the method lies in how it determines the White Balance (WB) region, which also depends on the strength of the cast. To avoid the mistaken removal of an intrinsic color, regions previously identified by the cast detector as probably corresponding to skin, sky, sea or vegetation are temporarily removed from the analyzed image. At this point the algorithm differentiates between images with a clear cast and images with a low, or ambiguous, one. This color balance algorithm can be considered a mixture of the white balance and gray world procedures. The WB region is formed by several objects of different colors, which are set to white not singularly, but as an average. This prevents the chromatic distortion that follows from the wrong choice of the region to be whitened. When there is a clear cast, the colors of all the objects in the image suffer the same shift due to that specific cast. If we consider the average over several objects, introducing a kind of gray world assumption, only that shift survives and is removed. Note that for images where the WB region consists of a single object, as is usually the case for images with a strong cast, our method tends to correspond to the conventional white balance algorithm, while for images where the WB region includes more objects (as in images with a softer cast), the method tends to the gray world algorithm. We have compared our results on 746 cast images with those obtained with the conventional white patch and gray world algorithms, collecting the majority opinion of a panel of five experts. This analysis rates our algorithm as consistently better than the gray world, with the exception of only a few cases (10) in which it is judged equal. In comparison with the white patch algorithm, our method is judged better in 40% of the cases, and equal in the rest. In the great majority of the cases examined, the processed image was held to be superior to the original; in less than 2% of the 746 images processed was the output judged worse than the original.
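The common core of these corrections, the Von Kries-style per-channel scaling once a white-balance region is chosen, is sketched below. When the region covers the whole image the sketch behaves like gray world; when it is a single near-white object it behaves like white patch. The selection of the WB region itself (driven by the cast strength) is omitted, and the function name and defaults are illustrative.

```python
import numpy as np

def balance_on_region(img, mask, target_gray=None):
    """Scale each RGB channel so that the average color inside the WB region
    (boolean mask) becomes a neutral gray (diagonal / Von Kries correction)."""
    region = img[mask].reshape(-1, 3).astype(float)
    means = region.mean(axis=0)                 # average R, G, B in the WB region
    if target_gray is None:
        target_gray = means.mean()              # gray-world-style common gray value
    gains = target_gray / np.maximum(means, 1e-6)
    balanced = img.astype(float) * gains        # broadcast over the three channels
    return np.clip(balanced, 0, 255).astype(np.uint8)
```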
5.2 Local Contrast Enhancement
The original dynamic range of a scene is generally compressed into the smaller dynamic range of the acquisition system. This makes it difficult to design a global tone correction able to enhance both shadow and highlight details. Several methods for adjusting image contrast [4, 5, 9–11] have been developed in the field of image enhancement. In general, it is possible to distinguish between two classes of contrast corrections: global and local
corrections. With global contrast corrections it is difficult to accommodate both lowlight and highlight details. The advantage of local contrast corrections is that they map one input value to many different output values, depending on the values of the neighboring pixels, which allows for simultaneous shadow and highlight adjustments. Our contrast enhancement method is based on a local and image-dependent exponential correction [26]. The simplest exponential correction, better known as gamma correction, is common in the image processing field and consists in processing the input image through a constant power function. This correction gives good results for totally underexposed or overexposed images. However, when both underexposed and overexposed regions are simultaneously present in an image, this correction is not satisfactory. As we are interested in a local correction, the exponent of the gamma correction used by our algorithm is not a constant. Instead, it is chosen as a function that depends on the point to be corrected, on its neighboring pixels, and on the global characteristics of the image. This function is also chosen to be edge preserving, to eliminate halo artifacts. It often happens, especially for low-quality images with compression artifacts, that the noise in the darker zones is enhanced. To overcome this undesirable loss in image quality, a further step of contrast enhancement was added. This step consists of a stretching and clipping procedure, and an algorithm to increase the saturation.
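A much simplified sketch of a locally varying gamma correction is shown below: the per-pixel exponent is derived from the local average brightness, so dark neighborhoods are lifted and bright ones compressed. The actual method [26] uses an edge-preserving, image-dependent function; the plain Gaussian mask and the parameter values here are stand-ins for illustration only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_gamma(intensity, strength=2.0, sigma=15):
    """Locally adaptive exponential (gamma-like) correction.
    intensity: array with values in [0, 1]. Dark neighborhoods get an exponent
    below 1 (brightening), bright neighborhoods an exponent above 1 (darkening)."""
    mask = gaussian_filter(intensity.astype(float), sigma)  # local average brightness
    gamma = strength ** (2.0 * mask - 1.0)                  # mask = 0.5 -> gamma = 1
    return np.clip(intensity ** gamma, 0.0, 1.0)
```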
5.3 Selective Edge Enhancement
The key idea is to process the image locally according to topographic maps obtained by a neurodynamical model of visual attention, overcoming the trade-off between smoothing and sharpening typical of traditional approaches. Only the high frequencies corresponding to regions that are non-significant to our visual system are smoothed, while significant details are sharpened. In our algorithm, the enhanced image fe(x, y) is obtained by correcting the original f(x, y): the non-salient high frequencies fhNS(x, y) are subtracted, which produces a smoothing effect, and the salient high frequencies fhS(x, y) are added, with a consequent edge sharpening. Formally: fe(x, y) = f(x, y) − λ fhNS(x, y) + λ fhS(x, y).
(9)
The salient high frequencies fhS(x, y) are obtained by weighting all the high frequencies of the image with a topographic map corresponding to visually salient regions, obtained by the neurodynamical model of visual attention described in Section 4.1. This makes it possible to enhance the edges selectively without enhancing the perceived noise. Conversely, the non-salient high frequencies fhNS(x, y) are obtained by weighting all the high frequencies with the complement of the same topographic map, causing an adaptive smoothing. In our implementation, the high frequencies fh(x, y) of Equation (9) are evaluated using the Laplacian, which is the simplest isotropic derivative operator. It should be noted that our enhancement method is actually independent of the high-pass filter used.
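Equation (9) maps almost directly onto code once a saliency map S with values in [0, 1] is available from the attention model. The sketch below uses the Laplacian as the high-pass filter, as in the text; the parameter λ and the sign convention of the Laplacian (which may need flipping depending on the kernel used) are illustrative choices.

```python
import numpy as np
from scipy.ndimage import laplace

def selective_enhance(f, saliency, lam=0.5):
    """f_e = f - lambda * f_h^NS + lambda * f_h^S  (Eq. 9).
    f: grayscale image; saliency: map in [0, 1] from the visual attention model."""
    f = f.astype(float)
    f_h = laplace(f)                          # high-frequency component (Laplacian)
    f_h_salient = saliency * f_h              # kept and added back: sharpening
    f_h_nonsalient = (1.0 - saliency) * f_h   # subtracted: adaptive smoothing
    return f - lam * f_h_nonsalient + lam * f_h_salient
```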
5.4 Adaptive Image Cropping
Some of the efforts devoted to image adaptation are related to the ROI coding scheme introduced in JPEG 2000 [27]. Most approaches for adapting images focus only on compressing the whole image in order to reduce the data transmitted. A few other methods use an auto-cropping technique to reduce the size of the image transmitted [28, 29]. These methods decompose the image into a set of spatial information elements (saliency regions) which are then displayed serially to help users browse or search through the whole image. These methods rely heavily on a visual attention model used to identify the saliency regions to be cropped. We designed a self-adaptive image cropping algorithm exploiting both visual and semantic information [30]. Visual information is obtained by a visual attention model, while semantic information relates to the automatically assigned image genre and to the detection of face and skin regions. The processing steps of the algorithm are first driven by the classification phase and then further specialized with respect to the annotated face and skin regions. The classification of the images to be cropped allows us to build a cropping strategy specific for each image class. The efficacy of the strategy is maximized by taking into account the properties of the image and focusing our attention on some objects in the image instead of others. A landscape image, for example, due to its lack of specific focus elements, is not processed at all: the image is regarded as being wholly relevant. A close-up image generally shows only a single object or subject in the foreground, and thus the cropping region should take into account only this, discarding any region that can be considered background. For an image belonging to the other class, we are concerned with whether or not it contains people. In the first case, the cropping strategy should prioritize the selection of regions containing most of the people. To perform this, we first analyze the images in order to detect candidate face regions. These regions are then validated by checking the amount of skin present, and those that contain a given amount of skin are selected as true face regions. Regions detected by the visual attention model, skin regions, and face regions are then combined to determine the final relevant regions to be cropped. Here we focus our interest on checking the presence of people, but the analysis can be further specialized for the identification of other objects. Images without people are processed with a pipeline similar to the one for close-up images. The main difference lies in the fact that more than one cropping region can be returned by the algorithm. Although the visual attention model is used in all the different cropping strategies, it requires the tuning of several parameters that influence the results depending on the image content. The image classification phase allows us to heuristically tune these parameters. The last phase of the adaptive image cropping aims at adjusting the cropping region so that the overall information is maximized and all the space in the output display is used. The region is rotated if its orientation differs from the normal orientation of the display. Furthermore, the region is enlarged and/or trimmed in order to retain
the display's aspect ratio while maintaining the cropped region centered. Finally, the cropped image is resized to the size of the display. To evaluate the quality of the proposed image cropping strategy, we collected the judgments of a panel of five non-professional photographers, which can be summarized as follows: about 7% of the output images were judged worse than the original image scaled to fit the small screen display; about 40% were judged equivalent (the great majority of landscape images are correctly classified and, since they are not cropped according to our procedure, they are equivalent to the resized original); the remaining 53% of the images were judged better. Without considering landscapes, the percentage of images judged better increases. The worst results are mostly due to the errors introduced by the face detector. Most of the images misclassified by the CART algorithm are still cropped acceptably.
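The final adjustment step alone is sketched below: enlarging or trimming a relevant bounding box to the display's aspect ratio while keeping it centered and inside the image. Relevant-region detection (saliency, skin and face regions) is assumed to have already produced the box; the function name and the box convention are hypothetical.

```python
def fit_to_display(box, image_size, display_size):
    """Expand or trim the relevant-region box (x, y, w, h) to the display's
    aspect ratio, keep it centered, and clamp it inside the image."""
    img_w, img_h = image_size
    disp_w, disp_h = display_size
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0          # keep the region centered
    target = disp_w / float(disp_h)
    if w / float(h) < target:
        w = h * target                          # region too narrow: widen it
    else:
        h = w / target                          # region too short: heighten it
    w, h = min(w, img_w), min(h, img_h)         # trim if it exceeds the image
    x = min(max(cx - w / 2.0, 0), img_w - w)
    y = min(max(cy - h / 2.0, 0), img_h - h)
    return int(x), int(y), int(w), int(h)

# Example: fit_to_display((120, 80, 200, 150), (640, 480), (176, 144))
# returns a 4:3.3 box ready to be resized to the 176x144 display.
```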
6 Conclusions
The main objective of image enhancement is to compensate for image degradations by altering local and global image features. We strongly believe that the most appropriate enhancement strategy can be safely and effectively applied only if the image to be processed is semantically annotated. To this end we have designed a modular, fully automatic, content-aware enhancement procedure. The modular structure adopted will make it possible to easily replace any of the implemented algorithms with a more efficient or effective one in the future. The overall procedure can also be improved by simply inserting new modules for other image artifacts.
References
1. Buchsbaum, G., A spatial processor model for object color perception, Journal of the Franklin Institute, 310, (1980) 1–26.
2. Cardei, V., Funt, B., Barnard, K., White Point Estimation for Uncalibrated Images, Proc. of the IS&T/SID Seventh Color Imaging Conference, (1999) 97–100.
3. Barnard, K., Cardei, V., Funt, B., A Comparison of Computational Color Constancy Algorithms - Part I: Methodology and Experiments with Synthesized Data, IEEE Transactions on Image Processing, Vol. 11(9), (2002) 972–983.
4. Tomasi, C., Manduchi, R., Bilateral filtering for gray and color images, Proc. IEEE Int. Conf. on Computer Vision, (1998) 836–846.
5. Moroney, N., Local colour correction using non-linear masking, Proc. IS&T/SID Eighth Color Imaging Conference, (2000) 108–111.
6. Kashyap, R.L., A robust variable length nonlinear filter for edge enhancement and noise smoothing, Proc. of the Int. Conf. on Signal Processing, (1994) 143–145.
7. Polesel, A., Ramponi, G., Mathews, V.J., Image Enhancement Via Adaptive Unsharp Masking, IEEE Transactions on Image Processing, Vol. 9(3), (2000) 505–510.
8. Land, E., The Retinex Theory of Color Vision, Scientific American, 237, (1977) 108–129.
9. Rahman, Z., Jobson, D., Woodell, G., Retinex processing for automatic image enhancement, Journal of Electronic Imaging, Vol. 13(1), (2004) 100–110.
10. Rizzi, A., Gatta, C., Marini, D., A new algorithm for unsupervised global and local color correction, Pattern Recognition Letters, 24, (2003) 1663–1677.
11. Meylan, L., Susstrunk, S., Bio-inspired image enhancement for natural color images, Proc. IS&T/SPIE Electronic Imaging, Vol. 5292, (2004) 46–56.
12. Naccari, F., Battiato, S., Bruna, A., Capra, A., Castorina, A., Natural Scene Classification for Color Enhancement, IEEE Trans. on Consumer Electronics, Vol. 51(1), (2005) 234–239.
13. Schettini, R., Brambilla, C., Cusano, C., Ciocca, G., Automatic classification of digital photographs based on decision forests, Int. Journal of Pattern Recognition and Artificial Intelligence, Vol. 18(5), (2004) 819–845.
14. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., Classification and Regression Trees, Wadsworth and Brooks/Cole (1984).
15. Cusano, C., Gasparini, F., Schettini, R., Image annotation for adaptive enhancement of uncalibrated color images, Lecture Notes in Computer Science, Vol. 3736, (2005) 216–225.
16. Cortes, C., Vapnik, V., Support-Vector Networks, Machine Learning, Vol. 20(3), (1995) 273–297.
17. Gasparini, F., Schettini, R., Skin segmentation using multiple thresholding, Proc. Internet Imaging VII, Vol. 6061 (S. Santini, R. Schettini, T. Gevers eds.), (2006) 1–8.
18. Rowley, H., Baluja, S., Kanade, T., Neural Network-Based Face Detection, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 20(1), (1998) 23–38.
19. Yang, M.H., Kriegman, D.J., Ahuja, N., Detecting Faces in Images: A Survey, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24(1), (2002) 34–58.
20. Gasparini, F., Schettini, R., Automatic redeye removal for smart enhancement of photos of unknown origin, Lecture Notes in Computer Science, Vol. 3736, (2005) 226–233.
21. Itti, L., Koch, C., Niebur, E., A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 20, (1998) 1254–1259.
22. Ciocca, G., Cusano, C., Schettini, R., Image annotation using SVM, Proc. Internet Imaging V, Vol. SPIE 5304 (S. Santini, R. Schettini eds.), (2004) 330–338.
23. Cooper, T., A Novel Approach to Color Cast Detection and Removal in Digital Images, Proc. SPIE, Vol. 3963, (2000) 167–175.
24. Cooper, T., Color Segmentation as an Aid to White Balancing for Digital Still Cameras, Proc. SPIE, Vol. 4300, (2001) 164–171.
25. Barnard, K., Cardei, V., Funt, B., A Comparison of Computational Color Constancy Algorithms - Part II: Experiments with Image Data, IEEE Transactions on Image Processing, Vol. 11(9), (2002).
26. Capra, A., Corchs, S., Gasparini, F., Schettini, R., Dynamic Range Optimization by Local Contrast Correction and Histogram Image Analysis, Proc. IEEE International Conference on Consumer Electronics (ICCE 2006), (2006) 309–310.
27. Christopoulos, C., Skodras, A., Ebrahimi, T., The JPEG2000 still image coding system: an overview, IEEE Trans. on Consumer Electronics, Vol. 46(4), (2000) 1103–1127.
28. Chen, L., Xie, X., Fan, X., Ma, W., Zhang, H., Zhou, H., A visual attention model for adapting images on small displays, Multimedia Systems, 9, (2003) 353–364.
29. Suh, B., Ling, H., Bederson, B.B., Jacobs, D.W., Automatic Thumbnail Cropping and its Effectiveness, Proc. UIST'03, (2003) 95–104.
30. Ciocca, G., Cusano, C., Gasparini, F., Schettini, R., Self Adaptive Image Cropping for Small Displays, Proc. IEEE Int. Conf. on Consumer Electronics (ICCE), (2007) 10–14.