Automatic Image Event Segmentation and Quality Screening for Albuming Applications

Alexander C. Loui
Imaging Science and Technology Laboratory, Eastman Kodak Company, Rochester, New York 14650-1816
[email protected]

Andreas E. Savakis
Department of Computer Engineering, Rochester Institute of Technology, Rochester, New York 14623
[email protected]
Abstract

In this paper, a system for automatic albuming of consumer photographs is described, and its core components of event segmentation and screening of low quality images are discussed. A novel event segmentation algorithm was developed to automatically cluster pictures into events and sub-events for albuming, based on date/time metadata as well as the color content of the pictures. A new quality-screening algorithm was developed based on objective quality measures to detect problematic images caused by underexposure, low contrast, and camera defocus or movement. Performance testing of these algorithms was conducted using a database of real consumer photos and showed that these functions provide a useful first-cut album layout for typical rolls of consumer pictures. A first version of the automatic albuming application software was tested through a consumer trial in the United States from August to December 1999.
I. INTRODUCTION

Placing pictures in albums is an activity that many people enjoy, yet the vast majority of pictures are never placed in albums because of the time and effort required to complete the albuming process. With the proliferation of digital cameras, scanners, and Internet imaging, the volume of digitized photographs steadily increases, and it is desirable to automatically generate albums for people who do not have the time or inclination to do it on their own. The main goal of this work is to help people organize their pictures so that they can convey their story effectively.

There are many ways people organize pictures for storytelling. One popular method is to sort pictures in chronological order and organize them by events. Another is to sort pictures by subject, e.g., by a particular person, a pet, etc. The main focus of this paper is how to automatically classify events in general consumer pictures. This is a difficult task, since we have very limited or no contextual information about the picture content, and the final interpretation can be subjective. Our approach is to combine one important piece of contextual data, the date and time information, with
the correlation of picture content across images, obtained through image understanding, for event classification.

In multimedia systems that automatically generate albums, the detection and screening of low quality images is an important function to ensure the overall quality of the final photo album. Additionally, such screening can eliminate undesired pictures from final printing, thus reducing costs for the consumer. Undesirable images may need to be screened because of low overall image quality, poor subject matter, or rendering problems. The screening of low quality images is addressed here via an efficient method based on objective quality measures such as sharpness, contrast, and exposure.

The paper is organized as follows. Section 2 describes the image event segmentation algorithm, which uses K-means clustering and a block-based color histogram correlation technique. Section 3 describes the screening of low quality images, including a description of the objective quality metrics. Section 4 discusses the performance of these techniques on a database of real consumer pictures. Finally, Section 5 gives some concluding remarks and suggestions for future work.
II. IMAGE EVENT SEGMENTATION

The goal of event segmentation is to automatically sort and cluster images from sets or rolls of pictures into separate events and, within each event, into separate groups of similar subject content. The event-clustering algorithm organizes pictures into events and sub-events based on two types of information: the date and time of picture capture, and the content similarity between pictures. If date and time information is not available, as is the case with 35mm film, the algorithm relies on image content information alone. The basis of using time information for clustering is the assumption that most people arrange their photos in roughly chronological order. Moreover, time differences between pictures within an event (or a sub-event) are typically smaller than time differences between pictures from different events. In the event segmentation algorithm, K-means clustering [1] is applied to the date and time information. In addition, image content information is
extracted to determine similarity between consecutive pictures using a block-based color histogram correlation method. The following steps are carried out, as illustrated in Figure 1 (a code sketch follows the list):

1. Run the date/time clustering algorithm (Section 2.A) to determine event boundaries.
2. Check the color similarity of images at event boundaries to verify that the images indeed differ.
3. Within each event cluster, perform comparisons using the block-based histogram algorithm (Section 2.B), so that each cluster is composed of several groups of pictures.
4. Check, using the date and time information, whether the separations between the groups are "time logical", i.e., whether there are meaningful separations in time. If not, the groups are merged.
5. Check the subject arrangement within an event to group similar pictures together.
6. Finally, carry out a refinement step to check whether there are too many groups with an isolated picture, and whether some of them can be merged.
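As a rough illustration, the following Python sketch mirrors the control flow of steps 1-4. All names are hypothetical and this is not the authors' implementation: step 1 is reduced here to a fixed time-gap threshold (the actual 2-means procedure appears in Section 2.A), and similarity(i, j) is assumed to return a content-similarity value in [0, 1] as in Section 2.B.

```python
# Hypothetical sketch of the event-segmentation pipeline (steps 1-4).
# Boundaries come from a fixed time-gap threshold here, standing in for
# the 2-means date/time clustering of Section 2.A.

def segment_events(times_min, similarity, gap_thresh=120.0,
                   low=0.3, high=0.7):
    """times_min: sorted capture times in minutes.
    similarity(i, j): content similarity of pictures i and j, in [0, 1]."""
    if not times_min:
        return []

    # Step 1: candidate event boundaries from large time gaps.
    events, start = [], 0
    for i in range(1, len(times_min)):
        if times_min[i] - times_min[i - 1] > gap_thresh:
            # Step 2: keep the boundary only if the two images really differ.
            if similarity(i - 1, i) < high:
                events.append(list(range(start, i)))
                start = i
    events.append(list(range(start, len(times_min))))

    # Step 3: split each event into groups of visually similar pictures.
    grouped = []
    for event in events:
        groups, current = [], [event[0]]
        for a, b in zip(event, event[1:]):
            if similarity(a, b) >= low:
                current.append(b)
            else:
                groups.append(current)
                current = [b]
        groups.append(current)
        # Step 4 (merging groups lacking a "time logical" separation) and
        # steps 5-6 (subject re-grouping, merging isolated pictures) would
        # refine `groups` further; they are omitted in this sketch.
        grouped.append(groups)
    return grouped
```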
A. Date/Time Clustering

The date/time clustering algorithm consists of the following steps (see Figure 2):

1. Extract the date and time information from the picture metadata and convert it into minutes, which serve as the base unit of the algorithm.
2. Compute the time difference histogram and perform an appropriate scaling of the time difference axis.
3. Divide the histogram into two parts using a 2-means clustering algorithm. The cluster with the higher values contains the time differences corresponding to separations between events.
4. Identify the event clusters based on these separations.

In step 2 above, scaling of the time difference axis is used to achieve better segmentation. For example, consider a roll containing three events, where the first two events are separated by two weeks and the second and third events are separated by three months. Since the difference between two weeks and zero is much smaller than the difference between three months and two weeks, the algorithm would detect only two events instead of three without appropriate scaling. A code sketch of this procedure is given below.
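The following is a minimal sketch of the 2-means split of step 3. The paper does not specify the scaling function, so the logarithmic log1p used here is an illustrative assumption.

```python
import numpy as np

def time_gap_boundaries(times_min):
    """Return indices where a new event starts, via 2-means on scaled gaps."""
    diffs = np.diff(np.asarray(times_min, dtype=float))
    scaled = np.log1p(diffs)  # assumed compressive scaling of the time axis

    # Lloyd's algorithm with k = 2 on the one-dimensional scaled differences.
    centers = np.array([scaled.min(), scaled.max()])
    for _ in range(100):
        labels = np.abs(scaled[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = np.array(
            [scaled[labels == k].mean() if np.any(labels == k) else centers[k]
             for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # The cluster with the larger center holds the between-event separations.
    big = centers.argmax()
    return np.where(labels == big)[0] + 1  # picture index after each break
```

On a roll whose gaps are a few minutes, two weeks, and three months, the log scaling lets the two-week gap fall into the "event break" cluster; on the raw axis it would be grouped with the small gaps, and only two events would be found, as in the example above.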
B. Block-based Histogram Correlation

Content similarity between images is determined using a block-based histogram correlation technique. Each image is divided into a number of blocks, and a color histogram is computed for each block. An image similarity value between two pictures is obtained by comparing the set of block histograms of a reference picture to that of a candidate picture, using the histogram intersection value as the similarity measure [2]. For each block of the reference image we keep the intersection value that provides the best match, and then compute the average. The average value gives an estimate of the global similarity between the two pictures. If the average intersection is below a low threshold, the two pictures are sufficiently different and may not be part of the same event. If the average intersection is above a high threshold, the two pictures are considered similar enough to be in the same event. If the average intersection falls between these two thresholds, further analysis is needed to determine whether the pictures belong to the same event.

Sometimes two pictures are very similar, for example with almost the same background and the same subject (a person), but the subject is not at exactly the same place in both pictures. To detect such differences, we keep the best intersection values for each reference block together with the coordinates of the corresponding candidate blocks, which gives the predominant direction of the closest match. We then shift the "common window" between the two images in that direction and repeat the block comparison to check whether the new results are better.

As mentioned above, we may want to determine whether two pictures have similar subject matter. The idea is to create a "best intersections map" containing intersection values instead of blocks of pixels. The map is built as follows: for each block comparison we keep the n best intersections and the coordinates of the corresponding blocks, and place them in the map. This map has the size of the "common window" between the two pictures. We then divide it into three parts: the left, the center, and the right. Thus, if the comparison between two images gives a medium average intersection (so we cannot make a decision), but the center average intersection of the best intersections map is very high, we can infer that the same subject appears in both images.
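A minimal sketch of the block-histogram intersection follows, under assumptions the paper leaves open: the block grid size, the number of histogram bins, and the search range for a block's best match (here, all candidate blocks) are illustrative choices.

```python
import numpy as np

def block_histograms(img, blocks=4, bins=8):
    """Per-block, normalized RGB histograms; img is an HxWx3 uint8 array."""
    h, w = img.shape[:2]
    hists = []
    for by in range(blocks):
        for bx in range(blocks):
            patch = img[by * h // blocks:(by + 1) * h // blocks,
                        bx * w // blocks:(bx + 1) * w // blocks]
            hist, _ = np.histogramdd(patch.reshape(-1, 3),
                                     bins=(bins, bins, bins),
                                     range=((0, 256),) * 3)
            hists.append(hist.ravel() / hist.sum())
    return hists

def image_similarity(ref_hists, cand_hists):
    """Average best histogram intersection over the reference blocks."""
    best = []
    for hr in ref_hists:
        # Intersection of two normalized histograms: sum of elementwise minima.
        best.append(max(np.minimum(hr, hc).sum() for hc in cand_hists))
    return float(np.mean(best))
```

The value returned by image_similarity would then be compared against the low and high thresholds described above.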
III. SCREENING OF LOW QUALITY IMAGES

Methods for screening low quality images based on sharpness and contrast have been developed for some time [3]. In this paper, we implement sharpness and contrast measures derived from the edge histogram of the image. First, the image is cropped by 20% along the border and converted to grayscale. The image edges are detected using the Sobel operator after running a 3x3 averaging filter to reduce noise. The edge histogram is formed, and the regions that contain the strongest edges, i.e., those above the 90th percentile of the edge histogram, are identified. These regions are refined through median filtering, and their edge statistics are computed. The average of the strongest edges provides an estimate of sharpness, and their standard deviation provides an estimate of contrast.
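A sketch of these measures, assuming SciPy's ndimage filters; the crop fraction and the averaging-filter size follow the description above, while details such as the median-filter window size are assumed.

```python
import numpy as np
from scipy import ndimage

def sharpness_contrast(gray):
    """Edge-based sharpness and contrast estimates for a 2-D grayscale array."""
    h, w = gray.shape
    crop = gray[h // 5:h - h // 5, w // 5:w - w // 5]   # ~20% border crop
    smooth = ndimage.uniform_filter(crop, size=3)        # 3x3 averaging filter
    gx = ndimage.sobel(smooth, axis=1)                   # Sobel edge detection
    gy = ndimage.sobel(smooth, axis=0)
    edges = np.hypot(gx, gy)

    # Keep the strongest edges (above the 90th percentile), refine the mask
    # with a median filter, then use mean/std as sharpness/contrast estimates.
    strong = edges >= np.percentile(edges, 90)
    strong = ndimage.median_filter(strong.astype(np.uint8), size=3).astype(bool)
    vals = edges[strong] if strong.any() else edges.ravel()
    return vals.mean(), vals.std()
```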
There are cases where low edge strength is not due to the image being of low quality. In a number of landscape images showing the open horizon from mountaintops or boats, the overall edge strength is low, but the image should not be excluded. To retain these types of images, the following blueness measure was employed:

\[
\text{Blueness} = \begin{cases} 1 & \text{if } B > \max(R, T_b) \\ 1 & \text{if } G > \max(R, T_b) \\ 0 & \text{otherwise} \end{cases} \tag{1}
\]

where T_b is a threshold value that ensures the color is not black. Blueness is turned on for either green or blue pixels, which may occur due to sky, water, or foliage in a landscape. If blueness is on for the majority of the image pixels, the image is not screened, unless its sharpness or contrast is extremely low. The blueness exception works well for landscape images that do not have strong edges due to atmospheric conditions, shooting distance, and the lack of a dominant subject. A sketch of this test is given below.

An estimate of underexposure or overexposure is obtained by introducing a darkness parameter computed as follows: 1 if max(R,G,B)
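A minimal sketch of the blueness test of Eq. (1); the threshold value and the majority cutoff are assumed settings, not taken from the paper.

```python
import numpy as np

def blueness_fraction(img, tb=64):
    """Fraction of pixels passing the Blueness test of Eq. (1).
    img: HxWx3 uint8 array in (R, G, B) order; tb: assumed threshold that
    keeps near-black pixels from counting as blue or green."""
    r, g, b = (img[..., k].astype(int) for k in (0, 1, 2))
    blue = (b > np.maximum(r, tb)) | (g > np.maximum(r, tb))
    return float(blue.mean())

# A landscape image might bypass the edge-based screening when, say,
# blueness_fraction(img) > 0.5, unless sharpness or contrast is extremely low.
```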