Logotype Detection Concept in Video Stream Based on Features Extraction and Features Clustering from Contours and Colors
Mitja Gomboc
Iztok Kramberger
Teletech d.o.o., Research and Development Unit Teletech, Poljska ulica 6, 2000 Maribor, Slovenia
University of Maribor Faculty of Electrical Engineering and Computer Science Smetanova ulica 17, 2000 Maribor, Slovenia
Abstract: In this paper we propose an approach to identify logotypes in video streams. The proposed concept is based on a feature extraction process that combines contour and color information. Contours extracted from the video stream are used to calculate invariant image moments that describe the shape of the logotype. Furthermore, color information is extracted from normalized color histograms to evaluate the dominant colors. To find the regions of interest, we use additional features derived from contour position, distance and color. After the regions of interest are determined, features are extracted and clustered together to unambiguously represent logotypes.
Keywords: video analysis, feature extraction, contours, color histograms, data clustering
I. INTRODUCTION
Humans easily interpret visual or auditory information as something meaningful, or discard it right away. The human brain divides visual information into many channels that stream different kinds of information into the brain. The brain has an attention system that identifies important parts of an image to examine while suppressing examination of other areas. There are widespread associative inputs from all senses that allow the brain to draw on cross-associations made from years of living in the world.
In a computer vision system, visual information is represented as a grid of numbers. For the most part, there is no pattern recognition, no automatic control over the senses and no cross-association with previous experience. The grid of numbers is all the computer receives from visual perception. Based on the received data, a computer vision system tries to extract the visual information needed for a specific task [1].
The hardest part of computer vision is interpreting visual content at the same level as the human brain. Even a simple task such as logotype detection becomes challenging for computer vision.
Automatic multimedia and video analysis is desired in a wide range of applications for content indexing or content searching. General content recognition in digital sources such as images or videos is one of the primary challenges of computer vision. The source of the digital media is also important: the acquired information should contain enough detail for further image processing. To identify logotypes we have to assume some limitations, which are derived from the human point of view. Human visual perception of the video requires enough visual information. Important characteristics for human detection include the size of the logotype in the current frame. The color contrast in the scene has to be high enough that the logotype does not blend in with the image background, and the logotype has to remain continuously visible in the video stream long enough to be visually perceived. The proposed approach combines features based on edge detection and color extraction. Edge detection algorithms are used to extract features such as contours. Contours are used to calculate invariant image moments that describe the shapes of the contours and distinguish different contours from each other. Color features are based on color information extracted from the color histograms of the regions where the contours are located. Regions of interest in the video stream are described with these features to represent specific visual information such as logotypes.
Features extracted from the video stream could also be used for other post-processing tasks:
• Automatic video analysis.
• Video indexing.
• Content classification.
This paper is organized as follows. A general overview of video analysis and logotype detection is presented in Section II. In Section III we describe the features used in our approach. Section IV describes the logotype detection concept. Section V presents the experimental results. Conclusions and further work are given in Section VI.
II. VIDEO ANALYSIS AND LOGOTYPE DETECTION
A. Video analysis
Efficient video analysis requires sufficient video quality. The base video quality depends on the video source (air or cable) and the video transmission (analog or digital). If the quality of the video source is too low, there can be too much interference, the noise level can be too high, the video resolution can be too low, or video compression can introduce distortion and artifacts in the video image. The goal of video analysis is to automatically segment, index and label the video sequences in a video stream. For semantic feature detection in video sequences, low-level feature extraction is performed on the key frames of the video and a feature group including color and texture features is formed. Using a clustering method, a region containing all the high-level features is constructed, forming a model that contains all the information [2]. Semantic labeling of images in video sequences can be efficiently achieved with spatiotemporal video segmentation. The process of spatiotemporal semantic segmentation can be subdivided into two stages. First, the video sequence is split into small blocks of frames, and spatiotemporal regions are extracted and labeled individually within each block. Then, consecutive blocks are iteratively merged by a matching procedure that considers both semantic and visual properties [3]. Visual content descriptors and their extraction are a crucial problem for state-of-the-art visual information analysis. To detect visual objects in video sequences and extract visual descriptors, regions can be extracted using an efficient contour technique. After detection, visual descriptors of the regions are calculated, including color, motion and shape features that are invariant to affine transformations. These features describe the visual objects extracted from the video [4].
B. Logotype detection
A logotype is a graphical element that represents a trademark or commercial brand.
Logotypes are typically designed for immediate recognition because they identify corporations or non-commercial organizations. We come across logotypes in different kinds of media such as paper documents, television broadcasts, video streams and images. Automatic logo detection and recognition continues to be of great interest to the document retrieval community, as it enables effective identification of the source of a document [5]. There are different approaches in this area: some use segmentation-free and layout-independent methods, while others exploit principal properties of logotypes such as spatial compactness and colorimetric uniformity [6]. Most broadcast stations display TV logos to claim video content ownership or to visually distinguish the broadcast from the interrupting commercial block. Detecting and tracking a TV logo is of interest to TV commercial skipping applications and logo-based broadcast surveillance. There are different approaches based on traditional methods such as frame differencing, pixel-based matching or edge matching. Some more advanced techniques use, instead of single-frame detection, the temporal correlation of multiple consecutive frames to extract logo masks and match them against template masks [7]. Others use unsupervised methods to learn the logo or its movement and extract features that describe the logo; the learned information is then used to detect logos [8].
III. FEATURES
Simple features such as color and shape are very easy for humans to understand. A computer vision system, however, calculates a number of different features from pixel information and connects them together to describe a visual object. There are numerous features that can be used for image processing of visual information.
A. Edges
A common method for finding edges is the Canny edge detector. One of the differences between the Canny algorithm and other, simpler algorithms is that in the Canny algorithm the first derivatives are computed in x and y and then combined into four directional derivatives. The most significant new dimension of the Canny algorithm is that it tries to assemble the individual edge candidate pixels into contours [1].
B. Contours
Contours represent 2-D shapes based on detected edges. From contours we can calculate additional features such as moments; contours are the linking element between edges and moments.
C. Moments
Shape analysis from contours starts with the second-order moments. Moments provide a framework for scale- and rotation-invariant shape parameters. The most challenging task is to find a unique set of features from contours, which means that different shapes must not be mapped to the same set of features [9]. Hu's invariant moments are a result of the theory of algebraic invariants, from which Hu derived seven invariants to rotation of 2-D objects [10].
D. Color histograms
Histograms can be used to represent the color distribution of an object. The use of histograms to characterize color information is a common technique for image indexing and retrieval. There are different types of techniques, such as the edge gradient histogram or the co-occurrence edge color histogram [11]. Histograms are simply collected counts of the underlying data organized into a set of predefined bins. They can be populated by counts of features computed from the data, such as color or any other characteristic [1].
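The seven invariants mentioned in subsection C can be computed directly from the pixels of a region. The following is a minimal sketch in plain C++, assuming a region is given as the list of pixel coordinates enclosed by a contour; the names Pixels, centralMoment and huMoments are hypothetical. In OpenCV, which was used for the experiments in Section V, the functions cv::moments and cv::HuMoments provide equivalent functionality. The sketch normalizes the central moments as eta_pq = mu_pq / mu_00^(1 + (p + q)/2) and then applies Hu's seven formulas.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Pixel coordinates (x, y) of a filled region, e.g. the interior of a contour.
// Hypothetical representation chosen for this sketch.
using Pixels = std::vector<std::pair<double, double>>;

// Central moment mu_pq of the pixel set about its centroid (xc, yc).
double centralMoment(const Pixels& px, int p, int q, double xc, double yc) {
    double sum = 0.0;
    for (const auto& [x, y] : px) {
        sum += std::pow(x - xc, p) * std::pow(y - yc, q);
    }
    return sum;
}

// Hu's seven moment invariants, computed from the normalized central moments
// eta_pq = mu_pq / mu_00^(1 + (p + q) / 2), where mu_00 is the pixel count.
std::array<double, 7> huMoments(const Pixels& px) {
    assert(!px.empty());
    const double m00 = static_cast<double>(px.size());
    double xc = 0.0;
    double yc = 0.0;
    for (const auto& [x, y] : px) {
        xc += x;
        yc += y;
    }
    xc /= m00;
    yc /= m00;

    auto eta = [&](int p, int q) {
        return centralMoment(px, p, q, xc, yc) / std::pow(m00, 1.0 + (p + q) / 2.0);
    };
    const double n20 = eta(2, 0), n02 = eta(0, 2), n11 = eta(1, 1);
    const double n30 = eta(3, 0), n03 = eta(0, 3);
    const double n21 = eta(2, 1), n12 = eta(1, 2);

    // Recurring sums in Hu's formulas.
    const double a = n30 + n12;
    const double b = n21 + n03;

    std::array<double, 7> h{};
    h[0] = n20 + n02;
    h[1] = (n20 - n02) * (n20 - n02) + 4.0 * n11 * n11;
    h[2] = (n30 - 3.0 * n12) * (n30 - 3.0 * n12) + (3.0 * n21 - n03) * (3.0 * n21 - n03);
    h[3] = a * a + b * b;
    h[4] = (n30 - 3.0 * n12) * a * (a * a - 3.0 * b * b) +
           (3.0 * n21 - n03) * b * (3.0 * a * a - b * b);
    h[5] = (n20 - n02) * (a * a - b * b) + 4.0 * n11 * a * b;
    h[6] = (3.0 * n21 - n03) * a * (a * a - 3.0 * b * b) -
           (n30 - 3.0 * n12) * b * (3.0 * a * a - b * b);
    return h;
}
```

A shape and a rotated or translated copy of it yield the same seven values, so comparing these vectors (often on a logarithmic scale, since the invariants span several orders of magnitude) is one way to match contour shapes across frames.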
We used normalized color histograms to extract dominant color information from the regions of interest.
E. Regions of interest
The goal of a region detector is the accurate determination of an object's location, size and shape. To approximate these parameters efficiently, we determine regions of interest within each frame. Regions of interest are determined on the basis of contour position: the area containing a contour is described by a simple bounding rectangle, which represents the basic region of interest for further analysis and feature extraction.
F. Feature extraction and clustering
Individual features are extracted from every region of interest and clustered together to describe the visual information of that region. Every region carries information about its position, moments and color histogram.
IV. LOGOTYPE DETECTION CONCEPT
Our approach is based on feature extraction and feature clustering to determine regions of interest that are candidates for logos or logotypes of different trademarks. The first step is to find the candidate regions where logotypes are positioned. Basic regions are determined from features based on edges connected into contours. The next step is to cluster similar nearby regions, based on features from color, position and size, into bigger regions. Afterwards, we extract features from every region. The features are composed of the color histogram, the moments of the contours and information about the region size and position. A basic flowchart of the logotype detection concept is presented in Fig. 1. The concept has two levels of feature clustering: the first is used to connect similar regions and the second to extract logotype features. Our concept consists of the following steps:
• Detect regions of interest from features extracted from the video frame (edge detection and contours).
• Cluster or discard regions of interest based on color histograms, the positions of adjacent regions and the area of the regions.
• Extract all features from the clustered regions of interest, including invariant moments.
• Process the features to detect logotypes.
V. EXPERIMENTAL RESULTS
Experimental results were obtained using the OpenCV framework and the C++ programming language. The test video streams include sports and news broadcasts taken from local digital cable television and recorded at a resolution of 720×576. The number of detected regions depends on the number of contours found in the frame, and it can be reduced by resizing the frame used for contour extraction. Table I shows the average number of regions found for different resize factors in the two video streams. However, the resize factor has to be selected carefully. At the original image size there are many additional small regions that contain no specific visual information; these small regions behave like noise. As the resize factor increases, the number of regions decreases and the small regions disappear. When the resize factor is set too high, the visual information clusters into bigger regions and details that may be important are lost. We chose a resize factor of 4 for further testing with dominant colors. To further reduce the number of regions that do not include essential visual information, we use color information. From each region of interest we extract the dominant colors based on the color histogram and apply various threshold levels for dominant colors. Table II shows the results for various threshold levels. The maximum threshold level of 100 percent means the presence of single-color regions. Logotypes are typically designed for immediate recognition, so we can assume they are not painted with many different colors. Regions of interest with many colors and no specific dominant color can therefore be discarded from further processing. We can assume the same for regions with only one color.
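The dominant-color filtering described above can be sketched as follows. This is one possible reading of the rule, not the exact threshold definition behind Table II: we assume a region's colors are given as hue values in degrees (the color space is an assumption), quantized into a normalized histogram, and a region is kept only if its most frequent bin is dominant while the region is not effectively a single color. The bin count, the two share thresholds (minDominantShare, singleColorShare) and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Normalized histogram of hue values (degrees in [0, 360)) over `bins` equal
// bins; the returned bin values sum to 1.
std::vector<double> normalizedHueHistogram(const std::vector<double>& hues, int bins) {
    assert(bins > 0 && !hues.empty());
    std::vector<double> hist(bins, 0.0);
    for (double h : hues) {
        int b = static_cast<int>(h / 360.0 * bins);
        b = std::clamp(b, 0, bins - 1);  // guard against h == 360 or out-of-range input
        hist[b] += 1.0;
    }
    for (double& v : hist) {
        v /= static_cast<double>(hues.size());
    }
    return hist;
}

// Hypothetical filtering rule: keep a region only if its most frequent color bin
// is dominant (share >= minDominantShare) but the region is not effectively a
// single color (share < singleColorShare).
bool keepRegion(const std::vector<double>& hist, double minDominantShare,
                double singleColorShare) {
    const double dominant = *std::max_element(hist.begin(), hist.end());
    return dominant >= minDominantShare && dominant < singleColorShare;
}
```

For example, with 8 bins and thresholds of 0.4 and 0.95, a region whose hues fall 75% into one bin is kept, while a uniformly colored region and a region spread evenly over all bins are both discarded.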
TABLE I. AVERAGE NUMBER OF REGIONS OF INTEREST WITH DIFFERENT CONTOUR RESIZE FACTORS
Video stream | Resize factor 1 | Resize factor 2 | Resize factor 4 | Resize factor 8 | Resize factor 16
Sports | 2875 | 1015 | 298 | 101 | 34
News | 1960 | 779 | 274 | 95 | 31
TABLE II. AVERAGE NUMBER OF REGIONS OF INTEREST WITH VARIOUS THRESHOLD LEVELS FOR DOMINANT COLORS
Video stream | Threshold 100% | Threshold 80% | Threshold 60% | Threshold 40% | Threshold 20%
Sports | 298 | 274 | 235 | 158 | 53
News | 274 | 244 | 208 | 133 | 33
Figure 1. Logotype detection diagram (blocks: Frame, Resize, Edge detection, Contours, Detect regions of interest, Color grouping, Position grouping, Features extraction of contours and invariant moments, color histogram and position, Feature sets of logotype candidates).
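Figure 1 and Section IV describe grouping similar nearby regions by color and position into bigger regions. A minimal sketch of that step, under our own assumptions: each region is the bounding rectangle of a contour (Section III-E) tagged with the index of its dominant color bin, and two regions are merged greedily when they share that bin and lie within a hypothetical maxGap pixels of each other. The names Rect, Region, boundingRect, gap and groupRegions are illustrative; OpenCV's cv::boundingRect provides the equivalent upright bounding rectangle for a point set.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <utility>
#include <vector>

// Region bounds in pixels; both corners are inclusive.
struct Rect {
    int x0, y0, x1, y1;
};

// Axis-aligned bounding rectangle of a contour given as (x, y) points.
Rect boundingRect(const std::vector<std::pair<int, int>>& contour) {
    assert(!contour.empty());
    Rect r{INT_MAX, INT_MAX, INT_MIN, INT_MIN};
    for (const auto& [x, y] : contour) {
        r.x0 = std::min(r.x0, x);
        r.y0 = std::min(r.y0, y);
        r.x1 = std::max(r.x1, x);
        r.y1 = std::max(r.y1, y);
    }
    return r;
}

// A candidate region: its bounding box plus the index of its dominant color bin
// (for example, the largest bin of its normalized color histogram).
struct Region {
    Rect box;
    int dominantColorBin;
};

// Pixel gap between two rectangles: the larger of the horizontal and vertical
// separations, or 0 when the rectangles overlap.
int gap(const Rect& a, const Rect& b) {
    const int dx = std::max({0, a.x0 - b.x1, b.x0 - a.x1});
    const int dy = std::max({0, a.y0 - b.y1, b.y0 - a.y1});
    return std::max(dx, dy);
}

// Greedy grouping: repeatedly merge two regions that share the dominant color
// bin and lie within maxGap pixels of each other, until no merge is possible.
std::vector<Region> groupRegions(std::vector<Region> regions, int maxGap) {
    bool merged = true;
    while (merged) {
        merged = false;
        for (std::size_t i = 0; i < regions.size() && !merged; ++i) {
            for (std::size_t j = i + 1; j < regions.size(); ++j) {
                const bool sameColor =
                    regions[i].dominantColorBin == regions[j].dominantColorBin;
                if (!sameColor || gap(regions[i].box, regions[j].box) > maxGap) continue;
                // Replace region i by the union of both boxes and drop region j.
                const Rect& a = regions[i].box;
                const Rect& b = regions[j].box;
                regions[i].box = Rect{std::min(a.x0, b.x0), std::min(a.y0, b.y0),
                                      std::max(a.x1, b.x1), std::max(a.y1, b.y1)};
                regions.erase(regions.begin() + static_cast<std::ptrdiff_t>(j));
                merged = true;
                break;
            }
        }
    }
    return regions;
}
```

Greedy pairwise merging is the simplest choice but rescans all pairs after every merge; a union-find structure over candidate pairs would scale better when a frame contains hundreds of regions, as in Table I.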
VI. CONCLUSION AND FURTHER WORK
After detecting regions of interest with possible logotypes, we can proceed to the next step, which includes feature clustering and feature processing to detect and recognize logotypes. For efficient clustering of the extracted features we will have to implement a first level of artificial intelligence. The idea is to put similar features with the same properties into the same cluster. These properties include information about color, shape, size, distance and position. Furthermore, logotype recognition requires a predefined set of logotypes, either from previous frames or from an external database of known logotypes. The concept shows how simple features such as contours and color histograms can be extracted from a video stream and clustered together into more advanced features that describe visual information. To optimize the detection of logotype candidates, the simple features are filtered based on assumptions made from the human point of view. The most important features for separating logotype candidates from other regions are the size of the region and the dominant colors present in it.
REFERENCES
[1] G. Bradski and A. Kaehler, “Learning OpenCV,” O’Reilly Media, first edition, September 2008.
[2] E. Spyrou, G. Koumoulos, Y. Avrithis and S. Kollias, “Using Local Region Semantics for Concept Detection in Video,” 1st International Conference on Semantics and Digital Media Technology (SAMT 2006), Athens, Greece, 2006.
[3] E. Galmar, T. Athanasiadis, B. Huet and Y. Avrithis, “Spatiotemporal Semantic Video Segmentation,” 2008 IEEE 10th Workshop on Multimedia Signal Processing, Cairns, Qld, pp. 574–579, October 2008.
[4] P. Tzouveli, G. Andreou, G. Tsechpenakis, Y. Avrithis and S. Kollias, “Intelligent Visual Descriptor Extraction from Video Sequences,” Adaptive Multimedia Retrieval, Lecture Notes in Computer Science, vol. 3094, Springer, Berlin/Heidelberg, pp. 132–146, January 2004.
[5] G. Zhu and D. Doermann, “Automatic Document Logo Detection,” Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 864–868, September 2007.
[6] Z. Ahmed and H. Felia, “Logos extraction on picture documents using shape and color density,” IEEE International Symposium on Industrial Electronics (ISIE 2008), Cambridge, pp. 2492–2496, July 2008.
[7] J. Wang, L. Duan, Z. Li, J. Liu, H. Lu and J. S. Jin, “A Robust Method for TV Logo Tracking in Video Streams,” 2006 IEEE International Conference on Multimedia and Expo, Toronto, Ont., pp. 1041–1044, July 2006.
[8] Q. Huang, J. Hu, W. Hu, T. Wang, H. Bai and Y. Zhang, “A Reliable Logo and Replay Detector for Sports Video,” 2007 IEEE International Conference on Multimedia and Expo, Beijing, pp. 1695–1698, July 2007.
[9] B. Jähne, “Practical Handbook on Image Processing for Scientific and Technical Applications,” second edition, CRC Press LLC, 2004.
[10] J. Flusser, “Moment Invariants in Image Analysis,” Proceedings of World Academy of Science, Engineering and Technology, vol. 11, pp. 196–201, February 2006.
[11] A. Hesson and D. Androutsos, “Logo and trademark detection in images using Color Wavelet Co-occurrence Histograms,” 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), pp. 1233–1236, March 2008.