Logotype Detection Concept in Video Stream Based on Features Extraction and Features Clustering from Contours and Colors

Mitja Gomboc
Teletech d.o.o., Raziskovalno razvojna enota Teletech
Poljska ulica 6, 2000 Maribor, Slovenia
[email protected]

Iztok Kramberger
University of Maribor, Faculty of Electrical Engineering and Computer Science
Smetanova ulica 17, 2000 Maribor, Slovenia
[email protected]

Abstract—In this paper we propose an approach for identifying logotypes in a video stream. The proposed concept is based on a feature extraction process that combines contour and color information. Contours extracted from the video stream are used to calculate invariant image moments that describe the shape of the logotype. Color information is extracted from normalized color histograms to evaluate the dominant colors. To find the regions of interest we use additional features derived from contour position, distance and color. Once the regions of interest are determined, the features are extracted and clustered together to unambiguously represent logotypes.

Keywords: video analysis, feature extraction, contours, color histograms, data clustering

I. INTRODUCTION

Humans easily interpret visual or auditory information as something meaningful, or discard it right away. The human brain divides visual information into many channels that stream different kinds of information. It has an attention system that identifies important parts of an image to examine while suppressing examination of other areas. Widespread associative inputs from all senses allow the brain to draw on cross-associations made from years of living in the world.

In a computer vision system the visual information is represented as a grid of numbers. For the most part, there is no pattern recognition, no automatic control over senses and no cross-associations from previous experience. The grid of numbers is all the computer receives from visual perception. Based on the received data, computer vision tries to extract visual information for a specific task [1].

The hardest part for computer vision is to interpret visual content on the same level as the human brain does. Even a simple task like logotype detection becomes challenging for computer vision.

Automatic multimedia and video analysis is desired in a wide range of applications for content indexing or content searching. General content recognition in digital sources such as images or videos is one of the primary challenges for computer vision. The source of the digital media is also important: the acquired information should contain enough detail for further image processing.

To identify logotypes we have to assume some limitations. The basic limitations are made from the human point of view: the video has to contain enough visual information for human perception. Important characteristics for human detection include the size of the logotype in the current frame. Color contrast in the scene has to be high enough that the logotype does not blend in with the image background. The continuous duration in the video stream has to be long enough for the logotype to be visually perceived.

The proposed approach combines features based on edge detection and color extraction. Edge detection algorithms are used to extract features such as contours. Contours are used to calculate invariant image moments that describe their shapes and distinguish different contours from each other. Color features are extracted from the color histograms of the regions where the contours are located. The regions of interest in the video stream are described with these features to represent specific visual information such as logotypes.

Extracted features from the video stream could also be used for other post-processing tasks:

• Automatic video analysis.

• Video indexing.

• Content classification.

This paper is organized as follows. A general overview of video analysis and logotype detection is presented in Section II. In Section III we describe the features used in our approach. Section IV describes the logotype detection concept. Section V presents the experimental results. Conclusions and further work are given in Section VI.

II. VIDEO ANALYSIS AND LOGOTYPES DETECTION

A. Video analysis

For efficient video analysis we require sufficient video quality. The base video quality depends on the video source (air or cable) and the video transmission (analog or digital). If the quality of the video source is too low, there may be too much interference, the noise level may be too high, the video resolution may be too low, or video compression may introduce distortion and artifacts into the video image.

The goal of video analysis is to automatically segment, index and label the video sequences in a video stream. For semantic feature detection in a video sequence, low-level feature extraction is performed on the key frames of the video and a feature group including color and texture features is formed. A region that contains all the high-level features is constructed using a clustering method, forming a model that contains all the information [2].

Semantic labeling of images in video sequences can be efficiently achieved with spatiotemporal video segmentation. The process of spatiotemporal semantic segmentation can be subdivided into two stages. First, the video sequence is split into small blocks of frames, and spatiotemporal regions are extracted and labeled individually within each block. Then consecutive blocks are iteratively merged by a matching procedure that considers both semantic and visual properties [3].

Visual content descriptors and their extraction are a crucial problem for state-of-the-art visual information analysis. Detection of visual objects in video sequences and extraction of visual descriptors can be achieved by extracting regions with an efficient contour technique. After detection, visual descriptors of the regions are calculated, including color, motion and shape features that are invariant to affine transformations. These features describe the visual objects extracted from the video [4].

B. Logotypes detection

A logotype is a graphical element that represents a trademark or commercial brand. Logotypes are typically designed for immediate recognition because they identify corporations or non-commercial organizations. We come across logotypes in different kinds of media such as paper documents, television broadcasts, video streams and images.

Automatic logo detection and recognition continues to be of great interest to the document retrieval community because it enables effective identification of the source of a document [5]. There are different approaches in this area: some use segmentation-free and layout-independent methods, while others use principal properties of logotypes such as spatial compactness and colorimetric uniformity [6].

TV logos are generally used by most broadcast stations to claim ownership of the video content or to visually distinguish the broadcast from the interrupting commercial block. Detecting and tracking a TV logo is of interest to TV commercial skipping applications and logo-based broadcast surveillance. There are different approaches based on traditional methods such as frame difference, pixel-based matching or edge matching. Some more advanced techniques use the temporal correlation of multiple consecutive frames, instead of single-frame detection, to extract logo masks and match them against template masks [7]. Others use unsupervised methods to learn the logo or logo movement and extract features that describe the logo. The learned information is then used to detect logos [8].

III. FEATURES

Simple features like color and shape are very easy for humans to understand. A computer vision system, however, calculates a number of different features from pixel information and connects them together to describe a visual object. There are numerous features that can be used for image processing of visual information.

A. Edges

A common method for finding edges is the Canny edge detector. One of the differences between the Canny algorithm and other, simpler algorithms is that in the Canny algorithm the first derivatives are computed in x and y and then combined into four directional derivatives. The most significant new dimension of the Canny algorithm is that it tries to assemble the individual edge candidate pixels into contours [1].

B. Contours

Contours represent 2-D shapes based on detected edges. From contours we can calculate additional features such as moments. Contours are the linking element between edges and moments.

C. Moments

Shape analysis from contours starts with the second-order moments. Moments provide a framework for scale- and rotation-invariant shape parameters. The most challenging task is to find a unique set of features from contours, meaning that different shapes must not be mapped to the same set of features [9]. Hu’s invariant moments result from the theory of algebraic invariants, from which he derived seven invariants to rotation of 2-D objects [10].

D. Color histograms

Histograms can be used to represent the color distribution of an object. The use of histograms to characterize color information is a common technique for image indexing and retrieval. There are different types of techniques, such as the edge gradient histogram or the co-occurrence edge color histogram [11]. Histograms are simply collected counts of the underlying data organized into a set of predefined bins. They can be populated by counts of features computed from the data, such as color or any other characteristic [1].
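As a rough illustration of the binning and normalization just described, the following Python sketch quantizes 8-bit RGB pixels into a fixed number of bins per channel and divides each count by the number of pixels. The function names, the choice of 4 bins per channel and the toy pixel data are assumptions made for this example; the paper does not state its bin layout or color space.

```python
from collections import Counter


def normalized_histogram(pixels, bins_per_channel=4):
    """Map each quantized (r, g, b) bin to the fraction of pixels it holds."""
    if not pixels:
        return {}
    counts = Counter(
        # v * bins // 256 keeps every 8-bit value inside [0, bins - 1].
        tuple(v * bins_per_channel // 256 for v in pixel)
        for pixel in pixels
    )
    total = len(pixels)
    return {bin_id: n / total for bin_id, n in counts.items()}


def dominant_bins(hist, top=2):
    """Return the `top` most populated bins, largest share first."""
    return sorted(hist.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Toy region (hypothetical data): 6 red, 3 white and 1 blue pixel.
region = [(250, 10, 10)] * 6 + [(255, 255, 255)] * 3 + [(0, 0, 200)]
hist = normalized_histogram(region)
print(dominant_bins(hist))
```

Because the shares sum to one, histograms of regions with different pixel counts can be compared directly, which is the practical reason for normalizing.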
We used normalized color histograms to extract dominant color information from regions of interest.

E. Regions of interest

The goal of a region detector is the accurate determination of an object’s location, size and shape. To approximate these parameters efficiently we have to determine regions of interest within each frame. Regions of interest are determined on the basis of contour position. The area containing a contour is described with a simple rectangle. The rectangle represents the basic region of interest for further analysis and feature extraction.

F. Features extraction and clustering

Individual features are extracted from every region of interest and clustered together to describe the visual information about that region. Every region has attached information about position, moments and color histogram.

IV. LOGOTYPES DETECTION CONCEPT

Our approach is based on feature extraction and feature clustering to determine regions of interest that are candidates for logos or logotypes of different trademarks. The first step is to find the candidate regions where logotypes are positioned. Basic regions are determined from features based on edges connected into contours. The next step is to cluster similar nearby regions into bigger regions based on color, position and size. Afterwards we extract features from every region. The features are composed of the color histogram, the moments of the contours and information about the region’s size and position. The basic flowchart of the logotype detection concept is presented in Fig. 1.

The concept has two levels of feature clustering. The first is used to connect similar regions and the second to extract logotype features. Our concept consists of the following steps:

• Detect regions of interest from features extracted from the video frame (edge detection and contours).

• Cluster or discard regions of interest based on color histograms, positioning of adjacent regions and area of the regions.

• Extract all features from the clustered regions of interest, including invariant moments.

• Process the features to detect logotypes.

V. EXPERIMENTAL RESULTS

Experimental results were obtained using the OpenCV framework and the C++ programming language. The test video streams include sports and news broadcasts taken from local digital cable television. Video streams were recorded at a resolution of 720×576.

The number of regions detected depends on the number of contours found in the frame. We can reduce the number of regions of interest if we resize the frame used for contour extraction. Table I shows the average number of regions found for each resize factor in different video streams. However, we have to be careful when selecting the resize factor. At the original image size there are many additional small regions that contain no specific visual information; these small regions act like noise. As we increase the resize factor, the number of regions decreases and small regions disappear. When the resize factor is set too high, the visual information clusters into bigger regions and we lose details that may be important. We chose a resize factor of 4 for further testing with dominant colors.

To further reduce the number of regions that do not include essential visual information we use color information. From the regions of interest we extract the dominant colors based on the color histogram and set various threshold levels for dominant colors. Table II shows the results for various threshold levels. The maximum threshold level of 100 percent means the presence of single-color regions. Logotypes are typically designed for immediate recognition, so we can assume they are not painted with many different colors. Regions of interest with many colors and no specific dominant colors can therefore be discarded from further processing. We can assume the same for regions with only one color.
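The two discarding rules stated above (drop regions with many colors and no dominant ones, and drop single-color regions) can be sketched as a simple filter over normalized histograms. This is only one possible reading: the paper does not define exactly how its threshold level is applied, so the parameters `min_share`, `max_single` and `top`, their values, and the region data below are all assumptions made for illustration.

```python
def dominant_share(hist, top=2):
    """Combined share of the `top` most frequent colors in a normalized histogram."""
    return sum(sorted(hist.values(), reverse=True)[:top])


def is_logotype_candidate(hist, min_share=0.6, max_single=0.95, top=2):
    """Keep regions with a few dominant colors; drop flat and many-colored ones.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    if not hist:
        return False
    if max(hist.values()) >= max_single:
        return False  # practically a single-color region
    return dominant_share(hist, top) >= min_share  # few colors cover most pixels


# Hypothetical regions, each given as color -> share of pixels.
regions = {
    "flat_background": {"green": 1.0},
    "two_color_logo": {"red": 0.55, "white": 0.40, "black": 0.05},
    "crowd": {c: 0.1 for c in "abcdefghij"},
}
kept = [name for name, h in regions.items() if is_logotype_candidate(h)]
print(kept)
```

Only the region whose pixels are covered by two colors survives here; the flat region and the region with ten equally frequent colors are both rejected.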

TABLE I. AVERAGE NUMBER OF REGIONS OF INTEREST WITH DIFFERENT CONTOUR RESIZE FACTOR

Video stream    Contour resize factor
                1       2       4      8      16
Sports          2875    1015    298    101    34
News            1960    779     274    95     31

TABLE II. AVERAGE NUMBER OF REGIONS OF INTEREST WITH VARIOUS THRESHOLD LEVEL FOR DOMINANT COLORS

Video stream    Threshold for dominant colors
                100%    80%     60%    40%    20%
Sports          298     274     235    158    53
News            274     244     208    133    33

[Diagram blocks: Video stream; Frame; Detect regions of interest (Resize, Edge detection, Contours); Color grouping; Position grouping; Features extraction (Contours, Invariant moments, Color histogram, Position); Feature sets of logotype candidates]

Figure 1. Logotype detection diagram
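The feature extraction stage of the diagram derives invariant moments from contours. As a minimal sketch of what such invariants look like, the Python code below computes the first two of Hu's seven invariants from normalized central moments and checks that they do not change when a shape is rotated by 90 degrees and shifted. It works on a filled set of pixel coordinates rather than on a contour, which is a simplification made for this example, and the function name and test shapes are hypothetical. OpenCV, which the experiments use, provides its own routines for this (`cv::moments` and `cv::HuMoments`); whether the authors called them is not stated.

```python
def hu_first_two(points):
    """First two Hu invariants (phi1, phi2) of a set of (x, y) pixel coordinates."""
    m00 = len(points)
    cx = sum(x for x, _ in points) / m00  # centroid removes translation
    cy = sum(y for _, y in points) / m00

    def eta(p, q):
        # Central moment mu_pq, normalized by m00^(1 + (p + q) / 2) for scale.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in points)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return phi1, phi2


# An L-shaped blob of pixels (hypothetical test shape).
shape = [(x, y) for x in range(6) for y in range(2)] + [(0, y) for y in range(2, 5)]
# The same blob rotated by 90 degrees and shifted: (x, y) -> (10 - y, x + 3).
moved = [(10 - y, x + 3) for x, y in shape]
# A 4x4 square for comparison.
square = [(x, y) for x in range(4) for y in range(4)]

a, b = hu_first_two(shape), hu_first_two(moved)
print(a, b, hu_first_two(square))
```

The rotated and shifted copy yields the same pair of values as the original, while the square yields a different pair, which is the property that lets moments distinguish contours regardless of where and in which orientation they appear. On a pixel grid, invariance to scaling holds only approximately.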

VI. CONCLUSION AND FURTHER WORK

After detecting the regions of interest with possible logotypes, we can start with the next step, which includes feature clustering and feature processing to detect and recognize logotypes. For efficient clustering of the extracted features we will have to implement a first level of artificial intelligence. The idea is to put similar features with the same properties in the same cluster. These properties include information about color, shape, size, distance and position. Furthermore, for logotype recognition we need a predefined set of logotypes from previous frames or from an external database of known logotypes.

The concept shows how various simple features like contours and color histograms can be extracted from a video stream and clustered together to form more advanced features that describe visual information. To optimize the detection of logotype candidates, the simple features are filtered based on assumptions made from the human point of view. The most important features for separating logotype candidates from other regions are the size of the region and the dominant colors present in that region.

REFERENCES

[1] G. Bradski and A. Kaehler, “Learning OpenCV,” O’Reilly Media, September 2008, First Edition.

[2] E. Spyrou, G. Koumoulos, Y. Avrithis, S. Kollias, “Using Local Region Semantics for Concept Detection in Video,” 1st International Conference on Semantics And digital Media Technology (SAMT 2006), Athens, Greece, 2006.

[3] E. Galmar, T. Athanasiadis, B. Huet, Y. Avrithis, “Spatiotemporal Semantic Video Segmentation,” Multimedia Signal Processing, 2008 IEEE 10th Workshop, pp. 574–579, Cairns, Qld, October 2008.

[4] P. Tzouveli, G. Andreou, G. Tsechpenakis, Y. Avrithis and S. Kollias, “Intelligent Visual Descriptor Extraction from Video Sequences,” Lecture Notes in Computer Science – Adaptive Multimedia Retrieval, Springer-Verlag, Vol. 3094, 2004, pp. 132–146, Springer Berlin / Heidelberg, January 2004.

[5] Guangyu Zhu, D. Doermann, “Automatic Document Logo Detection,” Document Analysis and Recognition, 2007 (ICDAR 2007), Ninth International Conference, Volume 2, pp. 864–868, September 2007.

[6] Z. Ahmed, H. Felia, “Logos extraction on picture documents using shape and color density,” Industrial Electronics, 2008. ISIE 2008. IEEE International Symposium, pp. 2492–2496, Cambridge, July 2008.

[7] Jinqiao Wang, Lingyu Duan, Zhenglong Li, Jing Liu, Hanqing Lu, J.S. Jin, “A Robust Method for TV Logo Tracking in Video Streams,” Multimedia and Expo, 2006 IEEE International Conference, pp. 1041–1044, Toronto, Ont., July 2006.

[8] Qiao Huang, Jianming Hu, Wei Hu, Tao Wang, Hongliang Bai, Yimin Zhang, “A Reliable Logo and Replay Detector for Sports Video,” Multimedia and Expo, 2007 IEEE International Conference, pp. 1695–1698, Beijing, July 2007.

[9] B. Jähne, “Practical Handbook on Image processing for Scientific and Technical Applications,” CRC Press LLC, 2004, Second Edition.

[10] J. Flusser, “Moment Invariants in Image Analysis,” Proceedings of World Academy of Science, Engineering and Technology, Vol. 11, pp. 196–201, February 2006.

[11] A. Hesson, D. Androutsos, “Logo and trademark detection in images using Color Wavelet Co-occurrence Histograms,” Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference, pp. 1233–1236, March 2008.
