Logo Detection and Recognition in Video Stream
Syed Yasser Arafat¹, Syed Afaq Husain², Iftikhar Azim Niaz³, Muhammad Saleem⁴
¹Mirpur University of Science & Technology (MUST), Mirpur, Azad Kashmir; ²Air University; ³Riphah International University; ⁴International Islamic University, Islamabad, Pakistan
ABSTRACT Logos, sometimes also known as trademarks, are highly important in today's marketing world. Products, companies and different gaming leagues are often recognized by their respective logos. In this paper, an implemented system for logo detection and recognition in video streams is presented. A feature set of SIFT (Scale Invariant Feature Transform) keypoints is created for each query logo and for the video frame being processed. The system uses bidirectional (two-way) matching of features. Model fitting is carried out using the Random Sample Consensus (RANSAC) algorithm to suppress outliers from the two-way matching process. A novel threshold scheme is then used to make the final decision regarding the presence of a logo. Finally, a boundary is detected around the matched logo. The system provides various statistics, such as the frequency of recognized logos, their visibility time and their locations in the video. Experiments were performed on various sports videos. The results show that the developed system achieves a precision of 89% and a recall of 99%.

Keywords Computer vision, image processing, video processing, pattern matching, SIFT, model fitting

978-1-4244-7571-1/10/$26.00 ©2010 IEEE

I. INTRODUCTION Logos, being symbols of identification for companies, products, leagues and counties, are very important at various national and international events such as football and cricket matches. Companies spend millions of dollars to promote their brands, products, ideas and services. To justify the huge amount of money spent by sponsors, the companies need to know the visibility duration of their logos in live aired videos and their estimated positions in the video.

Detection of logos in recorded or live videos is therefore a hot issue among business enterprises. Companies are keen to assess and tap the worth of their investments. There is nothing more attractive to a researcher than an industry demand, and logo detection and recognition in videos has recently attracted the attention of pattern recognition researchers.

Logo recognition in a video stream is not a straightforward issue [19]. Firstly, logos can be of many types, e.g., watermark, letter mark, iconic and alpha-glyph. Figure 1 shows various types of logos. From the image processing point of view, logos can be composed of horizontal lines, vertical lines, a mixture of horizontal and vertical lines, text, graphics, or a mixture of both of these. In video coverage, transformations such as scale, rotation, shearing and translation make it even more difficult to detect and recognize a logo. Currently this task is often done manually by human annotators, who personally view the videos and then compile the statistics using their judgment. This manual compilation is a tedious and time-consuming job. In this paper, we discuss an implemented system that has been developed to detect and recognize logos and that extends our earlier work [1]. The flowchart in Figure 2 gives an overview of the developed system. The user simply chooses the video to be checked and the target logo image to be detected. The system then processes the video for logo detection and recognition and generates statistics such as the frequency of detected logos, their appearance time and their spatial locations in the video.

Figure 1 Some query logos used for experimentation

The paper is organized as follows. Section II gives an overview of the literature. Section III provides the details of the implemented system. Section IV discusses the experiments performed and their results. In Section V we conclude and discuss future work.

II. LITERATURE REVIEW Various techniques for logo detection and recognition have been devised in the recent past [10, 11, 19, 21, 23, 24, 25]. Most of these techniques have been used in commercial projects. Due to their commercial nature, the implementation details of these techniques are neither clear nor easily available for review. Various techniques for logo retrieval in static images from databases have been proposed, such as geometric invariants [8] and a CBIR system using Fourier descriptors [9]. Images generally have local and global properties, and the authors of [10, 11, 12, 20] have used global and local features for logo classification. More recently, researchers have shifted their attention to logo detection and recognition in videos. Techniques for video processing include Gaussian receptive fields [17]. Trademark matching has also been done using the Scale Invariant Feature Transform (SIFT) [5, 6, 7, 16, 2], and [18] has used the K-means method to build a visual logo dictionary and Latent Semantic Analysis techniques for retrieval. Detection of logos in outdoor environments has been achieved by spatial pyramids and learning spatial configurations of local features [24]. Direct-Info [3] is the most comprehensive application, involving audio, video, semantic analysis, scene detection, event detection, logo recognition, text analysis, etc. However, details of steps such as logo localization have not been revealed. Commercial Monitoring System [4] is used for scene change detection and dominant color information; it is a global measure and not suitable for logo identification.
The current research is generally focused on detection and recognition of logos from still images and video frames. The localization and recognition of logos from video is the challenge undertaken in this study. We have developed a system for detection and recognition of predefined corporate logos in videos; it improves on the concept presented in [1] by introducing bidirectional matching and model fitting by RANSAC [14, 15, 22].

III. DEVELOPED SYSTEM A system flow chart, illustrating the various steps enumerated below, is shown in Figure 2.

Figure 2 Block diagram of the implemented system

Step 1 Video Playback and Logo Reading In this step the video is played, video frames are grabbed, and the logo image is read from an image file or captured from within a video frame.

Step 2 Feature Extraction Feature extraction is done using SIFT, which is inspired by the responses of neurons in the inferior temporal cortex in primate vision [16]. At the heart of SIFT is the Difference of Gaussian (DoG) operation, which acts like a band-pass filter and approximates the Laplacian of Gaussian. In the SIFT framework, feature points are termed keypoints. SIFT is a feature detector as well as a descriptor. The key steps in SIFT are as follows:
• Invariant Scale Detection
• Keypoint Localization
• Orientation Assignment
• Descriptor Building
A Gaussian kernel is used to create a scale space that is invariant to scale. In this case the image I is convolved with G, where G is a Gaussian function, to obtain the smoothed scale-space image L.

Figure 5 Keypoint orientation on queried logo
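For reference, the scale-space construction mentioned in Step 2 is usually written as below. This is the standard formulation from Lowe [16], not an equation recovered from this paper; the symbols σ (scale) and k (the constant multiplicative factor between adjacent scales) are taken from that formulation and are assumptions as far as the present system is concerned.

```latex
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \, e^{-(x^{2} + y^{2}) / (2\sigma^{2})}

D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)
```

Here * denotes convolution and D is the Difference of Gaussian image whose local extrema across position and scale are taken as candidate keypoints.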
Finally, each logo is represented by a set of SIFT feature points $Lf_i$, where i denotes the i-th logo:

$Lf_i = \{x_j, y_j, sc_j, \alpha_j, d_j\}$ for $j \in \{1, 2, \ldots, N_j\}$ (4)

where $N_j$ is the number of keypoints for a particular logo.

All such cases of noisy or outlying data fall into the category of practical problems faced during data acquisition.

RANSAC Algorithm [13]
• Randomly select a sample of s data points from S and instantiate the model from this subset.
• Determine the set of data points $S_i$ which are within a distance threshold t of the model. The set $S_i$ is the consensus set of the sample and defines the inliers of S.
• If the size of $S_i$ is greater than some threshold T, re-estimate the model using all the points in $S_i$ and terminate.
• If the size of $S_i$ is less than T, select a new subset and repeat the above.
• After N trials the largest consensus set $S_i$ is selected, and the model is re-estimated using all the points in the subset $S_i$.
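The RANSAC steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the system: the paper does not state which geometric model is fitted, so a pure translation between query and frame keypoints is assumed here, and the function names and default parameter values are hypothetical.

```python
import random


def estimate_translation(pairs):
    """Least-squares translation (dx, dy) mapping query points onto frame points."""
    n = len(pairs)
    dx = sum(f[0] - q[0] for q, f in pairs) / n
    dy = sum(f[1] - q[1] for q, f in pairs) / n
    return dx, dy


def residual(model, pair):
    """Distance between the translated query point and its matched frame point."""
    (qx, qy), (fx, fy) = pair
    dx, dy = model
    return ((qx + dx - fx) ** 2 + (qy + dy - fy) ** 2) ** 0.5


def ransac_translation(pairs, s=1, t=3.0, T=None, N=100, seed=0):
    """RANSAC with an assumed translation-only model (illustrative simplification).

    pairs: list of ((xq, yq), (xf, yf)) matched keypoint coordinates.
    s: sample size, t: distance threshold, T: early-stop consensus size,
    N: maximum number of trials.
    """
    rng = random.Random(seed)
    if T is None:
        T = len(pairs)  # default: never stop early, always run N trials
    best = []
    for _ in range(N):
        sample = rng.sample(pairs, s)          # random subset of s points
        model = estimate_translation(sample)   # instantiate model from the subset
        consensus = [p for p in pairs if residual(model, p) <= t]
        if len(consensus) > len(best):
            best = consensus                   # keep the largest consensus set
        if len(consensus) > T:
            break                              # large enough: terminate early
    if not best:
        raise ValueError("no consensus set found")
    # Re-estimate the model using all points of the selected consensus set.
    return estimate_translation(best), best
```

Calling `ransac_translation(pairs, t=1.0)` on a list of matched coordinates returns the fitted translation together with the inlier matches; the number of inliers is the quantity a later decision threshold would operate on.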
Step 3 Bi-Directional Matching In general, whenever keypoints are matched between the given logo image and the video frame, one keypoint in the logo image may match multiple keypoints in the video frame, and vice versa. This disrupts the recognition results and makes logo localization ambiguous. Although a similar concept of bi-directional matching exists [21], our approach focuses on spatial consistency: features detected in both the query image Q and the frame F being processed should point to the same locations in Q and F.
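The two-way consistency idea of this step can be sketched as follows. The matching criterion is not specified in the text, so nearest-neighbour matching by squared Euclidean distance between descriptors is an assumption, and the function names are hypothetical; only the rule that a match must agree in both directions comes from the description above.

```python
def nearest(desc, candidates):
    """Index of the candidate descriptor closest to desc (squared Euclidean distance)."""
    return min(
        range(len(candidates)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, candidates[k])),
    )


def bidirectional_matches(frame_desc, query_desc):
    """Return (frame_index, query_index) pairs that agree in both directions."""
    if not frame_desc or not query_desc:
        return []
    # Forward pass: each frame keypoint -> its best query keypoint.
    forward = [(i, nearest(d, query_desc)) for i, d in enumerate(frame_desc)]
    # Reverse pass: each query keypoint -> its best frame keypoint.
    reverse = {(j, nearest(d, frame_desc)) for j, d in enumerate(query_desc)}
    # Keep a forward match (i, j) only if the reverse pass also produced (j, i).
    return [(i, j) for i, j in forward if (j, i) in reverse]
```

With this filter, several frame keypoints that all point to the same query keypoint are reduced to the single pair that the query keypoint itself selects, which removes the one-to-many ambiguity described above.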
In bi-directional (forward and reverse) matching, consistent keypoints in the query logo and the video frame are sieved out. If mvk and mq are the matched keypoints of the video frame and the query logo respectively, then the consistent features Nmatch are obtained through bidirectional matching as

$\sum_{i=1}^{len1} \sum_{j=1}^{len2} \big( mvk(1,i) = mq(2,j) \big) \ \text{and} \ \big( mvk(2,i) = mq(1,j) \big)$ (5)

$Nmatch(1,j) = mq(2,j), \quad Nmatch(2,j) = mq(1,j)$

Here len1 and len2 are the numbers of keypoints in the video frame and the query logo respectively. Nmatch finally holds the coordinates of the consistent keypoints.

Step 4 Model Fitting For the consistent keypoints obtained, model fitting of the query logo to the video frame is done using Random Sample Consensus (RANSAC) [14, 15, 22]. RANSAC is used to estimate the parameters of a mathematical model from a set of observed data containing noise or outliers. It is assumed that the data consist mostly of inliers, which can be explained by a set of model parameters, and of outliers, which do not fit the model. In addition, data points can be subject to noise or extreme values, which can result from erroneous measurements or even represent an incorrect hypothesis, resulting in a Type-I error.

RANSAC has the complexity of Wn. Lowe [5, 16] has used the Hough Transform, which has a computational complexity of O(NM). We have preferred RANSAC over the Hough Transform because it yields better results [22].

Step 5 Final Threshold Panorama stitching [7], another implementation of SIFT, has used a threshold value similar to equation 6 for object recognition:

If $\Theta > (\alpha + 0.22 \times \text{Number of features})$ (6)

In our implementation, it has been found empirically that equation 7 yields better results:

$\Theta > (\alpha + \beta \times \text{Number of features})$ (7)

Here $\Theta$ is the number of inliers, $\alpha$ has a value of 5.9 and $\beta$ has a value of 0.11; $\beta$ controls the contribution of the matched features.

Step 6 Boundary Box Detection The keypoints obtained after model fitting are used to localize the logo [23]. The matched keypoints mk are

$mk = \{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)\}$ (8)

The centroid C is calculated by solving the following equations for the means $\mu_x$ and $\mu_y$ of the x and y coordinates respectively:

$\sum_{i=1}^{n} \psi(x_i; \mu_x) = 0$ (9)

$\sum_{i=1}^{n} \psi(y_i; \mu_y) = 0$ (10)

where $\psi$ is the Tukey biweight function [1]. The scale of the potentially identified logo is estimated using the median absolute deviation (MAD) from the computed median for the x-axis and y-axis:

$MAD_x = \mathrm{median}_i(|x_i - \mathrm{median}_j(x_j)|)$
$MAD_y = \mathrm{median}_i(|y_i - \mathrm{median}_j(y_j)|)$

For each point in mk, the distance from C is calculated. Points with low influence are excluded from the final matched keypoints mk. Finally, a rectangle is drawn around the identified logo.
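A possible reading of the boundary box step is sketched below. Several details are assumptions rather than facts from the text: equations 9 and 10 are solved by iteratively reweighted averaging, the MAD is rescaled by 1.4826, the biweight tuning constant is 4.685 (common defaults in robust statistics), and "low influence" is interpreted as a biweight weight of zero. The function names are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    m = median(values)
    return median(abs(v - m) for v in values)


def tukey_location(values, c=4.685, iterations=50, tol=1e-9):
    """Solve sum(psi(v_i; mu)) = 0 for mu with Tukey biweight weights.

    Returns the robust location mu and the final weight of every value.
    """
    mu = median(values)
    scale = 1.4826 * mad(values) or 1.0  # fall back to 1.0 if the MAD is zero
    weights = [1.0] * len(values)
    for _ in range(iterations):
        u = [(v - mu) / (c * scale) for v in values]
        # Biweight: smoothly down-weight distant points, zero beyond the cutoff.
        weights = [(1 - ui * ui) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
        total = sum(weights)
        if total == 0:
            break
        new_mu = sum(w * v for w, v in zip(weights, values)) / total
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, weights


def logo_bounding_box(points):
    """Robust centroid and bounding rectangle (xmin, ymin, xmax, ymax) of keypoints."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, wx = tukey_location(xs)
    cy, wy = tukey_location(ys)
    # Drop points whose weight is zero on either axis (assumed "low influence").
    kept = [p for p, a, b in zip(points, wx, wy) if a > 0 and b > 0] or points
    kx = [p[0] for p in kept]
    ky = [p[1] for p in kept]
    return (cx, cy), (min(kx), min(ky), max(kx), max(ky))
```

The effect is that a stray keypoint far from the main cluster neither shifts the centroid nor enlarges the rectangle drawn around the logo.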
IV. EVALUATION & EXPERIMENTAL RESULTS For experimentation purposes, 13 different videos comprising a total of 3146 frames at various resolutions were used. In the developed system, a video is first selected to be processed for logo recognition and then a query logo is selected. The Recognize Logo button is then pressed, as shown in Figure 6. The results are produced by the reporting tool, as shown in Figure 7. A visualization of the detected logos is shown in Figure 8; alternatively, cropped logos can be found in a recognition folder. With our system we have achieved a precision of 89% and a recall of 99%, as shown in Figure 9. Figures 10, 11 and 12 show snapshots of the results for various queried logos along with the corresponding keypoint matches.

Figure 6 Logo Selection from Screen-I
Figure 7 Video recognition Statistics Report
Figure 8 Logo locations visualization
Figure 9 Precision recall rate for 13 experimented videos
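The text does not spell out how the two rates were computed. Assuming the usual definitions, with TP the number of correctly detected logo occurrences, FP the number of false detections and FN the number of missed occurrences (symbols that are not used in the paper), they would read:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
```

Under these definitions, a recall of 99% with a precision of 89% means that almost no logo occurrences are missed, while roughly one in nine reported detections is a false alarm.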
Figure 10 Keypoints visualized for MASTER CARD logo
Figure 11 Keypoints visualized for MAK Gold logo
Figure 12 Keypoints visualized for PEPSI logo

V. CONCLUSION AND FUTURE WORK A system for logo detection and recognition has been successfully developed. The system has shown robust results against various kinds of image transformations and background clutter. The results show that our system achieves a precision of 89% and a recall of 99%. There is a lot of scope for further work in this area of computer vision. Currently the system is not optimized for real-time processing or for detecting multiple logos in a single frame. In future we plan to optimize the system for time efficiency. Other directions include recognition of 3D objects and of non-planar and deformable objects.

REFERENCES
[1] S. Yasser Arafat, Muhammad Saleem, S. Afaq Husain, Comparative Analysis of Invariant Schemes for Logo Classification, in Proceedings of the 5th International Conference on Emerging Technologies, ICET 2009, Islamabad, Pakistan, pages 256-261.
[2] Andrew D. Bagdanov, Lamberto Ballan, Marco Bertini, Alberto Del Bimbo, Trademark Matching and Retrieval in Sports Video Databases, MIR'07, September 28-29, 2007, Augsburg, Bavaria, Germany.
[3] G. Kienast, H. Rehatschek, A. Horti, S. Buseman, T. Declereck, V. Hahn, R. Cavet, DIRECT-INFO, A Media Monitoring System for Sponsorship Tracking, Proceedings of the ACM SIGIR Workshop on Multimedia Information Retrieval, 2005, Salvador, Brazil.
[4] Sung Hwan Lee, Won Young Yoo, Young Suk Yoon, A visual feature based video identifying system for the TV commercial's monitoring, 8th International Conference on Advanced Communication Technology, ICACT 2006, Phoenix.
[5] D. Lowe, Local Feature View Clustering for 3-D Object Recognition, Proceedings of the Conference on Computer Vision and Pattern Recognition, 2001.
[6] D. Lowe, Object Recognition from Local Scale Invariant Keypoints, Proceedings of the International Conference on Computer Vision, pages 1150-1157, 1999.
[7] Matthew Brown and David G. Lowe, Recognizing Panoramas, International Conference on Computer Vision, ICCV 2003, France, 2003.
[8] David, Ehud, Issac, Logo Recognition using geometric Invariants, 08186-4960-7/93, IEEE, 1993.
[9] Andre Folkers, Hanan Samet, Content-Based Image Retrieval using Fourier Descriptors on a Logo Database, 16th International Conference on Pattern Recognition (ICPR'02), Volume 3, p. 30521.
[10] A. Soffer, H. Samet, Using Negative Shape Features For Logo Similarity Matching, in Proceedings of ICPR'98, pages 571-573.
[11] J. Neuman, H. Samet, A. Soffer, Integration of Local and Global Shape Analysis for Logo Classification, Pattern Recognition Letters 23(12), 1449-1457, 2002.
[12] Aya Soffer, Hanan Samet, Negative Shape Features for Image Databases Consisting of Geographic Symbols, in Advances in Visual Form Analysis, pages 569-581, Singapore, 1997, World Scientific.
[13] Marc Pollefeys, http://www.inf.ethz.ch/personal/pomarc/teaching.html
[14] Peter Kovesi, http://www.csse.uwa.edu.au/~pk/Research/MatlabFns/index.html
[15] Martin A. Fischler and Robert C. Bolles, Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography, Communications of the ACM, vol. 24, no. 6, pp. 381-395, 1981.
[16] David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
[17] Fabien Pelisson, Daniela Hall, Olivier Riff, and James L. Crowley, Brand Identification Using Gaussian Derivative Histograms, ICVS 2003, Springer-Verlag, Berlin Heidelberg, 2003.
[18] J. Wang, Q. Liu, J. Liu, Hanqing Lu, Logo Retrieval with Latent Semantic Analysis, Asia-Pacific Workshop on Visual Information Processing, Nov 7-9, 2006.
[19] J. Schietse, J. P. Eakins, and R. C. Veltkamp, Practice and Challenges in Trademark Retrieval, in Proc. of ACM CIVR, Amsterdam, NL, 2007.
[20] J.P. Eakins, J.M. Boardman, M.E. Graham, Similarity Retrieval of Trademark Images, ACM Multimedia '98, pages 53-63.
[21] Marwan A. Mattar, Allen R. Hanson, and Erik G. Learned-Miller, Sign Classification using Local and Meta-Features, IEEE CVPR Workshop on Computer Vision Applications for the Visually Impaired, San Diego, CA, June 2005.
[22] Brad Grinstead, Andreas Koschan, Andrei Gribok, and Mongi A. Abidi, Improving Video-Based Robot Self Localization Through Outlier Removal, Proc. of 1st Joint Emer. Prep. and Response/Robotic & Remote Syst. Top. Mtg., Salt Lake City, UT, Feb 2006.
[23] Lamberto Ballan, Marco Bertini, and Arjun Jain, A System for Automatic Detection and Recognition of Advertising Trademarks in Sports Videos, in Proc. of ACM International Conference on Multimedia (MM), Vancouver, BC, 2008.
[24] J. Kleban, X. Xie, and W. Y. Ma, Spatial Pyramid Mining for Logo Detection in Natural Scenes, in Proc. of IEEE ICME, Hannover, Germany, 2008.
[25] K. Gao, S. Lin, Y. Zhan, S. Tang, Logo Detection Based on Spatial-Spectral Saliency and Partial Spatial Context, in Proc. of IEEE ICME, New York, USA, 2009.