A System for Automatic Detection and Recognition of Advertising Trademarks in Sports Videos
Lamberto Ballan, Marco Bertini, and Arjun Jain
Media Integration and Communication Center, University of Florence, Italy
{ballan, bertini, jain}@dsi.unifi.it
http://www.micc.unifi.it/vim
ABSTRACT
In this technical demonstration we show the current version of our trademark detection and recognition system, developed in collaboration with a sports marketing firm¹ with the aim of evaluating the visibility of advertising trademarks in broadcast sporting events. We propose a semi-automatic system for detecting and retrieving trademark appearances in sports videos. A human annotator supervises the results of the automatic annotation through an interface that shows the time and the position of the detected trademarks. For this reason the system aims for high recall, so that the supervisor can safely skip the parts of the video that have been marked as not containing a trademark, thus speeding up the annotation work.
Categories and Subject Descriptors H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Search process; I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—Video Analysis
General Terms Algorithms, Experimentation
Keywords Video retrieval, trademark matching, sports videos
1. INTRODUCTION
Every year, large and small companies invest huge amounts of money in sports marketing and sponsorships. A large share of these investments pays for the placement of the sponsors’ trademarks and logos on billboards, banners and other physical advertising media. Since the cost of these sponsorships can be extremely high, sponsors require verification of the level of visibility of their trademarks in order to evaluate the return on their investment. This brand visibility evaluation is currently performed by several sports marketing firms through manual inspection of the broadcast videos: several human annotators mark up the start and end time of each trademark appearance, possibly adding a subjective evaluation of its quality. This process is both expensive and slow, so a tool that performs the task automatically is needed.
Most of the research related to trademark recognition deals with content-based indexing and retrieval in logo databases [3, 5], with the goal of assisting the process of trademark registration. In this case the image acquisition and processing chain is controlled, so that the images are of acceptable quality and are not distorted. The problem of trademark recognition in real-world images [2] and videos [1, 4] is inherently harder, due to the relatively low quality of the images (e.g. video interlacing, color sub-sampling, motion blur and compression artifacts).
In this demonstration we present the current version of our trademark detection and recognition system. It is a semi-automatic system for detecting and retrieving trademark appearances in sports videos. The results of the detection/recognition process are shown to the human annotator through a graphical interface that shows the time, a measure of confidence of the appearance and the position of the detected trademarks (Fig. 1), so that the annotator can rapidly validate the results of the automatic annotation.
Figure 1: The interface of our system. The rows at the bottom indicate, with different colors, the timeline of detected trademarks in the video.
Figure 2: Overview of our system.
Figure 3: Robust trademark localization.
¹ Sport System Europe srl (Italy), www.sportsystem.com
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM’08, October 27–November 1, 2008, Vancouver, BC, Canada. Copyright 2008 ACM 978-1-60558-303-7/08/10 ...$5.00.
2. THE SYSTEM
The appearance of trademarks in sports videos is often characterized by perspective deformations, motion blur and occlusions. To obtain a matching technique that is robust to partial occlusions, we use local neighborhood descriptors of salient points; by combining the results of local point-based matching we are able to match entire trademarks. Specifically, we use Difference-of-Gaussians (DoG) interest points and SIFT descriptors as a compact representation of the important aspects and local texture of trademarks (for details see [1]).
2.1 Video Acquisition and Frame Selection
Detecting and describing interest points is very time consuming. Therefore the original MPEG-2 videos (25 fps) are sub-sampled and SIFT points are detected at 5 fps; the frames to process are selected according to their visual quality. Visual quality is estimated from blurriness, number of edges and number of SIFT points (these are used as a hint of how likely trademarks are to be detectable). Frame selection is performed by an SVM classifier with an RBF kernel.
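The sub-sampling step can be illustrated with a minimal sketch. It assumes one plausible reading of the text: the 25 fps stream is split into windows of five frames and the best-quality frame of each window is kept, giving 5 fps. The function name `select_frames` and the per-frame `quality` scores are hypothetical; here the score is a placeholder number standing in for the output of the SVM classifier, which is not reproduced.

```python
def select_frames(quality, src_fps=25, target_fps=5):
    """Return the index of the best-scoring frame in each window.

    quality: one score per source frame (hypothetical stand-in for the
             classifier output computed from blurriness, edges, SIFT count).
    """
    window = src_fps // target_fps  # 25 fps -> 5 fps gives windows of 5 frames
    selected = []
    for start in range(0, len(quality), window):
        chunk = quality[start:start + window]
        # Offset of the highest score inside this window (first one on ties).
        best_offset = max(range(len(chunk)), key=chunk.__getitem__)
        selected.append(start + best_offset)
    return selected


# Toy scores for ten consecutive frames (two windows of five).
scores = [0.2, 0.9, 0.4, 0.1, 0.3, 0.5, 0.5, 0.8, 0.2, 0.1]
print(select_frames(scores))  # -> [1, 7]
```

Picking one frame per window keeps the temporal coverage uniform, which matters when the detections are later shown on a timeline.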
2.2 Detection and Retrieval of Trademarks
Trademarks and video frames are represented as bags of SIFT feature points, and each trademark is represented by one or more graphical instances. Trademark $T_j$ is thus represented by the $N_j$ SIFT points detected in its image (so $|T_j| = N_j$), and each frame $V_i$ of a video $V$ is similarly represented as a bag of $M_i$ points. Detection and retrieval of trademarks is done by comparing the bag of local features representing trademark $T_j$ with the $M_i$ local features detected in the $i$-th frame of the video. The set of matches is denoted $M_{ij}$; therefore $|M_{ij}|$ is the number of SIFT points in $V_i$ that have been matched with the reference trademark $T_j$. Whether a frame contains a trademark is decided by thresholding the normalized match score:
$$\frac{|M_{ij}|}{|T_j|} > \tau \iff \text{trademark } T_j \text{ is present in frame } V_i.$$
In other words, the Normalized Match Threshold (NMT) $\tau$ requires that a certain percentage of the feature points detected in the reference trademark be matched to the frame $V_i$. Experiments have been performed to set $\tau$ according to the appropriate trade-off between precision and recall. Different values of $\tau$ are selected depending on the sport (e.g. $\tau = 0.17$ is used for volleyball videos).
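The thresholding rule can be made concrete with a small sketch. The matching criterion is not specified in this paper (see [1] for details), so the code assumes a nearest-neighbour match with a distance-ratio test, a common choice for SIFT descriptors. The names `count_matches`, `trademark_present` and the `ratio` parameter are hypothetical, and the 2-D tuples stand in for 128-dimensional SIFT descriptors.

```python
import math


def count_matches(trademark_desc, frame_desc, ratio=0.8):
    """Count trademark descriptors with an unambiguous nearest neighbour in the frame.

    Assumption: a match is accepted when the nearest frame descriptor is
    clearly closer than the second nearest (distance-ratio test).
    """
    matched = 0
    for t in trademark_desc:
        dists = sorted(math.dist(t, f) for f in frame_desc)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matched += 1
    return matched


def trademark_present(num_matched, num_trademark_points, tau):
    """Normalized match score test: |M_ij| / |T_j| > tau."""
    return num_matched / num_trademark_points > tau


# Toy descriptors: two trademark points have a clear match, one is ambiguous.
trademark = [(0, 0), (10, 10), (5, 5)]
frame = [(0, 1), (10, 9), (20, 20)]

m = count_matches(trademark, frame)
print(m, trademark_present(m, len(trademark), tau=0.17))  # -> 2 True
```

With $\tau = 0.17$ and a reference trademark of 100 points, at least 18 matched points are needed, since 17/100 does not strictly exceed the threshold.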
2.3 Robust Trademark Localization
In order to localize the trademark in the original frame $V_i$ and to approximate its area, we compute a robust estimate of the centroid of the feature point cloud (Fig. 3). Given the feature point locations $F = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, the robust centroid estimate is computed by iteratively solving for the centroid coordinates $(\mu_x, \mu_y)$:
$$\sum_{i=1}^{n} \psi(x_i; \mu_x) = 0, \qquad \sum_{i=1}^{n} \psi(y_i; \mu_y) = 0,$$
where the influence function $\psi$ is the Tukey biweight and its scale parameter $c$ is estimated using the median absolute deviation from the median. Matched points with a low influence are then discarded from the final match set.
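A minimal sketch of such a robust centroid, solved per coordinate by iteratively reweighted averaging, is given below. Several details are assumptions, since the text does not fix them: the cutoff is taken as $c = k \cdot \mathrm{MAD}$ with a hypothetical multiplier `k = 6.0`, the iteration starts from the median, and "low influence" is read as a Tukey weight at or below `min_weight`. All function names are illustrative.

```python
from statistics import median


def tukey_weight(r, c):
    """Tukey biweight weight w(r) = psi(r) / r; zero at or beyond the cutoff c."""
    if abs(r) >= c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2


def robust_center(values, k=6.0, iters=50, tol=1e-9):
    """Solve sum(psi(v - mu)) = 0 for one coordinate by reweighted averaging."""
    mu = median(values)
    mad = median([abs(v - mu) for v in values])
    c = k * mad  # assumed scale rule: cutoff proportional to the MAD
    if c == 0.0:
        return mu, [1.0 if v == mu else 0.0 for v in values]
    for _ in range(iters):
        weights = [tukey_weight(v - mu, c) for v in values]
        total = sum(weights)
        if total == 0.0:
            break
        new_mu = sum(w * v for w, v in zip(weights, values)) / total
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, [tukey_weight(v - mu, c) for v in values]


def robust_centroid(points, min_weight=0.0):
    """Return the robust centroid and the points kept in the final match set."""
    mu_x, wx = robust_center([p[0] for p in points])
    mu_y, wy = robust_center([p[1] for p in points])
    inliers = [p for p, a, b in zip(points, wx, wy) if min(a, b) > min_weight]
    return (mu_x, mu_y), inliers


# Five matched points around (10, 10) plus one spurious match far away.
points = [(10, 10), (11, 9), (9, 11), (10, 11), (11, 10), (100, 100)]
center, inliers = robust_centroid(points)
print(center, inliers)
```

In this toy example the spurious match at (100, 100) receives zero weight in both coordinates and is dropped, while the centroid stays close to the cluster; the bounding box of the remaining points could then serve as an approximation of the trademark area.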
3. DEMONSTRATION
We demonstrate trademark detection, recognition and localization in videos of several different sports, such as volleyball, MotoGP and soccer. We show how trademarks can be recognized under very challenging conditions, where they are partially occluded or deformed by perspective. The results obtained with our system show that in most cases a precision of about 85% can be achieved at a recall of around 80%.
4. REFERENCES
[1] A. D. Bagdanov, L. Ballan, M. Bertini, and A. Del Bimbo. Trademark matching and retrieval in sports video databases. In Proc. of ACM Multimedia Information Retrieval, Augsburg, Germany, 2007.
[2] S. Sanyal and S. H. Srinivasan. Logoseeker: A system for detecting and matching logos in natural images. In Proc. of ACM Multimedia, Augsburg, Germany, 2007.
[3] J. Schietse, J. P. Eakins, and R. C. Veltkamp. Practice and challenges in trademark image retrieval. In Proc. of ACM CIVR, Amsterdam, NL, 2007.
[4] A. Watve and S. Sural. Soccer video processing for the detection of advertisement billboards. Pattern Recognition Letters, 29(7):994-1006, 2008.
[5] P. Y. Yin and C. C. Yeh. Content-based retrieval from trademark databases. Pattern Recognition Letters, 23(1-3):113-126, 2002.