Goal Event Detection in Soccer Videos via Collaborative Multimodal Analysis

Alfian Abdul Halin1* and Mandava Rajeswari2

1 Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
2 School of Computer Sciences, Universiti Sains Malaysia, 11800 Minden, Penang, Malaysia
ABSTRACT
Keywords: indexing, webcasting text
INTRODUCTION

Technological advances have greatly enhanced the broadcast, capture, transfer and storage of digital video (Tjondronegoro et al.). The resulting growth of digital video repositories has spurred interest in automatic indexing and retrieval techniques, especially those that cater for content-based semantics.
Restricting the domain being addressed is, to some extent, imperative in order to bridge the semantic gap between low-level features and high-level semantic concepts. The sports domain in particular has been used to extract important semantic concepts, such as serves and rallies in tennis (Huang et al.), and to support applications such as posterity logging (Assfalg et al.). The long running time of matches and the sparseness of event occurrences further complicate matters, since traditional manual browsing and annotation become impractical.
RELATED WORKS

A great body of literature has been dedicated to event or highlight detection in soccer, as well as in other sports (Jinjun et al.). In many of these works, detection is carried out using supervised learning algorithms, which discover the audiovisual patterns that accompany specific events. Some events, however, are not detected due to their feature patterns being less prominent during event occurrences. Unsupervised algorithms, on the other hand, face the challenge of recognizing event patterns from the majority of non-event segments. As for supervised learning algorithms, their robustness may be questionable since, for some events, representative training examples are scarce.
CONTRIBUTIONS

Instead of relying solely on features directly extracted from the video, an external textual resource was utilized to initiate event detection. Problems such as the huge and asymmetric search space are solved by utilizing the minute-by-minute webcasting text, which provides detailed and reliable annotations of a match's progression; from it, two crucial cues (namely, the event name and its minute time-stamp) are extracted, as by Changsheng et al. Moreover, all audiovisual considerations were made solely within each video itself, without relying on any pre-trained models; this study analyzed the visual and aural information only from the particular video under analysis. All the audiovisual considerations and assumptions are uncomplicated and therefore able to generalize across different broadcasters.
FRAMEWORK FOR GOAL EVENT DETECTION
Video Pre-processing
Shot Boundary Detection
Using an existing shot boundary detection algorithm, each video V is first segmented into m shots, represented as V = {S_1, S_2, ..., S_m}. Each shot is then classified as containing either far views or close-up views.
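To make the segmentation step concrete, here is a minimal sketch that uses a simple colour-histogram cut detector as a stand-in for the shot boundary detection algorithm the paper relies on; the threshold and histogram settings are illustrative assumptions.

```python
# Minimal sketch of the shot segmentation step V = {S_1, ..., S_m}, using a
# simple colour-histogram cut detector (a stand-in, not the paper's method).
import cv2

def detect_shot_boundaries(video_path, threshold=0.5):
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [0], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        hist = cv2.normalize(hist, None).flatten()
        # A large Bhattacharyya distance between consecutive frames
        # signals a hard cut, i.e. the start of a new shot.
        if prev_hist is not None:
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries  # frame indices where shots S_1, ..., S_m begin
```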
Each frame within a shot is labelled through the following steps:

1. Dominant Hue Detection. The peak index of the frame's hue histogram, idx_peak, is computed. If hues outside the green range of the playing field are detected as dominant, an immediate close-up view label can be assigned, since this highly suggests the frame is not dominated by the pitch.
2. If idx_peak falls within the green range, the pitch region is determined with the optimal value for the saturation channel, after which morphological image processing and connected components analysis are applied (Halin et al.).
3. Larger remaining objects indicate close-up views, whereas smaller objects indicate far views. This avoids relying on the grass-pixel ratio alone, which, as suggested in Halin et al., can be unreliable. Each shot is finally labelled as either a close-up or a far view based on the majority voting of all the frame labels within it. A code sketch of these steps is given below.
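The following is a minimal per-frame sketch of the three steps using OpenCV. The green hue range, saturation/value bounds, morphology kernel and area-ratio threshold are illustrative assumptions rather than the paper's tuned values; the shot label is the majority vote over its frame labels.

```python
# Sketch of per-frame far-view / close-up labelling via dominant hue and
# connected components analysis. All numeric parameters are assumptions.
from collections import Counter

import cv2
import numpy as np

GREEN_HUE = (35, 85)  # assumed OpenCV hue range of the pitch

def label_frame(frame_bgr, area_ratio_thresh=0.05):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Step 1: dominant hue = peak index of the hue histogram.
    hist = cv2.calcHist([hsv], [0], None, [180], [0, 180]).ravel()
    idx_peak = int(np.argmax(hist))
    if not (GREEN_HUE[0] <= idx_peak <= GREEN_HUE[1]):
        return "close-up"  # frame is not dominated by the pitch
    # Step 2: mask pitch pixels (assumed saturation/value bounds), then
    # clean the mask with morphological opening.
    mask = cv2.inRange(hsv, (GREEN_HUE[0], 40, 40), (GREEN_HUE[1], 255, 255))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    # Step 3: the size of the largest non-pitch object decides the label;
    # large objects (players shown up close) imply a close-up view.
    n, _, stats, _ = cv2.connectedComponentsWithStats(cv2.bitwise_not(mask))
    biggest = max((stats[i, cv2.CC_STAT_AREA] for i in range(1, n)), default=0)
    return "close-up" if biggest / float(mask.size) > area_ratio_thresh else "far"

def label_shot(frames):
    """Majority vote of the frame labels within one shot."""
    return Counter(label_frame(f) for f in frames).most_common(1)[0][0]
```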
Textual Cues Utilization
Two crucial cues are extracted from the webcasting text source, namely, the event name and its minute time-stamp.
Goal Event Keyword Matching and Time Stamp Extraction
Goal events are identified by matching the webcasting text against the keyword set

G = {goal!, goal by, scored, scores, own goal, convert}    (1)

Each match yields a goal event g_i together with the minute time-stamp t_i^g of each of the i detected events. These can be written as a set T^g = {t_i^g}, where i = 1, ..., N^g. Then, for each i, the goal event search is initiated within the one-minute segment corresponding to each t_i^g.
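A minimal sketch of this keyword matching follows, assuming webcasting-text lines of the form "56' Goal by ..." (the exact line format is an assumption about the text source).

```python
# Sketch of matching webcasting-text lines against the keyword set G and
# extracting the minute time-stamps T^g.
import re

G = ["goal!", "goal by", "scored", "scores", "own goal", "convert"]

def extract_goal_timestamps(webcast_lines):
    """Return T^g, the minute time-stamps t_i^g of lines matching G."""
    stamps = []
    for line in webcast_lines:
        m = re.match(r"\s*(\d+)'?\s+(.*)", line)
        if m and any(k in m.group(2).lower() for k in G):
            stamps.append(int(m.group(1)))
    return stamps

# extract_goal_timestamps(["56' Goal by A. Striker", "58' Yellow card"])
# -> [56]
```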
Text-Video Synchronization and Event Search Localization

Mapping the time-stamp t_i^g to the corresponding video frames can be erroneous due to the misalignment between t_i^g and its corresponding video time.
To synchronize, a reference time t_ref and its corresponding video frame f_ref are first identified. The values t_ref and f_ref can then be used to localize the event search to the one-minute eventful segment. With t_i^g being the minute time-stamp of the goal event, the beginning (f_{i,begin}^g) and ending (f_{i,end}^g) frames of the segment are computed as:

f_{i,begin}^g = f_ref + fr × ((t_i^g − 1) × 60 − t_ref)    (2)

f_{i,end}^g = f_{i,begin}^g + 60 × fr    (3)

where fr is the video frame rate. Note that for f_{i,begin}^g, the time-stamp t_i^g (after being converted to seconds) is subtracted by one minute, i.e. t_i^g − 1 is used, since an event reported at minute t_i^g occurs between minutes t_i^g − 1 and t_i^g; f_{i,end}^g is then set one minute after f_{i,begin}^g.
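Equations (2) and (3) translate directly into code. The sketch below assumes t_ref is expressed in seconds of match time and f_ref is its frame index:

```python
# Sketch of equations (2) and (3): the reference pair (t_ref in seconds,
# f_ref as a frame index) anchors text time to video time.
def localize_segment(t_g_minutes, t_ref, f_ref, fr):
    """Return (f_begin, f_end) of the one-minute search window."""
    seconds = (t_g_minutes - 1) * 60        # event lies within minute t_i^g
    f_begin = int(f_ref + fr * (seconds - t_ref))
    f_end = f_begin + int(60 * fr)          # one minute after f_begin
    return f_begin, f_end

# With kick-off at t_ref = 0 s mapped to f_ref = 1500 and fr = 25 fps:
# localize_segment(56, 0, 1500, 25) -> (84000, 85500)
```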
Candidate Shortlist Generation
The goal segment is searched for within [f_{i,begin}^g, f_{i,end}^g]. Across different broadcasters (including the footage used in this paper), three generic visual-related properties can be observed during goal events. These properties were exploited to decompose the one-minute segment into a shortlist of candidate segments; among them:

1. The camera transitions from a far view to a close-up view;
2. Close-up views during goals normally last at least 6 seconds.

Since the search has already been localized to the one-minute eventful segment, detecting other events is very unlikely. A code sketch of this decomposition follows equation (4) below.
Based on these properties, an n-number of candidate segments is generated:

C_i^g = {c_{ik}^g}, for k = 1, ..., n    (4)

where C_i^g is the set containing the shortlisted candidates, and c_{ik}^g is the k-th candidate segment within C_i^g.
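The decomposition referenced above can be sketched as follows, using the per-frame far/close-up labels from the pre-processing stage; the 6-second rule follows property 2, while the names and structure are illustrative:

```python
# Sketch of decomposing the localized window into candidate segments c_ik^g:
# keep every far-to-close-up transition whose close-up run lasts >= 6 s.
def shortlist_candidates(frame_labels, f_begin, fr, min_closeup_s=6):
    """frame_labels: 'far'/'close-up' per frame of the one-minute window."""
    candidates, i, n = [], 1, len(frame_labels)
    while i < n:
        if frame_labels[i - 1] == "far" and frame_labels[i] == "close-up":
            j = i
            while j < n and frame_labels[j] == "close-up":
                j += 1
            if (j - i) >= min_closeup_s * fr:          # property 2
                candidates.append((f_begin + i, f_begin + j - 1))
            i = j
        else:
            i += 1
    return candidates  # the set C_i^g as (start_frame, end_frame) pairs
```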
Candidate Ranking

At this stage, we have obtained the candidate segments c_{ik}^g, where one of them is the actual goal segment; an audio-based measurement is therefore needed from each c_{ik}^g. The pitch, or the fundamental frequency f_0, of an audio signal is reliable for detecting excited human speech. The subharmonic-to-harmonic ratio pitch determination algorithm, whose implementation is called shrp.m, was chosen as it managed to accurately capture the average f_0 measurement within each candidate. The rule being applied here is that the candidate containing the goal event will cause commentator excitement, raising the f_0 values across its audio frames and leading to a high average f_0. The candidate c* with the maximum average f_0 among all c_{ik}^g is therefore selected as the goal segment.
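A sketch of this ranking rule follows. The paper computes the average f_0 with the subharmonic-to-harmonic ratio tracker (shrp.m); librosa's YIN pitch tracker and the 80-400 Hz search range are substituted here purely for illustration.

```python
# Sketch of the ranking rule: pick the candidate with the maximum average
# f0. YIN is a stand-in for the paper's shrp.m pitch tracker.
import librosa
import numpy as np

def rank_candidates(audio_path, candidates, fr, sr=22050):
    """candidates: (start_frame, end_frame) pairs; returns index of c*."""
    y, sr = librosa.load(audio_path, sr=sr)
    mean_f0 = []
    for f_start, f_end in candidates:
        s0, s1 = int(f_start / fr * sr), int(f_end / fr * sr)
        f0 = librosa.yin(y[s0:s1], fmin=80, fmax=400, sr=sr)
        mean_f0.append(float(np.mean(f0)))  # average f0 over audio frames
    return int(np.argmax(mean_f0))          # top-ranked candidate c*
```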
EXPERIMENTAL RESULTS AND DISCUSSION
The experiments were conducted on a set of soccer matches from different broadcasters.

[Table 1: The matches (teams) used in the experiments.]
Note that for the following sub-sections of Candidate Shortlist Generation and Candidate Ranking, the evaluation criteria used are precision and recall. Their definitions are adapted (notably in Sub-section Candidate Shortlist Generation) to cater for each of these contexts, which will be further explained in detail within each of the respective sub-sections.
Frame subsets from different matches were used to demonstrate the robustness of the shot classification algorithm across broadcasts. For precision and recall, true positives, false positives and false negatives are explained supposing that the positive class being predicted is a far view:

True positive: a frame predicted as a far view, when the actual class is indeed a far view;
False positive: a frame predicted as a far view, when the actual class is a close-up view;
False negative: a frame predicted as a close-up view, when the actual class is a far view.
Precision = #true positives / (#true positives + #false positives)    (5)

Recall = #true positives / (#true positives + #false negatives)    (6)
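As a worked example of (5) and (6) with illustrative counts:

```python
# Worked example of equations (5) and (6); the counts are illustrative.
def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

# 950 far views labelled correctly, 20 close-ups mislabelled as far views,
# 35 far views mislabelled as close-ups:
# precision_recall(950, 20, 35) -> (0.9794..., 0.9644...)
```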
The results are encouraging, where very high precisions and recalls were obtained for both shot types.

[Table 2: Shot classification results per shot type (number of shots, precision and recall). Averages of 98.27% precision and 96.27% recall were obtained for one shot type, and 91.65% precision and 96.49% recall for the other.]
Candidate Shortlist Generation

For evaluating the generated candidates c_{ik}^g, precision and recall are adapted as follows. Relevant refers to the number of candidate segments generated that actually contain the goal event, whereas Retrieved refers to the total number of candidates generated based on the shortlisting properties:

Precision = #(relevant ∩ retrieved) / #retrieved    (7)

Recall = #(relevant ∩ retrieved) / #relevant    (8)

From the results, it can be observed that the Average Number of Candidates per Shortlist is small. Imperfect precision is tolerable because, during the subsequent ranking stage (Sub-section Candidate Ranking), the actual segment can still be retrieved without the need for a perfectly precise shortlist. There must, however, be no missed-recall cases, since it is mandatory that an actual goal event segment be present within each of the generated shortlists.
[Table 3: Candidate shortlist generation results: precision, recall, and the average number of candidates per shortlist.]
Candidate Ranking

Table 4 summarizes the ranking results. For each match, n represents the number of candidate segments c_{ik}^g generated for the i-th goal time-stamp. The average f_0 of each k-th candidate (k = 1, ..., n) is recorded in the sub-columns of column 5, where the boldface numeric values indicate the maximum average f_0 for that time-stamp, which is the top-ranked candidate.
[Table 4: Candidate ranking results per match: each goal time-stamp t_i^g, the number of candidates n, and the average f_0 of every candidate c_{ik}^g, with the maximum (top-ranked) value in boldface.]
COMPARISON
The proposed approach was compared against a previous method that requires detecting a replay shot, where the replay must directly follow the close-up shot. Note that the approach proposed in this paper only considers two shot labels, the far view and the close-up view, and does not depend on replay detection. Both approaches were able to obtain good detection results, although the definition of a highlight can be subjective and depend on viewers' preferences (Changsheng et al.).
The comparison was carried out over the 5 matches shown in Table 1.

[Table 5: Comparison over the 5 test matches: the ground-truth number of goal events against the number detected by the proposed method and by the compared technique, with the corresponding precision.]
CONCLUSION
In this work, the webcasting text was used to distinctly identify event occurrences and to localize the video search space to only relevant and eventful one-minute segments. Within these segments, straightforward visual cues generated a shortlist of candidate segments, and an aural cue based on commentator pitch ranked them to select the goal event.
REFERENCES

Online, simultaneous shot boundary detection and key frame extraction for sports videos using rank tracing. Computer Vision and Image Understanding, 92.
Sport news images […].
[…]. IEEE Transactions on Multimedia, 10.
[…]. IEEE Transactions on Multimedia, 10.
[…]. Journal of Information Science and Engineering, 24.
[…]. IEEE Transactions on Multimedia, 8.
Unsupervised soccer video abstraction based on pitch, dominant color and camera motion analysis. […].
[…]. IEEE Transactions on Image Processing, 12.
Soccer video summarization using enhanced logo detection. […].
[…] sports video. Expert Systems with Applications, 36.
Sports highlight detection from keyword sequences using HMM. […].
[…]. IEEE Transactions on Circuits and Systems for Video Technology, 14.
Hierarchical temporal association mining for video event detection in video databases. […].
[…]. IEEE Signal Processing Magazine, 23.
Audio-visual football video analysis, from structure detection to attention analysis. […].
[…]. IEEE Transactions on Circuits and Systems for Video Technology, 15.
A decision tree-based multimodal data mining framework for soccer goal detection. […].
The authoring metaphor to machine understanding of multimedia. […].
Time interval maximum entropy based event indexing in soccer video. […].
Live Match […].
Content-based video indexing for sports applications using multi-modal approach. ACM Transactions on Multimedia Computing, Communications, and Applications, 4.
UEFA Champions League, Match Season 2011.
Algorithms and system for segmentation and structure analysis in soccer video. […].
Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio. […].
[…]. IEEE Signal Processing Magazine, 17.
Goal event detection in broadcast soccer videos by combining heuristic rules with unsupervised fuzzy c-means algorithm. […].
[…]. IEEE Transactions on Circuits and Systems for Video Technology, 17.