Semantic Shot Classification in Soccer Videos via Playfield Ratio and Object Size Considerations

1Alfian Abdul Halin (First and Corresponding Author), 2Nurfadhlina Mohd Sharef, 3Azrul Hazri Jantan
1,2,3 Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, 43400 Serdang, Selangor Darul Ehsan, Malaysia
[email protected], [email protected], [email protected]
Abstract
This paper presents a semantic shot classification algorithm for soccer videos. Generally, each shot within a match video is assigned either a far or close up-view class label. Initially, the playfield region for each frame within a shot is identified through low-level color image processing. An additional property is then considered, namely the size of the largest object overlapping the playfield. Class labels are then assigned to each frame based on carefully constructed rule-sets. Majority voting is finally performed, where the dominant frame label within each shot is used as the shot's ultimate class label. Experiments conducted on six soccer matches with varying camera shooting styles have been very promising, with the additional consideration of largest object size significantly reducing the number of misclassifications.
Keywords: Shot Classification, Scene Classification, Soccer Video Analysis

1. Introduction
Semantic analysis of soccer video has attracted a great deal of attention, partly due to its entertainment appeal and high commercial potential [1, 18, 19]. Examples of such work include event/highlight detection [2, 15, 16, 17] as well as summary and abstract generation [4, 5]. Such tasks require semi- or fully-automatic methods to identify audio/visual patterns representative of the desired semantic concepts. However, before any high-level analysis can be done, the video must first be segmented into tractable units. Shot Boundary Detection (SBD) is a process where frame sequences taken by a continuous camera action are grouped into individual segments/units called shots [6]. Shot-based analysis is more manageable since shots are less dense than single frames, yet less abstract than scenes.
Broadcasters utilize various camera shooting styles to convey different aspects of the game. Far-views, for example, convey relatively uneventful situations such as ball passes; the overall playfield is visible without focusing on specific players or events. Close ups, on the other hand, indicate something worth paying attention to, such as when the camera closes in on a player who committed a bad foul. Being able to identify such shooting styles is useful since they can act as important cues for semantic concept identification. An example is shown in [3], where the authors first classified shots into slow motion replay, far, close up and medium views for use in a statistical model for event detection. Since SBD can only be used for video segmentation, an additional process is required to designate shots with class labels consistent with the current camera shooting style. This process is referred to as Semantic Shot Classification (SSC) and is the focus of this work.
In this paper, we propose an SSC algorithm to classify shots into either far or close up-views. We only consider these classes since they are prominently used during soccer broadcasts. Generally, the playfield region is firstly identified within each video frame of a shot. The playfield ratio is then calculated, followed by identifying the largest object overlapping the playfield. The semantic class is finally assigned based on majority voting.
Journal of Convergence Information Technology (JCIT), Volume 8, Number 4, February 2013. doi:10.4156/jcit.vol8.issue4.47
2. Related Work
Identifying the playfield area within each video frame can be done by searching for regions containing the most grass. The basic idea is that far-view frames exhibit higher playfield ratios than close up-views; therefore, some previous algorithms have used the playfield ratio as the sole determining factor for SSC. Normally, low-level visual features are used to calculate the ratios, followed by a comparison with predetermined thresholds for final shot classification. Examples include [7] and [8], where color and texture features were extracted to obtain the statistics of each frame's playfield region. The grass-area ratio was then compared against a predefined threshold for far or close up-view classification. The work in [9] first extracted the hue histograms of 50 randomly selected frames from the first 5 minutes of a video. The hues were summed to obtain the representative playfield color, which was then compared with a threshold for shot classification. In [4] and [10], the field color was assumed to be the dominant color within video frames. Working in the HSI (Hue, Saturation and Intensity) color space, the dominant color mean values were learnt. The cylindrical distance in [11] was then used to compare pixels against the learnt playfield color to separate playfield from non-playfield regions. Far and close up-views were consequently assigned by comparing the frame ratio with a threshold.
The main similarity among these techniques is that frame ratios exceeding a certain threshold indicate the presence of large grassy regions and hence a far-view classification, while lower ratios indicate a close up-view. Sole reliance on the playfield ratio, however, can cause misclassification: while large ratios almost always indicate far-views, close up-views can in many cases exhibit similar (or even larger) ratios.
An example is illustrated in Figure 1, where the close up-view's playfield ratio clearly exceeds that of the far-view. We therefore propose the additional consideration of object size to improve classification accuracy. Similar to previous works, we begin with playfield region extraction and ratio calculation for each frame within a shot. This is followed by identifying and then calculating the size of the largest object overlapping the playfield. The basic notion is that large objects strongly indicate a close up-view, whereas smaller objects indicate a far-view.
Figure 1. Playfield ratio: a) a close up-view - 72.7%; and b) a far-view - 64.4%.
3. The Proposed Algorithm
Figure 2 shows the general flow of the proposed algorithm. Notice the input data file SBD.mat, which is exemplified in Table 1. Initially, this file contains only the first three columns after the process of Shot Boundary Detection (SBD), which segments a video V into m shots, where V = {S1, S2, S3, ..., Sm}. Each shot S contains a sequence of individual frames, where Si = [Fi,start, Fi,end], for i = 1, 2, ..., m. Note that in this work, the SBD algorithm in [12] was used as it provided high detection accuracy. Following the flow in Figure 2, each k-th frame within a particular shot is processed individually. The frame goes through playfield region extraction and playfield ratio calculation, and additionally through the process of (largest) object size calculation if necessary. A class label is assigned to each frame through comparisons with predetermined thresholds. After all frames within a shot have been classified, majority voting is performed, where the dominant class label is selected as the final label for the shot. An example of the label assignments is shown in the fourth column of Table 1.
Figure 2. The flow diagram of the proposed algorithm.
Table 1. An example SBD.mat file
Shot number (i) | Start Frame (Fi,start) | End Frame (Fi,end) | Shot Class
1               | 1                      | 29                 | far
2               | 30                     | 49                 | close up
3               | 50                     | 155                | far
...             | ...                    | ...                | ...
485             | 41555                  | 42000              | close up
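As a sketch, the SBD.mat shot table above can be held as a simple list of records; the values below are the illustrative entries from Table 1, not an actual .mat file, and the field names are assumptions.

```python
# Illustrative in-memory form of the SBD.mat shot table (values taken
# from Table 1; the dict layout is a stand-in, not the real file format).
shots = [
    {"shot": 1, "start": 1,  "end": 29,  "label": "far"},
    {"shot": 2, "start": 30, "end": 49,  "label": "close up"},
    {"shot": 3, "start": 50, "end": 155, "label": "far"},
]

def frame_range(shot):
    """Return the inclusive frame indices covered by a shot S_i = [F_start, F_end]."""
    return range(shot["start"], shot["end"] + 1)

assert len(list(frame_range(shots[0]))) == 29
```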
3.1 Playfield Region Extraction
For this process, the first step is to determine the range of pixel colors regarded as the playfield. Each frame is initially converted from RGB to HSV, motivated by the simplicity and intuitiveness HSV offers compared to RGB. The conversion formulas are shown in Equations 1 through 5.

MAX = max(R, G, B)    (1)

MIN = min(R, G, B)    (2)

Hue = undefined,                                 if MAX = MIN
      60° × (G − B)/(MAX − MIN) + 0°,            if MAX = R and G ≥ B
      60° × (G − B)/(MAX − MIN) + 360°,          if MAX = R and G < B
      60° × (B − R)/(MAX − MIN) + 120°,          if MAX = G
      60° × (R − G)/(MAX − MIN) + 240°,          if MAX = B    (3)

Saturation = 0,                if MAX = 0
             1 − MIN/MAX,      otherwise    (4)

Value = MAX    (5)
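Eqs. 1 through 5 are the standard RGB-to-HSV mapping, so a direct implementation can be checked against Python's stdlib colorsys (which returns hue scaled to [0, 1] rather than degrees). This is a sanity-check sketch, not part of the paper's MATLAB implementation.

```python
import colorsys

def rgb_to_hsv_degrees(r, g, b):
    """Convert normalized RGB in [0, 1] to (hue in degrees, saturation, value),
    following Eqs. 1-5."""
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx                                    # Eq. 5
    s = 0.0 if mx == 0 else 1 - mn / mx       # Eq. 4
    if mx == mn:
        h = 0.0                               # hue undefined; 0 by convention
    elif mx == r:
        h = (60 * (g - b) / (mx - mn)) % 360  # Eq. 3, MAX = R cases (+0° / +360°)
    elif mx == g:
        h = 60 * (b - r) / (mx - mn) + 120    # Eq. 3, MAX = G case
    else:
        h = 60 * (r - g) / (mx - mn) + 240    # Eq. 3, MAX = B case
    return h, s, v

# Agrees with the stdlib implementation:
h, s, v = rgb_to_hsv_degrees(0.2, 0.6, 0.3)
hc, sc, vc = colorsys.rgb_to_hsv(0.2, 0.6, 0.3)
assert abs(h / 360 - hc) < 1e-9 and abs(s - sc) < 1e-9 and abs(v - vc) < 1e-9
```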
3.1.1 Candidate Playfield Extraction
Each frame's dominant color must first be identified. A 64-bin HSV histogram is generated, where the dominant color is taken as the peak index (idxp) of the hue component. Since the governing body of world soccer (FIFA) requires green playfields [13], we look for dominant hues between 0.155 and 0.350. Further considerations were also made: we observed that pixels whose hue lies within an offset α = 0.1 of the dominant hue can also be considered playfield, provided the Saturation is between 0.0 and 0.1 and the Value exceeds 0.13. Consequently, all pixels adhering to these considerations are identified as playfield pixel candidates. Note that these considerations were based on extensive inspection of ~30 hours of soccer footage from various broadcasters. The whole process can be written as Eq. 6 and is illustrated in Figure 3.
f(x, y) = 1,  if Hue_f(x,y) ∈ [Hue_idxp − α, Hue_idxp + α]
              and 0.00 ≤ Saturation_f(x,y) ≤ 0.10
              and Value_f(x,y) > 0.13
          0,  otherwise    (6)

where f(x, y) is the current pixel under consideration. This process converts the current video frame into a binary image C_original, where playfield pixels are set to the binary value 1 (i.e. white). Please refer to Figures 4a and 4b.
Figure 3. Peak hue index with value 0.222 from the generated HSV histogram.

Note that if the dominant-hue condition 0.155 ≤ Hue_idxp ≤ 0.350 does not hold, it is highly likely that the camera is not capturing the playfield. A close up-view can therefore be assigned directly to the frame, as this might indicate footage outside of the playfield, or an actual close up-view with little or no grass. Any further processing is skipped for such frames.
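The dominant-hue check and the Eq. 6 thresholds can be sketched as follows. The 64-bin histogram, the α = 0.1 offset and the Saturation/Value bounds follow the text; the tiny hand-built frame is a made-up stand-in for real pixel data.

```python
# Candidate playfield mask from a per-pixel HSV frame (a sketch of Section 3.1.1
# and Eq. 6). `frame` is a list of rows of (h, s, v) tuples, all in [0, 1].
ALPHA = 0.1   # hue offset around the dominant hue

def dominant_hue(frame, bins=64):
    """Peak index of a 64-bin hue histogram, returned as a hue in [0, 1]."""
    hist = [0] * bins
    for row in frame:
        for h, _, _ in row:
            hist[min(int(h * bins), bins - 1)] += 1
    peak = max(range(bins), key=lambda i: hist[i])
    return (peak + 0.5) / bins   # bin centre as the representative hue

def candidate_mask(frame):
    """Binary candidate-playfield mask, or None when the dominant hue is not
    green (in which case the frame is labelled close up-view directly)."""
    hue_p = dominant_hue(frame)
    if not (0.155 <= hue_p <= 0.350):
        return None
    return [[1 if (abs(h - hue_p) <= ALPHA
                   and 0.0 <= s <= 0.10
                   and v > 0.13) else 0
             for (h, s, v) in row]
            for row in frame]

field, other = (0.22, 0.05, 0.5), (0.90, 0.50, 0.50)
frame = [[field] * 4, [field, other, field, field]]
mask = candidate_mask(frame)
assert mask[0][0] == 1 and mask[1][1] == 0
```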
3.1.2 Hole Filling
Further processing is performed where any black regions encapsulated by white pixels are removed (filled). Filling is done based on set dilations, complementation and intersection. The formula is given in Eq. 7 [14].

X_k = (X_{k−1} ⊕ B) ∩ C_original^c,   k = 1, 2, 3, ...    (7)

where X_0 is a pixel within C_original inside the boundary to be filled, and B is a symmetric 3×3 cross-shaped structuring element with 4-pixel neighborhood adjacency. The intersection with C_original^c (i.e. the complement of C_original) acts as a control so that only the current region of interest is affected by the fill process. This procedure is performed to remove unwanted holes so that only the playfield is retained. Figure 4c shows the resulting binary image C_filled.
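Eq. 7's iteration (dilate the seed set, intersect with the complement, repeat until X_k = X_{k−1}) can be sketched on a small binary grid. The cross-shaped 4-neighbourhood element matches the text; the choice of seed pixel is an assumption, since seed selection is not detailed here.

```python
def fill_hole(mask, seed):
    """Fill the hole containing `seed` (Eq. 7): X_k = (X_{k-1} dilated by B)
    intersected with the mask's complement, iterated to convergence, then
    unioned back into the mask. B is a 3x3 cross (4-neighbourhood)."""
    h, w = len(mask), len(mask[0])
    complement = [[1 - v for v in row] for row in mask]
    region = {seed}
    while True:
        dilated = set(region)
        for (y, x) in region:
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # cross-shaped B
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and complement[ny][nx]:
                    dilated.add((ny, nx))
        if dilated == region:      # convergence: X_k == X_{k-1}
            break
        region = dilated
    filled = [row[:] for row in mask]
    for (y, x) in region:
        filled[y][x] = 1
    return filled

# A ring of playfield pixels with a one-pixel hole in the middle:
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
assert fill_hole(ring, (1, 1))[1][1] == 1
```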
Figure 4. a) The current video frame and its corresponding hue histogram component; b) the binary mask C_original generated by applying Eq. 6; and c) the filled binary image C_filled.

3.1.3 Noise Removal and Final Playfield Region Determination
At this point, C_filled is considered the rough playfield, which may still contain unwanted (white) regions outside of the estimated playfield. These have to be removed since they might be mistaken for the actual playfield if their sizes are significant. Moreover, it is preferable that the candidate playfield is 'clean' so that a more accurate representation can be obtained. To accomplish this, morphological operations are applied:

1. Downsizing and Morphological Opening: C_filled is first rescaled to 25% of its original size. Working at a smaller scale is not only faster, but enables the definition of simpler structuring elements for the subsequent morphological opening operation. Morphological opening smoothens contours, discards small white patches and sharp peaks, and eliminates small interconnections between white pixels. The formula is shown in Eq. 8, obtained from [14].

C_filled ∘ B_opening = (C_filled ⊖ B_opening) ⊕ B_opening    (8)

We determined the best structuring element B_opening to be rectangular with dimensions 2×10 (Eq. 9). Continuing from the previous example in Figure 4c, the original, downsized and morphologically opened images are shown in Figures 5a, 5b and 5c, respectively;

B_opening = [1 1 1 1 1 1 1 1 1 1
             1 1 1 1 1 1 1 1 1 1]    (9)
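Opening (Eq. 8, erosion then dilation) and its dual, closing (Eq. 10, dilation then erosion, used in the next step), can be sketched over generic structuring-element offsets. The tiny mask and the 1×2 element below are illustrative stand-ins for the paper's 2×10 and 10×30 rectangles, and boundary handling is simplified (no padding).

```python
def erode(mask, offsets):
    """Binary erosion: a pixel survives only if every offset lands on a 1."""
    h, w = len(mask), len(mask[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                      for dy, dx in offsets) else 0
             for x in range(w)] for y in range(h)]

def dilate(mask, offsets):
    """Binary dilation: a pixel turns on if any reflected offset hits a 1."""
    h, w = len(mask), len(mask[0])
    return [[1 if any(0 <= y - dy < h and 0 <= x - dx < w and mask[y - dy][x - dx]
                      for dy, dx in offsets) else 0
             for x in range(w)] for y in range(h)]

def opening(mask, offsets):
    return dilate(erode(mask, offsets), offsets)   # Eq. 8

def closing(mask, offsets):
    return erode(dilate(mask, offsets), offsets)   # Eq. 10

# A 1x2 horizontal element (stand-in for the 2x10 rectangle of Eq. 9):
B = [(0, 0), (0, 1)]
noisy = [[1, 1, 1, 0, 1]]            # isolated single pixel at the end
assert opening(noisy, B) == [[1, 1, 1, 0, 0]]   # small patch removed
```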
Figure 5. a) C_filled; b) the downsized image; c) after morphological opening; d) after morphological closing; and e) the final playfield region C_final.

2. Morphological Closing and Resizing: Gaps and small breaks within C_filled cannot be filled by hole filling due to the absence of encapsulating borders. An example is shown in Figure 5c, where small gaps are produced by white lines and players on the playfield. Morphological closing is therefore used to fuse these gaps and/or any existing thin breaks. The formula from [14] is shown in Eq. 10.

C_filled • B_closing = (C_filled ⊕ B_closing) ⊖ B_closing    (10)

Figure 5d illustrates that the gaps are effectively removed, revealing a more complete version of the playfield. Through extensive observation of the dataset, the most effective structuring element B_closing was determined to be a 10×30 rectangle (Eq. 11). The morphologically closed version of C_filled is then rescaled to its original size and taken as the final candidate playfield binary mask C_final (Figure 5e);

B_closing = [1]_{10×30}  (a 10×30 matrix of ones)    (11)

3. Connected Components Analysis: To identify the actual playfield, connected components analysis (CCA) is performed to look for interconnected white pixels within the binary mask. Note that smaller non-playfield regions may still exist even after steps 1 and 2. Based on the assumption that the playfield forms the largest region of connected components, CCA locates this region within C_final. CCA can be represented by Eq. 12 [14].

X_k = (X_{k−1} ⊕ B) ∩ C_final,   k = 1, 2, 3, ...    (12)

where X_0 is a pixel of a known connected component in C_final. The structuring element B adheres to the 8-neighborhood pixel adjacency set N_8(X_k), where the neighboring coordinates are as defined in Eq. 13 [14].
N_8(X_k) = {(x+1, y), (x−1, y), (x, y+1), (x, y−1), (x+1, y+1), (x+1, y−1), (x−1, y+1), (x−1, y−1)}    (13)
The connected component with the largest area is deemed the actual playfield region. Two values are extracted from this region for further processing and analysis: the playfield area ϕ, and the set of playfield pixel indices Θ. Finally, the playfield ratio δ is obtained through Eq. 14 by dividing ϕ by the frame dimensions, where h and w are the frame's height and width, respectively.

δ = ϕ / (h × w)    (14)
3.2 Object Size Determination
Considering the size of the object overlapping the playfield region is crucial for identifying close up-views that contain significant amounts of playfield. The notion is that larger overlapping objects indicate close up-view frames, whereas smaller objects indicate far-views. The object size is obtained by counting the pixel intersections between the largest object and the playfield itself. The whole process is as follows:

1. Obtaining Black Objects' Pixel Indices: Objects other than the playfield are represented by black pixels in C_original (Figure 4b). By taking the complement/inverse of C_original, the image in Figure 6 is obtained. CCA is applied to this image, identifying the p connected components other than the playfield. This provides the set of pixel indices for each object, represented as L = {l_1, l_2, ..., l_p};

2. Calculating the Intersection: To identify the largest black object overlapping the playfield, the maximum cardinality (count) of pixel index intersections between Θ and each set element of L is obtained using Eq. 15:

ζ = max(|Θ ∩ l_j|),   j = 1, 2, ..., p    (15)

where | . | is the cardinality of the intersection between Θ and an element of L, and ζ is the maximum such cardinality. Note that ζ only considers pixels overlapping the playfield area, since intersection cardinalities with off-playfield objects have a value of 0. An example of the largest object intersection is shown in Figure 7.

Figure 6. The inverted image of C_original, with objects overlapping the playfield represented as white (connected) components.
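Eq. 15 amounts to a maximum set-intersection count. A sketch with hypothetical pixel-index sets (the playfield Θ, an on-field object, and an off-field object whose intersection with Θ is empty):

```python
def largest_overlap(theta, objects):
    """zeta = max_j |theta ∩ l_j| (Eq. 15): size of the largest black object
    overlapping the playfield pixel-index set theta."""
    return max((len(theta & obj) for obj in objects), default=0)

# Hypothetical pixel-index sets on a 10x10 field:
theta = {(y, x) for y in range(10) for x in range(10)}          # playfield
players = {(y, x) for y in range(2, 7) for x in range(3, 6)}    # 15 on-field pixels
off_field = {(20, 20), (20, 21)}                                # no overlap

assert largest_overlap(theta, [players, off_field]) == 15
```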
Figure 7. The two players are identified as the largest black object overlapping/intersecting the playfield.

3.2.1 Shot Classification
In this step, each frame within a particular shot is assigned either a far or close up-view label based on two thresholds, Tδ and Tζ. The former relates to the playfield ratio δ from Eq. 14, the latter to the object size intersection ζ from Eq. 15. The value of Tδ was determined by observing over 1,000 actual far-view frames taken from separate matches. Each was binarized before manually identifying the actual playfield region. Video frames were chosen from matches exhibiting varying broadcasting and presentation styles, allowing more varied camera shooting styles to be inspected. The minimum playfield ratio was found to lie between 43% and 58% across the sampled frames, so the threshold Tδ was set to the minimum value of this range, 43%. Similarly for Tζ, over 1,000 actual close up-view frames were observed from several matches. Each was binarized and the size of the largest object was manually calculated. The minimum sizes were found to lie between 13,500 and 17,500 pixels, so the threshold Tζ was set to the minimum value of 13,500.
Majority Voting
For final shot classification, majority voting is performed. Since frames within a particular shot may carry different labels, the majority frame label is used as the shot's actual class label.
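The per-frame rule and the vote can be sketched as below, using Tδ = 0.43 and Tζ = 13,500 from Section 3.2.1. The exact combination of the two thresholds is inferred from the text (high playfield ratio with no large overlapping object suggests a far-view), so treat the rule as an assumption rather than the paper's verbatim rule-set.

```python
from collections import Counter

T_DELTA = 0.43    # minimum playfield ratio observed for far-views
T_ZETA = 13500    # minimum largest-object size observed for close up-views

def classify_frame(delta, zeta):
    """Inferred per-frame rule: large playfield ratio AND small largest
    overlapping object -> far-view; otherwise close up-view."""
    return "far" if delta >= T_DELTA and zeta < T_ZETA else "close up"

def classify_shot(frame_labels):
    """Majority vote over the per-frame labels of one shot."""
    return Counter(frame_labels).most_common(1)[0][0]

labels = [classify_frame(d, z)
          for d, z in [(0.65, 900), (0.70, 1200), (0.30, 20000)]]
assert classify_shot(labels) == "far"
```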
4.0 Experimental Results

Table 2. The video dataset.
Match video                  | League / Championship   | # far-views | # close up-views
1. Barcelona vs. Man. United | Champions' League '09   | 96          | 104
2. Man. United vs. Chelsea   | Champions' League '08   | 92          | 108
3. Inter Milan vs. Bari      | Italian Serie A         | 97          | 103
4. Real Madrid vs. Espanyol  | Spanish La Liga         | 101         | 99
5. Man. United vs. Wigan     | English Premier League  | 95          | 105
6. Man. City vs. Man. United | English Premier League  | 98          | 102
TOTAL                        |                         | 579         | 621
The proposed algorithm was implemented in MATLAB 2007a. Experiments were conducted on a video dataset comprising 1,200 shots from 6 different matches, as shown in Table 2. All frames were processed as JPEG images with dimensions of 640×480. Three different leagues/championships were considered since each employed different yet consistent camera shooting styles; this variety was necessary to demonstrate the applicability of the algorithm across broadcasters. Shot classification was evaluated using precision and recall, calculated using Eqs. 16 and 17, respectively.
precision = true positives / (true positives + false positives)    (16)

recall = true positives / (true positives + false negatives)    (17)
True positives, false positives and false negatives can be explained in the following context, supposing the positive class being predicted is far-view:

True positive: assigning a shot as far-view when the actual class is indeed far-view;
False positive: assigning a shot as far-view when the actual class is close up-view;
False negative: assigning a shot as close up-view when the actual class is far-view.
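Eqs. 16 and 17 reduce to two ratios; a minimal sketch, checked against the far-view figures for Match 2 in Table 3 (88 true positives, 62 false positives, and 92 − 88 = 4 false negatives):

```python
def precision_recall(tp, fp, fn):
    """Precision (Eq. 16) and recall (Eq. 17) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Far-view, Match 2, playfield-ratio-only results from Table 3:
p, r = precision_recall(tp=88, fp=62, fn=4)
assert round(p, 2) == 0.59 and round(r, 2) == 0.96
```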
In the following, a comparison is made between considering playfield ratio alone and considering playfield ratio and object size.
4.1 Results and Discussion
The results show that the proposed algorithm identifies the majority of shot classes correctly. Several conclusions were drawn from the experiments:
1. Table 3 indicates that the number of false positives for far-view classification is high, ranging from ~21% (21 of 101 in Match 4) to 67% (62 of 92 in Match 2). This was found to be due to the respective close up-view shots being dominated by frames containing large amounts of playfield, showing that the playfield ratio alone is not enough for accurate classification. The high number of false positives pulls the average precision for far-view classification down to 77%. Nonetheless, average recall is high at 99%, meaning that only 7 far-views were misclassified. Despite this, many close up-views were wrongly classified since the task is a two-class problem; consequently, average recall for close up-view classification was pulled down to 60%, equivalent to 252 misclassifications;
2. Table 4, on the other hand, shows promising overall improvement. Average precision increased to 94% for far-view classification and was maintained at 98% for close up-view classification. The number of false positives for far-view classification in particular dropped dramatically, to roughly one-seventh of that in Table 3. Although false positives increased slightly for close up-views, this is acceptable since the average precision is maintained. The most noticeable improvement, however, is the large increase in recall for close up-view classification. These results justify the additional consideration of object size, which proved to be the deciding factor when large playfield regions were present.
4.1.1 Classification Errors
Some classification errors occurred due to the presence of superimposed text. Specifically, the text blocks were deemed the largest overlapping object, and their sizes exceeded the minimum value of Tζ. This caused some far-view shots to be misclassified as close up-views. This scenario is depicted in Figure 8.
5.0 Conclusion
Far-view classification reported an average precision of 94% and a recall of 98%. Precision and recall were likewise promising for close up-view classification, at 98% and 94%, respectively. These measurements are significant improvements and clearly justify the additional consideration of object size. Correctly identifying shot classes is very important, especially during the initial phases of semantic soccer video analysis. Semantic shot labels are crucial in tasks such as event and highlight detection, where they can act as mid-level visual cues to identify the important segments containing desired semantic concepts. In future work, we hope to use better threshold-determination mechanisms that do not rely on rule-sets. It might also be useful to consider shot classes other than far and close up-views alone.
Table 3. Precision and Recall – Playfield Ratio consideration only.
Shot Class    | Match   | Actual | True +ve | False +ve | Precision | Recall
Far-view      | 1       | 96     | 96       | 33        | 0.74      | 1.00
              | 2       | 92     | 88       | 62        | 0.59      | 0.96
              | 3       | 97     | 96       | 34        | 0.74      | 0.99
              | 4       | 101    | 99       | 21        | 0.83      | 0.98
              | 5       | 95     | 95       | 57        | 0.63      | 1.00
              | 6       | 98     | 98       | 45        | 0.69      | 1.00
              | Average |        |          |           | 0.77      | 0.99
Close up-view | 1       | 104    | 71       | 0         | 1.00      | 0.68
              | 2       | 108    | 46       | 4         | 0.92      | 0.43
              | 3       | 103    | 69       | 1         | 0.99      | 0.67
              | 4       | 99     | 78       | 2         | 0.98      | 0.79
              | 5       | 105    | 48       | 0         | 1.00      | 0.46
              | 6       | 102    | 57       | 0         | 1.00      | 0.56
              | Average |        |          |           | 0.98      | 0.60

Table 4. Precision and Recall – Playfield Ratio and Object Size considerations.
Shot Class    | Match   | Actual | True +ve | False +ve | Precision | Recall
Far-view      | 1       | 96     | 93       | 7         | 0.93      | 0.97
              | 2       | 92     | 85       | 16        | 0.84      | 0.92
              | 3       | 97     | 96       | 4         | 0.96      | 0.99
              | 4       | 101    | 98       | 2         | 0.98      | 0.97
              | 5       | 95     | 95       | 5         | 0.95      | 1.00
              | 6       | 98     | 98       | 2         | 0.98      | 1.00
              | Average |        |          |           | 0.94      | 0.98
Close up-view | 1       | 104    | 97       | 3         | 0.97      | 0.93
              | 2       | 108    | 92       | 7         | 0.93      | 0.85
              | 3       | 103    | 99       | 1         | 0.99      | 0.96
              | 4       | 99     | 97       | 3         | 0.97      | 0.98
              | 5       | 105    | 100      | 0         | 1.00      | 0.95
              | 6       | 102    | 100      | 0         | 1.00      | 0.98
              | Average |        |          |           | 0.98      | 0.94
Figure 8. Example scenario of a far-view frame being misclassified as a close up-view.
6. References
[1] Li Li, Xiaoqing Zhang, Weiming Hu, Wanqing Li, and Pengfei Zhu, "Soccer video shot classification based on color characterization using dominant sets clustering", Advances in Multimedia Information Processing - Lecture Notes in Computer Science, Vol. 5879, pp. 923-929, 2009.
[2] Xueming Qian, Guizhong Liu, Huan Wang, Zhi Li and Zhe Wang, "Soccer Video Event Detection by Fusing Middle Level Visual Semantics of an Event Clip", Advances in Multimedia Information Processing - Lecture Notes in Computer Science, Vol. 6298, pp. 439-451, 2011.
[3] Changsheng Xu, Jinjun Wang, L. Lu, and Y. Zhang, "A novel framework for semantic annotation and personalized retrieval of sports video", IEEE Transactions on Multimedia, Vol. 10, No. 3, pp. 421-436, 2008.
[4] A. Ekin, A. M. Tekalp, and R. Mehrotra, "Automatic soccer video analysis and summarization", IEEE Transactions on Image Processing, Vol. 12, No. 7, pp. 796-807, 2003.
[5] M. A. Refaey, Wael Abd-Almageed, and L. S. Davis, "A logic framework for sports video summarization using text-based semantic annotation", In Proceedings of the 3rd International Workshop on Semantic Media Adaptation and Personalization, pp. 69-75, 2008.
[6] J. Yuan, H. Wang, L. Xiao, W. Zheng, J. Li, F. Lin, and B. Zhang, "A formal study of shot boundary detection", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 2, pp. 168-186, 2007.
[7] Shu-Ching Chen, Mei-Ling Shyu, Min Chen, and Chengcui Zhang, "A decision tree-based multimodal data mining framework for soccer goal detection", In Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 265-268, 2004.
[8] Min Chen, Shu-Ching Chen, Mei-Ling Shyu, and K. Wickramaratna, "Semantic event detection via multimodal data mining", IEEE Signal Processing Magazine, Vol. 23, No. 2, pp. 38-46, 2006.
[9] Peng Xu, Lexing Xie, Shih-Fu Chang, A. Divakaran, A. Vetro, and Huifang Sun, "Algorithms and system for segmentation and structure analysis in soccer video", In Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 721-724, 2001.
[10] Chung-Lin Huang, Huang-Chia Shih, and Chung-Yuan Chao, "Semantic analysis of soccer video using dynamic Bayesian network", IEEE Transactions on Multimedia, Vol. 8, No. 4, pp. 749-760, 2006.
[11] K. N. Plataniotis and A. N. Venetsanopoulos, Color Image Processing and Applications, Springer-Verlag: Berlin, Germany, 2000.
[12] W. Abd-Almageed, "Online, simultaneous shot boundary detection and key frame extraction for sports videos using rank tracing", In Proceedings of the 15th IEEE International Conference on Image Processing, pp. 3200-3203, 2008.
[13] FIFA - Laws of the Game 2010/11. [Online] Available at: , [Accessed on 10 November 2010], 2010.
[14] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing - 2nd edition, Prentice Hall, New Jersey, 2002.
[15] Xueming Qian, Huan Wang, Guizhong Liu and Xingsong Hou, "HMM based soccer video event detection using enhanced mid-level semantic", Multimedia Tools and Applications - online first, pp. 1-23, 2011.
[16] Alfian Abdul Halin, Mozaherul Hoque Abul Hasanat and Mandava Rajeswari, "Event Detection in Soccer Videos through Text-based Localization and Audiovisual Analysis", International Journal of Digital Content Technology and its Applications, Vol. 6, No. 15, pp. 164-170, 2012.
[17] Hyun-Sook Kim, "Automatic Classification of Offensive Patterns for Soccer Game Highlights using Neural Networks", Malaysian Journal of Computer Science, Vol. 15, No. 1, pp. 57-67, 2002.
[18] Zhicheng Wei and Xue Yang, "A Novel Soccer Video Summarization Model Based on Video Time Density Function", International Journal of Digital Content Technology and its Applications, Vol. 6, No. 10, pp. 248-256, 2012.
[19] Masoomeh Zameni, Mahmood Fathy and Amin Sadri, "A Low Cost Algorithm for Expected Goal Events Detection in Broadcast Soccer Video", International Journal of Digital Content Technology and its Applications, Vol. 4, No. 8, pp. 118-125, 2010.