Robust Dominant Color Region Detection with Applications to Sports Video Analysis ?

Ahmet Ekin a,1 and A. Murat Tekalp a,b

a Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA 14627
b College of Engineering, Koc University, Istanbul, Turkey

{ekin,tekalp}@ece.rochester.edu

Abstract

This paper proposes a novel automatic dominant color region detection algorithm that is robust to temporal variations in the dominant color due to field, weather, and lighting conditions throughout a sports video. The proposed algorithm uses two color spaces, a control space and a primary space. The robustness of the algorithm results from adaptation of the statistics of the dominant color in the primary space with drift protection using the control space, and from fusion of the information from the two spaces. We then present applications of the proposed dominant color region detection method to generic sports video processing, including shot boundary detection and shot type classification in sports video, referee and player of interest detection, and a fast algorithm for play-break event detection. The detection of the referee and players of interest and their teams is achieved by a single algorithm, which uses the distinguishing uniform colors of these objects. The efficiency, effectiveness, and robustness of the proposed algorithms are demonstrated over a dataset of various sports video, including basketball, football, golf, and soccer video.

Key words: automatic dominant color region detection, sports video processing, shot-type classification, referee and players of interest detection, play-break event detection

⋆ This work is supported in part by the National Science Foundation under grant number IIS-9820731 and in part by a NYSTAR grant to the Center for Electronic Imaging and Science, Rochester, NY.
1 Ahmet Ekin is the corresponding author of this manuscript.

Preprint submitted to Elsevier Science

1 Introduction

Sports video generates large interest worldwide; hence, the processing of sports video results in applications with high impact. For example, detection of important events and creation of summaries make it possible to deliver sports video over various networks, including narrowband networks, such as the Internet and wireless, since the valuable semantics generally occupy only a small portion of the whole content [1]. In addition, automatic and real- or near real-time processing of sports video is a cost-effective alternative to current manual off-the-shelf solutions [2]. Finally, computer vision algorithms in the sports domain may be useful for revenue control [3], virtual game enhancements [4], scouting, and coaching. Since most sports are characterized by distinct color features, such as field color and team uniform colors, this paper introduces color-based features and algorithms for generic analysis of sports video.

In the sports video processing literature, color has been commonly used for detection and tracking of specific objects [5]-[9]. In [5]-[7], color is utilized to detect and track the ball in the soccer, tennis, and baseball domains, respectively. Football players are tracked based on their intensity templates under a closed-world assumption in [8]. Similarly, color templates of players are used for tracking in [9] to generate 3-D scenes of soccer games. Another popular use of color is detecting the field region, since such information can be utilized for various purposes. For example, the number of field colored pixels in a frame is used for classification of soccer frames into several view types [10], such as long, medium, and close-up views. The authors then use this information for structure analysis of soccer video [11]. Field region information has been used for verification of a particular play type in [12] and [13], where the start of plays in football and serve events in tennis are detected, respectively.
Color features of the field and/or field marks may also be used without detecting these regions. In [14], the authors do not attempt to detect the field region explicitly, but they use color histogram characteristics, along with other types of features, to verify that a frame is of field view type. Similarly, color features of the paint region in basketball are used for event detection in [15]. Babaguchi et al. [16] observe that football touchdown events occur in those shots that start with a green colored field view and end with an endzone view, which is non-green. Thus, they compute the similarity of color features of N frames from the start and end of each shot to a touchdown shot template. However, the proposed approach is sensitive to the selected template and may require template initialization for each game separately, since endzone color may differ from one stadium to another. In [1], the authors proposed a real-time soccer video analysis and summarization framework. However, the algorithms in that work utilize specific characteristics of the soccer domain, and thus are not generic for use in other

sports video. In contrast, this paper introduces generic color-based features and algorithms for robust dominant color region detection and sports video processing. The main differences between this paper and [1] are:

• In this paper, we use a primary space and a control space for robust adaptive dominant color region detection. We use segmentation results in a normalized control space to avoid drift of the statistics of the dominant color due to adaptation in the primary space. Furthermore, this paper introduces an algorithm to fuse the results from the two color spaces to improve the pixelwise accuracy of the final segmentation mask.

• The referee detection algorithm in [1] assumes that the referee uniform can be described by one color. Although this assumption holds for soccer, referees may wear uniforms with multiple colors, such as striped black and white shirts in basketball and football games. For the same reason, [1] does not propose any algorithms to detect players, whose uniforms are usually multi-colored, whereas this paper proposes a new single algorithm for the detection of both referee and players.

• This paper introduces a generic play-break event detection algorithm, and a new shot type classification algorithm for basketball video.

In general, the main contributions of this paper are:

(1) We propose a new generic dominant color region detection algorithm with a primary space and a control space for sports video. The proposed algorithm automatically detects the dominant color of the field independent of the sports type, and adapts the statistics of the dominant color to the variations in imaging conditions throughout a game. We also introduce a fusion algorithm to integrate the segmentation results of the two color spaces for more accurate pixelwise segmentation. The effectiveness of the proposed dominant color region detection algorithm is demonstrated over basketball, football, golf, and soccer video by objective performance metrics.
(2) We propose generic algorithms for referee detection and player identification, and for play-break event detection. A single algorithm has been developed to detect the referee and players of interest. The algorithm describes the distinguishing color features of referee and player shirts by color histograms, and locates and verifies these objects by histogram similarity. Play-break events are detected based on shot type and shot length features, and game summaries are generated from the detected play events. Both algorithms are applicable to a variety of sports.

(3) We introduce a novel shot type classification algorithm for basketball video. The proposed algorithm automatically determines the required parameter values by using shot length and average dominant color ratio features. The use of shot length for shot classification is new in the sports video literature, and it provides robustness to the variations in the field structure of basketball courts.

The proposed dominant color region detection algorithm is described in the next section. Then, in Sec. 3, we introduce applications of the proposed method to color-based sports video processing at the semantic level. In Sec. 4, we demonstrate the efficiency and the effectiveness of the proposed algorithms over various sports video, including basketball, football, golf, and soccer. Finally, Sec. 5 concludes the paper.

2 Dominant Color Region Detection

The playing field in most sports can be described by one distinct dominant color. This dominant color, however, demonstrates variations from one sport to another, from one stadium to another, and sometimes within one stadium during a sporting event. In a specific sporting event, variations in the dominant field color are caused by changes in environmental factors, such as sunset, shadows, rain, and snow, by spatio-temporal variations in illumination intensity due to irregularly positioned and/or imperfect stadium lights, and by spatial variations in field properties, such as grass color variations and field deformations. In this section, we propose a new algorithm that learns the statistics of the dominant color, adapts these statistics to changing imaging conditions for robust performance, and detects the dominant color region independent of the sports type. The generic nature of the algorithm is an important first step for generic sports video processing, since reliable and accurate detection of the field region is important for higher level analysis of sports video, such as shot classification and event detection. In the following, we first describe the related work in color image processing and sports video processing. Then, color spaces and distance functions are explained in Sec. 2.2. Finally, Sec. 2.3 introduces the proposed algorithm.

2.1 Background

The robustness of color models to changes in imaging conditions, such as viewing direction, object geometry, illumination direction, illumination intensity, and illumination color, is a desirable feature. Since some of the above imaging conditions may change during a sports broadcast, it is important to select a color space that is robust to variations in the above features. Funt and Finlayson [17] used the ratios of RGB color triplets from neighboring locations to eliminate the effects of variations in illumination intensity and color. Gevers and Smeulders [18] evaluated different color spaces for their robustness to changes in various factors. They then proposed the m1m2m3 space, which is also based on color ratios, as in [17]; but, unlike [17], the m1m2m3 space is insensitive to surface orientation changes. Although the proposed color models derived from color ratios are useful for isolated object detection, they respond only to edges and yield equal values (zero) for smooth areas independent of RGB color values. Therefore, they are not suitable for our application. Another alternative is to use color constancy algorithms, which match similarly colored surfaces under unknown illumination without a priori knowledge of the surface reflectance functions [19]. However, Funt et al. [20] advocate the need for better color constancy algorithms for object detection.

In the sports video processing literature, popular color spaces for field region extraction are hue-saturation variants, such as HSI and HSV [6,10,12,14], and RGB [9,16]. However, the adaptation of color statistics to changing imaging conditions is neglected except for [12]. Li and Sezan [12] discuss the need to compute new specifications of green in their football play-break detection system. Their approach, in HSV space, is to use the physical model of the field for adaptation. They update green statistics for the scenes where the field lines are visible,² which limits the proposed approach to a specific sports type, i.e. football, and favors a specific shot type, i.e. long shots. In contrast, our approach does not assume any physical model of the domain; hence it is generic, and it also automatically adapts to color variations between different shot types.

2.2 Color Spaces and Distance Metrics

Gevers and Smeulders [18] summarize the limitations of different color spaces.
Their work shows that color models that are correlated to intensity (I), such as Y of YCrCb and L of La*b*, and those that are linear combinations of RGB, such as I1I2I3, are variant to common imaging conditions, while normalized color spaces, such as rg, are variant to illumination color, highlights, and interreflections, and are invariant to viewing direction, surface orientation, and illumination direction and intensity. On the other hand, hue is found to be invariant to highlights in addition to all of the conditions that normalized color spaces are invariant to. Since a detailed discussion of color models is outside the scope of this paper, we refer to two excellent sources on color models and color space conversions for an extended analysis [21,22].

In this paper, we have conducted experiments over the normalized color spaces, rg and CrCb, and the color spaces with intensity information, HSI and La*b*, for dominant color region detection. Although intensity information is variant to many imaging conditions and may not be convenient for object detection, it is used for color segmentation [23]. For the HSI space, the use of intensity information also prevents the breakdown of the algorithm when HS values correspond to the achromatic region.

² Unfortunately, the details of the adaptation algorithm are not available in [12].

Various color distance metrics exist in the literature [23,24]. In this paper, we employ two distance metrics: the Euclidean metric (L2 norm), defined in Eq. 1 for color vectors Xj and Xk, is used for all spaces except HSI, for which the robust cylindrical metric, defined in Eqs. 2-4, is used [23].

dEuclidean(j, k) = ( sum_{l=1..p} (Xjl − Xkl)^2 )^(1/2)    (1)

dintensity(j, k) = |Ij − Ik|    (2)

dchroma(j, k) = sqrt( Sj^2 + Sk^2 − 2 Sj Sk cos(θ) )    (3)

dcylindrical(j, k) = sqrt( dintensity(j, k)^2 + dchroma(j, k)^2 )    (4)

In Eqs. 2-4, S and I refer to saturation and intensity, respectively, j and k denote the jth and kth pixels, and θ is the minimum absolute difference between the two hue values, i.e., θ is limited to [0, π]. For achromatic dominant color regions and pixels, the distance dcylindrical is taken to be equal to dintensity.

2.3 Proposed Algorithm

In this section, we introduce a new dominant color region detection algorithm for sports video. The proposed generic algorithm automatically extracts dominant color statistics and adapts them to the variations in imaging conditions that are typical of sports video. We observed that none of the color spaces, if used alone, performs well under all variations in sports video. Furthermore, when intensity information is used, the resulting segmentation map is cleaner, but the statistics need adaptation, which may cause drifts from the actual dominant color statistics in the long term. Therefore, we propose to use two color spaces, which should be complementary and essentially different for robustness. One of these spaces, the control space, is employed to control the other space, called the primary space, which is adapted to local variations; hence, color spaces that include intensity information, such as HSI or La*b*, may be employed as the primary space. The control space serves two purposes: 1) it prevents the local statistics from dominating and eventually drifting from the true statistics in the primary space, and 2) the fusion of the segmentation masks in the primary and control color spaces increases the pixelwise accuracy of dominant color region detection.

In Fig. 1, the flowchart of the proposed algorithm is given. At start-up, the system computes initial statistics and the values of several parameters for

each color space from the frames in the training set (this step is not included in Fig. 1). After the initialization of the parameters, the dominant color region for each new frame is detected in both the control and primary color spaces. The segmentation results in these spaces are used by the fusion algorithm, explained in Sec. 2.3.1, to obtain a more accurate final segmentation mask. The rest of the blocks in the flowchart are utilized for the adaptation of primary color space statistics by two feedback loops. The inner feedback loop, connected with the dashed lines, computes local statistics in the primary color space and captures local variations, whereas the outer feedback loop, connected with the dotted lines, becomes active when the segmentation results conflict with each other, which indicates drifting of the local statistics from the true statistics in the primary color space. The activation of this outer feedback loop resets the primary color statistics to their initial values.³ In the following, we first present the algorithm for detection of the dominant color region and fusion of the results. Then, the computation of color statistics for initialization and adaptation is explained in Sec. 2.3.2. Finally, we describe the details of the adaptation algorithm in Sec. 2.3.3.
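As a concrete illustration of the per-pixel test used in both spaces, the cylindrical HSI distance of Eqs. 2-4 and the threshold comparison of Sec. 2.3.1 can be sketched as below. The function names and the threshold value are our own, and the sketch assumes H is given in radians with S and I normalized to [0, 1]:

```python
import math

def cylindrical_distance(hsi_a, hsi_b):
    """Robust cylindrical distance between two HSI pixels (Eqs. 2-4).

    hsi_a, hsi_b: (H, S, I) tuples, H in radians, S and I in [0, 1].
    """
    h1, s1, i1 = hsi_a
    h2, s2, i2 = hsi_b
    d_int = abs(i1 - i2)                 # Eq. 2: intensity distance
    theta = abs(h1 - h2)
    if theta > math.pi:                  # hue difference is limited to [0, pi]
        theta = 2.0 * math.pi - theta
    # Eq. 3: chroma distance in the hue-saturation plane
    d_chr = math.sqrt(s1 ** 2 + s2 ** 2 - 2.0 * s1 * s2 * math.cos(theta))
    # Eq. 4: combined cylindrical distance
    return math.sqrt(d_int ** 2 + d_chr ** 2)

def is_dominant(pixel, mean_color, t_color):
    """A pixel joins the dominant color region when its distance to the
    mean dominant color falls below the space's threshold T_color."""
    return cylindrical_distance(pixel, mean_color) < t_color
```

For the non-HSI spaces the same thresholding would use the Euclidean distance of Eq. 1 instead.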

2.3.1 Dominant Color Region Detection and Fusion of Results

The dominant field color is described by the mean values of each color component, such as rM and gM for rg, and HM, SM, and IM for HSI. Assuming these parameters are computed at start-up and updated to account for various changes in imaging conditions, the dominant color region in the current frame is determined by finding the color distance of each pixel (cylindrical distance for HSI and Euclidean distance for the others) to these mean values in both primary and control color spaces. If the color distance of a pixel to the dominant color is less than T^C_color in the control space (T^P_color in the primary color space), then the pixel is assigned as a field pixel in the corresponding color space. The parameters T^C_color and T^P_color are the maximum allowable distance values in the control and primary color spaces, respectively.

As shown in Fig. 1, the dominant color region detection results in the control and primary spaces are fed into the fusion algorithm. Denoting these input masks as MC and MP, the segmentation results in the control and primary spaces, respectively, and the output mask from the fusion algorithm as MF, we define GC and GP as the dominant colored pixel ratios in MC and MP, respectively. Both GC and GP are normalized by the frame size so that their values are in the interval [0,1]. Then, the pseudo-code of the proposed fusion algorithm can be written as:

    if |GC − GP| > TLarge
        (inconsistent masks) set MF = MC and call adaptation
    else if GC < TNon-Field or GP < TNon-Field
        if GC < GP then set MF = MC else set MF = MP
    else if GC > TField or GP > TField
        if GC > GP then set MF = MC else set MF = MP
    else
        set MF = MP

The proposed fusion algorithm determines the consistency of the results by verifying that the absolute difference of GC and GP is small (the first condition in the pseudo-code). When there is inconsistency, the algorithm assumes that the result in the primary color space is unreliable, discards MP, and uses the control space dominant color region mask, MC, as the final result, MF. In this case, the second feedback loop, linked with dotted lines in Fig. 1, is activated to reset the primary color space statistics to their initial values. When the results are consistent, the fusion algorithm selects the mask image with the smallest number of dominant colored pixels if either GC or GP is small. This is because these conditions are indicative of an out-of-field or a close-up view, defined in Sec. 3.2, and selecting the mask with the lower dominant colored pixel ratio increases the segmentation accuracy. In the opposite case, i.e., when GC or GP is large, the fusion algorithm selects the mask image with the largest number of dominant colored pixels. When neither of the above two conditions is satisfied, the primary color space result is used.

³ Although it is not explicitly shown in the figure, if a number of consecutive failures in the primary color space occur, the initial primary color space statistics are updated by the control space result.
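The fusion logic above can be sketched as below. The masks are represented as nested 0/1 lists, and the threshold values given as defaults (TLarge, TNon-Field, TField) are illustrative placeholders, not the settings used in the paper:

```python
def fuse_masks(mask_c, mask_p, t_large=0.3, t_non_field=0.1, t_field=0.6):
    """Fuse control-space (mask_c) and primary-space (mask_p) dominant
    color masks. Returns (fused_mask, adaptation_needed), where the flag
    signals that the outer feedback loop should reset primary statistics.
    """
    n = float(sum(len(row) for row in mask_c))   # frame size for normalization
    g_c = sum(map(sum, mask_c)) / n              # dominant colored pixel ratio, GC
    g_p = sum(map(sum, mask_p)) / n              # dominant colored pixel ratio, GP

    if abs(g_c - g_p) > t_large:                 # inconsistent masks:
        return mask_c, True                      # trust control space, request reset
    if g_c < t_non_field or g_p < t_non_field:   # likely close-up / out-of-field:
        return (mask_c, False) if g_c < g_p else (mask_p, False)   # keep smaller mask
    if g_c > t_field or g_p > t_field:           # likely field view:
        return (mask_c, False) if g_c > g_p else (mask_p, False)   # keep larger mask
    return mask_p, False                         # otherwise default to primary space
```

The returned flag lets the caller trigger the adaptation step of Sec. 2.3.3 without the fusion routine needing access to the color statistics themselves.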

2.3.2 Computation of Dominant Color Statistics

Dominant color statistics are computed for two purposes: 1) to initialize the statistics in the control and primary color spaces from the training set, and 2) to adapt the statistics in the primary color space to variations in imaging conditions, shown as the "compute color statistics" block in Fig. 1. Since the dominant color is described by the mean color, the algorithm estimates the mean values of each color component from average color histograms of either the frames in the training set (initialization) or the frames in the buffer (adaptation). Although the same algorithm is employed for both purposes, there is a slight difference in the preprocessing. For the frames in the training set, the system builds one-dimensional average color histograms using all pixels, while the average histograms of the frames in the buffer are computed from those pixels whose color values are close to the dominant color.

The peak index of each one-dimensional average histogram is localized to estimate the mean value of the dominant color for the corresponding color component. In order to remove noise effects due to using a single index, an interval about each histogram peak is defined, where the interval boundaries (imin, imax) correspond to the closest indices to the peak index that have a lower pixel count than some percentage of the histogram peak (20% in our experiments). After the interval boundaries are determined, the mean color in the detected interval is computed for each color component, e.g., rM and gM for rg, and HM, SM, and IM for HSI. This algorithm is similar to alpha-trimmed histograms [25]. The difference is that we define an interval about the peak index, and the value of α is automatically determined by the saliency of the peak index. That is, an interval may consist of only the peak index, which is the case for a large isolated peak, or may include a large number of neighboring indices, which may be the case for small peaks in long tailed histograms.

During initialization, the system also computes the parameters T^C_color and T^P_color, defined in Sec. 2.3.1, if one of the following types of information for the frames in the training set is available: i) a rough boundary of the field region, or ii) the dominant colored pixel ratio. Then, T^C_color and T^P_color are adjusted to classify all pixels, other than the outliers, in the bounding box, or the given percentage of pixels in the whole frame, into the dominant color region class.
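The peak-interval mean estimation described above can be sketched as follows. The 20% cutoff matches the experiments reported in the text, while the function name and the list-based histogram representation are our own:

```python
def peak_interval_mean(hist, fraction=0.2):
    """Estimate the mean of a color component from its 1-D average
    histogram by averaging over an interval around the peak
    (cf. alpha-trimmed histograms [25]).

    The interval extends from the peak index outward while neighboring
    bins keep at least `fraction` of the peak count; bins below that
    cutoff bound the interval (imin, imax).
    """
    peak = max(range(len(hist)), key=hist.__getitem__)
    cutoff = fraction * hist[peak]
    lo = peak
    while lo > 0 and hist[lo - 1] >= cutoff:     # grow left until count drops
        lo -= 1
    hi = peak
    while hi < len(hist) - 1 and hist[hi + 1] >= cutoff:  # grow right likewise
        hi += 1
    total = sum(hist[lo:hi + 1])
    # count-weighted mean bin index over the detected interval
    return sum(i * hist[i] for i in range(lo, hi + 1)) / float(total)
```

For a large isolated peak the interval collapses to the peak bin alone, while a small peak in a long-tailed histogram pulls in many neighboring bins, mirroring the automatic choice of α described above.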

2.3.3 Adaptation of Color Statistics

The adaptation of primary color statistics is achieved by either the inner feedback loop alone or both the inner and the outer feedback loops in Fig. 1. The inner feedback loop, linked with dashed lines in Fig. 1, computes local statistics from the histograms of the last N selected frames. For this purpose, every frame that has a sufficient number of dominant colored pixels is added to the frame buffer. The histograms from these frames are constructed by using: 1) those pixels that are in the dominant color region class, and 2) those that are not in the dominant color class but have a distance close to T^P_color and satisfy a special condition, which is determined by the characteristics of the selected primary space. For example, when HSI is the primary space, the hue value of these pixels should be within a range determined by the initial hue mean value, since hue is shown to be invariant to highlights in addition to several of the imaging conditions [18]. When N frames are accumulated in the buffer, the average histogram and the mean values of the color components from the average histogram are computed as explained in Sec. 2.3.2.

The statistics in the primary color space are updated as the weighted sum of local and initial statistics, as shown in Eq. 5, where wl and wi denote the weights for the local color mean vector, Mlocal, and the initial color mean vector, Minitial, respectively. As shown in Fig. 1, wl becomes zero when local statistics become unreliable, i.e., when the outer feedback loop is activated.

Mcurr = wl · Mlocal + wi · Minitial    (5)
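A minimal sketch of the Eq. 5 update, under the assumption that the weights sum to one (the paper does not specify how wl and wi are chosen) and that wl is forced to zero when the outer feedback loop fires:

```python
def update_statistics(m_initial, m_local, w_local=0.5, local_reliable=True):
    """Weighted update of the primary-space dominant color mean (Eq. 5).

    m_initial, m_local: mean color vectors (one entry per component).
    When the outer feedback loop signals unreliable local statistics,
    w_local is forced to zero, i.e. the statistics reset to their
    initial values. Weights summing to one is our assumption.
    """
    w_l = w_local if local_reliable else 0.0
    w_i = 1.0 - w_l
    # Mcurr = wl * Mlocal + wi * Minitial, componentwise
    return [w_l * l + w_i * i for l, i in zip(m_local, m_initial)]
```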

3 Applications

In this section, we present four applications that use color-based features. In Sec. 3.1, we explain a shot boundary detection algorithm for sports video. Then, in Sec. 3.2, shot type classification algorithms for soccer and basketball are introduced. Next, a single generic algorithm for referee and players of interest detection is presented. Finally, a novel play-break event detection algorithm is introduced in Sec. 3.4.

3.1 Shot Boundary Detection

Shot boundary detection is a common pre-processing step in video analysis. Although there exists a large body of literature, it is not a completely solved problem [26]. Shot boundary detection in sports video is challenging since large camera and object motion prevent the employment of motion-based algorithms, such as those based on change detection, and, if the field region appears in two consecutive shots, the color correlation of these shots will be high, which may result in missed shot boundaries. Therefore, we introduce a robust shot boundary detection algorithm for sports video by adapting color histogram thresholds to frame features and using the dominant color ratio difference in addition to the color histogram difference.

Shot boundaries are detected by computing two features: 1) Gd, the absolute difference of the dominant colored pixel ratio between two frames, and 2) Hd, the difference in color histogram similarity. The computation of Gd between the ith and (i−k)th frames is given by Eq. 6, where Gi represents the dominant colored pixel ratio for frame i, i.e., the number of dominant colored pixels in the ith frame normalized by the frame size. For the same frame pair, Hd is computed by Eq. 7, where HI(Hi, Hi−k) indicates the similarity between the normalized histograms of the ith and (i−k)th frames, Hi and Hi−k, respectively, computed by the popular histogram intersection method [27]. The algorithm uses different k values in Eqs. 6 and 7 to detect instant and gradual transitions. Instant transitions, i.e. cuts, are detected for k = 1, while gradual transitions, i.e. wipes, dissolves, and other editing effects, are verified against a range of k values for k > 1.

Gd(i, k) = |Gi − Gi−k|    (6)

Hd(i, k) = |HI(Hi, Hi−k) − HI(Hi−k, Hi−2k)|    (7)

A shot boundary is determined by comparing Hd and Gd with a set of thresholds. A novel feature of the proposed method, in addition to the introduction of Gd as a new feature, is the adaptation of the thresholds over Hd. When a sports video shot corresponds to out-of-field or close-up views (the definitions of both will be given in Sec. 3.2), the number of field colored pixels will be very low and shot properties will be similar to those of a generic video shot. In such cases, the problem is the same as generic shot boundary detection; hence, we use only Hd with a high threshold. In the situations where the field is visible, we use both Hd and Gd, but with a lower threshold for Hd. Thus, we define four thresholds: T_H^Low, T_H^High, T_G, and T_field^Low. The first two are the low and high thresholds for Hd, and T_G is the threshold for Gd. T_field^Low is essentially a rough estimate of a low dominant colored pixel ratio and determines when the conditions change from an out-of-field or close-up view to a field view. These thresholds can be adjusted for each sports type after a supervised or non-supervised learning stage. Furthermore, the proposed algorithm is robust to spatial downsampling since both Gd and Hd are size-invariant.
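The two features and the adaptive thresholding can be sketched as below. The threshold values and the OR-combination of the two tests are our assumptions, since the paper leaves the exact decision rule to a learning stage:

```python
def hist_intersection(h1, h2):
    """Histogram intersection similarity of two normalized histograms [27]."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def boundary_features(g, hists, i, k=1):
    """Compute Gd (Eq. 6) and Hd (Eq. 7) for frame i at temporal gap k.

    g: per-frame dominant colored pixel ratios; hists: per-frame
    normalized color histograms. Requires i >= 2*k.
    """
    g_d = abs(g[i] - g[i - k])
    h_d = abs(hist_intersection(hists[i], hists[i - k]) -
              hist_intersection(hists[i - k], hists[i - 2 * k]))
    return g_d, h_d

def is_cut(g_d, h_d, g_i, th_low=0.2, th_high=0.5, t_g=0.2, t_field_low=0.1):
    """Adaptive threshold test: when few field pixels are visible
    (G below T_field^Low), only Hd with the high threshold is used;
    otherwise both features are tested with the lower Hd threshold.
    Threshold values here are illustrative, not the paper's."""
    if g_i < t_field_low:            # out-of-field / close-up conditions
        return h_d > th_high
    return h_d > th_low or g_d > t_g # field visible: combined test (assumed OR)
```

Gradual transitions would reuse `boundary_features` over a range of k > 1 and require the test to hold across that range.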

3.2 Shot Type Classification

Production crews use different shot types to convey certain semantics, which can be used for high-level video analysis in a particular domain. Cinematographers classify a shot into one of the long, medium, or close shot classes [28], the definitions of which are usually domain-dependent. In the following, we define these three classes for sports video:

• Long shot: A long shot displays the global view of the field, as shown in Fig 2 (a) and (b).

• Medium shot: A medium shot, where a whole human body is usually visible, is a zoomed-in view of a specific part of the field, as in Fig 2 (c) and (d).

• Close-up or Out-of-field Shot: A close-up shot usually shows the above-waist view of a player (Fig 2 (e)). The audience, coach, and other shots are denoted as out-of-field shots (Fig 2 (f)). In the remainder of this paper, we will use the term close-up shots (or close-ups) to refer to both close-ups and out-of-field shots.

In the following, we describe color-based shot type classification algorithms for soccer and basketball. Both algorithms use the dominant colored pixel ratio in a frame, G, either for the final decision or in interim steps. The variations in the algorithms result from the differences between the field structures of these sports.

3.2.1 Shot Type Classification for Soccer

In soccer broadcasts, a frame of a close-up or an out-of-field shot consists of few dominant (grass for soccer) colored pixels, while that of a long shot has a high dominant colored pixel ratio, G. An intuitive method for classification, then, is to define regions in G space, where a low G value in a frame corresponds to a close-up or an out-of-field view, a high G value indicates that the frame is a long view, and, in between, a medium view is selected. In [10], this approach was employed for soccer. However, by using only the dominant colored pixel ratio, medium shots with a high G value will be mislabeled as long shots. The error rate due to this approach depends on the broadcasting style, and it may reach intolerable levels for higher level applications, such as goal event detection and summarization in [1]. Therefore, another feature is necessary for accurate classification of the frames with a large number of grass colored pixels.

We propose a computationally simple, yet very effective, cinematographic measure for the frames with a high G value. We define regions by using the Golden Section spatial composition rule [29], which (itself or one of its variations) is used by broadcasters to capture a scene. The rule suggests dividing up the screen in 3:5:3 proportion in both directions, and positioning the main subject(s) i) in the center region if there is one main subject in the scene, as in Fig. 3 (a), or ii) on the intersection of the lines if multiple subjects are visible and are distant from each other. For soccer video, we divide the field (grass) region box instead of the whole frame, since the field defines the region of interest in soccer broadcasts. The grass region box is defined as a scaled minimum bounding rectangle (MBR) around the field region, where the scale is adjusted to fit at least 90% of the grass pixels in the rectangle. In Fig.
3, examples of the regions obtained by the modified Golden Section rule are displayed over several medium and long views. In the regions R1, R2, and R3 in Fig. 3 (d) and (f), we define: 1) GR2, the grass colored pixel ratio in the second region, and 2) Rdiff, the mean value of the absolute grass color pixel differences between R1 and R2, and between R2 and R3, defined in Eq. 8, to measure the distribution of the field pixels in medium and long views. When the Golden Section rule is applied, long shots have high GR2 and low Rdiff values. On the other hand, medium shots with one main subject have low GR2 and high Rdiff, while medium shots with multiple subjects will have medium GR2 and Rdiff values, which enables distinguishing them from long shots. A Bayesian classifier is employed to assign the feature vector x = [GR2 Rdiff]' to the appropriate class [1].

Rdiff = (1/2) { |GR1 − GR2| + |GR2 − GR3| }    (8)

The flowchart of the proposed shot classification algorithm is shown in Fig. 4. The first stage uses the G value and two thresholds, TcloseUp and Tmedium, to determine the frame view label. These two thresholds are roughly initialized to 0.1 and 0.4 at the start of the system and, as the system collects more data, they are updated to the minima of the grass colored pixel ratio (G) histogram, as suggested in [10]. When G > Tmedium, the algorithm determines the frame view by using the cinematographic features GR2 and Rdiff.
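The Golden Section features can be sketched as below for a binary grass mask already cropped to the grass region box. For brevity, only the horizontal 3:5:3 split is shown, although the rule divides the screen in both directions:

```python
def golden_section_features(mask):
    """Compute G_R2 and R_diff (Eq. 8) from a binary grass mask cropped
    to the grass region box. The box is split 3:5:3 into columns for
    regions R1, R2, R3; a vertical split could be added the same way."""
    h = len(mask)
    w = len(mask[0])
    # 3:5:3 proportion -> column boundaries at 3/11 and 8/11 of the width
    c1, c2 = round(w * 3 / 11.0), round(w * 8 / 11.0)

    def ratio(lo, hi):
        """Grass colored pixel ratio within columns [lo, hi)."""
        area = h * (hi - lo)
        return sum(sum(row[lo:hi]) for row in mask) / float(area)

    g1, g2, g3 = ratio(0, c1), ratio(c1, c2), ratio(c2, w)
    r_diff = 0.5 * (abs(g1 - g2) + abs(g2 - g3))   # Eq. 8
    return g2, r_diff
```

Consistent with the text, a full-grass long view yields high G_R2 and low R_diff, while a centered subject (grass only in R2) yields high R_diff.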

3.2.2 Shot Type Classification for Basketball

Basketball is an indoor sport with a small field size and a short distance between the field and the stadium. For this reason, unlike soccer, long shots almost always display some part of the stadium, which decreases the dominant colored pixel ratio, G, in long shots. Furthermore, painted regions on the field may exist, which further decreases G and aggravates the shot classification problem. The conditions above make basketball shot classification, arguably, harder than that of soccer. We propose a flexible method, the accuracy of which can be adjusted to a particular application. The algorithm defines regions in the dominant colored pixel ratio, G, space, as is also done for soccer. In basketball broadcasts, however, it is observed that a long shot usually does not have an excessive amount of field colored pixels due to the physical structure of the field explained above, such as its small size and its being painted. Therefore, three parameters, (TcloseUp, Tmedium, ThighLong), are introduced to define regions in G space. The first two of these parameters have the same meaning as for soccer, i.e., the former separates close-ups from medium shots and the latter separates medium shots from long shots. ThighLong adds an upper bound to the long shot field colored pixel ratio, i.e., a long shot is defined as Tmedium < G < ThighLong.
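The three-threshold region definition can be sketched as below. The threshold values are illustrative placeholders, and labeling frames with G above ThighLong as medium views is our assumption rather than a rule stated in the text:

```python
def classify_basketball_frame(g, t_closeup=0.1, t_medium=0.3, t_high_long=0.7):
    """Label a basketball frame from its dominant colored pixel ratio G
    using the three thresholds (T_closeUp, T_medium, T_highLong)."""
    if g < t_closeup:
        return "close-up"                # few field pixels: close-up/out-of-field
    if g <= t_medium:
        return "medium"
    if g <= t_high_long:
        return "long"                    # T_medium < G <= T_highLong
    return "medium"                      # above the long-shot upper bound
                                         # (assumed label, not from the paper)
```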

Fig. 4. The flowchart of the shot type (view) classification algorithm

Fig. 5. The distribution of field color ratio in a training video for the three shot classes

Fig. 6. a) Referee uniform colored blocks and the bounding box for the detected referee region, and b) Detected players of interest from each team

Fig. 7. An example of play-break event detection (please note that, unlike illustrated in the figure, the algorithm completes play-break event detection in one pass)

Fig. 8. a-d) Original images, and dominant color region detection results for b-e) control space and c-f) primary space

Fig. 9. Precision-Recall plots for detection of a) referee, b) players of interest

Fig. 10. Examples of referee and players of interest detection for soccer and basketball sequences; detected players from one team and the other are shown in blue and green boxes, respectively, while the referee region is surrounded by a red-colored box
