AUTOMATIC LANDMARK DETECTION AND VALIDATION IN SOCCER VIDEO SEQUENCES

Arnaud Le Troter, Sebastien Mavromatis, Jean-Marc Boi and Jean Sequeira
LSIS Laboratory (UMR CNRS 6168) - LXAO group
University of Marseilles - France

Abstract

Landmarks are specific points that can be identified to provide efficient matching processes. Much work has been devoted to the automatic detection of such landmarks in images: our purpose is not to propose a new detection approach but to validate the detected landmarks in a given context, namely the 2D to 3D registration of soccer video sequences. The originality of our approach is that it globally takes into consideration the color and the spatial coherence of the field to provide such a validation. This process is a part of the SIMULFOOT project, whose objective is the 3D reconstruction of the scene (players, referees, ball) and its animation as a support for cognitive studies and strategy analysis.

Keywords:

Landmark detection, Recognition area, Classification

1. Introduction

The SIMULFOOT project, developed at the LSIS laboratory, provides an interesting framework for theoretical research in 3D scene analysis and cognitive ergonomics ([1], [2], [3]). It also supports many partnerships with various professionals. This project aims to produce 3D scene reconstruction and animation from video sequences ([4], [5], [6]).

3D scene analysis and registration have been widely studied for decades but, up to now, no general solution has been found. Existing solutions are tied to given contexts about which we have some knowledge (a given set of elements, a given geometry, ...). In the frame of the SIMULFOOT project, we can express the constraints as follows:
- Most of the relevant information is related to a 2D space (the field), for instance the lines drawn on it
- Those 3D elements that are relevant have a known relation with the 2D space, for instance the goal, or the players, who are in 3D space but whose feet are considered as being on the field at the scale of a global view of the scene
- The other 3D elements, such as the gallery for instance, do not bring any information in relation with our problem (we only want to model what happens on the field)

INTERNATIONAL CONFERENCE ON COMPUTER VISION AND GRAPHICS

These considerations on the conditions of our problem have to be completed by two remarks:
- The field does not have a given color (even if it looks "green" or "brown") and some parts of it are hidden (by the players), but the area of the field is characterized by a color coherence and a spatial coherence
- The transformation that associates the points of the field in an image (captured with a camera) with the corresponding ones in the model is a projective transformation, i.e. we can express it by means of a 3x3 matrix in homogeneous coordinates ([7]).
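To make the second remark concrete, here is a minimal Python sketch of how a 3x3 matrix maps an image point to the field model in homogeneous coordinates. The function name `apply_homography` and the numerical values of `H` are illustrative assumptions, not values taken from the SIMULFOOT project.

```python
def apply_homography(H, x, y):
    """Map the image point (x, y) through the 3x3 matrix H (list of 3 rows)."""
    # Lift (x, y) to the homogeneous vector (x, y, 1) and multiply by H.
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("the point is mapped to infinity")
    # Divide by the third coordinate to come back to the plane.
    return xh / w, yh / w


# Hypothetical matrix, chosen only to show the perspective division.
H = [[2.0, 0.0, 10.0],
     [0.0, 2.0, 5.0],
     [0.0, 0.001, 1.0]]

model_x, model_y = apply_homography(H, 0.0, 100.0)
```

The last row of the matrix is what distinguishes a projective transformation from an affine one: when it is (0, 0, 1) the division has no effect.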

Taking into account all these considerations and remarks, we have designed the following approach:
- We look for the lines in the image
- We characterize the field area in the image
- We only keep those lines belonging to the field area
- We extract landmarks and we look for the corresponding elements in the model
- We compute the transformation matrix

We will not describe the solutions we have developed for every step of the global process (this could not be done in so few pages). Instead, we will focus on one of these steps, the landmark validation process, which is directly related to the field detection.

2. Landmark detection

Let us consider the following image (Figure 1a). Many straight lines can be detected in it but not all of them are relevant: for instance, the top of the wall and the vertical lines are not significant.

Figure 1a. A soccer image
Figure 1b. After the use of a Laplacian filter
Figure 1c. After thresholding

Landmarks are usually points. But in this case, they may be points as well as straight lines, because there is an exact correspondence between them (the only available points are intersections of straight lines). Thus, we will talk either of points or of lines.

We use a classical way of detecting such lines. We compute the lightness component and we apply a Laplacian filter to it (Figure 1b). Then we obtain a binary image by thresholding the result of the previous step (Figure 1c). Finally, we use a "2 to 1" Hough Transform to detect straight lines, using a classical polar description with ρ and θ to represent them (Figure 2a). The final result is the set of straight lines detected in the image (Figure 2b).

Figure 2a. The Hough Space
Figure 2b. Straight lines detected

Our goal was not to design an optimal approach for detecting such lines but simply to detect them effectively. Although this approach is not original, it is sufficient here because these lines appear very clearly in the context of this application.
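The detection chain of this section (Laplacian filter, thresholding, Hough Transform in (ρ, θ)) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the image is a small 2D list of lightness values, the 4-neighbor Laplacian kernel and the threshold are arbitrary choices, and the accumulator uses the standard voting scheme (each edge pixel votes for every θ), not the "2 to 1" variant mentioned above, whose details are not given here. All function names are hypothetical.

```python
import math


def laplacian(img):
    """4-neighbor Laplacian of a 2D list of lightness values (border left at 0)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                         + img[y][x + 1] - 4 * img[y][x])
    return out


def threshold(response, t):
    """Binary image: True where the absolute filter response reaches t."""
    return [[abs(v) >= t for v in row] for row in response]


def hough_lines(binary, n_theta=180, top_k=1):
    """Return the top_k (rho, theta, votes) cells of a standard Hough accumulator.

    A line is described by x*cos(theta) + y*sin(theta) = rho.
    """
    thetas = [math.pi * i / n_theta for i in range(n_theta)]
    acc = {}
    for y, row in enumerate(binary):
        for x, on in enumerate(row):
            if not on:
                continue
            # Standard voting: one vote per theta for each edge pixel.
            for i, th in enumerate(thetas):
                rho = int(round(x * math.cos(th) + y * math.sin(th)))
                acc[(rho, i)] = acc.get((rho, i), 0) + 1
    best = sorted(acc.items(), key=lambda kv: -kv[1])[:top_k]
    return [(rho, thetas[i], votes) for (rho, i), votes in best]


# Synthetic example: a bright horizontal line on a dark background.
image = [[0.0] * 20 for _ in range(20)]
for x in range(20):
    image[10][x] = 100.0
edges = threshold(laplacian(image), 150.0)
lines = hough_lines(edges)
```

On real images an array library and a finer accumulator would be preferable; the pure-Python loops are kept only so that each step stays visible.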

3. Field detection and landmark validation

The field detection is the key point of our approach because it provides the mask for landmark validation. The only assumption we make is that the color of the soccer field is dominant in the image (i.e. the camera is not oriented toward the stand and does not take a close-up of a player), as illustrated in Figure 1a.

Research works have been developed to automatically extract the foreground elements from the background in video sequences, such as those described in [8] and [9]. But the problem addressed in these papers is quite different: we do not want to characterize the background but to find the area in the image that corresponds to the relevant part of the background. In other words, we want to segment the background into two parts, the relevant one and the non-relevant one. Let us also mention the work of Vandenbroucke on a method that provides a color space segmentation and classification before using snakes to track the players in video sequences [10]. The ideas developed in that paper are very interesting but they do not fully take advantage of the color and spatial coherence.


For these reasons, we have developed a new approach that integrates all these features. This approach consists in:
- Analyzing the pixel distribution in the color space
- Characterizing the relevant area in the color space (color coherence)
- Selecting the corresponding points in the image
- Using a spatial coherence criterion on the selected pixels in the image

All the pixels can be represented in a color space, which is a 3D space. Usually, the information captured by a video camera is made of quantized (red, green, blue) values. But this color information can be represented in a more meaningful space such as HLS (Hue, Lightness, Saturation), which makes the hue explicit; the hue seems to be decisive in this context.

The first idea we developed was to select points on the basis of a threshold in the "natural" discrete HLS space. "Natural" means that we use the set of HLS values associated with the quantized (R,G,B) values. The algorithm is very simple: we count the occurrences of each value in the HLS space and we keep those with the highest scores. However, it is not effective at all, because the local distribution of values can vary widely.

Many improvements have been brought to this algorithm. They are described in detail in [2]:
- We define a discrete HLS space
- Each pixel represented in this space acts as a potential source, so that pixels carrying similar color information interact strongly
- A potential value (the sum of these potentials) is then assigned to each cell of this discrete HLS space
- We select those cells that have the highest potential values and that are connected (to take advantage of the color coherence)
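The four steps above can be illustrated by the following Python sketch. It is not the algorithm of [2], whose details are not reproduced here: the number of bins, the shape of the potential (a weight of 1 / (1 + distance) within a small radius), the `keep_ratio` threshold and all function names are assumptions made for the example. Only the standard-library `colorsys.rgb_to_hls` conversion is relied upon.

```python
import colorsys
from collections import deque


def hls_cell(rgb, bins):
    """Cell of the discrete HLS space that contains an (R, G, B) pixel (0..255)."""
    h, l, s = colorsys.rgb_to_hls(*(v / 255.0 for v in rgb))
    return tuple(min(int(v * n), n - 1) for v, n in zip((h, l, s), bins))


def field_color_cells(pixels, bins=(12, 6, 6), radius=1, keep_ratio=0.5):
    """Select the connected high-potential cells around the main color peak."""
    nh, nl, ns = bins
    # 1. Pixel distribution in the discrete HLS space.
    hist = {}
    for rgb in pixels:
        cell = hls_cell(rgb, bins)
        hist[cell] = hist.get(cell, 0) + 1
    # 2. Each pixel acts as a source: it contributes to the nearby cells with
    #    a weight that decreases with distance (assumed form of the potential).
    potential = {}
    steps = range(-radius, radius + 1)
    for (ch, cl, cs), count in hist.items():
        for dh in steps:
            for dl in steps:
                for ds in steps:
                    l2, s2 = cl + dl, cs + ds
                    if not (0 <= l2 < nl and 0 <= s2 < ns):
                        continue
                    cell = ((ch + dh) % nh, l2, s2)  # the hue axis is circular
                    weight = 1.0 / (1 + abs(dh) + abs(dl) + abs(ds))
                    potential[cell] = potential.get(cell, 0.0) + count * weight
    # 3. Keep the strong cells that are connected to the highest peak.
    peak = max(potential, key=potential.get)
    strong = {c for c, p in potential.items() if p >= keep_ratio * potential[peak]}
    selected, queue = {peak}, deque([peak])
    while queue:
        ch, cl, cs = queue.popleft()
        for dh, dl, ds in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            neighbor = ((ch + dh) % nh, cl + dl, cs + ds)
            if neighbor in strong and neighbor not in selected:
                selected.add(neighbor)
                queue.append(neighbor)
    return selected


# Synthetic example: mostly "grass" pixels plus a few red shirt pixels.
bins = (12, 6, 6)
pixels = [(30, 160, 40)] * 90 + [(200, 30, 30)] * 10
cells = field_color_cells(pixels, bins)
color_mask = [hls_cell(p, bins) in cells for p in pixels]
```

Because the selection starts from the peak and grows through connected cells, an isolated color that is frequent but far from the dominant hue is not selected, which is the role of color coherence in the description above.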

At this step, we have selected pixels on the basis of a color coherence criterion only (Figure 4a). We then use spatial coherence to select the field area:
- We use a closing to connect the separated elements in the field (e.g. those that are separated by white lines)
- We use an opening to cut off the "wrong connections" (e.g. if some green is in the gallery, close to the field)
- We select the connected area that contains the most pixels (e.g. to eliminate some green parts in the gallery and only keep the field)
- We fill the holes in this area
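The spatial coherence steps listed above (closing, opening, largest connected area, hole filling) can be sketched on a small binary mask as follows. The 3x3 structuring element, the 4-connectivity and the function names are assumptions made for this example; the sizes actually used in the project are not stated here.

```python
from collections import deque


def _window(mask, y, x):
    """Values of the 3x3 neighborhood of (y, x) that fall inside the image."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def dilate(mask):
    return [[any(_window(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def erode(mask):
    return [[all(_window(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def _component(mask, start, value, seen):
    """Collect the 4-connected pixels equal to `value` reachable from `start`."""
    h, w = len(mask), len(mask[0])
    comp, queue = [start], deque([start])
    seen.add(start)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if (0 <= ny < h and 0 <= nx < w and mask[ny][nx] == value
                    and (ny, nx) not in seen):
                seen.add((ny, nx))
                comp.append((ny, nx))
                queue.append((ny, nx))
    return comp


def largest_component(mask):
    h, w = len(mask), len(mask[0])
    seen, best = set(), []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and (y, x) not in seen:
                comp = _component(mask, (y, x), True, seen)
                if len(comp) > len(best):
                    best = comp
    out = [[False] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = True
    return out


def fill_holes(mask):
    """Set to True every background pixel that is not connected to the border."""
    h, w = len(mask), len(mask[0])
    outside = set()
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and not mask[y][x] and (y, x) not in outside:
                _component(mask, (y, x), False, outside)
    return [[(y, x) not in outside for x in range(w)] for y in range(h)]


def field_mask(color_mask):
    closed = erode(dilate(color_mask))   # closing: bridges thin white lines
    opened = dilate(erode(closed))       # opening: cuts thin wrong connections
    return fill_holes(largest_component(opened))
```

As a usage example, a 16x16 mask whose lower rows are "grass" crossed by a one-pixel white line, with a player-sized hole and a small green patch in the stand, is reduced by `field_mask` to the lower rows alone.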

Figure 3a. The classical HLS space
Figure 3b. A discrete HLS space

Figure 4a. Pixel classification using color coherence
Figure 4b. Main areas in the image
Figure 4c. Automatic field selection

4. Results

As an illustration of the whole process, we consider the initial example. Once we have selected the field area (Figure 4c), we only keep those lines that belong to it (Figure 5b).

Figure 5a. The initial image
Figure 5b. The selected lines
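One possible way of keeping only the lines that belong to the field area is sketched below in Python. The criterion used (at least half of the pixels of the line inside the field mask, parameter `min_inside`) is an assumption made for the example, since the exact test is not specified here. The helper `intersection` shows how point landmarks can be obtained as intersections of two validated lines in the (ρ, θ) description. All names are hypothetical.

```python
import math


def line_pixels(rho, theta, height, width):
    """Pixels (y, x) of the line x*cos(theta) + y*sin(theta) = rho in the image."""
    c, s = math.cos(theta), math.sin(theta)
    pixels = set()
    if abs(s) >= abs(c):
        # Closer to horizontal: one y for each column.
        for x in range(width):
            y = int(round((rho - x * c) / s))
            if 0 <= y < height:
                pixels.add((y, x))
    else:
        # Closer to vertical: one x for each row.
        for y in range(height):
            x = int(round((rho - y * s) / c))
            if 0 <= x < width:
                pixels.add((y, x))
    return pixels


def validate_lines(lines, field_mask, min_inside=0.5):
    """Keep the (rho, theta) lines that lie mostly inside the field mask."""
    height, width = len(field_mask), len(field_mask[0])
    kept = []
    for rho, theta in lines:
        pixels = line_pixels(rho, theta, height, width)
        if not pixels:
            continue
        inside = sum(1 for y, x in pixels if field_mask[y][x])
        if inside / len(pixels) >= min_inside:
            kept.append((rho, theta))
    return kept


def intersection(line_a, line_b):
    """Intersection (x, y) of two (rho, theta) lines, or None if parallel."""
    (r1, t1), (r2, t2) = line_a, line_b
    det = math.cos(t1) * math.sin(t2) - math.sin(t1) * math.cos(t2)
    if abs(det) < 1e-9:
        return None
    x = (r1 * math.sin(t2) - r2 * math.sin(t1)) / det
    y = (r2 * math.cos(t1) - r1 * math.cos(t2)) / det
    return x, y


# Synthetic example: the field occupies rows 6..15 of a 16x16 image.
mask = [[y >= 6 for _ in range(16)] for y in range(16)]
field_line = (10, math.pi / 2)   # horizontal line y = 10, on the field
wall_line = (2, math.pi / 2)     # horizontal line y = 2, in the stand
kept = validate_lines([field_line, wall_line], mask)
```

A stricter variant could count only the edge pixels that voted for the line instead of all the pixels of the infinite line; the ratio test is used here for brevity.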

We did not carry out a theoretical study of the algorithm complexity. But, as an experimental result, we can say that on a classical laptop computer (less than 1 GHz):
- The first process (discrete HLS space analysis), which is performed only when there are major changes, i.e. at the beginning of the sequence, takes a few seconds
- The current selection takes a few milliseconds

5. Conclusion

We have focused our work on automatic landmark selection and validation because it is the key point of the global registration process. The approach we propose is very fast and robust: it is compatible with a rate of twenty images per second on a basic computer and it gives a correct selection even in very tricky situations. In addition, it does not depend on the color of the field, except if there are various colors on wide areas of the field.

Solutions to all the other steps have been implemented. They are not optimal but they make the global process run. One problem remains, however: automatically finding the logical relation between the detected landmarks and the corresponding model elements. This problem is easy to solve when the camera does not move a lot (i.e. when we stay in the same situation) but it would need "Artificial Intelligence" procedures in the general case (i.e. when the camera is moving from one side of the field to the other).

References

[1] S. Mavromatis, J. Baratgin, J. Sequeira, "Reconstruction and simulation of soccer sequences," MIRAGE 2003, Nice, France.
[2] S. Mavromatis, J. Baratgin, J. Sequeira, "Analyzing team sport strategies by means of graphical simulation," ICISP 2003, June 2003, Agadir, Morocco.
[3] H. Ripoll, "Cognition and decision making in sport," International Perspectives on Sport and Exercise Psychology, S. Serpa, J. Alves, and V. Pataco, Eds. Morgantown, WV: Fitness Information Technology, Inc., 1994, pp. 70-77.
[4] B. Bebie, "Soccerman: reconstructing soccer games from video sequences," presented at IEEE International Conference on Image Processing, Chicago, 1998.
[5] M. Ohno, Shirai, "Tracking players and estimation of the 3D position of a ball in soccer games," presented at IAPR International Conference on Pattern Recognition, Barcelona, 2000.
[6] C. Seo, Kim, Hong, "Where are the ball and players?: soccer game analysis with color-based tracking and image mosaick," presented at IAPR International Conference on Image Analysis and Processing, Florence, 1997.
[7] S. Carvalho, Gattas, "Image-based Modeling Using a Two-step Camera Calibration Method," presented at Proceedings of International Symposium on Computer Graphics, Image Processing and Vision, Rio de Janeiro, 1998.
[8] A. Amer, A. Mitiche, E. Dubois, "Context independent real-time event recognition: application to key-image extraction," International Conference on Pattern Recognition, Québec (Canada), August 2002.
[9] S. Lefèvre, L. Mercier, V. Tiberghien, N. Vincent, "Multiresolution color image segmentation applied to background extraction in outdoor images," IST European Conference on Color in Graphics, Image and Vision, pp. 363-367, Poitiers (France), April 2002.
[10] N. Vandenbroucke, L. Macaire, J. Postaire, "Color pixels classification in an hybrid color space," IEEE International Conference on Image Processing, pp. 176-180, Chicago, 1998.
