Object detection in images: run-time complexity and parameter selection of Support Vector Machines

N. Ancona, G. Cicirelli, E. Stella and A. Distante
Istituto Elaborazione Segnali ed Immagini - C.N.R.
Via Amendola 166/5 - 70126 Bari - Italy
e-mail: {ancona,grace,stella,distante}@iesi.ba.cnr.it
Abstract

In this paper we address two aspects related to the exploitation of Support Vector Machines (SVM) for classification in real application domains, such as the detection of objects in images. The first concerns the reduction of the run-time complexity of a reference classifier without increasing its generalization error. We show that the complexity in the test phase can be reduced by training SVM classifiers on a new set of features obtained by using Principal Component Analysis (PCA). Moreover, due to the small number of features involved, we explicitly map the new input space into the feature space induced by the adopted kernel function. Since the classifier is simply a hyperplane in the feature space, the classification of a new pattern involves only the computation of a dot product between the normal to the hyperplane and the pattern. The second issue concerns the problem of parameter selection. In particular, we show that Receiver Operating Characteristic (ROC) curves, measured on a suitable validation set, are effective for selecting, among the classifiers the machine implements, the one with performances similar to those of the reference classifier. We address these two issues for the particular application of detecting goals during a football match.
1. Object detection and classification
The problem of object detection in images is the problem of detecting three-dimensional objects in the scene by using the image projected by the object on the sensing plane of a standard camera. Face detection is one of the most interesting applications of object detection in images [10]. Another interesting application, which is getting particular attention from referee associations, the sports press and supporters [9, 7, 5], concerns the problem of detecting goals during a football match by using images acquired by a standard TV camera. For an appropriate position of the camera with respect to the football ground, the problem of goal detection can be reduced to the problem of detecting the ball in images of the goalmouth [2].

The problem of object detection can be seen as a learning-from-examples problem in which the examples are particular views of the object we are interested in detecting. In particular, object detection can be seen as a classification problem, because our ultimate goal is to determine a separating surface, optimal under certain conditions, which is able to separate object views from image patterns that are not instances of the object. In this perspective the data to classify are image patches represented by vectors which, in general, live in spaces with a very high number of dimensions, for example equal to the number of pixels in the patch. Moreover, the detection of objects in images requires an exhaustive search of the current image, that is, all the patches of a given size have to be classified. The size of the data to classify and the need to scan the image exhaustively when looking for the candidate object show that object detection in images is a time-consuming task, and therefore some strategy for reducing the complexity of the classifier has to be adopted in order to handle real contexts.
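To make the cost of this exhaustive search concrete, the following sketch (Python with NumPy; the patch size, the stride and the dummy classifier are illustrative placeholders, not values taken from this paper) classifies every fixed-size sub-image of a frame:

import numpy as np

def scan_image(image, classify, patch_h=20, patch_w=15, stride=1):
    """Exhaustively classify every patch_h x patch_w sub-image.

    `classify` maps a flattened patch to +1 (object) or -1 (background);
    the function returns the top-left corners of the patches labelled +1.
    """
    H, W = image.shape
    detections = []
    for r in range(0, H - patch_h + 1, stride):
        for c in range(0, W - patch_w + 1, stride):
            patch = image[r:r + patch_h, c:c + patch_w].reshape(-1)
            if classify(patch) > 0:
                detections.append((r, c))
    return detections

# Toy usage with a random frame and a trivial stand-in classifier.
rng = np.random.default_rng(0)
frame = rng.random((120, 160))
print(len(scan_image(frame, lambda x: 1 if x.mean() > 0.55 else -1)))

The number of patches grows with the image area, so the per-patch cost of the classifier dominates the overall detection time; this is the cost that the rest of the paper tries to reduce.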
2. Motivation

In this paper we address two issues concerning the exploitation of learning machines in real application domains. In particular, given a small set of noisy input-output pairs $(x_i, y_i)$, $i = 1, \dots, \ell$, with $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$, and a hypothesis space $\mathcal{H}$, that is the set of functions the machine implements [6], we focus on the problems of i) determining the "best" parameters for the given model and ii) reducing the run-time complexity of the selected machine. We address both questions for classification and, in particular, in the context of Support Vector Machines (SVM) [11]. The problem of the reduction of the run-time computational complexity concerns the time required by the learning machine for classifying unseen patterns.
It is therefore strictly connected to the computational time in the test phase. The problem of parameter selection is the problem of determining, given the data and the hypothesis space, the function with the best generalization capacity, that is, the capacity of the machine to correctly classify patterns never seen before. When few data are available, the leave-one-out (LOO) procedure is a suitable scheme for parameter selection: the model parameters are chosen so as to minimize the LOO error, which is an estimate of the generalization error of the learning machine. An alternative approach exploits a validation set and selects the model which minimizes the number of misclassified patterns of the validation set. An appropriate tool for analysing the performances of a classifier is the ROC curve [4], which shows, for a given percentage of true positive (TP) patterns, the percentage of false positive (FP) patterns detected by the classifier; true positives are positive patterns correctly classified by the machine, and false positives are negative patterns classified as positive. An ideal classifier is described by a ROC curve equal to a unit step function. The issues of complexity reduction and parameter selection in the framework of classification can be unified in the following problem: suppose we have a classifier trained on a training set $S$, with performances $P$ measured on a given test set $T$, which requires a time $t$ for classifying a new pattern; determine a new classifier, trained on $S$, with performances $P' \simeq P$ measured on $T$ and requiring a time $t' \ll t$ for classifying a new pattern. In this paper we use SVM for constructing the reference classifier, as described in section 3. Moreover, we use PCA for reducing the dimension of the space where the input patterns live, and we perform an exhaustive search of the parameter space for determining the classifier, among the ones belonging to a well-defined hypothesis space, with the best generalization capabilities measured in terms of ROC curves (see section 4). Finally, we show the performances of the adopted techniques in the particular application of detecting goals during a football match (see section 5).
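As a concrete illustration of the ROC-based criterion just described (a minimal sketch on synthetic scores, not the paper's data; the thresholding of real-valued classifier outputs into TP and FP rates is standard):

import numpy as np

def roc_points(scores, labels):
    """Return (FP rate, TP rate) pairs obtained by sweeping a decision
    threshold over the real-valued outputs `scores`; labels are in {-1, +1}."""
    order = np.argsort(-scores)                  # decreasing score
    labels = labels[order]
    tp = np.cumsum(labels == 1)
    fp = np.cumsum(labels == -1)
    return fp / max(1, (labels == -1).sum()), tp / max(1, (labels == 1).sum())

# Synthetic validation outputs: positives tend to score higher.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(1.0, 1.0, 400), rng.normal(-1.0, 1.0, 2000)])
labels = np.concatenate([np.ones(400), -np.ones(2000)])
fpr, tpr = roc_points(scores, labels)
print(tpr[np.searchsorted(fpr, 0.01)])           # TP rate at about 1% FP rate

An ideal classifier would reach a TP rate of 1 already at an FP rate of 0, which is the unit-step behaviour mentioned above.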
3. Reference ball detector

For the construction of the reference classifier [2] we used 2004 manually selected positive examples (see figure 1) and 7971 negative examples, selected according to [3]; masking and histogram equalization reduced the size of each example to 172 components. The reference classifier was obtained by training an SVM with a polynomial kernel of degree two and a fixed value of the regularization parameter C. Notice that the feature space induced by the adopted kernel function is, in this case, the space whose features are the monomials of degree less than or equal to 2. The performances of the reference classifier were measured on a test set of 900 images. Each test image was exhaustively scanned and all the sub-images of the chosen size were classified as instances of the football or not. Figure 2 shows typical images used for testing. For a better understanding of the performances of the classifier, we analysed all the test images before the classification process, checking for the visibility of the football. We counted the images in which the football was visible, occluded and partially occluded, with occlusion smaller or greater than 50%. Row 1 in table 1 shows the performances of the reference classifier; the time required by this classifier for exhaustively scanning one image is taken as the baseline for the run-time comparison of section 5.
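A minimal sketch of how such a reference classifier can be trained with an off-the-shelf SVM implementation. The degree-2 polynomial kernel and the 172-component inputs follow the description above; the data, the number of examples and the value of C used here are placeholders, not the ones of the paper.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Placeholder training data with the same dimensionality as in the paper
# (172 components per example); the real sets contain 2004 positive and
# 7971 negative patterns.
X_pos = rng.normal(0.5, 1.0, size=(200, 172))
X_neg = rng.normal(-0.5, 1.0, size=(800, 172))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_neg))])

# Degree-2 polynomial kernel: the induced feature space is made of the
# monomials of degree <= 2.  C = 1.0 is only a placeholder value.
reference = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1.0)
reference.fit(X, y)
print("support vectors per class:", reference.n_support_)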
      |   Visible   |  Occl. < 50%  |  Occl. > 50%  | Absent |    Total
 Row  |  FN    FP   |   FN     FP   |   FN     FP   |   FP   |  FN    FP
  1   |  1.7   0.2  |  48.1    3.8  |  99.2    5.0  |   5.2  | 23.8   2.6
  2   |  1.7   1.2  |  39.9    7.7  |  95.8   22.5  |  17.9  | 21.7   8.6
  3   |  1.4   1.2  |  42.1   10.4  |  98.3   24.2  |  15.6  | 22.3   8.9
  4   |  1.7   0.7  |  45.4    5.5  |  99.2   10.8  |   6.9  | 23.2   4.2
  5   |  1.7   0.7  |  43.7   10.9  |  98.3   23.3  |  14.5  | 22.8   8.4

Table 1. Performances of the different classifiers: error rates (%) on the test set, broken down by football visibility (fully visible, occluded by less than 50%, occluded by more than 50%, absent). FN: false negative rate; FP: false positive rate.
4. Feature reduction and parameter selection
A possible approach for reducing the computational time required for testing is to reduce the number of components to work with, that is, to reduce the dimension of the space in which the examples live. In [12] feature reduction is accomplished by selecting a suitable subset of the input features, namely the input features minimizing an upper bound of the LOO error. An alternative approach (see [8] for example) consists of generating a new set of features by linearly combining the original ones, and selecting the new features that maximize the performances of the classifier measured in terms of ROC curves. In this paper we investigate the second strategy and use PCA for generating a new set of uncorrelated features from the examples, as follows:
$$ \tilde{x}_j = \frac{1}{\sqrt{\lambda_j}}\, u_j^\top (x - \bar{x}), \qquad j = 1, \dots, N \qquad (1) $$
where $\tilde{x}_j$ is the $j$-th new feature of the example $x$, $\lambda_j$ and $u_j$ are the non-negative eigenvalues and the orthonormal eigenvectors of the covariance matrix $\Sigma$ of the examples, respectively, and $\bar{x}$ is the sample mean. Two issues are addressed in this paper regarding PCA. The first concerns which examples have to be taken into account in the computation of $\Sigma$: we can envisage using all the examples in the training set, as suggested in [8], or using only the positive examples. The second issue regards the role of the scaling factor $1/\sqrt{\lambda_j}$ in (1). In particular, if each new feature is scaled by $1/\sqrt{\lambda_j}$ as in (1), then $x$ is treated as a realization of a random vector distributed according to a normal law with mean $\bar{x}$ and covariance $\Sigma$; otherwise, if the scaling factor is not taken into account in the computation of the new features, the implied covariance is the diagonal matrix $D$ having the eigenvalues $\lambda_j$ on its main diagonal. So, in some sense, the scaling factor influences the shape of the feature space (this feature space, where the new vectors live, is not to be confused with the feature space induced by the kernel in SVM).

The choice of the final classifier is influenced by a set of parameters. In terms of SVM, a classifier is parametrized by the adopted kernel function and by the regularization term $C$. In all of our experiments we kept the kernel fixed, for avoiding overfitting, because the number of examples in the training set is smaller than the dimensionality of the feature space induced by the kernel. As a consequence, the free parameters to determine are $C$ and the number $N$ of principal components used for generating the new features. The analysis of these two parameters is influenced by different factors: the type of training examples used for computing the eigenvectors and the eigenvalues of $\Sigma$ (positive examples only or all the examples), and the exploitation of the scaling factor used in (1). For determining the optimal values of $N$ and $C$ in all these cases, we used the same training set used for training the reference classifier, and we generated a validation set by subsampling the entire test set of 900 images in the following way. All the 424 occlusion-free football patterns were included in the new test set, together with the 29179 non-football patterns most similar to football patterns; in particular, a non-football pattern was included in the new test set if its output under the reference classifier was greater than $-1$. So the cardinality of the validation set was 29603 patterns. Notice that a pattern inside the margin between the two classes has an output smaller than one in absolute value; including in the validation set the negative patterns with output greater than $-1$ is therefore equivalent to considering both non-football patterns close to the boundary between the classes and misclassified negative patterns.
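A sketch of the feature computation of eq. (1) with NumPy, covering both alternatives discussed above (covariance estimated from a chosen subset of examples, features returned with or without the scaling factor); the array sizes are illustrative.

import numpy as np

def pca_features(X_fit, X, n_components, scale=True):
    """Project the rows of X on the first n_components principal directions
    estimated from X_fit, as in eq. (1).  With scale=True each new feature
    is divided by sqrt(lambda_j); with scale=False the raw projections are
    returned."""
    mean = X_fit.mean(axis=0)
    cov = np.cov(X_fit, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]   # keep the largest ones
    lam, U = eigvals[idx], eigvecs[:, idx]
    Z = (X - mean) @ U
    return Z / np.sqrt(np.maximum(lam, 1e-12)) if scale else Z

# Example: 40 components estimated from positive examples only, no scaling.
rng = np.random.default_rng(3)
X_pos = rng.normal(size=(300, 172))
X_all = rng.normal(size=(1000, 172))
print(pca_features(X_pos, X_all, n_components=40, scale=False).shape)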
D
s
y
y
r
j
A
O
D
(
H
K
D
N
A
W
(
(
W
W
;
$
A
n
O
D
q
n
O
,
&
l
W
,
;
:
5. Experimental results

Due to space constraints, we report only the main results; we refer the reader to [1] for a more detailed description. Hereafter, a reduced classifier is identified by the number $N$ of principal components and the value of the regularization parameter $C$ used for training it. In the first set of experiments, we used all the examples in the training set for computing $\Sigma$, and we scaled each new feature by the corresponding factor $1/\sqrt{\lambda_j}$ as in eq. (1). We computed the ROC curves of SVM classifiers trained on the new features, varying $N$ and $C$; the first 80 components capture most of the whole variance of the training set. Comparing the ROC curves obtained by varying $N$ and $C$ with the ROC curve of the reference classifier, we found the minimum value of $N$ which provides a high degree of similarity. The behaviours of the ROC curves being equal for different values of $C$, we selected the classifier relative to the smallest $C$. This is equivalent to selecting the classifier with the largest margin and hence, in principle, with the best generalization capacity. The ROC curves of the selected classifier and of the reference classifier, measured by using only images in which the football is fully visible, are shown in figure 3. Row 2 in table 1 shows the performances of the selected classifier measured on the entire test set containing 900 images.

We then repeated the experiments using, as before, all the examples for computing $\Sigma$, but without scaling each component by the corresponding factor. Also in this case we obtained the best classifier with the same number of components. Row 3 of table 1 shows its performances on the entire test set. The analysis of the ROC curves measured on the reduced test set shows that, in general and as expected, the performances increase with $N$; moreover, for a fixed $N$, the performances do not change much when $C$ changes. Comparing rows 2 and 3 in table 1, we note that, the number of used components being equal, scaling by $1/\sqrt{\lambda_j}$ does not influence the performances of the classifier. Moreover, comparing rows 1 and 2 in table 1, we note that even if the ROC curves of the reference classifier and of the reduced classifier, computed both on the reduced test set (not reported here) and on the entire test set (see figure 3), are very similar, the performances of the reduced classifier are worse than those of the reference one, mainly due to the increased number of FPs.

We repeated the experiments once more, this time considering only the positive examples for computing $\Sigma$. Also in this case we tested the influence of the scaling on the classification performances; since the performances were not influenced by this factor, we illustrate here the results obtained when no scaling is applied to the new features. The best classifier was obtained with a different number of components than in the previous experiments. Row 4 of table 1 shows its performances on the entire test set. Comparing rows 3 and 4 of table 1, we note that the last classifier has the same performances in terms of detected FNs, but it performs better in terms of detected FPs. Two factors can influence this result: the number of used components (60 vs. 40) and the type of examples considered for computing $\Sigma$. To this aim, we repeated the experiment considering all the examples for computing $\Sigma$ and selecting the classifier with the same number of components as in row 4. Row 5 of table 1 shows its performances measured on the entire test set. Comparing rows 4 and 5 of table 1, we note that, the number of used components being equal, the classifier trained on features obtained considering only positive examples for computing $\Sigma$ performs better than the other one, especially in terms of FP rate when the football is partly occluded or totally absent. Finally, its performances are comparable with those of the reference classifier (compare rows 4 and 1 in table 1).
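The selection procedure just described can be sketched as a grid search: for each number of components N and each value of C, train an SVM on the PCA features, compute its ROC curve on the validation set, and retain the smallest N (and, performances being equal, the smallest C) whose curve stays close to that of the reference classifier. The grids, the similarity tolerance and the synthetic data below are illustrative assumptions, not the settings of the paper.

import numpy as np
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.metrics import roc_curve

rng = np.random.default_rng(4)

# Synthetic stand-ins for the training and validation sets.
X_tr = rng.normal(size=(600, 172)); y_tr = np.where(np.arange(600) < 200, 1, -1)
X_tr[y_tr == 1] += 0.4
X_val = rng.normal(size=(400, 172)); y_val = np.where(np.arange(400) < 100, 1, -1)
X_val[y_val == 1] += 0.4

def low_fp_roc(clf, X, y, fp_max=0.05, n=50):
    # TP rate sampled on a common grid of FP rates (the low-FP region).
    fpr, tpr, _ = roc_curve(y, clf.decision_function(X))
    return np.interp(np.linspace(0.0, fp_max, n), fpr, tpr)

reference = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1.0).fit(X_tr, y_tr)
ref_curve = low_fp_roc(reference, X_val, y_val)

selected = None
for N in (10, 20, 40, 60, 80):                   # candidate numbers of components
    pca = PCA(n_components=N).fit(X_tr)          # here: PCA from all the examples
    Z_tr, Z_val = pca.transform(X_tr), pca.transform(X_val)
    for C in (0.01, 0.1, 1.0, 10.0):             # candidate regularization values
        clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=C).fit(Z_tr, y_tr)
        if np.max(ref_curve - low_fp_roc(clf, Z_val, y_val)) <= 0.02:
            selected = (N, C, clf)               # ROC close enough to the reference
            break
    if selected:
        break

print("selected (N, C):", selected[:2] if selected else None)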
In SVM, when the feature space induced by the kernel is finite dimensional, it is possible to explicitly represent the classifier as a hyperplane in that space. For an input space of $n$ components, with $n$ small as in our case, the feature space of the degree-two polynomial kernel has dimension $(n+1)(n+2)/2$, the number of monomials of degree less than or equal to 2. In this space, the classification of a new pattern involves only the evaluation of the decision function $f(x) = w \cdot \phi(x) + b$, where $\phi$ is the explicit mapping into the feature space and $w$, the normal to the separating hyperplane, is computed once from the support vectors. So, by mapping the input space explicitly into the feature space, the time required for exhaustively scanning an image is drastically reduced, becoming 60 times faster than with the reference classifier.
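A sketch of the explicit mapping: for the degree-2 polynomial kernel k(x, z) = (x . z + 1)^2 the feature map can be written out, the trained SVM collapsed into a single weight vector w and bias b, and a new pattern then classified with one dot product. The kernel parameters and data below are placeholders; the point is that the collapsed decision value coincides with the usual kernel expansion.

import numpy as np
from sklearn.svm import SVC

def phi(x):
    """Explicit feature map of k(x, z) = (x . z + 1)^2: constant, linear and
    quadratic monomials, weighted so that phi(x) . phi(z) = k(x, z)."""
    x = np.asarray(x, dtype=float)
    cross = np.sqrt(2.0) * np.outer(x, x)[np.triu_indices(len(x), k=1)]
    return np.concatenate(([1.0], np.sqrt(2.0) * x, x * x, cross))

def collapse(svm):
    """Turn a trained degree-2 polynomial SVC into (w, b), so that the
    decision value of x is simply w . phi(x) + b."""
    coef = svm.dual_coef_.ravel()                # alpha_i * y_i
    w = sum(c * phi(s) for c, s in zip(coef, svm.support_vectors_))
    return w, svm.intercept_[0]

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 40)); y = np.where(np.arange(300) < 100, 1, -1)
X[y == 1] += 0.5
svm = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1.0).fit(X, y)

w, b = collapse(svm)
x_new = X[0]
fast = float(w @ phi(x_new) + b)                      # one dot product
slow = float(svm.decision_function(x_new[None])[0])   # kernel expansion
print(round(fast, 6), round(slow, 6))                 # the two values agree

Since w has as many components as the explicit feature space, the cost per patch no longer depends on the number of support vectors, which is where the reported speed-up comes from.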
Figure 1. Image patterns of football.
Figure 2. Images used for testing.
Figure 3. ROC curves (TP rate versus FP rate) of the selected classifier, trained on PCA features computed from all the examples with scaling (dashed line), and of the reference classifier (solid line), measured on the entire test set.
6. Conclusions

In this paper the reduction of the run-time computational complexity and the parameter selection of an SVM for classification have been addressed in the general framework of object detection in images. The results refer to the particular application of detecting goals during a football match. Future work will focus on the selection and computation of the relevant features for the problem at hand.
References
[1] N. Ancona. Complexity reduction and parameter selection in support vector machines. Technical Report R.I.-IESI/CNR-Nr.03/2001, Istituto Elaborazioni Segnali ed Immagini - Consiglio Nazionale delle Ricerche, Bari, Italy, 2001. Available at http://www.iesi.ba.cnr.it/users/ancona/.
[2] N. Ancona, G. Cicirelli, A. Branca, and A. Distante. Goal detection in football by using support vector machines for classification. In Proceedings of the International Joint Conference on Neural Networks, Washington, DC, July 15-19, 2001.
[3] A. Blum and P. Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, 97:245-271, 1997.
[4] A. P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30:1145-1159, 1997.
[5] G. Cicirelli, N. Ancona, G. Attolico, and A. Distante. Object positioning by projective properties. In Proceedings of the IASTED International Conference on Applied Simulation and Modeling, pages 52-57, September 2001.
[6] T. Evgeniou, M. Pontil, and T. Poggio. A unified framework for regularization networks and support vector machines. Technical Report A.I. Memo No. 1654, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 1999.
[7] A. Haas, R. Maierhofer, and R. Sendlhofer. Goalwatcher: a system for the automatic detection of the ball position in soccer games. In Proceedings of SPIE Vol. 4567, Machine Vision and Three-Dimensional Imaging Systems for Inspection and Metrology II, Boston, Massachusetts, 28 October - 2 November 2001.
[8] B. Heisele, T. Poggio, and M. Pontil. Face detection in still gray images. Technical Report A.I. Memo No. 1687, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 2000.
[9] I. Reid and A. Zisserman. Goal-directed video metrology. In 4th European Conference on Computer Vision '96, Cambridge, April 1996.
[10] K. Sung and T. Poggio. Example-based learning for view-based human face detection. Technical Report A.I. Memo No. 1521, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 1994.
[11] V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, 1995.
[12] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik. Feature selection for SVMs. In Advances in Neural Information Processing Systems 13, pages 668-674, 2000.