Jet Based Feature Classification

Martin Lillholm and Kim Steenstrup Pedersen
Image Analysis Group, IT University of Copenhagen, Copenhagen, Denmark
{grumse,kimstp}@itu.dk
Abstract

In this paper, we investigate to which extent the "raw" mapping of Taylor series coefficients into jet-space can be used as a "language" for describing local image structure in terms of geometrical image features. Based on empirical data from the van Hateren database, we discuss modelling of probability densities for different feature types, calculate feature posterior maps, and finally perform classification or simultaneous feature detection in a Bayesian framework. We introduce the Brownian image model as a generic background class and extend with empirically estimated densities for edges and blobs. We give examples of simultaneous feature detection across scale.
1. Introduction

Geometric image features play an essential role in computer vision as the foundation of many algorithms for solving a wide range of vision problems. The early work by Marr [10] emphasises the importance of image features in computer vision. In this paper, we argue for a soft multi-scale image feature detection in which each image point is assigned a probability of being of one of several feature types. Such a soft feature detection would be beneficial in probabilistic higher level processing of images. Soft feature detection has previously been studied by, among others, Konishi et al. [6] and Laptev et al. [7]. Laptev et al. have applied a feature likelihood map to the problem of tracking.

Classical image feature detection, such as edge, blob, and ridge detection, is based on non-linear functions of linear filter responses. We argue that classical feature detection can be substituted with a simple soft classification with linear filter responses as input. The soft classification we propose here is based on a Bayesian rationale. By using a classifier, we also gain the capability of simultaneous feature classification into several image feature types within one framework.
We will use the coefficients of the truncated Taylor series, known as the k-jet, calculated using scale space image derivatives, as the description of local image structure. To avoid confusion, we will use the term feature to denote local image structure and jet to denote the feature vector in the pattern recognition sense. As a proof of concept, we will show results on detection of edges and blobs. We introduce the Brownian image model as an analytical model of generic image background. Hence image background is defined as regions with Brownian behaviour, i.e. spatially correlated Gaussian noise.

The work by Konishi et al. [6] is related to ours, but differs in that they only focus on edges and use log-likelihood ratios of on/off edge probabilities. Furthermore, they use two filter banks different from the scale space jet. In our opinion, the scale space jet is a more natural language when studying local image structure.
2. Jet Representation

Through the notion of generalised functions [4], we can regularise the inherently ill-posed problem of calculating derivatives of discrete images I. Using the Gaussian kernel as test function gives rise to Gaussian scale space theory [5], where the scale space image L at scale σ is defined as the convolution with the Gaussian kernel G of standard deviation σ:

L(x, y; σ) = G(x, y; σ) ∗ I(x, y),   σ ∈ S ⊆ R^+.

Furthermore, we get scale normalised (see Sec. 4) derivatives [3] of L by convolution with the corresponding scale normalised derivative of the Gaussian kernel:

L_{x^n y^m}(x, y; σ) = σ^{n+m} ∂_{x^n y^m} G(x, y; σ) ∗ I(x, y).  (1)
This framework enables us to calculate the coefficients of the truncated Taylor series up to any finite order k for any given point ~x of L. As a characterisation of the local image structure in ~x, we use the k-jet [3], where the k-jet is a functional mapping of scale-space images L(~x; σ) into R^N, j^k : C^∞(R² × R^+) → R^N, with N = (k + 2)(k + 1)/2. As local structure should be invariant wrt. luminance, we omit the zeroth order term and represent local structure as a point in an (N − 1)-dimensional feature space, the jet-space, resulting in an (N − 1)-dimensional feature vector of scale normalised derivatives. We will use the shorthand notation j_σ(~x) = j^k[L](~x; σ), excluding the zeroth order term.
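As a minimal illustration (not part of the original paper), the scale-normalised jet of Eq. (1) can be sketched in Python; scipy's Gaussian derivative filters are assumed available, and the function name is ours:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_normalised_jet(image, sigma, order=3):
    """Scale-normalised k-jet of Eq. (1), zeroth order term omitted.

    Returns an (H, W, N-1) array with N = (k+2)(k+1)/2; for the
    3-jet this gives 9 channels.
    """
    channels = []
    for total in range(1, order + 1):        # derivative orders 1..k
        for nx in range(total, -1, -1):      # split order between x and y
            ny = total - nx
            # gaussian_filter's `order` is per axis: (rows=y, cols=x)
            d = gaussian_filter(image, sigma, order=(ny, nx))
            channels.append(sigma ** total * d)  # sigma^(n+m) normalisation
    return np.stack(channels, axis=-1)
```

Each pixel's 9-vector is then a point in jet-space.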
3. Soft Feature Classification

The main thesis of the paper is that geometrical image features such as edges, ridges, and blobs are sufficiently separated, in terms of their jet-space representation, that traditional feature detection can be performed using standard classification techniques. Based on the Bayesian rationale, we argue for a classification based feature detection. Doing so allows us to do simultaneous detection of several feature types.

Let us assume that we want to detect a set of features f ∈ F. We will represent the local image structure by the scale normalised jet j_σ(~x) as described in Sec. 2. The conditional probability density that the local structure specified by j_σ(~x) is of feature type f at scale σ will be denoted p(f, σ | j_σ(~x)) and will be called the posterior probability density. Using Bayes' theorem, we can write the posterior as

p(f, σ | j_σ(~x)) = p(j_σ(~x) | f, σ) p(f, σ) / p(j_σ(~x)).  (2)

Here p(j_σ(~x) | f, σ) is the likelihood of the image structure and p(f, σ) is the prior distribution of the individual features f at scale σ. Assuming that we can specify the terms on the right hand side of Eq. (2), we locally have a probability distribution p(f, σ | j_σ(~x)) on feature types f and scales σ. We can now detect features and their inner scales by maximising the posterior, (f̂, σ̂) = arg max_{(f,σ) ∈ F×S} p(f, σ | j_σ(~x)). In the discrete setting, the classification of features and selection of scale boils down to classification of the elements of the discrete set (f, σ) ∈ F × S_0, where S_0 is a discrete (countable) subset of scales S.
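The discrete maximisation over F × S_0 can be sketched as follows (an illustrative numpy sketch, assuming the posterior maps have already been computed; the function name and array layout are ours):

```python
import numpy as np

def map_feature_and_scale(posteriors):
    """Per-pixel MAP estimate over the discrete set F x S_0.

    posteriors: array of shape (F, S, H, W) holding p(f, sigma | jet)
    for each feature f, scale index, and pixel.
    Returns per-pixel arrays of the winning feature and scale indices.
    """
    F, S, H, W = posteriors.shape
    flat = posteriors.reshape(F * S, H, W)
    idx = flat.argmax(axis=0)          # joint argmax over (f, sigma)
    return idx // S, idx % S
```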
4. Modelling Image Features

Having defined the framework, we now have to either specify the likelihood terms for the individual features as well as the feature prior, or directly specify the posterior. In both cases we are faced with the choice of either modelling the densities and using parametric estimation techniques, or using purely non-parametric density estimation. To demonstrate the method, we will do experiments based on background, edge, and blob feature types. We introduce an analytical background model by specifying a background likelihood term and extend with samples that implicitly model known edge and blob behaviour. In this paper, we choose to estimate the feature posteriors based on a k-nearest-neighbour (KNN) estimate (see e.g. [1]). We can estimate the joint probability density
p(f, σ, j_σ(~x)) by using the KNN rule. Assume we have a labelled set of jets, each labelled according to a feature f and a scale σ. The density at j_σ(~x) can be estimated by finding the k nearest neighbours and counting the number k_{f,σ} of neighbours labelled as feature f and at scale σ. The posterior density p(f, σ | j_σ(~x)) can then be written in terms of the joint density p(f, σ, j_σ(~x)) as

p(f, σ | j_σ(~x)) = p(f, σ, j_σ(~x)) / Σ_{f′∈F} Σ_{σ′∈S_0} p(f′, σ′, j_σ(~x)) = k_{f,σ} / k.  (3)

We use the Euclidean metric in jet space in the KNN estimate. This is justified by the fact that we use scale normalised derivatives; hence all derivatives are dimensionless and comparable.
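Eq. (3) amounts to counting labels among the k nearest training jets; a small brute-force sketch under the Euclidean metric (function and variable names are ours):

```python
import numpy as np

def knn_posterior(query_jet, train_jets, train_labels, k=25):
    """Posterior of Eq. (3): p(f, sigma | jet) ~ k_{f,sigma} / k,
    where each label jointly encodes a (feature, scale) pair.

    The Euclidean metric is justified because the jets are scale
    normalised, so all components are dimensionless and comparable.
    """
    dists = np.linalg.norm(train_jets - query_jet, axis=1)
    nearest = train_labels[np.argsort(dists)[:k]]     # k nearest labels
    labels, counts = np.unique(nearest, return_counts=True)
    return dict(zip(labels.tolist(), (counts / k).tolist()))
```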
4.1. The Background Model

Early studies [2] of natural image statistics focused on the correlation between pixels and showed that natural images have a power law behaviour of the power spectrum, indicating a self similar, and in some situations scale-invariant, behaviour of the second order statistics. Classes of natural images with a wide variation of motifs can be shown to have scale invariant second order statistics, which can be modelled by the Brownian image model (see e.g. [12]). We introduce the Brownian image model as a model of generic image background. The purpose of this is twofold: i) to investigate the Brownian image model's ability to model "image background" and ii) to model the background as an explicit class instead of as everything non-feature-like.

Mapping the Brownian model into jet space results in a scale invariant Gaussian distribution with an anisotropic covariance structure centred at the origin. This results in flat regions being more likely as background than other regions. A study of the Brownian model in scale space can be found in [12]. The likelihood for the background feature class f = B can therefore be written as a Gaussian density

p(j_σ(~x) | B, σ) = (1/Z) exp(−(1/2) j_σ(~x)^T Σ^{-1} j_σ(~x)),  (4)

where Σ is the covariance matrix, |Σ| denotes its determinant, and Z = (2π)^{(N−1)/2} |Σ|^{1/2} is the Gaussian normalisation constant. An analytic expression for the covariance Σ can be found in [12].
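Given the analytic Brownian covariance of [12], evaluating Eq. (4) is a standard zero-mean Gaussian computation; a sketch in log space for numerical stability (the covariance is assumed given, and the interface is ours):

```python
import numpy as np

def background_log_likelihood(jets, cov):
    """Log of Eq. (4) for each row of `jets` ((n, d) array, d = N - 1),
    with `cov` the (d, d) Brownian covariance in jet space."""
    d = cov.shape[0]
    precision = np.linalg.inv(cov)
    # log Z = (d/2) log(2*pi) + (1/2) log|cov|
    log_z = 0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(cov)[1])
    # Mahalanobis term j^T cov^{-1} j, batched over rows
    maha = np.einsum('ni,ij,nj->n', jets, precision, jets)
    return -0.5 * maha - log_z
```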
4.2. Modelling Edges and Blobs

As described in the beginning of this section, we have chosen to model edges and blobs implicitly using samples from known edge and blob models/detectors. Specifically, we detect edges and blobs in a large image database and use the k-jet at the detected points as training set for the KNN-classifier. We use Lindeberg's scale space edge and
blob detectors [8, 9], where edges are defined as maxima, spatially and across scale, of the scale normalised gradient magnitude, and blobs as maxima, spatially and across scale, of the scale normalised Laplacian. As both these detectors are multi-scale by design, the resulting feature samples span several scales, in accordance with the classification framework in Sec. 3.
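For concreteness, the blob half of this sampling step can be sketched as maxima, over space and scale, of the magnitude of the scale-normalised Laplacian. This is a simplified stand-in for Lindeberg's detector [9], assuming scipy; the threshold value is ours:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace, maximum_filter

def laplacian_blobs(image, sigmas, threshold=0.1):
    """Blobs as local maxima, spatially and across scale, of
    |sigma^2 * Laplacian(L)|, a simplified Lindeberg-style detector.

    Returns a list of (sigma, y, x) tuples."""
    stack = np.stack([s ** 2 * np.abs(gaussian_laplace(image, s))
                      for s in sigmas])
    # keep points equal to the max of their 3x3x3 (scale, y, x) neighbourhood
    peaks = (stack == maximum_filter(stack, size=3)) & (stack > threshold)
    s_idx, ys, xs = np.nonzero(peaks)
    return [(sigmas[i], y, x) for i, y, x in zip(s_idx, ys, xs)]
```

Training jets would then be sampled at the detected (σ, y, x) points.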
5. Experimental Results

The KNN-classifier is trained using a subset of the van Hateren database [13] of natural images (excluding images suffering from saturation artifacts, motion blur, or major focus problems), resulting in 1500 images. For each of these, edges and blobs are detected in the central 512 × 512 region, discarding features that suffer from boundary effects. For each scale σ, we estimate the prior as the ratio of the number of detected edges and blobs to the total number of pixels (512^2 × 1500). The actual numbers of edge sample points, blob sample points, and background points sampled from the analytical likelihood function (Eq. (4)) reflect this scale-dependent prior; on average, the combined edge, blob, and background sample contains 10^6 points. Finally, as our feature vector we use the scale-normalised 3-jet, a suitable compromise between descriptive power and computational effort.

In this section, we give a few examples of soft and hard classification of images into edge-, blob-, and background-like regions using the described setup. The top row in Fig. 1 contains an artificial image with both isotropic and anisotropic Gaussian blobs and a blurred step edge. The second image is an example of an everyday scene with many different kinds of structure. The second, third, and fourth rows contain the background posterior, blob posterior, and edge posterior respectively. The final row contains the resulting hard classifications, where dark gray, light gray, and gray indicate background-, blob-, and edge-like regions respectively. For the artificial image, we see that both the edge and blob posteriors behave as one would expect around the edge and the centre of the blobs. The same holds for the background model, which captures the flat regions as expected; corresponding observations hold for the hard classification, where both the edge and the blob centres are classified as one would expect.
The very broad edge band around the centre of the blobs is, however, somewhat surprising. The presence of edge-like structure is expected, as each Gaussian blob is naturally delineated by an edge, but as all the blobs are quite flat, these "ramps" toward their centres naturally contain more edge-like (first order) structure than typical blob and background points; for lack of a better feature class, they are labelled as edge-like points. Finally, the blob posterior has a ringing effect at the outer periphery of the blobs and a corresponding "echo" at the boundary of the edge region; these are regions with pronounced second order structure and are labelled as blob-like.

Figure 1. Examples of two images and their feature posteriors calculated using five scales.

For the second image, similar observations hold but are not as pronounced, primarily because of the higher granularity of the image. The background region seems reasonable; especially the distinction between the two persons, where the woman has coarse scale structure in her top and ridge-like highlights in the calves. An important observation is the connectedness of the edge-like regions in general and specifically around T-junctions.

In Fig. 2, we give an example of using the edge posterior as the basis for calculating an actual edge map. Top left is the original image; top right is the edge posterior map masked with the hard classification, with Lindeberg edges calculated for the original image superimposed in red. Bottom left shows Canny edges for the original image without hysteresis, and bottom right an edge map calculated from the edge posterior using the watershed transform. This example is calculated for one scale only, to facilitate comparison, and does not demonstrate the full potential of our method (or of the other two, for that matter), but again the main point is that, as for the posterior, the edge map is in general well connected, specifically around T-junctions. Finally, the small dot-like artifacts in the bottom right edge map are due to suboptimal watershed calculations and not to the quality of the posterior. In a similar, but simpler, fashion the blob posterior can be used to identify actual blob centres as spatial maxima, corresponding to the output of a traditional blob detector.
Figure 2. An example of extracting an edge map from the edge posterior compared to Canny and Lindeberg edges. Calculated at a single scale.

6. Discussion and Summary

We have presented a simple multi-scale scheme for classifying image points into one of several feature types including a generic background class, using a KNN-classifier, implicit feature models, and training data from a database of natural images. Examples have been given using edges and blobs, but the scheme is in principle extendible to any other feature type, such as ridges and corners. Furthermore, actual feature maps have been calculated from the posterior maps.

Obviously, a more formal evaluation similar to that presented in e.g. [6] must be carried out to verify the quality of the method. Another interesting development would be to use human "ground truth" training sets, such as the segmentation database presented in [11], in place of the models. Conversely, there is much space and time potentially saved (and insight gained) by developing parametric feature models in jet space to take the place of large feature samples. Finally, the current scale selection scheme is rather naive and could be extended to allow for more than one classification over scale, to enable detection of e.g. two concentric blobs of different size, very much along the line of thought of Lindeberg [9].

Acknowledgements

This work is part of the DSSCV project sponsored by the IST Programme of the European Union (IST-2001-35443).

References
[1] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience, 2nd edition, 2001.
[2] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Optic. Soc. of Am., 4(12):2379–2394, 1987.
[3] L. Florack. Image Structure. Kluwer, Dordrecht, 1997.
[4] G. Friedlander and M. Joshi. Introduction to The Theory of Distributions. Cambridge University Press, 1998.
[5] J. J. Koenderink. The structure of images. Biological Cybernetics, 50:363–370, 1984.
[6] S. Konishi, A. L. Yuille, J. M. Coughlan, and S. C. Zhu. Statistical edge detection: Learning and evaluating edge cues. IEEE T-PAMI, 25(1):57–74, 2003.
[7] I. Laptev and T. Lindeberg. A distance measure and a feature likelihood map concept for scale-invariant model matching. IJCV, 52(2/3):97–120, 2003.
[8] T. Lindeberg. Edge detection and ridge detection with automatic scale selection. IJCV, 30(2):117–154, 1998.
[9] T. Lindeberg. Feature detection with automatic scale selection. IJCV, 30(2):79–116, 1998.
[10] D. Marr. Vision. W. H. Freeman, New York, 1982.
[11] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV01, pages II: 416–423, 2001.
[12] K. S. Pedersen. Properties of Brownian image models in scale-space. In Proceedings of the 4th Scale-Space Conference, LNCS 2695, pages 281–296, Isle of Skye, Scotland, June 2003.
[13] J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. R. Soc. Lond. Series B, 265:359–366, 1998.