Applying Feature Selection Techniques for Visual Dictionary Creation

Applying Feature Selection Techniques for Visual Dictionary Creation in Object Classification

I.M. Creusen (1), R.G.J. Wijnhoven (1,2), and P.H.N. de With (1,3)
1 Video Coding and Architecture group, Eindhoven University of Technology, The Netherlands
2 ViNotion B.V., Eindhoven, The Netherlands
3 Cyclomedia Technology B.V., Waardenburg, The Netherlands

Abstract— This paper introduces improved methods for visual dictionary creation in an object classification system. In literature, the visual dictionary is often created from a large candidate set of features by random selection or by a clustering algorithm. We apply techniques from the feature selection literature to create a more optimal visual dictionary, and contribute a novel feature selection algorithm. As a second step, feature extraction techniques for creating the candidate set are investigated. Subsequently, the size of the candidate set is varied. It was found that exploiting feature selection techniques gives a clear improvement of 2-5% in classification rate, at no additional computational cost during normal system operation. The proposed algorithm, called extremal optimization, outperforms state-of-the-art algorithms. The paper also discloses results on candidate set creation using interest point operators. As a bonus, the evaluated feature selection techniques are applicable to any problem that uses a dictionary of features, as typically applied in the object recognition domain.

Keywords: object recognition, feature evaluation and selection

1. Introduction

With the ever increasing number of installed cameras, video surveillance personnel can be effectively assisted by extracting useful information from each video stream. State-of-the-art camera systems for video surveillance detect and track key objects in the monitored scene. Towards full scene understanding, recognition of these tracked objects is key. Given a set of object classes and an image containing one object, the task of object categorization is to determine the correct object class label of the visualized object. The operation of object categorization systems is divided into two phases: training and testing. During training, the system learns from the training set, consisting of a number of example images for each object class. The performance of the algorithm is determined as the percentage of correctly labeled objects from the test set, averaged over all object classes. This paper concentrates on feature selection in object categorization and shows that this concept contributes significantly to an improved classification score.

Classification of objects within images has been studied in earlier work. Early work by Agarwal et al. [1] uses a visual dictionary for car detection. The bag-of-words model for object classification was recently pioneered by Csurka et al. in [2], and has received much attention [3], [4]. These models compare small object parts of the input image to a set of known object parts, called the visual dictionary. A common element in this work is the method for constructing the dictionary, typically done by applying a clustering algorithm to features extracted from the training set. A biologically plausible object recognition framework (HMAX) was introduced by Riesenhuber and Poggio [5] and recently optimized by Serre et al. [6]. Moreno et al. [7] have shown that HMAX performs slightly better in a categorization task than SIFT [8]. An interesting aspect of the HMAX system is that its visual dictionary is created by a different technique: features are extracted from random locations in images of natural scenery. Neither the random nor the clustering technique for constructing the dictionary optimizes the dictionary specifically for the classification task.

We have used the HMAX object classification system proposed by Serre et al. [6] as a starting point, which was explored by Wijnhoven and De With in [9]. The input image is filtered using Gabor filters at different scales and orientations, and the result is subsampled using a local MAX-operator. For each dictionary feature, the best match in the input image is stored in the feature vector. The classifier uses this vector to learn and determine the true object class. The computational complexity of labeling an unknown object depends linearly on the number of visual dictionary words. For an embedded camera implementation, the available computation power is strictly limited. In order to reduce this computational cost, we have optimized the system by creating a more discriminative dictionary with fewer visual words by applying feature selection techniques.
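The matching step described above (for each dictionary feature, keep its single best match anywhere in the input) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: a single feature map and a Gaussian of the Euclidean distance stand in for the multi-scale Gabor/C1 maps of HMAX, and all names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def match_dictionary(feature_map, dictionary):
    """For each dictionary patch, store the best (maximum) response
    found anywhere in the input feature map.  Sketch only: real HMAX
    matches across several scale bands; here one map and a Gaussian
    of the Euclidean distance are used."""
    responses = []
    for patch in dictionary:
        ph, pw = patch.shape
        # all patch-sized windows of the feature map
        windows = sliding_window_view(feature_map, (ph, pw))
        # squared Euclidean distance of every window to the patch
        d2 = ((windows - patch) ** 2).sum(axis=(-1, -2))
        # best match = strongest (closest) response over all positions
        responses.append(np.exp(-d2.min()))
    return np.array(responses)

rng = np.random.default_rng(0)
fmap = rng.random((32, 32))                 # stand-in for a Gabor/C1 map
dictionary = [rng.random((4, 4)) for _ in range(10)]
vec = match_dictionary(fmap, dictionary)    # one value per visual word
```

The resulting vector has one entry per visual word, so classification cost grows linearly with the dictionary size, which is exactly what the feature selection below aims to reduce.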
Starting with a large candidate set of features, which are randomly extracted from images in the training set, the selection algorithms select the most distinctive features. For categorization, we compare the results of three common feature selection techniques. The investigated techniques outperform the random and clustering methods. Moreover, we adopt a new algorithm from the optimization literature and exploit it successfully for dictionary creation. The visual dictionary is created during the training phase. This is an offline process without real-time constraints. Therefore, the computational complexity of the dictionary creation process is not taken into account.

Two additional aspects of visual dictionary creation have been explored for creation of the candidate set from which features are selected. First, we compare interest point operators with random selection for creating the candidate set. Second, we investigate the effect of creating candidate sets of different sizes.

The paper is organized as follows. Related work with respect to feature selection is presented in Section 2. Section 3 describes the object classification system in more detail. Section 4 explains the feature selection techniques used for visual dictionary creation, and introduces a novel feature selection algorithm. Several experiments to optimize the visual dictionary using feature selection techniques are performed in Section 5. A significant performance gain over popular methods from literature is demonstrated. Conclusions can be found in Section 6.

2. Related Work

Blum and Langley [10] give an introduction to the problem of feature selection as selecting a sparse subset of relevant features from a large candidate set. Several publications have proposed feature selection methods within the field of object recognition. Since the work of Viola and Jones [11], boosting methods have gained popularity. In their proposal, a small subset of visual features is selected from a larger set using a boosting algorithm. Fleuret [12] proposes the Conditional Mutual Information Maximization criterion in a forward-selection algorithm and shows that it outperforms AdaBoost and SVM-based feature selection. Dorko and Schmid [13] compare different feature selection techniques based on likelihood ratio and mutual information, using both the Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers. Mutch and Lowe [14] propose to extend the HMAX algorithm from [6] with SVM-based feature selection as proposed by Weston et al. [15] and show a significant performance gain for marginal loss in classification power. Sun et al. [16] compare different feature subset selection techniques for the purpose of object detection, using features extracted with PCA. This approach compares manually selected features with Sequential Backward Floating Selection (SBFS) and selection based on a genetic algorithm. It is shown that SBFS outperforms manual selection. However, the genetic algorithm performs best. Jurie and Triggs [17] investigate the visual dictionary creation process for both object detection and categorization. They show that k-means clustering is not suitable for visual dictionary creation and propose a new clustering method that significantly outperforms k-means in a detection context. In addition, the following feature selection methods are compared: Odds-Ratio, Mutual Information and an SVM-based method. In most of their experiments the SVM-based method outperforms the others.

This paper is close to the work of Sun et al. and of Jurie and Triggs. However, whereas Sun et al. evaluate feature selection for detection, we apply selection for multi-class classification. Compared to Jurie and Triggs, we use different feature selection methods to solve a similar problem. In contrast to their work, we focus more on search-space exploration using iterative methods which continuously adjust the set of selected features. Their work focuses on the feature-ranking process (see Section 4), and is therefore complementary to ours. Furthermore, we propose a new algorithm, called Extremal Optimization [18], that outperforms state-of-the-art algorithms in our experiments. For comparison, we evaluate random selection and cluster-size selection against the following selection techniques: Backward Selection, Genetic Selection [19] and Extremal Optimization [18].

3. System Training: Dictionary Creation

The visual dictionary creation process is visualized in Figure 1. First, features are extracted from the training set, around selected points. The total set of extracted features defines the candidate set. This candidate set is the input for the feature selection algorithm, which selects the features for the visual dictionary. As a first step, points are selected by an Interest Point Operator (IPO), at random, or on a uniform grid. For object classification, the resulting features should distinguish between the object classes. Since IPOs were originally designed for a different goal, we investigate the performance of state-of-the-art IPOs in this specific application.

Fig. 1: Creating the visual dictionary. (Pipeline: train/test images → point selection → local description → feature selection → visual dictionary → feature matching → classifier → object class.)

4. Feature Selection

In order to optimize the visual dictionary of the object classification system, feature selection techniques are applied. The aim is to select the most appropriate features for the classification task and to eliminate irrelevant, noisy, or redundant features. When feature selection performs well, the task of the classifier becomes simpler, so that its performance will increase. The process of feature selection, within the scope of visual dictionary creation, is shown in Figure 2. Initially, the candidate set of 10k or more features is created by extracting features from the training set. Feature selection is an iterative process: a subset of features is selected by the search algorithm, and the quality of this subset is evaluated by the evaluation function, until a stopping criterion is met. Finally, the best performing subset is stored as the visual dictionary.
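The iterative loop of Figure 2 can be sketched generically as follows. This is an illustrative skeleton, not the paper's implementation: random search stands in for the concrete search algorithms of Sections 4.1-4.3, `evaluate` for the evaluation function, and a fixed iteration budget for the stopping criterion.

```python
import random

def select_features(candidates, evaluate, n_select, n_iter=1000, seed=0):
    """Generic selection loop (hypothetical sketch): a search algorithm
    proposes subsets, an evaluation function scores them, and a stopping
    criterion (here a fixed iteration budget) ends the search.  The best
    subset seen is returned as the visual dictionary."""
    rng = random.Random(seed)
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):                                    # stopping criterion
        subset = rng.sample(range(len(candidates)), n_select)  # search step
        score = evaluate(subset)                               # evaluation function
        if score > best_score:
            best_subset, best_score = subset, score
    return [candidates[i] for i in best_subset]

# toy usage: this "evaluate" favours subsets of low-index candidates
candidates = list(range(100))
dictionary = select_features(candidates, evaluate=lambda s: -sum(s), n_select=5)
```

The search algorithms discussed below differ only in how the next subset is proposed; the evaluation and stopping stages stay the same.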

Fig. 2: Overview of the feature selection process. (Train images → create candidate set → search algorithm, evaluation function and stopping criterion loop → visual dictionary.)

Selecting the appropriate features is not a straightforward process and is extensively discussed in the corresponding literature [20]. If we consider features individually, it is likely that the selected features do not complement each other well. Hence, they should be considered as a group. Features that are poor for classification when considered individually can provide a significant performance gain when used in combination with others. In general, selecting a smaller subset from a candidate set of size N can be done in 2^N different ways. Therefore, considering all possible subsets is computationally infeasible. However, we can use a search algorithm to evaluate a limited number of subsets. In this paper, the existing search algorithms backward selection and the genetic algorithm [19] are used. Within this framework, we contribute a new algorithm, called extremal optimization [18].

The suitability of a group of features as a visual dictionary is evaluated with an evaluation function. There are several ways to do this. In this paper, the wrapper method is used, which trains and tests a classification algorithm to perform the evaluation. K-fold cross-validation is applied with K=10 [21]. This method is accurate, but can be computationally expensive. Alternatively, filter or embedded methods can be used, but these are not considered here.

Some search algorithms make use of a feature ranking algorithm to determine the relative importance of the feature dimensions. These algorithms assign a weight to each feature to determine its usefulness compared to the others. In this paper, the ReliefF algorithm by Kononenko [22] is used for this purpose. This algorithm uses conditional dependencies between features in the ranking process.

A different approach for visual dictionary creation is to use a clustering algorithm: similar features are grouped together and represented by their average.
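The wrapper evaluation described above, K-fold cross-validation around a classifier, can be sketched as follows. A 1-nearest-neighbour classifier is implemented inline for self-containment; function names and the synthetic data are illustrative, not the paper's setup.

```python
import numpy as np

def wrapper_score(X, y, subset, k_folds=10, seed=0):
    """Wrapper evaluation of a feature subset: train and test a
    1-nearest-neighbour classifier with K-fold cross-validation
    (K=10 in the paper) and return the mean accuracy.  Minimal
    sketch; any classifier could be plugged in."""
    Xs = X[:, subset]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k_folds)
    accs = []
    for f in range(k_folds):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(k_folds) if g != f])
        # 1-NN: predict the label of the closest training sample
        d = ((Xs[test, None, :] - Xs[None, train, :]) ** 2).sum(-1)
        pred = y[train][d.argmin(axis=1)]
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))

# synthetic check: only feature 0 separates the two classes
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = np.array([0] * 50 + [1] * 50)
X[y == 1, 0] += 4.0
good = wrapper_score(X, y, [0])      # informative subset
noisy = wrapper_score(X, y, [3, 4])  # pure-noise subset
```

A subset containing the informative feature scores markedly higher than a noise-only subset, which is the signal the search algorithms optimize.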
Teynor and Burkhardt [23] compare several clustering algorithms for visual dictionary creation. It is shown that the Modified Basic Sequential Algorithmic Scheme (MBSAS) from [24] performs similarly to other clustering algorithms but has a much lower computational complexity. We compare MBSAS with the other feature selection techniques.
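The MBSAS scheme can be sketched as follows, following the two-pass description in Theodoridis and Koutroumbas [24]. This is a simplified illustration, not the evaluated implementation; the threshold and cluster-limit values are hypothetical.

```python
import numpy as np

def mbsas(features, threshold, max_clusters):
    """MBSAS sketch.  Pass 1: create a new cluster representative
    whenever a sample is farther than `threshold` from all existing
    representatives (up to `max_clusters`).  Pass 2: assign every
    sample to its nearest cluster and recompute the means, which
    then serve as visual words."""
    reps = [features[0].copy()]
    # pass 1: determine cluster representatives
    for x in features[1:]:
        dists = [np.linalg.norm(x - r) for r in reps]
        if min(dists) > threshold and len(reps) < max_clusters:
            reps.append(x.copy())
    # pass 2: assign all samples and recompute the means
    members = [[] for _ in reps]
    for x in features:
        j = int(np.argmin([np.linalg.norm(x - r) for r in reps]))
        members[j].append(x)
    return [np.mean(m, axis=0) for m in members]

# two well-separated blobs collapse into two visual words
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.1, size=(20, 2))
blob_b = rng.normal(10.0, 0.1, size=(20, 2))
words = mbsas(list(np.vstack([blob_a, blob_b])), threshold=5.0, max_clusters=10)
```

The single sequential pass is what gives MBSAS its low complexity compared to iterative schemes such as k-means.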

4.1 Backward selection

Backward selection is a simple iterative heuristic search method. Starting with all features, in each iteration a feature ranking algorithm determines the relative importance of the features, and a number of the lowest-ranking features is discarded. The algorithm keeps iteratively removing features until the desired number of features remains. For feature ranking, we use the ReliefF [22] algorithm.
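The backward-selection loop can be sketched as follows. The paper ranks with ReliefF; in this sketch a simple class-separation score (difference of class means over the pooled standard deviation) stands in as the ranking function, so this is an illustration of the loop, not the exact algorithm.

```python
import numpy as np

def backward_select(X, y, n_keep, n_drop=50, rank=None):
    """Backward selection: start from all features and iteratively
    discard the `n_drop` lowest-ranked ones until `n_keep` remain.
    `rank` is pluggable; the default is a crude two-class separation
    score used only for illustration."""
    if rank is None:
        def rank(Xa, ya, cols):
            m0 = Xa[ya == 0][:, cols].mean(0)
            m1 = Xa[ya == 1][:, cols].mean(0)
            s = Xa[:, cols].std(0) + 1e-12
            return np.abs(m0 - m1) / s
    selected = list(range(X.shape[1]))
    while len(selected) > n_keep:
        scores = rank(X, y, selected)
        drop = min(n_drop, len(selected) - n_keep)
        order = np.argsort(scores)               # ascending: worst first
        for i in sorted(order[:drop], reverse=True):
            del selected[i]                      # delete by position
    return selected

# only features 0 and 1 carry class information; they should survive
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 20))
y = np.array([0] * 60 + [1] * 60)
X[y == 1, 0] += 5.0
X[y == 1, 1] += 5.0
kept = backward_select(X, y, n_keep=2, n_drop=5)
```

Because features are only ever removed, the method is fast but cannot recover a feature discarded early, which is why the iterative methods below explore the search space differently.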

4.2 Genetic algorithms

Genetic algorithms use techniques inspired by evolutionary biology, such as mutation, inheritance and crossover. They were originally introduced by Goldberg in [25], and are currently popular in the feature selection literature [19]. A genetic algorithm uses a pool of potential solutions, referred to as a population; each solution is referred to as an individual. Each individual represents a visual dictionary and is defined by a set of genes, representing the set of features in that dictionary. An initial population of solutions is created at random. A fitness function estimates the classification performance of an individual, and the most fit individuals of the current population are used to create the next population. A fixed number of generations is simulated, and the algorithm keeps track of the best solution encountered. We have used the GAlib implementation (a genetic algorithm package written by Matthew Wall at MIT).
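A genetic search over feature subsets can be sketched as follows. This is an illustrative toy, not GAlib: each individual is a fixed-size set of candidate indices, and the crossover, mutation and selection operators as well as all parameter values are simplified assumptions.

```python
import random

def genetic_select(n_candidates, n_select, fitness,
                   pop_size=20, generations=50, p_mut=0.02, seed=0):
    """Genetic feature selection sketch: individuals are subsets of
    candidate indices (their 'genes').  The fittest half breed via
    one-point crossover; mutation swaps a gene for a random outside
    feature with probability `p_mut`.  The best individual ever seen
    is tracked and returned."""
    rng = random.Random(seed)

    def random_individual():
        return rng.sample(range(n_candidates), n_select)

    def crossover(a, b):
        cut = rng.randrange(1, n_select)
        # concatenate gene slices, de-duplicate preserving order, trim
        return list(dict.fromkeys(a[:cut] + b))[:n_select]

    def mutate(ind):
        out = list(ind)
        for i in range(n_select):
            if rng.random() < p_mut:
                g = rng.randrange(n_candidates)
                if g not in out:
                    out[i] = g
        return out

    pop = [random_individual() for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # fittest half breed
        pop = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
               for _ in range(pop_size)]
        best = max(pop + [best], key=fitness)
    return best

# toy fitness preferring low-index features; the population converges there
best = genetic_select(n_candidates=100, n_select=5,
                      fitness=lambda ind: -sum(ind))
```

In the real system the fitness call is the expensive wrapper evaluation of Section 4, which is why population size and generation count dominate the offline training cost.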

4.3 Extremal optimization

In [18], Boettcher and Percus proposed the extremal optimization algorithm. This algorithm was inspired by self-organized critical models of co-evolution, such as the Bak-Sneppen model [26]. It has been successfully applied to hard optimization problems like the traveling salesman problem. Unlike genetic algorithms, extremal optimization uses only one solution and iteratively modifies this single solution. Another difference with most search/optimization algorithms is its simplicity and the complete absence of tunable parameters. To the knowledge of the authors, the application of this algorithm to the problem of feature selection is novel. Moreover, it has never been applied in object recognition. The extremal optimization feature selection algorithm proceeds in the following steps.
1) Create an initial solution: randomly or using a feature selection method.
2) Determine the fitness of the current solution.
3) Rank the features in the current solution (ReliefF [22]).
4) Replace the worst feature by a random feature (outside the current solution).
5) Repeat the process from Step 2 onwards for a fixed number of iterations.
The algorithm tracks the best solution (evaluation score) encountered.

5. Experimental Results

In this paper, different techniques are compared to optimize the visual dictionary with respect to classification performance. Our system is evaluated using the Caltech-5 dataset (http://www.robots.ox.ac.uk/~vgg/data/data-cats.html) and the video-surveillance dataset from [9]. The Caltech dataset contains single-viewpoint images of the object classes motor, face, plane, leaf and car-back. The video-surveillance dataset contains images of 13 object classes from different viewpoints, extracted from an hour of video of a traffic intersection. The images have been converted to grayscale and scaled to 140 pixels in height while preserving the aspect ratio. Per class, 30 images are used for training and the remainder is used for testing. The final classification scores are averaged over five random train/test set divisions using a nearest-neighbour classifier. The number of feature selection iterations has been empirically set to a level such that additional iterations did not improve the performance further.

Fig. 3: Example images from the Caltech-5 dataset (top) and the video-surveillance dataset (bottom).

5.1 Feature selection for different dictionary sizes

In the first experiment, we compare different feature selection techniques. Dictionaries of different sizes are created, to investigate the trade-off between the number of features and classification performance. A candidate set of approximately 10,000 features is created, extracted at random locations from images of natural scenery. The choice of extracting from natural images is motivated by the HMAX system [6]; especially with a small number of training samples, extraction from natural images has superior performance. In contrast, in the next experiment in Subsection 5.3, features will be extracted from the training set. The following feature selection techniques are used in this experiment: random selection as the baseline, cluster size using the MBSAS algorithm, the genetic algorithm (500 generations of 100 individuals each, two-point crossover, mutation probability 0.02), extremal optimization (3,000 iterations, ReliefF with 30 iterations, k=2), and backward selection (using ReliefF, discarding the worst 50 features after each iteration).

The results of the analysis are shown in Figure 4 and Figure 5. Overall, extremal optimization obtains the best results. The performance of the genetic algorithm is 2-3% lower. The performance of backward selection using ReliefF is only slightly better than random selection. The performance of the commonly used clustering algorithm is below that of random selection, and clustering should therefore not be used for visual dictionary creation in the context of this paper. Repeating the experiment with the SIFT descriptor confirms these conclusions, but those results are omitted here. For any dictionary size, it can be seen that the performance gain of using feature selection is larger than the gain of using 2 (video-surveillance) to 4 (Caltech) times as many random dictionary features. Alternatively, we can conclude that a desired performance level can be obtained using fewer computations by applying feature selection.

Fig. 4: Feature selection performance (Caltech-5). (Classification performance versus the number of selected features, 0-500, for extremal optimization, genetic algorithm, backward selection, random selection and cluster size.)

Fig. 5: Feature selection performance (video-surveillance). (The same comparison on the video-surveillance dataset.)
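The extremal optimization loop used in this experiment (the five steps of Section 4.3) can be sketched as follows. Generic `fitness` and `rank` callables stand in for the wrapper evaluation and ReliefF, so this is a structural illustration, not the evaluated implementation.

```python
import random

def extremal_optimization(n_candidates, n_select, fitness, rank,
                          n_iter=3000, seed=0):
    """Extremal optimization for feature selection: keep a single
    solution, rank its features, replace the worst-ranked one by a
    random outside feature, and track the best solution seen.  Note
    the absence of tunable parameters beyond the iteration budget."""
    rng = random.Random(seed)
    solution = rng.sample(range(n_candidates), n_select)        # step 1
    best, best_fit = list(solution), fitness(solution)          # step 2
    for _ in range(n_iter):                                     # step 5
        scores = rank(solution)                                 # step 3
        worst = min(range(n_select), key=lambda i: scores[i])
        outside = [f for f in range(n_candidates) if f not in solution]
        solution[worst] = rng.choice(outside)                   # step 4
        f = fitness(solution)                                   # step 2
        if f > best_fit:
            best, best_fit = list(solution), f
    return best

# toy setup: fitness and rank both prefer low-index features
best = extremal_optimization(
    n_candidates=50, n_select=5,
    fitness=lambda sol: -sum(sol),
    rank=lambda sol: [-g for g in sol],
    n_iter=2000)
```

Because only the single worst feature is replaced per iteration, the loop ratchets the solution towards strong features while the tracked best never degrades.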

5.2 Size of Candidate Set

The effect of the size of the candidate set on the feature selection performance was tested in a second experiment. The size of the candidate set determines the number of possible visual dictionaries. It is expected that a small candidate set will decrease the classification score because it does not offer sufficient variation. A large candidate set may decrease the performance of feature selection, because finding the optimal dictionary becomes more difficult. We will now evaluate the relation between candidate set size and feature selection performance. The candidate sets contain a varying number of randomly extracted features from images of natural scenery. The feature selection algorithm selects 50 features from each set. The dictionaries are evaluated using the video-surveillance dataset. The results are shown in Figure 6.

Fig. 6: Feature selection for different candidate set sizes. (Classification performance versus candidate set size, 0-20,000 features, for extremal optimization, genetic algorithm, backward selection, random selection and cluster size.)

It can be seen that the performance of most algorithms does not depend significantly on the size of the candidate set. A notable exception is the cluster-size technique, which is widely applied, but performs significantly worse for large candidate sets and has the worst overall performance.

5.3 Dictionary Creation using Interest Points

In the previous experiments, the candidate set was created from randomly extracted features from images of natural scenery. In this experiment, the candidate set will be formed by features extracted around interest points detected in the training set. Both the effect of the source for the candidate set and the effect of the extraction method (interest point operators) will be investigated. A dictionary of 500 features is randomly selected from the candidate set and described with both the HMAX and the SIFT (implementation by R. Hess, http://web.engr.oregonstate.edu/~hess/) descriptors. Random selection is used to eliminate the influence of feature selection. Typically, IPOs are used to select features for the visual dictionary. Since IPOs detect stable image points, it is assumed that these points are characteristic for the object classes. However, there has been no in-depth research to verify this assumption. In this experiment, we will compare different IPOs for candidate set creation. The applied IPO algorithms (implementations by K. Mikolajczyk, http://www.robots.ox.ac.uk/~vgg/research/affine) are Harris-Laplace (HarLap) [27], [28], Hessian-Laplace (HesLap) [28], Maximally Stable Extremal Regions (MSER) [29] and Difference of Gaussians (DoG) [8]. These popular methods are compared with sampling on a uniform grid (UnifSamp) and random sampling (Rand). To compare with the method used in the previous experiment, random selection from images of natural scenery (RandNat) is also considered.

The results for Caltech-5 are shown in Figure 7. It can be seen that using an IPO for dictionary creation does not significantly improve the classification performance over random selection. For both the HMAX and the SIFT descriptor, the differences in performance for the top-4 scores are within 1%, which is within the margin of error. Thus, within the considered system, employing an IPO to generate a visual dictionary is not useful. This conclusion has been confirmed using the video-surveillance dataset. In addition, the conclusion holds when using feature selection (extremal optimization) instead of random selection. Interestingly, random selection from the training set only slightly outperforms random selection from images of natural scenery. This confirms the conclusions by Serre et al. [6] for HMAX, which we generalize to the SIFT descriptor on the evaluated dataset.

Fig. 7: Results of the candidate set creation experiment. (Classification performance per point selection method (RandNat, Random, UnifSamp, HarLap, HesLap, DoG and MSER) for both the HMAX and SIFT descriptors.)

6. Conclusions

We have presented a new algorithmic combination for object classification, which is based on employing feature selection techniques to improve the quality of the visual dictionary. Several existing feature selection methods have been applied to the visual dictionary creation step. Backward selection and a genetic algorithm have been compared with random selection and cluster size. In addition, we have proposed the extremal optimization algorithm, which had not yet been applied for the purpose of feature selection, nor in the field of object recognition. The proposed approach outperforms a genetic algorithm, which is a state-of-the-art feature selection technique. It has the additional advantage that it lacks tunable parameters. Surprisingly, the popular cluster-size algorithm has the lowest performance.

Additionally, we have investigated the effect of the size of the candidate set on the feature selection performance. One would expect that a larger candidate set increases the difficulty of selecting the optimal features. However, the results show that the performance of most algorithms does not depend significantly on the size of the candidate set. A notable exception is the cluster-size technique, which performs significantly worse for large candidate sets and again has the lowest overall performance.

In these experiments, the candidate set was generated by randomly extracting features from images of natural scenery. In an alternative experiment, features were extracted from the training images, from randomly selected points and from points selected by state-of-the-art interest point operators (IPOs). The experiment showed that, for the considered categorization system and the evaluated datasets, the hypothesis that interest point operators make a better pre-selection than random selection is false. Thus, employing an IPO to generate a visual dictionary is not useful in the considered system. In addition, random extraction from the training set only slightly outperforms random extraction from images of natural scenery, for both the HMAX and the SIFT descriptor. The generalization of these results towards different categorization systems and datasets is considered future work.

Summarizing, applying feature selection results in a more efficient visual dictionary. This efficiency gain can be exploited to obtain a better dictionary, or a lower computational complexity for the same dictionary performance.

References

[1] S. Agarwal, A. Awan, and D. Roth, "Learning to detect objects in images via a sparse, part-based representation," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 26, no. 11, pp. 1475-1490, November 2004.
[2] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray, "Visual categorization with bags of keypoints," in Proc. European Conference on Computer Vision (ECCV), May 2004.
[3] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman, "Discovering objects and their location in images," in Proc. IEEE Int. Conf. on Computer Vision (ICCV), vol. 1, October 2005, pp. 370-377.
[4] E. Sudderth, A. Torralba, W. Freeman, and A. Willsky, "Learning hierarchical models of scenes, objects, and parts," in Proc. IEEE Int. Conf. on Computer Vision (ICCV), vol. 2, October 2005, pp. 1331-1338.
[5] M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition in cortex," Nature Neuroscience, vol. 2, no. 11, pp. 1019-1025, November 1999.
[6] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, "Robust object recognition with cortex-like mechanisms," IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), vol. 29, no. 3, pp. 411-426, March 2007.
[7] P. Moreno, M. Marín-Jiménez, A. Bernardino, J. Santos-Victor, and N. P. de la Blanca, "A comparative study of local descriptors for object category recognition: SIFT vs HMAX," in Proc. 3rd Iberian Conf. on Pattern Recognition and Image Analysis (IbPRIA), June 2007.
[8] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. Journal of Computer Vision (IJCV), vol. 60, no. 2, January 2004.
[9] R. Wijnhoven and P. H. de With, "Patch-based experiments with object classification in video surveillance," in Proc. Advanced Concepts for Intelligent Vision Systems (ACIVS), LNCS, vol. 4678. Springer-Verlag, August 2007, pp. 285-296.
[10] A. L. Blum and P. Langley, "Selection of relevant features and examples in machine learning," Artificial Intelligence, vol. 97, no. 1, pp. 245-271, December 1997.
[11] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2001, pp. 511-518.
[12] F. Fleuret, "Fast binary feature selection with conditional mutual information," Journal of Machine Learning Research, vol. 5, pp. 1531-1555, November 2004.
[13] G. Dorko and C. Schmid, "Selection of scale-invariant parts for object class recognition," in Proc. IEEE Int. Conf. on Computer Vision (ICCV), vol. 1, October 2003, pp. 634-639.
[14] J. Mutch and D. Lowe, "Multiclass object recognition with sparse, localized features," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), vol. 1, June 2006, pp. 11-18.
[15] J. Weston, A. Elisseeff, B. Schölkopf, and M. Tipping, "Use of the zero norm with linear models and kernel methods," Journal of Machine Learning Research, vol. 3, pp. 1439-1461, March 2003.
[16] Z. Sun, G. Bebis, and R. Miller, "Boosting object detection using feature selection," in Proc. IEEE Conf. on Advanced Video and Signal Based Surveillance (AVSS), July 2003, pp. 290-296.
[17] F. Jurie and B. Triggs, "Creating efficient codebooks for visual recognition," in Proc. IEEE Int. Conf. on Computer Vision (ICCV), October 2005, pp. 604-610.
[18] S. Boettcher and A. G. Percus, "Extremal optimization: Methods derived from co-evolution," in Proc. Genetic and Evolutionary Computation Conference (GECCO). Morgan Kaufmann, 1999, pp. 825-832.
[19] J. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," IEEE Intelligent Systems and their Applications, vol. 13, no. 2, pp. 44-49, March 1998.
[20] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, March 2003.
[21] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI), vol. 2. Morgan Kaufmann, 1995, pp. 1137-1143.
[22] I. Kononenko, "Estimating attributes: Analysis and extensions of RELIEF," in Machine Learning: ECML-94, LNCS, vol. 784. Springer-Verlag, 1994, pp. 171-182.
[23] A. Teynor and H. Burkhardt, "Fast codebook generation by sequential data analysis for object classification," in Proc. Int. Symp. on Visual Computing (ISVC), 2007.
[24] S. Theodoridis and K. Koutroumbas, Pattern Recognition. Academic Press, 2006.
[25] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
[26] P. Bak, How Nature Works: The Science of Self-Organized Criticality. Springer, August 1996.
[27] C. Harris and M. Stephens, "A combined corner and edge detector," in Proc. 4th Alvey Vision Conference, 1988, pp. 147-151.
[28] K. Mikolajczyk and C. Schmid, "Scale and affine invariant interest point detectors," Int. Journal of Computer Vision (IJCV), vol. 60, no. 1, pp. 63-86, November 2004.
[29] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," in Proc. British Machine Vision Conference (BMVC), 2002, pp. 384-393.
