Curvature Scale Space Application to Distorted Object Recognition and Classification

Natan Jacobson, Truong Nguyen, University of California, San Diego, United States; Frank Crosby, Naval Surface Warfare Center, Panama City, United States

(Invited Paper)

Abstract: Contour classification methods which operate directly on an image are greatly affected by small-magnitude transformations of the image. In this paper, a contour classification method is developed which takes advantage of Curvature Scale Space (CS2) and a linear Support Vector Machine (SVM) classifier. The CS2 representation boasts invariance to transformations including scaling, rotation, translation, and noise. In addition, the linear SVM is a robust tool for classification problems involving multiple labels. The combination of these tools produces a classifier well suited for object recognition in photographs where distortion is present.
I. INTRODUCTION
There is significant ongoing research dealing with object classification in image processing. One such area of research concerns the Curvature Scale Space (CS2) representation. This representation is powerful because it boasts invariance to affine transformations of the object to be classified. I propose the application of statistical learning techniques to improve the performance of a CS2 classifier. It is my hope that the combination of transformation invariance from the CS2 representation and the classification prowess of the statistical learning method will produce an excellent object classifier.

II. RELATED WORK
In the seminal paper on Curvature Scale Space [1], the CS2 representation is used to register a Landsat satellite image of an area to a map containing the shorelines of the same area. The maxima of the CS2 representation are used in a matching algorithm which finds the minimum cost for matching the CS2 image of the satellite image with that of the map. Pre-processing methods, such as manually correcting for skew in the map and deleting small contours, are employed. The CS2 maxima are also employed in [2], where a database of 1100 marine animal contours is used for similarity comparison. An input image is tested by comparing the maxima of its CS2 representation with those in the database. For each input, the closest database matches are found using a cost-based matching technique. Performance is evaluated as the number of models determined to be in the same class as the input, divided by the number of members of that class. Performance compares favorably to Fourier Descriptors and Moment Invariants.

CS2 has also been shown to be a powerful tool for image segmentation in [3]. In that work, scanned historical documents are computed at varying scales. At a sufficiently large scale, the shapes of letters merge into bounded word shapes. This method far outperformed the previous state of the art, which segmented words based on a gap metric. A more complex recognition scheme is introduced in [4], where CS2 is used for object segmentation, while an Artificial Neural Network (ANN) learns associations between object segments. The CS2 zero-crossings are used at varying scale to break a detected object into segments, which are placed into a segment hierarchy. A second ANN trains via back-propagation with momentum to recognize important associations generated by the first ANN. This recognition scheme proved to be invariant to partial occlusion and rigid transformations of the object.

978-1-4244-2110-7/08/$25.00 ©2007 IEEE

There are also effective classification techniques which operate directly on image intensity values. One example [5] compares the performance of several statistical learning techniques on face detection, where the training set consists of faces in different orientations. These techniques include single Neural Networks, multiple Neural Networks with arbitration, and Neural Networks with zero hidden units (perceptrons). This approach works well, although one caveat is the requirement of application-specific parameter tuning for optimal results. Another intensity-based object classification approach, by Belongie et al. [6], compares shape contexts for object matching. This involves finding the relation of each pixel along a contour to all other pixels on the contour, and then computing the best match on a per-pixel basis. This method is tested using the MNIST database. In contrast to the previous work, I propose a method which combines invariances into a Support Vector Machine classifier as in [7], using the CS2 representation.

III. TESTING
The CS2 classifier is tested using the MNIST Digit Database, which includes a training set of 60,000 images and a testing set of 10,000 images of handwritten digits [8]. Each example is an 8-bit, 28x28 pixel image containing a single digit. This database is extremely useful because various other classification techniques have already been applied to it and compared (http://www.research.att.com/lyann/exdb/mnist/index.html). Performance of the CS2 classifier is measured empirically by a % test error rate, calculated as the number of misclassifications divided by the total number of test cases examined.

IV. CURVATURE SCALE SPACE OVERVIEW
Contours play an important role both in the human visual system and in the realm of image processing. In an image, objects can be defined by the contours which bound them, meaning that
there is significant information contained therein. The CS2 representation, first proposed by F. Mokhtarian and A. Mackworth [1], takes advantage of this fact. The idea is to represent a planar curve at varying levels of detail, with invariance to rotation, scaling, and translation. This is accomplished by determining the zeros of curvature of the contour at varying scales.

Authorized licensed use limited to: Univ of Calif San Diego. Downloaded on February 24, 2009 at 15:24 from IEEE Xplore. Restrictions apply.

Because it is very difficult to determine the curvature of the contour in a continuous sense, a parametric contour is used which depends on the pixel values of the contour. The x- and y-coordinates of the contour pixels are parameterized by an arc-length parameter u which traces around the contour. The result is two vectors x(u) and y(u) which define the contour boundary. With these vectors obtained, the curvature of the contour is given as follows [9]:

    κ(u) = [x_u(u) y_uu(u) − x_uu(u) y_u(u)] / [x_u(u)² + y_u(u)²]^(3/2)    (1)

So far this is only a description of the curvature at one scale. Other scales are computed by smoothing x(u) and y(u) with a Gaussian kernel. This process is known as evolution [2]. The curvature of the contour is then given by κ(u, σ), where u is the arc-length parameter and σ is the scale:

    κ(u, σ) = [x_u(u, σ) y_uu(u, σ) − x_uu(u, σ) y_u(u, σ)] / [x_u(u, σ)² + y_u(u, σ)²]^(3/2)    (2)

An example of the CS2 representation is given in Fig. 1, which displays the zeros of curvature for a given contour.
Fig. 2. Preprocessing steps: (a) original image, (b) contour from original image, (c) upscaled image, (d) contour from upscaled image
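The multi-scale curvature computation described above (smoothing x(u) and y(u) with Gaussian-derivative kernels, then thresholding |κ|) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, the use of SciPy's Gaussian filter, and the threshold value are my own assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def css_map(x, y, sigmas, thresh=1e-2):
    """Binary Curvature Scale Space map: one row per scale, one column per
    arc-length sample. An entry is True where |kappa(u, sigma)| < thresh.

    x, y: arc-length-parameterized coordinates of a closed contour.
    sigmas: iterable of Gaussian scales (in samples).
    """
    out = np.zeros((len(sigmas), len(x)), dtype=bool)
    for row, s in enumerate(sigmas):
        # Derivatives of the smoothed contour, obtained by convolving with
        # derivatives of a Gaussian (the differentiation property of
        # convolution); mode='wrap' treats the contour as closed.
        xu  = gaussian_filter1d(x, s, order=1, mode='wrap')
        yu  = gaussian_filter1d(y, s, order=1, mode='wrap')
        xuu = gaussian_filter1d(x, s, order=2, mode='wrap')
        yuu = gaussian_filter1d(y, s, order=2, mode='wrap')
        kappa = (xu * yuu - xuu * yu) / (xu**2 + yu**2) ** 1.5
        out[row] = np.abs(kappa) < thresh
    return out

# A circle has constant nonzero curvature, so its CSS map contains no zeros.
t = np.linspace(0, 2 * np.pi, 320, endpoint=False)
m = css_map(np.cos(t), np.sin(t), sigmas=[1, 2, 4])
print(m.any())  # False
```

For a contour with inflection points (e.g., a handwritten digit), the True entries trace out the characteristic CSS arcs as the scale increases.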
V. CLASSIFICATION ARCHITECTURE

The CS2 classifier is divided into three stages. In the first stage, the digit image is pre-processed so that the grid of pixel intensities is converted into an x and y vector of the digit contour. These vectors are interpolated to be of the same length. In the second stage, the CS2 representation is formed for a specified range of scales. Finally, certain features of the CS2 representation are selected and used as the input feature space of the linear SVM classifier. Selection of these features is discussed later.

A. Pre-processing

In order to construct a CS2 representation, the digit contour must be obtained. The first step is to resize the image by a factor of four in each dimension using a bicubic filter, increasing the resolution from 28x28 px to 112x112 px. Next, a Canny edge detector is used to extract the digit contour (Fig. 2). Note that the contour obtained from the high-resolution digit is smoother than that obtained at the original resolution. The next step is to transform the image contour from a grid of pixel intensity values into the vectors x(u) and y(u). This is accomplished using a contour-tracing algorithm which removes each image pixel as it is accumulated. Finally, the two vectors are interpolated to the size of the largest vector. This is necessary for the CS2 representations to be of similar dimension.

Fig. 1. Sample CS2 representation

B. CS2 Formation

Once the vectors x(u) and y(u) are available, the CS2 representation is produced. The vectors x_u(u, σ) and y_u(u, σ) are obtained (4), and similarly x_uu(u, σ) and y_uu(u, σ) are obtained (5). Here * denotes convolution. Convolving each path-length vector with the derivatives of a Gaussian kernel is permitted by the differentiation property of convolution:

    x(u, σ) = x(u) * g(u, σ)          (3)
    x_u(u, σ) = x(u) * g_u(u, σ)      (4)
    x_uu(u, σ) = x(u) * g_uu(u, σ)    (5)

The curvature κ(u, σ) is then computed (2) for each value of σ in the selected range. Zeros of curvature are declared when |κ(u, σ)| < T, where T is a user-specified threshold. The CS2 representation is a binary matrix where each entry is specified by (6):

    CS2(u, σ) = 1 if |κ(u, σ)| < T, and 0 otherwise    (6)

The CS2 representation for the digit contour obtained in Fig. 2 is shown in Fig. 3.

C. Feature Selection
The selection of a feature space is an important step in the classification architecture. The idea is to use a low dimensionality in each feature to decrease computation time and memory requirements as much as possible [10], [11]. At the same time, the feature space should contain significant information about each contour and be easily separable (although this is hard to verify by inspection). The standard dimension of a CS2 representation for this application is 320 pixels for u and 120 scales for σ. In total, this is 38,400 data points for each digit contour. In contrast, the original image was only 28x28 pixels, for a total of 784 data points. Clearly, the dimensionality of the raw CS2 representation is too great, and would require significant memory and computation time. For this application, the maxima of the CS2 space are taken as features (Fig. 4). This reduces dimensionality drastically, but not by a constant amount. Because different contours produce a varying number of maxima, dummy variables must be added so that the SVM classifier operates on data of uniform dimension.

Fig. 3. CS2 representation formed in the second stage of the classification architecture

Fig. 4. CS2 maxima

VI. SUPPORT VECTOR MACHINE CLASSIFIER

The LIBSVM library [12] was used for all training and testing. The provided LIBSVM MEX files were compiled with MATLAB r2006a and the GCC C++ compiler. For more information on LIBSVM, see [13]. Several kernels were considered for the classifier, including the Gaussian radial basis function, polynomial, and linear kernels. It was determined early in the project that the two former kernels underperformed due to the many-class nature of the problem. Test error for the Gaussian kernel was as high as 80%, with the polynomial kernel performing marginally better. In contrast, the linear kernel (with a small training misclassification cost) performed at or below 30% test error in all trials.

Of the available SVM classifiers, the C-Support Vector Classifier (C-SVC) was employed. This implementation utilizes the decision function (7), which includes a set of slack variables ξ_i. The slack variables allow for some misclassifications, thus improving regularization. The C-SVC solves (8), subject to (7), for some user-defined positive value C. C is the cost corresponding to the sum of the slack variables, which is essentially a trade-off between minimizing the training error and maximizing the margin [14].

    y_i ((x_i · w) + b) ≥ 1 − ξ_i                             (7)

    min_{w,ξ} τ(w, ξ) = (1/2)||w||² + C Σ_{i=1..m} ξ_i        (8)

This problem can be reformulated by finding its dual. Instead of solving (8), we can instead solve (9) subject to constraints (10) and (11), where the training examples x_i with α_i ≠ 0 are the support vectors.

    max_α Σ_{i=1..m} α_i − (1/2) Σ_{i,j=1..m} α_i α_j y_i y_j k(x_i, x_j)    (9)

    0 ≤ α_i ≤ C,  i = 1 … m                                   (10)

    Σ_{i=1..m} α_i y_i = 0                                    (11)

Fig. 5. Soft-Margin 2-class SVM Classifier

In order to decrease computation time, a hack was employed. Instead of using a single 10-class linear SVM, a series of 2-class SVMs was used with arbitration. The system comprises 45 2-class units, each of which is trained on two digit classes (Fig. 6) and outputs a decision in {+1, −1}. For the 5/6 unit:

    y = +1 if x ∈ class 5,  −1 if x ∈ class 6                 (12)

For example, the '5/6' unit is trained on features from the '5' and '6' digit classes. Because the units are trained independently, training can be conducted in parallel, which significantly reduces computation time. For the testing portion, each unit casts a vote based on its decision; the votes are tallied to form the final decision.
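The one-vs-one arbitration over the 45 units can be sketched as follows. This is a toy illustration of the voting scheme only: the paper's units are LIBSVM C-SVCs, whereas here each unit is a hypothetical linear decision function (w, b) supplied by the caller, with +1 meaning "first class" and −1 meaning "second class".

```python
from itertools import combinations
import numpy as np

def ovo_predict(units, x):
    """Arbitrate among 45 2-class units: each unit (i, j) votes for class i
    when sign(w.x + b) is non-negative, otherwise for class j; the final
    label is the class with the most votes."""
    votes = np.zeros(10)
    for (i, j), (w, b) in units.items():
        decision = w @ x + b
        votes[i if decision >= 0 else j] += 1
    return int(np.argmax(votes))

# Toy units: unit (i, j) compares feature i against feature j directly,
# so the winner is simply the index of the largest feature.
units = {(i, j): (np.eye(10)[i] - np.eye(10)[j], 0.0)
         for i, j in combinations(range(10), 2)}
x = np.zeros(10)
x[7] = 1.0
print(ovo_predict(units, x))  # 7
```

Because each unit is trained and evaluated independently, the loop over `units.items()` is trivially parallelizable, which is what allows the distributed training and testing described in the text.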
    y = max{ 0/1 + 0/2 + … + 0/9,
             1/2 + 1/3 + … + 1/9 − 0/1,
             …,
             8/9 − 0/8 − 1/8 − … − 7/8,
             −0/9 − 1/9 − 2/9 − … − 8/9 }                     (13)

where i/j denotes the output of the unit trained on classes i and j. The testing portion, which is examined in (13), can also be distributed among several computers. The result of this hack is that many CPUs can be used together to drastically reduce computation time.

Fig. 6. Structure for the linear One vs. One classifier

VII. RESULTS

The CS2 feature space is explored for a linear One vs. One SVM classifier using the MNIST database. Performance of the CS2 classifier is measured empirically by a % test error rate, calculated as the number of misclassifications divided by the total number of test cases examined.

Results:

Method                                % Test Error   Compute Time
Linear OVO SVMC (pixel domain)        5.96%          35.43 mins.
Linear SVMC (pixel domain)            5.96%          105.88 mins.
Linear OVO SVMC (CS2)                 18.35%         137.89 mins.
Linear SVMC (CS2)                     18.35%         865.40 mins.
State of the Art (Shape Contexts)     0.63%          unknown
State of the Art (Convolutional Net)  0.39%          unknown

A few examples are provided in Fig. 7 and Fig. 8 demonstrating the difference in CS2 representations between digit classes. Note the similarity in representation for the two digits from class '2'. The linear classifier performs very well operating directly on the raw data. In this case, training the 45 units on 60,000 training examples completed in 17.6 minutes. The testing portion took 17.83 minutes for 10,000 test examples. By exploiting a fully parallel architecture, this run time can be decreased by as much as a factor of 10. Next, the maxima of the CS2 representation were used as an input space. The dimensionality of this space is low compared with that of the raw data, with an average of roughly 5 maxima per representation. However, this input space discriminates between the classes less well, resulting in an increase in training time and a decrease in classification performance.

Fig. 7. Example of pre-processing step for four MNIST digits: (a) digit from class '2', (b) digit from class '2', (c) digit from class '4', (d) digit from class '5'

VIII. SUMMARY AND CONCLUSIONS

We have seen that the CS2 classifier underperforms against the state of the art [15] when tested against the MNIST digit
Fig. 8. Example of CS2 representation for four MNIST digits: (a) digit from class '2', (b) digit from class '2', (c) digit from class '4', (d) digit from class '5'

database. For the classifier to be useful in this application, it would need to perform at or below a 10% error rate. As it stands, using the raw digit data in a linear classifier is a better choice than employing the CS2 representation. One explanation is that the transformations between digits are very complex, and are rarely characterized by translation, rotation, or scaling. This explains why performance on the MNIST database is suboptimal; however, there are still plenty of applications for the CS2 classifier. One such area is object classification where translation, rotation, and scaling are the predominant transformations. This tends to be the case in real-world situations involving classification of rigid objects. For example, classification of different cars in a street scene may be a good fit for the CS2 representation.

IX. FUTURE WORK

There are several aspects of the CS2 classifier which would benefit from additional work. First, improvements can be made to the pre-processing algorithm to increase contour invariance. This may involve making assumptions about the closedness of contours. The classifier could also benefit from the implementation of a ν-SVC instead of a C-SVC. Finally, it would be interesting to apply the classifier to a database other than MNIST; performance may be better for a different type of object classification problem.

REFERENCES

[1] F. Mokhtarian and A. Mackworth, "Scale-based description and recognition of planar curves and two-dimensional shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 34-43, 1986.
[2] S. Abbasi, F. Mokhtarian, and J. Kittler, "Curvature scale space image in shape similarity retrieval," Multimedia Systems, vol. 7, no. 6, pp. 467-476, November 1999.
[3] R. Manmatha and J. L. Rothfeder, "A scale space approach for automatically segmenting words from historical handwritten documents," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1212-1225, 2005.
[4] G. Bebis, G. Papadourakis, and S. Orphanoudakis, "Recognition using curvature scale space and artificial neural networks," in Proceedings of the IASTED International Conference, 1998.
[5] H. A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," in Proceedings of the 1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '96), 1996, pp. 203-208.
[6] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509-522, 2002.
[7] B. Scholkopf, C. Burges, and V. Vapnik, "Incorporating invariances in support vector learning machines," in ICANN, 1996, pp. 47-52.
[8] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[9] F. Mokhtarian and A. K. Mackworth, "A theory of multiscale, curvature-based shape representation for planar curves," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 8, pp. 789-805, 1992.
[10] I. Guyon, N. Matic, and V. Vapnik, "Discovering informative patterns and data cleaning," in Advances in Knowledge Discovery and Data Mining, pp. 181-203, AAAI Press / MIT Press, 1996.
[11] B. Scholkopf, A. Smola, and K. Muller, "Support vector methods in learning and feature extraction," 1998.
[12] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001.
[13] C. W. Hsu, C. C. Chang, and C. J. Lin, "A practical guide to support vector classification," Tech. Rep., Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 2003.
[14] B. Scholkopf and A. Smola, Learning with Kernels, The MIT Press, 2002.
[15] P. Simard, D. Steinkraus, and J. Platt, "Best practices for convolutional neural networks applied to visual document analysis," in International Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society, 2003, pp. 958-962.