Neurocomputing 73 (2010) 1676–1685


Study and evaluation of a multi-class SVM classifier using diminishing learning technique

J. Manikandan, B. Venkataramani

Department of ECE, National Institute of Technology Trichy (NITT), Tamil Nadu, India

Available online 15 March 2010

Keywords: Isolated digit recognition; Multi-class classification; Support vector machines; Support vectors

Abstract

Support vector machine (SVM) is one of the state-of-the-art tools for linear and non-linear pattern classification. One of the design objectives of an SVM classifier is reducing the number of support vectors without compromising the classification accuracy. For this purpose, a novel technique referred to as diminishing learning (DL) technique is proposed in this paper for a multiclass SVM classifier. In this technique, a sequential classifier is proposed wherein the classes which require stringent boundaries are tested one by one and once the tests for these classes fail, the stringency of the classifier is increasingly relaxed. An automated procedure is also proposed to obtain the optimum classification order for the SVM–DL classifier in order to improve the recognition accuracy. The proposed technique is applied for an SVM based isolated digit recognition system and is studied using the speaker dependent and multispeaker dependent TI46 database of isolated digits. Both LPC and MFCC are used for feature extraction. The features extracted are mapped using self-organized feature maps (SOFM) for dimensionality reduction and the mapped features are used by the SVM classifier to evaluate the recognition accuracy using various kernels. The performance of the system using the proposed SVM–DL classifier is compared with those using other techniques: one-against-all (OAA), half-against-half (HAH) and directed acyclic graph (DAG). The SVM–DL classifier results in a 1–2% increase in recognition accuracy compared to the HAH classifier for some of the kernels with both LPC and MFCC feature inputs. For MFCC feature inputs, both HAH and SVM–DL classifiers have 100% recognition accuracy for some of the kernels. The total number of support vectors required is the least for the HAH classifier, followed by the SVM–DL classifier. The proposed diminishing learning technique is applicable for a number of pattern recognition applications. © 2010 Elsevier B.V. All rights reserved.

1. Introduction

Support vector machine (SVM) is one of the popular techniques for pattern recognition and is considered to be a state-of-the-art tool for linear and non-linear classification [1]. SVM has been used as a binary classifier in several applications such as beamforming [2], ultra wide band (UWB) channel equalization [3], channel estimation in orthogonal frequency division multiplexing (OFDM) systems [4] and voice activity detection [5]. The SVM classifier was originally proposed for binary classification in the literature and has since been extended to the design of multiclass SVM classifiers [6]. The design of hidden Markov model (HMM) and SVM based isolated digit recognition systems for speakers with spastic dysarthria is considered in [7]. SVM is used for phoneme based speaker recognition in [8]. Isolated digit recognition using MFCC features is presented in [9].



A multitraining support vector machine is proposed in [10] for content based image retrieval; it improves the recognition performance of SVM classifiers over the conventional SVM-based relevance feedback (RF) scheme [11]. It uses an RF model in which unlabeled data are used to augment labeled data, based on three features of an image (colour, texture and shape) that are redundant but not completely correlated. An asymmetric bagging based SVM (AB-SVM) approach is proposed and integrated with a random subspace SVM (RS-SVM) in [12] to overcome the drawbacks of relevance feedback based SVM by incorporating the benefits of bootstrapping and aggregation. The AB-SVM classifier bootstraps the training samples, while the RS-SVM classifier bootstraps in the feature space; the technique is employed for content based image retrieval using a Gaussian kernel for the SVM classifiers.

The computational complexity and the classification time of an SVM classifier depend on the number of support vectors required. An increase in the number of support vectors leads to an increase in computational requirements such as floating-point multiplications and additions. The memory required to store the support vectors for SVM classification is directly proportional to the number of support vectors. Hence, there is a need to reduce the number of support vectors in order to speed up classification and to reduce the hardware and computational resources required. A K-means clustering technique is proposed in [13] for reducing the number of support vectors of an SVM classifier for handwritten digits. In order to reduce the training time of the SVM classifier, a number of techniques such as chunking [14], decomposition algorithms [15], the sequential minimal optimization (SMO) technique [16] and the online support vector classifier [17] have been reported in the literature. In order to reduce the number of support vectors, an SVM classifier using a technique referred to as diminishing learning (DL) is proposed in this paper and is applied to an isolated digit recognition system. Its performance is compared with systems using the following classifier techniques: one-against-all (OAA) [18] (also called one-against-rest or one-against-remaining), one-against-one (OAO) [18], half-against-half (HAH) [19] and directed acyclic graph (DAG) [20].

The organization of the paper is as follows. Section 2 gives an overview of the algorithm used in the support vector machine (SVM) and the architecture of the SVM based isolated digit recognition system. In Section 3, the diminishing learning technique is explained, and Section 4 describes the experimental results of SVM–DL. The performance of the digit recognition system obtained using different classifier techniques is presented and compared in Section 5, followed by the conclusion and references.


2. SVM classifier for isolated digit recognition system

2.1. SVM classifier

The aim of the SVM classifier is to devise a computationally efficient way of learning 'good' separating hyperplanes between different classes in a high-dimensional feature space. SVM is used to identify a set of separating hyperplanes which are linear functions of the high-dimensional feature space, as shown in Fig. 1. The hyperplanes are placed such that they maintain maximum distance from both classes.

Fig. 1. Non-linear to linear transformation using the SVM technique.

The basic form of an SVM classifier can be expressed as

    Y(z) = w^T Φ(z) + b                                                     (1)

where z is the test input vector and w is a vector normal to the hyperplane which separates the classes in the feature space. The feature space is produced by the feature mapping function Φ(·), which can be either linear or non-linear. The parameter b is the bias. The separating hyperplane (described by w) is determined by minimizing the structural risk instead of the empirical error [21]. Minimizing the structural risk is equivalent to seeking the optimal margin between two classes, and the width of the margin is 2/(w^T w). Let {x_i, y_i}, i = 1, 2, ..., N, denote the training dataset, where y_i is the target output for training data x_i. SVM training can be posed as the constrained optimization problem which maximizes the width of the margin and minimizes the structural risk:

    min_{w,b}  (1/2) w^T w + C Σ_{i=1..N} ξ_i                               (2)

    subject to  y_i (w^T Φ(x_i) + b) ≥ 1 − ξ_i,   ξ_i ≥ 0,  ∀i

where C is the trade-off parameter and ξ_i is the slack variable, which measures the deviation of a data point from the ideal condition of the pattern. The penalty parameter C controls the trade-off between the complexity of the decision function and the number of wrongly classified testing points. The correct value of C cannot be known in advance, and a wrong choice of the SVM penalty parameter C can lead to a severe loss in performance. Therefore, the parameter values are usually estimated from the training data by cross-validation over exponentially growing sequences of C. The solution to (2) can be reduced to a QP optimization problem [22]:

    max_α  Σ_{i=1..N} α_i − (1/2) Σ_{i=1..N} Σ_{j=1..N} α_i α_j H_{i,j}     (3)

    subject to  0 ≤ α_i ≤ C, ∀i   and   Σ_{i=1..N} α_i y_i = 0

where H is an N × N matrix whose (i,j)th element is given by

    H_{i,j} = y_i y_j (Φ^T(x_i) Φ(x_j))                                     (4)

There is a Lagrange multiplier α_i for each training sample x_i. The samples whose α_i are nonzero are called support vectors (SVs), and only a portion of the training samples become SVs [21]. Solving the QP problem [23] yields

    w = Σ_{i=1..N} α_i y_i Φ(x_i)                                           (5)

and (5) can be rewritten in terms of the support vectors as

    w = Σ_{sv=1..N_SV} α_sv y_sv Φ(x_sv)                                    (6)

where N_SV denotes the number of support vectors and α_sv, y_sv and x_sv are the parameters corresponding to each support vector. From (1) and (6), the SVM classifier equation for a test data z is expressed as

    Y(z) = Σ_{sv=1..N_SV} α_sv y_sv (Φ^T(x_sv) Φ(z)) + b
         = Σ_{sv=1..N_SV} α_sv y_sv K(x_sv, z) + b                          (7)

where K(x_sv, z) = Φ^T(x_sv) Φ(z) is a kernel function. The advantage of using a kernel function is that Φ(·) need not be found explicitly for each kernel. It is observed from (7) that an increase in the number of support vectors leads to an increase in computational requirements, making the classifier slow for non-linear kernels. A reduction in the number of support vectors also reduces the number of times the kernel needs to be evaluated. Examples of some of the most commonly used kernel functions [22,24] in SVM classifiers are given in Table 1. The kernel functions map the input into a higher-dimensional feature space, and they must satisfy the Karush–Kuhn–Tucker (KKT) conditions and Mercer's condition [21] to retain the geometrical interpretation of the feature space, so that the solution of the objective function gives a hyperplane in the feature space.
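The optimization in (2) and (3) is normally handled by a library QP/SMO solver rather than coded by hand. Purely as an illustration of the quantities in (5)–(7), and not the authors' MATLAB/C implementation, the sketch below trains a binary soft-margin SVM with scikit-learn (assuming that library is available) and then re-evaluates (7) manually from the returned support vectors, the products α_sv·y_sv (exposed as dual_coef_) and the bias b (intercept_). The data here are synthetic stand-ins for one digit-versus-rest decision.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for one digit-versus-rest decision: X holds 324-dimensional
# (18 x 18) SOFM-mapped feature vectors, y holds the +1/-1 class labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 324)), rng.normal(1.0, 1.0, (50, 324))])
y = np.hstack([-np.ones(50), np.ones(50)])

gamma = 0.01                                   # RBF kernel parameter g (illustrative value)
clf = SVC(C=10.0, kernel="rbf", gamma=gamma)   # C is the trade-off parameter of Eq. (2)
clf.fit(X, y)

sv = clf.support_vectors_        # support vectors x_sv of Eq. (6)
alpha_y = clf.dual_coef_[0]      # alpha_sv * y_sv for each support vector
b = clf.intercept_[0]            # bias b

# Evaluate Eq. (7) by hand for one test vector and compare with the library.
z = X[0]
K = np.exp(-gamma * np.sum((sv - z) ** 2, axis=1))   # K(x_sv, z) for the RBF kernel
print(np.dot(alpha_y, K) + b, clf.decision_function([z])[0])
```

The two printed values agree, confirming that the classifier output is fully determined by the support vectors, their coefficients and the bias, which is why reducing the number of support vectors directly reduces the classification cost.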

Table 1
SVM kernels and kernel functions.

Kernel       Kernel function
Linear       K(x, z) = x_i^T z_i
Polynomial   K(x, z) = (x_i^T z_i + g)^d, where d is the degree of the polynomial
RBF          K(x, z) = exp(−g ||x_i − z_i||^2)
Sigmoid      K(x, z) = tanh(g x_i^T z_i + 1)
Wavelet      K(x, z) = Π_{i=1..N} h((x_i − z_i)/a), where h(x) = cos(1.75x) exp(−x^2/2) and a is the dilation parameter
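The entries of Table 1 translate directly into code. The sketch below is illustrative only; the parameter names g, d and a follow the table, and the default values are arbitrary except for a = 10, which is the wavelet dilation used in the experiments later in the paper.

```python
import numpy as np

def linear_kernel(x, z):
    return float(np.dot(x, z))

def polynomial_kernel(x, z, g=1.0, d=3):
    # (x^T z + g)^d, with d the degree of the polynomial
    return float((np.dot(x, z) + g) ** d)

def rbf_kernel(x, z, g=0.01):
    return float(np.exp(-g * np.sum((np.asarray(x) - np.asarray(z)) ** 2)))

def sigmoid_kernel(x, z, g=0.01):
    return float(np.tanh(g * np.dot(x, z) + 1.0))

def wavelet_kernel(x, z, a=10.0):
    # Product over the N input dimensions of h((x_i - z_i)/a),
    # where h(u) = cos(1.75 u) * exp(-u^2 / 2)
    u = (np.asarray(x, dtype=float) - np.asarray(z, dtype=float)) / a
    return float(np.prod(np.cos(1.75 * u) * np.exp(-(u ** 2) / 2.0)))
```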

Fig. 2. Block diagram of SVM based isolated digit recognition system.

For a multi-class classification problem, the output domain is Y(z) ∈ {1, 2, ..., m} for an m-class problem. Support vectors {x_j, y_j}, Lagrange multipliers (α_j) and bias (b_j) values are associated with each of the m classes, where j ∈ {1, 2, ..., N_SV}, and the decision function is given by

    Y(z) = argmax_i (Y_i(z)),  ∀i                                           (8)

where Y_i(z) is computed for each class i ∈ {1, 2, ..., m} using (7). Geometrically, this is equivalent to associating a hyperplane with each class and assigning the test input z to the class whose hyperplane is furthest from it. More details about the SVM classifier can be found in [21,22,25].

2.2. Isolated digit recognition

Isolated digit recognition is a pattern classification problem with a very large feature size. Fig. 2 shows the block diagram of the proposed isolated digit recognition system using the SVM classifier. It comprises feature extraction, self-organized mapping of the extracted features and the SVM classifier. The input speech is divided into frames and LPC/MFCC coefficients are computed for each frame. (The linear predictive coefficient (LPC) features are extracted from the input speech using the pre-emphasis, frame blocking, windowing and autocorrelation blocks; details about the steps involved in extracting the LPC features can be found in [26,27]. Similarly, Mel frequency cepstral coefficients (MFCC) are computed using the following blocks: pre-emphasis, frame blocking, windowing, DFT, Mel filter bank processing, IDFT, computation of energy, and computation of delta and double-delta features. A detailed account of the procedure adopted for computation of MFCC is given in [28].) Fig. 3(a) and (b) show the block diagrams for LPC and MFCC feature extraction of the input speech, respectively. Each frame results in 11 LPC coefficients or 25 MFCC coefficients (13 MFCC + 12 delta MFCC). The total number of LPC/MFCC coefficients for each digit is not uniform because each input digit comprises a different number of frames. SVM based recognition requires an equal number of feature vectors for each input. In order to make the size of the input features uniform for all the digits, Kohonen's self-organizing feature map (SOFM) [29] is used.

Fig. 3. (a) Block diagram for LPC feature extraction and (b) Block diagram for MFCC feature extraction.
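In the paper the feature extraction is implemented in C (Section 4). Purely as an illustration of the 25-dimensional frame feature (13 MFCCs plus 12 delta coefficients) described above, and assuming the librosa package is available, a frame-level computation could look as follows; the frame length, hop size and the choice of which 12 delta coefficients to keep are assumptions made here, not values taken from the paper.

```python
import numpy as np
import librosa

def mfcc_frames(wav_path, n_fft=256, hop=100):
    # Returns one 25-dimensional row per frame: 13 MFCCs + 12 delta-MFCCs.
    y, sr = librosa.load(wav_path, sr=None)          # keep the file's native rate
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])       # pre-emphasis
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=n_fft, hop_length=hop)
    delta = librosa.feature.delta(mfcc)[1:13]        # drop the delta of c0 -> 12 values
    return np.vstack([mfcc, delta]).T                # shape: (n_frames, 25)
```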


The SOFM also reduces the input feature size for classification. The SOFM transforms the acoustic vector sequences of features into trajectories in a square matrix of fixed dimension, where each node of the matrix takes on binary values [30]. The feature map is trained by Kohonen's self-organization learning. For the purpose of SOFM training, ten utterances of all ten digits are used. The self-organization learning has the property that, after training, input vectors which are physically similar in Euclidean space correspond to topologically close nodes in the feature map. The size of the SOFM can be 12 × 12, 18 × 18, 24 × 24 or any other combination. The mapped features of the SOFM for each digit are obtained using the SOFM weights. In the SOFM array shown in Fig. 2, black dots and white dots denote the values '1' and '0', respectively.
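As a sketch of this mapping (not the authors' implementation), assume a Kohonen map has already been trained and its codebook is available as a weight matrix of shape (18·18, d); an utterance's frame sequence is then reduced to the fixed-size binary trajectory matrix used as the SVM input:

```python
import numpy as np

def sofm_trajectory(frames, weights, grid=18):
    # frames:  (n_frames, d) LPC or MFCC vectors of one utterance
    # weights: (grid*grid, d) trained SOFM codebook (assumed given)
    # A node is set to 1 if it is the best-matching unit of at least one
    # frame, producing the binary trajectory matrix described above.
    traj = np.zeros((grid, grid), dtype=np.uint8)
    for f in frames:
        bmu = int(np.argmin(np.sum((weights - f) ** 2, axis=1)))
        traj[bmu // grid, bmu % grid] = 1
    return traj
```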

3. Diminishing learning technique

In this section, the diminishing learning technique is proposed for the SVM classifier. Eq. (7) shows that the computational complexity of the SVM classifier depends on the number of support vectors (N_SV). The support vectors are obtained during the learning phase of an SVM based recognition system. One of the most popular and simplest techniques used for multi-class classification is the one-against-all (OAA) algorithm (also called one-against-rest or one-against-remaining). This may be adapted for the SVM based isolated digit recognition system as shown in Fig. 4, where the entire training dataset is used for forming a decision boundary between classes and the support vectors are obtained for each class. A parallel SVM classifier may then be designed as shown in Fig. 5. The major advantage of this approach is that the boundary is always stringent with respect to each class, and hence each classifier may recognize the input test data more accurately, thus improving the overall recognition accuracy. Its disadvantage is that the number of support vectors may increase due to the stringent boundary between classes.

The number of support vectors may be reduced if a sequential SVM classifier is adopted, which operates in the following manner. Assume that the classifier tests an input test data in the order 0–9 sequentially. The classifier first tests whether the test data is digit 0. If the test fails, it tests whether the input test data is digit 1. If the test fails again, it tests whether the data is digit 2, and this procedure is repeated until the data is classified as one of the digits. The moment the test data is classified, no further tests are required. Hence, 9 tests are required only in the worst case (if the test data correspond to digit 8 or 9). The advantage of this technique is that the decision boundary for the first test is the same as that of the OAA classifier. However, if the second test is required, a less stringent boundary which separates digit 1 from the set of digits 2–9 can be used, since the classifier already knows that the digit is not 0. Similarly, if the third test is required, a still less stringent boundary which separates digit 2 from the set of digits 3–9 can be used, since the classifier knows that the digit is neither 0 nor 1. In general, if the ith test is required, a progressively less stringent boundary which separates digit i−1 from the set of digits i to 9 can be used. The SVM using this sequential classification procedure is shown in Fig. 6, where the stringency between classes diminishes at each level. The architecture of the sequential SVM classifier employing the DL technique is shown in Fig. 7.
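The sequential classification of Fig. 7 can be sketched as follows, assuming that the k-th binary classifier stages[k] has been trained to separate digit order[k] from the remaining digits order[k+1:], and that it exposes a scikit-learn style decision_function returning the value of Eq. (7); both of these are assumptions made for illustration rather than details taken from the paper.

```python
def classify_dl(z, stages, order):
    # Test the classes one by one; the first positive decision ends the search,
    # so at most len(order) - 1 binary classifiers are evaluated and the last
    # class in the order is assigned by elimination.
    for k, label in enumerate(order[:-1]):
        if stages[k].decision_function([z])[0] > 0:
            return label
    return order[-1]

# With the default order of Fig. 6:
# digit = classify_dl(test_vector, trained_stages, order=list(range(10)))
```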

4. Experimental results of SVM–DL classifier

Fig. 4. Learning mechanism for SVM based recognition system.

A study of the SVM–DL classifier for the isolated digit recognition system is carried out using the TI46 database [31] for both the speaker dependent and multispeaker dependent cases.

Fig. 5. Parallel SVM classifier for OAA.



Fig. 6. Learning mechanism for proposed SVM–DL classifier.

Fig. 7. Sequential SVM–DL classifier architecture.

Three different datasets are used for SVM–DL: training, optimization and testing. The training dataset is used to train the SVM classifier, the optimization dataset is used to find the optimum classification order for the classifier, and the testing dataset is used to test the system. For the speaker dependent case, ten utterances of each digit from one female speaker are used for training the system, eight utterances of each digit from the same speaker recorded at different sessions are used for finding the optimum classification order, and eight more utterances of each digit are used for testing. For the multispeaker dependent case, the system is trained using ten utterances of each digit from three speakers, and is optimized and tested using the optimization and testing tokens of each speaker, respectively. Feature extraction using LPC/MFCC and SOFM feature mapping have been implemented in C code. The SOFM output data are fed as input to the SVM classifier during the training and classification phases of the system. The SVM classification algorithm described in Section 2 is implemented using MATLAB [32] as well as C code, and the performance of the SVM based isolated digit recognition system is evaluated with LPC and MFCC feature inputs using various kernels. The recognition accuracy (RA) achieved and the number of support vectors (SVs) required for the SVM–OAA based isolated digit recognition system for various SOFM sizes, with LPC feature input and the speaker dependent case, are tabulated in Table 2. It is observed that for both linear and non-linear kernels, the best recognition accuracy is achieved with an SOFM of size 18 × 18, which coincides with the result in [30]; hence an SOFM of size 18 × 18 is used for further analysis.


Table 2
Recognition performance for various SOFM sizes.

Kernel                        SOFM size   Recognition accuracy (RA) (%)   No. of support vectors (SVs)
Linear                        12 × 12     85                              627
Linear                        18 × 18     97                              692
Linear                        24 × 24     94                              739
Non-linear wavelet (a = 10)   12 × 12     89                              703
Non-linear wavelet (a = 10)   18 × 18     99                              800
Non-linear wavelet (a = 10)   24 × 24     92                              865

Fig. 8. Confusion matrix for SVM–DL with the default classification order 0123456789.

Fig. 9. Confusion matrix for SVM–DL with classification order 8012345679.

Fig. 10. Confusion matrix for SVM–DL with classification order 8601234579.

Fig. 11. Confusion matrix for SVM–DL with classification order 8670123459.

Fig. 12. Confusion matrix for SVM–DL with classification order 8675012349.

The confusion matrix for the SVM–DL classifier using the wavelet kernel (a = 10) with LPC feature inputs, an SOFM of size 18 × 18 and the speaker dependent case is given in Fig. 8. The confusion matrix gives the number of instances in which each digit is correctly classified and the instances in which it is misclassified when the optimization dataset is used. For example, when digit 1 is tested using the classifier, out of 8 utterances of digit 1 it is correctly classified in 7 utterances and misclassified as 9 once. The confusion matrix in Fig. 8 corresponds to the SVM classifier with the default classification order 0123456789. Using the confusion matrix, a formal procedure for arriving at the optimum sequence in which the digits are to be tested, so as to increase the recognition accuracy, is proposed as follows. Let E(i,j) denote the number of utterances wrongly classified as digit j when the input is digit i, and let E(j) denote the total number of utterances wrongly classified as digit j, with i ≠ j and i, j ∈ {0,1,2,...,9}. A large E(i,j) indicates that the classifier for digit j is less stringent; in order to increase its stringency, digit j should be tested before other digits for which E(i,j) is smaller. For example, in the confusion matrix given in Fig. 8, out of 6 instances of misclassification, 2 of them correspond to misclassification of digit 5 as digit 8 (i.e., E(5,8) = 2). This indicates that the stringency of classification for digit 8 needs to be increased. The sequential classifier has to test for digit 8 first so that a more stringent classifier is used for digit 8 (as mentioned earlier, the digit which is tested first has the most stringent boundary). If the E(i,j) values corresponding to more than one pair (i,j) are equal and have the largest value, then either value of j may be chosen for improving the stringency.

Fig. 9 gives the confusion matrix for SVM–DL using the classification order 8012345679. From Fig. 9, it may be observed that, out of 6 instances of misclassification, 2 of them correspond to misclassification of digit 7 as digit 6 (i.e., E(7,6) = 2). This implies that after digit 8, digit 6 should be tested. The classification order is now modified to 8601234579, and the corresponding confusion matrix is given in Fig. 10. It is observed from Fig. 10 that E(i,j) ≤ 1 for all i, j. Now, in order to decide the next digit to be included in the sequence for testing, we determine the digit for which the misclassification error is maximum (i.e., we find the value of j for which E(j) is the largest) and choose that as the next digit to be added to the sequence. For example, out of 6 instances of misclassification, 3 of them correspond to misclassification as digit 7 (i.e., E(7) = 3). The classification order is therefore modified to 8670123459, and the confusion matrix is given in Fig. 11. It may be observed from Fig. 11 that 4 out of 5 misclassifications correspond to digit 5 (i.e., E(5) = 4). Hence, the classification order is modified to 8675012349, and the confusion matrix is given in Fig. 12. From Fig. 12, it may be observed that there is no indication for further improvement of the classification order, and hence the process is terminated. The above procedure for automatic determination of the optimum classification order for an m-class SVM–DL classifier is given in Fig. 13 and holds good for all the SVM kernels.
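A simplified sketch of this order-selection heuristic is given below. It approximates the flowchart of Fig. 13 (which also covers the tie-breaking and termination details), and it assumes a helper confusion(order) that evaluates the SVM–DL classifier on the optimization dataset and returns the matrix E, indexed by class labels as E[i][j], for the given classification order; that helper is hypothetical and stands in for the training/evaluation machinery of the paper.

```python
import numpy as np

def optimum_order(confusion, classes, max_rounds=10):
    # Greedy refinement: at each round, find the class j that attracts the
    # largest number of misclassifications E(j) = sum_i E(i, j), move it to
    # the front of the not-yet-fixed part of the order, and re-evaluate.
    order, fixed = list(classes), 0
    for _ in range(max_rounds):
        E = np.array(confusion(order), dtype=float)
        np.fill_diagonal(E, 0.0)                 # keep only misclassifications
        if E.sum() == 0 or fixed >= len(order) - 1:
            break
        j = int(np.argmax(E.sum(axis=0)))        # most-confused target class
        if order.index(j) <= fixed:              # already placed; no further clue
            break
        order.remove(j)
        order.insert(fixed, j)
        fixed += 1
    return order
```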

5. Performance of digit recognition system using different classifiers

The performance of the isolated digit recognition system using six multi-class classification schemes for SVM classifiers (SVM–DL, one-against-all (OAA), one-against-one (OAO), half-against-half (HAH#1 and HAH#2) and directed acyclic graph (DAG)) is evaluated using MATLAB, and the results are presented and compared in this section. Fig. 14 shows the architectures of the HAH#1, HAH#2 and DAG based SVM classifiers for the isolated digit recognition system. The OAO architecture is similar to DAG, having one binary SVM classifier for each pair of classes, but DAG is a structured OAO classifier.



Fig. 13. Flowchart for automatic determination of optimum classification order.

For the OAO classifier, the test input is fed to all the binary SVM classifiers, and OAO selects the resultant class as the class voted for by the majority of the classifiers. The architectures of the OAA and SVM–DL classifiers are given in Figs. 5 and 7, respectively. For the SVM–DL classifier, the recognition performance is obtained using the test dataset and assuming the optimum classification order. The total number of classifiers and the maximum number of classifiers required by each of the above six schemes for an m-class problem are given in Table 3. It may be observed from Table 3 that HAH#1, HAH#2, SVM–DL, DAG, OAA and OAO require at most 4, 5, 9, 9, 10 and 45 binary SVM classifiers, respectively, to classify any test digit. HAH#1 requires at most 4 binary SVM classifiers to classify any test digit and hence is the fastest classifier when compared with the other techniques. The disadvantage of the HAH classifier is its reduced recognition accuracy in some cases. SVM–DL is the second fastest classifier, requiring between 1 and at most 9 classifiers to classify any test digit; hence, the average number of classifiers required for SVM–DL is 4.5. The number of classifiers required for any test input is dynamic for SVM–DL and depends on the characteristics of the test input data, as explained in Section 3.
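These worst-case counts follow from evaluating the expressions in Table 3 at m = 10; a quick illustrative check (not part of the paper's own code):

```python
m = 10  # number of digit classes
worst_case = {
    "HAH#1": m // 2 - 1,        # 4
    "HAH#2": m // 2,            # 5
    "SVM-DL": m - 1,            # 9
    "DAG": m - 1,               # 9
    "OAA": m,                   # 10
    "OAO": m * (m - 1) // 2,    # 45
}
print(worst_case)
```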

Fig. 14. Different SVM classifier architectures: (a) HAH#1 SVM classifier; (b) HAH#2 SVM classifier and (c) DAG SVM classifier.

Table 3
Number of binary classifiers required for different SVM classifiers.

SVM classifier   Total no. of binary SVM classifiers required   Maximum no. of binary classifiers required to classify any test input
OAO              m(m−1)/2                                       m(m−1)/2
OAA              m                                              m
DAG              m(m−1)/2                                       (m−1)
SVM-DL           (m−1)                                          (m−1) [average: (m−1)/2]
HAH #2           (m−1)                                          m/2
HAH #1           (m−1)                                          (m/2)−1

DAG is a structured one-against-one (OAO) classifier, with both OAO and DAG requiring m(m−1)/2 SVM classifiers for an m-class problem, and hence the OAO classifier is not used for further evaluation. The performance of the isolated digit recognition system using the five SVM classifiers is evaluated for various test cases using both LPC and MFCC feature inputs for the speaker dependent and multispeaker dependent datasets. The recognition accuracy (RA) obtained and the number of support vectors (SVs) required by the SVM classifiers for different kernels are tabulated in Table 4 for the speaker dependent dataset and in Table 5 for the multispeaker dependent dataset (three speakers). The recognition accuracy and the number of support vectors for the multispeaker dependent dataset with five speakers, using the most commonly used RBF kernel, are given in Table 6. The following observations may be made from Tables 4–6:

• It may be observed from Tables 4 and 5 that HAH requires the least number of support vectors for all the kernels.
• For LPC feature input, it may be observed from Tables 4 and 5 that the performance of the SVM–DL classifier is superior to that of the HAH classifiers for all the kernels.
• It may be observed from Table 4 that the SVM–DL classifier with LPC feature inputs reduces the number of support vectors by a factor ranging from 1.62 to 1.71, at the cost of 1.25–7.5% degradation in recognition accuracy, when compared with the OAA based SVM classifier.
• It may be observed from Tables 4 and 5 that for MFCC feature input, the recognition performance of all the classifiers is almost identical, with each classifier requiring a different number of support vectors to classify any test input.
• It may be observed from Table 4 that for speaker dependent data with MFCC feature input, the recognition accuracy of SVM–DL is better than that of HAH#1 and HAH#2 for some of the kernels and equal for the other kernels.
• From Table 5, it may be noted that for multispeaker dependent data with MFCC feature input, the overall recognition performance of HAH#1 using various kernels is better than that of SVM–DL and HAH#2.
• It may be observed from Tables 4 and 5 that the recognition performance of SVM–DL is on par with the other classifiers, and it can be considered competitive in terms of recognition accuracy and reduced number of support vectors.
• The sigmoid kernel could not classify the multispeaker dependent test case and hence is not included in Table 5.
• It may be observed from Table 6 that for LPC feature inputs, the recognition accuracy is the largest for OAA, followed by DAG and SVM–DL.
• It may be observed from Table 6 that for MFCC feature inputs, both SVM–DL and HAH#1 have the same recognition accuracy, which is the largest.

Table 4
Recognition performance for the speaker dependent dataset.

Input feature  SVM classifier  RBF RA (%) / SVs   Sigmoid RA (%) / SVs   Wavelet RA (%) / SVs   Poly RA (%) / SVs
LPC            OAA             98.75 / 715        98.75 / 674            100.0 / 800            92.50 / 881
LPC            SVM-DL          97.50 / 426        97.50 / 416            97.50 / 477            85.00 / 515
LPC            HAH#2           96.25 / 347        96.25 / 348            97.50 / 360            83.75 / 369
LPC            DAG             96.25 / 895        97.50 / 868            96.25 / 895            73.75 / 890
LPC            HAH#1           95.00 / 333        93.75 / 309            95.00 / 331            82.50 / 335
MFCC           OAA             100.0 / 740        100.0 / 700            100.0 / 785            100.0 / 915
MFCC           SVM-DL          100.0 / 432        100.0 / 410            100.0 / 463            98.75 / 502
MFCC           HAH#2           100.0 / 336        100.0 / 312            100.0 / 349            97.50 / 366
MFCC           DAG             100.0 / 772        100.0 / 867            100.0 / 868            97.50 / 870
MFCC           HAH#1           100.0 / 321        99.00 / 333            100.0 / 327            97.00 / 328

RA: recognition accuracy; SVs: number of support vectors.

Table 5
Recognition performance for the multispeaker dependent dataset (three speakers).

Input feature  SVM classifier  RBF RA (%) / SVs   Wavelet RA (%) / SVs   Poly RA (%) / SVs
LPC            OAA             93.33 / 1858       94.58 / 2342           92.08 / 2340
LPC            SVM-DL          94.58 / 1412       92.50 / 1349           92.50 / 1339
LPC            DAG             94.17 / 2344       93.33 / 2624           89.17 / 2645
LPC            HAH#1           91.67 / 903        91.67 / 976            87.92 / 973
LPC            HAH#2           92.92 / 991        92.08 / 1067           89.17 / 1069
MFCC           OAA             100.0 / 2251       100.0 / 2153           100.0 / 2306
MFCC           SVM-DL          100.0 / 1311       100.0 / 1244           99.58 / 1288
MFCC           DAG             100.0 / 2270       100.0 / 2271           100.0 / 2343
MFCC           HAH#1           100.0 / 793        100.0 / 921            99.67 / 866
MFCC           HAH#2           99.33 / 883        99.67 / 890            99.33 / 943

RA: recognition accuracy; SVs: number of support vectors.

Table 6
Recognition performance for the multispeaker dependent dataset (five speakers), RBF kernel.

Input feature  SVM classifier  RA (%)   SVs
LPC            OAA             91.25    4736
LPC            SVM-DL          83.00    2669
LPC            DAG             86.00    4477
LPC            HAH#1           81.25    1693
LPC            HAH#2           75.25    1888
MFCC           OAA             98.50    3996
MFCC           SVM-DL          99.50    2303
MFCC           DAG             99.00    4347
MFCC           HAH#1           99.50    1606
MFCC           HAH#2           98.75    1812

RA: recognition accuracy; SVs: number of support vectors.

6. Conclusion

In this paper, a new algorithm referred to as the SVM–DL classifier is proposed for SVM based recognition systems. The proposed technique is applied to an isolated digit recognition system for the speaker dependent and multispeaker dependent cases with MFCC and LPC as feature inputs. It is observed that the recognition performance of the SVM–DL technique is enhanced by using the optimum classification order for all the kernels. The performance of the digit recognition system is also evaluated using other SVM classifiers and is compared with that of SVM–DL. From this study, it is found that HAH requires the least number of support vectors and is the fastest classifier. SVM–DL is the next fastest classifier, requiring a lower number of support vectors than all the other classifiers except HAH. SVM–DL has better recognition accuracy than HAH for some of the kernels, and the recognition accuracy of the SVM–DL classifier is almost on par with that obtained using the other SVM classifiers. The SVM–DL technique can be employed for other pattern recognition applications such as face recognition, character recognition, fingerprint recognition and target recognition.

Acknowledgment

The authors would like to thank all the anonymous reviewers for their constructive comments on an earlier version of this paper.

References

[1] Christopher J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (1998) 121–167.
[2] M. Martínez Ramón, Nan Xu, C.G. Christodoulou, Beamforming using support vector machines, IEEE Antennas and Wireless Propagation Letters 4 (2005) 439–442.
[3] Mohamed S. Musbah, Xu Zhu, Support vector machines for DS-UWB channel equalisation, in: IEEE International Conference on Wireless Communications, Networking and Mobile Computing (WiCom'07), September 2007, pp. 524–527.
[4] M. Julia Fernández-Getino García, José Luis Rojo-Álvarez, Support vector machines for robust channel estimation in OFDM, IEEE Signal Processing Letters 13 (7) (2006) 397–400.
[5] Q. Fengyan, Changchun Bao, Yan Liu, A novel two-step SVM classifier for voiced/unvoiced/silence classification of speech, in: IEEE International Symposium on Chinese Spoken Language Processing (ISCSLP'04), December 2004, pp. 77–80.
[6] Andrew W. Moore, Support Vector Machines, School of Computer Science, Carnegie Mellon University.
[7] M. Johnson, J. Gunderson, A. Penman, T. Huang, HMM-based and SVM-based recognition of the speech of talkers with spastic dysarthria, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'06), vol. 3, May 2006, pp. 1060–1063.
[8] W.M. Campbell, J.P. Campbell, D.A. Reynolds, D.A. Jones, T.R. Leek, Phonetic Speaker Recognition with Support Vector Machines, MIT Lincoln Laboratory, Lexington, MA 02420.
[9] Ramon Fernandez-Lorenzana, Fernando Perez-Cruz, Jose Miguel Garcia-Cabellos, Carmen Pelaez-Moreno, Ascension Gallardo-Antolin, Fernando Diaz-de-Maria, Some Experiments on Speaker-Independent Isolated Digit Recognition Using SVM Classifiers, Signal Theory and Communications Department, EPS-Universidad Carlos III de Madrid, Spain.
[10] Jing Li, Nigel Allinson, Dacheng Tao, Xuelong Li, Multitraining support vector machine for image retrieval, IEEE Transactions on Image Processing 15 (11) (2006) 3597–3601.
[11] Y. Rui, T.S. Huang, S. Mehrotra, Content-based image retrieval with relevance feedback in MARS, in: IEEE International Conference on Image Processing, vol. 2, Washington, DC, October 1997, pp. 815–818.
[12] Dacheng Tao, Xiaoou Tang, Xuelong Li, Xindong Wu, Asymmetric bagging and random subspace for support vector machines based relevance feedback in image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (7) (2006) 1088–1099.
[13] Jiaqi Wang, Xindong Wu, Chengqi Zhang, Support vector machines based on K-means clustering for real-time business intelligence systems, Journal of Business Intelligence and Data Mining 1 (1) (2005) 54–64.
[14] T. Joachims, Making Large-scale SVM Learning Practical. Advances in Kernel Methods – Support Vector Learning, MIT Press, Cambridge, MA, 1999, pp. 169–184.


[15] E. Osuna, R. Freund, F. Girosi, Training support vector machines, in: Conference on Computer Vision and Pattern Recognition, CVPR'97, 1997, pp. 130–136.
[16] John C. Platt, Sequential minimal optimization: a fast algorithm for training support vector machines, Technical Report MSR-TR-98-14, Microsoft Research, 1998.
[17] K.W. Lau, Q.H. Wu, Online training of support vector classifier, Journal of Pattern Recognition 36 (2003) 1913–1920.
[18] Jonathan Milgram, Mohamed Cheriet, Robert Sabourin, One against one or one against all: which one is better for handwriting recognition with SVMs?, École de Technologie Supérieure, Montréal, Canada. Available from: http://www.livia.etsmtl.ca/publications/2006.
[19] Hansheng Lei, Venu Govindraju, Half-against-half multi-class support vector machines, Springer Book Series, Lecture Notes in Computer Science, 2005, pp. 156–164.
[20] John C. Platt, Nello Cristianini, John Shawe-Taylor, Large Margin DAGs for Multiclass Classification, MIT Press, 2000.
[21] Nello Cristianini, John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, 2000.
[22] Simon Haykin, Neural Networks, second ed., Prentice-Hall of India, 2003.
[23] Edwin K.P. Chong, Stanislaw H. Zak, An Introduction to Optimization, Wiley Interscience Publications, 2004.
[24] Li Zhang, Weida Zhou, Licheng Jiao, Wavelet support vector machine, IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics 34 (1) (2004).
[25] Corrina Cortes, V. Vapnik, Support vector networks, Journal of Machine Learning (1995).
[26] Lawrence Rabiner, Biing-Hwang Juang, Fundamentals of Speech Recognition, Prentice-Hall Signal Processing Series, 1993.
[27] Ben Gold, Nelson Morgan, Speech and Audio Signal Processing: Processing and Perception of Speech and Music, John Wiley and Sons Inc, 2000.
[28] Daniel Jurafsky, James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, New Jersey, 2008.
[29] Martin T. Hagan, Howard B. Demuth, Mark Beale, Neural Network Design, PWS Publishing Company, 1996.
[30] Zezhen Huang, Anthony Kuh, A combined self-organizing feature map and multilayer perceptron for isolated word recognition, IEEE Transactions on Signal Processing 40 (11) (1992) 2651–2657.


[31] http://www.ldc.upenn.edu/Catalog/readme_files/ti46.readme.html
[32] The Mathworks Inc. – MATLAB, www.mathworks.com.

J. Manikandan received the B.E. degree in Electronics and Communication from Madras University in 2000 and the M.E. degree in Communication Systems from Regional Engineering College (REC), Trichy, India in 2002. He worked as a Scientist at the Aeronautical Development Agency (ADA), a DRDO laboratory under the Ministry of Defence, Bangalore, India, from April 2002 to July 2007. Since August 2007, he has been with the National Institute of Technology, Trichy (formerly known as REC, Trichy) and is a recipient of an IBM Scholarship. He has published numerous papers in international conferences. His research interests include pattern recognition techniques related to speech, automatic target recognition and FPGA based system design.

B. Venkataramani received the B.E. degree in Electronics and communication engineering from Regional Engineering College, Tiruchirappalli, India, in 1979 and the M.Tech. and Ph.D. degrees in electrical engineering from Indian Institute of Technology, Kanpur, India, in 1984 and 1996, respectively. He worked as Deputy Engineer in Bharath Electronics, Ltd., Bangalore, India, and as a Research Engineer in the Indian Institute of Technology, each for approximately three years. Since 1987, he has been with the faculty of the National Institute of Technology, Trichy (Formerly known as Regional Engineering College, Trichy). Currently he is the Professor and Head of the Electronics and Communication Department. He has published two books and numerous papers in journals and international conferences. His current research interests include field-programmable gate array (FPGA) and system on a single chip (SOC)-based system design and performance analysis of high-speed computer networks.
