IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, VOL. 15, NO. 4, DECEMBER 2007


Wavelet-Based Feature Extraction for Support Vector Machines for Screening Balance Impairments in the Elderly

Ahsan H. Khandoker, Member, IEEE, Daniel T. H. Lai, Rezaul K. Begg, Senior Member, IEEE, and Marimuthu Palaniswami, Senior Member, IEEE

Abstract: Trip-related falls are a prevalent problem in the elderly. Early identification of at-risk gait can help prevent falls and injuries. The main aim of this study was to investigate the effectiveness of a wavelet-based multiscale analysis of a gait variable [minimum foot clearance (MFC)], in comparison to MFC histogram plot analysis, in extracting features for developing a model using support vector machines (SVMs) for screening of balance impairments in the elderly. MFC during walking on a treadmill was recorded for 13 healthy elderly and 10 elderly with a history of tripping falls. Features extracted from the MFC histogram, and then multiscale exponents between successive wavelet coefficient levels after wavelet decomposition of the MFC series, were used as inputs to the SVM to classify the two gait patterns. The maximum classification accuracy was 100% for an SVM using a subset of selected wavelet-based features, compared to 86.95% using statistical features. For estimating the relative risk of falls, the posterior probabilities of the SVM outputs were calculated. These results suggest superior performance of the SVM in detecting balance impairments when wavelet-based features are used, and the approach could also be useful for evaluating falls prevention interventions.

Index Terms: Elderly, falls risk, gait, minimum foot clearance, support vector machines (SVMs), wavelet.

I. INTRODUCTION

Manuscript received October 12, 2006; revised March 25, 2007; accepted June 5, 2007. This work was supported by the Australian Research Council Linkage under Grant LP0454378. A. H. Khandoker, D. T. H. Lai, and M. Palaniswami are with the Department of Electrical and Electronic Engineering, The University of Melbourne, Melbourne, VIC 3010, Australia (e-mail: [email protected]; [email protected]; [email protected]). R. K. Begg is with the Biomechanics Unit, Center for Ageing, Rehabilitation, Exercise and Sport, Victoria University, Melbourne, VIC 8001, Australia (e-mail: [email protected]). Digital Object Identifier 10.1109/TNSRE.2007.906961

It has been well documented in the literature that ageing influences gait patterns, which in turn affects the control mechanism of human locomotor balance. Falls in the elderly might be linked to declines in the balance control function due to ageing. While some research in ageing gait has investigated time-distance variables (e.g., walking speed, stance/swing times, step length) [1] to identify key variables of gait degeneration in the elderly, it has been suggested that more sensitive gait variables, such as minimum foot clearance (MFC) above the walking surface, should be used to describe age-related declines in gait in an effort to find predictors of falls risk [13]. Minimum foot clearance, which occurs during the midswing phase of the gait cycle, is defined as the minimum vertical distance between the lowest point under the front part of the shoe/foot and the ground, and has been identified as a potential gait parameter associated with trip-related falls in the older population [13]. This is mainly because, during the MFC event, the foot travels very close to the walking surface, and MFC fluctuation has the potential to cause tripping, especially on unseen obstacles.

Falls in the older population have been identified as a major health issue in Australia, costing the community $2.4 billion per annum [3]. Among the various fall types, tripping during walking has been identified as accounting for more than 50% of all falls [4]. Therefore, a model for early assessment of the risk of falls is critical to reducing the incidence of falls.

Several studies have attempted to predict falls prospectively, with varying results. For example, studies by Topper et al. [5] and Maki et al. [6] used measures of static posturography to indicate risk of falls. Results from their work showed that control of medio-lateral sway may be a strong predictor of falls in the elderly. Other studies have used medio-lateral centre of mass motion during obstacle crossing to identify elderly individuals with balance impairment [43]. In our previous study, statistical measures of the MFC were used to detect balance impairment in the elderly [15], and it was demonstrated that different features in the gait data could carry different information affecting the accuracy of detection.

It has been reported that human gait dynamics is a complex and nonlinear process [8], [42], which may not be represented by statistical features alone. In recent years, wavelet analysis has proven to be a powerful multiscale resolution technique [7], well suited to understanding the complex features of real world processes such as biological signals with nonlinearity, nonstationarity, oscillatory behavior, and trends.
With reference to gait dynamics, it has been reported that the relationship of gait variability from lower to higher scales exhibits intrinsic properties of the healthy human locomotor system, and that a breakdown of such a relation would indicate gait pathology and dysfunction [8]. This suggests a strong rationale for applying a wavelet-based multiscale analysis to the MFC signals to extract gait features. In order to facilitate automated recognition of gait patterns related to pathology, neural networks and fuzzy clustering techniques have been applied for classification of normal and pathological gait [20], [21]. Recently, support vector machines (SVMs) have emerged as a powerful technique for general purpose pattern recognition. They have been applied to classification and regression problems with exceptionally good performance on a range of binary classification tasks [22]–[27]. The primary advantage of the SVM is its ability to minimize both structural and empirical risk [26], leading to better generalization for new data classification even with a limited training data set. In our previous studies, the SVM technique was successfully applied for automated recognition of young/old gait patterns using temporal and distance measures and kinetic and kinematic variables [27], and also using histogram and Poincaré plot features relating to MFC data [16]. Continuing on from these studies, it is hypothesized that an SVM model would be suitable for constructing relationships between MFC gait features and the respective gait types, i.e., healthy elderly and balance impaired elderly. Thus, this study proposes applying SVMs for automated screening of gait patterns related to balance impairments using wavelet-based features.

1534-4320/$25.00 © 2007 IEEE

Fig. 1. Schematic diagram of SVM-based gait diagnostic model. Four main steps in designing a diagnostic system for automated diagnosis of balance impairments are shown.

II. METHODS

A schematic representation of an automated diagnostic system for the detection of balance impairments using MFC measurements is shown in Fig. 1. In the following, a brief description of MFC data collection and feature extraction techniques is given, followed by performance evaluation measures for the SVM models.

A. MFC Gait Data

MFC data from 13 healthy elderly (mean age: 67.5 years, mean height: 170 cm, mean weight: 63.2 kg) and 10 elderly (mean age: 68.2 years, mean height: 166 cm, mean weight: 66.9 kg) with a history of falls (a history of falls was defined as an occurrence of more than one fall) were taken from Victoria University's (VU) Biomechanics Unit database. All subjects undertook informed-consent procedures as approved by the Victoria University Human Research Ethics Committee. The detailed procedure for gait data collection has been described elsewhere [13]. Briefly, foot clearance (FC) data were collected during steady state self-selected walking on a treadmill using a 2-D Motion Analysis system (Vicon Motus, Oxford, U.K.). A 50-Hz Panasonic F15 video camera, with a shutter speed of [...] s, was positioned 9 m from the treadmill, perpendicular to the plane of foot motion, to record unobstructed treadmill walking. Two reflective markers were attached to each subject's left shoe laterally at the fifth metatarsal head and the great toe. Each subject completed about 10–20 min of normal walking at a self-selected comfortable walking speed. The total number of gait cycles analysed per subject (i.e., the number of MFC data points) varied across the subjects due to their individual walking speeds. However, for feature extraction purposes, the first 512 continuous gait cycles (and hence MFC data points) were used. The foot markers were automatically digitized for the entire walking task and the raw data were digitally filtered using a Butterworth filter with cutoff frequencies
ranging from 4 to 8 Hz [13]. The marker positions and shoe dimensions were used to predict the position of the shoe/foot end-point, i.e., the position on the shoe travelling closest to the ground at the time when MFC occurs, using a 2-D geometric model of the foot [13]. The MFC of each stride was calculated by subtracting the ground reference from the minimum vertical coordinate in the swing phase [13], [16].

B. Gait Feature Extraction Using MFC Histograms

Each subject's MFC data were plotted as histograms showing individual MFC data and their respective frequencies. Two examples of MFC histograms are shown in Fig. 2. Features describing major statistical characteristics of these distributions were extracted as:
1) Q1 (25th percentile);
2) Q2 (median or 50th percentile);
3) Q3 (75th percentile);
4) MEAN (mean MFC);
5) STD (standard deviation of MFC);
6) MIN (minimum MFC);
7) MAX (maximum MFC);
8) SKEW (skewness);
9) KURT (kurtosis);
10) DEVSQ (sum of squared differences from mean MFC);
11) HARMEAN (harmonic mean);
12) GEOMEAN (geometric mean).

Mean (± standard deviation) values of all features extracted from histograms of healthy and balance impaired elderly are shown in Table I. In order to provide the relative importance of features, receiver-operating curve (ROC) analysis was used [28], [29], with the area under the curve for each feature represented by the letter R. An R value of 0.5 means that the distributions of the variable are similar in both populations. Conversely, an R value of 1.0 would mean that the distributions of the variable in the two populations do not overlap at all.

Fig. 2. MFC histogram plots of one healthy elderly (A) and one balance impaired elderly (B).

TABLE I. STATISTICAL FEATURES (MEAN ± SD) FROM MFC HISTOGRAM PLOT FOR HEALTHY (H) AND BALANCE IMPAIRED (I) ELDERLY GROUP. R = AREA OF ROC CURVE

C. Feature Extraction Using Wavelet-Based Multiscale Exponents of MFC

A scale invariant concept [11] was proposed to describe the power spectrum density, $S(f)$, of fractal signals by the following empirical equation [10], [11]:

$$S(f) \propto \frac{\sigma^{2}}{f^{\beta}} \tag{1}$$

where $f$ is the frequency, $\sigma^{2}$ is the variance of the original signal, and the exponent $\beta$ is the spectral component (the slope of the spectral density over several decades of frequency). In particular, $\beta$ is 0 for white noise and 2 for Brownian motion. Recently, the discrete wavelet transform method based on orthonormal wavelet decomposition was proposed to estimate the exponent $\beta$ [10]. Here, this approach is explained as applied to MFC data. The decomposition of the MFC series by the wavelet transform gives a serial list of detail coefficients, named $d_j$ here, which represent the evolution of the correlation between the series and the chosen wavelet function at different ranges of frequencies. A Daubechies wavelet of order 6 [9] was used as the wavelet function in this study. For each record of MFC signals, the $d_j$ coefficients were calculated on sets of 512 MFC data points, giving eight separate levels of analysis named 2, 4, 8, 16, 32, 64, 128, and 256. As an example, Fig. 3 displays the decomposition of MFCs of a healthy and a balance impaired subject. Then, the variability power, level by level, was calculated as the variance of the coefficients. The frequency band $f$ of $d_j$ is represented by level $2^{j}$ at scale $j$. Thus, $S(f)$ in (1) can be replaced by the variance of $d_j$. The variance of $d_j$ at each scale can be given for the orthonormal wavelet decomposition as follows:

$$\operatorname{var}(d_j) \propto 2^{j\beta} \tag{2}$$

where the exponent $\beta$ is calculated from the log-scale plot of the variance versus the resolution (i.e., level $j$). The variance of the wavelet coefficients at scale $j$ is given as

$$\operatorname{var}(d_j) = \frac{1}{N_j}\sum_{k=1}^{N_j}\left(d_j(k) - \bar{d}_j\right)^{2} \tag{3}$$

where $\bar{d}_j$ is the mean of $d_j$ at scale $j$ and $N_j$ represents the number of samples of $d_j$ at scale $j$. Multiscale exponents between successive wavelet coefficient levels were calculated over 512 samples of MFC data.

(4)

Mean (± standard deviation) values of all multiscale exponents calculated from MFC data of healthy and balance impaired elderly are shown in Table II.

D. SVM Gait Diagnostic Model

In this study, SVM models [12] were considered for constructing the relationship between features extracted from MFC signals and the presence or absence of balance impairments. The SVM developed by Vapnik et al. [12] has been shown to be a powerful supervised learning tool. The standard soft-margin SVM is a binary classifier applied to classify a data set defined as

$$\Theta = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\}, \quad \mathbf{x}_i \in \mathbb{R}^{m}, \ y_i \in \{-1, +1\} \tag{5}$$

where $\mathbf{x}_i$ are vectors containing the measurements of gait data and $y_i$ are the corresponding class labels. The SVM formulation is essentially a regularized minimization problem leading to the use of Lagrange theory and quadratic programming techniques. The formulation defines a boundary separating two classes in the form of a linear hyperplane in data space, where the distance between the boundaries of the two classes and the hyperplane is known as the margin of the hyperplane. This idea is further extended to data that are not linearly separable, where the data are first mapped via a nonlinear function $\phi$ to a higher dimension
feature space. Maximizing the margin of the hyperplane in either space is equivalent to maximizing the distance between the class boundaries. In Fig. 4, an optimal separating hyperplane is shown as the one that generates the maximum margin (dashed line) between the two data sets. The separating hyperplane has the following form:

$$f(\mathbf{x}) = \mathbf{w}^{T}\phi(\mathbf{x}) + b \tag{6}$$

where $\mathbf{w}$ are the weights of the hyperplane and the scalar $b$ is the hyperplane bias. The nonlinear mapping $\phi$ defines the mapping from data space to feature space. Hence, the weights in feature space will have a one to one correspondence with the elements of $\phi(\mathbf{x})$. Now for separation in feature space, we would like to obtain the hyperplane with the following properties:

$$f(\mathbf{x}_i) > 0 \ \ \forall i: y_i = +1, \qquad f(\mathbf{x}_i) < 0 \ \ \forall i: y_i = -1 \tag{7}$$

The conditions above can be described by a strict linear discriminant function, so that for each element pair $(\mathbf{x}_i, y_i)$ in the data set, we require that

$$y_i\left(\mathbf{w}^{T}\phi(\mathbf{x}_i) + b\right) \ge 1 \tag{8}$$

The distance from the hyperplane to a support vector is $1/\|\mathbf{w}\|$ and the distance between the support vectors of one class and the other class is simply $2/\|\mathbf{w}\|$ by geometry. The soft-margin minimization problem relaxes the strict discriminant in (8) by introducing slack variables $\xi_i$, and is formulated as

$$\min_{\mathbf{w},\, b,\, \xi} \ \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w}^{T}\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \ \ \xi_i \ge 0 \tag{9}$$

We now apply Lagrange theory (details can be found in Vapnik [12]) to solve (9), giving the usually solved dual Lagrangian form of

$$\max_{\alpha} \ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \ \ \sum_{i=1}^{n}\alpha_i y_i = 0 \tag{10}$$

where $C$ is a constant parameter, called the regularization parameter, that determines the trade-off between the maximum margin and the minimum classification error. The explicit definition of the nonlinear mapping $\phi$ has been circumvented by the use of the kernel function, defined as

$$K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^{T}\phi(\mathbf{x}_j) \tag{11}$$

The separating hyperplane surface in (6) can now be written in terms of the Lagrange multipliers $\alpha_i$ as

$$f(\mathbf{x}) = \sum_{i=1}^{n}\alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \tag{12}$$

The process of obtaining the quadratic program solution is known as training the SVM, and using the trained SVM to classify new examples is known as testing. In this application, two types of gait features were separately presented as input to train the SVM model, namely the wavelet exponents of the MFC signal and the MFC histogram based statistics, labeled with the diagnosed gait types [healthy (H), impaired (I)]. These features were used to represent the differences between normal and pathological gait patterns (tripping). All SVM architectures were trained and tested on the D2CSVM software, which is an iterative decomposition algorithm for solving the SVM quadratic program. Further algorithm details can be found in [18] and [19].

Fig. 3. Wavelet decomposition of MFC event series into eight levels for (A) a healthy elderly subject and (B) a balance impaired elderly subject during 512 samples. The range of the y axis is −3 to 3 for all decomposition plots.

TABLE II. MULTISCALE EXPONENTS (4) (MEAN ± SD) BETWEEN SUCCESSIVE WAVELET COEFFICIENT LEVELS FOR HEALTHY (H) AND BALANCE IMPAIRED (I) ELDERLY GROUP. R = AREA OF ROC

Fig. 4. An example of a two-class (+1 & −1) problem with the optimal separating hyperplane and the maximum margin. The circles and squares represent samples of class +1 and −1, respectively.

E. Performance Testing of SVM Model

The optimal SVM parameter set ($C$ and kernel parameters) was determined by using a leave-one-out procedure [40], which is the recommended cross validation test if the dataset is not too large. In this scheme, the dataset was divided into 23 subsets, each consisting of 22 training examples and a single test example. For each subset, the 22 training examples were used to train the SVM model while the remaining example was used for testing, so that in the end each example had been tested once. The accuracy results of all tests were then combined to obtain an average accuracy, also known as the leave-one-out accuracy, which was used to measure the generalization performance of the SVM. The optimal SVM parameters correspond to the SVM model which gave the highest leave-one-out accuracy. Experiments were conducted over various $C$ values and three kernel types, i.e., linear, polynomial, and Gaussian radial basis function (RBF). These kernels have the following forms, respectively.
1) Linear: $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^{T}\mathbf{x}_j$.
2) Polynomial: $K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i^{T}\mathbf{x}_j + 1)^{d}$, where $d$ is the degree of the polynomial.
3) Gaussian radial basis function (RBF): $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\|\mathbf{x}_i - \mathbf{x}_j\|^{2} / (2\sigma^{2})\right)$, where $\sigma$ is the width of the RBF function.

Classification outcomes were represented using accuracy rates and also ROC curves. ROC plots have been used in many investigations [14] to gauge the predictive ability of a classifier over a wide range of threshold values. The predicted output of the SVM in response to an unknown gait pattern is used to generate the ROC curves. A threshold value was applied such that an output below the threshold was assigned to the healthy category, whereas a value equal to or above the threshold was assigned to the balance-impaired category. Threshold values were calculated by dividing the range of output values (min to max) into 23 equally spaced thresholds, where 23 is the total number of training examples.

Sensitivity is defined as a measure of the ability of the classifier to identify a balance-impaired gait, whereas specificity is a measure of the ability of the classifier to detect healthy gait characteristics. The ROC curve plots sensitivity against (1 − specificity) as the threshold decision level of the classifier is varied. ROC curves were plotted to examine qualitatively the effect of threshold variation on the classification performance. Furthermore, the ROC areas were approximated numerically using the trapezoidal rule, where the larger the ROC area, the better the classification accuracy.
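The evaluation procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the D2CSVM implementation: the kernel forms are the standard textbook ones and are assumed here, and the function names (`leave_one_out_splits`, `roc_points`, `roc_area`) are hypothetical. The ROC part follows the text: equally spaced thresholds between the minimum and maximum output, "impaired" when the output is at or above the threshold, and a trapezoidal area estimate.

```python
import math


def linear_kernel(x, z):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, d=3):
    """Assumed standard form (x.z + 1)^d; d is the polynomial degree."""
    return (linear_kernel(x, z) + 1.0) ** d


def rbf_kernel(x, z, sigma=0.1):
    """Assumed standard Gaussian form; sigma is the RBF width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def leave_one_out_splits(n):
    """Yield (train_indices, test_index): n rounds, n - 1 training examples each."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


def roc_points(scores, labels, n_thresholds=23):
    """ROC points (1 - specificity, sensitivity) over equally spaced thresholds.

    labels: +1 = balance impaired, -1 = healthy.
    A score at or above the threshold is classified as impaired.
    """
    lo, hi = min(scores), max(scores)
    step = (hi - lo) / (n_thresholds - 1)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = {(0.0, 0.0), (1.0, 1.0)}  # end points of every ROC curve
    for k in range(n_thresholds):
        t = lo + k * step
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == -1)
        points.add((fp / n_neg, tp / n_pos))
    return sorted(points)


def roc_area(points):
    """Trapezoidal-rule area under a sorted list of ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy SVM outputs (made up): three healthy, three impaired, perfectly separated.
toy_scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
toy_labels = [-1, -1, -1, 1, 1, 1]
toy_area = roc_area(roc_points(toy_scores, toy_labels))
```

In a full pipeline, each round of `leave_one_out_splits(23)` would train one SVM on 22 subjects and score the held-out subject; the 23 held-out scores would then be passed to `roc_points`.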


F. Feature Selection

Generalization performance of a classifier depends primarily, among other factors, on the selection of good features, i.e., features that represent maximal separation between the classes [27]. A hill-climbing feature selection algorithm [16] was used to identify the features that contributed most to separating the two classes. This algorithm iteratively searches for features that improve classification accuracy. Initially, the single best feature was picked according to the highest individual ROC area over all features. The single best feature was found to be STD for statistical features and [...] for wavelet based features. Then the next best feature was iteratively added so as to maximize classification accuracy. This technique was repeated until all the features had been added to the feature set in the order of their importance.

G. Estimation of Posterior Probability of SVM Outputs

Constructing a classifier to produce a posterior probability, P(class | input), can be useful in estimating the relative status of a case (to be in a class) over all cases. In this study, the estimated posterior probabilities of the SVM output, $P(y = +1 \mid f(\mathbf{x}))$, were calculated from values of the SVM output $f(\mathbf{x})$ (13) using the method described by Platt [17] for SVM classifiers. Briefly, in this method the following parametric sigmoid was fitted to the outputs of the SVM classifier:

$$P(y = +1 \mid f) = \frac{1}{1 + \exp(Af + B)} \tag{14}$$

The parameters $A$ and $B$ are determined by minimizing the negative log likelihood of the training data, which has the form of a cross entropy error function [17]:

$$\min_{A,\, B} \ -\sum_{i}\left[t_i \log(p_i) + (1 - t_i)\log(1 - p_i)\right] \tag{15}$$

where $p_i = 1 / (1 + \exp(A f_i + B))$ and $t_i$ are the target probabilities. The target probabilities can be determined using Bayes' rule. Assume that there are $N_{+}$ positive examples; then the maximum a posteriori (MAP) estimate of the target probability of the positive examples can be determined from the following equation:

$$t_{+} = \frac{N_{+} + 1}{N_{+} + 2} \tag{16}$$

For $N_{-}$ negative examples, the estimated target probability would be

$$t_{-} = \frac{1}{N_{-} + 2} \tag{17}$$

The values of $f_i$ were obtained using the leave-one-out procedure instead of the cross validation method described in Platt [17]. The MATLAB pseudocode described in Platt [17] was implemented in this study.

Fig. 5. Dependence of % classification accuracy on the number of features selected by the "hill-climbing" feature selection algorithm using Gaussian RBF ($\sigma = 0.1$), linear, and polynomial ($d = 3$) kernels. (A) Best subset of histogram based statistical features, containing Q1, Q3, STD, MIN, that provided maximum accuracy. (B) Best subset of four wavelet based features.

III. RESULTS

Fig. 2 is an example of the MFC distribution of a healthy elderly (A) and a balance impaired elderly subject (B). These plots reveal some obvious qualitative differences between the two subjects, such as differences in variability and MFC central tendency. Features extracted from these plots were used to train the SVM and later to test its capability to discriminate the two elderly groups. The wavelet decomposition of the MFC event series of a typical healthy elderly (A) and falls risk elderly (B) subject is illustrated in Fig. 3. The method of wavelet decomposition has been described in Section II-C. An enhancement of the coefficients for the levels [...], and 128 in a balance impaired elderly can be visually noted, compared with a healthy elderly subject, but none of the coefficients are significantly different (Student's t-test; p [...]) between the two groups. However, all estimated multiscale exponents except [...] are significantly different between the two groups.
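To make the wavelet-based features of Section II-C concrete, the following sketch decomposes a 512-point series into eight levels, computes the variance of the detail coefficients per level, and takes differences of log2 variance between successive levels. Two simplifications are assumed and should be read as such: a Haar wavelet is used instead of the Daubechies wavelet of order 6 used in the study (Haar needs no boundary handling), and the "difference of log2 variances" form of the multiscale exponent is an assumption, since the exact expression is not recoverable from this text. All function names are hypothetical.

```python
import math
import random


def haar_details(signal, levels=8):
    """Detail coefficients of an orthonormal Haar DWT, one list per level.

    NOTE: Haar is a stand-in; the study used a Daubechies wavelet of order 6.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2.0) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return details


def variance(values):
    """Population variance of the coefficients at one level."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def multiscale_exponents(signal, levels=8):
    """Assumed form: log2 variance differences between successive levels."""
    log_var = [math.log2(variance(d)) for d in haar_details(signal, levels)]
    return [b - a for a, b in zip(log_var, log_var[1:])]


# Synthetic stand-ins for an MFC series (not real gait data).
random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(512)]
random_walk = []
total = 0.0
for step in white_noise:
    total += step
    random_walk.append(total)

noise_exponents = multiscale_exponents(white_noise)
walk_exponents = multiscale_exponents(random_walk)
```

With 512 samples the eight levels hold 256, 128, ..., 2 coefficients, so the exponents at the coarsest levels rest on very few values and are noisy. White noise should give exponents near 0 at the fine levels and a random walk clearly positive ones, in line with the 0 versus 2 contrast stated for the spectral exponent.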


TABLE III. CLASSIFICATION PERFORMANCE OF SVM CLASSIFIER WITH DIFFERENT KERNELS (LINEAR, GAUSSIAN RBF, AND POLYNOMIAL) FOR DIFFERENT REGULARIZATION PARAMETER (C) AND NUMBER OF FEATURES. ACC = ACCURACY, SENS = SENSITIVITY, SPEC = SPECIFICITY, D = DEGREE OF POLYNOMIAL, σ = WIDTH OF RBF NETWORK

ROC curves were built separately for each feature (Tables I and II). The best statistical feature was found to be STD (standard deviation of MFC data) and the best wavelet based feature was [...]. Fig. 5 presents classification accuracy plotted as a function of the number of features selected by the hill-climbing feature selection algorithm. The polynomial kernel displayed overall better performance (maximum 86.95% for statistical features and 100% for wavelet features) relative to the linear and Gaussian RBF kernels. The important statistical features selected by the algorithm to achieve maximum accuracy (see Fig. 5(A) and Table II) were: polynomial kernel: Q1, Q3, STD, MIN; linear kernel: STD, MAX, and MIN; Gaussian RBF: STD, MAX, MIN, GEOMEAN. On the other hand, the important wavelet based features selected by the algorithm to achieve maximum accuracy (see Fig. 5(B) and Table III) were: polynomial kernel: [...]; linear kernel: [...]; Gaussian RBF: [...]. Overall, this emphasizes that all classifiers were able to discriminate well when trained with a subset comprising a few good features.

Table III displays the classification performance (overall accuracy, sensitivity, and specificity) of the SVM classifier using different kernels as a function of the number of gait features and for three C values (0.1, 1, 10). Accuracy was at best 86.95%, with sensitivity 100% and specificity 70% (with a polynomial kernel of degree 3 for C = [...], and degree 4 for C = [...]), when the four best statistical features (Q1, Q3, STD, MIN) were used as the SVM inputs. However, the accuracy rate reached 100% (with a polynomial kernel of degree 2 for C = [...], and degree 3 for C = [...]) when the four best wavelet based features were used to train the SVM. When all statistical or all wavelet features were used separately, classification performance deteriorated. It can be inferred that the polynomial kernel provides better discrimination of healthy/balance impaired gait patterns. Sensitivity and specificity results showed higher mean sensitivity (mean: 91.58%; range: 76.92%–100%) compared to specificity (51.43%; 0%–70%) across all kernels when statistical features were used. On the other hand, wavelet-based features demonstrated the opposite trend, with lower mean sensitivity (70%; 0%–100%) and higher mean specificity (90.10%; 69.3%–100%).

For comparison purposes, the ROC areas for different kernels using statistical and wavelet features were examined. The results are presented in Fig. 6 and Table IV. The trend observed in the figure shows that the polynomial kernel gives a better separation using any combination of features.

Besides the detection of balance impairments, it is interesting to look at the influence of observations from the available sample on the probability of being balance impaired or at risk of tripping falls. In order to estimate the relative falls risk for each individual, the posterior probabilities of the SVM classifier outputs were calculated. The posterior probabilities (shown in Fig. 7), which represent the probability of being a balance impaired elderly, show clear separation between the two elderly groups, with clustering of points in two distinct "clouds."

Fig. 6. ROC curves showing sensitivity (true positive) and 1 − specificity (false positive) for various thresholds using linear, polynomial (d = 3), and Gaussian RBF (σ = 0.1) kernels; (A) using the selected four statistical features, (B) using the four wavelet based features selected by the feature selection algorithm. See text and Fig. 5 for details on feature selection and the area under the ROC curves.

TABLE IV. ROC AREA OF LINEAR, POLYNOMIAL (d = 3) AND GAUSSIAN RBF (σ = 0.1) KERNELS FOR C = 10 AND TOP FOUR STATISTICAL AND WAVELET FEATURES SELECTED BY THE ALGORITHM

Fig. 7. Posterior probability estimate P(y = +1 | f(x)) for all subjects, with balance impaired elderly (1–10) and healthy elderly (11–23), calculated from SVM output values for each class (14).

IV. DISCUSSION

Wavelet analysis, as a sophisticated signal processing and feature enhancement technique, has been successfully applied in clinical diagnosis [37]. Specifically, it has become a powerful alternative for the analysis of nonstationary signals whose spectral characteristics change significantly over time. This is very important in biological signal analysis since most of the statistical characteristics of these signals are nonstationary. In practice, the wavelet transform method has been reported to be appropriate for the analysis of biological signals consisting of different frequency (high, low, very low) components [33]. Sekine et al. [35] studied the complexity of the body acceleration signal during walking, quantified by the fractal dimension [34], using wavelet-based fractal analysis. In addition, it was applied to identify structural differences in medio-lateral and anterior–posterior sway between centre of pressure traces of healthy subjects and Parkinson's patients [36]. West et al. [41] constructed a fractional Langevin equation to model the underlying motor control system during walking, providing additional evidence for the weakly multifractal nature of gait.

In this study, MFC data from steady-state gait were used because they provide a more sensitive measure of the motor function of the locomotor system than gross overall kinematic descriptions of gait such as joint angular changes. The MFC signal recorded during continuous walking exhibits both a complex pattern and nonstationarity [13]. The multiscale exponent of the MFC signals, which represents correlations of the variances of the wavelet coefficients at successive scales, was calculated. These exponents provide valuable information about the variance progression over the wavelet scales. This multiscale exponent may be used for quantifying a dynamical property of human gait control under various conditions, and it has been applied to MFC signals for the first time.

This study was designed to test the ability of an SVM model to screen for the risk of falls in the elderly using wavelet based multiscale features compared to MFC histogram based statistical features. The results demonstrated that, using a subset of four selected wavelet based features and a leave-one-out method, the polynomial kernel gives 100% accuracy with the value of the regularization parameter C = [...]. On the other hand, the maximum accuracy was found to be 86.95% when statistical features were used, which indicates that the SVM model based on multiscale exponents (by wavelet analysis) of MFC gait data performs better than the model based on MFC statistical features. It also suggests that the SVM polynomial kernel gives superior performance when applied to healthy/balance-impaired gait patterns.

Previous research on automated gait classification has used neural networks and fuzzy clustering techniques for applications in the diagnosis of pathological gait [20], [39]. In a recent study by Hahn et al. [38], an artificial neural network model designed to detect balance impairments using EMG of the lower extremities, temporal-distance measures of gait, and medio-lateral motion of the whole body center of mass achieved 89% classification accuracy. The present work reports a higher gait classification performance with wavelet based features, which suggests that the SVM appears to be a better alternative for automated pathological gait diagnosis and also for applications such as monitoring the progress of treatment or intervention outcomes of gait in clinical and rehabilitation situations. However, the classification performance of SVM models based on statistical features (86.95%) was lower than that reported by Hahn et al. [38].

In our previous studies [16], [27] of young/old gait classification, the problem of feature selection for young/old gait pattern classification using the SVM was addressed, and it was demonstrated that only a handful of properly selected features (3–5) were necessary for effective classification.
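The hill-climbing selection referred to here (Section II-F) is a greedy forward search, which can be sketched as follows. This is an illustrative sketch only: `hill_climb`, `best_subset`, and the toy scoring function are hypothetical names, and the toy accuracy values are made up. In the study the score would be the leave-one-out accuracy of an SVM trained on the candidate subset, and the starting feature is the one with the largest individual ROC area (passed here as `seed`).

```python
def hill_climb(features, accuracy, seed=None):
    """Greedy forward feature selection.

    features: list of feature names.
    accuracy: callable mapping a list of names to a score (higher is better).
    seed:     optional feature to start from (e.g., best individual ROC area).
    Returns a list of (subset, score) pairs, one per added feature.
    """
    selected, remaining, history = [], list(features), []
    if seed is not None:
        remaining.remove(seed)
        selected.append(seed)
        history.append((list(selected), accuracy(selected)))
    while remaining:
        # Add the feature that gives the best score together with those kept so far.
        best = max(remaining, key=lambda f: accuracy(selected + [f]))
        remaining.remove(best)
        selected.append(best)
        history.append((list(selected), accuracy(selected)))
    return history


def best_subset(history):
    """Subset with the highest score (first one reached if there are ties)."""
    return max(history, key=lambda item: item[1])


# Toy score in percent (made-up numbers): three informative features, two weak ones.
TOY_GAIN = {"STD": 30, "MIN": 20, "Q1": 10}


def toy_accuracy(subset):
    return 40 + sum(TOY_GAIN.get(name, -5) for name in subset)


toy_history = hill_climb(["MEAN", "STD", "MIN", "Q1", "KURT"], toy_accuracy, seed="STD")
toy_best = best_subset(toy_history)
```

In the toy run, the score rises while informative features are added and drops once the weak ones enter, which is the pattern of accuracy against number of features that the text describes.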
Adding new information, in the form of weak features, was found to actually degrade performance [16], [27]. Hence, our primary motivation for using the hill-climbing feature selection algorithm was to minimize the effects of feature noise and redundancy. Feature selection was shown in this study to be an integral part of designing an accurate classifier. For example, when all wavelet features were used as inputs, the highest accuracy obtained from the SVM model was only 86.95% (Table III, Fig. 5).

Classification performance of the SVM also depends on the selection of the regularization parameter C, as demonstrated in Table III. As mentioned in (9) and (10), C is the penalty parameter for misclassification and has to be carefully selected to achieve maximum classification accuracy. The regularization parameter C provides a balance between classification violation and margin maximization. A high C can minimize training error but will also compromise margin separation. Table III also shows that the optimal value of C can differ with the number of features and has to be selected by trial and error.

In addition to classifying all individuals into the categories "healthy" or "balance impaired," the posterior probability of being balance impaired was calculated, from which the level of relative risk of falls was estimated for each individual. The aim was to calibrate the SVM classifier output as a numeric value that represents the level of risk, for elderly individuals, of falls due to tripping over obstacles while walking. Such a measure could then also be applied to evaluate the effectiveness of an existing exercise program undertaken by balance-impaired individuals. As the standard SVM does not provide such probabilities, the output of the SVM classifier needs to be calibrated into a posterior probability so that it can be used as an indicator for risk estimation. Wahba [31] proposed a logistic link function and a negative log multinomial likelihood function to obtain a classifier that gave probabilistic outputs. Vapnik [12] proposed mapping the outputs of the SVM by fitting the probability using a sum of cosine terms, while Hastie and Tibshirani [32] utilized Gaussian functions. For this application, an SVM was first trained, and then the parameters of an additional sigmoid function were trained to map the SVM outputs into probabilities using the method described by Platt [17]. This idea is also very similar to the sigmoid function used for the training of ANNs in another study [30].

The posterior probabilities (shown in Fig. 7) indicate an elderly individual's falls risk and can provide an estimate of its severity (0 indicates very healthy gait, whereas a value close to 1 indicates highly impaired balance). However, the relative severity of falls risk for the individual subjects was not quantified beforehand, which prevents us from validating the risk estimation in the present analysis. In future work, we plan to follow the elderly individuals who participated in this study with regard to their tripping-fall frequencies and gait MFC measures so that a direct validation of our results can be made. Nevertheless, it is interesting to examine the estimated risk level of a healthy elderly subject (13 in Fig. 7) who has not fallen previously but is beginning to show signs of imbalance.
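The sigmoid calibration step can be sketched as below. This is a rough illustration under stated assumptions rather than the procedure used in the study: it fits Platt's sigmoid P(y = 1 | f) = 1 / (1 + exp(A f + B)) with plain gradient descent instead of the optimizer described in [17], the decision values are made up, and the function names `platt_probability` and `fit_platt` are hypothetical. Label 1 stands for "balance impaired" and label 0 for "healthy."

```python
import math

def platt_probability(f, A, B):
    """Platt's sigmoid: P(y = 1 | f) = 1 / (1 + exp(A * f + B))."""
    return 1.0 / (1.0 + math.exp(A * f + B))

def fit_platt(decision_values, labels, lr=0.1, steps=5000):
    """Fit A and B by gradient descent on the cross-entropy loss
    (a simple stand-in for the optimizer in Platt's paper)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Platt's smoothed targets keep the fit finite on separable data.
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    A, B = 0.0, 0.0
    n = len(labels)
    for _ in range(steps):
        grad_A = grad_B = 0.0
        for f, t in zip(decision_values, targets):
            p = platt_probability(f, A, B)
            # Gradient of the cross-entropy with respect to z = A * f + B is (t - p).
            grad_A += (t - p) * f
            grad_B += (t - p)
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B

# Made-up SVM decision values (assumption): negative = healthy side of the
# margin, positive = balance-impaired side.
scores = [-2.5, -1.8, -1.1, -0.4, 0.4, 1.1, 1.8, 2.5]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

A, B = fit_platt(scores, labels)
for f in (-3.0, 0.0, 3.0):
    print(f, round(platt_probability(f, A, B), 3))
```

A fitted A below zero means that larger decision values map to probabilities closer to 1, so the calibrated output can be read directly as a relative risk score in the sense described above.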
Besides falls risk estimation, another application of such probability measures from a classifier could be to investigate the effectiveness of any exercise program for balance-impaired subjects by monitoring the change in the probability data.

For the diagnosis of pathological gait, such as balance impairment, the processing and extraction of gait features that correlate well with the particular pathology is an important step in designing a diagnostic model. Combining MFC data with other types of gait features (e.g., stride-to-stride time and distance, foot–ground reaction forces, joint/muscle moments, electrical activity of lower limb muscular contractions) might help in understanding the control strategies related to balance impairments and should be pursued in future studies.

The area under the ROC curve provides a measure of the overall performance of the classifier, i.e., the larger the ROC area, the better the classification accuracy over a range of thresholds. The generalization performance of the SVM was evaluated using ROC plots (Fig. 6) for the three types of kernels at different thresholds. As the ROC area for the polynomial kernel was 1.0 (Table IV), it correctly recognized all healthy and balance-impaired subjects (in the leave-one-out test) using wavelet-based gait features, which indicates that our approach may have clinical utility. The output of our model is a diagnostic conclusion as to the existence or nonexistence of balance impairments, followed by an estimate of the risk of falls, which could be applied to assess the effect of a falls prevention intervention by monitoring the change in the probabilistic output.
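As a small worked example of the ROC area discussed above, the sketch below computes it through the equivalent rank interpretation: the fraction of (impaired, healthy) pairs in which the impaired subject receives the higher score, with ties counted as one half. The function name `roc_auc` and the scores are illustrative assumptions, not data from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (e.g., balance impaired), 0 otherwise.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect ranking: every impaired subject scores above every healthy one.
print(roc_auc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))

# One misordered pair out of four.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

An area of 1.0 therefore means that some threshold separates the two groups without error, which is consistent with all subjects being recognized correctly.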

In conclusion, this study demonstrated the effectiveness of wavelet-based feature extraction from MFC signals for SVM models that recognize whether an elderly subject has balance control within healthy ranges or is at risk of falls and should be categorized as balance impaired. The findings of this study were based on a relatively small number of impaired subjects and healthy peers. Further validation of the classification and relative risk estimation tasks is suggested in a larger, more diverse sample of healthy and balance-impaired (falls-risk) elderly adults, which may subsequently lead to the design of a more robust automated diagnostic model for falls risk estimation. The significance of this study is that it provides a technique for early estimation of the relative risk of falls in the elderly and holds great potential for quantifying the effect of balance-improving interventions aimed at reducing the risk of falls in older adults during locomotion.

ACKNOWLEDGMENT

MFC gait data for this study were taken from the VU Biomechanics database. Several people have contributed to the creation of the gait database, especially Simon Taylor of the VU Biomechanics Unit.

REFERENCES

[1] K. M. Ostrosky, J. M. VanSwearingen, R. G. Burdett, and Z. Gee, “A comparison of gait characteristics in young and old subjects,” Physical Therapy, vol. 74, pp. 637–646, 1994.
[2] M. G. Karst, A. P. Hageman, F. T. Jones, and S. H. Bunner, “Reliability of foot trajectory measures within and between testing sessions,” J. Gerontol. Med. Sci., vol. 54, pp. 343–347, 1999.
[3] B. Fildes, Injuries Among Older People: Falls at Home and Pedestrian Accidents. Melbourne, FL: Dove, 1994.
[4] T. M. Owings, M. J. Pavol, K. T. Foley, and M. D. Grabiner, “Exercise: Is it a solution to falls by older adults?,” J. Appl. Biomech., vol. 15, pp. 56–63, 1999.
[5] A. K. Topper, B. E. Maki, and P. J. Holliday, “Are activity-based assessments of balance and gait in the elderly predictive of risk of falling and/or type of fall?,” J. Am. Geriatrics Soc., vol. 41, pp. 479–487, 1993.
[6] B. E. Maki, P. J. Holliday, and A. K. Topper, “A prospective study of postural balance and risk of falling in an ambulatory and independent elderly population,” J. Gerontol., vol. 49, pp. M72–M84, 1994.
[7] M. Akay, “Introduction: Wavelet transforms in biomedical engineering,” Ann. Biomed. Eng., vol. 23, pp. 529–530, 1995.
[8] J. M. Hausdorff, P. L. Purdon, C.-K. Peng, Z. Ladin, J. Y. Wei, and A. L. Goldberger, “Fractal dynamics of human gait: Stability of long-range correlations in stride interval fluctuations,” J. Appl. Physiol., vol. 80, pp. 1448–1457, 1996.
[9] I. Daubechies, “Orthonormal bases of compactly supported wavelets,” Commun. Pure Appl. Math., pp. 909–996, 1988.
[10] W. G. Wornell and A. V. Oppenheim, “Estimation of fractal signals from noisy measurement using wavelets,” IEEE Trans. Signal Process., vol. 40, no. 3, pp. 611–623, Mar. 1992.
[11] P. Flandrin, “On the spectrum of fractional Brownian motions,” IEEE Trans. Inform. Theory, vol. 35, no. 1, pp. 197–199, Jan. 1989.
[12] V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed. New York: Springer, 2000.
[13] R. K. Begg, R. J. Best, S. Taylor, and L. Dell’Oro, “Minimum foot clearance during walking: Strategies for the minimization of trip-related falls,” Gait Posture, vol. 25, no. 2, 2007.
[14] K. Chan, T. W. Lee, P. A. Sample, M. H. Goldbaum, R. N. Weinreb, and T. J. Sejnowski, “Comparison of machine learning and traditional classifiers in glaucoma diagnosis,” IEEE Trans. Biomed. Eng., vol. 49, no. 9, pp. 963–974, Sep. 2002.
[15] R. K. Begg, D. Lai, S. Taylor, and M. Palaniswami, “SVM-based models in the assessment of balance impairments,” in 3rd IEEE Int. Conf. Intell. Sensing Inf. Process., Bangalore, India, Dec. 14–17, 2005, pp. 248–253.

[16] R. K. Begg, M. Palaniswami, and B. Owen, “Support vector machines for automated gait classification,” IEEE Trans. Biomed. Eng., vol. 52, no. 5, pp. 828–838, May 2005.
[17] J. Platt, “Probabilistic outputs for support vector machines and comparison to regularized likelihood methods,” in Advances in Large Margin Classifiers, A. Smola, P. Bartlett, B. Scholkopf, and D. Schuurmans, Eds. Cambridge, MA: MIT Press, 2000.
[18] D. Lai, M. Palaniswami, and N. Mani, “A new method to select working sets for decomposition methods solving support vector machines,” Tech. Rep. MECE-30-2003, 2003.
[19] D. Lai, M. Palaniswami, and N. Mani, “A basic heuristic decomposition framework for training support vector machines,” Tech. Rep. MECSE-26-2005, 2005.
[20] S. H. Holzreiter and M. E. Kohle, “Assessment of gait pattern using neural networks,” J. Biomech., vol. 26, pp. 645–651, 1993.
[21] M. J. O’Malley, M. F. Abel, D. L. Damiano, and C. L. Vaughan, “Fuzzy clustering of children with cerebral palsy based on temporal-distance gait parameters,” IEEE Trans. Rehabil. Eng., vol. 5, no. 4, pp. 300–309, Dec. 1997.
[22] N. Zavaljevski, F. J. Stevens, and J. Reifman, “Support vector machines with selective kernel scaling for protein classification and identification of key amino acid positions,” Bioinformatics, vol. 18, pp. 689–696, 2002.
[23] S. Ben-Yacoub, Y. Abdeljaoued, and E. Mayoraz, “Fusion of face and speech data for person identity verification,” IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 1065–1074, Sep. 1999.
[24] O. Chapelle, P. Haffner, and V. N. Vapnik, “Support vector machines for histogram-based classification,” IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 1055–1064, Sep. 1999.
[25] C. H. Q. Ding and I. Dubchak, “Multi-class protein fold recognition using support vector machines and neural networks,” Bioinformatics, vol. 17, pp. 349–358, 2001.
[26] S. Gunn, “Support vector machines for classification and regression,” Univ. Southampton, U.K., ISIS Tech. Rep., 1998.
[27] R. K. Begg and J. Kamruzzaman, “A machine learning approach for automated recognition of movement patterns using basic, kinetic and kinematic gait data,” J. Biomech., vol. 8, pp. 401–408, 2005.
[28] J. A. Hanley and B. J. McNeil, “The meaning and use of the area under receiver operating characteristic (ROC) curve,” Radiology, vol. 143, pp. 29–36, 1982.
[29] J. A. Hanley and B. J. McNeil, “A method of comparing the areas under receiver operating characteristic curves derived from the same cases,” Radiology, vol. 148, pp. 839–843, 1983.
[30] A. K. Jain, J. Mao, and K. M. Mohiuddin, “Artificial neural networks: A tutorial,” IEEE Computer, vol. 29, no. 3, pp. 31–44, Mar. 1996.
[31] G. Wahba, “Multivariate function and operator estimation, based on smoothing splines and reproducing kernels,” in Nonlinear Modeling and Forecasting, SFI Studies in the Sciences of Complexity, M. Casdagli and S. Eubank, Eds. New York: Addison-Wesley, 1992, pp. 95–112.
[32] T. Hastie and R. Tibshirani, “Classification by pairwise coupling,” Ann. Stat., vol. 26, no. 2, pp. 451–471, 1998.
[33] R. Fischer and M. Akay, “Fractal analysis of heart rate variability,” in Time Frequency and Wavelets in Biomedical Signal Processing, M. Akay, Ed. Piscataway, NJ: IEEE, 1998, pp. 719–728.
[34] B. B. Mandelbrot, The Fractal Geometry of Nature. New York: Freeman, 1983.
[35] M. Sekine, T. Tamura, M. Akay, T. Fujimoto, T. Togawa, and Y. Fukui, “Discrimination of walking patterns using wavelet-based fractal analysis,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 10, no. 3, pp. 188–196, Sep. 2002.
[36] C. J. Morales and E. D. Kolaczyk, “Wavelet-based multifractal analysis of human balance,” Ann. Biomed. Eng., vol. 30, no. 4, pp. 588–597, Apr. 2002.
[37] P. C. Ivanov, M. G. Rosenblum, C.-K. Peng, J. Mietus, S. Havlin, H. E. Stanley, and A. L. Goldberger, “Scaling behaviour of heartbeat intervals obtained by wavelet-based time-series analysis,” Nature, vol. 383, pp. 323–332, 1996.
[38] M. E. Hahn and L. S. Chou, “A model for detecting balance impairment and estimating falls risk in the elderly,” Ann. Biomed. Eng., vol. 33, no. 6, pp. 811–820, Jun. 2005.
[39] M. J. O’Malley, M. F. Abel, D. L. Damiano, and C. L. Vaughan, “Fuzzy clustering of children with cerebral palsy based on temporal-distance gait parameters,” IEEE Trans. Rehabil. Eng., vol. 5, no. 4, pp. 300–309, Dec. 1997.
[40] B. D. Ripley, Pattern Recognition and Neural Networks. Cambridge, U.K.: Cambridge Univ. Press, 1990.
[41] B. J. West and M. Latka, “Fractional Langevin model of gait variability,” J. Neuroeng. Rehabil., vol. 2, p. 24, 2005.
[42] B. J. West and N. Scafetta, “Nonlinear dynamical model of human gait,” Phys. Rev. E. Stat Nonlin Soft Matter Phys., vol. 67, p. 051917, 2003, Epub.
[43] L. S. Chou, K. R. Kaufman, M. E. Hahn, and R. H. Brey, “Medio-lateral motion of the center of mass during obstacle crossing distinguishes elderly individuals with imbalance,” Gait Posture, vol. 18, pp. 125–133, 2003.

Ahsan Khandoker received the B.Sc. degree in electrical and electronic engineering from Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh, in 1996, the M.Eng.Sc. degree from Multimedia University (MMU), Cyberjaya, Malaysia, in 1999, and the M.Engg. degree and Doctor of Engineering degree in physiological engineering from Muroran Institute of Technology (MIT), Muroran, Japan, in 2001 and 2004, respectively. Currently, he is an ARC Research Fellow at the University of Melbourne, Melbourne, Australia, where he conducts research on mathematical processing and machine classification of physiological signals. His research interests focus on the diagnosis of sleep disordered breathing, gait analysis and its pattern recognition, biomedical instrumentation, artificial intelligence techniques in physiological modeling, and perinatal cardiac physiology. He has published over 25 articles in journals, conferences, and book chapters. He has maintained industrial links with Compumedics Pty Ltd., Melbourne, Australia. He has chaired a number of conference sessions and was on the Technical Program Committee for several major international conferences. Dr. Khandoker has received several awards, including the Monbusho scholar medal in Japan.

Daniel T. H. Lai received the B.Eng. degree in electrical and computer systems and the Ph.D. degree from Monash University, Melbourne, Australia, in 2002 and 2006, respectively. He is currently a Research Fellow at the University of Melbourne, Melbourne, Australia. His research interests include decomposition techniques for support vector machines and the application of signal processing, computational intelligence techniques, and wireless sensor networks to biomedical engineering.

Rezaul Begg (M’93–SM’06) received the B.Sc. and M.Sc. Eng. degrees in electrical and electronic engineering from Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh, and the Ph.D. degree in biomedical engineering from the University of Aberdeen, Aberdeen, U.K. Currently, he is an Associate Professor of Biomedical Engineering at Victoria University, Melbourne, Australia, where his research covers biomedical engineering, biomechanics, and machine learning. Previously, he worked with BUET and Deakin University. His current research focuses on gait and movement analysis, sensor networks for ageing healthcare, and computational intelligence techniques and their applications in various biomedical domains. He has authored/coauthored over 100 research publications in journals and conferences, and four books. He is a regular reviewer for a number of international journals and was on the Technical Program Committee for several major international conferences. Dr. Begg is a recipient of the BUET Gold Medal and the Chancellor’s award for academic excellence.

M. Palaniswami (SM’95) received the B.E.(Hons.) degree from the University of Madras, Chennai, India, the M.E. degree from the Indian Institute of Science, Bangalore, India, the M.Eng.Sc. degree from the University of Melbourne, Melbourne, Australia, and the Ph.D. degree from the University of Newcastle, Sydney, Australia. He has been serving the University of Melbourne for over 16 years. He has published more than 180 refereed papers, a large proportion of which appeared in prestigious IEEE journals and conferences. His research interests include SVMs, sensors and sensor networks, machine learning, neural networks, pattern recognition, signal processing, and control. He is the convener for the Australian Research Network on Intelligent Sensors, Sensor Networks, and Information Processing (ISSNIP). He is the Co-Director of the Centre of Expertise on Networked Decision & Sensor Systems. He is an Associate Editor for the International Journal of Computational Intelligence and Applications and the International Journal of Information Processing. He is also the Subject Editor for the International Journal on Distributed Sensor Networks. Dr. Palaniswami was given a Foreign Specialist Award by the Ministry of Education, Japan, in recognition of his contributions to the field of machine learning. He served as Associate Editor for journals/transactions including IEEE TRANSACTIONS ON NEURAL NETWORKS and Computational Intelligence for Finance.