IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 61, NO. 4, APRIL 2014
Self-Correcting Pattern Recognition System of Surface EMG Signals for Upper Limb Prosthesis Control
Sebastian Amsüss, Student Member, IEEE, Peter M. Goebel, Student Member, IEEE, Ning Jiang, Member, IEEE, Bernhard Graimann, Liliana Paredes, and Dario Farina∗, Senior Member, IEEE
Abstract—Pattern recognition methods for classifying user motion intent based on surface electromyography developed by research groups in well-controlled laboratory conditions are not yet clinically viable for upper limb prosthesis control, due to their limited robustness in users’ real-life situations. To address this problem, a novel postprocessing algorithm, aiming to detect and remove misclassifications of a pattern recognition system of forearm and hand motions, is proposed. Using the maximum likelihood calculated by a classifier and the mean global muscle activity of the forearm, an artificial neural network was trained to detect potentially erroneous classification decisions. This system was compared to four previously proposed classification postprocessing methods, in both able-bodied and amputee subjects. Various nonstationarities were included in the experimental protocol to account for challenges posed in real-life settings, such as different contraction levels, static and dynamic motion phases, and effects induced by day-to-day transfers, such as electrode shifts, impedance changes, and psychometric user variability. The improvement in classification accuracy with respect to the unprocessed classifier ranged from 4.8% to 31.6%, depending on the scenarios investigated. The system significantly reduced misclassifications to wrong active classes and is thus a promising approach for improving the robustness of hand prosthesis controllability. Index Terms—Artificial neural networks (ANNs), myoelectric control, pattern recognition (PR), robustness, upper limb prostheses.
Manuscript received September 10, 2013; revised November 25, 2013; accepted December 16, 2013. Date of publication December 23, 2013; date of current version March 17, 2014. This work was supported by the European Commission via the Industrial Academia Partnerships and Pathways under Grant 251555 (AMYO) and conducted within the Bernstein Focus Neurotechnology Göttingen. Asterisk indicates corresponding author. S. Amsüss and N. Jiang are with the Department of Neurorehabilitation Engineering, Georg August University, 37073 Göttingen, Germany (e-mail:
[email protected];
[email protected]. de). P. Goebel is with Otto Bock Healthcare Products GmbH, 1060 Vienna, Austria (e-mail:
[email protected]). B. Graimann is with Otto Bock Healthcare GmbH, 37115 Duderstadt, Germany (e-mail:
[email protected]). L. Paredes is with the Laboratorio di Robotica e Cinematica, Fondazione Ospedale San Camillo, 30126 Venice, Italy (e-mail: liliana.paredes@ospedalesancamillo.net). ∗D. Farina is with the Department of Neurorehabilitation Engineering, Georg August University, 37073 Göttingen, Germany (e-mail: dario.farina@bccn.uni-goettingen.de). Digital Object Identifier 10.1109/TBME.2013.2296274
NOMENCLATURE
sEMG Surface electromyogram.
PR Pattern recognition.
MV Majority voting.
ANN Artificial neural network.
LDA Linear discriminant analysis.
RMS Root mean square.
$\overline{\mathrm{RMS}}$ Average RMS of all channels.
TI Trust index.
ANN-GO ANN (proposed approach) with globally optimizing parameters.
ANN-IND ANN (proposed approach) with individually optimized parameters for each subject and day.
LDA-MV LDA with majority voting postprocessing.
LDA-RJNM LDA with rejection option to no movement.
LDA-RJRM LDA with rejection option and remain in the last accepted class.
WP Wrist pronation.
WS Wrist supination.
WE Wrist extension.
WF Wrist flexion.
HO Hand open.
KG Key grip.
FP Fine pinch.
NM No movement.
MLVC Maximum long-term voluntary contraction.
tAcc Total accuracy.
aAcc Active accuracy.
I. INTRODUCTION
While PR of sEMG signals has been regarded for many decades as a promising approach for decoding the motor intent of amputees to control multifunctional and highly dexterous upper limb prostheses [1]–[3], to date no clinically viable system has been implemented and commercialized using PR control [4]. Several algorithms have been investigated by a variety of research groups in order to achieve direct control of multiple DOF prostheses (see [5] and [6] for a comprehensive overview and [7] for the related taxonomy). However, due to the unsatisfactory performance of these existing algorithms in real-life settings, several enhancements have recently been proposed. For example, Nishikawa et al. [8], Sensinger et al. [9], Tommasi et al. [10], and Chen et al. [11] proposed adaptive learning schemes, attempting to include online recorded data into the
classifier’s training dataset in order to account for changes in the testing data with respect to the training data, induced, e.g., by electrode shifts [12], arm position changes [13], fatigue [14], impedance changes, time-related effects [15], movement strategy changes of the user (mutual adaptation) [16], and psychological factors [17]. Although the presented ideas are promising, the unsupervised addition of data to the training set is a major challenge that remains an open problem. Hargrove et al. [18] applied multiple binary classifications as a different approach, which was extended by Scheme et al. [19]. The basic idea of these approaches is to accept higher overall classification errors in exchange for fewer errors that would lead to false activations of the prosthesis. Because they require correcting movements, errors of this type were shown to be more frustrating to the user than unintended short pauses of the artificial limb movement (false classifications to the no motion class) [18].
In addition to tuning the PR algorithms themselves, several preprocessing and postprocessing steps have been proposed as well. For preprocessing, sEMG whitening [20], spatial filtering [21], and principal component analysis [22] have been used in order to enhance PR accuracies. Different postprocessing methods of the classification result have also been proposed. For example, Englehart and Hudgins [23] introduced MV of the class outputs. With this method, some misclassifications could be successfully removed, at the expense of an additional time delay in switching between classes. This method is simple to implement and has been shown to present a viable option to enhance controllability, if a good tradeoff between the delay and the gain in prosthesis controllability is found. Another interesting approach, termed the velocity ramp, has been proposed by Simon et al. [24].
The basic idea of this approach is to accept misclassifications but to diminish their effect on the prosthesis movement. Every class has its own specific speed coefficient, which is increased if the classifier decision is in favor of this class or decreased otherwise. Therefore, short inconsistent classification decisions have a low impact on the prosthesis movements, whereas consistent decisions allow for faster movements. The obvious limitation of this approach is the increased perceived response time for the user. Thus, a tradeoff between controllability gain and responsiveness of the system has to be found, similar to MV.
In this study, a novel postprocessing method for PR algorithms is proposed, aiming to identify and remove misclassifications made by the classifier, thus providing the means for a self-correction mechanism. To achieve this goal, an ANN was trained on information about the global muscle contraction level and prior classifier outputs, so that it could infer the reliability of the classifier’s decision.
The rest of this paper is organized as follows. In Sections II-A and II-B, the novel algorithm is introduced and an overview of other postprocessing methods is given, to which the results achieved with the proposed method were compared. In order to assess the performance of the introduced methods, sEMG data were acquired from able-bodied and amputee subjects, as described in Section II-C. In Sections III and IV, the results of applying the discussed methods to these data are presented and discussed, respectively.
II. MATERIALS AND METHODS
A. Background
In previous experiments, we observed that misclassifications often occurred during class transitions and dynamic movement phases, as investigated in more depth in [25]. LDA is a popular choice for myoelectric PR algorithms because its efficient implementation makes it applicable in real time, and it has been proven to achieve similar or better classification results than more complex classification methods [5], [19]. Therefore, LDA is chosen as the classifier in the current study. It was observed that misclassifications usually had a low corresponding likelihood value obtained by the LDA classifier, which was also exploited in [26]. Essentially, LDA is a parametric classifier that calculates the Mahalanobis distance $d_{\mathrm{mahal}}$ to the centroids $\mu_\omega$ in the feature space of $\Omega$ pretrained classes $\omega \in \{1, \ldots, \Omega\}$, using a pooled covariance matrix $\Sigma$ [27]. Hence, the calculated likelihoods are inversely proportional to $d_{\mathrm{mahal}}$, i.e., the lower the distance to $\mu_\omega$, the higher the likelihood of this point belonging to class $\omega$. The detailed derivation of the LDA as used in this study can be found in [26]. The likelihood of a feature vector $x$ belonging to class $\omega$ can be calculated as follows:
$$f(x, \omega) = w_\omega^{T} x + c_\omega \tag{1}$$
with
$$w_\omega = \Sigma^{-1}\mu_\omega \quad \text{and} \quad c_\omega = \mu_\omega^{T}\Sigma^{-1}\mu_\omega. \tag{2}$$
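As a minimal sketch of how (1) can be evaluated for all classes at once, the NumPy snippet below stacks the class weight vectors into one matrix so that a single matrix product scores a whole batch of feature vectors. The function names `fit_lda` and `lda_scores` are hypothetical, equal class priors are assumed, and the constant term uses the textbook form $-\frac{1}{2}\mu_\omega^{T}\Sigma^{-1}\mu_\omega$, which is an assumption of this sketch and not the constant as printed in (2). It is not the implementation used in the study.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class means and a pooled covariance, then assemble W and c.

    X: (n_samples, n_features) feature vectors, y: (n_samples,) class labels.
    Returns the sorted class labels, W (n_features, n_classes) and c (n_classes,).
    """
    classes = np.unique(y)
    means = np.stack([X[y == k].mean(axis=0) for k in classes])
    # Center every sample on its own class mean to pool the covariance.
    centered = X - means[np.searchsorted(classes, y)]
    pooled = centered.T @ centered / (len(X) - len(classes))
    # Column k of W is Sigma^{-1} mu_k.
    W = np.linalg.solve(pooled, means.T)
    # ASSUMPTION: textbook constant -0.5 * mu_k^T Sigma^{-1} mu_k, equal priors.
    c = -0.5 * np.sum(means.T * W, axis=0)
    return classes, W, c

def lda_scores(X, W, c):
    """Evaluate the linear score of (1) for every sample and every class."""
    return X @ W + c
```

With `scores = lda_scores(X, W, c)`, the predicted class is `classes[np.argmax(scores, axis=1)]`, and `scores.max(axis=1)` is the per-window maximum value that a downstream confidence stage could use.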
The value of (1) has to be calculated for each class, which can be done very efficiently in matrix form by assembling $w_\omega$ and $c_\omega$ in matrices.
B. Proposed Method
In this study, LDA was used as a base classifier, and an attempt was made to enhance its performance by finding an implicit relation between the calculated classification likelihoods, the average muscle activity, and the estimated probability of correctness of a classifier decision using an ANN. The network was trained to produce a certain output for correct and incorrect decisions of the LDA on a training set. To apply the system to a new feature vector, the sample was first classified by the LDA. Its decision was accepted as the classification result if the ANN produced a high confidence value for it; if the ANN produced a low confidence value, the decision was rejected and the previously accepted class was maintained. The rationale for this rule is the observation that the vast majority of intended user motions will be very consistent between two adjacent time windows of 50 ms increment. Only in a small fraction of classification decisions will the user actually want to switch from one class to another. It is therefore reasonable to apply this prior knowledge to the classification stream.
Fig. 1. Schematic structure of the proposed approach. A standard classification procedure of sEMG signals of the forearm is performed and an MLP ANN is applied to provide a correction mechanism. The mean RMS values of all channels and the maximum likelihoods calculated by the LDA classifier were presented to the network. Based on the output of the ANN, a TI was calculated. This index was used to determine whether the classification decision was accepted or rejected and the previous class was maintained instead.
The average RMS value of all channels at time window $t$ was calculated as
$$\overline{\mathrm{RMS}}(t) = \frac{1}{C}\sum_{c=1}^{C} \mathrm{RMS}(t)_c \tag{3}$$
with the RMS of the window $t$ being calculated as
$$\mathrm{RMS}(t)_c = \sqrt{\frac{1}{l}\sum_{l}\zeta_l^{2}} \tag{4}$$
where $C$ is the total number of sEMG channels, $c \in \{1, \ldots, C\}$, $\mathrm{RMS}(t)_c$ is the RMS value of channel $c$, computed from a time interval centered at the time $t$ and with $l$ samples, and $\zeta_l$ the instantaneous sEMG value measured at channel $c$. The sEMG RMS value provides an approximate estimate of the force produced by the muscles contributing to this sEMG signal [28], [29].
A multilayer perceptron ANN was used with feedforward structure and one hidden layer, comprising eight neurons with hyperbolic tangent sigmoid transfer function. By supplying the network with the information of the $\overline{\mathrm{RMS}}(t)$ and the calculated maximum likelihood value of the LDA, the ANN was trained to find incorrect decisions and thus facilitate correction of these mistakes. The network was provided with the current values and also with those of the previous 10 time steps (equal to a history of 500 ms) of the inputs to account for the detection of motion transition phases. This number of time steps was found to yield the best tradeoff between latency and accuracy in pilot experiments. Hence, the temporal relation between sEMG feature vectors was explicitly accounted for, which is in contrast to most PR methods which only regard instantaneous feature vectors without considering the time evolution of the sEMG signal. Thus, the network had 22 input neurons with linear transfer functions. There was only one output neuron producing continuous output. The inputs were not normalized, since normalization did not yield better results. The output neuron had a hyperbolic tangent sigmoid transfer function, limiting the output range within [−1; 1]. The Levenberg–Marquardt back-propagation training method was applied for the training of the network. The structure of the proposed system including the neural network is illustrated in Fig. 1.
Since Levenberg–Marquardt optimization does not guarantee finding a globally optimal solution [30], for each experiment, five ANNs were trained and the one with the best performance on the validation set was further used. It was found that typically four of the five networks yielded very similar results on the validation set and one was considerably worse. The LDA classifier was trained with a certain training dataset (depending on the investigated scenario, described in detail in Section II-E). Then, for the purpose of training the ANN, the same data were classified by the trained classifier. The ANN was trained for each feature vector to output +1 if the classification was correct and –1 if it was incorrect. In this way, the same set of data was used to train both the classifier and the ANN; therefore, no additional training data or data selection was necessary for the proposed approach. It was observed that using separated subsets of the training data for LDA training and ANN training (maintaining the same size of the whole training set) yielded slightly worse results and was therefore not pursued further in this study. The testing dataset for evaluating the performance of the entire system (LDA + ANN correction) was always separated from the training and validation set; thus, the previous choices do not impact the set of data on which the results were reported (testing data).
A TI was calculated as a function of the output of the ANN, $n(t)$:
$$\mathrm{TI}(t) = |\mathrm{TI}(t-1)|^{\alpha}\, n(t) + \beta(t) \tag{5}$$
with
$$\beta(t) = \begin{cases} \beta(t-1) + \frac{1}{200}, & \text{if LDA class output change} \\ 0, & \text{if } \mathrm{TI}(t) - \mathrm{TI}(t-1) > 0.5 \\ \beta(t-1), & \text{otherwise.} \end{cases} \tag{6}$$
In (5), $\mathrm{TI}(0)$ and $\beta(0)$ were initialized to 0.5 and 0, respectively. The factor $\alpha$ acts as a smoothing factor and its optimal value was determined by varying it in nine steps between 0.1 and 0.9. Consistent LDA output between adjacent time windows was regarded as an indication of the trustworthiness of this decision. An integration factor $\beta(t)$ was introduced to reflect this increasing confidence with a rate of 0.1 per second. On the other hand, if the LDA output was inconsistent or a large decrease of $\mathrm{TI}(t)$ (defined as > 0.5) was observed, $\beta(t)$ was modified accordingly to reflect the uncertainty of the LDA decision. The exact rate of increase of $\beta(t)$ was found to be not critical in pilot experiments, due to the application of a threshold $\theta_{\mathrm{TI}}$ to the TI. If $\mathrm{TI}(t)$ was below the threshold $\theta_{\mathrm{TI}}$, the class label calculated by the LDA was discarded and replaced by the previously accepted class decision. The output class label was not altered if $\mathrm{TI}(t)$ was above $\theta_{\mathrm{TI}}$. The raw network output and the processed $\mathrm{TI}(t)$ are illustrated in Fig. 2, together with the effect of $\alpha$ on $\mathrm{TI}(t)$. It can be observed that $n(t)$ and $\mathrm{TI}(t)$ share the same trend and that $\alpha$ smooths $\mathrm{TI}(t)$ effectively. The threshold $\theta_{\mathrm{TI}}$ was varied in the range between 0.01 and 0.99 in 100 even steps. The pseudooptimal values for $\alpha$ and $\theta_{\mathrm{TI}}$ were determined employing grid search at the previously specified step sizes and the best combination was used for two cases.
1) ANN-GO (ANN with globally optimizing $\alpha$ and $\theta_{\mathrm{TI}}$ for $\mathrm{TI}(t)$). The globally optimizing combination of $\alpha$ and $\theta_{\mathrm{TI}}$, which yielded the best average performance across all subjects and days, was used. This parameter setting could be used “off the shelf” for any new application.
2) ANN-IND (ANN with individually optimized parameters for each subject and day). Similar to ANN-GO, with the difference that the optimizing value pairs of $\alpha$ and $\theta_{\mathrm{TI}}$ were determined individually for each experiment. Therefore, this approach required individual optimization but was expected to yield better results than ANN-GO.
We compared the two proposed methods to four other algorithms that have been reported in the literature.
3) LDA: The classification accuracy of the LDA classifier without any postprocessing was used as the baseline for comparison with all other algorithms [5].
4) LDA-MV: An MV of the LDA output was performed. In a preliminary study, the use of nine windows was found to yield the best tradeoff between accuracy gain and the maximum induced reaction delay (250 ms in the case of nine votes [23]).
5) LDA-RJNM: This method was proposed by Scheme et al. [26]. The calculated LDA class likelihoods are linearized and their sum normalized to 1. If the maximum probability of a class was lower than 0.97, the classifier’s decision was rejected and no motion was output instead of an active class. The threshold of 0.97 was reported as the optimal value in [26] and was not further modified in this study.
6) LDA-RJRM: Here, we use a minor alteration to LDA-RJNM by not relabeling an unreliable classification result to the no motion class, but rather to the last accepted classification output. Short glitches in classification output could therefore be removed without causing the prosthesis to perform a discontinued movement. This method thus uses the same strategy for relabeling a detected misclassification as the proposed method.
Fig. 2. Examples of $\mathrm{TI}(t)$ for the same network output $n(t)$ (top graph) with different smoothing factors $\alpha$ (graphs 2–4). As an example, in section (1), the filtering effect of $\alpha$ becomes apparent. In the plateau phases of the network output, such as in section (2), the effect of integrating with a constant factor over time can be observed. It can be seen that choosing large values of $\alpha$ results in better smoothing but also in an increased delay of response. The same threshold value is shown as a dash-dotted line in each $\mathrm{TI}(t)$ graph for reference.
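The trust index recursion and the accept-or-hold rule of the proposed method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the $\alpha$ in (5) is read as an exponent, the sequence $\beta(t)$ is supplied by the caller instead of being derived from (6), a value exactly at the threshold is treated as accepted, and the class held before the first accepted decision is NM, coded here as 0. The function names `trust_index` and `gate_labels` are hypothetical.

```python
import numpy as np

def trust_index(n, beta, alpha, ti0=0.5):
    """Recursion TI(t) = |TI(t-1)|**alpha * n(t) + beta(t), with TI(0) = ti0.

    n:    ANN outputs in [-1, 1], one per time window.
    beta: integration term per time window (ASSUMPTION: given by the caller).
    """
    ti = np.empty(len(n))
    prev = ti0
    for t in range(len(n)):
        prev = abs(prev) ** alpha * n[t] + beta[t]
        ti[t] = prev
    return ti

def gate_labels(lda_labels, ti, theta, initial_class=0):
    """Keep the LDA label when TI reaches theta, else hold the last accepted one."""
    accepted = initial_class  # ASSUMPTION: start in the no-movement class.
    out = []
    for label, trust in zip(lda_labels, ti):
        if trust >= theta:
            accepted = label
        out.append(accepted)
    return out
```

For example, a single low-trust window in the middle of a stable movement leaves the output unchanged, because the previously accepted class is held until the trust index rises above the threshold again.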
C. Subjects
The experimental protocol was approved by the local ethics committees and each participant signed an informed consent. Four male transradial amputee subjects (ages 25, 28, 29, and 64 years) with medium stump lengths participated in the experiment. In addition, seven able-bodied subjects (five male, two female, age 25.4±1.4 years) participated as controls. Eight commercially available double differential electrodes (13E200=50AC, Otto Bock Healthcare Products GmbH, Vienna, Austria) were placed equidistantly around the forearm of the subjects, approximately 6.5–7 cm distal to the olecranon of the elbow. For each amputee subject (two right and two left side), an individual hard shaft prosthesis socket was manufactured by an orthopedic technician, housing the electrodes. This setup was therefore very close to a realistic situation of data acquisition in amputees. No controlled repositioning was feasible for amputee subjects because the electrodes were occluded by the shaft. For able-bodied subjects, their dominant side (right for all) was used and a custom, spring-loaded mounting device held the electrodes in place with slight pressure. The exact locations of the electrodes were marked using a skin-friendly, sweat- and water-resistant pen and renewed when necessary for accurate repositioning of the electrodes in able-bodied subjects.
D. Experimental Setup
During the experiment, the subjects were seated and instructed to hold their elbows flexed at 90°, while resting the dorsal side of the upper arm against the backrest of their seats. The subjects were instructed to perform the following eight movement classes: WP, WS, WE, WF, HO, FP, KG, and NM, while sEMG signals were recorded. To start each movement, the subject was presented with a visual cue on a computer screen, displaying a picture of the prompted motion along with a caption and a corresponding audio instruction. A trapezoidal contraction force reference line (except for NM, which had a flat line) was shown to the subject along with biofeedback in the form of a cursor, which moved with time along the x-axis and whose amplitude was set to the $\overline{\mathrm{RMS}}(t)$ value. The subjects were asked to follow the trapezoidal reference lines to the best of their abilities with the cursor for controlled and repeatable contraction forces. The plateau phase of the contraction had to be reached within 1 s, then held steadily for 3 s, and then the motion was to be released slowly within 1 s. Therefore, each movement lasted 5 s (see Fig. 3).
Fig. 3. Trapezoidal contraction reference line was shown to the subject on a computer screen along with visual and auditory cues on which movement to perform. $\overline{\mathrm{RMS}}(t)$ was calculated online during data recording and presented to the subject as an overlay of the reference line for biofeedback. The current position was highlighted with a cursor (diamond). Subjects were asked to follow the reference line to the best of their ability with the cursor for repeatable dynamic movement phases and static contraction levels. Here, an example of a 60% MLVC tracking performance is shown. The raw sEMG signal was not presented to the subject and is added here for reference, scaled to fit the illustration.
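The trapezoidal reference described above (1 s ramp up, 3 s plateau, 1 s release) can be sketched as a simple time profile. The function name `trapezoid_reference`, the linear shape of the ramps, and the 20 Hz update rate (chosen here to match the 50 ms window increment) are assumptions made for illustration only.

```python
import numpy as np

def trapezoid_reference(level, fs=20.0, ramp=1.0, plateau=3.0):
    """Return time stamps and a trapezoidal target profile.

    level:   plateau height as a fraction of MLVC (e.g. 0.3, 0.6, 0.9).
    fs:      update rate in Hz (ASSUMPTION: 20 Hz, one value per 50 ms).
    ramp:    duration of the rising and of the falling edge in seconds.
    plateau: duration of the constant phase in seconds.
    """
    total = 2.0 * ramp + plateau
    t = np.linspace(0.0, total, int(round(total * fs)) + 1)
    # Rise linearly, saturate at 1, then fall linearly back to 0.
    shape = np.clip(np.minimum(t, total - t) / ramp, 0.0, 1.0)
    return t, level * shape

t, ref = trapezoid_reference(0.6)
```

Calling the function with 0.3, 0.6, and 0.9 gives the three plateau levels used in the protocol, each spanning the 5 s of one movement.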
Each movement was repeated five times at three plateau contraction levels: 30%, 60%, and 90% of MLVC, which was defined as the maximum $\overline{\mathrm{RMS}}(t)$ that the subject could hold for 30 s for a specific movement. Therefore, the reference lines were specifically adapted for each movement and subject, according to their respective MLVC levels. The biofeedback represented the instantaneous contraction level of a performed movement with respect to its calibrated MLVC. The biofeedback itself was, however, not calculated movement specifically; it was merely the average RMS of all channels (see (3)). One session comprised 3 × 5 × 8 = 120 movements. With sufficient pauses to prevent fatigue, each session lasted for about 45 min. All subjects completed two such sessions per day (half an hour break between sessions, 2 h in total) on five consecutive days (resulting in 10 sessions per subject).
E. Signal Acquisition and Processing
The acquired raw signals were amplified to a range of 0–4.5 V and filtered in the bandwidth 20–450 Hz, with inclusion of a 50 Hz notch filter, by the active Otto Bock electrodes. The filtered signals were then sampled at 1 kHz with a 10-bit A/D converter and transferred to a computer via Bluetooth by the Axonmaster (Otto Bock HealthCare Products GmbH). For the LDA classifier, four widely used time-domain features, namely RMS, zero crossings, slope sign changes, and waveform length, were used [26], [31]–[34]. The features were calculated in intervals of 128 ms with a frame increment of 50 ms (78 ms overlap). In order to account for the different challenges that are imposed on a prosthetic system that has to be clinically viable in everyday use, several nonstationarities were included in the analyzed dataset.
1) The entire contraction length (i.e., including movement onset, stable phase, and relaxation) was used for training and testing the classifiers; thus, the dynamic data portions known to influence myoelectric controllers’ performance were included [25].
2) Three different plateau contraction force levels were exerted, thus comprising different contraction levels that would change signal features [35].
3) The training sets (used for LDA training, ANN training, and validation) were the recordings from day 1, 2, 3, or 4 and the corresponding testing sets from day 2, 3, 4, or 5, respectively. Training and testing sets were thus separated by one day. Therefore, day-to-day transfer effects, such as electrode repositioning effects, impedance changes, and psychometric changes of the subjects, were included. These factors are known to affect the performance of myoelectric controllers in real-life usage (see references in Section I). The obtained results were compared to a reference scenario in which the training and testing set were from the same session, using fivefold cross validation.
4) Since NM is usually recognized correctly at a very high rate, it does not add useful information on the algorithm performance and was therefore excluded from the classification to avoid biased results. However, misclassifications to NM of other classes were taken into account.
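As an illustration of the windowing described in Section II-E, the following sketch computes the four time-domain features per channel on 128 ms windows advanced in 50 ms steps at 1 kHz. It is a sketch under stated assumptions, not the processing chain of the study: the signal is assumed to have its DC offset removed, zero crossings and slope sign changes are counted without an amplitude threshold (the text does not state one), and the names `td_features` and `sliding_features` are hypothetical.

```python
import numpy as np

FS = 1000   # sampling rate in Hz
WIN = 128   # window length in samples (128 ms at 1 kHz)
STEP = 50   # frame increment in samples (50 ms at 1 kHz)

def td_features(window):
    """Features of one window of shape (samples, channels) -> (4 * channels,)."""
    rms = np.sqrt(np.mean(window ** 2, axis=0))
    # Zero crossings: sign change between neighboring samples (no threshold).
    zc = np.sum(window[:-1] * window[1:] < 0, axis=0)
    diff = np.diff(window, axis=0)
    # Slope sign changes: sign change between neighboring first differences.
    ssc = np.sum(diff[:-1] * diff[1:] < 0, axis=0)
    # Waveform length: cumulative absolute amplitude change.
    wl = np.sum(np.abs(diff), axis=0)
    return np.concatenate([rms, zc, ssc, wl])

def sliding_features(emg):
    """Stack the feature vectors of all complete windows of emg (samples, channels)."""
    starts = range(0, emg.shape[0] - WIN + 1, STEP)
    return np.stack([td_features(emg[s:s + WIN]) for s in starts])
```

With `C` channels, the first `C` columns of the result are the per-channel RMS values, so `feats[:, :C].mean(axis=1)` gives the channel-averaged RMS of (3) for each window.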
Fig. 4. Example of correcting erroneous LDA outputs using ANN-GO in WS and WP. Especially during the onset and offset of a movement, misclassifications by the classifier occurred, visible as spikes in the class output of the LDA (top part, light gray line). Using $\mathrm{TI}(t)$ together with the threshold $\theta_{\mathrm{TI}}$ (middle part), most of these glitches were successfully removed (top part, black line). The movement prompts shown to the user are depicted by the black-dotted line for reference.
The classification accuracy was calculated offline and two different accuracy types were investigated: the tAcc and the aAcc, calculated as
$$\mathrm{tAcc} = \frac{\#\,\text{correct classifications}}{\#\,\text{total classifications}} \cdot 100 \tag{7}$$
$$\mathrm{aAcc} = \frac{\#\,\text{correct active classifications}}{\#\,\text{total active classifications}} \cdot 100 \tag{8}$$
where active classifications are decisions that would lead to prosthetic limb movement. Hence, in aAcc, misclassifications that require correcting motions by the user are regarded as more severe than misclassifications to NM (which result in stopping of the prosthesis, but do not require compensatory movements). However, a trivial system that only outputs NM would yield 100% aAcc; therefore, aAcc and tAcc always have to be considered together, which was previously discussed and proposed similarly by Hargrove et al. [18] and Scheme and Englehart [36].
F. Statistical Analysis
A single factor analysis of variance (ANOVA) for repeated measures was used to investigate the performance of different algorithms. The algorithm factor had six levels (the six algorithms listed earlier), with subjects and days as random variables. The null hypothesis was that there was no difference among the performance of the algorithms. A Tukey HSD post hoc analysis was conducted to assess pairwise differences between algorithms. If not indicated otherwise, all results are presented as mean ±1 standard deviation across all days and subjects. For all statistical analyses, the significance level was set to p < 0.05.
III. RESULTS
A. Proposed Method
A representative example of the ANN output and its effect on misclassification removal is shown in Fig. 4 for one able-bodied subject and day ($\alpha$ = 0.2; $\theta_{\mathrm{TI}}$ = 0.61). It is evident that when the LDA classifier output did not change and the $\overline{\mathrm{RMS}}(t)$ value was constant, the ANN output was constantly integrated. This resulted in an increase of confidence in the system’s output, because its decisions were very consistent. However, as
Fig. 5. Histogram of trust index $\mathrm{TI}(t)$, grouped by correct and incorrect LDA decisions. The majority of the wrong classifier decisions corresponded to values of $\mathrm{TI}(t)$ below 0.6. The majority of the correct LDA decisions had larger $\mathrm{TI}(t)$ values, demonstrating that the proposed index was suitable for detecting correct and incorrect classifier decisions.
soon as the classifier predicted different class labels within a short time and also when the $\overline{\mathrm{RMS}}(t)$ value of all channels changed to a certain extent, the ANN recognized these changes and responded with a decrease of the confidence value. Potential misclassifications could therefore be removed. It was analyzed further how $\mathrm{TI}(t)$ values were correlated with the (in-)correctness of the LDA decisions. It was observed that $\mathrm{TI}(t)$ tended to be below approximately 0.6 in case of wrong LDA predictions and above that value in case of correct predictions. By finding the optimal threshold, $\mathrm{TI}(t)$ could thus be used effectively to discard wrong LDA decisions. In Fig. 5, the correlation between $\mathrm{TI}(t)$ and the correctness of the LDA decisions is shown for the same subject and day as above, clustered for correct and incorrect LDA decisions. It is worth noting that in some cases $\mathrm{TI}(t)$ was also low for correct LDA predictions. However, in these cases, the previously accepted class was used as an output, so the effect of this incorrectly low $\mathrm{TI}(t)$ did not necessarily result in wrong motion output.
B. Able-Bodied Subjects
It was found that the postprocessing method used had a significant influence (repeated measures ANOVA, algorithms as factor with six levels, p