Mapping Fuzzy Membership Functions to Normal Distributions to Understand Boxing Motions

Mehdi Khoury
Institute of Industrial Research
University of Portsmouth
Portsmouth, PO1 3QL UK
[email protected]

Honghai Liu
Institute of Industrial Research
University of Portsmouth
Portsmouth, PO1 3QL UK
[email protected]

Abstract

We present a novel way to generate fuzzy membership functions that allows us to identify human activities from motion capture data. The system is able to recognise seven different boxing stances simultaneously with an accuracy superior to that of a GMM-based classifier. Experimental results suggest that a template can be learned and a stance identified in under 10 milliseconds, which may allow recognition in real time.

1 Introduction

This study presents a novel technique in the area of behaviour understanding, that is to say the recognition and description of actions and activities from the observation of human motion. Recognition is usually performed by comparing observations to examples using different machine learning techniques. Such techniques, surveyed in [1] and [20], can be used in the context of template matching [3], state-space approaches [22], or semantic description [16]. Our application domain is sport, and more precisely boxing. We have discarded template matching because it is generally more susceptible to noise and to variations in the time intervals of the movements, and because it is viewpoint dependent [20]. We are not interested in a purely semantic description, as we need to analyse and evaluate a boxing motion in a relatively detailed way. We therefore focus on the state-space approach, which leaves us with the problem of identifying such states during a motion. Conventionally, the machine learning techniques used to solve this problem range from Dynamic Time Warping [2] to Hidden Markov Models [15], Neural Networks [9], Principal Component Analysis [21], or variations of HMMs and NNs such as Coupled Hidden Markov Models [15], Variable-Length Markov Models [8], Fuzzy HMMs [24], or Time-Delay Neural Networks [14]. In this paper, we present a different method that allows us to build, from examples, fuzzy qualitative templates corresponding to different states. We propose an automated way to generate fuzzy membership functions applicable to biologically “imprecise” human motion, by mapping an estimation of centroid and range from a cumulative normal distribution to a membership function. First we describe the human skeletal representation in use, then we explain how we recognize stances (Guard, Jab, Cross, Lower Cross, Lower Jab, Right Hook, Left Hook, Lower Left Hook, and Right Uppercut) with fuzzy membership functions, and finally we present and discuss experimental results.

2 Human Skeletal Representation

The next step of our work will focus not only on motion recognition but also on performance analysis, where we may have to assess the correctness of a given motion. This means that it is not sufficient to follow the displacements of the end-effectors of a chain of links (for example, watching the trajectories of the feet and hands instead of the whole body) to understand a motion [5]. We need to be able to decompose the motion into subcomponents such as the rotations of individual joints. Therefore, we use a representation that keeps track of the rotations of multiple joints. The data are in the Biovision (.BVH) motion capture format [19], in which a human skeleton is formed of skeletal limbs linked by rotational joints. We make the following choices and assumptions:

• Knowing that motion capture data cannot give absolutely exact skeletal displacements of the joints [6] because of soft-tissue movement, we simply seek to obtain an approximation that is good enough to characterize the motion.
• We simplify the body to nineteen main joints and assume that this number is sufficient to characterize and understand the general motions of a human skeleton performing boxing combinations.
• Each joint is seen as having three degrees of freedom. The rotations of such joints are represented by Euler ZXY angles. We therefore characterize a joint rotation by three rotation angles Z, X and Y, given in degrees by the .BVH motion capture format and sampled at 120 frames per second.

In practice, for every frame, our observed data take the shape of a nineteen-by-three matrix describing the ZXY Euler angles of all nineteen joints in a simplified human skeletal representation.
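As an illustration of this layout, the following Python sketch turns one frame of rotation channels into the nineteen-by-three matrix described above. It assumes, hypothetically, that the root translation channels have already been stripped and that the remaining 57 values are ordered joint by joint as Z, X, Y; real .BVH files declare their own channel order per joint, so the helper names and the ordering are illustrative only.

```python
from typing import List, Sequence

NUM_JOINTS = 19        # simplified skeleton used in this study
ANGLES_PER_JOINT = 3   # Z, X, Y Euler angles, in degrees
FRAME_RATE_HZ = 120    # sampling rate of the .BVH data

# One frame: 19 rows (joints) by 3 columns (Z, X, Y rotations).
FrameMatrix = List[List[float]]


def to_frame_matrix(channels: Sequence[float]) -> FrameMatrix:
    """Reshape 57 rotation channels into a 19-by-3 matrix.

    Hypothetical layout: channels are grouped joint by joint, each joint
    contributing its Z, X and Y rotation in that order.
    """
    expected = NUM_JOINTS * ANGLES_PER_JOINT
    if len(channels) != expected:
        raise ValueError(f"expected {expected} channels, got {len(channels)}")
    return [
        [float(v) for v in channels[j * ANGLES_PER_JOINT:(j + 1) * ANGLES_PER_JOINT]]
        for j in range(NUM_JOINTS)
    ]


def frame_time_seconds(frame_index: int) -> float:
    """Time stamp of a frame, given the 120 frames-per-second sampling."""
    return frame_index / FRAME_RATE_HZ
```

For example, `to_frame_matrix(range(57))[1]` is `[3.0, 4.0, 5.0]`, the Z, X, Y values of the second joint under this hypothetical ordering.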

3 The Fuzzy Membership Function Template

To learn to recognize a stance, we need to build a template (here, a fuzzy membership function) of this stance from learning data. We later identify this stance during a motion by comparing the observed data to the template, that is, by computing the membership score of the observed data. We first present the notion of a fuzzy membership function, and then describe how we generate it using a normal mapping. Finally, we show how the degree of membership of observed data to a given template is computed.

3.1 Definition

The fuzzy linguistic approach introduced by Zadeh [23] allows us to associate a linguistic variable such as a “guard” stance with linguistic terms expressed by a fuzzy membership function. More specifically, we use a trapezoidal fuzzy 4-tuple (a, b, α, β), which defines a function returning a degree of membership in [0, 1] (see Figure 1 and equation 1), as it seems to be a good compromise between precision and computational efficiency (compared with, for example, the triangular membership function).

$$
\mu(x) =
\begin{cases}
0 & x < a - \alpha \\
\alpha^{-1}(x - a + \alpha) & a - \alpha \le x < a \\
1 & a \le x \le b \\
\beta^{-1}(b + \beta - x) & b < x \le b + \beta \\
0 & x > b + \beta
\end{cases}
\qquad (1)
$$
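Equation (1) translates directly into code. The Python sketch below is an illustrative transcription, not the authors' MATLAB implementation; the function name is hypothetical, and when α or β is zero the corresponding edge simply becomes crisp.

```python
def trapezoid_membership(x: float, a: float, b: float, alpha: float, beta: float) -> float:
    """Degree of membership of x in the fuzzy 4-tuple (a, b, alpha, beta), equation (1)."""
    if a > b or alpha < 0 or beta < 0:
        raise ValueError("expected a <= b and non-negative alpha, beta")
    if a <= x <= b:
        return 1.0                          # core: full membership
    if x < a:
        if alpha > 0 and x > a - alpha:
            return (x - a + alpha) / alpha  # rising left edge
        return 0.0                          # left of a - alpha
    if beta > 0 and x < b + beta:
        return (b + beta - x) / beta        # falling right edge
    return 0.0                              # right of b + beta
```

For example, with (a, b, α, β) = (4, 6, 2, 2), the value 3 lies halfway up the left edge and receives membership 0.5.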

Figure 1: Fuzzy 4-tuple membership function

3.2 Fuzzy Membership Function Generation by Mapping Normal Distributions

We use frames identified as “Guard” with a degree of membership equal to one as learning samples. These example data are identified by a procedure similar to Reverse Rating [17]: in our case, we ask an expert (a human observer) to answer the following question: “Identify a group of frames whose motion indicates a stance that possesses the degree 1.0 of membership in the fuzzy set ‘Guard’.” Once we have these learning data, we can proceed to generate a fuzzy membership function. Many procedures for the automated generation of membership functions can be found in the literature. Fuzzy Clustering [10], Inductive Reasoning [11], Neural Networks [12] and Evolutionary Algorithms [18] have been used, among others, to build such functions, and S-functions have also been estimated from histograms [4]. So far, one downside of such techniques has been the difficulty of linking the notion of fuzzy membership to the notion of probability distribution. One noticeable attempt to link both concepts in the generation of membership functions was made by Frantti [7] in the context of mobile network engineering. Unfortunately, this approach is relatively limited, as the minimum and maximum of the observed data are the absolute limits of the membership function. As a consequence, such a system ignores motions that go beyond the extrema of the learning range of the examples. We propose a method that overcomes this problem by introducing a function that maps the probability that values fall within a given cumulative normal distribution to a degree of membership. This works under the assumption that, for a population of samples representing a given motion, the Z, X and Y Euler angles characterizing the motion tend to be normally distributed.

We have a limited number of motion capture samples of a given stance (let us say a defensive posture called “Guard”). As we look at each Euler angle Z, X, Y of every joint j during a motion, we observe that, in our training sample, each Euler angle e of each joint has a minimum and a maximum. We define the range between the minimum and maximum of the learning sample as the range δ(e, j) of degree of membership one in the fuzzy set “Guard”. Knowing the size of our training sample, we can estimate what proportion of the range of all possible guards is covered by the range of our learning sample. For example, if we think that the range of our sample represents around 68.2% of the maximum range of all possible guards, then the degree of membership is 1 over two standard deviations (one on each side of the mid-point) of the population's maximum range. The rest of the distribution, whose membership is lower than one, then takes the three remaining standard deviations on each side.

Figure 2: Influence of the cumulative normal distribution parameter on the shape of the fuzzy membership function

Figure 3: Moving the centroid shifts the distribution and deforms the fuzzy membership function

• We approximate the maximum range by assuming that it extends four standard deviations in both directions from the mid-point of the membership-one range.
• Depending on the cumulative normal distribution evaluation, a portion of the four standard deviations on each side is allocated to the membership-one range and the remaining part is allocated to the lower membership degrees.
• We take the average of the means of the learning samples. This corresponds to the centroid of the data samples of membership one.
• While the distance |(b + β) − (a − α)| remains constant, a − α and b + β are shifted to one side in proportion to how far the centroid is shifted from the mid-point (see Figure 3 and equation 3). For example, if the centroid coincides with the middle of the membership-one range δ(e, j), and this range is evaluated as representing 95% of the maximum theoretical range, then our fuzzy membership function is symmetric (α = β = 2 standard deviations on each side of the membership-one range). Similarly, if the centroid c is such that

$$
|c - a| = \gamma \times |b - a| \qquad (2)
$$

and this range is again evaluated as representing 95% of the global theoretical range, then the fuzzy membership function is shifted (to the left when γ < 1/2) such that

$$
\alpha = (1 - \gamma) \times (\alpha + \beta), \qquad \beta = \gamma \times (\alpha + \beta) \qquad (3)
$$

where the sum α + β on the right-hand side is the fixed total width of the two lower-membership tails.
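The generation step can be sketched in Python for a single Euler angle of a single joint. This is an illustrative reading, not the authors' MATLAB code: it assumes, as the 68.2% and 95% examples suggest, that the membership-one range [a, b] spans ±z standard deviations around its mid-point, with z = Φ⁻¹((1 + p)/2) for an estimated coverage p, so that σ = (b − a)/(2z) and α + β = 2(4 − z)σ. The names `generate_fuzzy_tuple`, `FuzzyTuple` and `coverage` are hypothetical.

```python
from dataclasses import dataclass
from statistics import NormalDist, fmean
from typing import Sequence

TOTAL_HALF_WIDTH_SD = 4.0  # total range: four SDs either side of the mid-point


@dataclass
class FuzzyTuple:
    a: float         # lower bound of the membership-one range
    b: float         # upper bound of the membership-one range
    alpha: float     # width of the left (rising) edge
    beta: float      # width of the right (falling) edge
    centroid: float  # average of the per-sample means
    sigma: float     # estimated population standard deviation


def generate_fuzzy_tuple(samples: Sequence[Sequence[float]], coverage: float) -> FuzzyTuple:
    """Build (a, b, alpha, beta) for one Euler angle of one joint.

    samples:  one sequence of angle values (degrees) per learning sample.
    coverage: assumed fraction of the population range covered by [a, b],
              e.g. 0.682 for about one SD per side, 0.95 for about two.
    """
    values = [v for s in samples for v in s]
    if not values:
        raise ValueError("no learning data")
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie in (0, 1)")
    a, b = min(values), max(values)
    if b == a:
        raise ValueError("degenerate membership-one range")
    centroid = fmean(fmean(s) for s in samples if s)

    # Half-width of [a, b] in standard deviations, from the normal CDF.
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    if z >= TOTAL_HALF_WIDTH_SD:
        raise ValueError("coverage leaves no room for the lower-membership tails")
    sigma = (b - a) / (2.0 * z)

    # alpha + beta is fixed: the tails fill the rest of the 8-SD total range.
    tails = 2.0 * (TOTAL_HALF_WIDTH_SD - z) * sigma

    # Equations (2) and (3): split the tails according to the centroid position.
    gamma = abs(centroid - a) / abs(b - a)
    return FuzzyTuple(a, b, (1.0 - gamma) * tails, gamma * tails, centroid, sigma)
```

Repeating this for each of the nineteen joints and three Euler angles would yield the 57 tuples of a stance template such as “Guard”.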

3.3 Membership Evaluation

Our observed data take the shape of a nineteen-by-three matrix describing the ZXY Euler angles of all nineteen joints. We evaluate how close this matrix is to a “Guard” stance by calculating the degree of membership of every Euler angle of every joint (we have previously built a fuzzy 4-tuple corresponding to the “Guard” stance for every one of these Euler angles), and then computing an average membership score. This approach could probably be improved in the near future by introducing a weighted average for certain joints (for example, the position of the elbow might be more important than the position of the knee when in guard). If a frame has a high membership score for several fuzzy sets, we can establish an order of preference among these sets by comparing the Euclidean distance of the observed data to the centroid of each fuzzy set (see Section 3.2).
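A minimal Python sketch of this evaluation step follows, for illustration only. It assumes each template stores, per Euler angle, the 4-tuple together with the centroid from Section 3.2, and it uses a plain (unweighted) average; the `threshold` argument, the function names and the data layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Per Euler angle: (a, b, alpha, beta, centroid). A template is a 19-by-3 grid of these.
AngleParams = Tuple[float, float, float, float, float]
Template = Sequence[Sequence[AngleParams]]
FrameMatrix = Sequence[Sequence[float]]


def membership(x: float, a: float, b: float, alpha: float, beta: float) -> float:
    """Trapezoidal membership of equation (1)."""
    if a <= x <= b:
        return 1.0
    if x < a:
        return max(0.0, (x - a + alpha) / alpha) if alpha > 0 else 0.0
    return max(0.0, (b + beta - x) / beta) if beta > 0 else 0.0


def template_score(frame: FrameMatrix, template: Template) -> float:
    """Unweighted average membership over every Euler angle of every joint."""
    scores = [
        membership(x, *params[:4])
        for angles, joint_params in zip(frame, template)
        for x, params in zip(angles, joint_params)
    ]
    return sum(scores) / len(scores)


def centroid_distance(frame: FrameMatrix, template: Template) -> float:
    """Euclidean distance between the frame and the template's centroids."""
    return math.sqrt(sum(
        (x - params[4]) ** 2
        for angles, joint_params in zip(frame, template)
        for x, params in zip(angles, joint_params)
    ))


def rank_stances(frame: FrameMatrix, templates: Dict[str, Template], threshold: float) -> List[str]:
    """Stances whose score reaches the threshold, nearest centroid first."""
    candidates = [name for name, t in templates.items() if template_score(frame, t) >= threshold]
    return sorted(candidates, key=lambda name: centroid_distance(frame, templates[name]))
```

The weighted average suggested above would replace the plain mean in `template_score` with per-joint weights.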

4 Experiment and Results

4.1 Apparatus

The motion capture data are obtained from a Vicon motion capture studio with eight infrared cameras. The motion recognition is implemented in MATLAB 2007 on a single machine: a PC with an Intel Core Duo 2 GHz processor and 2 GB of RAM. An additional MATLAB toolbox [13] is used for extracting Euler angles from .BVH files.

4.2 Participants

Three male subjects took part, aged between 18 and 21, of light to medium-average size (167 cm to 178 cm) and weight (59 kg to 79 kg), all competing in boxing at the national level. None of them presented any abnormal gait. Optical markers were placed in a similar way on each subject to ensure consistent motion capture.

4.3 Procedure

The motion capture data are obtained from several subjects performing each boxing combination four times. There are twenty-one different boxing combinations, each separated by a guard stance. These are performed at two different speeds (medium-slow and medium-fast). We extract a fuzzy membership function template corresponding to a “Guard” stance from various samples. First, we learn from all three participants and test how well the system recognizes some of their Guard stances. Then we examine how well the system copes when learning from only two participants, by testing how well it recognizes stances from the third participant. Finally, we observe the accuracy of the system when learning to recognize nine different boxing stances simultaneously.
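The second protocol (learn from two participants, test on the third) amounts to a leave-one-participant-out split. The Python sketch below enumerates those splits and averages a score over them. The participant identifiers and the `evaluate` callback, which would train a template on the training participants and return an accuracy on the held-out one, are hypothetical placeholders.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple


def leave_one_participant_out(participants: Sequence[str]) -> List[Tuple[List[str], str]]:
    """All (training participants, held-out test participant) combinations."""
    return [
        ([p for p in participants if p != held_out], held_out)
        for held_out in participants
    ]


def cross_participant_score(
    participants: Sequence[str],
    evaluate: Callable[[List[str], str], float],
) -> float:
    """Average a score over every train-on-others, test-on-one split."""
    return fmean(
        evaluate(train, test) for train, test in leave_one_participant_out(participants)
    )
```

With three participants this produces exactly three splits, whose scores are averaged into a single estimate.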

4.4 Results

We evaluate the classifier by comparing its performance with that of a human observer. One expert identifies “Guard” frames of membership 1 and of membership 0 (non-guard frames). We then count the number of false positives (frames identified by the expert as non-guards but classified as guards by the system) and false negatives (frames identified by the expert as guards but classified as non-guards by the system). We use ROC analysis to plot the true positive rate against the false positive rate as a function of the membership threshold. The data are partitioned into sub-samples and tests are run using k-fold cross-validation; the k results from the folds are averaged to produce a single estimation. We present results for a 3-fold cross-validation in which one third of the data is used for learning while the rest is used for testing (around 107000 unidentified frames), as shown in Figure 4. In this example, we analyse two situations:

• First situation: all participants are used for learning and testing. There is therefore a greater similarity between the learning and the test samples, as the gait differences are reduced.
• Second situation: two participants are used for learning and the third one for testing (all combinations are averaged to produce a single estimation). In this case, there are greater gait differences between the learning sample and the test sample, as we do not use the same subjects for learning and for testing.
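For a single fold, the ROC points described above can be obtained by sweeping the membership threshold. The Python sketch below assumes the frames have already been scored by the classifier and labelled by the expert (True for membership-1 guards, False for non-guards); the function names are hypothetical, and cross-validation would repeat this per fold and average the results.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float],
    is_guard: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """(threshold, false positive rate, true positive rate) for each threshold."""
    positives = sum(1 for g in is_guard if g)
    negatives = len(is_guard) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both guard and non-guard frames")
    points = []
    for t in thresholds:
        tp = sum(1 for s, g in zip(scores, is_guard) if g and s >= t)
        fp = sum(1 for s, g in zip(scores, is_guard) if not g and s >= t)
        points.append((t, fp / negatives, tp / positives))
    return points


def accuracy_at(scores: Sequence[float], is_guard: Sequence[bool], threshold: float) -> float:
    """Fraction of frames whose thresholded decision agrees with the expert."""
    correct = sum(1 for s, g in zip(scores, is_guard) if (s >= threshold) == g)
    return correct / len(is_guard)
```

The optimum accuracy reported below corresponds to the best value of `accuracy_at` over the thresholds considered.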

Figure 4: 3-Fold Cross-Validation ROC Analysis of the Guard Classifier

The optimum accuracy of the classifier is 0.95 if the same participants are used for learning and testing, or 0.88 when different participants are used for learning and testing. Crisp evaluation (the accuracy obtained when detecting only frames with a “Guard” membership equal to 1.0) gives inferior results: 0.906 in the first case and 0.506 in the second. We also compare, in Figure 5, the accuracy of our Fuzzy Membership Function (FMF) system with that of a standard Gaussian Mixture Model (GMM) algorithm when classifying seven different stances (Guard, Jab, Cross, Lower Cross, Right Hook, Left Hook, and Lower Left Hook).

We have observed that a high threshold value is needed to obtain good results. If the threshold is below a membership degree of 0.8, we obtain a maximum true positive rate (most known guards are correctly identified) but also a maximum false positive rate (nearly all known non-guards are identified as guards). A t-test indicates, at the 95% confidence level, that our classifier performs significantly better than the GMM-based one; its general average accuracy is 87.71%, whereas the GMM algorithm is 49% accurate. It takes on average less than 8.3 milliseconds to create a template and evaluate a frame's membership score for one stance. The time complexity for recognizing n stances is approximately O(n). It is therefore likely that this system can be implemented in real-time motion recognition applications.

Figure 5: Comparing accuracy on seven stances: GMM versus FMF

The system can recognize nine different stances with an average accuracy of 88.68% when half of the data is used for learning and half for testing on all three participants. Some of these movements have very little learning data available; in such cases, we fine-tune the threshold by decreasing it to compensate for this data sparsity. Without any optimization, building a fuzzy membership template function from one frame in MATLAB takes 0.437 milliseconds on average over 100 runs. Computing the membership score of one frame takes 5.14 milliseconds on average.

5 Discussion

It is worth noting that this system is not a binary but a fuzzy classifier. The threshold value therefore stays between 0 and 1, which might give the illusion of an “unfinished” ROC curve if the learning and test samples are similar enough (see Figure 4, where the membership-one point starts with a high true positive rate because we learn from and test with the same boxers). The ROC curves show that the fuzzy classifier performs better than its crisp counterpart (the one that only identifies Guards of membership one). This gain is especially noticeable when the learning and testing data present less similarity.

Conclusion

We have presented a technique that performs better than a GMM classifier and allows the simultaneous recognition of seven dynamic stances from .BVH motion capture data with an accuracy of 88.28% in under 36 milliseconds. This opens the possibility of future work involving real-time learning and recognition using probabilistic models in order to obtain the complete state space of a complex motion. We plan to compare the performance of this system with other techniques such as Hidden Markov Models, and we also intend to examine how to evaluate the correctness of a motion.

Acknowledgements

Many thanks to Portsmouth University Boxing Club and to the motion capture team: Alex Counsell, Geoffrey Samuel, Ollie Seymour, Ian Sedgebeer, David McNab, David Shipway and Maxim Mitrofanov.

References

[1] J. K. Aggarwal, Q. Cai, W. Liao, and B. Sabata. Articulated and elastic non-rigid motion: A review, June 15 1994.
[2] A. F. Bobick and A. D. Wilson. A state based technique for the summarization and recognition of gesture. In International Conference on Computer Vision, pages 382–388, 1995.
[3] C. S. Chan, H. Liu, and D. J. Brown. Recognition of human motion from qualitative normalised templates. Journal of Intelligent and Robotic Systems, 48(1):79–95, 2007.
[4] B. B. Devi and V. V. S. Sarma. Estimation of fuzzy memberships from histograms. Inf. Sci, 35(1):43–59, 1985.
[5] D. Dragulescu, M. Tascau, and D. Stanciu. Kinematic and dynamic modeling of human lower limb. IASTED International Conference on Robotics and Applications, pages 19–22, Nov. 2001.
[6] J. Favre, R. Aissaoui, B. M. Jolles, O. Siegrist, J. A. de Guise, and K. Aminian. 3d joint rotation measurement using mems inertial sensors: Application to the knee joint. In ISB-3D: 3-D Analysis of Human Movement, 28-30 June 2006, Valenciennes, France, 2006.
[7] T. Frantti. Timing of fuzzy membership functions from data. Academic Dissertation, July 2001.
[8] A. Galata, N. Johnson, and D. C. Hogg. Learning variable-length markov models of behavior. Computer Vision and Image Understanding, 81(3):398–413, Mar. 2001.
[9] Y. Guo, G. Xu, and S. Tsuji. Understanding human motion patterns. In International Conference on Pattern Recognition, pages B:325–329, 1994.
[10] T. Iokibe. A method for automatic rule and membership function generation by discretionary fuzzy performance function and its application to a practical system. NAFIPS/IFIS/NASA’94. Proceedings of the First International Joint Conference of the North American Fuzzy Information Processing Society Biannual Conference. The Industrial Fuzzy Control and Intelligent Systems Conference, and the NASA Joint Technolo, pages 363–364, 1994.
[11] C. Kim and B. Russell. Automatic generation of membership function and fuzzy rule using inductive reasoning. In Third International Conference on Industrial Fuzzy Control and Intelligent Systems (IFIS’93), pages 93–96, 1993.
[12] J. Kim, J. Seo, and G. Kim. Estimating membership functions in a fuzzy network model for part-of-speech tagging. Journal of Intelligent and Fuzzy Systems, 4:309–320, 1996.
[13] N. D. Lawrence. Mocap toolbox for matlab. Available on-line at http://www.cs.man.ac.uk/~neill/mocap/.
[14] C.-T. Lin, H.-W. Nein, and W.-C. Lin. A space-time delay neural network for motion recognition and its application to lipreading. Int. J. Neural Syst, 9(4):311–334, 1999.
[15] A. P. Pentland, N. Oliver, and M. Brand. Coupled hidden markov models for complex action recognition. In Massachusetts Institute of Technology, Media Lab, 1996.
[16] P. Remagnino, T. N. Tan, and K. D. Baker. Agent orientated annotation in model based visual surveillance. In International Conference on Computer Vision, pages 857–862, 1998.
[17] S. Sanghi. Determining membership function values to optimize retrieval in a fuzzy relational database. Proceedings of the 2006 ACM SE Conference, 1:537–542, 2006.
[18] D. Simon. H infinity estimation for fuzzy membership function optimization. Int. J. Approx. Reasoning, 40(3):224–242, 2005.
[19] J. Thingvold. Biovision BVH format. [Online] Available at http://www.cs.wisc.edu/graphics/Courses/cs-838-1999/Jeff, 1999.
[20] L. Wang, W. Hu, and T. Tan. Recent developments in human motion analysis. Pattern Recognition, 36(3):585–601, 2003.
[21] Y. Yacoob and M. J. Black. Parameterized modeling and recognition of activities. In ICCV, pages 120–127, 1998.
[22] J. Yamato, J. Ohya, and K. Ishii. Recognizing human action in time-sequential images using hidden markov model. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 379–385, 1992.
[23] L. A. Zadeh. Fuzzy sets. Information and Control, 8:338–353, 1965.
[24] X. Zhang and F. Naghdy. Human motion recognition through fuzzy hidden markov model. Proceedings of the International Conference on Computational Intelligence for Modelling, Volume 02:450–456, 2005.
