Wiizards: 3D Gesture Recognition for Game Play Input

Louis Kratz
Dept. of Computer Science, Drexel University
3141 Chestnut Street, Philadelphia, PA 19104
[email protected]

Matthew Smith
Digital Media Labs, Drexel University
3141 Chestnut Street, Philadelphia, PA 19104
[email protected]

Frank J. Lee
Dept. of Computer Science, Drexel University
3141 Chestnut Street, Philadelphia, PA 19104
[email protected]

ABSTRACT
Gesture-based input is an emerging technology gaining widespread popularity in interactive entertainment. The use of gestures provides intuitive and natural input mechanics for games, presenting an easy-to-learn yet richly immersive experience. In Wiizards, we explore the use of 3D accelerometer gestures in a multiplayer, zero-sum game. Hidden Markov models are constructed for gesture recognition, providing increased flexibility and tolerance of fluid motion. Users can strategically affect the outcome via combinations of gestures, with limitless scalability.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces—Input devices and strategies; K.8.0 [Personal Computing]: General—Games; I.5.5 [Pattern Recognition]: Implementation—Interactive systems

Figure 1: Wiizards presents a two-player, spell-based environment.

General Terms
Games, Interactive systems, Pattern Recognition

Keywords
Gestures, HMM, Games

1. INTRODUCTION

Gesture recognition as an input mechanic has been explored academically through a variety of approaches. Two definitions of gesture have been popular in the literature. The first defines a gesture as the spatial orientation of a person or hand, an approach utilized by Freeman et al. [3], Segen et al. [12, 13], and GeFighters [14]. The second classifies specific motion paths of the user. This work explores the use of accelerometers to classify motion paths via a Bayesian approach. By using accelerometer data, the path that the user creates can be recorded directly, rather than relying on visual tracking to estimate motion differentials. Other approaches include that of Heap [4], who uses active shape models to track the user and identifies gestures by


the shape's parameters. Heap's approach, while accurate, requires a vision-based system for the shape model, which introduces latency and speed issues not found in accelerometers. Keskin et al. [8] also use a vision-based approach.

Accelerometer-based gesture recognition has been explored previously, though the constructions have varied. Keir et al. [6] created a gesture recognition system for accelerometers using curve fitting; their approach integrates the accelerometer data to recover absolute position. Payne [10] uses the gesture recognition system created by Keir as a game input mechanic. Payne's work, though similar in spirit to ours, does not have the advantages of a Bayesian method.

Wiizards uses hidden Markov models (HMMs) [11] to classify gestures from the accelerometer data. This Bayesian approach provides more flexibility on a per-user basis and handles noisy sensor data within the model. An HMM is a statistical model whose hidden states exhibit the Markov property. HMMs are parametrized by the number of states in the model N, a probability distribution function B_i for each state, an initial state distribution π, and a transition probability matrix A.

HMMs have been used for gesture recognition in other applications as well. Keskin et al. [8], for example, use HMMs in their vision-based approach.
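As a concrete illustration of this parameterization, the following minimal sketch (ours, not part of the original system; all names are illustrative) represents an N-state HMM with 3D Gaussian emissions:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMMParams:
    """Parameters of an N-state HMM with 3D Gaussian emissions."""
    pi: np.ndarray     # (N,) initial state distribution
    A: np.ndarray      # (N, N) transition matrix; each row sums to 1
    means: np.ndarray  # (N, 3) mean vector of each state's emission B_i
    covs: np.ndarray   # (N, 3, 3) covariance matrix of each emission B_i

def uniform_init(n_states: int) -> HMMParams:
    """Hypothetical initializer: uniform pi and A, unit-covariance emissions."""
    return HMMParams(
        pi=np.full(n_states, 1.0 / n_states),
        A=np.full((n_states, n_states), 1.0 / n_states),
        means=np.zeros((n_states, 3)),
        covs=np.tile(np.eye(3), (n_states, 1, 1)),
    )
```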

Figure 2: The unique ordering of gestures for spell creation.

In addition, Kela et al. [7] use accelerometer gestures for design applications. Kela's construction, however, transforms the 3D signal into a sequence of discrete symbols, and its tolerance to noise and ambiguity in the accelerometer state is not investigated. Mantyjarvi et al. [9] also use discrete HMMs, for controlling a DVD player. Gesture recognition in gaming is only now being explored in commercial products. AiLive [5] has produced a gesture recognition and training product for the Nintendo Wii, but has not released the details of their machine learning techniques.

2. IMPLEMENTATION

2.1 Game Overview
Wiizards is a two-player, zero-sum game. The goal for each player is to damage the opponent to a critical point while limiting damage to themselves. The player casts spells by performing gestures, the order of which determines the effect they have. The gestures are unique arm motions divided into three categories: Actions, Modifiers, and Blockers. Each of these serves a different purpose in combining gestures into a complete spell.

2.2 Strategic Composition
The order in which the gestures are composed is vital to determining the behavior of a spell. Each spell consists of blockers and modifiers, and must conclude with an action. Modifiers affect only the gestures following them in the spell (Figure 2). For example, if a spell consists of gestures XYZ, the modifier X will affect Y and Z, while Y will affect only Z. To successfully block a spell, players must directly mimic their opponent's gestures. For example, to block spell XYZ, a player must perform gestures BXYZ, where B is a blocker. Blockers can also be modified by gestures performed prior to them.

A queue is populated as players perform gestures. When the spell is cast, the elements are removed from the queue in order, modifying the parameters of subsequent gestures, as sketched below. The ability to combine multiple gestures in spell creation provides a highly customizable and scalable game play experience. The level of customization also gives a wider range of possibilities to each of the players, making the game scalable in strategy and individual skill level. Players more fluent in the gestural language of the game can explore different strategies as they find more effective usages for each gesture. Gesture management is also a major strategic component of the game. Each gesture has a cool-down time limiting how often it may be used, forcing the player to make use of a wide variety of gestures.
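A minimal sketch of this composition rule follows (our illustration; the gesture names and data structures are hypothetical, as the paper does not specify the implementation):

```python
from collections import namedtuple

# kind is one of "modifier", "blocker", or "action"
Gesture = namedtuple("Gesture", ["name", "kind"])

def resolve_spell(queue):
    """Remove gestures in order; each gesture is affected by every
    modifier performed before it in the spell."""
    effects = []
    for i, g in enumerate(queue):
        prior_mods = [p.name for p in queue[:i] if p.kind == "modifier"]
        effects.append((g.name, prior_mods))
    return effects

# Spell XYZ: the modifier X affects Y and Z, while Y affects only Z.
spell = [Gesture("X", "modifier"), Gesture("Y", "modifier"), Gesture("Z", "action")]
print(resolve_spell(spell))  # [('X', []), ('Y', ['X']), ('Z', ['X', 'Y'])]
```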

2.3 Visual Feedback
The user interface is divided into three sections: a bar revealing the current status of all the gestures available to each player, a playing field, and a queue for each player indicating the current spell (Figure 1). The gesture bar serves two purposes: a visual reminder to the player of how to perform each gesture, and an indicator of the cool-down time remaining. Alpha transparency is used to indicate how long until a particular gesture will be available again; when the representation of a gesture is fully opaque, it is available for the player to use. At present the game is in mid-development, with the queuing system, gesture recognition system, and GUI completed.
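The cool-down indicator can be expressed as a simple fade; the linear formulation below is our assumption of the rule described above:

```python
def icon_alpha(time_since_use: float, cooldown: float) -> float:
    """Opacity in [0, 1]; the icon is fully opaque when the gesture is
    available again (a linear fade is assumed here for illustration)."""
    return 1.0 if cooldown <= 0 else min(1.0, time_since_use / cooldown)
```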

2.4 Communication Design
Our software utilizes three main components: the Nintendo Wii controller, the gesture recognition system, and the graphical game implementation. Communication with the Nintendo Wii controller is done via publicly available open source libraries [1]. The accelerometer data is then directed into our HMM gesture recognition package, and the results are communicated to Adobe Flash via XML.
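The paper does not give the XML schema used to talk to Flash; a hypothetical message of the kind this pipeline might emit could look like the following (the element and attribute names are our invention):

```python
import xml.etree.ElementTree as ET

def gesture_message(name: str, confidence: float) -> bytes:
    """Serialize a recognized gesture as a small XML payload for Flash."""
    elem = ET.Element("gesture", name=name, confidence=f"{confidence:.3f}")
    return ET.tostring(elem)

print(gesture_message("fireball", 0.972))
# b'<gesture name="fireball" confidence="0.972" />'
```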

2.5 Gesture Recognition

2.5.1 Model Construction

The observations for the models are the accelerometer data from the Nintendo Wii controller. The device provides a gravitational reading for three axes, making each observation a three-dimensional vector o_i, as indicated in equation (1). This data is normalized using the Wiimote calibration information [1].

$$o_i = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \tag{1}$$
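The exact normalization is not given in the paper; a common convention with Wiimote libraries, assumed here for illustration, converts raw readings to gravity units using the controller's stored zero-g and +1g calibration values per axis:

```python
import numpy as np

def normalize(raw: np.ndarray, zero_g: np.ndarray, one_g: np.ndarray) -> np.ndarray:
    """Map a raw (x, y, z) accelerometer sample to units of g."""
    return (raw - zero_g) / (one_g - zero_g)

# Hypothetical raw sample and calibration constants:
o_i = normalize(np.array([520.0, 532.0, 610.0]),
                zero_g=np.array([512.0, 512.0, 512.0]),
                one_g=np.array([612.0, 612.0, 612.0]))
```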

Each gesture G is a sequence of observation vectors, as shown in equation (2).

$$G = o_1, o_2, \ldots, o_m \tag{2}$$


Since the observations are vectors, multivariate Gaussian distributions are used for the emission probabilities. Each emission probability B_i is therefore characterized by a three-dimensional mean vector and a 3×3 covariance matrix. The model parameters B_i, A, and π are trained using the Baum-Welch algorithm [2].
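The paper does not name a training library; as one way to realize this step, the following sketch uses the open-source hmmlearn package (not the authors' code), whose fit() routine implements Baum-Welch (EM) with full-covariance Gaussian emissions:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # illustrative choice, not the authors'

def train_gesture_model(sequences, n_states=15):
    """Fit one HMM to the recorded examples of a single gesture.
    `sequences` is a list of (m_i, 3) arrays of normalized samples."""
    X = np.concatenate(sequences)          # stack all observations
    lengths = [len(s) for s in sequences]  # boundaries of each sequence
    model = GaussianHMM(n_components=n_states, covariance_type="full", n_iter=50)
    model.fit(X, lengths)
    return model
```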

2.5.2 Gesture Classification

We create a separate model M_i for each gesture to be recognized. To classify an observation sequence as a specific gesture, we maximize the probability of the sequence over the models, as shown in equation (3).

$$\mathrm{Gesture}(G) = \arg\max_i \; p(G \mid M_i) \tag{3}$$

The probability of a gesture G given a model M_i is the joint probability of the observations and the hidden states, as calculated by the Viterbi algorithm [2].
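Continuing the hmmlearn-based sketch above (our illustration), classification per equation (3) maximizes the Viterbi score over the per-gesture models:

```python
def classify(models, sequence):
    """Return the name of the gesture whose model best explains `sequence`.
    `models` maps gesture names to trained HMMs; decode() computes the
    probability of the best hidden-state path via the Viterbi algorithm."""
    def viterbi_logprob(model):
        logprob, _states = model.decode(sequence, algorithm="viterbi")
        return logprob
    return max(models, key=lambda name: viterbi_logprob(models[name]))
```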

[Plot: HMM state recognition rates; correct classification percentage (50–100%) vs. number of states (5–25).]

Figure 3: Percentage of correct classifications for varied model sizes.

[Plot: training convergence rate; correct classification percentage (0–100%) vs. number of gestures used in training (5–40).]

Figure 4: Average correct classification for varied training set sizes.

3. IMPLEMENTATION RESULTS

To train our gesture recognizer models, we gathered training data from 7 different users. Each user was presented with images of the gestures from the game, and performed each gesture over 40 times.

3.1 Model Size Exploration

The number of states was varied for each gesture, and an HMM was created with the data from all of the users. We then measured the percentage of correct classifications based on those models; the results are shown in Figure 3. A recognition rate of over 90% was achieved with only ten states. For the game implementation we use 15 states, which achieves over a 93% recognition rate on our test data.
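A sketch of this experiment, reusing the hypothetical helpers above (the data layout is our assumption):

```python
def sweep_states(train_sets, test_sets, state_counts=(5, 10, 15, 20, 25)):
    """Train global models with k states and measure classification accuracy.
    `train_sets`/`test_sets` map gesture names to lists of observation arrays."""
    accuracy = {}
    for k in state_counts:
        models = {g: train_gesture_model(seqs, n_states=k)
                  for g, seqs in train_sets.items()}
        correct = total = 0
        for g, seqs in test_sets.items():
            for seq in seqs:
                correct += classify(models, seq) == g
                total += 1
        accuracy[k] = correct / total
    return accuracy
```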

3.2 Training Convergence Rate

The gesture recognizer models were trained with the sample data to measure how quickly a model adapts to each user. For each user, we trained models with an increasing number of observation sequences, and then evaluated the percentage of correct classifications on the sample data; the models used have 15 states. The results are shown in Figure 4. Significant accuracy, over 80%, is achieved with a training set of only 10 gestures; at 20 gestures the recognition rate exceeds 95%. This data also indicates that user-dependent training is more reliable than the global training measured in Figure 3.

Classification correctness was also measured against models to which the user had not contributed training data. We created HMM models using the sample data from the other users, and measured the user's recognition correctness against them. This scenario approximates how an "out of the box" gesture set would perform. The results are shown in Figure 5. The average recognition rate remains around 50% regardless of how much training data is used. The sample standard deviation, indicated by the vertical bars, is large, showing that some gestures are frequently misclassified.
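The per-user convergence measurement can be sketched the same way (again with our hypothetical data layout): train on the first n recordings of each gesture and test on the rest.

```python
def convergence_curve(user_data, sizes=range(5, 45, 5), n_states=15):
    """`user_data` maps gesture names to one user's recordings, in order."""
    accuracy = {}
    for n in sizes:
        models = {g: train_gesture_model(seqs[:n], n_states=n_states)
                  for g, seqs in user_data.items()}
        held_out = {g: seqs[n:] for g, seqs in user_data.items()}
        correct = total = 0
        for g, seqs in held_out.items():
            for seq in seqs:
                correct += classify(models, seq) == g
                total += 1
        accuracy[n] = correct / total if total else float("nan")
    return accuracy
```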

[Plot: classification without local training; correct classification percentage (0–100%) vs. number of gestures used in training (5–40), with standard deviation bars.]

Figure 5: Average correct classification without user training.

[Plot: HMM recognizer performance; gestures evaluated per second (0–2200) vs. number of states (5–25).]

Figure 6: Average evaluation time for increasing model sizes.

3.3 Implementation Performance

The time to evaluate the probability of a gesture is directly related to the number of states in the HMM.


[Plot: HMM trainer performance; average time to train (0–30 s) vs. number of states (5–25).]

Figure 7: Average training time for increasing model sizes.

We measured the average time to evaluate a gesture for HMMs with a varying number of states. These experiments were run on an Intel Core 2 processor at 2.66 GHz with 4 GB of RAM. As shown in Figure 6, an HMM with 15 states can evaluate over 250 gestures per second. Note that this figure is for a single model; equation (3) thus introduces a scaling factor proportional to the number of models. Training the HMMs, however, cannot be achieved in a real-time environment. We measured the average training time on a set of 10 gestures for increasing model sizes; the results are presented in Figure 7. The training time increases significantly with the number of states. Our implementation, which uses 15 states, takes about 10 seconds to train. The training for each user must therefore be done offline, with a trade-off between training time and recognition rate.
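A rough timing harness for the evaluation-rate measurement might look like this (ours; the authors do not describe their harness):

```python
import time

def gestures_per_second(model, sequence, trials=1000):
    """Estimate how many single-model Viterbi evaluations run per second."""
    start = time.perf_counter()
    for _ in range(trials):
        model.decode(sequence, algorithm="viterbi")
    return trials / (time.perf_counter() - start)
```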

4. CONCLUSION

Natural, innovative input is increasingly becoming a selling point for interactive applications. With this work we explore how simple, easy-to-learn controls can lend themselves to a highly strategic, player-driven experience. Wiizards' stack-based spell approach grants players the freedom to play at their own skill level and with the strategy of their choice. Our hidden Markov model construction allows players a level of input flexibility while providing easy extensions for more detailed game play.

The accuracy of the recognition depends on the time spent on user training and on the number of states in the model. For high accuracy, user-specific training is required. Our gesture recognition system can perform in real time with high accuracy after an initial training period. The implementation achieves significant recognition rates with 10–20 user samples; however, we consider the machine training time of 10 seconds to be limiting. Our game implementation will handle this by providing training sessions for each player, with the goal of training both the user and the system together in an entertaining fashion. After this initial training period, in-game data can be used to update the model. Future work will explore the use of adaptive HMMs to avoid this training overhead, and will explore alternative input devices such as multi-touch displays.


5. REFERENCES

[1] J. Andersson and C. Phillips. Simple Wiimote library for Linux, 2007.
[2] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, August 2006.
[3] W. T. Freeman, D. B. Anderson, P. A. Beardsley, C. N. Dodge, M. Roth, C. D. Weissman, W. S. Yerazunis, H. Kage, K. Kyuma, Y. Miyake, and K.-I. Tanaka. Computer vision for interactive computer graphics. IEEE Computer Graphics and Applications, 18(3):42–53, 1998.
[4] A. Heap. Real-time hand tracking and gesture recognition using smart snakes, 1995.
[5] AiLive Inc. LiveMove white paper. Technical report, AiLive Inc., http://www.ailive.net/, 2006.
[6] P. Keir, J. Payne, J. Elgoyhen, M. Horner, M. Naef, and P. Anderson. Gesture-recognition with non-referenced tracking. In 3DUI '06: Proceedings of the IEEE Symposium on 3D User Interfaces, pages 151–158, Washington, DC, USA, 2006. IEEE Computer Society.
[7] J. Kela, P. Korpipaa, J. Mantyjarvi, S. Kallio, G. Savino, L. Jozzo, and D. Marca. Accelerometer-based gesture control for a design environment. Personal and Ubiquitous Computing, 10(5):285–299, 2006.
[8] C. Keskin, A. Erkan, and L. Akarun. Real time hand tracking and 3D gesture recognition for interactive interfaces using HMM. In Proceedings of the Joint International Conference ICANN/ICONIP 2003. Springer, 2003.
[9] J. Mantyjarvi, J. Kela, P. Korpipaa, and S. Kallio. Enabling fast and effortless customisation in accelerometer based gesture interaction. In MUM '04: Proceedings of the 3rd International Conference on Mobile and Ubiquitous Multimedia, pages 25–31, New York, NY, USA, 2004. ACM Press.
[10] J. Payne, P. Keir, J. Elgoyhen, M. McLundie, M. Naef, M. Horner, and P. Anderson. Gameplay issues in the design of spatial 3D gestures for video games. In CHI '06 Extended Abstracts on Human Factors in Computing Systems, pages 1217–1222, New York, NY, USA, 2006. ACM Press.
[11] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Readings in Speech Recognition, pages 267–296. Morgan Kaufmann, 1990.
[12] J. Segen and S. Kumar. Fast and accurate 3D gesture recognition interface. In ICPR '98: Proceedings of the 14th International Conference on Pattern Recognition, Volume 1, page 86, Washington, DC, USA, 1998. IEEE Computer Society.
[13] J. Segen and S. Kumar. Human-computer interaction using gesture recognition and 3D hand tracking. In ICIP (3), pages 188–192, 1998.
[14] J. M. Teixeira, T. Farias, G. Moura, J. Lima, S. Pessoa, and V. Teichrieb. GeFighters: an experiment for gesture-based interaction analysis in a fighting game. In SBGames, Brazil, 2006.
