Human Activity Recognition via the Features of Labeled Depth Body Parts

Ahmad Jalal¹, Sungyoung Lee², Jeong Tai Kim³, and Tae-Seong Kim¹

¹ Department of Biomedical Engineering, ² Department of Computer Engineering,
³ Department of Architectural Engineering, Kyung Hee University,
1 Seocheon-dong, Giheung-gu, Yongin-si, Gyeonggi-do, 446-701, Republic of Korea
[email protected], [email protected], {jtkim,tskim}@khu.ac.kr

Abstract. This paper presents a labeled depth body parts based human activity recognition (HAR) system. In this work, we label depth silhouettes for various specific body parts via trained random forests. From the labeled body parts, the centroid of each part is computed, resulting in 23 centroids per depth silhouette. From these 3D centroids, we then compute motion parameters (i.e., a set of magnitude and directional angle features). Finally, Hidden Markov Models are trained with these features and used to recognize six daily human activities. Our results show a mean recognition rate of 97.16% over the six human activities, whereas a conventional HAR approach achieved only 79.50%. Our system should be useful as a smart HAR system for smart homes.

Keywords: Human activity recognition, random forests, labeled body parts, depth silhouettes, Hidden Markov Models.

1 Introduction

Human Activity Recognition (HAR) is an important area of proactive computing research due to its potential for understanding activity context from daily video recordings of human activities. Its applications include patient monitoring, surveillance, health care, human-machine interaction, etc. [1]. The goal of HAR is to recognize outdoor and indoor activities from a video containing a sequence of activity images. In typical video-based HAR, binary silhouettes of the human body [2] have been widely used to derive activity features. Recently, depth silhouettes have been adopted, providing richer information than binary silhouettes [2]: body parts are differentiated by means of different intensity (i.e., depth) values. If each body part can be recognized in the depth silhouettes, much more advanced HAR becomes possible for complex human activities. Accordingly, body parts recognition from depth silhouettes has become an active topic in human motion recognition. For instance, in [3], random forests (RFs) [4] are used to label each depth pixel with its corresponding body part. In general, this approach requires multiple motion capture (MoCap) cameras and optical markers attached to a subject to create a database (DB) of depth silhouettes and corresponding pre-labeled body parts. In this work, we propose a novel labeled body parts based HAR system. We first provide a way of creating such a DB with a single depth camera (i.e., without MoCap cameras and optical markers), with which RFs are trained for body parts labeling. Secondly, the trained RFs are used to label every incoming depth silhouette into 23 body parts. Then, from the centroids of the labeled body parts, we estimate motion parameters as activity features. Finally, these features are used in Hidden Markov Models (HMMs) for HAR. We have tested our system over six human activities.

2 Methodology

The overall process of our proposed HAR system is shown in Fig. 1, comprising the key steps of body parts labeling via trained RFs, activity feature generation, and HAR via trained HMMs.

Fig. 1. Overall flow of the proposed labeled body parts based HAR system (depth silhouette → trained random forest → body parts labeled depth silhouette → feature generation → trained HMM → recognized activity)

2.1 Labeled DB Generation

To label the body parts in the depth silhouettes via RFs, a DB is required as mentioned above. Fig. 2 shows how this DB is created. Fig. 2(a) shows a depth silhouette, and (b) a skeleton model derived from (a) with a vector of 15 body joints. In Fig. 2(c), each body part is represented with a Gaussian contour based on the body part proportions and joint locations. Finally, in Fig. 2(d), each body part is identified and denoted in a different color; a sketch of this pixel-to-part assignment is given after the figure.

Fig. 2. The DB generation processes: (a) a depth silhouette, (b) its skeleton model of a walking activity, (c) body parts representation with Gaussian contours, (d) classified body parts
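The paper does not spell out how pixels are mapped to the Gaussian contours, so the following Python fragment is only a minimal sketch under one plausible reading: each body part is summarized as a 2D Gaussian (a center and covariance derived from the joint locations and body proportions), and every silhouette pixel is assigned to the part with the highest Gaussian response. All names here (label_silhouette, parts) are hypothetical.

```python
import numpy as np

def label_silhouette(depth_mask, parts):
    """Assign each silhouette pixel to the body part whose Gaussian
    contour scores highest (a sketch; the paper's exact contour
    parameters are not published).

    depth_mask : (H, W) bool array, True inside the silhouette
    parts      : list of (center_xy, covariance) pairs, one per part,
                 derived from the skeleton's 15 joint locations
    """
    H, W = depth_mask.shape
    ys, xs = np.nonzero(depth_mask)
    pix = np.stack([xs, ys], axis=1).astype(float)       # (N, 2) pixel coords

    scores = np.empty((len(parts), len(pix)))
    for k, (mu, cov) in enumerate(parts):
        d = pix - mu                                     # offsets from part center
        inv = np.linalg.inv(cov)
        scores[k] = np.exp(-0.5 * np.einsum('ni,ij,nj->n', d, inv, d))

    labels = np.full((H, W), -1, dtype=np.int32)         # -1 marks background
    labels[ys, xs] = scores.argmax(axis=0)               # best-scoring part per pixel
    return labels
```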

To train the RFs, the DB generated in Section 2.1 is used, as shown in Fig. 3; a rough sketch of this training stage follows the figure.

Fig. 3. Sequential flow of training trees using depth silhouettes and segmented body parts
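As a rough illustration of the training stage above, the sketch below fits an off-the-shelf random forest to per-pixel feature vectors. The feature values are random stand-ins so the snippet runs end to end (in practice they would be depth-difference responses around each pixel, as in [3]), and the forest size is illustrative rather than the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_PIXELS, N_FEATURES, N_PARTS = 10_000, 50, 23

X = np.random.rand(N_PIXELS, N_FEATURES)      # per-pixel features (stand-ins)
y = np.random.randint(0, N_PARTS, N_PIXELS)   # ground-truth part labels from the DB

# Forest size and depth are illustrative; the paper does not report them.
rf = RandomForestClassifier(n_estimators=3, max_depth=20)
rf.fit(X, y)

# At run time, every pixel of an incoming depth silhouette is pushed
# through the trained forest to obtain one of the 23 part labels.
part_labels = rf.predict(X[:100])
```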


2.2 Body Parts Labeling and Feature Extraction

With the trained RFs, incoming depth silhouettes are labeled into 23 body parts. Then, a set of 23 centroids is computed from every depth silhouette of an activity. Finally, from two consecutive sets of centroids, a set of motion parameters comprising motion magnitudes and directional angles is obtained. The overall procedure of body parts labeling and feature extraction is illustrated in Fig. 4, and a sketch of the centroid and motion parameter computation is given after the figure.

Fig. 4. Overall procedures of body parts labeling and feature extraction
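A minimal sketch of the feature computation, assuming labels is a per-pixel part-label image and taking the mean depth as each centroid's third coordinate. The paper does not specify its angle conventions, so the two angles below (in-plane direction and elevation) are assumptions.

```python
import numpy as np

def part_centroids(labels, depth, n_parts=23):
    """Centroid (x, y, mean depth) of each labeled body part in one silhouette."""
    cents = np.zeros((n_parts, 3))
    for k in range(n_parts):
        ys, xs = np.nonzero(labels == k)
        if len(xs):                                # skip parts occluded in this frame
            cents[k] = [xs.mean(), ys.mean(), depth[ys, xs].mean()]
    return cents

def motion_features(prev_cents, curr_cents):
    """Motion parameters between two consecutive frames: per-part motion
    magnitude plus two directional angles of the displacement vector."""
    d = curr_cents - prev_cents                    # (23, 3) displacement vectors
    mag = np.linalg.norm(d, axis=1)                # motion magnitude per part
    theta = np.arctan2(d[:, 1], d[:, 0])           # in-plane directional angle
    phi = np.arctan2(d[:, 2], np.linalg.norm(d[:, :2], axis=1) + 1e-9)  # elevation
    return np.concatenate([mag, theta, phi])       # one feature vector per frame
```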

2.3 Activity Recognition via HMM

To perform HAR, we train HMMs [2]. In training, the Linde-Buzo-Gray (LBG) clustering algorithm [2] is used to generate a codebook of size 32 from the motion parameter features. In our implementation, we use four-state left-to-right HMMs to encode the sequential events of the features. Finally, the trained HMMs are used to recognize activities. Fig. 5 shows an exemplary HMM of a right hand waving activity before and after training [2]: before training, the transition probabilities out of each state are initialized uniformly (e.g., 0.333 for the self, next-state, and skip transitions of the first two states), and after training they are adapted to the observed feature sequences.

Fig. 5. A right hand waving HMM (a) before and (b) after training
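The codebook and model setup can be sketched as follows. The LBG routine is the textbook splitting algorithm, not necessarily the authors' exact implementation, and the transition matrix mirrors the uniform initialization of Fig. 5(a); training one HMM per activity with Baum-Welch is noted in comments since the paper does not specify an implementation.

```python
import numpy as np

def lbg_codebook(features, size=32, eps=0.01, iters=20):
    """Linde-Buzo-Gray codebook generation [2] (textbook sketch): start
    from the global mean, split every codeword, then refine with
    Lloyd/k-means updates until the target size is reached."""
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])  # split
        for _ in range(iters):                     # Lloyd refinement
            idx = ((features[:, None] - codebook[None]) ** 2).sum(-1).argmin(1)
            for k in range(len(codebook)):
                members = features[idx == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

def quantize(features, codebook):
    """Map each frame's feature vector to its nearest codeword index,
    yielding the discrete symbol sequence consumed by the HMMs."""
    return ((features[:, None] - codebook[None]) ** 2).sum(-1).argmin(1)

# Four-state left-to-right transition matrix initialized as in Fig. 5(a):
# each state may stay, advance one state, or skip one, with equal weight.
A_init = np.array([[1/3, 1/3, 1/3, 0.0],
                   [0.0, 1/3, 1/3, 1/3],
                   [0.0, 0.0, 0.5, 0.5],
                   [0.0, 0.0, 0.0, 1.0]])
# One HMM per activity would then be trained on its symbol sequences with
# Baum-Welch; a test clip is assigned to the activity whose HMM gives the
# highest likelihood.
```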

3 Experimental Results

In this study, we utilized a PrimeSense camera [5] to acquire sequences of depth silhouettes (i.e., 10 frames per clip) of six human activities. In our HAR experiments, a set of 10 video clips was used in training and a set of 25 video clips for testing. Table 1 shows the HAR results of our proposed technique alongside those of a conventional HAR method [2], in which the principal component (PC) features of the depth silhouettes, after linear discriminant analysis (LDA), are used with HMMs. Our approach obtained superior recognition rates for all activities, with a mean recognition rate of 97.16%, whereas the conventional method achieved only 79.50%.


Table 1. Recognition results using a conventional method and our proposed labeled body parts based HAR

Activities            LDA on PC features [2]    Labeled body parts features
                      Recognition Rate (%)      Recognition Rate (%)
Walking               84.0                      98.50
Running               78.50                     95.0
Right Hand Waving     85.50                     97.50
Both Hand Waving      89.0                      100
Sitting-Down          72.50                     98.0
Standing-Up           67.50                     94.0
Mean                  79.50                     97.16

4 Conclusions

In this paper, we have presented a novel labeled body parts based HAR system using depth silhouettes. Our proposed HAR system utilizes motion features derived from the centroids of the labeled body parts and HMMs for activity recognition. Our experimental results have shown promising performance, achieving a mean recognition rate of 97.16%. The proposed methodology should be applicable to smart homes for better and more advanced HAR.

Acknowledgements. This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-(H0301-12-1004)). This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012-0000609).

References

1. Chan, M., Esteve, D., Escriba, C., Campo, E.: A review of smart homes - Present state and future challenges. Computer Methods and Programs in Biomedicine 91, 55–81 (2008)
2. Jalal, A., Uddin, M.Z., Kim, J.T., Kim, T.-S.: Recognition of Human Home Activities via Depth Silhouettes and R Transformation for Smart Homes. Indoor and Built Environment, 184–190 (2012)
3. Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-Time Human Pose Recognition in Parts from Single Depth Images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1297–1304 (2011)
4. Bosch, A., Zisserman, A., Munoz, X.: Image classification using random forests and ferns. In: IEEE International Conference on Computer Vision, pp. 1–8 (2007)
5. PrimeSense, http://www.primesense.com