ConvNets-Based Action Recognition from Depth Maps through Virtual Cameras and Pseudocoloring

Pichao Wang¹, Wanqing Li¹, Zhimin Gao¹, Chang Tang², Jing Zhang¹, and Philip Ogunbona¹
¹ Advanced Multimedia Research Lab, University of Wollongong, Australia; ² School of Electronic Information Engineering, Tianjin University, China
[email protected],
[email protected],
[email protected],
[email protected],
[email protected],
[email protected] B ACKGROUND
R OTATION TO MIMIC VIRTUAL CAMERAS
1. Action recognition has been an active research topic in computer vision due to its wide range of applications, including intelligent surveillance and human-computer interaction.
2. The release of the Microsoft Kinect sensor opens up new opportunities for action recognition.
3. Deep learning approaches have achieved great success in several kinds of applications.
Fig. 1: Action recognition (source: T. Lan et al.).
Fig. 2: Kinect sensors (source: Apple).
Fig. 3: Deep learning (source: VLab MIT).

ROTATION TO MIMIC VIRTUAL CAMERAS

[Figure (a): geometry of the virtual camera, showing the image center (C_x, C_y), focal length f, rotation angles θ and β, and the points P_o, P_t and P_d.]
The rotation of the 3D points can be performed equivalently by assuming that a virtual RGB-D camera moves around and points at the subject from different viewpoints. The coordinates of the subject with respect to the virtual camera can be computed by the transformation

\[ [X', Y', Z', 1]^{T} = T_{ry}\, T_{rx}\, [X, Y, Z, 1]^{T} \qquad (1) \]

where X', Y', Z' represent the 3D coordinates with respect to the virtual camera system, T_{ry} denotes the transformation around the Y axis (right-handed coordinate system), and T_{rx} denotes the transformation around the X axis:

\[ T_{ry} = \begin{bmatrix} R_y(\theta) & T_y(\theta) \\ 0 & 1 \end{bmatrix}; \quad T_{rx} = \begin{bmatrix} R_x(\beta) & T_x(\beta) \\ 0 & 1 \end{bmatrix} \qquad (2) \]

where

\[ R_y(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}, \quad T_y(\theta) = \begin{bmatrix} 0 \\ Z\sin\theta \\ Z(1-\cos\theta) \end{bmatrix}; \]

\[ R_x(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad T_x(\beta) = \begin{bmatrix} -Z\sin\beta \\ 0 \\ Z(1-\cos\beta) \end{bmatrix}. \]
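For concreteness, a minimal NumPy sketch of Eqs. (1)–(2) might look as follows; the function name, the (N, 3) point layout, and the use of a single subject depth Z are choices made for this example, not details given on the poster:

```python
import numpy as np

def rotate_points(points, theta, beta, z):
    """Eqs. (1)-(2): view the 3D points from a virtual camera at angles
    (theta, beta), keeping the subject centre at depth z fixed."""
    # T_ry: rotation by theta plus the compensating translation T_y(theta)
    t_ry = np.array([
        [1.0, 0.0,            0.0,           0.0],
        [0.0, np.cos(theta), -np.sin(theta), z * np.sin(theta)],
        [0.0, np.sin(theta),  np.cos(theta), z * (1.0 - np.cos(theta))],
        [0.0, 0.0,            0.0,           1.0],
    ])
    # T_rx: rotation by beta plus the compensating translation T_x(beta)
    t_rx = np.array([
        [ np.cos(beta), 0.0, np.sin(beta), -z * np.sin(beta)],
        [ 0.0,          1.0, 0.0,           0.0],
        [-np.sin(beta), 0.0, np.cos(beta),  z * (1.0 - np.cos(beta))],
        [ 0.0,          0.0, 0.0,           1.0],
    ])
    # homogeneous coordinates, then apply Eq. (1) to every point at once
    homog = np.hstack([points, np.ones((len(points), 1))])   # (N, 4)
    return (homog @ (t_ry @ t_rx).T)[:, :3]                  # (N, 3)
```

Sweeping θ and β over a small grid of angles then yields the synthesized training views.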
PROPOSED METHOD
The proposed method consists of two major components: three ConvNets [3] and the construction of DMMs [2] from sequences of depth maps as the input to the ConvNets. Given a sequence of depth maps, 3D points are created and three DMMs are constructed by projecting the 3D points onto three orthogonal planes. Each DMM serves as the input to a ConvNet for classification [4]. The final classification of the given depth sequence is obtained through a late fusion of the three ConvNets. Three strategies have been developed to deal with the challenges posed by small datasets. First, more training data are synthesized by rotating the input 3D points to mimic different cameras. Second, the same ConvNet architecture as the one used for ImageNet is adopted, so that the model trained on ImageNet can be adapted to our problem through transfer learning. Third, each DMM goes through a pseudo-color coding process that separates and enhances different motion patterns in the pseudo-RGB channels before being input to the ConvNets.

[Architecture figure: the input depth maps pass through rotation and pseudocoloring to form DMM_f, DMM_s and DMM_t; each feeds a ConvNet (conv1 with 11×11 filters through conv5, fc6 and fc7 with 4096 units each, and fc8), followed by class-score fusion.]
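The DMM construction step can be sketched as below. This is a simplified illustration in the spirit of [2]: the binary silhouette projection, the depth_bins parameter, and the absence of background masking and thresholding are simplifications made for the example.

```python
import numpy as np

def depth_motion_maps(depth_seq, depth_bins=256):
    """Project each depth frame onto the front, side and top planes and
    accumulate the motion energy between consecutive frames."""
    num_frames, height, width = depth_seq.shape
    front = depth_seq.astype(np.float64)              # front view: the map itself
    side = np.zeros((num_frames, height, depth_bins))
    top = np.zeros((num_frames, depth_bins, width))
    rows, cols = np.indices((height, width))
    for t in range(num_frames):
        d = np.clip(depth_seq[t].astype(int), 0, depth_bins - 1)
        side[t, rows, d] = 1.0                        # binary silhouette on Y-Z
        top[t, d, cols] = 1.0                         # binary silhouette on X-Z

    def motion(frames):
        # sum of absolute differences between consecutive projected frames
        return np.abs(np.diff(frames, axis=0)).sum(axis=0)

    return motion(front), motion(side), motion(top)   # DMM_f, DMM_s, DMM_t
```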
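The final class-score fusion is only named on the poster, not specified; a plain averaging rule, shown purely as an illustrative sketch, would be:

```python
import numpy as np

def fuse_scores(scores_front, scores_side, scores_top):
    """Late fusion of the per-class scores of the three ConvNets.
    Averaging is one common rule; the poster does not fix the choice."""
    fused = (scores_front + scores_side + scores_top) / 3.0
    return int(np.argmax(fused))   # index of the predicted action class
```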
PSEUDOCOLORING
Motivated by the work [1], in which color coding harnesses the perceptual capabilities of the human visual system to extract more information from gray images and hence effectively enhances the detailed texture patterns they contain, this paper proposes to code each DMM into a pseudo-color image so as to effectively exploit and enhance the texture in the DMMs that corresponds to the motion patterns of actions. Given the normalized gray level I, the three channels are generated as

\[ C_{i}(I) = \left\{\sin\left[2\pi\cdot(-I+\varphi_i)\cdot\frac{1}{2}+\frac{1}{2}\right]\right\}^{2}\cdot f(I), \quad i = 1, 2, 3, \qquad (3) \]

where \varphi_i is the phase of channel i and f(I) is an amplitude-modulation function.

[Figure: corresponding normalized color value of the R, G and B channels versus normalized gray level, plotted for amplitude-modulation settings α = 1 and α = 10.]
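As an illustration, Eq. (3) can be implemented as below; the phase values φ_i and the power-law form assumed for the amplitude modulation f(I) (with exponent parameter α, matching the α = 1 and α = 10 curves in the figure) are guesses made for the sketch, not values given on the poster:

```python
import numpy as np

def pseudocolor(dmm, phases=(0.0, 1.0 / 3.0, 2.0 / 3.0), alpha=1.0):
    """Map a normalized gray-level image I in [0, 1] to pseudo-RGB, Eq. (3)."""
    f = dmm ** (1.0 / alpha)    # assumed power-law amplitude modulation f(I)
    channels = [
        np.sin(2.0 * np.pi * (-dmm + phi) * 0.5 + 0.5) ** 2 * f  # Eq. (3)
        for phi in phases
    ]
    return np.stack(channels, axis=-1)   # H x W x 3 pseudo-color image
```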
EXPERIMENTAL RESULTS

REFERENCES
1. B. R. Abidi, Y. Zheng, A. V. Gribok, and M. A. Abidi, "Improving weapon detection in single energy X-ray images through pseudocoloring," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 36(6):784–796, 2006.
2. X. Yang, C. Zhang, and Y. Tian, "Recognizing actions using depth motion maps-based histograms of oriented gradients," ACM MM, 2012.
3. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," NIPS, 2012.
4. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv:1408.5093, 2014.