Learning Sparse Representations for Human Action Recognition - IITK
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. , NO. , JULY 2011
Learning Sparse Representations for Human Action Recognition

Tanaya Guha, Student Member, IEEE, and Rabab K. Ward, Fellow, IEEE

Abstract—This paper explores the effectiveness of sparse representations obtained by learning an overcomplete set of basis vectors (a dictionary) in the context of action recognition in videos. Although this work concentrates on recognizing human movements, both physical actions and facial expressions, the proposed approach is fairly general and can be used to address other classification problems. To model human actions, three overcomplete dictionary learning frameworks are investigated. An overcomplete dictionary is constructed from a set of spatio-temporal descriptors (extracted from the video sequences) in such a way that each descriptor is represented by a linear combination of a small number of dictionary elements. This leads to a more compact and richer representation of the video sequences than existing methods that involve clustering and vector quantization. For each framework, a novel classification algorithm is proposed. This work also presents a new local spatio-temporal feature that is distinctive, scale invariant and fast to compute. The proposed approach repeatedly achieves state-of-the-art results on several public datasets containing various physical actions and facial expressions.

Index Terms—Action recognition, dictionary learning, expression recognition, overcomplete, orthogonal matching pursuit, sparse representation, spatio-temporal descriptors.
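As an illustration of the sparse representation described in the abstract, the following is a minimal sketch of orthogonal matching pursuit, which the index terms list. It uses the paper's notation b = Φx, and it is a textbook form of the greedy algorithm, not necessarily the exact procedure used by the authors. The function name `omp`, its parameter names, the stopping tolerance and the toy dictionary are all assumptions made for this example.

```python
import numpy as np


def omp(dictionary, signal, sparsity, tol=1e-10):
    """Greedy sparse coding: approximate `signal` (b) using at most
    `sparsity` columns (atoms) of `dictionary` (Phi).

    Hypothetical helper for illustration; returns the coefficient vector x.
    """
    n_atoms = dictionary.shape[1]
    coefficients = np.zeros(n_atoms)
    residual = np.asarray(signal, dtype=float).copy()
    support = []  # indices of the atoms selected so far

    for _ in range(sparsity):
        # Select the atom most correlated with the current residual.
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(correlations)))

        # Re-fit all selected atoms jointly by least squares.
        sub_dictionary = dictionary[:, support]
        weights, *_ = np.linalg.lstsq(sub_dictionary, signal, rcond=None)

        coefficients[:] = 0.0
        coefficients[support] = weights
        residual = signal - sub_dictionary @ weights

        # Stop early once the signal is reproduced (assumed tolerance).
        if np.linalg.norm(residual) < tol:
            break

    return coefficients


# Toy overcomplete dictionary: 3 unit-norm atoms in a 2-dimensional space.
phi = np.array([[1.0, 0.0, 1.0 / np.sqrt(2.0)],
                [0.0, 1.0, 1.0 / np.sqrt(2.0)]])
b = np.array([2.0, 0.0])

x = omp(phi, b, sparsity=1)
print(x)
```

The joint least-squares re-fit at each step is what distinguishes orthogonal matching pursuit from plain matching pursuit: the residual stays orthogonal to every atom already selected.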
1 INTRODUCTION
SPARSE signal representation has emerged as an extremely successful tool for analyzing a large class of signals. Many signals, such as audio, images and video, can be efficiently represented as a linear superposition of only a small number of properly chosen basis functions. Although orthogonal bases such as Fourier or wavelet bases are widely used, the latest trend is to use an overcomplete basis, in which the number of basis vectors is greater than the dimensionality of the input vector. An overcomplete set of basis vectors (called a dictionary) can represent the essential information in a signal using a very small number of non-zero elements. This leads to more sparsity in the transform domain than that achieved by sinusoids or wavelets alone. Such a compact representation of signals is desired in many applications involving efficient signal modeling. An overcomplete basis, however, introduces greater difficulties, because a full-rank dictionary matrix Φ ∈ R^{n×m} (n < m) creates an underdetermined system of linear equations b = Φx that has infinitely many solutions. The goal is to find a sparse solution, i.e., x ∈ R^m should contain no more than k (k