Accelerated Learning of Discriminative Spatio-temporal Features for Action Recognition

Munender Varshney
School of Computing and Electrical Engineering
Indian Institute of Technology
Mandi, Himachal Pradesh, India
Email: munender [email protected]

Renu Rameshan
School of Computing and Electrical Engineering
Indian Institute of Technology
Mandi, Himachal Pradesh, India
Email: [email protected]

Abstract—Recently, the paradigm has shifted from hand-designed local features to unsupervised learning of features from raw data. In action recognition, good results are achieved using deep learning techniques such as stacking and convolution to extend the idea of independent subspace analysis (ISA). Although the performance is good, training takes a significant amount of time on big datasets due to high computational complexity and sequential implementation. We propose two methods for speeding up feature learning with ISA. We also propose an input data modification which increases the classification performance. The first method for faster feature learning is parallelization: we use the scalable programming model MapReduce to parallelize the ISA algorithm by distributing the dataset into equal disjoint sets. The second method for increasing speed uses spatio-temporal interest point detectors to extract "important" blocks from the video. The latter not only enhances the speed but also improves the classification accuracy. We modified the input to be the gradient of the video and achieved a better classification accuracy on all the datasets that were tested. We also created a dataset of water activities and used the ISA network for feature extraction. We achieved speed-ups by factors of 4 and 2.4 with the first and second methods, respectively.

I. INTRODUCTION

Human action recognition is an area of research with a variety of unsolved problems and with applications in surveillance, shopping behaviour analysis, robotics, etc. These systems are challenging to build because recognition is inherently a difficult task [1]. Recognition is done in two phases: first, feature extraction to discriminate between activities, followed by classification based on the extracted features. Feature extraction can be either supervised or unsupervised. Supervised feature extraction is problem-dependent and requires domain knowledge, while in unsupervised methods features are extracted directly from the raw data. In a recent work by Wang et al. [2] it is concluded that there is no single best hand-designed feature that works for all datasets. This necessitates learning features from the data itself. Inspired by the success of deep neural networks, researchers have used convolutional neural networks (CNN) [3], [4], [5], deep belief nets [6] and sparse learning algorithms based on biological systems [7], [8] for learning hierarchical representations of local features. Quoc et al. [9] used independent subspace analysis (ISA), an extension of independent component analysis (ICA), to learn filters that resemble the receptive fields of simple cells in the primary visual cortex [7], [10].

They extended the idea of ISA to the video domain, learning a hierarchical representation of spatio-temporal features using convolution and stacking to make the algorithm fast and scalable. This methodology works well and leads to good classification results, but has the drawback of expensive computation in terms of time: training the network takes 3 h 43 min for a dataset of size 3.4 GB on a machine with a 16-core Intel(R) Xeon(R) @ 2.3 GHz processor and 16 GB of RAM. This is due to the high computational complexity of the matrix operations involved. With increasing data volume, there is a need to speed up the system through parallel processing. We propose to use MapReduce [11] by Google, which works similarly to the single instruction multiple data (SIMD) architecture of a processor. In MapReduce [11], data is divided equally into disjoint sets and then processed independently by workers in the cluster, which are controlled by a master. Each worker executes a task, which is a higher-level map or reduce function [11]. Map processes key/value pairs and generates a set of intermediate key/value pairs, which are then shuffled and sorted according to these intermediate keys. The reduce function merges the intermediate values according to these intermediate keys. This model can express many real-world tasks in a simple way.
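As a toy, framework-free illustration of the map/shuffle/reduce flow just described, the following Python sketch counts words; the task and all names are illustrative only and do not appear in the paper.

    from collections import defaultdict

    # map emits intermediate (key, value) pairs; shuffle groups them by key;
    # reduce merges the values that share a key.
    def map_fn(record):
        for word in record.split():
            yield word, 1                  # intermediate key/value pair

    def reduce_fn(key, values):
        return key, sum(values)            # merge values with the same key

    records = ["walk run", "run dive", "run"]
    shuffled = defaultdict(list)
    for record in records:                 # map phase
        for k, v in map_fn(record):
            shuffled[k].append(v)          # shuffle/sort by intermediate key
    results = [reduce_fn(k, vs) for k, vs in shuffled.items()]   # reduce phase
    print(results)                         # [('walk', 1), ('run', 3), ('dive', 1)]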



The contributions of this paper are summarized below:

• We use a distributed algorithm for training the ISA layers in the form of higher-level MapReduce functions using RDDs [12], an abstraction (provided by Spark) responsible for efficient processing of iterative algorithms, and achieve a speed-up by a factor of 4.
• We use spatio-temporal interest point detectors [13] to extract discriminative cuboids around the interest points, which are then used to train ISA, and achieve a speed-up of 2.4. In addition to the enhanced speed, this gives the best classification performance.
• The classification performance is improved with the modified gradient input, while it is reduced with edge information.
• We have also built a dataset of human water activities for surveillance purposes, which can be useful in real-time decision making to avoid accidents.

In Section II we give an overview of related work, and Section III describes ISA and its multilayer architecture in detail. Section IV discusses different methods of speed enhancement for ISA. Section V introduces various methods for improving the classification performance, while Section VI gives an overview of the human-water activity dataset we developed. Section VII describes the experimental results, followed by the conclusion.

Fig. 1: Discontinuous frames, a few seconds apart, from different videos of the water-activity dataset. The first, second and third rows of images belong to backstroke, freestyle and diving, respectively. This dataset consists of 202 videos of different human-water activities.

II. PREVIOUS WORK

Owing to the success of biologically inspired convolutional neural networks [3] for object recognition in images, the research community extended this idea to the video domain using different neural network architectures such as 3D CNN [14], compact CNN [15] and two-stream CNN [16]. These convolutional networks outperform other hierarchical models since they simulate the visual system in the way it represents different hierarchical levels of features that are unaffected by irrelevant variations in the environment when recognizing objects [3]. These convolutional neural networks are computationally intensive due to huge matrix operations. Quoc et al. [9] used the ideas of stacking and convolution to make the ISA algorithm fast and scalable in the video domain, achieving a hierarchical representation of features invariant to phase and shift. A PCA layer is included between the two ISA layers to reduce the dimension of the feature vectors produced by the first ISA layer. This reduces the training time of the network significantly [9]. Even with this improvement, increasing the data size by a few gigabytes increases the time required for training by a few hours. Using a graphics processing unit (GPU) reduces the training time of deep neural networks [17], but the GPU's main memory acts as a bottleneck and restricts scalability. A more feasible solution is to use clusters. GraphLab [18] and MapReduce [11] are computational models for processing large datasets using parallel distributed algorithms on a cluster. The MapReduce model simplifies the implementation of data processing systems with a high degree of parallelism and tolerance [12] to failure of nodes in between. Zhang et al. [19] used this programming model to speed up the training of deep belief nets with stacked restricted Boltzmann machines (RBMs) [6]. They proposed a distributed back-propagation algorithm for training deep belief nets in the form of consecutive map and reduce functions over the Spark [20] framework.

III. OVERVIEW OF ISA AND MULTILAYER ISA

A. ISA

ISA [7] is a neural network which consists of two layers, with square and square-root non-linear activation functions respectively, as shown in the dotted circle of Fig. 2. The first layer is a simple layer, followed by a pooling layer which combines the outputs of the simple-layer neurons. The weights W of the first layer are learned while keeping fixed the weights V of the second layer, which represent the subspace structure of the second-layer neurons. The input to the network is an unlabelled video dataset divided into fixed-size video blocks. Details of the learning algorithm can be found in [9].

Fig. 2: Multilayer ISA is formed by replicating the learned ISA over larger video blocks, and the output is sent to the next ISA layer. Here we show the video blocks as non-overlapping, but experimentally they are overlapping.

B. Multilayer ISA

Quoc et al. [9] keep the architecture of the convolutional network simple, with only two ISA layers, to achieve a hierarchical representation of features as shown in Fig. 2. Training this network using projected gradient descent is expensive because the orthogonalization of the weights W [9] in each iteration of the ISA algorithm costs O(N^3), where N is the size of the input videos. To scale the network to larger inputs, a convolutional architecture with ISA as a sub-unit is used. Owing to the convolution operation, the dimension of the feature vector is large. Also, since the convolution is done over adjacent overlapping blocks, the resulting features can be similar. Hence a PCA layer is introduced between the two ISA layers to reduce the dimension, which in turn reduces the training time. The first ISA layer has a subspace structure of size one and a weight matrix V equal to the identity. The multilayer ISA architecture shown in Fig. 2 is trained by replicating the first learned ISA layer over the bigger video blocks. The second ISA layer has the V matrix given in equation (1) and a subspace structure of size two, as shown in the dotted circle of Fig. 2.

$$V = \begin{bmatrix} 1 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 1 \end{bmatrix} \qquad (1)$$

In detail, we take blocks of size 16 × 16 × 10 for learning the orthogonal weight kernels W of the first ISA layer. We then convolve these kernels over larger blocks of size 20 × 20 × 14 and obtain feature vectors of dimension 2400 (= 300 × 8). The PCA layer reduces these feature vectors to 200 dimensions, along the directions of highest variation in the features. We can visualize the above process as follows.

Fig. 3: These three sets of figures show the input patch, the PCA video basis vectors, and the linear combination of the PCA bases weighted by a row of the weight matrix W of the first ISA layer in the multilayer ISA architecture. We show the 1st, 4th, 7th and 10th subframes from the block of size 16 × 16 × 10 in each group.

The PCA at the input provides a representation of the video in a 300-dimensional space whose bases correspond to the most significant eigenvectors. These video representations are transformed into an orthogonal feature space defined by the weight matrix W of the ISA layer. We observe (Fig. 3) that the feature space defined by the first ISA layer is composed of edges, which agrees with the findings in [7], [10]. The second layer learns the motion of edges, since it processes the edge information obtained by convolving the first ISA layer over a bigger block.
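The two-layer ISA activation described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions (a flattened input block and small, randomly initialized weights); the dimensions are not the 300/200-dimensional kernels used in the paper.

    import numpy as np

    def isa_activation(x, W, V):
        """x: flattened input block; W: simple-layer weights (orthonormal rows);
        V: fixed pooling matrix encoding the subspace structure, as in eq. (1)."""
        simple = (W @ x) ** 2            # first layer: squared linear responses
        pooled = np.sqrt(V @ simple)     # second layer: square-root pooling over subspaces
        return pooled

    # Illustrative sizes: 8 simple units over a flattened 16x16x10 block,
    # grouped in pairs (subspace structure of size two, as in eq. (1)).
    k, d = 8, 16 * 16 * 10
    W = np.linalg.qr(np.random.randn(d, k))[0].T    # orthonormal rows, as ISA requires
    V = np.kron(np.eye(k // 2), np.ones((1, 2)))    # [[1,1,0,...],[0,0,1,1,...],...]
    features = isa_activation(np.random.randn(d), W, V)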

Fig. 4: MapReduce pipeline for ISA training (stages MR1, MR2, ..., MRn), with detail of one MapReduce stage for the multiplication operation. In the map stage the key is the matrix index and its corresponding entry is the value. Entries of the first and second matrices are multiplied if the column index of the first equals the row index of the second, and then all values with the same key are aggregated.

IV. METHODS FOR SPEED ENHANCEMENT

A. Map-Reduce for Multilayer ISA

The training algorithm for ISA has expensive operations such as matrix multiplication and orthogonalization, which are expressed in the form of distributed MapReduce [11] jobs as shown in Fig. 4. Iterative MapReduce jobs restrict the usability of Hadoop [21], so we use Spark [20] because of its capability to process iterative MapReduce algorithms. Spark [20] utilizes a cluster with multiple nodes and has a large amount of distributed memory for performing "in memory" computation. Spark provides powerful data structures such as RowMatrix, IndexedRowMatrix and BlockMatrix, which can hold a huge training-data matrix over a cluster in a distributed fashion. The training starts with a mapper which maps each input matrix index to a key and its entry to the corresponding value. The key/value pairs used for pre-processing generate intermediate key/value pairs. On the other side, the reducer takes those intermediate key/value pairs and aggregates the values with the same matrix index (intermediate key). A complex data analytics application consists of multiple MapReduce jobs run in an iterative fashion. Similarly, in training, we have multiple MapReduce jobs with different functionalities to obtain the derivatives of a network layer with respect to its consecutive layer; for instance, the derivative of the output neurons at the second layer with respect to the first is an inverse square-root function. In the same way, we calculate the gradient of the objective function by applying the chain rule over the ISA layers in the form of multiple MapReduce jobs. The intermediate results remain in distributed main memory to reduce I/O cost and allow "in memory" computation to be performed iteratively.
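As a rough sketch of the key/value style multiplication stage of Fig. 4, the following PySpark snippet multiplies two matrices stored as RDDs of (row, column, value) entries. The variable names, the toy data and the use of plain RDDs (rather than Spark's BlockMatrix or IndexedRowMatrix types) are our own simplification, not the paper's implementation.

    from pyspark import SparkContext

    sc = SparkContext(appName="isa-matmul-sketch")

    # A (m x k) and B (k x n) as RDDs of (row, col, value) entries (toy data).
    A = sc.parallelize([(i, j, float(i + j)) for i in range(4) for j in range(3)])
    B = sc.parallelize([(j, l, float(j * l)) for j in range(3) for l in range(5)])

    # map: key each entry of A by its column index and each entry of B by its
    # row index, so entries that must be multiplied share the same key j.
    A_by_j = A.map(lambda e: (e[1], ('A', e[0], e[2])))
    B_by_j = B.map(lambda e: (e[0], ('B', e[1], e[2])))

    # shuffle/join on j, emit partial products keyed by output position (i, l),
    # then reduce: aggregate all partial products with the same key.
    partial = A_by_j.join(B_by_j).map(
        lambda kv: ((kv[1][0][1], kv[1][1][1]), kv[1][0][2] * kv[1][1][2]))
    C = partial.reduceByKey(lambda a, b: a + b)   # entries of C = A * B

    print(sorted(C.collect())[:5])
    sc.stop()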

The optimization problem of finding diverse sparse features in video using the ISA algorithm [9] requires orthogonalization of the weight matrix. We use LU decomposition for scalable matrix inversion, similar to Xiang's work [22], because LU decomposition [23] has a higher level of parallelism than QR decomposition, singular value decomposition (SVD) and Gauss-Jordan elimination [24]. The process includes multiplying the weight matrix by the inverse of its Gramian matrix. We decompose the Gramian matrix into lower and upper triangular matrices using LU decomposition. The corresponding inverses, with the second transposed, are multiplied to obtain the inverse Gramian matrix in the form of a MapReduce job, as shown in Fig. 4. The process explained above tunes the weight parameters of the first ISA layer. The bigger blocks are distributed so that the convolution with the learned kernels of the first ISA layer runs in parallel. We use the IndexedRowMatrix data type from Spark [20], which distributes the bigger 20 × 20 × 14 blocks and the weight kernels over the cluster to make the computation parallelizable. The mapper reshapes the video vector into 3D blocks of size 16 × 16 × 10, and the reducer takes the inner product of the kernels with these blocks. The combined output of dimension 2400 (= 300 × 8) serves as input to the next ISA layer, which can be trained in the same way as discussed above for the first ISA layer.

B. ISA on the spatio-temporal interest points

We use the spatio-temporal detector proposed by Dollár et al. [13], which is a separable linear filter capable of detecting a sufficiently large number of interest points. The detector assumes a stationary camera or a process that accounts for camera motion. To find interest points, separable linear filters are applied to the videos to compute the response function

$$R = (I * g * h_{ev})^2 + (I * g * h_{od})^2 \qquad (2)$$

where $g(x, y; \sigma)$ is the 2D Gaussian smoothing kernel applied along the spatial dimensions $(x, y)$, and $h_{ev}$ and $h_{od}$ are a quadrature pair of 1D Gabor filters applied temporally, defined as $h_{ev}(t; \tau, \omega) = \cos(2\pi t\omega)e^{-t^2/\tau^2}$ and $h_{od}(t; \tau, \omega) = \sin(2\pi t\omega)e^{-t^2/\tau^2}$. Here $\sigma$ and $\tau$ correspond to the spatial and temporal scales, respectively. The parameters of the response function are reduced to two by setting $\omega = 4/\tau$. Normally, complex or periodic motion gives a high response. The local maxima of the response function are taken as interest points. We extract a cuboid at each such point, containing the spatio-temporally windowed pixel values that contribute to the response function. $\sigma$ and $\tau$ are adjusted experimentally; the best results are obtained with 7 × 7 × 11 cuboids around the interest points. These cuboids of dimension 7 × 7 × 11 are used to train the ISA network. The interest point detector adds some computation, but at the same time it significantly reduces the data size by selecting only the important cuboids. Moreover, convolution is not needed at the first ISA layer, which further reduces the time complexity and thereby increases the speed of feature learning.
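A rough NumPy/SciPy sketch of the response function in eq. (2) is given below, assuming the video is a (T, H, W) array of grey-level frames. The values of sigma, tau and the filter support are illustrative, not the tuned values used in the paper.

    import numpy as np
    from scipy.ndimage import gaussian_filter, convolve1d

    def cuboid_response(video, sigma=2.0, tau=3.0):
        """Response R = (I*g*h_ev)^2 + (I*g*h_od)^2 for a (T, H, W) video."""
        omega = 4.0 / tau                                  # as in the paper, w = 4/tau
        t = np.arange(-2 * int(tau), 2 * int(tau) + 1)
        h_ev = np.cos(2 * np.pi * t * omega) * np.exp(-t**2 / tau**2)   # even Gabor
        h_od = np.sin(2 * np.pi * t * omega) * np.exp(-t**2 / tau**2)   # odd Gabor

        smoothed = gaussian_filter(video.astype(float), sigma=(0, sigma, sigma))
        even = convolve1d(smoothed, h_ev, axis=0)          # temporal filtering
        odd = convolve1d(smoothed, h_od, axis=0)
        return even ** 2 + odd ** 2

    # Interest points are the local maxima of this response; 7 x 7 x 11 cuboids
    # around them are then used to train the ISA network.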

V. METHODS TO IMPROVE CLASSIFICATION

We noted that performance is also improved by training ISA using the cuboids extracted at spatio-temporal interest points. In addition, we also used gradient and edge information as input to improve performance, since the ISA algorithm learns Gabor-like filters [7], [10]. In detail, we concatenate the X and Y components of the pixel gradients in the video volume as $\{x_t^x, x_t^y\}$, where $x_t^x$ and $x_t^y$ are the gradients of the frames in the X and Y directions, respectively. The gradient captures the intensity variation of the pixels, which is similar to edge information, so the network should learn a better feature representation.
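A possible way to form this gradient input is sketched below in NumPy, assuming the video block is a (T, H, W) array of grey-level frames; the choice of central differences and of stacking the two components as channels are our own assumptions, since the exact operator is not specified above.

    import numpy as np

    def gradient_input(video):
        """video: (T, H, W) array -> (T, H, W, 2) array of X and Y gradients."""
        gx = np.gradient(video.astype(float), axis=2)   # X-direction gradient per frame
        gy = np.gradient(video.astype(float), axis=1)   # Y-direction gradient per frame
        return np.stack([gx, gy], axis=-1)              # concatenate the two components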

Method                      Execution time
Quoc et al. [9]             3 h 43 min
MapReduce (proposed)        54 min
Interest point (proposed)   93 min

TABLE I: Speed-up comparison for the Hollywood2 dataset on a machine with a 16-core Intel(R) Xeon(R) @ 2.3 GHz processor and 16 GB of RAM.

Fig. 5: The graph shows the relation between the execution time (in minutes) on the y-axis and the number of nodes running the ISA algorithm on the x-axis.

We also used edges, which are areas of strong variation in pixel intensity. This reduces the training data and provides only the structural features of an image. The Canny edge detection algorithm is applied to all the frames of a video, replacing each frame with a binary edge map. The performance is measured for each of these inputs to the network in place of raw pixel intensity.
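A minimal sketch of this edge-based preprocessing, using OpenCV's Canny detector, is shown below; the thresholds (100, 200) are illustrative and not values tuned in the paper.

    import cv2
    import numpy as np

    def edge_input(video):
        """video: (T, H, W) uint8 grey-level frames -> (T, H, W) binary edge maps."""
        edges = [cv2.Canny(frame, 100, 200) for frame in video]
        return (np.stack(edges, axis=0) > 0).astype(np.uint8)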

VI. DATASET FOR WATER ACTIVITIES

The dataset was developed to study human water activities, for surveillance near swimming pools and other water bodies and for classifying different swimming styles. It contains various water activities such as drowning, diving and floating, and different styles of swimming such as backstroke, breaststroke, freestyle and butterfly. Sample frames from some of the videos are given in Fig. 1. We collected videos from various sources such as YouTube and sports websites and then edited them so that each contains only one activity, with a duration of 2-3 min, a frame rate of 25 fps and a resolution of 176 × 144. The dataset contains 202 videos, of which 150 are used for training and 52 for testing. The dataset is pre-processed to test the efficiency of Quoc's experimental framework [9] with different inputs such as pixel intensity and its gradient. We also tested the dataset for the diversity present in the features of multilayer ISA over Quoc's experimental framework [9]. The network gives appreciable results, as shown in Table II, with 300- and 200-dimensional weight kernels at the first and second ISA layers, respectively. The results are discussed in the next section.

VII. EXPERIMENTAL RESULTS

The MapReduce [11] implementation of the ISA algorithm reduces the training time due to parallel processing over a cluster. In Fig. 5 the time required for training (in minutes) is plotted against the number of nodes. It can be observed that the time needed decreases drastically with an increasing number of nodes. The data required for training should be large enough, otherwise the overhead of parallelization will reduce the performance. We tested our MapReduce [11] implementation on a machine with a 16-core Intel(R) Xeon(R) @ 2.3 GHz processor and 16 GB of RAM, with different datasets: the water-activity dataset, KTH [25], Weizmann [26] and Hollywood2 [27], of size 0.577 GB, 1.2 GB, 0.350 GB and 3.5 GB, respectively. Fig. 5 also shows that the larger the data size, the higher the gain in terms of time reduction.

The spatio-temporal detector [13] removes the irrelevant video blocks and extracts a sufficient number of blocks that are discriminative enough to recognize the activity. ISA is applied to these blocks without convolution. These blocks are far fewer than in Quoc's framework [9], which takes the complete set of videos and applies convolution over them. This reduces the training time while improving the classification results, as shown in Tables I and III. The cuboid size was determined empirically as the one giving the best performance across all datasets. We applied the multilayer ISA network to the water-activity, KTH [25], Weizmann [26] and Hollywood2 [27] datasets with the gradient as input, extracting features using convolution and stacking in the same way as Quoc et al. [9]. These features are used for classification with the pipeline proposed by Wang [2], which performs k-means clustering on the features, builds a histogram over the cluster centres for each video, and finally classifies with an SVM.
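A compact sketch of this bag-of-features pipeline is given below using scikit-learn; the number of clusters and the SVM settings are illustrative choices rather than the values used in [2].

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def fit_pipeline(train_feats, train_labels, k=300):
        """train_feats: list of (n_i, d) arrays of local ISA features, one per video."""
        kmeans = KMeans(n_clusters=k, n_init=10).fit(np.vstack(train_feats))

        def histogram(feats):
            h = np.bincount(kmeans.predict(feats), minlength=k).astype(float)
            return h / max(h.sum(), 1.0)            # normalised bag-of-words histogram

        X = np.array([histogram(f) for f in train_feats])
        svm = SVC(kernel="rbf").fit(X, train_labels)
        return kmeans, histogram, svm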

In Table II below we show the classification matrix for the human water-activity dataset, where each row corresponds to the SVM for a given activity and each column represents an activity.

Activity            F1   B1   B2   B3   F2   D1   D2
Floating (F1)       45    0    0    0    1    0    0
Backstroke (B1)      2   47    0    0    1    0    0
Butterfly (B2)       0    1   45    2    0    0    0
Breaststroke (B3)    0    0    1   45    2    0    0
Freestyle (F2)       1    0    1    0   43    0    0
Diving (D1)          0    0    0    0    0   52    0
Drowning (D2)        1    0    0    0    0    0   47

TABLE II: Classification results on the water-activity dataset using multilayer ISA. The abbreviation used for each water activity is shown in brackets next to its name. Diagonal elements consist of true positives plus true negatives.

The diagonal entries are the numbers of videos correctly classified as having or not having the given activity, and the off-diagonal entries represent activities wrongly classified as other activities. The wrong identifications are due to the resemblance of one activity to another, e.g., backstroke to floating, breaststroke to butterfly, and sometimes freestyle to breaststroke. It can be seen in Table II that the number of misclassified activities is small, owing to the diversity present in the features resulting from the orthogonality constraint on the weight kernels. We also tested negative examples of a ship floating in water, and the classification error was 8.3%. Since the classification results for water activities are good, this could be helpful in avoiding accidents near water bodies. We tested the multilayer ISA using gradient and edges as input on the water-activity, KTH [25], Weizmann [26] and Hollywood2 [27] datasets and compared the results with Quoc's framework [9] in Table III.

Dataset          Quoc et al. [9]   Gradient (proposed)   Edges (proposed)   Interest point (proposed)
KTH              93.8 %            95.77 %               90.82 %            97.04 %
Water Activity   89.23 %           91.3 %                88.46 %            91.8 %
Weizmann         90.11 %           91.31 %               89.93 %            91.66 %
Hollywood        92.33 %           92.42 %               91.81 %            92.63 %

TABLE III: Performance of the framework [9] with different inputs such as pixel intensity, gradient and edges of the block.

Table III clearly shows an improvement of approximately 2.1%, 2%, 1.2% and less than 1% with the gradient input. On the Hollywood2 dataset, the improvement is not significant due to the absence of repetitive motion in the local regions, as the convolution is performed over a larger block. On the other hand, KTH shows better performance due to the presence of recursive, localized motion patterns. The results with edge videos as input are poor. This may indicate that strong edges alone are not sufficient to learn good features; intensity variations in other regions are also important.

VIII. CONCLUSION

In this paper, we presented two methods to speed up ISA-based classification of videos. First, we developed a distributed parallel version of ISA in the form of iterative MapReduce functions. Second, we reduced the data by taking the video blocks at high-response interest points and then applied ISA without convolution. We achieved a 4-fold and a 2.4-fold reduction in training time in the first and second case, respectively.


We also created a new dataset of human-water activities, which has 202 videos of various activities such as diving, drowning, floating and different strokes of swimming. We used the video gradient instead of pixel intensity as input to the multilayer ISA to improve the quality of the features it learns, resulting in better classification over the water-activity dataset, KTH [25] and Weizmann [26].


For the new dataset we could classify activities with 91.8% accuracy. The classifier could clearly distinguish drowning from the other activities and hence could be used in security and surveillance near water bodies.

REFERENCES

[1] N. Pinto, D. D. Cox, and J. J. DiCarlo, "Why is real-world visual object recognition hard?" 2008.
[2] H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid, "Evaluation of local spatio-temporal features for action recognition," in BMVC 2009 - British Machine Vision Conference. BMVA Press, 2009, pp. 124.1-124.11.
[3] Y. LeCun, K. Kavukcuoglu, and C. Farabet, "Convolutional networks and applications in vision," in Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on. IEEE, 2010, pp. 253-256.
[4] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, "Large-scale video classification with convolutional neural networks," in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014, pp. 1725-1732.
[5] G. W. Taylor, R. Fergus, Y. LeCun, and C. Bregler, "Convolutional learning of spatio-temporal features," in Computer Vision - ECCV 2010. Springer, 2010, pp. 140-153.
[6] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[7] A. Hyvärinen and P. Hoyer, "Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces," Neural Computation, vol. 12, no. 7, pp. 1705-1720, 2000.
[8] H. Jhuang, T. Serre, L. Wolf, and T. Poggio, "A biologically inspired system for action recognition," in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, Oct 2007, pp. 1-8.
[9] Q. V. Le, W. Y. Zou, S. Y. Yeung, and A. Y. Ng, "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 3361-3368.
[10] J. H. van Hateren and D. L. Ruderman, "Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex," Proceedings of the Royal Society of London B: Biological Sciences, vol. 265, no. 1412, pp. 2315-2320, 1998.
[11] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[12] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing," in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2012, pp. 2-2.
[13] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, "Behavior recognition via sparse spatio-temporal features," in Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005. 2nd Joint IEEE International Workshop on. IEEE, 2005, pp. 65-72.
[14] S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, 2013.
[15] Y. Poleg, A. Ephrat, S. Peleg, and C. Arora, "Compact CNN for indexing egocentric videos," arXiv preprint arXiv:1504.07469, 2015.
[16] K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," in Advances in Neural Information Processing Systems, 2014, pp. 568-576.
[17] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le et al., "Large scale distributed deep networks," in Advances in Neural Information Processing Systems, 2012, pp. 1223-1231.
[18] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein, "Distributed GraphLab: a framework for machine learning and data mining in the cloud," Proceedings of the VLDB Endowment, vol. 5, no. 8, pp. 716-727, 2012.
[19] K. Zhang and X.-w. Chen, "Large-scale deep belief nets with MapReduce," IEEE Access, vol. 2, pp. 395-403, 2014.
[20] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, "Spark: cluster computing with working sets," in Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, vol. 10, 2010, p. 10.
[21] A. Bialecki, M. Cafarella, D. Cutting, and O. O'Malley, "Hadoop: a framework for running applications on large clusters built of commodity hardware," Wiki at http://lucene.apache.org/hadoop, 2005.
[22] J. Xiang, H. Meng, and A. Aboulnaga, "Scalable matrix inversion using MapReduce," in Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. ACM, 2014, pp. 177-190.
[23] S. Lüpke, "LU-decomposition on a massively parallel transputer system," in PARLE'93 Parallel Architectures and Languages Europe. Springer, 1993, pp. 692-695.
[24] W. H. Press, Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press, 2007.
[25] C. Schüldt, I. Laptev, and B. Caputo, "Recognizing human actions: a local SVM approach," in Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol. 3. IEEE, 2004, pp. 32-36.
[26] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2247-2253, December 2007.
[27] M. Marszalek, I. Laptev, and C. Schmid, "Actions in context," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009, pp. 2929-2936.