Kaggle Competition: Google Cloud & YouTube-8M Video Understanding Challenge, 5th place solution

Deep Learning Methods for Efficient Large Scale Video Labeling
M. Pękalski, X. Pan and M. Skalic
CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding
Honolulu, HI, July 26, 2017
Agenda
1. Team You8M
2. Models
3. Data Augmentation and Feature Engineering
4. Training Methods
5. Key to Success
Team You8M
Marcin Pękalski: Master's degrees in Mathematics and in International Economics. Currently a data scientist at Kambi, a B2B sportsbook provider. Stockholm, Sweden.

Miha Skalic: Holds a Master's degree in Biotechnology. Now working towards a PhD in Biomedicine at University Pompeu Fabra. Barcelona, Spain.

Xingguo E. Pan: Trained as a physicist. After earning a PhD, he went into the financial industry. Chicago, USA.
Frame-level Models
Bi-directional LSTM
• Dynamic-length RNN
• Two models running in opposite directions
• MoE with two experts applied to the last layer
• 6 epochs took 3 days
Bi-directional GRU
• Similar structure to the LSTM
• Layer sizes 625x2, 1250
• Trained with 5 folds
Bi-directional LSTM model
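The team's models were built with TensorFlow's recurrent layers; as a minimal, framework-free sketch of the bi-directional idea (two recurrent passes over the frame sequence, one per direction, concatenated into a single summary), here is a toy GRU cell in NumPy. All names, sizes, and the random weights are illustrative assumptions, not the team's actual configuration.

```python
import numpy as np

def gru_step(x, h, params):
    """One GRU cell update (minimal sketch, no biases)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = 1.0 / (1.0 + np.exp(-(x @ Wz + h @ Uz)))   # update gate
    r = 1.0 / (1.0 + np.exp(-(x @ Wr + h @ Ur)))   # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)       # candidate state
    return (1.0 - z) * h + z * h_tilde

def bidirectional_encode(frames, params_fwd, params_bwd, hidden):
    """Run one GRU forward and one backward over the frames,
    then concatenate the two final states as the video summary."""
    h_f = np.zeros(hidden)
    for x in frames:               # forward pass
        h_f = gru_step(x, h_f, params_fwd)
    h_b = np.zeros(hidden)
    for x in frames[::-1]:         # backward pass over reversed frames
        h_b = gru_step(x, h_b, params_bwd)
    return np.concatenate([h_f, h_b])

rng = np.random.default_rng(0)
dim_in, hidden = 16, 8             # toy sizes, not the slide's 625/1250
make = lambda: [rng.normal(scale=0.1, size=s)
                for s in [(dim_in, hidden), (hidden, hidden)] * 3]
frames = rng.normal(size=(20, dim_in))   # one toy video, 20 frames
summary = bidirectional_encode(frames, make(), make(), hidden)
print(summary.shape)  # (16,) = hidden * 2
```

Because the two passes read the sequence in opposite directions, the concatenated state sees context from both the start and the end of the video.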
Video-level Models
MoNN: Mixture of Neural Network Experts

MoNN3Lw, 3 fully connected layers:
• 2305x8
• 2305x1
• 2305x3
3 experts

Features:
• mean_rgb/audio
• std_rgb/audio
• num_frames

5 epochs took 9 hours on a GeForce GTX 1080 Ti.

[Diagram: input -> fully connected layers -> gate activations and expert activations -> final predictions]
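A mixture-of-experts head combines several per-class classifiers through a softmax gate. The sketch below is a NumPy illustration under assumed toy sizes (the slides use a 2305-dimensional input and 3 experts; the random weights here are placeholders, not trained parameters).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_head(x, W_experts, W_gate):
    """Mixture-of-experts classifier head: each expert emits per-class
    sigmoid probabilities; a per-class softmax gate mixes the experts."""
    n_experts = len(W_experts)
    expert_probs = np.stack(
        [1.0 / (1.0 + np.exp(-(x @ W))) for W in W_experts])   # (E, C)
    gate_logits = (x @ W_gate).reshape(n_experts, -1)          # (E, C)
    gate = softmax(gate_logits, axis=0)                        # softmax over experts
    return (gate * expert_probs).sum(axis=0)                   # (C,)

rng = np.random.default_rng(1)
dim, n_classes, n_experts = 32, 10, 3   # toy sizes for illustration
W_experts = [rng.normal(scale=0.1, size=(dim, n_classes))
             for _ in range(n_experts)]
W_gate = rng.normal(scale=0.1, size=(dim, n_experts * n_classes))
probs = moe_head(rng.normal(size=dim), W_experts, W_gate)
print(probs.shape)  # (10,)
```

Since the gate weights are a convex combination and each expert outputs sigmoids, the mixed prediction stays in [0, 1] per class.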
Models   GAP (single checkpoint, single fold)   GAP (same-family ensemble over checkpoints and folds)
MoNN     0.823                                  0.835
LSTM     0.820                                  0.835
GRU      0.812                                  0.834

All-models ensemble GAP: 0.8419
Best score: 0.8424
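GAP (Global Average Precision) is the competition metric behind these numbers. A minimal NumPy implementation of the usual formulation is sketched below: pool every video's top-k (score, is-positive) pairs, sort globally by score, and average the precision at each true-positive position. Treat this as an illustrative sketch, not the official evaluation code.

```python
import numpy as np

def global_average_precision(preds, labels, top_k=20):
    """GAP over a batch of videos: global ranking of pooled top-k
    predictions, averaging precision at every true positive."""
    scores, hits = [], []
    for p, l in zip(preds, labels):
        top = np.argsort(p)[::-1][:top_k]    # this video's top-k classes
        scores.extend(p[top])
        hits.extend(l[top])
    order = np.argsort(scores)[::-1]         # global sort by confidence
    hits = np.asarray(hits, dtype=float)[order]
    n_pos = float(sum(l.sum() for l in labels))
    precision_at_i = np.cumsum(hits) / (np.arange(len(hits)) + 1.0)
    return float((precision_at_i * hits).sum() / max(n_pos, 1.0))

# Toy check: a perfect ranking gives GAP = 1.0.
preds = np.array([[0.9, 0.2, 0.1], [0.1, 0.8, 0.7]])
labels = np.array([[1, 0, 0], [0, 1, 1]])
gap_perfect = global_average_precision(preds, labels)
print(gap_perfect)  # 1.0
```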
Model Correlation

        MoNN   LSTM   GRU
MoNN    1.0
LSTM    0.96   1.0
GRU     0.96   0.98   1.0

• All models are implemented with TensorFlow.
• GAP scores are from the private leaderboard.
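A correlation table like this can be produced by flattening each model's validation predictions into one vector and computing pairwise Pearson correlations. The sketch below uses synthetic stand-in vectors (the real inputs would be the models' actual prediction matrices).

```python
import numpy as np

# Hypothetical flattened prediction vectors for three models over the
# same validation (video, class) pairs; synthetic here for illustration.
rng = np.random.default_rng(2)
base = rng.random(1000)                      # shared "signal"
preds = {
    "MoNN": np.clip(base + rng.normal(scale=0.05, size=1000), 0, 1),
    "LSTM": np.clip(base + rng.normal(scale=0.05, size=1000), 0, 1),
    "GRU":  np.clip(base + rng.normal(scale=0.05, size=1000), 0, 1),
}
# Pairwise Pearson correlation matrix, 1.0 on the diagonal.
corr = np.corrcoef(np.stack(list(preds.values())))
print(np.round(corr, 2))
```

Less-correlated models tend to add more to an ensemble, which is why the 0.96–0.98 correlations above matter when choosing ensemble weights.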
Data Augmentation & Feature Engineering
Video splitting:
• Split each video into two halves
• Training samples: 6.3 million => 18.9 million (each video contributes the full video plus its two halves)
Video-level features:
• mean_rgb/audio
• std_rgb/audio
• 3rd moments of rgb/audio
• num_frames
• moments of the entire video
• top/bottom 5 per feature dimension
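The feature list and splitting step above can be sketched as a small NumPy routine. Toy dimensions are assumed; the aggregation (mean, std, third central moment, top/bottom-5 per dimension, frame count) follows the bullets, while the exact layout the team used is not specified in the slides.

```python
import numpy as np

def video_level_features(frames):
    """Aggregate per-frame features (frames: [T, D]) into one vector:
    mean, std, 3rd central moment, top/bottom-5 per dim, frame count."""
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    third = ((frames - mean) ** 3).mean(axis=0)   # 3rd central moment
    srt = np.sort(frames, axis=0)
    top5 = srt[-5:].ravel()                       # top 5 per dimension
    bottom5 = srt[:5].ravel()                     # bottom 5 per dimension
    return np.concatenate([mean, std, third, top5, bottom5, [len(frames)]])

def split_video(frames):
    """Augmentation: one video becomes two half-videos; keeping the
    full video as well turns 1 sample into 3 (6.3M -> 18.9M)."""
    half = len(frames) // 2
    return frames[:half], frames[half:]

rng = np.random.default_rng(3)
frames = rng.normal(size=(30, 8))   # toy video: 30 frames, 8-dim features
full = video_level_features(frames)
a, b = split_video(frames)
print(full.shape)  # (105,) = 8*3 + 8*5*2 + 1
```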
Training

Monitor out-of-sample performance while training. Being able to see the out-of-sample GAP greatly facilitated our management of the training process and of model and feature selection. Example: both the yellow and the red model fit the training data well, but the yellow model's test GAP peaked around 20K steps.
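Monitoring out-of-sample GAP amounts to simple checkpoint selection: keep whichever parameters scored best on validation. A minimal sketch, with a toy trace mimicking the "peaked around 20K steps" example (the numbers are invented for illustration):

```python
# Keep the checkpoint with the best validation GAP seen so far.
best = {"step": None, "gap": -1.0, "params": None}

def maybe_update_best(step, val_gap, params):
    """Remember the parameters with the highest validation GAP."""
    if val_gap > best["gap"]:
        best.update(step=step, gap=val_gap, params=params)

# Hypothetical trace: validation GAP peaks, then the model overfits.
trace = [(5000, 0.78), (10000, 0.81), (20000, 0.83),
         (30000, 0.82), (40000, 0.80)]
for step, gap in trace:
    maybe_update_best(step, gap, params={"step": step})
print(best["step"], best["gap"])  # 20000 0.83
```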
Training

Methods that we experimented with and that helped to deepen our understanding of the data and models:
• Dropout
• Truncated labels
• Batch, global, and no normalization
• Exponential moving average
• Training on folds
• Boosting network
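Of the methods listed, the exponential moving average is easy to show concretely: a shadow copy of the parameters is updated as shadow = decay * shadow + (1 - decay) * param (the rule TensorFlow's `tf.train.ExponentialMovingAverage` implements), and the smoother shadow weights are used for evaluation. A NumPy sketch with an invented noisy parameter trajectory:

```python
import numpy as np

def ema_update(shadow, params, decay=0.99):
    """Exponential moving average of parameters:
    shadow = decay * shadow + (1 - decay) * param."""
    return {k: decay * shadow[k] + (1.0 - decay) * params[k]
            for k in params}

# Toy trajectory: parameters drift toward 1.0 with noise; the EMA
# shadow tracks the trend while smoothing out the noise.
rng = np.random.default_rng(4)
params = {"w": np.zeros(4)}
shadow = {"w": np.zeros(4)}
for _ in range(2000):
    params["w"] += 0.05 * (1.0 - params["w"]) + rng.normal(scale=0.01, size=4)
    shadow = ema_update(shadow, params, decay=0.99)
print(np.round(shadow["w"], 2))
```

Evaluating (and ensembling) from EMA weights is one way the team averaged over checkpoints rather than trusting any single noisy step.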
Key to Success
• We tried many different architectures (wide and narrow, deep and shallow, dropout, EMA, boosting, etc.).
• We combined video-level and frame-level models.
• We tried different ensemble weighting techniques, utilizing each model's GAP score and the correlation between models.
• We generated different features and pre-assembled them into data for an efficient training process.
• Data augmentation.
• Training on folds of the data and averaging the results over folds and checkpoints.
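One simple form of GAP-based ensemble weighting is sketched below; the power-weighting rule is an assumption for illustration, not the team's exact scheme, and the prediction vectors are toy values.

```python
import numpy as np

def gap_weighted_ensemble(model_preds, gap_scores, power=8.0):
    """Blend model predictions with weights proportional to each
    model's validation GAP raised to a power (hypothetical rule:
    a higher power lets stronger models dominate), renormalized."""
    w = np.array([g ** power for g in gap_scores])
    w /= w.sum()
    return sum(wi * p for wi, p in zip(w, model_preds))

# Toy per-class predictions from three models, plus the per-family
# ensemble GAPs from the results table.
preds = [np.array([0.9, 0.1]), np.array([0.7, 0.3]), np.array([0.6, 0.4])]
gaps = [0.835, 0.835, 0.834]
blend = gap_weighted_ensemble(preds, gaps)
print(np.round(blend, 3))
```

Since the table's GAPs are nearly equal, this rule reduces to almost-uniform averaging here; correlation-aware weighting (down-weighting highly correlated models) would be the natural next refinement.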
The authors would like to thank the Computational Biophysics group at University Pompeu Fabra for letting us use their GPU computational resources. We would also like to thank Jose Jimenez for valuable discussions and feedback.