Deep Learning Methods for Efficient Large ... - Research at Google

Recommend Documents

Deep Learning Methods for Efficient Large Scale Video Labeling

Jun 14, 2017 - We present a solution to âGoogle Cloud and YouTube-. 8M Video ..... 128 samples batch size) achieved pr

Learning with Deep Cascades - Research at Google

based on feature monomials of degree k, or polynomial functions of degree k, ... on finding the best trade-off between c

Efficient Topologies for Large-scale Cluster ... - Research at Google

... to take advantage of additional packing locality and fewer optical links with ... digital systems â e.g., server c

Efficient Inference and Structured Learning for ... - Research at Google

constraints are enforced by reverting to k-best infer- ..... edge eâ,0 between vâ1 and v0. Set the weight ... does n

Advanced Deep Learning Methods

use cases related to MIC, provides a brief model overview and describes the current state-of-the-art methods in the respective area. Basic knowledge about.

Kernel Methods for Learning Languages - Research at Google

Dec 28, 2007 - its input labels, and further optimize the result with the application of the. 21 ... for providing hosti

Efficient Spatial Sampling of Large ... - Research at Google

geographical databases, spatial sampling, maps, data visu- alization ...... fairness objective is typically best used al

Sample Efficient Deep Reinforcement Learning for

Reinforced Imitation: Sample Efficient Deep Reinforcement Learning for. Map-less Navigation by Leveraging Prior Demonstrations. M. Pfeiffer1â, S. Shukla2â, ...

Deep Learning for Large Scale Biodiversity Monitoring

Sep 28, 2015 - provide affordable and effective measures of conservation outcomes. ... adaptive management, conservation outcomes, big data, sensor networks ... Ecosystem services [1], being the contribution of nature to human well-being ...

Efficient Deep Learning Model for Text Classification

in [16]; a matrix vector recursive model for semantic compositionality, recursive neural network applied to sentence classification, configuration function is ...

Deep Learning in Speech Synthesis - Research at Google

Aug 31, 2013 - s1 s2 s3 s4 s5. sT . . Cepstra. Spectra. â â â â â. â â. â dimensinality reduction. â¢ Typically use lower dimensional approximation of ...

Deep Learning in Speech Synthesis - Research at Google

Aug 31, 2013 - Deep learning-based approaches ... Machine learning methodology using multiple-layered models ..... Learning deep architectures for AI.

Resurrecting the sigmoid in deep learning ... - Research at Google

Since error information backpropagates faithfully and isometrically through the network, this stronger requirement is ca

Efficient Methods for Large Resistor Networks - Google Sites

AbstractâLarge resistor networks arise during the design of very-large-scale integration chips as a result of parasitic extraction and electro static discharge ...

Efficient Methods for Large Resistor Networks - Google Sites

AbstractâLarge resistor networks arise during the design of very-large-scale ... electrical charge on the pins of a pa

Efficient Learning of Sparse Ranking Functions - Research at Google

isting learning tools with matching generalization analysis that stem from Valadimir. Vapnik's work [13, 14, 15]. Howeve

Large-Scale Learning with Less RAM via ... - Research at Google

such as those used for predicting ad click through rates. (CTR) for sponsored ... Streeter & McMahan, 2010) or for f

Robust Large-Scale Machine Learning in the ... - Research at Google

and enables it to scale to massive datasets on low-cost com- modity servers. ... datasets. In this paper, we describe a

cuDNN: Efficient Primitives for Deep Learning - Google Sites

Theano [5], and Caffe [11] feature suites of custom kernels that implement basic operations such ... make it much easier

UNSUPERVISED CONTEXT LEARNING FOR ... - Research at Google

grams. If an n-gram doesn't appear very often in the training ... for training effective biasing models using far less d

EFFICIENT SEARCH METHODS AND DEEP BELIEF NETWORKS ...

1 for an illustration) and y1 is a random variable indi- cating the presence of a lip in the window defined by Î¸ given a specific lip stage. The lip stages represent a ...

Maths Research - Deep Learning Website.pdf - Google Drive

... Cumbria www.cumbria.ac.uk/LED ). Andy Ash (Deep Learning Teaching School Alliance http://www.deeplearningtsa.co.uk/

Multi-Class Deep Boosting - Research at Google

We present new ensemble learning algorithms for multi-class classification. Our ... provided that an efficient algorithm

Efficient Deep Learning-Based Automated Pathology ... - MDPI

Jun 20, 2018 - School of Data and Computer Science, Sun Yat-sen University, 132 East Waihuan Road, .... CNN is widely used in computer vision (especially in image ..... Sotirchos, E.S.; Calabresi, P.A.; Ying, H.S.; Prince, J.L. Retinal layer.

Deep Learning Methods for Efficient Large ... - Research at Google

Download PDF

4 downloads 350 Views 311KB Size Report

Comment

Jul 26, 2017 - Google Cloud & YouTube-8M Video. Understanding Challenge ... GAP scores are from private leaderboard.

Kaggle Competition Google Cloud & YouTube-8M Video Understanding Challenge 5th place solution

Deep Learning Methods for Efficient Large Scale Video Labeling M. Pękalski, X. Pan and M. Skalic CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding Honolulu, HI July 26, 2017

Agenda 1.

Team You8M

2.

Models

3.

Data Augmentation and Feature Enginnering

4.

Training Methods

5.

Key to Success

Team You8M

Marcin Pękalski: Master’s degree in Mathematics and Master’s degree in International Economics. Currently a data scientist at Kambi, a B2B sportsbook provider. Stockholm, Sweden. Miha Skalic: Holds Master's degree in Biotechnology. Now working towards a PhD in Biomedicine at University Pompeu Fabra . Barcelona, Spain. Xingguo E. Pan: Trained as a physicist. After getting a PhD, he went into the financial industry. Chicago, USA.

Frame level Models

Bi-directional LSTM • • • •

Dynamic length RNN Two models running in oposite directions MoE with two experts applied to last layer 6 epochs took 3 days

Bi-directional GRU • • •

Similar structure as LSTM Layer sizes 625x2, 1250 Trained with 5 folds

Bi-directional LSTM model

Video level models

MoNN: Mixture of neural network experts input

MoNN3Lw 3 FC layers: • 2305x8 • 2305x1 • 2305x3

gate activations

fully connected fully connected

3 experts Features • mean_rgb/audio • std_rgb/audio • num_frames 5 epochs took 9 hours on GeForce GTX 1080ti

fully connected expert activation final predictions

# layers

Models

GAP

GAP

single checkpoint single fold

models of same family, checkpoints and folds ensemble

MoNN

0.823

0.835

LSTM

0.820

0.835

GRU

0.812

0.834

GAP All models ensemble

0.8419 Best score 0.8424

Model Correlation Models MoNN

• •

All models are implemented with Tensorflow. GAP scores are from private leaderboard.

LSTM GRU

MoNN

LSTM

GRU

1.0

0.96

0.96

1.0

0.98 1.0

Data augmentation & Feature engineering

Video splitting: • •

Split one video into two halves training samples: 6.3 million => 18.9 million

Video Level features: • mean-rgb/audio • std-rgb/audio • 3rd moments of rgb/audio • num_frames • Moments of entire video • top/bottom 5 per feature dimension

Training Being able to see the out-of-sample GAP greatly facilitated our management of the training process and of model and feature selection. Example: Both yellow and red models can well fit the train data, but yellow model peaked around 20 K steps on test data.

Monitor out-of-sample performance while training

Training

Dropout Truncated Labels Batch, Global and No Normalization

Training Methods that we experimented with and that helped to deepen our understanding of the data and models.

Exponential Moving Average Training on Folds Boosting Network

Key to Success

We tried a lot of different architectures (wide and narrow, deep and shallow, dropouts, ema, boosting, etc.). Combined video and frame level models. Tried different ensemble weighting techniques, utilizing individual model’s GAP score and the correlation between models. Generated different features: pre-assembled them into data for efficient training process. Data augmentation. Training on folds of data and averaging the results over folds and checkpoints.

Resources & Acknowledgement s

arXiv Paper: • https://arxiv.org/abs/1706.04572 Source Code: • https://github.com/mpekalski/Y8M

©Apache License, version 2

Acknowledgements: •

The authors would like to thank the Computational Bio-physics group at University Pompeu Fabra for letting us use their GPU computational resources. We would also like to thank Jose Jimenez for valuable discussions and feedback.