Aggregating Frame-level Features for Large-Scale Video Classification in the Google Cloud & YouTube-8M Video Understanding Challenge

Shaoxiang Chen¹, Xi Wang¹, Yongyi Tang², Xinpeng Chen³, Zuxuan Wu¹, Yu-Gang Jiang¹
¹Fudan University  ²Sun Yat-Sen University  ³Wuhan University
Google Cloud & YouTube-8M Video Understanding Challenge
• Video multi-label classification
• 4,716 classes
• 1.8 classes per video on average
• Large amount of data (the portion used in the challenge):

Partition    Number of Samples
Train        4,906,660 (70%)
Validate     1,401,828 (20%)
Test           700,640 (10%)
Total        7,009,128
• Audio and RGB features extracted from pre-trained CNNs are provided at both frame level and video level.
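As a concrete illustration, below is a minimal TF 1.x sketch of parsing one frame-level example. The feature names and the [-2, 2] de-quantization follow the 2017 starter code's documented schema; treat them as assumptions if your data release differs.

```python
import tensorflow as tf  # TF 1.x, as used by the 2017 starter code

def parse_frame_level(serialized):
    """Parse one frame-level SequenceExample (schema per the 2017 release)."""
    contexts, features = tf.parse_single_sequence_example(
        serialized,
        context_features={
            'video_id': tf.FixedLenFeature([], tf.string),
            'labels': tf.VarLenFeature(tf.int64)},
        sequence_features={
            'rgb': tf.FixedLenSequenceFeature([], tf.string),
            'audio': tf.FixedLenSequenceFeature([], tf.string)})
    # Frame features are quantized to uint8; de-quantize to [-2, 2]
    # (the starter code's Dequantize defaults).
    rgb = tf.reshape(tf.decode_raw(features['rgb'], tf.uint8), [-1, 1024])
    audio = tf.reshape(tf.decode_raw(features['audio'], tf.uint8), [-1, 128])
    rgb = tf.cast(rgb, tf.float32) * (4.0 / 255.0) - 2.0
    audio = tf.cast(audio, tf.float32) * (4.0 / 255.0) - 2.0
    return contexts['video_id'], contexts['labels'], rgb, audio
```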
Summary
• Our models are RNN variants, NetVLAD, and DBoF.
• By default we use MoE (Mixture of Experts) as the video-level classifier.
• Our solution is implemented in TensorFlow, based on the starter code.
• Training a frame-level model takes 3-5 days on a single GPU.
• We achieved 0.84198 GAP on the public 50% test data, taking 4th place in the challenge.
• Paper: https://arxiv.org/pdf/1707.00803.pdf
• Code & Documentation: https://github.com/forwchen/yt8m
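The GAP figure above is Global Average Precision at 20, the challenge metric. A minimal NumPy sketch (the function name gap_at_k is ours) of how it pools each video's top-20 predictions and computes average precision over the globally sorted list:

```python
import numpy as np

def gap_at_k(predictions, labels, k=20):
    """Global Average Precision at k, a sketch of the challenge metric.

    predictions: (num_videos, num_classes) float scores
    labels:      (num_videos, num_classes) {0,1} ground truth
    """
    labels = np.asarray(labels)
    scores, hits = [], []
    for p, l in zip(np.asarray(predictions), labels):
        top = np.argsort(p)[::-1][:k]       # keep each video's top-k classes
        scores.extend(p[top])
        hits.extend(l[top])
    order = np.argsort(scores)[::-1]        # sort all kept pairs globally
    hits = np.asarray(hits, dtype=float)[order]
    precision_at_i = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # AP over the pooled list, normalized by the total number of positives.
    return float(np.sum(precision_at_i * hits) / max(labels.sum(), 1))
```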
Model learning overview
Models & Training

Model      Variations
LSTM       -
LSTM       Layer normalization & recurrent dropout
RNN        Residual connections
GRU        -
GRU        Bi-directional
GRU        Recurrent dropout
GRU        Feature transformation
RWA        -
NetVLAD    -
DBoF       -
MoE        -
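Since MoE is the default video-level classifier, here is a minimal NumPy sketch of a Mixture-of-Experts classifier in the spirit of the starter code (softmax gating with one extra dummy expert, sigmoid experts); parameter shapes and names are our assumptions:

```python
import numpy as np

def moe_classify(v, gate_w, expert_w, num_mixtures=2):
    """Mixture-of-Experts over a video-level feature v (D,), a sketch.

    gate_w:   (D, C, num_mixtures + 1) gating logits (+1 dummy expert)
    expert_w: (D, C, num_mixtures)     per-expert, per-class logits
    Returns per-class probabilities (C,).
    """
    gates = np.einsum('d,dcm->cm', v, gate_w)            # (C, M+1)
    gates = np.exp(gates - gates.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)            # softmax over experts
    experts = 1.0 / (1.0 + np.exp(-np.einsum('d,dcm->cm', v, expert_w)))
    return (gates[:, :num_mixtures] * experts).sum(axis=1)

# Usage: D=16 video-level feature, C=5 classes, 2 experts.
rng = np.random.default_rng(0)
D, C, M = 16, 5, 2
probs = moe_classify(rng.standard_normal(D),
                     rng.standard_normal((D, C, M + 1)) * 0.1,
                     rng.standard_normal((D, C, M)) * 0.1, num_mixtures=M)
```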
Training settings:
• Learning rate: 0.001, decays every epoch
• Batch size: 256 (RNNs), 1024 (NetVLAD)
• Adam optimizer
DBoF & MoE: https://github.com/google/youtube-8m
Recurrent Weighted Average: https://github.com/jostmey/rwa
RNN with residual connections: https://github.com/NickShahML/tensorflow_with_latest_papers
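A minimal TF 1.x sketch of the training settings above; the decay factor (0.9) and the toy loss are assumptions, since only "decays every epoch" is stated:

```python
import tensorflow as tf  # TF 1.x, matching the 2017 starter code

# Toy parameter/loss so the snippet runs standalone.
w = tf.Variable(0.0)
loss = tf.square(w - 1.0)

global_step = tf.train.get_or_create_global_step()
steps_per_epoch = 4906660 // 256              # train set size / RNN batch size
learning_rate = tf.train.exponential_decay(
    0.001, global_step,
    decay_steps=steps_per_epoch,              # "decays every epoch"
    decay_rate=0.9,                           # assumed decay factor
    staircase=True)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(
    loss, global_step=global_step)
```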
Model details
Structure of RWA (Recurrent Weighted Average) from [1]
Structure of RNN with residual connections from [2]

[1] Ostmeyer, Jared, and Lindsay Cowell. "Machine Learning on Sequential Data Using a Recurrent Weighted Average." arXiv 2017.
[2] Wu, Yonghui, et al. "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation." arXiv 2016.
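To make the residual-connection idea of [2] concrete, a NumPy sketch with a vanilla RNN layer standing in for LSTM/GRU (the cell choice and initialization here are ours):

```python
import numpy as np

def simple_rnn_layer(seq, W, U, b):
    """One vanilla RNN layer over a (T, D) sequence, returning (T, D)."""
    h = np.zeros(W.shape[1])
    outs = []
    for x in seq:
        h = np.tanh(x @ W + h @ U + b)
        outs.append(h)
    return np.stack(outs)

def residual_rnn(seq, layers):
    """Residual connections between stacked RNN layers, as in [2]:
    each layer's input is added to its output (dims must match)."""
    h = seq
    for (W, U, b) in layers:
        h = simple_rnn_layer(h, W, U, b) + h   # skip connection
    return h

# Usage: T=10 frames of D=8 features, two stacked layers.
rng = np.random.default_rng(0)
T, D = 10, 8
layers = [(rng.standard_normal((D, D)) * 0.1,
           rng.standard_normal((D, D)) * 0.1,
           np.zeros(D)) for _ in range(2)]
out = residual_rnn(rng.standard_normal((T, D)), layers)   # (10, 8)
```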
Model details: NetVLAD
• The input sequence (T×D) is sampled down to S frames $\mathbf{r}_1, \mathbf{r}_2, \dots, \mathbf{r}_S$ (S×D).
• A 1d-conv & softmax produce soft assignment weights $a_{ik}$ (S×K) of frames to clusters.
• Cluster means $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_K$ (K×D) are learned.
• The descriptors $[\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_K]$ are concatenated into a K·D vector, where

  $\mathbf{v}_k = \sum_{i=1}^{S} a_{ik} (\mathbf{r}_i - \mathbf{u}_k)$

• The aggregated descriptor is then reduced to L = 1024 dimensions.

Arandjelovic, Relja, et al. "NetVLAD: CNN architecture for weakly supervised place recognition." CVPR 2016.
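A NumPy sketch of the aggregation above; the 1d-conv with kernel size 1 reduces to a per-frame linear projection, and all parameter names here are ours:

```python
import numpy as np

def netvlad_aggregate(r, u, w, b):
    """NetVLAD aggregation sketch.

    r: (S, D) sampled frame features
    u: (K, D) cluster means
    w, b: (D, K), (K,) parameters of the 1d-conv producing assignments
    Returns the K*D descriptor [v_1, ..., v_K].
    """
    logits = r @ w + b                        # (S, K), per-frame projection
    logits -= logits.max(axis=1, keepdims=True)
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)         # softmax assignments a_ik
    # v_k = sum_i a_ik * (r_i - u_k)
    v = np.einsum('sk,skd->kd', a, r[:, None, :] - u[None, :, :])
    return v.reshape(-1)                      # concatenate to a K*D vector

# Usage: S=300 frames, D=1024 features, K=8 clusters -> 8192-d descriptor.
rng = np.random.default_rng(0)
S, D, K = 300, 1024, 8
vlad = netvlad_aggregate(rng.standard_normal((S, D)),
                         rng.standard_normal((K, D)),
                         rng.standard_normal((D, K)) * 0.01,
                         np.zeros(K))
```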
Results (Single model)

Model               GAP@20
NetVLAD             0.79175
LSTM                0.80907
GRU                 0.80688
RWA                 0.79622
RNN-Residual        0.81039
GRU-Dropout         0.81118
LSTM-Layernorm      0.80390
GRU-Bidirectional   0.80665
GRU-feature-trans   0.78644
Results (Ensemble)

Ensembles               GAP@20
NetVLAD                 0.80895
LSTM                    0.81571
GRU                     0.81786
RWA                     0.81007
RNN-Residual            0.81510
GRU-Dropout             0.82523
Ensemble 1              0.83996
Ensemble 2              0.83481
Ensemble 3 (searching)  0.83581
Ensemble 4              0.84198
Fusion weights:
• Empirical, based on validation/test set performance
• Searching / learning over a small split of the validation set

Model ensembles: fusion of 3-5 checkpoint predictions per model.
Ensemble 1: fusion of 9 model ensembles.
Ensemble 2: fusion of another ~20 models with equal weights.
Ensemble 3: fusion of the same models as Ensemble 2, with weights obtained by searching.
Ensemble 4: fusion of Ensembles 1, 3 and others. (A weighted-fusion sketch follows.)
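A minimal sketch of weighted prediction fusion plus a random weight search over a small validation split; the exact search procedure is not specified, so search_weights is an assumption, and gap_fn can be any GAP@20 implementation (e.g. the earlier gap_at_k sketch):

```python
import numpy as np

def fuse(predictions, weights=None):
    """Weighted fusion of per-model prediction matrices
    (equal weights correspond to Ensemble 2)."""
    weights = (np.ones(len(predictions)) if weights is None
               else np.asarray(weights, dtype=float))
    weights = weights / weights.sum()         # normalize fusion weights
    return sum(w * p for w, p in zip(weights, predictions))

def search_weights(predictions, labels, gap_fn, trials=1000, seed=0):
    """Random search for fusion weights on a validation split
    (Ensemble 3 style); keeps the weight vector with the best GAP."""
    rng = np.random.default_rng(seed)
    best_w, best_gap = None, -1.0
    for _ in range(trials):
        w = rng.random(len(predictions))
        gap = gap_fn(fuse(predictions, w), labels)
        if gap > best_gap:
            best_w, best_gap = w, gap
    return best_w, best_gap
```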
Q&A