Distributed Deep Learning on Mesos with GPUs and Gang Scheduling
Min Cai, Alex Sergeev, Paul Mikesell, Anne Holler, UBER
Who are we?
Min Cai
Alex Sergeev
Paul Mikesell
Anne Holler
Deep Learning @ UBER
• Use cases:
  – Self-Driving Vehicles
  – Trip Forecasting
  – Fraud Detection
Self-Driving Vehicles
Trip Forecasting
Fraud Detection
• Spam referral codes
• Partner up with the same driver
• Cash out Uber credits
Why Distributed Deep Learning?
• Speed up model training
• Scale out to hundreds of GPUs
• Shard large models that cannot fit on a single machine
How Distributed Deep Learning Works
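The slide's diagram is not reproduced here. Conceptually, data-parallel training runs one model replica per GPU, feeds each replica its own shard of every mini-batch, and averages the gradients across replicas before applying the same update everywhere. A minimal self-contained sketch of that loop (the `Worker` class and linear model are illustrative, not part of any real framework; a real system would do the averaging with a parameter server or ring-allreduce):

```python
import numpy as np

class Worker:
    """One model replica; sees only its shard of each mini-batch."""
    def __init__(self, weights):
        self.weights = weights.copy()

    def gradients(self, x, y):
        # Gradient of a least-squares loss for a linear model (illustrative).
        pred = x @ self.weights
        return 2 * x.T @ (pred - y) / len(y)

def train_step(workers, shards, lr=0.01):
    # Each worker computes gradients on its own shard (in parallel on GPUs).
    grads = [w.gradients(x, y) for w, (x, y) in zip(workers, shards)]
    # "Allreduce": average gradients across workers, then apply the
    # identical update on every replica so weights stay in sync.
    avg = np.mean(grads, axis=0)
    for w in workers:
        w.weights -= lr * avg

# Usage: four "GPUs", each training on a quarter of every batch.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
workers = [Worker(np.zeros(2)) for _ in range(4)]
for _ in range(200):
    x = rng.normal(size=(64, 2))
    y = x @ true_w
    shards = [(x[i::4], y[i::4]) for i in range(4)]
    train_step(workers, shards)
```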
Why Mesos?
• Widely adopted
• GPU Support
• Nested Containers
• Highly Customizable
• Reliable and Scalable
Mesos Support for GPUs
• Mesos Containerizer only
• Docker Containerizer support has not landed upstream yet (opt-in sketch below)
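Mesos only offers GPUs to frameworks that explicitly opt in. A hedged sketch of that opt-in via the v1 scheduler HTTP API: the `SUBSCRIBE` call, the `/api/v1/scheduler` endpoint, and the `GPU_RESOURCES` capability are standard Mesos, but the master address and framework name below are placeholders:

```python
import requests

MASTER = "http://mesos-master.example.com:5050"  # placeholder address

# A framework must advertise the GPU_RESOURCES capability when it
# subscribes; otherwise the master withholds offers containing GPUs.
subscribe = {
    "type": "SUBSCRIBE",
    "subscribe": {
        "framework_info": {
            "user": "deeplearning",
            "name": "dl-trainer",  # placeholder framework name
            "capabilities": [{"type": "GPU_RESOURCES"}],
        }
    },
}

resp = requests.post(
    f"{MASTER}/api/v1/scheduler",
    json=subscribe,
    stream=True,  # the master replies with a long-lived event stream
)
```

On the agent side, GPUs are exposed through the Mesos Containerizer with the `cgroups/devices,gpu/nvidia` isolators enabled.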
Mesos Nested Containers
• Separate management code from user Docker images
• Avoid dependency conflicts
What is Missing?
• Elastic GPU Resource Management
• Locality- and Network-aware Placement
• Gang Scheduling
• Task Discovery
• Failure Handling
Peloton Overview
Peloton Architecture
Elastic GPU Resource Management
Resource Pools
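Peloton organizes cluster GPUs into hierarchical resource pools. The slides don't spell out the exact sharing semantics, so the sketch below assumes a common elastic scheme (not quoted from Peloton): every pool is guaranteed its reservation, and idle capacity is lent out in proportion to shares, capped by each pool's limit:

```python
def elastic_entitlement(pools, capacity):
    """Split `capacity` GPUs among pools.

    Each pool dict has: reservation (guaranteed), limit (hard cap),
    share (weight for spare capacity), demand (what it wants right now).
    """
    # Guaranteed slice: the reservation, but never more than current demand.
    ent = {n: min(p["reservation"], p["demand"]) for n, p in pools.items()}
    spare = capacity - sum(ent.values())
    # Spare GPUs go to pools with unmet demand, weighted by share and
    # capped by each pool's limit. (One pass; real schedulers iterate.)
    hungry = {n: p for n, p in pools.items()
              if ent[n] < min(p["limit"], p["demand"])}
    total = sum(p["share"] for p in hungry.values()) or 1
    for n, p in hungry.items():
        extra = int(spare * p["share"] / total)
        ent[n] = min(ent[n] + extra, p["limit"], p["demand"])
    return ent

# Example: 16 GPUs; "research" is idle, so "training" borrows its slack.
pools = {
    "training": {"reservation": 8, "limit": 16, "share": 2, "demand": 14},
    "research": {"reservation": 8, "limit": 16, "share": 1, "demand": 2},
}
print(elastic_entitlement(pools, capacity=16))
# {'training': 14, 'research': 2}
```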
Gang Scheduling
• A subset of Tasks in a Job can be specified for Gang Scheduling
• Gang tasks are a single scheduling unit
  – Admitted, placed, preempted, and killed as a group
• Gang tasks are independent execution units
  – Run in separate containers and may fail independently
• Gang execution is terminated if a gang task fails and cannot be restarted (see the admission sketch below)
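A minimal sketch of the all-or-nothing admission this implies. The host map and task representation here are illustrative stand-ins, not Peloton's actual types:

```python
def admit_gang(gang_tasks, hosts):
    """Place every task in the gang, or none of them.

    gang_tasks: list of GPU counts, one per task.
    hosts: dict host -> free GPUs. Mutated only if the whole gang fits.
    """
    tentative = dict(hosts)  # simulate placement on a copy
    placement = {}
    for i, gpus in enumerate(gang_tasks):
        host = next((h for h, free in tentative.items() if free >= gpus), None)
        if host is None:
            return None  # one task doesn't fit -> reject the whole gang
        tentative[host] -= gpus
        placement[i] = host
    hosts.update(tentative)  # commit: the gang is admitted as a unit
    return placement

# Usage: a 3-task gang needing 4 GPUs each on two 8-GPU hosts.
hosts = {"host-a": 8, "host-b": 8}
print(admit_gang([4, 4, 4], hosts))  # {0: 'host-a', 1: 'host-a', 2: 'host-b'}
print(admit_gang([4, 4, 4], hosts))  # None -- only 4 GPUs left in total
```

Treating the gang as one unit avoids the deadlock where two half-placed training jobs each hold GPUs the other needs.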
Placement Strategies
• Place as many containers as possible on the same host or rack
• Best-fit algorithm to tightly pack GPU containers (sketched below)
• Constraint-based placement for the same generation of GPUs
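A hedged sketch of the best-fit idea: among hosts that can hold the container, pick the one that would have the fewest GPUs left over, so partially used hosts fill up before fresh ones are broken open. Names are illustrative:

```python
def best_fit_host(hosts, gpus_needed):
    """hosts: dict host -> free GPUs. Returns the feasible host whose
    remaining free GPUs after placement would be smallest (tightest fit)."""
    feasible = {h: free - gpus_needed
                for h, free in hosts.items() if free >= gpus_needed}
    if not feasible:
        return None
    return min(feasible, key=feasible.get)

hosts = {"host-a": 8, "host-b": 3, "host-c": 5}
# A 2-GPU container lands on host-b (1 GPU left over), not the empty
# 8-GPU host, keeping host-a whole for a future 8-GPU gang.
print(best_fit_host(hosts, 2))  # host-b
```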
Why TensorFlow?
• Most popular open-source framework for deep learning
• Combines high performance with the ability to tinker with low-level model details
• Has end-to-end support from research to production
Architecture for Distributed TensorFlow
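The classic distributed TensorFlow (pre-2.0) architecture splits processes into parameter servers and workers; every process is given the full cluster layout plus its own role. A sketch with placeholder host names:

```python
import tensorflow as tf  # TensorFlow 1.x API, current at the time of this talk

# Every process receives the same cluster map (placeholder addresses).
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222"],
})

# Each process starts a server announcing its own role in that map,
# e.g. worker #1:
server = tf.train.Server(cluster, job_name="worker", task_index=1)

# Workers pin variables to the parameter servers and compute to themselves.
with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:1", cluster=cluster)):
    loss = ...  # build the model here
```

Wiring up these host:port lists by hand is exactly the task-discovery gap noted earlier, which the Mesos/Peloton integration fills.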
Architecture for Distributed TensorFlow on Mesos
Distributed Training Performance

Training with synthetic data, using the standard Horovod training loop:

```python
with tf.train.MonitoredTrainingSession(checkpoint_dir=checkpoint_dir,
                                       config=config,
                                       hooks=hooks) as mon_sess:
    while not mon_sess.should_stop():
        # Perform synchronous training.
        mon_sess.run(train_op)
```
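For context, the setup that precedes this loop in the standard Horovod TensorFlow example (per the Horovod README) initializes MPI-style ranks, pins one GPU per process, and wraps the optimizer so gradients are allreduce-averaged; the checkpoint directory is a placeholder:

```python
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()  # one process per GPU, typically launched via mpirun

# Pin each process to its local GPU.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

loss = ...  # build the model here
opt = tf.train.AdagradOptimizer(0.01 * hvd.size())  # scale LR with workers

# Wrap the optimizer: gradients are averaged across ranks before each update.
opt = hvd.DistributedOptimizer(opt)
train_op = opt.minimize(loss)

# Broadcast initial variables from rank 0 so all replicas start identical;
# only rank 0 writes checkpoints.
hooks = [hvd.BroadcastGlobalVariablesHook(0)]
checkpoint_dir = "/tmp/train_logs" if hvd.rank() == 0 else None
```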
Giving Back
Horovod is available on GitHub today https://github.com/uber/horovod