Dynamically Allocating GPUs to Host Nodes (Servers)
GTC 2012

Saeed Iqbal, Shawn Gao and Alaa Yousif

Introduction


How can we use GPUs in servers to build solutions?



How can we use GPUs in servers? There are two fundamental options.

External GPUs
– Number of GPUs is flexible
– GPUs can be shared among users
– Easy to replace/service GPUs
– Targeted toward installations with a large number of GPUs

Internal GPUs
– Number of GPUs is fixed
– Less GPU-related cabling
– Each GPU has fixed bandwidth to the CPUs
– Targeted toward both small and large GPU installations


Overview of the Solution Components: C410X
Basically, it is "room and board" for 16 GPUs.
Features:
– Theoretical maximum of 16.5 TFLOPS
– Connects up to 8 hosts
– Connects up to 16 PCIe Gen-2 devices (GPUs) to hosts
– Connects a maximum of 8 devices to a given host
– High-density, 3U chassis
– Flexibility in selecting the number of GPUs
– Individually serviceable modules
– N+1 1400W power supplies (3+1)
– N+1 92mm cooling fans (7+1)
– PCIe switches (8 PEX 8647, 4 PEX 8696)


Overview of the Solution Components: C6220
Features:
– High density: four compute nodes in 2U of space
– Each node:
  – Dual Intel Sandy Bridge-EP (E5-2600) processors, 16 DIMMs, up to 256GB per node
  – Internal storage: 24TB SATA or 36TB SAS
  – 1 PCIe Gen3 x8 mezzanine (daughter card): FDR IB, QDR IB, or 10GigE
  – 1 PCIe Gen3 x16 slot (half-length, half-height)
  – Embedded BMC with IPMI 2.0 support
– Chassis design:
  – Hot-plug, individual nodes
  – Up to 12 x 3.5" drives (3 per node) or 24 x 2.5" drives (6 per node)
  – N+1 power supplies (1100W or 1400W)

Host-to-GPU Mapping Options on the C410X
Connect 2, 4, or 8 GPUs per host.
(Figure: the three mapping options available on the C410X.)

How to change the mapping?
– Use the Web User Interface
  – Connect to the C410X using a laptop
  – Has to be done individually on each C410X
  – Easy for small installations
– Use the command line (CLI)
  – Connect to the C410X BMC and use IPMITool
  – Can be scripted for automation with a job scheduler/workload manager (see the sketch after this list)
  – Can handle multiple C410X chassis through the attached compute nodes
  – Targeted towards small and large installations
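A minimal sketch of the scripted CLI approach is shown below. It only checks that each C410X BMC answers a standard IPMI query before any remapping is attempted; the BMC addresses and credentials are placeholders, and the C410X-specific mapping commands are covered later in the deck.

    #!/bin/bash
    # Hedged sketch: confirm each C410X BMC is reachable from the management node.
    # The BMC IP addresses and credentials below are placeholders for your installation.
    C410X_BMCS="192.168.12.146 192.168.12.147"
    USER=root
    PASS=password
    for bmc in $C410X_BMCS; do
        # Standard IPMI chassis query over the lanplus (IPMI v2.0) interface
        if ipmitool -I lanplus -H "$bmc" -U "$USER" -P "$PASS" chassis power status >/dev/null; then
            echo "C410X BMC $bmc is reachable"
        else
            echo "C410X BMC $bmc is NOT reachable" >&2
        fi
    done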


Details of the Web User Interface


Dynamic Mapping


Baseboard Management Controller (BMC)
– Industry-standard support for IPMI v2.0
– Out-of-band monitoring and control of servers
– Generates the FRU information report (main board part number, product name, manufacturer, and so on)
– Health status/hardware monitoring report
– View and clear the event log
– Event notification through Platform Event Trap (PET)
– Platform Event Filtering (PEF) to take selected actions for selected events

IPMITool
– Utility for managing and configuring devices that support IPMI
  – An open standard for monitoring, logging, recovery, and hardware control
  – Independent of CPU, BIOS, and OS
– IPMItool is a simple CLI to the remote BMC using IPMI (v1.5/2.0)
  – Protocol: IPMI (Intelligent Platform Management Interface)
  – Read/print the sensor data repository (SDR) values
  – Display the System Event Log (SEL)
  – Print Field Replaceable Unit (FRU) inventory information
  – Read/set LAN configuration parameters
  – Remote chassis power control
– ipmitool is included in the RHEL distribution
– Available from http://ipmitool.sourceforge.net/ (version 1.8.11, by Duncan Laurie)
Typical invocations are sketched below.
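The commands below are a hedged sketch of the queries listed above, run against a remote BMC over the LAN interface; the BMC address and credentials are placeholders for an actual installation.

    # Placeholder BMC address and credentials
    BMC=192.168.12.146
    IPMI="ipmitool -I lanplus -H $BMC -U root -P password"

    $IPMI sdr list               # read/print the sensor data repository (SDR)
    $IPMI sel list               # display the System Event Log (SEL)
    $IPMI fru print              # print FRU inventory information
    $IPMI lan print 1            # read LAN configuration parameters (channel 1)
    $IPMI chassis power status   # remote chassis power query/control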

The "port_map.sh" Script from Dell.com

    # ./port_map.sh
    # ./port_map.sh 198.168.12.146

The current iPass-port-to-PCIe-port mapping is listed. For example, if the iPass1 port is configured as 1:4:

    iPass1: PCIE1 PCIE2 PCIE15 PCIE16
    iPass5: None

    Change? (n/j/1/2/3/4/5/6/7/8):

To configure the iPass1 port as 1:2, enter "1":

    iPass1: PCIE1 PCIE15
    iPass5: PCIE2 PCIE16

The PCIe port assignments for iPass1 and iPass5 are updated accordingly.

Putting it together: C410X + BMC + IPMITool
(Diagram: the master node and compute nodes reach the C410X BMC over a Gigabit Ethernet fabric; the GPUs attach to the compute nodes through iPass cables.)

Master node:
1. Calculate the new mappings for all compute nodes
2. Send the new mapping to the compute nodes

Compute nodes, via scripts using IPMITool (a script sketch follows below):
1. Get the current mapping
2. Change the mapping to the new one
3. Reboot the C410X
4. Wait until the C410X is up
5. Wait until the C410X has the new mapping
6. Reboot the compute node
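The script below is a rough sketch of the compute-node steps listed above, not the exact Dell scripts. It assumes the port_map.sh script from Dell.com is present on the node, that piping an answer into it drives its interactive "Change?" prompt, and that the C410X BMC responds to standard IPMI chassis commands for the reboot and the "is it up" check; the BMC address, credentials, and chosen mapping are placeholders.

    #!/bin/bash
    # Hedged sketch of the per-compute-node remapping steps (numbers match the list above).
    C410X_BMC=198.168.12.146                # placeholder BMC address (from the deck's example)
    IPMI="ipmitool -I lanplus -H $C410X_BMC -U root -P password"

    # 1. Get the current iPass-to-PCIe mapping (answer "n" so nothing is changed)
    echo "n" | ./port_map.sh "$C410X_BMC"

    # 2. Change the mapping, e.g. answer "1" to switch the iPass1 port to a 1:2 mapping
    echo "1" | ./port_map.sh "$C410X_BMC"

    # 3. Reboot the C410X so the new mapping takes effect
    $IPMI chassis power cycle

    # 4. Wait until the C410X is up again
    sleep 60
    until $IPMI chassis power status >/dev/null 2>&1; do sleep 10; done

    # 5. Wait until the C410X reports the new mapping
    echo "n" | ./port_map.sh "$C410X_BMC"

    # 6. Reboot the compute node so it re-enumerates its GPUs
    reboot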

Demo 1 of 2


Putting it together: C410X + BMC + IPMItool, "Sandwich Configurations"
(Diagram: master node, compute nodes, and two C410X BMCs in a sandwich configuration. Several configurations are possible.)

Putting it together: C410X + BMC + IPMItool, 64 GPU / 32 Node Configuration
(Diagram: master node and compute nodes in a 64 GPU / 32 node configuration.)

Possible Combinations

There are 25 possible ways 16 GPUs can be mapped to the 8 servers on a C410X, ranging from every server getting 2 GPUs to two servers getting 8 GPUs each (each server receives 0, 2, 4, or 8 GPUs); a short script that regenerates these combinations follows the table.

         S1  S2  S3  S4  S5  S6  S7  S8
     1    8   0   8   0   0   0   0   0
     2    8   0   4   4   0   0   0   0
     3    8   0   2   4   0   0   2   0
     4    8   0   4   2   0   0   0   2
     5    8   0   2   2   0   0   2   2
     6    4   4   8   0   0   0   0   0
     7    4   4   4   4   0   0   0   0
     8    4   4   2   4   0   0   2   0
     9    4   4   4   2   0   0   0   2
    10    4   4   2   2   0   0   2   2
    11    2   4   8   0   2   0   0   0
    12    2   4   4   4   2   0   0   0
    13    2   4   2   4   2   0   2   0
    14    2   4   4   2   2   0   0   2
    15    2   4   2   2   2   0   2   2
    16    4   2   8   0   0   2   0   0
    17    4   2   4   4   0   2   0   0
    18    4   2   2   4   0   2   2   0
    19    4   2   4   2   0   2   0   2
    20    4   2   2   2   0   2   2   2
    21    2   2   8   0   2   2   0   0
    22    2   2   4   4   2   2   0   0
    23    2   2   2   4   2   2   2   0
    24    2   2   4   2   2   2   0   2
    25    2   2   2   2   2   2   2   2
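The count of 25 follows from the table's structure: servers 1, 2, 5, and 6 share 8 GPUs in one of five ways, and servers 3, 4, 7, and 8 independently share the other 8 GPUs in the same five ways. The script below is an illustration derived from the table (not a Dell tool) that regenerates and counts these combinations.

    #!/bin/bash
    # The five ways a group of four servers can share 8 GPUs, taken from the table above.
    GROUP_SPLITS=("8 0 0 0" "4 4 0 0" "2 4 2 0" "4 2 0 2" "2 2 2 2")

    count=0
    for a in "${GROUP_SPLITS[@]}"; do        # GPUs for S1 S2 S5 S6
        for b in "${GROUP_SPLITS[@]}"; do    # GPUs for S3 S4 S7 S8
            count=$((count + 1))
            echo "Combination $count: S1 S2 S5 S6 = $a | S3 S4 S7 S8 = $b"
        done
    done
    echo "Total combinations: $count"        # prints 25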

Use Cases


Use Case 1: HPC Data Centers

"The number of GPUs a given application requires differs from job to job."
– A large number of users submit parallel jobs
– Each job requests a number of GPUs per node
– The job scheduler takes these requests into account when scheduling
– The job scheduler tries to find nodes with the correct number of GPUs
– If no such nodes are available, it triggers a dynamic allocation (a sketch of this trigger follows below)

(Diagram: job scheduler dispatching jobs to nodes.)

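One hedged sketch of that trigger: a per-node scheduler hook that compares the GPUs visible on the node with the job's request and starts the remapping when they differ. The hook interface, the remap_c410x.sh helper, and the use of nvidia-smi to count GPUs are assumptions, not part of the Dell solution as presented.

    #!/bin/bash
    # Hypothetical scheduler prolog/hook; $1 = GPUs per node requested by the job.
    requested_gpus="$1"

    # Count the GPUs currently visible on this node (assumes the NVIDIA driver is installed).
    visible_gpus=$(nvidia-smi -L | wc -l)

    if [ "$visible_gpus" -ne "$requested_gpus" ]; then
        echo "Node sees $visible_gpus GPUs but the job wants $requested_gpus: triggering dynamic allocation"
        # remap_c410x.sh stands in for the IPMITool/port_map.sh steps shown earlier.
        ./remap_c410x.sh "$requested_gpus"
    fi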

Use Case 2: HPC Cloud Providers (PaaS)

"The nodes are provisioned with the correct number of GPUs at each instant."
– Users request specific platform features (number of GPUs, time)
– Nodes are provisioned with the required number of GPUs, and control is transferred to the user
– At the end, the GPUs are detached and shared with other nodes (a sketch of the provisioning arithmetic follows below)

Example requests handled by the workload manager:
1. 4 nodes, 4 GPUs/node, for 8 hours
2. 8 nodes, 2 GPUs/node, for 2 hours
3. 4 nodes, 8 GPUs/node, for 6 hours
4. 16 nodes, 2 GPUs/node, for 16 hours
5. 8 nodes, 2 GPUs/node, for 8 hours
6. 32 nodes, 4 GPUs/node, for 12 hours
7. 64 nodes, 2 GPUs/node, for 24 hours

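As an illustration of the provisioning arithmetic (not a Dell tool), the helper below takes one request from the list above and works out which per-host mapping (1:2, 1:4, or 1:8) to apply and how many C410X chassis are needed, given 16 GPUs and 8 hosts per chassis.

    #!/bin/bash
    # Usage sketch: ./plan.sh <nodes> <gpus_per_node>, e.g. "./plan.sh 4 8" for request 3 above.
    nodes="$1"
    gpus_per_node="$2"

    total_gpus=$((nodes * gpus_per_node))
    # A C410X holds 16 GPUs and serves up to 8 hosts; the tighter limit decides the chassis count.
    chassis_by_gpus=$(( (total_gpus + 15) / 16 ))
    chassis_by_hosts=$(( (nodes + 7) / 8 ))
    chassis=$(( chassis_by_gpus > chassis_by_hosts ? chassis_by_gpus : chassis_by_hosts ))

    echo "Request: $nodes nodes x $gpus_per_node GPUs/node = $total_gpus GPUs"
    echo "Mapping: 1:$gpus_per_node; C410X chassis required: $chassis"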

Demo 2 of 2


Questions


References
– IPMI: http://www.intel.com/design/servers/ipmi/ani/index.htm
– C410X BMC guide: http://support.dell.com/support/edocs/SYSTEMS/cp_pe_c410x/en/BMC/BMC.pdf
– port_map.sh script from Dell.com/support: http://www.dell.com/support/drivers/us/en/19/DriverDetails/DriverFileFormats?c=us&s=dhs&cs=19&l=en&DriverId=R302138
