ANSYS HPC Parametric Pack - EnginSoft

ANSYS HPC for CFD Applications Release 17.0

Agenda
• High-Performance Computing – motivations
• ANSYS HPC solutions
• HPC performance improvements for ANSYS CFD R17.0

HPC – Motivations

• For the same model complexity, shorten design time → impact on time to market
• In the same time, study more complex models → deeper knowledge of your own products
• For the same time and model complexity, study more variants → parametric studies with input/output correlation analysis

HPC – Motivations

• Need to study more accurate and/or more complex (high-fidelity) models
  → move from component-level to system-level studies
  → increasingly complicated and detailed geometries
  → finer computational grids
  → deeper knowledge of your own products
  → greater scope for product development

HPC – Motivations

• Need to apply more advanced numerical models
  → transients
  → turbulence
  → combustion
  → multiphase
  → etc.

HPC – Motivations

• Need to try different configurations
  → sensitivity analysis
  → optimization
  → robust design

HPC – Motivations

Financial ROI Results

The survey results indicate a substantial return on HPC investment:
• $356 on average in revenue per dollar invested in HPC
• $38 on average in profits (or cost savings) per dollar invested in HPC

Source: IDC report “Creating Economic Models Showing the Relationship Between Investments in HPC and the Resulting Financial ROI and Innovation”; October 2013, IDC #243296, Volume: 1.

ANSYS HPC SOLUTIONS

Interdisciplinary: a single solution, multiphysics
• Whatever the simulation requirement, ANSYS HPC provides the parallel computing capacity needed to shorten time to solution and to solve problems with high accuracy (high fidelity).
• The ANSYS solvers for mechanical, explicit dynamics, fluid dynamics and electromagnetic analysis, including:
  • ANSYS Mechanical
  • ANSYS Autodyn
  • ANSYS Fluent
  • ANSYS CFX
  • ANSYS Icepak
  • ANSYS Polyflow
  all use the same ANSYS HPC licenses to run in parallel.

Courtesy of FCA Italy

ANSYS HPC Solutions at Every Scale

• Scalability on supercomputers
• HPC cluster appliances
• Efficiency on multi-core workstations

ANSYS HPC solutions

HPC (per process)

HPC Pack
• For a single user who wants to run a simulation on their own workstation, a single ANSYS HPC Pack speeds up the computation by up to 8 times.
• For users with access to large HPC resources, ANSYS HPC Packs can be combined to enable parallel computing on hundreds, or even thousands, of cores.

HPC Workgroup
• Offers large volumes of parallel computing to improve user productivity.
• Enables a maximum total number of compute cores (from 16 to 32768 on the same server) to which a team has access.

HPC Parametric Pack
• Multiplies the availability of licenses for the individual applications, enabling simultaneous execution of multiple design points while consuming only one set of application licenses at a time (ANSYS Workbench only).

Cores enabled by number of HPC Packs per simulation:

HPC Packs per simulation   1    2    3     4     5      6      7
Cores enabled              8    32   128   512   2048   8192   32768
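The core counts in the HPC Pack chart (8, 32, 128, 512, 2048, 8192 and 32768 cores for 1 to 7 packs) fit the rule cores = 2 × 4^packs. The sketch below encodes that fit. The function names are illustrative, and the rule is only an observation drawn from the seven charted points, not a licensing formula stated on the slide, so the code refuses to extrapolate beyond 7 packs.

```python
MAX_CHARTED_PACKS = 7  # the chart stops at 7 HPC Packs per simulation


def cores_enabled(hpc_packs: int) -> int:
    """Cores enabled by a number of HPC Packs, per the fit 2 * 4**packs."""
    if not 1 <= hpc_packs <= MAX_CHARTED_PACKS:
        raise ValueError("only 1 to 7 packs appear in the chart")
    return 2 * 4 ** hpc_packs


def packs_needed(cores: int) -> int:
    """Smallest charted number of packs that enables at least `cores` cores."""
    for packs in range(1, MAX_CHARTED_PACKS + 1):
        if cores_enabled(packs) >= cores:
            return packs
    raise ValueError("more cores than the chart covers")


charted = [cores_enabled(n) for n in range(1, MAX_CHARTED_PACKS + 1)]
```

For example, `packs_needed(100)` selects 3 packs, because 2 packs enable only 32 cores while 3 enable 128.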


ANSYS HPC Parametric Pack

• ANSYS HPC Parametric Pack licenses scale the user's ability to run multiple parametric analyses at the same time within ANSYS Workbench.
• One ANSYS HPC Parametric Pack license allows up to 4 design points to be evaluated simultaneously, with no additional application licenses required (in effect, the “base” licenses are multiplied).

Figure (example with 4 design points): along the time axis, sequential execution runs dp1, dp2, dp3 and dp4 one after another, while simultaneous execution runs them side by side, reducing the computation time.

Number of HPC Parametric Pack licenses         1   2   3    4    5
Number of simultaneous design points enabled   4   8   16   32   64
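The chart values (4, 8, 16, 32 and 64 simultaneous design points for 1 to 5 licenses) double with each added license. Below is a minimal sketch of that relationship and of the sequential-versus-simultaneous comparison. It assumes that every design point takes the same time, that hardware is available for all concurrent runs, and that there is no overhead; the slide states none of this. All names are illustrative.

```python
import math


def design_points_enabled(parametric_packs: int) -> int:
    """Simultaneous design points for 1 to 5 licenses, as read from the chart."""
    if not 1 <= parametric_packs <= 5:
        raise ValueError("only 1 to 5 licenses appear in the chart")
    return 4 * 2 ** (parametric_packs - 1)


def sequential_time(n_design_points: int, time_per_point: float) -> float:
    """Wall time when design points run one after another."""
    return n_design_points * time_per_point


def simultaneous_time(n_design_points: int, time_per_point: float,
                      parametric_packs: int) -> float:
    """Idealized wall time: points run in batches of the enabled size."""
    slots = design_points_enabled(parametric_packs)
    batches = math.ceil(n_design_points / slots)
    return batches * time_per_point


chart = [design_points_enabled(n) for n in range(1, 6)]
```

Under these assumptions, the slide's 4 design points take one design-point time with one license instead of four.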

HPC PERFORMANCE IMPROVEMENTS FOR ANSYS CFD R17.0

Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Application Example

Application: General flow

Case Details:
• Airfoil
• External Aerodynamic Flow
• 100 M hex elements
• Single Domain
• Turbulent Flow

R17 vs. R15: >5X faster solution @ 2048 cores
R17 vs. R16: Solution time reduced by up to 39% @ 4096 cores
Scaling to 25K nodes/core

Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Application Example

Application: Mesh motion

Case Details:
• Automotive IC Engine Application
• 146 M nodes (380M elements: tet/prism/pyramid)
• Single Domain
• Turbulent Flow

R17 vs. R16: 32% faster @ 4096 cores

Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Application Example

Application: Turbomachinery

Case Details:
• Full Turbine
• Steady (FR)
• 13 M nodes (hex)
• 256 cores → 50K nodes/core
• 4 Domains: casing, guide vanes, runner, draft tube
• Turbulent Flow

R17 vs. R16: Absolute 5-10% faster
Minimal scaling change

Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Application Example

Application: Turbomachinery

Case Details:
• Full Turbine
• Unsteady (TRS)
• 13 M nodes (hex)
• 256 cores → 50K nodes/core
• 4 Domains: casing, guide vanes, runner, draft tube
• Turbulent Flow

R17 vs. R16: Absolute 10-30% faster
Speed-up @ 16 compute nodes: 5.8X → 7X
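Two figures on the turbine slides can be checked with simple arithmetic: 13 M mesh nodes on 256 cores is about 50K nodes per core, and a speed-up rising from 5.8X to 7X is a gain of roughly 21%, which falls inside the quoted 10-30% range. The sketch below assumes that both speed-ups are measured against the same baseline time, which the slide does not state; the function names are illustrative.

```python
def nodes_per_core(mesh_nodes: float, cores: int) -> float:
    """Average number of mesh nodes handled by each core."""
    return mesh_nodes / cores


def relative_gain(old_speedup: float, new_speedup: float) -> float:
    """Fractional improvement of one speed-up over another.

    Only meaningful if both speed-ups share the same baseline time.
    """
    return new_speedup / old_speedup - 1.0


turbine_load = nodes_per_core(13_000_000, 256)  # about 50.8K nodes per core
r17_gain = relative_gain(5.8, 7.0)              # about 0.21
```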

Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Features & Capabilities

Application: Turbomachinery

Background:
• Particular parallel performance issue on large partition counts

Optimized source point performance
• Improved efficiency with large numbers of source points

"GaTurbineBlade" by Tomeasy - Own work by uploader; produced with Adobe Illustrator. Licensed under CC BY-SA 3.0 via Commons https://commons.wikimedia.org/wiki/File:GaTurbineBlade.svg#/media/File:GaTurbineBlade.svg

Test case showing reduction in total CPU time when using large numbers of source points (reduction of additional computational cost of source points by as much as 70%)

Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Features & Capabilities

Application: Radiation

Background:
• Problems modeling collimated radiation, such as headlights and solar irradiation, use the Monte Carlo solver. This solver needs to take full advantage of HPC potential.

Enhanced Monte Carlo Radiation model
• Optimized the model so that the total number of rays (histories) remains consistent, independent of the number of core partitions

ANSYS Application Example: headlights, solar irradiation
• 2 spectral bands (multiband); participating media; 5 radiation domains (2 fluid, 3 solid); 3.5 million elements, of which 2.2 million are radiation elements
• Specified serial histories: 10 million

Complex headlamp case with 10 million ray histories. Comparison when solving only radiation and energy.
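One way to keep the total number of Monte Carlo histories independent of the partition count is to divide a fixed global budget among the partitions instead of assigning a fixed count to each one. The sketch below illustrates only that idea; it is not the CFX implementation, and the function name and the even split are assumptions.

```python
def split_histories(total_histories: int, partitions: int) -> list[int]:
    """Divide a fixed number of ray histories among partitions.

    The shares differ by at most one history and always sum to the
    requested total, whatever the number of partitions.
    """
    if partitions < 1:
        raise ValueError("need at least one partition")
    base, extra = divmod(total_histories, partitions)
    # The first `extra` partitions take one additional history each.
    return [base + 1 if rank < extra else base for rank in range(partitions)]


# 10 million histories, as specified for the headlamp case on the slide.
shares_64 = split_histories(10_000_000, 64)
shares_7 = split_histories(10_000_000, 7)
```

A real solver would more plausibly weight each share by the radiation elements or emitted energy in the partition; the even split is used here only to show that the total is preserved.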

Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Features & Capabilities

I/O

Background: • Time to read and write files to HPC for large and complex cases with many regions/face sets could significantly lengthen overall solution time

Optimized HPC I/O speedup
• Optimization of CFX solver to HPC interface resulted in a substantial speed-up
• I/O time now nearly negligible even at 64 cores

Reduction in wall clock seconds for I/O on an example test case with many regions

HPC performance improvements for ANSYS Fluent

Improved Parallel Performance & Scaling – Fluent 17.0 Robustness

ANSYS Features & Capabilities

Background:
• Fluent’s priority has been to deliver the best results, not the fastest convergence

Conservative Coarsening Method default for Pressure-based Coupled Solver:
• Especially helpful for native polyhedral meshes and/or highly stretched cells

Algebraic multigrid solver now automatically reorders the linear system
• Ensures proper ordering in multiple cell zones (was limited to within a single cell zone)

Figure: with no reordering, not converged after more than 200 iterations; with RCM reordering, converged in 94 iterations.
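RCM here presumably stands for Reverse Cuthill-McKee, a standard reordering that renumbers the unknowns of a sparse system so that connected unknowns receive nearby indices, which reduces the matrix bandwidth. The sketch below is a textbook version applied to a tiny hand-made graph, not Fluent's implementation; the graph and all names are illustrative.

```python
from collections import deque


def reverse_cuthill_mckee(adjacency: dict[int, set[int]]) -> list[int]:
    """Return a Reverse Cuthill-McKee ordering of an undirected graph."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    visited: set[int] = set()
    order: list[int] = []
    # Start each connected component from a lowest-degree node.
    for start in sorted(adjacency, key=lambda n: (degree[n], n)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            # Breadth-first search, visiting low-degree neighbours first.
            unvisited = adjacency[node] - visited
            for nbr in sorted(unvisited, key=lambda n: (degree[n], n)):
                visited.add(nbr)
                queue.append(nbr)
    order.reverse()  # the "reverse" in Reverse Cuthill-McKee
    return order


def bandwidth(adjacency: dict[int, set[int]], order: list[int]) -> int:
    """Largest index distance between two connected nodes under `order`."""
    position = {node: i for i, node in enumerate(order)}
    return max(abs(position[a] - position[b])
               for a in adjacency for b in adjacency[a])


# A chain of 8 cells whose numbering jumps back and forth: 0-4-1-5-2-6-3-7.
chain = [0, 4, 1, 5, 2, 6, 3, 7]
graph: dict[int, set[int]] = {n: set() for n in chain}
for a, b in zip(chain, chain[1:]):
    graph[a].add(b)
    graph[b].add(a)

natural_order = sorted(graph)
rcm_order = reverse_cuthill_mckee(graph)
```

On this chain the bandwidth drops from 4 in the original numbering to 1 after reordering. How that translates into solver convergence depends on the solver, so the sketch makes no claim about iteration counts.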

Improved Parallel Performance & Scaling – Fluent 17.0 Partitioning

ANSYS Features & Capabilities

Faster METIS partitioning:
• Updated library and optimized algorithms deliver significant partitioning speed-up for many larger cases, particularly those with adapted meshes
• 64-bit indexing in METIS and for partition storage to enable larger models
• Future proofed: Tested up to 2 billion cells!

ANSYS Application Examples

Combustor:
• 40% faster to partition for 8192 cores
• Less than 3 minutes

Combustor, 830M cells, CRAY XE6: partition time in seconds

Cores     4096   8192
16.0.0    141    295
17.0.0    111    174

Truck:
• 99% faster to partition for 512 cores
• Just 18 seconds (versus 36 minutes!!)

Truck, 134M cells: auto partition time in seconds

Cores     256     512      1024       2048   4096
16.0.0    923.1   ≈2175    > 1 hour
17.0.0    18.2    15.8     18.5       27.4   51.7
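The quoted gains can be checked against the charted partition times with a percentage-reduction formula. The sketch below uses the slide's values in seconds (295 and 174 for the combustor at 8192 cores, about 2175 and 15.8 for the truck at 512 cores); the function name is illustrative, and "faster" is read here as a reduction in partition time, which is an interpretation of the slide's wording.

```python
def percent_time_reduction(old_seconds: float, new_seconds: float) -> float:
    """Reduction in run time, as a percentage of the old time."""
    return 100.0 * (old_seconds - new_seconds) / old_seconds


combustor_8192 = percent_time_reduction(295.0, 174.0)  # about 41%
truck_512 = percent_time_reduction(2175.0, 15.8)       # about 99%
```

Both results are in line with the quoted 40% and 99%.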

Improved Parallel Performance & Scaling – Fluent 17.0 Partitioning

ANSYS Features & Capabilities

Background:
• DPM and combustion models pose challenges to parallel performance as users attempt to load-balance flow and physics calculations

New Option: Model-Weighted Partitioning
• Automatically weights multiple physics models across the full set of processors within a specified load imbalance tolerance
• Users can select the factors and relative weightings

ANSYS Application Example
Oxy-Fuel Burner:
• Turbulence, combustion, radiation, detailed kinetic mechanism (25 species, 113 reactions)
• 60% faster for 128 cores (Just 82 seconds)

Oxy-fuel burner, 1.9M hex cells: time in seconds

Cores          32       64       128      256      512     1024
Default        647.26   314.59   203.16   112.15   65.05   37.1
Load Balance   198.08   150.59   82.03    61.76    34.29   22.33
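The idea behind weighting a partition by physics cost can be illustrated with a toy example: cells inside a reacting zone cost more to compute, so splitting by cell count alone overloads whichever partition holds them. The sketch below compares an equal-count split with a simple greedy weighted split. The greedy scheme is a stand-in chosen for brevity, not the algorithm used by Fluent or METIS, and the cell weights are invented for illustration.

```python
import heapq


def load_imbalance(loads: list[float]) -> float:
    """Heaviest partition load relative to the mean, minus one (0 is perfect)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean - 1.0


def equal_count_loads(weights: list[float], n_parts: int) -> list[float]:
    """Contiguous blocks with the same number of cells, ignoring cell cost."""
    loads = [0.0] * n_parts
    for i, weight in enumerate(weights):
        loads[i * n_parts // len(weights)] += weight
    return loads


def weighted_loads(weights: list[float], n_parts: int) -> list[float]:
    """Greedy split: the costliest cell goes to the lightest partition so far."""
    heap = [(0.0, part) for part in range(n_parts)]
    for weight in sorted(weights, reverse=True):
        load, part = heapq.heappop(heap)
        heapq.heappush(heap, (load + weight, part))
    return [load for load, _ in sorted(heap, key=lambda item: item[1])]


# Hypothetical mesh: 100 reacting cells cost 10 units each, 900 others cost 1.
cell_weights = [10.0] * 100 + [1.0] * 900
by_count = equal_count_loads(cell_weights, 4)
by_weight = weighted_loads(cell_weights, 4)
```

With these invented weights the equal-count split puts 1150 of 1900 cost units on one partition, while the weighted split gives each partition 475. The greedy split ignores cell connectivity, which a real partitioner must also respect.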

Improved Parallel Performance & Scaling – Fluent 17.0 Partitioning

ANSYS Features & Capabilities

Background:
• Partitions need to communicate with each other. Lack of optimization can slow performance, especially for moving/dynamic mesh cases where the neighborhood needs to be updated frequently

Neighborhood Creation Optimization:
• Optimized communication algorithms and improved interface identification for better performance and completeness
• Better identification of interfaces improves robustness

ANSYS Application Example
Exhaust System:
• Speed-up from 1X to 30X depending on case and number of cores

Exhaust 33M, neighborhood creation: time in seconds

Cores    128     256     512     1024    2048    4096    8192
16.0.0   7.828   4.75    6.219   7.882   17.07   52.63   156.4
17.0.0   3.844   2.539   1.866   1.838   2.346   2.793   5.749
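The speed-up at each core count follows from dividing the 16.0.0 time by the 17.0.0 time. The sketch below applies this to the charted neighborhood-creation times for the 33M-cell exhaust case; the variable names are illustrative.

```python
# Neighborhood creation time in seconds, keyed by core count (from the chart).
time_16 = {128: 7.828, 256: 4.75, 512: 6.219, 1024: 7.882,
           2048: 17.07, 4096: 52.63, 8192: 156.4}
time_17 = {128: 3.844, 256: 2.539, 512: 1.866, 1024: 1.838,
           2048: 2.346, 4096: 2.793, 8192: 5.749}

# Speed-up factor of release 17.0.0 over 16.0.0 at each core count.
speedup = {cores: time_16[cores] / time_17[cores] for cores in time_16}
```

For this case the factor grows from about 2X at 128 cores to about 27X at 8192 cores, within the quoted range of 1X to 30X.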

Improved Parallel Performance & Scaling – Fluent 17.0 ANSYS Application Example

Application: General flow

Case Details:
• External flow over a passenger sedan
• Number of cells: 4 Million
• Cell Type: Mixed
• Models used: Standard k-ε turbulence

General solver scalability improvements
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric

Improved Parallel Performance & Scaling – Fluent 17.0 ANSYS Application Example

Application: General flow

Case Details:
• Vehicle exhaust model
• Number of cells: 33 Million
• Cell Type: Mixed
• Models used: SST k-omega turbulence

Optimized Neighborhood Creation
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric

Improved Parallel Performance & Scaling – Fluent 17.0 Application Mesh motion

Engine Crankcase Lubrication Model: • • •

85% faster run time (