ANSYS HPC for CFD Applications Release 17.0
Agenda
• High-Performance Computing: motivations
• ANSYS HPC solutions
• HPC performance improvements for ANSYS CFD R17.0
HPC – Motivations
• For the same model complexity, shorter design times: impact on time to market
• For the same time, the ability to study more complex models: more detailed knowledge of your own products
• For the same time and model complexity, the ability to study more variants: parametric studies with analysis of input/output correlations
HPC – Motivations
Need to study more accurate and/or more complex (high-fidelity) models:
• moving from component-level studies to system-level studies
• increasingly complicated and detailed geometries
• finer computational grids
• more detailed knowledge of your own products
• greater scope for product development
HPC – Motivations
Need to apply more advanced numerical models:
• transients
• turbulence
• combustion
• multiphase
• etc.
HPC – Motivations
Need to try different configurations:
• sensitivity analysis
• optimization
• robust design
HPC – Motivations
Financial ROI Results
The survey results indicate a substantial return on HPC investment:
• $356 on average in revenue per dollar invested in HPC
• $38 on average in profits (or cost savings) per dollar invested in HPC
Source: IDC report “Creating Economic Models Showing the Relationship Between Investments in HPC and the Resulting Financial ROI and Innovation”; October 2013, IDC #243296, Volume: 1.
ANSYS HPC SOLUTIONS
Interdisciplinarity: a single, multiphysics solution
Whatever the simulation requirement, ANSYS HPC provides the parallel computing capability needed to accelerate time to solution and solve problems with high accuracy (high fidelity). ANSYS solvers for mechanical, explicit dynamics, fluid dynamics and electromagnetic applications, including:
• ANSYS Mechanical
• ANSYS Autodyn
• ANSYS Fluent
• ANSYS CFX
• ANSYS Icepak
• ANSYS Polyflow
all use the same ANSYS HPC licenses to run in parallel.
Courtesy of FCA Italy
ANSYS HPC Solutions at Every Scale
Scalability on supercomputers
HPC cluster appliances
Efficiency on multi-core workstations
ANSYS HPC solutions
For a single user who wants to run a simulation on their own workstation, a single ANSYS HPC Pack enables a computation speed-up of up to 8 times. For users with access to large HPC resources, ANSYS HPC Packs can be combined to enable parallel computing on hundreds, or even thousands, of cores.
HPC (per process)
HPC Workgroup
Offers large volumes of parallel computing capacity to improve user productivity. Enables a maximum total number of compute cores (from 16 to 32768 on the same server) that a team can access.
HPC Pack
Cores enabled, by number of HPC Packs per simulation:
HPC Packs per simulation    Cores enabled
1                           8
2                           32
3                           128
4                           512
5                           2048
6                           8192
7                           32768
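The charted core counts for 1 to 7 HPC Packs (8, 32, 128, 512, 2048, 8192, 32768) grow by a factor of 4 with each additional pack. The short Python sketch below reproduces them with the closed form 2 × 4^n. That formula is inferred from the charted values only; the slides do not state it, and the function name is illustrative.

```python
def cores_enabled(hpc_packs: int) -> int:
    """Cores enabled for one simulation by a given number of HPC Packs.

    Assumption: the closed form 2 * 4**n is inferred from the charted
    values (8, 32, ..., 32768); it is not a formula given in the slides.
    """
    if not 1 <= hpc_packs <= 7:
        raise ValueError("the chart only covers 1 to 7 HPC Packs")
    return 2 * 4 ** hpc_packs


# Values read from the chart, for 1 to 7 packs.
charted = [8, 32, 128, 512, 2048, 8192, 32768]
computed = [cores_enabled(n) for n in range(1, 8)]
print(computed == charted)
```

The check at the end simply compares the inferred formula against the seven charted values.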
HPC Parametric Pack
Multiplies the availability of licenses for individual applications, enabling the simultaneous execution of multiple design points while consuming only one set of application licenses at a time (only via ANSYS Workbench).
ANSYS HPC Parametric Pack
ANSYS HPC Parametric Pack licenses scale the user's ability to run multiple parametric analyses concurrently within ANSYS Workbench.
One ANSYS HPC Parametric Pack license allows up to 4 design points to be evaluated simultaneously, with no additional application licenses required (in effect, the “base” licenses are multiplied).
Figure: sequential execution versus simultaneous execution over time (example: 4 design points, dp1 to dp4); simultaneous execution reduces computation time.
Number of HPC Parametric Pack Licenses    Number of Simultaneous Design Points Enabled
1                                         4
2                                         8
3                                         16
4                                         32
5                                         64
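The charted values (4, 8, 16, 32, 64 simultaneous design points for 1 to 5 HPC Parametric Pack licenses) double with each added license. The Python sketch below encodes that pattern and estimates how many sequential "rounds" a parametric study needs. It assumes an idealized case: every design point takes the same time and enough hardware is available. The function names are illustrative and the doubling rule is inferred from the chart, not stated in the slides.

```python
import math


def simultaneous_design_points(licenses: int) -> int:
    """Design points that can run at once (inferred from the chart: 4, 8, 16, 32, 64)."""
    if not 1 <= licenses <= 5:
        raise ValueError("the chart only covers 1 to 5 licenses")
    return 4 * 2 ** (licenses - 1)


def rounds_needed(design_points: int, licenses: int) -> int:
    """Number of back-to-back rounds, assuming equal run time per design point."""
    return math.ceil(design_points / simultaneous_design_points(licenses))


# The slide's example: 4 design points with one license run in a single round,
# instead of 4 sequential runs.
print(rounds_needed(4, 1))
```

Under these assumptions, a study of 10 design points would need 3 rounds with one license and 2 rounds with two.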
HPC PERFORMANCE IMPROVEMENTS FOR ANSYS CFD R17.0
Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Application Example
Application: General flow
Case Details:
• Airfoil
• External Aerodynamic Flow
• 100 M hex elements
• Single Domain
• Turbulent Flow
R17 vs. R15: >5X faster solution @ 2048 cores
R17 vs. R16: Solution time reduced by up to 39% @ 4096 cores
Scaling to 25K nodes/core
Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Application Example
R17 vs. R16: 32% faster @ 4096 cores
Application: Mesh motion
Case Details:
• Automotive IC Engine Application
• 146 M nodes (380M elements: tet/prism/pyramid)
• Single Domain
• Turbulent Flow
Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Application Example
Application: Turbomachinery
Case Details:
• Full Turbine
• Steady (FR)
• 13 M nodes (hex)
• 256 cores → 50K nodes/core
• 4 Domains: casing, guide vanes, runner, draft tube
• Turbulent Flow
R17 vs. R16:
• Absolute 5-10% faster
• Minimal scaling change
Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Application Example
Application: Turbomachinery
Case Details:
• Full Turbine
• Unsteady (TRS)
• 13 M nodes (hex)
• 256 cores → 50K nodes/core
• 4 Domains: casing, guide vanes, runner, draft tube
• Turbulent Flow
R17 vs. R16:
• Absolute 10-30% faster
• Speed-up @ 16 compute nodes: 5.8X → 7X
Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Features & Capabilities
Application: Turbomachinery
Background:
• Particular parallel performance issue on large partition counts
Optimized source point performance
• Improved efficiency with large numbers of source points
Test case showing reduction in total CPU time when using large numbers of source points (reduction of additional computational cost of source points by as much as 70%)
"GaTurbineBlade" by Tomeasy - Own work by uploader; produced with Adobe illustrator. Licensed under CC BY-SA 3.0 via Commons https://commons.wikimedia.org/wiki/File:GaTurbineBlade.svg#/media/File:GaTurbineBlade.svg
Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Features & Capabilities
Application: Radiation
Background:
• Problems modeling collimated radiation such as headlights and solar irradiation use the Monte Carlo solver. This solver needs to take full advantage of HPC potential
Enhanced Monte Carlo Radiation model
• Optimized the model so that the total number of rays (histories) remains consistent, independent of the number of core partitions
ANSYS Application Example
Headlights, solar irradiation
• 2 spectral bands (multiband), participating media; 5 radiation domains (2 fluid, 3 solid); 3.5 million elements, of which 2.2 million are radiation elements
• Specified serial histories: 10 million
Complex headlamp case with 10 million ray histories. Comparison when solving only radiation and energy
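One simple way to keep the total number of ray histories independent of the partition count is to divide the specified total across partitions and hand out the remainder one history at a time. The Python sketch below illustrates only that bookkeeping idea. It is an assumption for illustration, not the CFX implementation, and the function name is hypothetical.

```python
def split_histories(total_histories: int, partitions: int) -> list:
    """Split a fixed number of ray histories across partitions.

    Illustrative only: the first `extra` partitions receive one more
    history, so the shares always sum to `total_histories`.
    """
    if partitions < 1:
        raise ValueError("need at least one partition")
    base, extra = divmod(total_histories, partitions)
    return [base + 1 if rank < extra else base for rank in range(partitions)]


# The headlamp example specifies 10 million serial histories.
total = 10_000_000
for parts in (1, 7, 64, 4096):
    shares = split_histories(total, parts)
    print(parts, sum(shares) == total)
```

Whatever the partition count, the shares add up to the specified 10 million histories, which is the property the slide describes.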
Improved Parallel Performance & Scaling – CFX 17.0 ANSYS Features & Capabilities
Application: I/O
Background:
• Time to read and write files to HPC for large and complex cases with many regions/face sets could significantly lengthen overall solution time
Optimized HPC I/O speedup
• Optimization of CFX solver to HPC interface resulted in a substantial speed-up
• I/O time now nearly negligible even at 64 cores
Reduction in wall clock seconds for I/O on an example test case with many regions
HPC performance improvements for ANSYS Fluent
Improved Parallel Performance & Scaling – Fluent 17.0 Robustness
ANSYS Features & Capabilities
Background:
• Fluent’s priority has been to deliver the best results, not the fastest convergence
Conservative Coarsening Method default for Pressure-based Coupled Solver:
• Especially helpful for native polyhedral meshes and/or highly stretched cells
Algebraic multigrid solver now automatically reorders the linear system
• Ensures proper ordering in multiple cell zones (was limited to within a single cell zone)
Figure: with no reordering, not converged after >200 iterations; with RCM reordering, converged in 94 iterations
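RCM (reverse Cuthill-McKee) reordering renumbers the unknowns of a sparse system so that connected unknowns get nearby indices, which reduces the matrix bandwidth. The Python sketch below is a minimal textbook version applied to a small, deliberately scrambled chain of seven cells. It is not Fluent's implementation, and the example graph and function names are made up for illustration.

```python
from collections import deque


def reverse_cuthill_mckee(adjacency):
    """Return a node ordering; adjacency maps node -> set of neighbours (symmetric)."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    visited, order = set(), []
    # Start each connected component from a lowest-degree node.
    for start in sorted(adjacency, key=lambda n: (degree[n], n)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            # Visit unvisited neighbours in order of increasing degree.
            for nbr in sorted(adjacency[node] - visited, key=lambda n: (degree[n], n)):
                visited.add(nbr)
                queue.append(nbr)
    order.reverse()  # the "reverse" step of RCM
    return order


def bandwidth(adjacency, position):
    """Largest index distance between any two connected nodes."""
    return max(abs(position[a] - position[b]) for a in adjacency for b in adjacency[a])


# A chain of 7 cells whose numbering has been scrambled (hypothetical example).
edges = [(0, 4), (4, 1), (1, 5), (5, 2), (2, 6), (6, 3)]
adjacency = {n: set() for n in range(7)}
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

original = {n: n for n in adjacency}
order = reverse_cuthill_mckee(adjacency)
reordered = {node: index for index, node in enumerate(order)}
print(bandwidth(adjacency, original), bandwidth(adjacency, reordered))
```

For this chain the bandwidth drops from 4 to 1, since the reordered numbering simply follows the chain from one end to the other.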
Improved Parallel Performance & Scaling – Fluent 17.0 Partitioning
ANSYS Features & Capabilities
Faster METIS partitioning:
• Updated library and optimized algorithms deliver significant partitioning speed-up for many larger cases, particularly those with adapted meshes
• 64-bit indexing in METIS and for partition storage to enable larger models
• Future proofed: tested up to 2 billion cells
Partition time in seconds, Combustor, 830M cells, CRAY XE6:
Cores    16.0.0    17.0.0
4096     141       111
8192     295       174
ANSYS Application Examples
Combustor:
• 40% faster to partition for 8192 cores
• Less than 3 minutes
Truck:
• 99% faster to partition for 512 cores
• Just 18 seconds (versus 36 minutes!)
Auto partition time in seconds, Truck, 134M cells (chart; core counts 256, 512, 1024, 2048, 4096):
16.0.0: 923.1, 2175, > 1 hour
17.0.0: 18.2, 15.8, 18.5, 27.4, 51.7
Improved Parallel Performance & Scaling – Fluent 17.0 Partitioning
ANSYS Features & Capabilities
Background:
• DPM and combustion models pose challenges to parallel performance as users attempt to load-balance flow and physics calculations
New Option: Model-Weighted Partitioning
• Automatically weights multiple physics models across the full set of processors within a specified load imbalance tolerance
• Users can select the factors and relative weightings
ANSYS Application Example
Oxy-Fuel Burner:
• Turbulence, combustion, radiation, detailed kinetic mechanism (25 species, 113 reactions)
• 60% faster for 128 cores (just 82 seconds)
Time in seconds, oxy-fuel burner, 1.9M hex cells:
Cores           32        64        128       256       512      1024
Default         647.26    314.59    203.16    112.15    65.05    37.1
Load Balance    198.08    150.59    82.03     61.76     34.29    22.33
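The idea behind weighting physics models during partitioning can be sketched with a toy example: each cell gets a cost that depends on which models are active in it, and cells are distributed so that the summed cost per partition, rather than the cell count, is balanced. The Python sketch below uses a simple greedy assignment and ignores mesh connectivity, which a real partitioner must respect. The weights, cell layout and function names are hypothetical, and this is not Fluent's algorithm.

```python
import heapq

# Hypothetical relative cost of each physics model per cell.
MODEL_WEIGHTS = {"flow": 1.0, "combustion": 3.0, "radiation": 2.0}


def cell_cost(active_models):
    return sum(MODEL_WEIGHTS[m] for m in active_models)


def imbalance(loads):
    """Relative overload of the busiest partition (0.0 means perfectly balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean - 1.0


def partition_by_count(costs, parts):
    """Equal cell counts per partition, ignoring per-cell cost."""
    loads = [0.0] * parts
    block = -(-len(costs) // parts)  # ceiling division
    for index, cost in enumerate(costs):
        loads[index // block] += cost
    return loads


def partition_by_weight(costs, parts):
    """Greedy: give the next most expensive cell to the least loaded partition."""
    heap = [(0.0, rank) for rank in range(parts)]
    heapq.heapify(heap)
    for cost in sorted(costs, reverse=True):
        load, rank = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, rank))
    return [load for load, _ in sorted(heap, key=lambda item: item[1])]


# 8 reacting cells (all models active) followed by 24 flow-only cells.
cells = [("flow", "combustion", "radiation")] * 8 + [("flow",)] * 24
costs = [cell_cost(c) for c in cells]

print(imbalance(partition_by_count(costs, 4)))
print(imbalance(partition_by_weight(costs, 4)))
```

In this toy case the count-based split puts all reacting cells on one partition, while the weighted split spreads the cost evenly. A load imbalance tolerance, as mentioned in the slide, would correspond to accepting any split whose `imbalance` value stays below a chosen threshold.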
Improved Parallel Performance & Scaling – Fluent 17.0 Partitioning
ANSYS Features & Capabilities
Background:
• Partitions need to communicate with each other. Lack of optimization can slow performance, especially for moving/dynamic mesh cases where the neighborhood needs to be updated frequently
Neighborhood Creation Optimization:
• Optimized communication algorithms and improved interface identification for better performance and completeness
• Better identification of interfaces improves robustness
ANSYS Application Example
Exhaust System:
• Speed-up from 1X to 30X depending on case and number of cores
Neighborhood creation time in seconds, Exhaust, 33M cells:
Cores     128      256      512      1024     2048     4096     8192
16.0.0    7.828    4.75     6.219    7.882    17.07    52.63    156.4
17.0.0    3.844    2.539    1.866    1.838    2.346    2.793    5.749
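The per-core-count speed-up implied by the charted neighborhood creation times can be computed as the ratio of the 16.0.0 time to the 17.0.0 time. The Python sketch below does this for the exhaust case, using the values read from the chart; the variable and function names are illustrative.

```python
# Neighborhood creation times in seconds for the 33M-cell exhaust case,
# as read from the chart.
cores = [128, 256, 512, 1024, 2048, 4096, 8192]
t_r16 = [7.828, 4.75, 6.219, 7.882, 17.07, 52.63, 156.4]
t_r17 = [3.844, 2.539, 1.866, 1.838, 2.346, 2.793, 5.749]


def speedups(old_times, new_times):
    """Speed-up factor per entry: old time divided by new time."""
    return [old / new for old, new in zip(old_times, new_times)]


ratios = speedups(t_r16, t_r17)
for n, ratio in zip(cores, ratios):
    print(f"{n:5d} cores: {ratio:4.1f}X")
```

For this case the ratio grows from roughly 2X at 128 cores to roughly 27X at 8192 cores, consistent with the stated range of 1X to 30X depending on case and core count.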
Improved Parallel Performance & Scaling – Fluent 17.0 ANSYS Application Example
Application: General flow
Case Details:
• External flow over a passenger sedan
• Number of cells: 4 Million
• Cell Type: Mixed
• Models used: Standard k-ε turbulence
General solver scalability improvements
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Improved Parallel Performance & Scaling – Fluent 17.0 ANSYS Application Example
Application: General flow
Case Details:
• Vehicle exhaust model
• Number of cells: 33 Million
• Cell Type: Mixed
• Models used: SST k-omega turbulence
Optimized Neighborhood Creation
Results obtained on Intel Xeon E5-2697v3 nodes with TrueScale InfiniBand fabric
Improved Parallel Performance & Scaling – Fluent 17.0
Application: Mesh motion
Engine Crankcase Lubrication Model:
85% faster run time (