Computational Science and Engineering Department
Daresbury Laboratory
PERFORMANCE of VARIOUS COMPUTERS in COMPUTATIONAL CHEMISTRY
15th Daresbury Machine Evaluation Workshop
CCLRC Daresbury Laboratory
Martyn F. Guest
Computational Science and Engineering Department
[email protected]
Acknowledgements: M. Ehrig (HP/Compaq), Streamline, Workstations UK, ClusterVision, Compusys
Computational Chemistry Benchmarks
9th December 2004

OUTLINE
• Processor Performance Overview
  – Single-processor performance, Performance Metrics
  – SPEC (Standard Performance Evaluation Corporation): SPEC 95 and SPEC CPU 2000
• Computational Chemistry Benchmark (serial) - SPECfp ?
  – Matrix and application “kernels”
  – Application packages (GAMESS-UK, DL_POLY)
  – Comparison involves 220+ computers (vector supercomputers, workstations, PCs and MPP nodes)
  – Recently extended to include “RATE” benchmarks
  – URLs: Powerpoint presentation and Paper: http://www.cse.clrc.ac.uk/disco/hw-perf.shtml
• Benchmarking of Parallel Commodity Systems

MEW14 (2003) - Summary
PI relative to the IBM p-series 690/pwr4 1.3 GHz
The MATRIX-97, Chemistry Kernels, GAMESS-UK and DL_POLY Benchmarks
AMD MP2400+ / 2000: 69
Pentium 4 Xeon / 3066: 108
SGI Altix3700 Itanium2/1300: 140
HP RX4640 Itanium2/1300-H: 143
HP RX2600 Itanium2/1500-L: 162
HP RX5670 Itanium2/1500-H: 171
Intel Tiger Itanium2/1500: 159
AMD Opteron 848 / 2200: 128
AMD Opteron 244 / 1800: 106
Compaq Alpha ES45/1250: 96
Compaq Marvel EV7 / 1000: 85
SUN Blade 2000 / 1056 Cu: 55
SUN Fire V880 / 900 Cu: 47
SGI Origin3800/R14k-600: 54
HP PA-9000/RP7410-875: 79
IBM p-series 690/1700: 133
IBM p-series 690/1300: 100

MACHINES UNDER EVALUATION
Machine: Processor
AMD Opteron: 850 (2.4 GHz), 248 (2.2 GHz), 848 (2.2 GHz), 246 (2.0 GHz), 244 (1.8 GHz), 842 (1.6 GHz)
Intel IA32: P4 Xeon 3.06 GHz, 3.2 GHz
Intel x86-64 (EM64T): Nocona 3.2 GHz, 3.4 GHz, 3.6 GHz
SUN Blade 2000 / 1056 Cu: UltraSPARC-3 / 1056 MHz
SUN Fire V880 / 900 Cu: UltraSPARC-3 / 900 MHz
HP 9000 / RP3440-4: PA8800 / 1000 MHz
HP 9000 / RP7410: PA8700+ / 875 MHz
HP RX4640 (4-way): Itanium 2 / 1300 MHz (3 MB L3)
HP RX2600 (2-way): Itanium 2 / 1500 MHz (6 MB L3)
HP RX5670 (4-way): Itanium 2 / 1500 MHz (6 MB L3)
HP RX1620 (2-way): Itanium 2 / 1600 MHz (3 MB L3)
Intel Tiger (4-way): Itanium 2 / 1000 MHz (3 MB L3)
Intel Tiger (4-way): Itanium 2 / 1500 MHz (6 MB L3)
HP/Compaq ES45: AXP A21264C / 1000 MHz
HP/Compaq ES45: AXP A21264C / 1250 MHz
Compaq Marvel: EV7 1000 MHz
SGI Altix 3700: Itanium 2 / 1.5 & 1.3 GHz
SGI Origin3800/R14k: R14000/R14010 600 MHz
SGI Origin300/R14k: R14000/R14010 500 MHz
SGI O2/R12k-SC: R12000/R12010 270 MHz
IBM p-Series 630: RS/6000 / POWER4 1.0 GHz
IBM p-Series 690: RS/6000 / POWER4 1.3 GHz
IBM p-Series 690: RS/6000 / POWER4 1.7 GHz
IBM p5 570/1.9: RS/6000 / POWER5 1.9 GHz
Vector Supercomputers: NEC SX-6i, SX-5, SX-4, Cray Y-MP/J90-10, FUJITSU VPP/300, Cray YMP C98/4256
MPP Nodes
Cray T3E/1200: AXP EV56 600 MHz
IBM SP / Power3: RS/6000 WH2 - 375 MHz

SPEC Benchmarks
• Compute intensive categories (NOT: Graphics, Network, I/O)
  – integer versus floating point
  – conservative versus aggressive compilation
  – speed versus throughput
• Composite Metrics - SPEC CPU 2000 (http://www.specbench.org)
  – Aggressive, SPEED: SPECint2000, SPECfp2000
  – Aggressive, THROUGHPUT: SPECint_rate2000, SPECfp_rate2000
  – Conservative, SPEED: SPECint_base2000, SPECfp_base2000
  – Conservative, THROUGHPUT: SPECint_rate_base2000, SPECfp_rate_base2000
• SPECratio - T(reference) / T(measured system), scaled so that the reference system scores 100
• Reference = 300 MHz Ultra 5/10 (100)
• SPECfp2000 - geometric mean of 14 ratios, one for each benchmark
• SPECint2000 - geometric mean of 12 ratios, one for each benchmark
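The composite metric can be illustrated with a minimal sketch. It assumes the usual SPEC convention that a ratio is the reference time divided by the measured time (so a faster system scores higher), scaled so that the reference system scores 100. The function names and the timings are hypothetical and are not taken from any SPEC submission.

```python
import math


def spec_ratio(t_reference: float, t_measured: float) -> float:
    """Ratio for one benchmark; the reference machine itself scores 100."""
    return 100.0 * t_reference / t_measured


def geometric_mean(values) -> float:
    """Geometric mean, computed through logarithms to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference, measured) run times in seconds for three benchmarks.
timings = [(1600.0, 400.0), (3100.0, 1550.0), (1800.0, 900.0)]
ratios = [spec_ratio(ref, measured) for ref, measured in timings]
composite = geometric_mean(ratios)
print([round(r) for r in ratios], round(composite, 1))
```

The geometric mean is used so that no single benchmark with a very large ratio dominates the composite figure.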

SPEC CPU 2000 - Floating point Benchmark Suite (SPECfp2000)
Reference: 300 MHz Ultra 5/10 = 100
Benchmark      Language  Description
168.wupwise    F77       Physics: Quantum chromodynamics
171.swim       F77       Shallow water modelling
172.mgrid      F77       Multigrid solver: 3D potential field
173.applu      F77       Partial differential equations
177.mesa       C         3D graphics library
178.galgel     F90       Computational fluid dynamics
179.art        C         Image recognition / neural networks
183.equake     C         Seismic wave propagation simulation
187.facerec    F90       Image processing: Face recognition
188.ammp       C         Computational chemistry
189.lucas      F90       Number theory / primality testing
191.fma3d      F90       Finite-element crash simulation
200.sixtrack   F77       Nuclear physics accelerator design
301.apsi       F77       Meteorology: Pollutant distribution

SPEC CPU 2000 - SPECfp2000
Values relative to HP RX5670 Itanium2/1.5GHz
Intel D925XCV2 (3.8 GHz P4): 87
AMD MSI K8N Athlon64/2.6GHz: 89
HP DL145 / Opteron 2.4 GHz: 81
HP RX4640 Itanium2/1600-9M: 129
HP RX5670 Itanium2/1500: 100
HP RX5670 Itanium2/1300: 86
HP AlphaServer GS1280 7/1300: 80
Sun Fire V440 / US IIIi 1600: 64
Sun Java Wrkst. (Opteron/2.4 GHz): 85
Fujitsu PP900 GP64/1890: 86
SGI Origin 3200/R14k-600: 25
SGI Altix 3000 Itanium 2/1300: 88
SGI Altix 3000 Itanium 2/1500: 102
SGI Altix 3000 Bx2/1600-9M: 126
HP 9000/RP3440-1000: 47
IBM eServer p5 570/1900: 128
IBM p-series 690/1700: 81
Absolute SPECfp2000 values marked on the chart: 2712 : HP RX4640 Itanium2/1600; 2647 : SGI Altix Bx2/1600; 2702 : IBM p5 570 / 1.9 GHz; 2108

SPEC CPU 2000 - SPECint2000
Values relative to HP RX5670 Itanium2/1.5GHz
Intel D925XCV2 (3.8 GHz P4): 127
AMD MSI K8N Athlon64/2.6GHz: 132
HP DL145 / Opteron 2.4 GHz: 119
HP RX4640 Itanium2/1600-9M: 121
HP RX5670 Itanium2/1500: 100
HP RX5670 Itanium2/1300: 81
HP AlphaServer GS1280 7/1300: 76
Sun Fire V440 / US IIIi 1600: 64
Sun Java Wrkst. (Opteron/2.4 GHz): 121
Fujitsu PP900 GP64/1890: 103
SGI Origin 3200/R14k-600: 38
SGI Altix 3000 Itanium 2/1300: 78
SGI Altix 3000 Itanium 2/1500: 95
SGI Altix 3000 Bx2/1600-9M: 110
HP 9000/RP3440-1000: 40
IBM eServer p5 570/1900: 111
IBM p-series 690/1700: 85

Memory System Performance Limitations
Why applications with limited memory reuse perform inefficiently today
• STREAMS ADD: Computes A + B for long vectors A and B (historical data available)
• New microprocessor generations “reset” performance to at most 6% of peak
• Performance degrades to 1% - 3% of peak as clock speed increases within a generation
• Goal: understand how application performance depends on memory reuse and other factors
[Figure: Measured % of peak (STREAMS ADD), log scale 1.00% to 100.00%, against Clock Speed in MHz, log scale 10 to 10000. Cray systems (Y-MP, C90, T90, J90, EL-90, SV1; years of introduction 1988, 1994, 2000) are compared with microprocessors (Intel, Alpha, SGI MIPS, Sun).]
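A STREAMS ADD style measurement can be sketched as follows. This is a minimal illustration only, not the STREAM benchmark itself: it is pure Python, so the timing is dominated by interpreter overhead rather than memory bandwidth, and `peak_mflops` is a hypothetical figure supplied by the caller. It assumes that "% of peak" means the sustained floating-point rate of the ADD loop divided by the processor's peak rate.

```python
import time
from array import array


def stream_add(n: int = 100_000, repeats: int = 3):
    """Time c = a + b over long vectors; return (c, best MFLOPS)."""
    a = array("d", [1.0]) * n
    b = array("d", [2.0]) * n
    best = float("inf")
    c = array("d")
    for _ in range(repeats):
        start = time.perf_counter()
        c = array("d", (x + y for x, y in zip(a, b)))
        best = min(best, time.perf_counter() - start)
    # One floating-point add per element; guard against a zero timer reading.
    mflops = n / max(best, 1e-12) / 1.0e6
    return c, mflops


def percent_of_peak(sustained_mflops: float, peak_mflops: float) -> float:
    """Sustained rate expressed as a percentage of the (assumed) peak rate."""
    return 100.0 * sustained_mflops / peak_mflops


c, mflops = stream_add(n=10_000)
print(f"ADD sustained {mflops:.1f} MFLOPS")
```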

IBM p-series 690 Turbo: Multi-chip Module (MCM)
Four POWER4 chips (8 processors) on an MCM, with two associated memory slots
• 4 GX Bus links for external connections
• L3 cache shared across all processors
[Diagram: four POWER4 chips, each with a shared L2 cache and a distributed switch; each chip connects to an L3 cache and a memory controller (Mem Ctrl), which lead to the two memory slots; GX Bus links leave the module.]

SPEC CPU 2000 - SPECfp2000
Units of SPECfp as a function of Peak Performance
[Bar chart: SPECfp2000 / CPU Peak Performance (axis 0 to 0.6) for Intel D925XCV2 (3.8 GHz P4), AMD MSI K8N Athlon64/2.6GHz, HP DL145 / Opteron 2.4 GHz, HP RX4640 Itanium2/1600-9M, HP RX5670 Itanium2/1500, HP RX5670 Itanium2/1300, HP AlphaServer GS1280 7/1300, Sun Fire V440 / US IIIi 1600, Sun Java Wrkst. (Opteron/2.4 GHz), Fujitsu PP900 GP64/1890, SGI Origin 3200/R14k-600, SGI Altix 3000 Itanium 2/1300, SGI Altix 3000 Itanium 2/1500, SGI Altix 3000 Bx2/1600-9M, HP 9000/RP3440-1000, IBM eServer p5 570/1900, IBM p-series 690/1700.]

SPEC CPU 2000 - SPECfp vs SPECfp_rate (32 CPUs)
Values relative to IBM 690 Turbo 1.7 GHz
[Bar chart: SPECfp and SPECfp_rate (axis 0 to 2.5) for HP AlphaServer GS320/1224, Compaq Alpha GS1280 7/1300, Fujitsu PP1500 GP64/1890, Sun Fire 15K / 1.2 GHz, SGI Origin3800/R14k-600, SGI Altix 3000 1.5 GHz Itanium2, SGI Altix Bx2 1.6 GHz Itanium2, Bull Novascale 1.5GHz/Itanium2, HP Superdome 1.5 GHz Itanium2, HP Superdome/PA8700-750, IBM eServer p5 590/1650, IBM p-series 690/1700. Callouts: Itanium Systems, pwr4, pwr5.]

Computational Chemistry Benchmark Suite
• Benchmark suite developed to incorporate:
  – Matrix “kernels” (MATRIX-89 and MATRIX-97)
  – Application “kernels”
  – Application packages (e.g. GAMESS-UK, DL_POLY)
• Implemented on supercomputers, servers, workstations, PCs and parallel machines
  – Matrix Operations: matrix multiplication and matrix diagonalisation
  – Computational Chemistry Kernels: four typical application kernels (direct-SCF, MD, QMC and Jacobi eigen solver)
  – STREAM (memory bandwidth)
  – Quantum Chemistry Calculations: twelve typical applications, including SCF, direct-SCF, CASSCF, MCSCF, direct-CI and MRD-CI, MP2, 2nd derivatives (GAMESS-UK-89 and GAMESS-UK-99)
  – Molecular Dynamics Calculations: six typical simulations

Fortran Compilers - Performance
Performance relative to GNU g77 on Opteron 246 / 2.0 GHz
Compilers: G77: gcc V 3.2.2 (SuSE Linux); pgf90: v5.2-1; Pathscale EKO (v1.4)
[Bar chart (axis 0 to 200) over the benchmarks OVERALL, STREAM, JACOBI, QMC Kernel, MD Kernel, SCF Kernel, Diag-97, QHQ-97, MMO-97, Whetstone-97. Values marked on the OVERALL bars: 113.2 and 133.1.]

COMPUTATIONAL CHEMISTRY BENCHMARKS I. Matrix Operations
MATRIX OPERATIONS (MATRIX-89 and MATRIX-97)
• SPARSE Matrix Multiply Benchmark (callout: Memory bandwidth)
  The MMO operation is central to the efficient operation of modern QC codes. In this benchmark a series of MMOs (R = A X B) are performed involving matrices of increasing order:
  – MATRIX-89: 10, 20, 30, … , 100 (B is sparse)
  – MATRIX-97: 50, 100, 150, … , 500 (B is sparse)
• Diagonalisation Benchmark
  Based on diagonalising a series of real symmetric matrices. Measures the performance of 8 routines from mathematical libraries and QC codes:
  – MATRIX-89: 10, 20, 30, … , 100
  – MATRIX-97: 50, 100, 150, 200, 250, 300
• Q†HQ Benchmark (callout: -lblas)
  Designed to extend the MMO benchmark by allowing for the use of library routines, e.g. BLAS. Uses both a scalar and a vector algorithm:
  – MATRIX-89: 10, 20, 30, … , 150
  – MATRIX-97: 20, 40, 60, … , 300
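The idea behind a sparse MMO (R = A X B with B sparse) can be sketched as below. This is an assumption-laden illustration, not the benchmark's Fortran source: the function name `mmo_sparse_b` and the loop ordering are hypothetical. The point is that zero elements of B are skipped, so the cost follows the number of non-zeros and the inner loop streams through a column of A.

```python
def mmo_sparse_b(a, b):
    """Return R = A x B for dense A and sparse B (lists of rows)."""
    n, k, m = len(a), len(b), len(b[0])
    r = [[0.0] * m for _ in range(n)]
    for l in range(k):
        for j in range(m):
            b_lj = b[l][j]
            if b_lj == 0.0:
                continue  # skip zero elements of the sparse matrix B
            for i in range(n):
                # Vector-style update of column j of R with column l of A.
                r[i][j] += a[i][l] * b_lj
    return r


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.0, 1.0], [2.0, 0.0]]
print(mmo_sparse_b(a, b))
```

In a compiled language the inner loop over `i` is a simple vector update, which is why such a kernel tends to be sensitive to memory bandwidth.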

Matrix-97: SPARSE MMO Benchmark.
Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 29
Pentium 4 Xeon (EM64T) / 3200: 45
Pentium 4 Xeon (EM64T) / 3600: 57
AMD Opteron 850 / 2400: 100
AMD Opteron 248 / 2200: 91
SGI Altix3700 Itanium2/1500: 86
HP RX4640 Itanium2/1300-H: 86
HP RX2600 Itanium2/1500-L: 77
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 176
Intel Tiger Itanium2/1500: 67
Compaq Alpha ES45/1250: 35
SUN Blade 2000 / 1056 Cu: 20
SGI Origin3800/R14k-600: 28
HP 9000/RP3440-1000: 35
IBM eServer p5 570 / 1900: 148
IBM p-series 690/1700: 102
NEC SX-6i: 204
SGI O2/R12k-270: 33

Matrix-97: Q†HQ MMO Benchmark.
Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 49
Pentium 4 Xeon (EM64T) / 3200: 74
Pentium 4 Xeon (EM64T) / 3600: 82
AMD Opteron 850 / 2400: 56
AMD Opteron 248 / 2200: 54
SGI Altix3700 Itanium2/1500: 103
HP RX4640 Itanium2/1300-H: 89
HP RX2600 Itanium2/1500-L: 106
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 110
Intel Tiger Itanium2/1500: 103
Compaq Alpha ES45/1250: 39
SUN Blade 2000 / 1056 Cu: 29
SGI Origin3800/R14k-600: 21
HP 9000/RP3440-1000: 62
IBM eServer p5 570 / 1900: 105
IBM p-series 690/1700: 73
NEC SX-6i: 145

Matrix-97: Diagonalisation Benchmark
Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 77
Pentium 4 Xeon (EM64T) / 3200: 111
Pentium 4 Xeon (EM64T) / 3600: 126
AMD Opteron 850 / 2400: 110
AMD Opteron 248 / 2200: 100
SGI Altix3700 Itanium2/1500: 89
HP RX4640 Itanium2/1300-H: 87
HP RX2600 Itanium2/1500-L: 84
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 113
Intel Tiger Itanium2/1500: 83
Compaq Alpha ES45/1250: 73
SUN Blade 2000 / 1056 Cu: 33
SGI Origin3800/R14k-600: 37
HP 9000/RP3440-1000: 82
IBM eServer p5 570 / 1900: 118
IBM p-series 690/1700: 108
NEC SX-6i: 16
SGI O2/R12k-270: 14

The Matrix-97 Benchmarks.
Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 52
Pentium 4 Xeon (EM64T) / 3200: 77
Pentium 4 Xeon (EM64T) / 3600: 88
AMD Opteron 850 / 2400: 71
AMD Opteron 248 / 2200: 67
SGI Altix3700 Itanium2/1500: 93
HP RX4640 Itanium2/1300-H: 87
HP RX2600 Itanium2/1500-L: 89
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 133
Intel Tiger Itanium2/1500: 85
Compaq Alpha ES45/1250: 49
SUN Blade 2000 / 1056 Cu: 27
SGI Origin3800/R14k-600: 29
HP 9000/RP3440-1000: 59
IBM eServer p5 570 / 1900: 123
IBM p-series 690/1700: 94
NEC SX-6i: 122

COMPUTATIONAL CHEMISTRY KERNELS
Programs that are realistic models of actual chemical applications or algorithms
Self Consistent Field (SCF)
• SCF code using distributed primitive 1s Gaussian functions as a basis (thus emulating the use of s, p, functions); performs a direct-SCF calculation on Be4 (60 functions).
Molecular Dynamics (MD)
• This code bounces a few thousand argon atoms around in a box with periodic boundary conditions. LJ pair-wise interactions are used with integration of the Newtonian equations of motion.
Quantum Monte Carlo (QMC)
• This code evaluates the energy of the simplest explicitly correlated electronic wavefunction for the He atom using a variational Monte Carlo method without importance sampling.
Jacobi iterative linear equation solver (JACOBI)
• JACOBI uses a naive Jacobi iterative algorithm to solve a linear equation. All the time is spent in a large matrix-vector multiplication.
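The JACOBI kernel description can be made concrete with a minimal sketch of a naive Jacobi iteration for A x = b. It is not the benchmark code: the function name, tolerance and test matrix are assumptions, and convergence is only guaranteed for suitable matrices such as strictly diagonally dominant ones. Each sweep is essentially one matrix-vector product, which is where the time goes for large systems.

```python
def jacobi_solve(a, b, tol=1e-10, max_iter=10_000):
    """Solve A x = b by Jacobi iteration, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        x_new = []
        for i in range(n):
            # Off-diagonal part of row i times the previous iterate.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x_new.append((b[i] - s) / a[i][i])
        if max(abs(p - q) for p, q in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")


# Small diagonally dominant system with exact solution x = (1, 2).
solution = jacobi_solve([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0])
print(solution)
```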

Computational Chemistry Kernels - Direct-SCF.
Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 45
Pentium 4 Xeon (EM64T) / 3200: 38
Pentium 4 Xeon (EM64T) / 3600: 43
AMD Opteron 850 / 2400: 52
AMD Opteron 248 / 2200: 45
SGI Altix3700 Itanium2/1500: 59
HP RX4640 Itanium2/1300-H: 51
HP RX2600 Itanium2/1500-L: 44
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 113
Intel Tiger Itanium2/1500: 40
Compaq Alpha ES45/1250: 49
SUN Blade 2000 / 1056 Cu: 26
SGI Origin3800/R14k-600: 27
HP 9000/RP3440-1000: 39
IBM eServer p5 570 / 1900: 61
IBM p-series 690/1700: 47
NEC SX-6i: 99
Chart annotation: +Oinlinebudget=1000000 F90 V2.8 158)

Chemistry Kernels - Molecular Dynamics.
Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 76
Pentium 4 Xeon (EM64T) / 3200: 79
Pentium 4 Xeon (EM64T) / 3600: 90
AMD Opteron 850 / 2400: 89
AMD Opteron 248 / 2200: 79
SGI Altix3700 Itanium2/1500: 152
HP RX4640 Itanium2/1300-H: 87
HP RX2600 Itanium2/1500-L: 133
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 112
Intel Tiger Itanium2/1500: 123
Compaq Alpha ES45/1250: 69
SUN Blade 2000 / 1056 Cu: 32
SGI Origin3800/R14k-600: 38
HP 9000/RP3440-1000: 59
IBM eServer p5 570 / 1900: 104
IBM p-series 690/1700: 79
NEC SX-6i: 48

Chemistry Kernels - Quantum Monte Carlo.
Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 122
Pentium 4 Xeon (EM64T) / 3200: 153
Pentium 4 Xeon (EM64T) / 3600: 175
AMD Opteron 850 / 2400: 178
AMD Opteron 248 / 2200: 163
SGI Altix3700 Itanium2/1500: 83
HP RX4640 Itanium2/1300-H: 87
HP RX2600 Itanium2/1500-L: 80
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 108
Intel Tiger Itanium2/1500: 79
Compaq Alpha ES45/1250: 78
SUN Blade 2000 / 1056 Cu: 51
SGI Origin3800/R14k-600: 46
HP 9000/RP3440-1000: 65
IBM eServer p5 570 / 1900: 102
IBM p-series 690/1700: 79
NEC SX-6i: 15

Chemistry Kernels - Jacobi Solver.
Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 55
Pentium 4 Xeon (EM64T) / 3200: 67
Pentium 4 Xeon (EM64T) / 3600: 70
AMD Opteron 850 / 2400: 48
AMD Opteron 248 / 2200: 43
SGI Altix3700 Itanium2/1500: 115
HP RX4640 Itanium2/1300-H: 100
HP RX2600 Itanium2/1500-L: 99
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 114
Intel Tiger Itanium2/1500: 93
Compaq Alpha ES45/1250: 28
SUN Blade 2000 / 1056 Cu: 24
SGI Origin3800/R14k-600: 12
HP 9000/RP3440-1000: 64
IBM eServer p5 570 / 1900: 133
IBM p-series 690/1700: 76
NEC SX-6i: 533

STREAM: Measured Sustainable Memory Bandwidth in HPC (TRIAD)
http://www.cs.virginia.edu/stream
Values in % (HP RX5670 Itanium2/1500-H shown as 100)
Pentium 4 Xeon / 3066: 76
Pentium 4 Xeon (EM64T) / 3200: 92
Pentium 4 Xeon (EM64T) / 3600: 91
AMD Opteron 850 / 2400: 69
AMD Opteron 248 / 2200: 80
SGI Altix3700 Itanium2/1500: 103
HP RX4640 Itanium2/1300-H: 102
HP RX2600 Itanium2/1500-L: 107
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 129
Intel Tiger Itanium2/1500: 100
Compaq Alpha ES45/1250: 54
SUN Blade 2000 / 1056 Cu: 26
SGI Origin3800/R14k-600: 17
HP 9000/RP3440-1000: 105
IBM eServer p5 570 / 1900: 149
IBM p-series 690/1700: 156
NEC SX-6i: 712%
SGI O2/R12k-270: 2%

STREAM: Sustainable Memory Bandwidth (TRIAD) per process
[Bar chart: MBytes / Sec per process (axis 0 to 5000) for 1 CPU, 2 CPU and 4 CPU runs on Pentium 4 Xeon / 3066, Pentium 4 Xeon (EM64T) / 3200, Pentium 4 Xeon (EM64T) / 3600, AMD Opteron 248 / 2200, Intel Tiger Itanium2/1500, HP RX5670 Itanium2/1500, HP RX1620 Itanium2/1600-H, SGI Altix3700 Itanium2/1500, Compaq Alpha ES45/1250, HP 9000/RP3440-1000, IBM eServer p5 590/1650, IBM p-series 690/1300.]

Computational Chemistry Kernels.
Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 75
Pentium 4 Xeon (EM64T) / 3200: 84
Pentium 4 Xeon (EM64T) / 3600: 94
AMD Opteron 850 / 2400: 93
AMD Opteron 248 / 2200: 86
SGI Altix3700 Itanium2/1500: 102
HP RX4640 Itanium2/1300-H: 81
HP RX2600 Itanium2/1500-L: 89
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 112
Intel Tiger Itanium2/1500: 84
Compaq Alpha ES45/1250: 56
SUN Blade 2000 / 1056 Cu: 33
SGI Origin3800/R14k-600: 30
HP 9000/RP3440-1000: 57
IBM eServer p5 570 / 1900: 100
IBM p-series 690/1700: 70
NEC SX-6i: 151%

GAMESS-UK and DL_POLY Benchmarks
12 Typical QC Calculations
Module               Basis (GTOs)   Species
1. SCF               STO-3G (124)   Morphine
2. SCF               6-31G (154)    C6H3(NO2)3
3. ECP Geometry      ECPDZ (70)     Na7Mg+
4. Direct-SCF        6-31G (82)     Cytosine
5. CAS-geometry      TZVP (52)      H2CO
6. MCSCF             EXT1 (74)      H2CO
7. Direct-CI         EXT2 (64)      H2CO/H2+CO
8. MRD-CI (26M)      ECP (59)       TiCl4
9. MP2-geometry      6-31G* (70)    H3SiNCO
10. SCF 2nd derivs.  6-31G (64)     C5H5N
11. MP2 2nd derivs.  6-31G* (60)    C4
12. Direct-MP2       DZP (76)       C5H5N
Six Typical Simulations
Simulation: Atoms, Time steps
1. Na-K disilicate glass: 1080 atoms, 300 time steps
2. Metallic Al with Sutton-Chen potential: 256 atoms, 8000 time steps
3. Valinomycin in 1223 water molecules: 3837 atoms, 100 time steps
4. Dynamic shell model water with 1024 sites: 768 atoms, 1000 time steps
5. Dynamic shell model MgCl2 with 1280 sites: 768 atoms, 1000 time steps
6. Model membrane, 2 membrane chains, 202 solute and 2746 solvent molecules: 3148 atoms, 1000 time steps

The GAMESS-UK Benchmark I.
CPU Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 78
Pentium 4 Xeon (EM64T) / 3200: 91
Pentium 4 Xeon (EM64T) / 3600: 102
AMD Opteron 850 / 2400: 118
AMD Opteron 248 / 2200: 110
SGI Altix3700 Itanium2/1500: 95
HP RX4640 Itanium2/1300-H: 85
HP RX2600 Itanium2/1500-L: 94
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 105
Intel Tiger Itanium2/1500: 93
Compaq Alpha ES45/1250: 58
SUN Blade 2000 / 1056 Cu: 40
SGI Origin3800/R14k-600: 29
HP 9000/RP3440-1000: 72
IBM eServer p5 570 / 1900: 118
IBM p-series 690/1700: 100
Chart annotation: 2.3 minutes

SPECfp2000 and the GAMESS-UK Benchmark
Performance relative to the HP RX5670 Itanium2 / 1.5 GHz
Pentium 4 Xeon / 3066: GAMESS-UK 78, SPECfp2000 52
Pentium 4 Xeon (EM64T) / 3200: GAMESS-UK 91, SPECfp2000 65
Pentium 4 Xeon (EM64T) / 3600: GAMESS-UK 102, SPECfp2000 72
AMD Opteron 850 / 2400: GAMESS-UK 118, SPECfp2000 78
AMD Opteron 248 / 2200: GAMESS-UK 110, SPECfp2000 81
SGI Altix3700 Itanium2/1500: GAMESS-UK 95, SPECfp2000 100
HP RX4640 Itanium2/1300-H: GAMESS-UK 85, SPECfp2000 86
HP RX2600 Itanium2/1500-L: GAMESS-UK 94, SPECfp2000 101
HP RX5670 Itanium2/1500-H: GAMESS-UK 100, SPECfp2000 100
HP RX1620 Itanium2/1600-H: GAMESS-UK 105, SPECfp2000 128
Intel Tiger Itanium2/1500: GAMESS-UK 93, SPECfp2000 97
Compaq Alpha ES45/1250: GAMESS-UK 58, SPECfp2000 65
SGI Origin3800/R14k-600: GAMESS-UK 29, SPECfp2000 23
SUN Blade 2000 / 1056 Cu: GAMESS-UK 40, SPECfp2000 39
HP 9000/RP3440-1000: GAMESS-UK 72, SPECfp2000 47
IBM eServer p5 570 / 1900: GAMESS-UK 111, SPECfp2000 128
IBM p-series 690/1700: GAMESS-UK 100, SPECfp2000 81

The GAMESS-UK-99 Benchmark
10 Typical QC Calculations
Module               Basis (GTOs)      Species
1. Direct-SCF        6-31G (227)       Morphine
2. SCF               6-31G** (265)     C6H3(NO2)3
3. DFT B3LYP         6-311G* (167)     Cytosine
4. MCSCF             CC-PVTZ (100)     H2CO
5. Direct-CI         CC-PVTZ (100)     H2CO/H2+CO
6. CCSD(T)           TZV+2d+1f (144)   C4
7. MP2-geometry      TZVP (105)        H3SiNCO
8. SCF 2nd derivs.   6-311G** (144)    C5H5N
9. MP2 2nd derivs.   TZVP(C2d) (104)   C4
10. Direct-MP2       6-31G* (130)      Cytosine
Benchmark Time: 30.8 minutes on IBM p-series 690/pwr4 1.3 GHz

The GAMESS-UK-99 Benchmark
Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 69
Pentium 4 Xeon (EM64T) / 3200: 83
Pentium 4 Xeon (EM64T) / 3600: 95
AMD Opteron 850 / 2400: 95
AMD Opteron 248 / 2200: 91
SGI Altix3700 Itanium2/1500: 92
HP RX4640 Itanium2/1300-H: 85
HP RX2600 Itanium2/1500-L: 92
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 101
Intel Tiger Itanium2/1500: 86
Compaq Alpha ES45/1250: 59
SUN Blade 2000 / 1056 Cu: 33
SGI Origin3800/R14k-600: 28
HP 9000/RP3440-1000: 70
IBM eServer p5 570/1900: 115
IBM p-series 690/1700: 93

The DL_POLY Benchmark.
Performance relative to the HP RX5670 Itanium2/1.5GHz
Pentium 4 Xeon / 3066: 55
Pentium 4 Xeon (EM64T) / 3200: 72
Pentium 4 Xeon (EM64T) / 3600: 81
AMD Opteron 850 / 2400: 85
AMD Opteron 248 / 2200: 78
SGI Altix3700 Itanium2/1500: 106
HP RX4640 Itanium2/1300-H: 86
HP RX2600 Itanium2/1500-L: 108
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 124
Intel Tiger Itanium2/1500: 111
Compaq Alpha ES45/1250: 61
SUN Blade 2000 / 1056 Cu: 30
SGI Origin3800/R14k-600: 38
HP 9000/RP3440-1000: 52
IBM eServer p5 570 / 1900: 72
IBM p-series 690/1700: 63

SPECfp2000 and the DL_POLY Benchmark
Performance relative to the HP RX5670 Itanium2 / 1.5 GHz
Pentium 4 Xeon / 3066: DL_POLY 55, SPECfp2000 52
Pentium 4 Xeon (EM64T) / 3200: DL_POLY 72, SPECfp2000 65
Pentium 4 Xeon (EM64T) / 3600: DL_POLY 81, SPECfp2000 72
AMD Opteron 850 / 2400: DL_POLY 85, SPECfp2000 78
AMD Opteron 248 / 2200: DL_POLY 78, SPECfp2000 81
SGI Altix3700 Itanium2/1500: DL_POLY 106, SPECfp2000 100
HP RX4640 Itanium2/1300-H: DL_POLY 86, SPECfp2000 86
HP RX2600 Itanium2/1500-L: DL_POLY 108, SPECfp2000 101
HP RX5670 Itanium2/1500-H: DL_POLY 100, SPECfp2000 100
HP RX1620 Itanium2/1600-H: DL_POLY 124, SPECfp2000 128
Intel Tiger Itanium2/1500: DL_POLY 111, SPECfp2000 97
Compaq Alpha ES45/1250: DL_POLY 61, SPECfp2000 65
SUN Blade 2000 / 1056 Cu: DL_POLY 30, SPECfp2000 39
SGI Origin3800/R14k-600: DL_POLY 38, SPECfp2000 23
HP 9000/RP3440-1000: DL_POLY 52, SPECfp2000 47
IBM eServer p5 570 / 1900: DL_POLY 72, SPECfp2000 128
IBM p-series 690/1300: DL_POLY 63, SPECfp2000 81

Summary
Index relative to the HP RX5670 Itanium2/1.5GHz
The MATRIX-97, Chemistry Kernels, GAMESS-UK and DLPOLY Benchmarks
Pentium 4 Xeon / 3066: 50
Pentium 4 Xeon (EM64T) / 3200: 75
Pentium 4 Xeon (EM64T) / 3600: 91
AMD Opteron 850 / 2400: 92
AMD Opteron 248 / 2200: 85
SGI Altix3700 Itanium2/1500: 99
HP RX4640 Itanium2/1300-H: 85
HP RX2600 Itanium2/1500-L: 95
HP RX5670 Itanium2/1500-H: 100
HP RX1620 Itanium2/1600-H: 119
Intel Tiger Itanium2/1500: 93
Compaq Alpha ES45/1250: 56
SUN Blade 2000 / 1056 Cu: 33
SGI Origin3800/R14k-600: 32
HP 9000/RP3440-1000: 60
IBM eServer p5 570 / 1900: 103
IBM p-series 690/1700: 82

Computational Chemistry Rate Benchmark
• “Rate” Benchmark suite recently developed to incorporate:
  – Matrix “kernels” (MATRIX-97+)
  – Application packages (e.g. GAMESS-UK-99, DL_POLY+)
• Multi-component benchmark:
  – Matrix Operations: matrix multiplication and matrix diagonalisation
  – Quantum Chemistry Calculations
  – Molecular Dynamics Calculations
• Rate Procedure:
  – For each benchmark (i), run n instances at once and take the elapsed time T_i (last to finish - first to start).
  – The rate for this benchmark is R_i = n X T_ref / T_i
  – T_ref is the elapsed time on a reference system (HP RX5670 Itanium2/1.5GHz, scaled to a single processor (n=1) elapsed time of 100 units).
  – Take “the geometric mean” of all the benchmarks (with the same n).

Components of Chemistry Rate Benchmark
1. MATRIX OPERATIONS (MATRIX-97+)
• SPARSE Matrix Multiply Benchmark: Vector FORTRAN and DGEMM (callout: Memory bandwidth)
  A series of MMOs (R = A X B) are performed involving matrices of increasing order:
  – MATRIX-97+: 100, 200, … , 1200 (B is sparse)
• Diagonalisation Benchmark
  Performance of 7 routines from mathematical libraries and QC codes:
  – MATRIX-97+: 100, 200, …, 600
• Q†HQ Benchmark (callout: -lblas)
  Involves use of library routines, e.g. BLAS. Uses both a scalar and a vector algorithm (DGEMM):
  – MATRIX-97+: 100, 200, … , 1000
2. DLPOLY
  5 simulations (increased no. of time steps)
3. GAMESS-UK
  8 Calculations (GAMESS_UK-99: Direct-SCF, DFT B3LYP, MCSCF, Direct-CI, MP2-geometry, SCF 2nd derivs., MP2 2nd derivs., Direct-MP2)

Rate Benchmark: DL_POLY Component
[Bar chart: R-DL_POLY for 1-CPU, 2-CPU and 4-CPU runs (axis 0 to 500) on Pentium 4 Xeon / 3066, Pentium 4 Xeon (EM64T) / 3200, Pentium 4 Xeon (EM64T) / 3600, AMD Opteron 248 / 2200, AMD Opteron 848 / 2200, SGI Altix3700 Itanium2/1500, HP RX4640 Itanium2/1300-H, HP RX2600 Itanium2/1500-L, HP RX5670 Itanium2/1500-H, Intel Tiger Itanium2/1500, Compaq Alpha ES45/1250, SUN Blade 2000 / 1056 Cu, SGI Origin3800/R14k-500, IBM eServer p5 570/1900, IBM p-series 690/1700.]
Chart annotation: Linear increase in Rate with number of Runs

Rate Benchmark: DGEMM Component
[Bar chart: R-DGEMM for 1-CPU, 2-CPU and 4-CPU runs (axis 0 to 400) on Pentium 4 Xeon / 3066, Pentium 4 Xeon (EM64T) / 3200, Pentium 4 Xeon (EM64T) / 3600, AMD Opteron 248 / 2200, AMD Opteron 848 / 2200, SGI Altix3700 Itanium2/1500, HP RX4640 Itanium2/1300-H, HP RX2600 Itanium2/1500-L, HP RX5670 Itanium2/1500-H, Intel Tiger Itanium2/1500, SUN Blade 2000 / 1056 Cu, Compaq Alpha ES45/1250, SGI Origin3800/R14k-500, IBM eServer p5 570/1900, IBM p-series 690/1700.]
Chart annotation: Linear increase in Rate with number of Runs

Rate Benchmark: MMO FORTRAN
[Bar chart: R-MMO for 1-CPU, 2-CPU and 4-CPU runs (axis 0 to 400) on Pentium 4 Xeon / 3066, Pentium 4 Xeon (EM64T) / 3200, Pentium 4 Xeon (EM64T) / 3600, AMD Opteron 248 / 2200, AMD Opteron 848 / 2200, SGI Altix3700 Itanium2/1500, HP RX4640 Itanium2/1300-H, HP RX2600 Itanium2/1500-L, HP RX5670 Itanium2/1500-H, Intel Tiger Itanium2/1500, Compaq Alpha ES45/1250, SUN Blade 2000 / 1056 Cu, SGI Origin3800/R14k-500, IBM eServer p5 570/1900, IBM p-series 690/1700.]
Chart annotation: Impact of Memory Bandwidth

Rate Benchmark: Diagonalisation Component
[Bar chart: R-Diagonalisation for 1-CPU, 2-CPU and 4-CPU runs (axis 0 to 400) on Pentium 4 Xeon / 3066, Pentium 4 Xeon (EM64T) / 3200, Pentium 4 Xeon (EM64T) / 3600, AMD Opteron 248 / 2200, AMD Opteron 848 / 2200, SGI Altix3700 Itanium2/1500, HP RX4640 Itanium2/1300-H, HP RX2600 Itanium2/1500-L, HP RX5670 Itanium2/1500-H, Intel Tiger Itanium2/1500, Compaq Alpha ES45/1250, SUN Blade 2000 / 1056 Cu, SGI Origin3800/R14k-500, IBM eServer p5 570/1900, IBM p-series 690/1700.]
Chart annotation: Decline in Rate with 32-bit commodity solutions

Rate Benchmark: GAMESS-UK Component
[Bar chart: R-GAMESS-UK for 1-CPU, 2-CPU and 4-CPU runs (axis 0 to 400) on Pentium 4 Xeon / 3066, Pentium 4 Xeon (EM64T) / 3200, Pentium 4 Xeon (EM64T) / 3600, AMD Opteron 248 / 2200, AMD Opteron 848 / 2200, SGI Altix3700 Itanium2/1500, HP RX4640 Itanium2/1300-H, HP RX2600 Itanium2/1500-L, HP RX5670 Itanium2/1500-H, Intel Tiger Itanium2/1500, Compaq Alpha ES45/1250, SUN Blade 2000 / 1056 Cu, SGI Origin3800/R14k-500, IBM eServer p5 570/1900, IBM p-series 690/1700.]
Chart annotation: General decline in Rate with number of Runs
Computational Science and Engineering Department
Daresbury Laboratory
Computational Chemistry Rate Metric
• Chemistry Rate Metric: for each component benchmark i, run n instances at once and take the elapsed time T_i (last to finish minus first to start).
  Component rate: R_i = n × T_ref / T_i
  T_ref is the elapsed time on the HP RX5670 Itanium2/1.5 GHz, scaled to a single-CPU time of 100 units.
  R_CHEM = 0.1 × R_DGEMM + 0.1 × R_Diagonalisation + 0.4 × R_DLPOLY + 0.4 × R_GAMESS-UK
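The rate metric can be illustrated with a short Python sketch. The function names, the dictionary keys and the timings in the usage lines are hypothetical and are not part of the benchmark suite. The sketch also assumes that the "100 units" scaling is applied as a factor of 100, so that one run on the reference machine scores 100.

```python
# Weights taken from the slide:
# R_CHEM = 0.1 R_DGEMM + 0.1 R_Diagonalisation + 0.4 R_DLPOLY + 0.4 R_GAMESS-UK
WEIGHTS = {
    "DGEMM": 0.1,
    "Diagonalisation": 0.1,
    "DLPOLY": 0.4,
    "GAMESS-UK": 0.4,
}


def component_rate(n, t_elapsed, t_ref, scale=100.0):
    """Component rate R_i = n * T_ref / T_i.

    n         -- number of instances run at once
    t_elapsed -- elapsed time T_i (last to finish minus first to start)
    t_ref     -- elapsed time of a single run on the reference machine
    scale     -- assumed normalisation: one reference run scores `scale`
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if t_elapsed <= 0 or t_ref <= 0:
        raise ValueError("times must be positive")
    return scale * n * t_ref / t_elapsed


def chemistry_rate(rates):
    """Weighted sum of the four component rates (R_CHEM)."""
    missing = sorted(set(WEIGHTS) - set(rates))
    if missing:
        raise KeyError(f"missing component rates: {missing}")
    return sum(weight * rates[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    # Hypothetical timings in seconds for a 2-CPU rate run.
    t_ref = 120.0
    rates = {
        "DGEMM": component_rate(2, 125.0, t_ref),
        "Diagonalisation": component_rate(2, 210.0, t_ref),
        "DLPOLY": component_rate(2, 130.0, t_ref),
        "GAMESS-UK": component_rate(2, 150.0, t_ref),
    }
    print(round(chemistry_rate(rates), 1))
```

With this definition, a machine whose rate doubles when n doubles shows the linear scaling noted on the rate charts, while a rate that stays flat or falls indicates contention between the concurrent runs.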
Chemistry Rate Benchmark
[Bar chart: R-CHEM for 1-CPU, 2-CPU and 4-CPU runs. Machines: Pentium 4 Xeon/3066, Pentium 4 Xeon (EM64T)/3200, Pentium 4 Xeon (EM64T)/3600, AMD Opteron 248/2200, AMD Opteron 848/2200, SGI Altix3700 Itanium2/1500, HP RX4640 Itanium2/1300-H, HP RX2600 Itanium2/1500-L, HP RX5670 Itanium2/1500-H, Intel Tiger Itanium2/1500, Compaq Alpha ES45/1250, SUN Blade 2000/1056 Cu, SGI Origin3800/R14k-500, IBM eServer p5 570/1900, IBM p-series 690/1700.]
Evaluation and Benchmarking of High-end and Commodity-based Systems
Parallel Systems and Capacity Computing
High-End Systems Evaluated
• Cray T3E/1200E ( … historical … )
  816-processor system at Manchester (CSAR), 600 MHz Alpha EV56 CPU, 256 MB
• IBM pseries 690 and pseries 690+ (Daresbury)
  IBM p690 (8-way LPAR’d nodes, 1280 × 1.3 GHz CPUs with Colony, HPCx)
  IBM p690+ (32-way nodes, 1600 × 1.7 GHz CPUs with HPS, HPCx Phase 2)
• Compaq AlphaServer SC
  4-way ES40/667 A21264A (APAC) and 833 MHz SMP nodes (2 GB RAM)
  TCS1 system at PSC (750 4-way ES45 nodes - 3,000 EV68 CPUs - 4 GB memory per node, 8 MB L2 cache), Quadrics interconnect (5 usec latency, 250 MB/sec B/W)
• SGI Origin 3800
  SARA (1000 CPUs) - Numalink with MIPS R14k/500 CPUs
• SGI Altix 3700
  Linux Cluster - Numalink with Itanium 2 1.3 GHz CPUs, 3 MB L3 cache
  CSAR (“newton” - 512 CPUs) and SARA (“aster” - 416 CPUs - 7 nodes)
  ORNL (“ram” - 256 CPUs with Itanium 2 1.5 GHz CPUs, 6 MB L3 cache)
Commodity Systems (CSx)
Prototype / Evaluation Hardware

System  Location   CPUs  Configuration
CS1     Daresbury    32  Pentium III / 450 MHz + FE (EPSRC)
CS2     Daresbury    64  24 × dual UP2000/EV67-667, QSNet Alpha/LINUX cluster, 8 × dual CS20/EV67-833 (“loki”)
CS3     RAL          16  Athlon K7 850 MHz + Myrinet
CS4     SARA         32  Athlon K7 1.2 GHz + FE
CS6     CLiC        528  Pentium III / 800 MHz; fast ethernet (Chemnitzer Cluster)
CS7     Daresbury    64  AMD K7/1000 MP + SCALI/SCI (“ukcp”)
CS8     NCSA        320  160 dual IBM Itanium/800 + Myrinet 2k (“titan”)
CS9     Bristol      96  Pentium4 Xeon/2000 + Myrinet 2k (“dirac”)

Prototype Systems
CS0     Daresbury    10  10 CPUs, Pentium II/266
CS5     Daresbury    16  8 × dual Pentium III/933, SCALI
Commodity Systems (CSx) II.

System  Location           CPUs  Configuration
CS10    Hull                 64  Pentium4 Xeon/2667 + Myrinet 2k (“eagle”), Streamline/SCORE
CS11    Workstations         32  Pentium4 Xeon/2667 + GbitEther, ScaMPI
CS12    Essex                48  Pentium4 Xeon/2400 + GbitEther (“sstream1”), Streamline/SCORE
CS13    White Rose, Leeds   256  Pentium4 Xeon/2200-2400 + M2k (“snowdon”), Streamline/SCORE
CS14    NCSA               1024  Pentium III Xeon/1000 + M2k (“platinum”)
CS15    SDSC                128  Pentium III Xeon/800 + M2k (“meteor”)
CS16    SDSC                256  dual-Itanium2/1.3 GHz + M2k (“Teragrid”)
CS17    Daresbury            32  Pentium4 Xeon/2667 + GbitEther (“ccp1”), Streamline/SCORE
CS18    Bradford             78  Pentium4 Xeon/2800 + M2k/GbitE (“grendel”)
CS19    Daresbury            64  dual-Opteron/246 2.0 GHz nodes + Infiniband, Gbit and SCI (“scaliwag”)
CS20    RAL                 256  dual-Opteron/248 2.2 GHz nodes + Myrinet (“scarf”)
Applications Performance Overview
• Serial (SPEC, DL) & Communication Benchmarks
• Parallel Applications Performance
  1. Computational Chemistry: Molecular Simulation & Electronic Structure
  2. Computational Materials Science
  3. Atomic & Molecular Physics
  4. Computational Engineering
  5. Environmental Modelling
• Capacity-based group solution
• Issues of cost effectiveness
• On e.g. a 128-256 CPU cluster, modal job size is ~32 CPUs
• Increasing trend to hierarchical clusters: Gbit network with HEC core (with e.g. Myrinet)
Capability and Capacity Computing
Commodity vs. Proprietary Solutions
Performance Metrics: 1999-2001
Attempted to quantify delivered performance from the Commodity-based systems against MPP (CSAR Cray T3E/1200E) and ASCI-style SMP-node platforms (e.g. SGI Origin 3800), i.e.
Performance Metric (% 32-node Cray T3E)
T (32 nodes, Cray T3E/1200E) / T (32 CPUs, CSx)
  [ T 32-node T3E / T 32-node CS1 Pentium III/450 + FE ]
  T 32-node T3E / T 32-node CS6 Pentium III/800 + FE
  T 32-node T3E / T 32-CPU CS2 Alpha Linux Cluster + Quadrics

Performance Metrics: 2002
Performance Metric (% 32-node AlphaServer SC [PSC])
T (32 CPUs, AlphaServer SC ES45/1000) / T (32 CPUs, CSx)
  T 32-CPU AlphaServer ES45 / T 32-CPU CS9 Pentium 4 Xeon / 2000 + Myrinet 2k
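As a minimal sketch of the metric above, the ratio of elapsed times can be expressed as a percentage in a few lines of Python. The function name and the timings in the usage comment are hypothetical. The sketch assumes both times refer to the same benchmark run on 32 CPUs.

```python
def percent_of_reference(t_reference, t_cluster):
    """Return 100 * T_reference / T_cluster.

    t_reference -- elapsed time on 32 CPUs of the high-end reference system
    t_cluster   -- elapsed time on 32 CPUs of the commodity system (CSx)

    A value below 100 means the commodity system is slower than the
    reference; a value above 100 means it is faster.
    """
    if t_reference <= 0 or t_cluster <= 0:
        raise ValueError("elapsed times must be positive")
    return 100.0 * t_reference / t_cluster


# Hypothetical usage: the reference takes 330 s and the cluster 500 s,
# so percent_of_reference(330.0, 500.0) gives the cluster's share of
# the reference performance for that benchmark.
```

Because the metric is a ratio of times at a fixed CPU count, it compares delivered performance only at that count and says nothing about how either system scales beyond it.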
Commodity Comparisons with High-end Systems (2002)
% of 32 CPUs of Compaq AlphaServer SC ES45/1000
Cluster CS9 Pentium4/2000 Xeon + Myrinet
CS9 - 66% of ES45/1000
[Bar chart, one bar per code: ANGUS (288), ANGUS (144), CPMD, CASTEP, CHARMM, DLPOLY_2 (Macromolecular), DLPOLY-2 (Ionic), GAMESS-UK + CHARMM, NWChem (DFT Jfit), GAMESS-UK (HF Forces), GAMESS-UK (MP2 gradient), GAMESS-UK (DFT gradient), GAMESS-UK (DFT-Jfit), GAMESS-UK (DFT), GAMESS-UK (SCF).]
Performance Metrics: 2004
Attempt to quantify delivered performance from current Commodity-based systems against high-end platforms: SGI Altix 3700/Itanium2 1.3 GHz (CSAR service) and IBM p690+ (HPCx service), i.e.
Performance Metric (% 32-node SGI Altix 3700/Itanium2 1.3 GHz)
T (32 CPUs, SGI Altix 3700/Itanium2 1.3 GHz) / T (32 CPUs, CSx)
  T 32-CPU SGI Altix 3700 / T 32-CPU CS20 Opteron 248 / 2200 + Myrinet 2k
  T 32-CPU SGI Altix 3700 / T 32-CPU CS18 Pentium 4 Xeon / 2800 + Gbit Ethernet
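The comparison slides quote one headline percentage per cluster (for example "CS18 - 49% of Altix"), which summarises the per-application percentages. The slides do not state how that summary is formed. The Python sketch below assumes a plain arithmetic mean, and the function name and sample values are hypothetical.

```python
from statistics import mean


def summarise_percentages(percentages):
    """Summarise per-application percentages of the reference system.

    percentages -- mapping of application name to its percentage of the
                   reference performance at 32 CPUs

    Returns the arithmetic mean together with the spread, since one
    average can hide large differences between individual codes.
    """
    if not percentages:
        raise ValueError("at least one application is required")
    values = list(percentages.values())
    return {"mean": mean(values), "min": min(values), "max": max(values)}


if __name__ == "__main__":
    # Hypothetical per-application percentages, not values from the slides.
    sample = {"code A": 35.0, "code B": 52.0, "code C": 60.0}
    print(summarise_percentages(sample))
```

Reporting the minimum and maximum alongside the mean shows whether a cluster is uniformly slower than the reference or competitive on some codes and poor on others.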
DL_POLY V2: High-end and Commodity-based Systems
Bench 4: NaCl; 27,000 ions, Ewald, 75 time steps, Cutoff=24Å
[Bar chart: Performance against Number of CPUs (16 and 32). Systems: CS12 P4/2400 + GbitE; CS10 P4/2666 + Myrinet; CS18 P4/2800 + GbitE; CS18 P4/2800 + Myrinet; CS19 Opteron246/2.0 + IB; CS19 Opteron246/2.0 + SCI; CS20 Opteron248/2.2 + Myrinet; CS16 Itanium2/1300 + Myrinet; SGI Altix 3700/Itanium2 1300. Annotations: "30%, 68%"; T3E128 = 2.13.]
Commodity Comparisons with High-end Systems
% of 32 CPUs of SGI Altix 3700 / 1300
Cluster CS18 Pentium4/2800 Xeon + Gbit Ethernet
CS18 - 49% of Altix
[Bar chart, one bar per code: PCHAN, ANGUS, DLMULTI, DL_POLY-3 (Macromolecular), DLPOLY-3 (Ionic), DLPOLY_2 (Macromolecular), DLPOLY-2 (Ionic), NWChem (DFT Jfit), GAMESS-UK (DFT Forces), GAMESS-UK (HF Forces), GAMESS-UK (MP2 gradient), GAMESS-UK (DFT gradient), GAMESS-UK (DFT-Jfit), GAMESS-UK (DFT), GAMESS-UK (SCF).]
Commodity Comparisons with High-end Systems
% of 32 CPUs of SGI Altix 3700 / 1300
Cluster CS20 AMD Opteron 248/2.2 + Myrinet
CS20 - 100% of Altix
[Bar chart, one bar per code: PCHAN, ANGUS, CPMD, DLMULTI, DL_POLY-3 (Macromolecular), DLPOLY-3 (Ionic), DLPOLY_2 (Macromolecular), DLPOLY-2 (Ionic), NWChem (DFT Jfit), GAMESS-UK (DFT Forces), GAMESS-UK (HF Forces), GAMESS-UK (MP2 gradient), GAMESS-UK (DFT gradient), GAMESS-UK (DFT-Jfit), GAMESS-UK (DFT), GAMESS-UK (SCF).]
SUMMARY
• Processor Performance Overview: single-processor performance, Performance Metrics
• SPECfp and Computational Chemistry Benchmark (serial)
• Matrix’89 and Matrix’97 kernels (MMO, diagonalisation), “Application kernels” (SCF, MD, QMC and JACOBI + STREAM), Application packages (GAMESS-UK, DL_POLY)
  Increasing importance of Rate-based benchmarks: SPECfp_rate and Chemistry Rate Benchmark
• Machine COSTS vs. Performance
  URLs: Powerpoint presentation and Paper: http://www.cse.clrc.ac.uk/disco/hw-perf.shtml
• Benchmarking & Evaluation of Commodity-based Systems
  The CS20 Opteron248/2200 cluster with Myrinet delivers on average 100% of the SGI Altix 3700/1300.
  Cost effectiveness of clusters with Gbit Ethernet: CS18, for example, delivers 49% of the SGI Altix 3700/1300.
Fortran Compilers - Performance
Performance relative to GNU g77 on Pentium 4 Xeon/2666
Compilers: g77 v0.5.26, pgf90 v4.0-2, ifc v7.1
[Bar chart, one group per benchmark: OVERALL, GAMESS-UK, STREAM, JACOBI, QMC Kernel, MD Kernel, SCF Kernel, Diag-97, QHQ-97, MMO-97, Whetstone-97.]
Rate Benchmark: Diagonalisation Component
[Bar chart: R-Diagonalisation for 1-CPU, 2-CPU and 4-CPU runs. Machines: AMD MP2000+/1667, AMD Opteron 244/1800, AMD Opteron 848/2200, Pentium4 Xeon/3066, HP RX2600 Itanium2/1000-H, HP RX4640 Itanium2/1300-H, SGI Altix 3700 Itanium2/1300, Intel Tiger Itanium2/1500, HP RX5670 Itanium2/1500-H, Compaq Alpha ES45/1250, SUN Blade 2000/1056 Cu, SUN FireV880/900 Cu, SGI Origin3800/R14k-500, IBM p-series 690/1300, IBM p-series 690/1700. Annotation: Decline in Rate with 32-bit commodity solutions.]
Rate Benchmark: DGEMM Component
[Bar chart: R-DGEMM for 1-CPU, 2-CPU and 4-CPU runs. Machines: AMD MP2000+/1667, AMD Opteron 244/1800, AMD Opteron 848/2200, Pentium4 Xeon/3066, HP RX2600 Itanium2/1000-H, HP RX4640 Itanium2/1300-H, SGI Altix 3700 Itanium2/1300, Intel Tiger Itanium2/1500, HP RX5670 Itanium2/1500-H, Compaq Alpha ES45/1250, SUN Blade 2000/1056 Cu, SUN FireV880/900 Cu, SGI Origin3800/R14k-500, IBM p-series 690/1300, IBM p-series 690/1700. Annotation: Linear increase in Rate with number of Runs.]
Rate Benchmark: DL_POLY Component
[Bar chart: R-DL_POLY for 1-CPU, 2-CPU and 4-CPU runs. Machines: AMD MP2000+/1667, AMD Opteron 244/1800, AMD Opteron 848/2200, Pentium4 Xeon/3066, HP RX2600 Itanium2/1000-H, HP RX4640 Itanium2/1300-H, SGI Altix 3700 Itanium2/1300, Intel Tiger Itanium2/1500, HP RX5670 Itanium2/1500-H, Compaq Alpha ES45/1250, SUN Blade 2000/1056 Cu, SUN FireV880/900 Cu, SGI Origin3800/R14k-500, IBM p-series 690/1300, IBM p-series 690/1700. Annotation: Linear increase in Rate with number of Runs.]
Rate Benchmark: GAMESS-UK Component
[Bar chart: R-GAMESS-UK for 1-CPU, 2-CPU and 4-CPU runs. Machines: AMD MP2000+/1667, AMD Opteron 244/1800, AMD Opteron 848/2200, Pentium4 Xeon/3066, HP RX2600 Itanium2/1000-H, HP RX4640 Itanium2/1300-H, SGI Altix 3700 Itanium2/1300, Intel Tiger Itanium2/1500, HP RX5670 Itanium2/1500-H, Compaq Alpha ES45/1250, SUN Blade 2000/1056 Cu, SUN FireV880/900 Cu, SGI Origin3800/R14k-500, IBM p-series 690/1300, IBM p-series 690/1700. Annotation: General decline in Rate with number of Runs.]
Chemistry Rate Benchmark
[Bar chart: R-CHEM for 1-CPU, 2-CPU and 4-CPU runs. Machines: AMD MP2000+/1667, AMD Opteron 244/1800, AMD Opteron 848/2200, Pentium4 Xeon/3066, HP RX2600 Itanium2/1000-H, HP RX4640 Itanium2/1300-H, SGI Altix 3700 Itanium2/1300, Intel Tiger Itanium2/1500, HP RX5670 Itanium2/1500-H, Compaq Alpha ES45/1250, SUN Blade 2000/1056 Cu, SUN FireV880/900 Cu, SGI Origin3800/R14k-500, IBM p-series 690/1300, IBM p-series 690/1700.]
SPEC CPU 2000 - SPECfp2000_base
Values relative to HP RX5670 Itanium2/1.5GHz
[Bar chart, one bar per machine: Intel D925XCV2 (3.8 GHz P4), AMD MSI K8N Athlon64/2.6GHz, HP DL145 / Opteron 2.4 GHz, HP RX4640 Itanium2/1600-9M, HP RX5670 Itanium2/1500, HP RX5670 Itanium2/1300, HP AlphaServer GS1280 7/1300, Sun Fire V440 / US IIIi 1600, Sun Java Wrkst. (Opteron/2.4 GHz), Fujitsu PP900 GP64/1890, SGI Origin 3200/R14k-600, SGI Altix 3000 Itanium 2/1300, SGI Altix 3000 Itanium 2/1500, SGI Altix 3000 Bx2/1600-9M, HP 9000/RP3440-1000, IBM eServer p5 570/1900, IBM p-series 690/1700.]
The GAMESS-UK and GAMESS-UK-99 Benchmark
Performance relative to the HP RX5670 Itanium2 / 1.5 GHz
[Bar chart with two series, GAMESS-UK and GAMESS-UK-99. Machines: Pentium 4 Xeon/3066, Pentium 4 Xeon (EM64T)/3200, Pentium 4 Xeon (EM64T)/3600, AMD Opteron 850/2400, AMD Opteron 248/2200, SGI Altix3700 Itanium2/1500, HP RX4640 Itanium2/1300-H, HP RX2600 Itanium2/1500-L, HP RX5670 Itanium2/1500-H, HP RX1620 Itanium2/1600-H, Intel Tiger Itanium2/1500, Compaq Alpha ES45/1250, SUN Blade 2000/1056 Cu, SGI Origin3800/R14k-600, HP 9000/RP3440-1000, IBM eServer p5 570/1900, IBM p-series 690/1700.]