Performance of Various Computers in Computational ...

4 downloads 0 Views 565KB Size Report
Dec 9, 2004 - Pentium 4 Xeon / 3066. AMD MP2400+ / 2000. MEW14 (2003) - Summary PI relative to the IBM p-series. 690/pwr4 1.3 GHz. The MATRIX-97 ...
Computational Science and Engineering Department

Daresbury Laboratory

PERFORMANCE of VARIOUS COMPUTERS in COMPUTATIONAL CHEMISTRY 15th Daresbury Machine Evaluation Workshop CCLRC Daresbury Laboratory Martyn F. Guest Computational Science and Engineering Department [email protected] Acknowledgements: M. Ehrig (HP/Compaq), Streamline, Workstations UK, ClusterVision, Compusys Computational Chemistry Benchmarks

1

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

OUTLINE z

Processor Performance Overview „ Single-processor performance, Performance Metrics „ SPEC (Standard Performance Evaluation Corporation) z SPEC 95 and SPEC CPU 2000

z

Computational Chemistry Benchmark (serial) - SPECfp ? „ „ „

z

Matrix and application “kernels” Application packages (GAMESS-UK, DL_POLY) Comparison involves 220+ computers (vector supercomputers, workstations, PCs and MPP nodes)

Recently extended to include “RATE” benchmarks „ URLs: Powerpoint presentation and Paper: http://www.cse.clrc.ac.uk/disco/hw-perf.shtml



Benchmarking of Parallel Commodity Systems

Computational Chemistry Benchmarks

2

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

MEW14 (2003) - Summary PI relative to the IBM p-series 690/pwr4 1.3 GHz The MATRIX-97, Chemistry AMD MP2400+ / 2000

69 108

Pentium 4 Xeon / 3066

Kernels, GAMESS-UK and DL_POLY Benchmarks 140

SGI Altix3700 Itanium2/1300

143

HP RX4640 Itanium2/1300-H

162

HP RX2600 Itanium2/1500-L HP RX5670 Itanium2/1500-H

171

Intel Tiger Itanium2/1500

159

AMD Opteron 848/ 2200

128

AMD Opteron 244/ 1800

106 96

Compaq Alpha ES45/1250 85

Compaq Marvel EV7 /1000 55

SUN Blade 2000 / 1056 Cu SUN FireV880 / 900 Cu

47

SGI Origin3800/R14k-600

54

HP PA-9000/RP7410-875

79

IBM p-series 690/1700

133 100

IBM p-series 690/1300 0

20

Computational Chemistry Benchmarks

40

60

80

3

100

120

140

160

180

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

MACHINES UNDER EVALUATION Machine AMD Opteron

Processor 850 (2.4 GHz), 248 (2.2 GHz), 848 (2.2 GHz), 246 (2.0 GHz), 244 (1.8 GHz), 842 (1.6 GHz)

Intel IA32 Intel x86-64 (EM64T)

P4 Xeon 3.06 GHz, 3.2 GHz Nocona 3.2 GHz, 3.4 GHz, 3.6GHz

SUN Blade 2000 / 1056 Cu SUN Fire V880 / 900 Cu

UltraSPARC-3 / 1056 MHz UltraSPARC-3 / 900 MHz

HP 9000 / RP3440-4 HP 9000 / RP7410

PA8800 / 1000MHz PA8700+ / 875MHz

HP RX4640 (4-way) HP RX2600 (2-way) HP RX5670 (4-way) HP RX1620 (2-way)

Itanium 2 / 1300 MHz (3 MB L3) Itanium 2 / 1500 MHz (6 MB L3) Itanium 2 / 1500 MHz (6 MB L3) Itanium 2 / 1600 MHz (3 MB L3)

Intel Tiger (4-way) Intel Tiger (4-way)

Itanium 2 / 1000 MHz (3 MB L3) Itanium 2 / 1500 MHz (6 MB L3)

Machine HP/Compaq ES45 HP/Compaq ES45 Compaq Marvel SGI Altix 3700 SGI Origin3800/R14k SGI Origin300/R14k SGI O2/R12k-SC

Itanium 2 / 1.5 & 1.3 GHz R14000/R14010 600 MHz R14000/R14010 500 MHz R12000/R12010 270 MHz

IBM p-Series 630 IBM p-Series 690 IBM p-Series 690 IBM p5 570/1.9

RS/6000 /POWER4 1.0GHz RS/6000 /POWER4 1.3GHz RS/6000 /POWER4 1.7GHz RS/6000 /POWER5 1.9GHz

Vector Supercomputers NEC SX-6i, SX-5, SX-4 Cray Y-MP/J90-10, FUJITSU VPP/300, Cray YMP C98/4256 MPP Nodes Cray T3E/1200 IBM SP / Power3

Computational Chemistry Benchmarks

4

Processor AXP A21264C / 1000 MHz AXP A21264C / 1250 MHz EV7 1000 MHz

AXP EV56 600 MHz RS/6000 WH2 - 375 MHz

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

SPEC Benchmarks z

Compute intensive categories „ „ „

z

integer versus floating point conservative versus aggressive compilation speed versus throughput

Composite Metrics - SPEC CPU 2000

NOT: • Graphics • Network • I/O

http://www.specbench.org

SPEED

THROUGHPUT

SPECint2000 SPECfp2000

SPECint_rate2000 SPECfp_rate2000

SPECint_base2000 SPECfp_base2000

SPECint_rate_base2000 SPECfp_rate_base2000

Aggressive Conservative

z z z z

SPECratio - T(measured system) / T (reference) Reference = 300 MHz Ultra 5/10 (100) SPECfp2000 - geometric mean of 14 ratios, one for each benchmark SPECint2000 - geometric mean of 12 ratios, one for each benchmark

Computational Chemistry Benchmarks

5

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

SPEC CPU 2000 - Floating point Benchmark Suite (SPECfp2000) Benchmark

Language

168.wupwise 171.swim 172.mgrid 173.applu 177.mesa 178.galgel 179.art 183.equake 187.facerec 188.ammp 189.lucas 191.fma3d 200.sixtrack 301.apsi

F77 F77 F77 F77 C F90 C C F90 C F90 F90 F77 F77

Description Physics: Quantum chromodynamics Shallow water modelling Multigrid solver: 3D potential field Partial differential equations 3D graphics library Computational fluid dynamics Image recognition / neural networks Seismic wave propagation simulation Image processing: Face recognition Computational chemistry Number theory / primality testing Finite-element crash simulation Nuclear physics accelerator design Metereology: Pollutant distribution

Reference: 300 MHz Ultra 5/10 = 100 Computational Chemistry Benchmarks

6

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

SPEC CPU 2000 - SPECfp2000

Values relative to HP RX5670 Itanium2/1.5GHz 87

Intel D925XCV2 (3.8 GHz P4)

89

AMD MSI K8N Athlon64/2.6GHz HP DL145 / Opteron 2.4 GHz HP RX4640 Itanium2/1600-9M HP RX5670 Itanium2/1500

2712 : HP RX4640 Itanium2/1600 2647 : SGI Altix Bx2/1600 2702 : IBM p5 570 / 1.9 GHz

81 129 100 86

HP RX5670 Itanium2/1300

2108

80

HP AlphaServer GS1280 7/1300 64

Sun Fire V440 / US IIIi 1600

85

Sun Java Wrkst. (Opteron/2.4 GHz)

86

Fujitsu PP900 GP64/1890 25

SGI Origin 3200/R14k-600

88

SGI Altix 3000 Itanium 2/1300

102

SGI Altix 3000 Itanium 2/1500

126

SGI Altix 3000 Bx2/1600-9M 47

HP 9000/RP3440-1000

128

IBM eServer p5 570/1900 81

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

50

7

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

SPEC CPU 2000 - SPECint2000

Values relative to HP RX5670 Itanium2/1.5GHz Intel D925XCV2 (3.8 GHz P4)

127 132

AMD MSI K8N Athlon64/2.6GHz HP DL145 / Opteron 2.4 GHz

119

HP RX4640 Itanium2/1600-9M

121

HP RX5670 Itanium2/1500

100

HP RX5670 Itanium2/1300

81 76

HP AlphaServer GS1280 7/1300 64

Sun Fire V440 / US IIIi 1600 Sun Java Wrkst. (Opteron/2.4 GHz)

121

Fujitsu PP900 GP64/1890

103

SGI Origin 3200/R14k-600

38 78

SGI Altix 3000 Itanium 2/1300

95

SGI Altix 3000 Itanium 2/1500

110

SGI Altix 3000 Bx2/1600-9M HP 9000/RP3440-1000

40

IBM eServer p5 570/1900

111

IBM p-series 690/1700

85

0

Computational Chemistry Benchmarks

50

8

100

150

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Memory System Performance Limitations Why applications with limited memory reuse perform inefficiently today

Year of Introduction (Cray) 1988

Goal: understand application performance to memory reuse and other factors Computational Chemistry Benchmarks

1994

2000

C ray

C90 Y-MP T90 J90

Intel

EL-90

Alpha SGI MIPS

10.00%

Sun

SV1

Micro p ro cesso rs

• STREAMS ADD: Computes A + B for long vectors A and B (historical data available) • New microprocessor generations “reset” performance to at most 6% of peak • Performance degrades to 1% 3% of peak as clock speed increases within a generation

M e a s u re d % p e a k (S T R E A M S A D D )

100.00%

Cray

1.00% 10

100

1000

10000

Clock Speed MHz

9

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

IBM p-series 690Turbo:Multi-chip Module (MCM) Four POWER4 chips (8 processors) on an MCM, with two associated memory slots

GX Bus

GX Bus

M E M O R Y S L O T

Mem Ctrl

L3

Shared L2

Shared L2

Distributed switch

Distributed switch

Distributed switch

Distributed switch

Mem Ctrl

L3

L3

Mem Ctrl

Shared L2

Shared L2

L3

Mem Ctrl

M E M O R Y S L O T

4 GX Bus links for external L3 cache shared across all processors connections Computational Chemistry Benchmarks

10

9th December 2004

GX Bus

GX Bus

Computational Science and Engineering Department

Daresbury Laboratory

SPEC CPU 2000 - SPECfp2000

Units of SPECfp as a function of Peak Performance Intel D925XCV2 (3.8 GHz P4) AMD MSI K8N Athlon64/2.6GHz HP DL145 / Opteron 2.4 GHz HP RX4640 Itanium2/1600-9M HP RX5670 Itanium2/1500 HP RX5670 Itanium2/1300 HP AlphaServer GS1280 7/1300 Sun Fire V440 / US IIIi 1600 Sun Java Wrkst. (Opteron/2.4 GHz) Fujitsu PP900 GP64/1890 SGI Origin 3200/R14k-600 SGI Altix 3000 Itanium 2/1300 SGI Altix 3000 Itanium 2/1500 SGI Altix 3000 Bx2/1600-9M HP 9000/RP3440-1000 IBM eServer p5 570/1900 IBM p-series 690/1700

0

0.2

0.4

0.6

SPECfp2000 / CPU Peak Performance Computational Chemistry Benchmarks

11

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

SPEC CPU 2000 - SPECfp vs SPECfp_rate (32 CPUs) Values relative to IBM 690 Turbo 1.7 GHz

HP AlphaServer GS320/1224 Compaq Alpha GS1280 7/1300

SPECfp SPECfp_rate

Fujitsu PP1500 GP64/1890 Sun Fire 15K / 1.2 GHz SGI Origin3800/R14k-600 SGI Altix 3000 1.5 GHz Itanium2 SGI Altix Bx2 1.6 GHz Itanium2 Bull Novascale 1.5GHz/Itanium2

Itanium Systems

HP Superdome 1.5 GHz Itanium2 HP Superdome/PA8700-750 IBM eServer p5 590/1650

pwr4

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

0.5

12

1

1.5

2

pwr5 2.5

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Computational Chemistry Benchmark Suite z

Benchmark suite developed to incorporate: „ „ „

z

Matrix “kernels” (MATRIX-89 and MATRIX-97) Application “kernels” Application packages (e.g. GAMESS-UK, DL_POLY)

Implemented on Supercomputers, servers, workstations, PCs and parallel machines „ Matrix Operations / Matrix multiplication and matrix diagonalisation „ Computational Chemistry Kernels - four typical application kernels (direct-SCF, MD, QMC and Jacobi eigen solver) „ STREAM (memory bandwidth) „ Quantum Chemistry Calculations - twelve typical applications, including SCF, direct-SCF, CASSCF, MCSCF, direct-CI and MRD-CI, MP2, 2nd derivatives (GAMESS-UK-89 and GAMESS-UK-99) „ Molecular Dynamics Calculations - six typical simulations

Computational Chemistry Benchmarks

13

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Fortran Compilers - Performance Performance relative to GNU g77 on Opteron 246 / 2.0 GHz OVERALL

113.2

133.1

S TREAM JACOBI QMC Ke rne l MD Ke rne l S CF Ke rne l

G77: gcc V 3.2.2 (SuSE Linux) pgf90: v5.2-1 Pathscale EKO (v1.4)

Diag -97 QHQ-97 MMO-97 Whe ts to ne -97 0

50

Computational Chemistry Benchmarks

100

14

150

200

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

COMPUTATIONAL CHEMISTRY BENCHMARKS I. Matrix Operations MATRIX OPERATIONS (MATRIX-89 and MATRIX-97) z SPARSE Matrix Multiply BenchMark „ MMO operation is central to the efficient operation of modern QC codes. In this benchmark a series of MMOs (R = A X B) are performed involving matrices of increasing order: – MATRIX-89: 10, 20, 30, … , 100 (B is sparse) – MATRIX-97: 50, 100, 150, … , 500 (B is sparse) z

Memory bandwidth

Diagonalisation Benchmark „ Based on diagonalising a series of real symmetric matrices. Measures the performance of 8 routines from mathematical libraries and QC codes: – MATRIX-89: 10, 20, 30, … , 100 – MATRIX-97: 50, 100, 150, 200, 250, 300

z

Q†HQ Benchmark „ Designed to extend MMO benchmark by allowing for the use of library routines e.g. BLAS. Uses both a scalar and vector algorithm: – MATRIX-89: 10, 20, 30, … , 150 – MATRIX-97: 20, 40, 60, … , 300

Computational Chemistry Benchmarks

15

-lblas 9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Matrix-97: SPARSE MMO Benchmark. Performance relative to the HP RX5670 Itanium2/1.5GHz 29

Pentium 4 Xeon / 3066

45

Pentium 4 Xeon (EM64T)/ 3200 Pentium 4 Xeon (EM64T)/ 3600

57

AMD Opteron 850/ 2400

100 91

AMD Opteron 248/ 2200 SGI Altix3700 Itanium2/1500

86

HP RX4640 Itanium2/1300-H

86

HP RX2600 Itanium2/1500-L

NEC NECSX-6i SX-6i::204 204

77

HP RX5670 Itanium2/1500-H

100 176

HP RX1620 Itanium2/1600-H Intel Tiger Itanium2/1500

67

Compaq Alpha ES45/1250

35

SUN Blade 2000 / 1056 Cu

20

SGI SGIO2/R12k-270 O2/R12k-270: :33

28

SGI Origin3800/R14k-600 HP 9000/RP3440-1000

35 148

IBM eServer p5 570 / 1900 102

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

50

100

16

150

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Matrix-97: Q†HQ MMO Benchmark.

Performance relative to the HP RX5670 Itanium2/1.5GHz 49

Pentium 4 Xeon / 3066

74

Pentium 4 Xeon (EM64T)/ 3200

82

Pentium 4 Xeon (EM64T)/ 3600 56

AMD Opteron 850/ 2400

54

AMD Opteron 248/ 2200

103

SGI Altix3700 Itanium2/1500 89

HP RX4640 Itanium2/1300-H

106

HP RX2600 Itanium2/1500-L 100

HP RX5670 Itanium2/1500-H

110

HP RX1620 Itanium2/1600-H 103

Intel Tiger Itanium2/1500 39

Compaq Alpha ES45/1250 29

SUN Blade 2000 / 1056 Cu

NEC NECSX-6i SX-6i:: 145 145

21

SGI Origin3800/R14k-600

62

HP 9000/RP3440-1000

105

IBM eServer p5 570 / 1900 73

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

50

17

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Matrix-97: Diagonalisation Benchmark Performance relative to the HP RX5670 Itanium2/1.5GHz 77

Pentium 4 Xeon / 3066

111

Pentium 4 Xeon (EM64T)/ 3200

126

Pentium 4 Xeon (EM64T)/ 3600 110

AMD Opteron 850/ 2400 100

AMD Opteron 248/ 2200 89

SGI Altix3700 Itanium2/1500

87

HP RX4640 Itanium2/1300-H

84

HP RX2600 Itanium2/1500-L

100

HP RX5670 Itanium2/1500-H

113

HP RX1620 Itanium2/1600-H 83

Intel Tiger Itanium2/1500 73

Compaq Alpha ES45/1250

SGI SGIO2/R12k-270 O2/R12k-270: :14 14

33

SUN Blade 2000 / 1056 Cu

NEC NECSX-6i SX-6i::16 16

37

SGI Origin3800/R14k-600

82

HP 9000/RP3440-1000

118

IBM eServer p5 570 / 1900 108

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

50

18

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

The Matrix-97 Benchmarks. Performance relative to the HP RX5670 Itanium2/1.5GHz 52

Pentium 4 Xeon / 3066

77

Pentium 4 Xeon (EM64T)/ 3200

88

Pentium 4 Xeon (EM64T)/ 3600 71

AMD Opteron 850/ 2400 67

AMD Opteron 248/ 2200

93

SGI Altix3700 Itanium2/1500 87

HP RX4640 Itanium2/1300-H

89

HP RX2600 Itanium2/1500-L

100

HP RX5670 Itanium2/1500-H

133

HP RX1620 Itanium2/1600-H 85

Intel Tiger Itanium2/1500 49

Compaq Alpha ES45/1250 27

SUN Blade 2000 / 1056 Cu

NEC NECSX-6i SX-6i::122 122

29

SGI Origin3800/R14k-600 HP 9000/RP3440-1000

59 123

IBM eServer p5 570 / 1900 94

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

50

19

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

COMPUTATIONAL CHEMISTRY KERNELS Programs that are realistic models of actual chemical applications or algorithms „ Self consistent Field (SCF) z SCF code using distributed primitive 1s gaussian functions as a basis (thus emulating the use of s, p, functions); performs direct-SCF calculation on Be4 (60 functions).

„ Molecular Dynamics (MD) z This code bounces a few thousand argon atoms around in a box with periodic boundary conditions. LJ pair-wise interactions are used with integration of the Newtonian equations of motion.

„ Quantum Monte Carlo (QMC) z This code evaluates the energy of the simplest explicitly correlated electronic wavefunction for the He atom using a variational monte-carlo method without importance sampling.

„ Jacobi iterative linear equation solver (JACOBI) z JACOBI uses a naive jacobi iterative algorithm to solve a linear equation. All the time is spent in a large matrix-vector multiplication. Computational Chemistry Benchmarks

20

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Computational Chemistry Kernels - Direct-SCF. Performance relative to the HP RX5670 Itanium2/1.5GHz 45

Pentium 4 Xeon / 3066 38

Pentium 4 Xeon (EM64T)/ 3200 Pentium 4 Xeon (EM64T)/ 3600

43

AMD Opteron 850/ 2400

NEC NECSX-6i SX-6i::99

52 45

AMD Opteron 248/ 2200

59

SGI Altix3700 Itanium2/1500 HP RX4640 Itanium2/1300-H

51

HP RX2600 Itanium2/1500-L

44

HP RX5670 Itanium2/1500-H

100 113

HP RX1620 Itanium2/1600-H Intel Tiger Itanium2/1500

40

Compaq Alpha ES45/1250

+Oinlinebudget=1000000 F90 V2.8 158)

49

SUN Blade 2000 / 1056 Cu

26 27

SGI Origin3800/R14k-600 HP 9000/RP3440-1000

39

IBM eServer p5 570 / 1900

61

IBM p-series 690/1700

47

0

Computational Chemistry Benchmarks

50

21

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Chemistry Kernels - Molecular Dynamics. Performance relative to the HP RX5670 Itanium2/1.5GHz 76

Pentium 4 Xeon / 3066

79

Pentium 4 Xeon (EM64T)/ 3200

90

Pentium 4 Xeon (EM64T)/ 3600

89

AMD Opteron 850/ 2400 79

AMD Opteron 248/ 2200

152

SGI Altix3700 Itanium2/1500 87

HP RX4640 Itanium2/1300-H

133

HP RX2600 Itanium2/1500-L 100

HP RX5670 Itanium2/1500-H

112

HP RX1620 Itanium2/1600-H

123

Intel Tiger Itanium2/1500 69

Compaq Alpha ES45/1250 32

SUN Blade 2000 / 1056 Cu

NEC NECSX-6i SX-6i::48 48

38

SGI Origin3800/R14k-600

59

HP 9000/RP3440-1000

104

IBM eServer p5 570 / 1900 79

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

50

22

100

150

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Chemistry Kernels - Quantum Monte Carlo. Performance relative to the HP RX5670 Itanium2/1.5GHz 122

Pentium 4 Xeon / 3066 Pentium 4 Xeon (EM64T)/ 3200

153

Pentium 4 Xeon (EM64T)/ 3600

175

AMD Opteron 850/ 2400

178 163

AMD Opteron 248/ 2200 83

SGI Altix3700 Itanium2/1500 HP RX4640 Itanium2/1300-H

87

HP RX2600 Itanium2/1500-L

80

HP RX5670 Itanium2/1500-H

100 108

HP RX1620 Itanium2/1600-H Intel Tiger Itanium2/1500

79

Compaq Alpha ES45/1250

78

SUN Blade 2000 / 1056 Cu

NEC NECSX-6i SX-6i::15 15

51 46

SGI Origin3800/R14k-600 HP 9000/RP3440-1000

65

IBM eServer p5 570 / 1900

102

IBM p-series 690/1700

79

0

Computational Chemistry Benchmarks

50

100

23

150

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Chemistry Kernels - Jacobi Solver. Performance relative to the HP RX5670 Itanium2/1.5GHz 55

Pentium 4 Xeon / 3066 Pentium 4 Xeon (EM64T)/ 3200

67 70

Pentium 4 Xeon (EM64T)/ 3600 AMD Opteron 850/ 2400

48 43

AMD Opteron 248/ 2200

115

SGI Altix3700 Itanium2/1500 HP RX4640 Itanium2/1300-H

100

HP RX2600 Itanium2/1500-L

99

HP RX5670 Itanium2/1500-H

100 114

HP RX1620 Itanium2/1600-H Intel Tiger Itanium2/1500

93

Compaq Alpha ES45/1250

28

SUN Blade 2000 / 1056 Cu

NEC NECSX-6i SX-6i::533 533

24 12

SGI Origin3800/R14k-600 HP 9000/RP3440-1000

64

IBM eServer p5 570 / 1900

133 76

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

50

24

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

STREAM: Measured Sustainable Memory Bandwidth in HPC (TRIAD) http://www.cs.virginia.edu/stream

76

Pentium 4 Xeon / 3066 Pentium 4 Xeon (EM64T)/ 3200

92

Pentium 4 Xeon (EM64T)/ 3600

91 69

AMD Opteron 850/ 2400

80

AMD Opteron 248/ 2200 SGI Altix3700 Itanium2/1500

103

HP RX4640 Itanium2/1300-H

102 107

HP RX2600 Itanium2/1500-L 100

HP RX5670 Itanium2/1500-H

129

HP RX1620 Itanium2/1600-H 100

Intel Tiger Itanium2/1500

SGI SGIO2/R12k-270 O2/R12k-270: :2% 2%

54

Compaq Alpha ES45/1250 26

SUN Blade 2000 / 1056 Cu

NEC NECSX-6i SX-6i:: 712 712% %

17

SGI Origin3800/R14k-600

105

HP 9000/RP3440-1000

149

IBM eServer p5 570 / 1900 IBM p-series 690/1700

156

0

Computational Chemistry Benchmarks

50

100

25

150

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

STREAM: Sustainable Memory Bandwidth (TRIAD) per process Pentium 4 Xeon / 3066

4 CPU 2 CPU 1 CPU

Pentium 4 Xeon (EM64T)/ 3200 Pentium 4 Xeon (EM64T)/ 3600 AMD Opteron 248/ 2200 Intel Tiger Itanium2/1500 HP RX5670 Itanium2/1500 HP RX1620 Itanium2/1600-H SGI Altix3700 Itanium2/1500 Compaq Alpha ES45/1250 HP 9000/RP3440-1000 IBM eServer p5 590/1650 IBM p-series 690/1300 0

Computational Chemistry Benchmarks

1000

2000

3000

MBytes / Sec 26

4000

5000

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Computational Chemistry Kernels. Performance relative to the HP RX5670 Itanium2/1.5GHz 75

Pentium 4 Xeon / 3066

84

Pentium 4 Xeon (EM64T)/ 3200

94

Pentium 4 Xeon (EM64T)/ 3600

93

AMD Opteron 850/ 2400 86

AMD Opteron 248/ 2200

102

SGI Altix3700 Itanium2/1500 81

HP RX4640 Itanium2/1300-H

89

HP RX2600 Itanium2/1500-L

100

HP RX5670 Itanium2/1500-H

112

HP RX1620 Itanium2/1600-H 84

Intel Tiger Itanium2/1500 56

Compaq Alpha ES45/1250 33

SUN Blade 2000 / 1056 Cu

NEC SX-6i: 151%

30

SGI Origin3800/R14k-600

57

HP 9000/RP3440-1000

100

IBM eServer p5 570 / 1900 70

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

50

27

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

GAMESS-UK and DL_POLY Benchmarks 12 Typical QC Calculations

Six Typical Simulations Simulation

Module

Basis (GTOs) Species

1. SCF STO-3G (124) 2. SCF 6-31G (154) 3. ECP Geometry ECPDZ (70) 4. Direct-SCF 6-31G (82) 5. CAS-geometry TZVP (52) 6. MCSCF EXT1 (74) 7. Direct-CI EXT2 (64) 8. MRD-CI (26M) ECP (59) 9. MP2-geometry 6-31G* (70) 10. SCF 2nd derivs.6-31G (64) 11. MP2 2nd derivs.6-31G* (60) 12. Direct-MP2 DZP (76)

Computational Chemistry Benchmarks

Morphine C6H3(NO2)3 + Na7Mg Cytosine H2CO H2CO H2CO/H2+CO TiCl4 H3SiNCO C5H5N C4 C5H5N

28

Atoms Time steps

1. Na-K disilicate glass 1080 2. Metallic Al with 256 Sutton-Chen potential 3. Valinomycin in 1223 3837 water molecules 4. Dynamic shell model 768 water with 1024 sites 5. Dynamic shell model 768 MgCl2 with 1280 sites 6. Model membrane, 2 3148 membrane chains, 202 solute and 2746 solvent molecules

300 8000 100 1000 1000 1000

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

The GAMESS-UK Benchmark I. CPU Performance relative to the HP RX5670 Itanium2/1.5GHz 78

Pentium 4 Xeon / 3066

91

Pentium 4 Xeon (EM64T)/ 3200 Pentium 4 Xeon (EM64T)/ 3600

102

AMD Opteron 850/ 2400

118 110

AMD Opteron 248/ 2200 95

SGI Altix3700 Itanium2/1500 HP RX4640 Itanium2/1300-H

85

HP RX2600 Itanium2/1500-L

94

HP RX5670 Itanium2/1500-H

100 105

HP RX1620 Itanium2/1600-H Intel Tiger Itanium2/1500

93

Compaq Alpha ES45/1250

58

SUN Blade 2000 / 1056 Cu

40 29

SGI Origin3800/R14k-600 HP 9000/RP3440-1000

72

2.3 minutes 118

IBM eServer p5 570 / 1900 IBM p-series 690/1700

100

0

Computational Chemistry Benchmarks

50

29

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

SPECfp2000 and the GAMESS-UK Benchmark Pentium 4 Xeon / 3066

78

52

Pentium 4 Xeon (EM64T)/ 3200 Pentium 4 Xeon (EM64T)/ 3600

GAMESS-UK

91

65

102

72

AMD Opteron 850/ 2400

118

78

AMD Opteron 248/ 2200

110

81 95

SGI Altix3700 Itanium2/1500

Performance relative to the HP RX5670 Itanium2 / 1.5 GHz

HP RX4640 Itanium2/1300-H HP RX2600 Itanium2/1500-L HP RX5670 Itanium2/1500-H

100

85 86 94

101 100 100 105

HP RX1620 Itanium2/1600-H 93

Intel Tiger Itanium2/1500 58

Compaq Alpha ES45/1250

65

SGI Origin3800/R14k-600

23

97

SPECfp2000

40 39

SUN Blade 2000 / 1056 Cu

128

29

HP 9000/RP3440-1000

47

72 111

IBM eServer p5 570 / 1900 IBM p-series 690/1700

81

0

Computational Chemistry Benchmarks

50

30

128

100

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

The GAMESS-UK-99 Benchmark 10 Typical QC Calculations Module

Basis (GTOs)

1. Direct- SCF 2. SCF 3. DFT B3LYP 4. MCSCF 5. Direct-CI 6. CCSD(T) 7. MP2-geometry 8. SCF 2nd derivs. 9. MP2 2nd derivs. 10. Direct-MP2

Benchmark Time:

6-31G (227) 6-31G** (265) 6-311G* (167) CC-PVTZ (100) CC-PVTZ (100) TZV+2d+1f (144) TZVP (105) 6-311G** (144) TZVP(C2d) (104) 6-31G* (130)

Species Morphine C6H3(NO2)3 Cytosine H2CO H2CO/H2+CO C4 H3SiNCO C5H5N C4 Cytosine

30.8 minutes on IBM p-series 690/pwr4 1.3 GHz

Computational Chemistry Benchmarks

31

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

The GAMESS-UK-99 Benchmark Performance relative to the HP RX5670 Itanium2/1.5GHz 69

Pentium 4 Xeon / 3066

83

Pentium 4 Xeon (EM64T)/ 3200 Pentium 4 Xeon (EM64T)/ 3600

95

AMD Opteron 850/ 2400

95 91

AMD Opteron 248/ 2200

92

SGI Altix3700 Itanium2/1500 HP RX4640 Itanium2/1300-H

85

HP RX2600 Itanium2/1500-L

92

HP RX5670 Itanium2/1500-H

100 101

HP RX1620 Itanium2/1600-H Intel Tiger Itanium2/1500

86

Compaq Alpha ES45/1250

59

SUN Blade 2000 / 1056 Cu

33 28

SGI Origin3800/R14k-600 HP 9000/RP3440-1000

70

IBM eServer p5 570/1900

115

IBM p-series 690/1700

93

0

Computational Chemistry Benchmarks

50

32

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

The DL_POLY Benchmark. Performance relative to the HP RX5670 Itanium2/1.5GHz 55

Pentium 4 Xeon / 3066

72

Pentium 4 Xeon (EM64T)/ 3200 Pentium 4 Xeon (EM64T)/ 3600

81

AMD Opteron 850/ 2400

85 78

AMD Opteron 248/ 2200

106

SGI Altix3700 Itanium2/1500 HP RX4640 Itanium2/1300-H

86

HP RX2600 Itanium2/1500-L

108

HP RX5670 Itanium2/1500-H

100 124

HP RX1620 Itanium2/1600-H Intel Tiger Itanium2/1500

111

Compaq Alpha ES45/1250

61

SUN Blade 2000 / 1056 Cu

30 38

SGI Origin3800/R14k-600 HP 9000/RP3440-1000

52

IBM eServer p5 570 / 1900

72

IBM p-series 690/1700

63

0

Computational Chemistry Benchmarks

50

33

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

SPECfp2000 and the DL_POLY Benchmark Pentium 4 Xeon / 3066

52

55

Pentium 4 Xeon (EM64T)/ 3200

65

Pentium 4 Xeon (EM64T)/ 3600

72 81

72

AMD Opteron 850/ 2400

85

78 78

AMD Opteron 248/ 2200

DL_POLY

81

SGI Altix3700 Itanium2/1500

100

HP RX4640 Itanium2/1300-H

86 86

Performance relative to the HP RX5670 Itanium2 / 1.5 GHz

HP RX2600 Itanium2/1500-L HP RX5670 Itanium2/1500-H

106

101 100 100

108

124 128

HP RX1620 Itanium2/1600-H Intel Tiger Itanium2/1500

111

97 61

Compaq Alpha ES45/1250 30

SUN Blade 2000 / 1056 Cu SGI Origin3800/R14k-600

23

65

SPECfp2000

39 38

HP 9000/RP3440-1000

47

52 72

IBM eServer p5 570 / 1900 63

IBM p-series 690/1300 0

Computational Chemistry Benchmarks

50

34

128 81

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Summary Index relative to the HP RX5670 Itanium2/1.5GHz The MATRIX-97, Chemistry Kernels, GAMESS-UK and DLPOLY Benchmarks 50

Pentium 4 Xeon / 3066

75

Pentium 4 Xeon (EM64T)/ 3200

91

Pentium 4 Xeon (EM64T)/ 3600

92

AMD Opteron 850/ 2400 85

AMD Opteron 248/ 2200

99

SGI Altix3700 Itanium2/1500 85

HP RX4640 Itanium2/1300-H

95

HP RX2600 Itanium2/1500-L

100

HP RX5670 Itanium2/1500-H

119

HP RX1620 Itanium2/1600-H 93

Intel Tiger Itanium2/1500 56

Compaq Alpha ES45/1250 33

SUN Blade 2000 / 1056 Cu

32

SGI Origin3800/R14k-600

60

HP 9000/RP3440-1000

103

IBM eServer p5 570 / 1900 82

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

50

35

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Computational Chemistry Rate Benchmark z

“Rate” Benchmark suite recently developed to incorporate: „ Matrix “kernels” (MATRIX-97+) „ Application packages (e.g. GAMESS-UK-99, DL_POLY+)

z

Multi-component benchmark: „ „ „

z

Matrix Operations / Matrix multiplication and matrix diagonalisation Quantum Chemistry Calculations Molecular Dynamics Calculations

Rate Procedure: „ For each benchmark (i) run n instances at once and take elapsed time (last to finish - first to start). „ The rate for this benchmark is R_i = n X T_ref / T_i „ T_ref is the elapsed time on a reference system. (HP RX5670 Itanium2/1.5GHz scaled to a single processor (n=1) elapsed time of 100 units) „ Take “the geometric mean” of all the benchmarks (with the same n).

Computational Chemistry Benchmarks

36

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Components of Chemistry Rate Benchmark 1. MATRIX OPERATIONS (MATRIX-97+) z SPARSE Matrix Multiply BenchMark: Vector FORTRAN and DGEMM „ series of MMOs (R = A X B) are performed involving matrices of increasing order: – MATRIX-97+: 100, 200, … , 1200 (B is sparse) z

Memory bandwidth

Diagonalisation Benchmark „ performance of 7 routines from mathematical libraries and QC codes: – MATRIX-97+: 100, 200, …, 600

z

Q†HQ Benchmark „ involves use of library routines e.g. BLAS. Uses both a scalar and vector algorithm (DGEMM): – MATRIX-97+: 100, 200, … , 1000

-lblas

2. DLPOLY „ 5 simulations (increased no. of time steps) 3. GAMESS-UK „ 8 Calculations (GAMESS_UK-99 : Direct- SCF, DFT B3LYP, MCSCF, Direct-CI, MP2-geometry, SCF 2nd derivs., MP2 2nd derivs., Direct-MP2) Computational Chemistry Benchmarks

37

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Rate Benchmark: DL_POLY Component Pentium 4 Xeon / 3066

58

Pentium 4 Xeon (EM64T)/ 3200

55

75

R-DL_POLY

75

Pentium 4 Xeon (EM64T)/ 3600

85

85

AMD Opteron 248/ 2200

83

84

AMD Opteron 848/ 2200

85

84

169

106

SGI Altix3700 Itanium2/1500

106

86

HP RX4640 Itanium2/1300-H HP RX2600 Itanium2/1500-L

172 109

100

Intel Tiger Itanium2/1500

101

113

Compaq Alpha ES45/1250

63 32

31

SGI Origin3800/R14k-500

33

33

60

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

224

127

Linear increase in Rate with number of Runs

46

77

IBM eServer p5 570/1900

199

113 64

SUN Blade 2000 / 1056 Cu

212

87

110

HP RX5670 Itanium2/1500-H

1-CPU 2-CPU 4-CPU

77

151

59

117

100

200

38

300

400

500

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Rate Benchmark: DGEMM Component Pentium 4 Xeon / 3066 Pentium 4 Xeon (EM64T)/ 3200

R-DGEMM

58

78

Pentium 4 Xeon (EM64T)/ 3600

95

99

AMD Opteron 248/ 2200

64

AMD Opteron 848/ 2200

61

SGI Altix3700 Itanium2/1500

65 125

62

187

95

96

HP RX4640 Itanium2/1300-H

167

85

87

HP RX2600 Itanium2/1500-L

97

94

HP RX5670 Itanium2/1500-H

100

100

Intel Tiger Itanium2/1500

39

39

SUN Blade 2000 / 1056 Cu

31

195 170

86

88

Compaq Alpha ES45/1250

Linear increase in Rate with number of Runs

76

29

SGI Origin3800/R14k-500 16 16 32 IBM eServer p5 570/1900 78

0

Computational Chemistry Benchmarks

192

109

112

IBM p-series 690/1700

1-CPU 2-CPU 4-CPU

86

92

157

78

100

200

39

300

400

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Rate Benchmark: MMO FORTRAN Pentium 4 Xeon / 3066

45

Pentium 4 Xeon (EM64T)/ 3200

R-MMO

62

Pentium 4 Xeon (EM64T)/ 3600

70 50

AMD Opteron 248/ 2200

58

39

AMD Opteron 848/ 2200

21 13

SGI Altix3700 Itanium2/1500

78

45

HP RX4640 Itanium2/1300-H

80

36

HP RX2600 Itanium2/1500-L

49

135

40

HP RX5670 Itanium2/1500-H

100

Intel Tiger Itanium2/1500

32

Compaq Alpha ES45/1250

27

99

26

Impact of Memory Bandwidth

44 138

IBM eServer p5 570/1900

132

86

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

43

52 52

23 20

SGI Origin3800/R14k-500

52

31

52

SUN Blade 2000 / 1056 Cu

1-CPU 2-CPU 4-CPU

83 100

40

150 167

200

300

400

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Rate Benchmark: Diagonalisation Component Pentium 4 Xeon / 3066

29

Pentium 4 Xeon (EM64T)/ 3200

7

43

R-Diagonalisation

21

Pentium 4 Xeon (EM64T)/ 3600

49

AMD Opteron 248/ 2200

47

38

AMD Opteron 848/ 2200

44

46

SGI Altix3700 Itanium2/1500

70

85

141 82

55

HP RX2600 Itanium2/1500-L

59

Intel Tiger Itanium2/1500 Compaq Alpha ES45/1250

35

SUN Blade 2000 / 1056 Cu

24

100 58

36

199 118

69

Decline in Rate with 32-bit commodity solutions

24

19 18

37

70

IBM eServer p5 570/1900

51

IBM p-series 690/1700

162

54 100

HP RX5670 Itanium2/1500-H

0

Computational Chemistry Benchmarks

1-CPU 2-CPU 4-CPU

68

71

HP RX4640 Itanium2/1300-H

SGI Origin3800/R14k-500

24

69 49

111 100

100

41

200

300

400

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Rate Benchmark: GAMESS-UK Component Pentium 4 Xeon / 3066

67

Pentium 4 Xeon (EM64T)/ 3200

40

81

Pentium 4 Xeon (EM64T)/ 3600

91

AMD Opteron 248/ 2200

86

AMD Opteron 848/ 2200

88

SGI Altix3700 Itanium2/1500

87

HP RX4640 Itanium2/1300-H

84

HP RX2600 Itanium2/1500-L

89

HP RX5670 Itanium2/1500-H

100

53 36 59

Compaq Alpha ES45/1250 SUN Blade 2000 / 1056 Cu

33

103

75

114

87 96

178

70

62

53

98 103

General decline in Rate with number of Runs

32

SGI Origin3800/R14k-500 16 20 15 116

IBM eServer p5 570/1900

85

IBM p-series 690/1700 0

Computational Chemistry Benchmarks

1-CPU 2-CPU 4-CPU

85

83

80

Intel Tiger Itanium2/1500

R-GAMESS-UK

26

111 51

100

173

40

200

42

300

400

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Computational Chemistry Rate Metric z

Chemistry Rate Metric: „ For each component benchmark (i) run n instances at once and take elapsed time (last to finish - first to start). „ Component rate R_i = n X T_ref / T_i „ T_ref is the elapsed time on the HP RX5670 Itanium2/1.5 GHz scaled to a single CPU time of 100 units.

R_CHEM = 0.1 X R_DGEMM + 0.1 X R_Diagonalisation + 0.4 X R_DLPOLY + 0.4 X R_GAMESS-UK

Computational Chemistry Benchmarks

43

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Chemistry Rate Benchmark Pentium 4 Xeon / 3066

61

Pentium 4 Xeon (EM64T)/ 3200

44

76

Pentium 4 Xeon (EM64T)/ 3600

85

AMD Opteron 248/ 2200

79

AMD Opteron 848/ 2200

80

1-CPU 2-CPU 4-CPU

51 67 58 68

121

94

SGI Altix3700 Itanium2/1500

92

85

HP RX4640 Itanium2/1300-H

82

HP RX2600 Itanium2/1500-L

95

HP RX5670 Itanium2/1500-H

100

Intel Tiger Itanium2/1500

99

32 23

SGI Origin3800/R14k-500

54

0

Computational Chemistry Benchmarks

R-CHEM

30 25

71

IBM p-series 690/1700

158 106

31

95

IBM eServer p5 570/1900

190

87

58

SUN Blade 2000 / 1056 Cu

147 93

92

Compaq Alpha ES45/1250

159

93 57

160 88

100

44

200

300

400

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Evaluation and Benchmarking of Highend and Commodity-based Sytems Parallel Systems and Capacity Computing

Computational Chemistry Benchmarks

45

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

High-End Systems Evaluated z

Cray T3E/1200E ( … historical … ) „ 816 processor system at Manchester (CSAR), 600 Mz Alpha EV56 CPU, 256 MB

z

IBM pseries 690 and pseries 690+ (Daresbury) „ IBM p690 (8-way LPAR’d nodes, 1280 X 1.3 GHz CPUs with colony, HPCx) „ IBM p690+ (32-way nodes, 1600 X 1.7 GHz CPUs with HPS, HPS HPCx- Phase2)

z

Compaq AlphaServer SC „ 4-way ES40/667 A21264A (APAC) and 833 MHz SMP nodes (2 GB RAM); „ TCS1 system at PSC (750 4-way ES45 nodes - 3,000 EV68 CPUs - 4 GB memory per node, 8MB L2 cache), Quadrics interconnect (5 usec latency, 250 MB/sec B/W)

z

SGI Origin 3800 „ SARA (1000 CPUs) - Numalink with MIPS R14k/500 CPUs

z

SGI Altix 3700 „ Linux Cluster - Numalink with Itanium 2 1.3 GHz CPUs, 3MB L3 cache z CSAR (“newton” 512 CPUs) and SARA (“aster” - 416 CPUs - 7 nodes) „ ORNL (“ram” 256 CPUs with Itanium 2 1.5 GHz CPUs, 6MB L3 cache)

Computational Chemistry Benchmarks

46

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Commodity Systems (CSx) Prototype / Evaluation Hardware Systems

Location

CPUs

Configuration

CS1 CS2

Daresbury Daresbury

32 64

CS3 CS4 CS6

RAL Sara CLiC

16 32 528

CS7 CS8 CS9

Daresbury NCSA Bristol

64 320 96

PentiumIII / 450 MHz + FE (EPSRC) 24 X dual UP2000/EV67-667, QSNet Alpha/LINUX cluster, 8 X dual CS20/EV67-833 (“loki”) Athlon K7 850MHz + myrinet Athlon K7 1.2 GHz + FE PentiumIII / 800 MHz; fast ethernet (Chemnitzer Cluster) AMD K7/1000 MP + SCALI/SCI (“ukcp”) 160 dual IBM Itanium/800 + Myrinet 2k (“titan”) Pentium4 Xeon/2000 + Myrinet 2k (“dirac”)

10 16

10 CPUS, Pentium II/266 8 X dual Pentium III/933, SCALI

Prototype Systems CS0 CS5

Daresbury Daresbury

Computational Chemistry Benchmarks

47

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Commodity Systems (CSx) II. Systems

Location

CPUs

CS10

Hull

64

CS11 CS12

Workstations 32 Essex 48

CS13

256

CS14 CS15 CS16 CS17

White Rose, Leeds NCSA SDSC SDSC Daresbury

CS18 CS19

Bradford Daresbury

78 64

CS20

RAL

256

Computational Chemistry Benchmarks

1024 128 256 32

Configuration Pentium4 Xeon/2667 + Myrinet 2k (“eagle”), Streamline/SCORE Pentium4 Xeon/2667 + GbitEther, ScaMPI Pentium4 Xeon/2400 + GbitEther (“sstream1”), Streamline/SCORE Pentium4 Xeon/2200-2400 + M2k (“snowdon”), Streamline/SCORE Pentium III Xeon/1000 + M2k (“platinum”) Pentium III Xeon/ 800 + M2k (“meteor”) dual-Itanium2/1.3 GHz + M2k (“Teragrid”) Pentium4 Xeon/2667 + GbitEther (“ccp1”), Streamline/SCORE Pentium4 Xeon/2800 + M2k/GbitE (“grendel”) dual-Opteron/246 2.0 GHz nodes + Infiniband, Gbit and SCI (“scaliwag”) dual-Opteron/248 2.2 GHz nodes + Myrinet (“scarf”) 48

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Applications Performance Overview • •

Serial (SPEC, DL) & Communication Benchmarks Parallel Applications Performance

1. Computational Chemistry: Molecular Simulation & Electronic Structure 2. Computational Materials Science 3. Atomic & Molecular Physics 4. Computational Engineering 5. Environmental Modelling

z z z z

Capacity-based group solution Issues of Cost effectiveness On e.g. 128-256 CPU cluster, modal job size is ~ 32 CPUs Increasing trend to hierarchical clusters - Gbit network with HEC core (with e.g. myrinet)

Capability and Capacity Computing Commodity vs. Proprietary Solutions Computational Chemistry Benchmarks

49

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Performance Metrics: 1999-2001 Attempted to quantify delivered performance from the Commodity-based systems against MPP (CSAR Cray T3E/1200E) and ASCI-style SMP-node platforms (e.g. SGI Origin 3800) i.e.

Performance Metric (% 32-node Cray T3E) T (32-nodes Cray T3E/1200E) / T (32 CPUs ) CSx [ T 32-node T3E / T 32-node CS1 Pentium III/450 + FE ] T 32-node T3E / T 32-node CS6 Pentium III/800 + FE T 32-node T3E / T 32-CPU CS2 Alpha Linux Cluster + Quadrics

Performance Metrics: 2002 Performance Metric (% 32-node AlphaServer SC [PSC]) T (32-CPUs AlphaServer SC ES45/1000) / T (32 CPUs ) CSx T 32-CPU AlphaServer ES45 / T 32-CPU CS9 Pentium 4 Xeon / 2000 + Myrinet 2k Computational Chemistry Benchmarks

50

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Commodity Comparisons with High-end Systems % of 32 CPUs of Compaq AlphaServer SC ES45/1000 Cluster CS9 Pentium4/2000 Xeon + Myrinet

CS9 - 66% of ES45/1000

ANGUS (288) ANGUS (144) CPMD CASTEP CHARMM DLPOLY_2 (Macromolecular)

2002

DLPOLY-2 (Ionic) GAMESS-UK +CHARMM NWChem (DFT Jfit) GAMESS-UK (HF Forces) GAMESS-UK (MP2 gradient) GAMESS-UK (DFT gradient) GAMESS-UK (DFT-Jfit) GAMESS-UK (DFT) GAMESS-UK (SCF) 0

Computational Chemistry Benchmarks

20

40

51

60

80

100

120

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Performance Metrics: 2004 Attempt to quantify delivered performance from current Commodity-based systems against high-end platforms: SGI Altix 3700/Itanium2 1.3 GHz (CSAR service) and IBM p690+ (HPCx service) i.e.

Performance Metric (% 32-node SGI Altix 3700/Itanium 2 1.3 GHz)

T (32-CPUs SGI Altix 3700/Itanium2 1.3GHz) / T (32 CPUs ) CSx T 32-CPU SGI Altix 3700 / T 32-CPU CS20 Opteron 248 / 2800 + Myrinet 2k T 32-CPU SGI Altix 3700 / T 32-CPU CS18 Pentium 4 Xeon / 2800 + Gbit Ethernet

Computational Chemistry Benchmarks

52

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

DL_POLY V2: High-end and Commodity-based Systems Bench 4: NaCl; 27,000 ions, Ewald , 75 time steps, Cutoff=24Å

Performance 12 10 8

CS12 P4/2400 + GbitE CS10 P4/2666 + Myrinet CS18 P4/2800 + GbitE CS18 P4/2800 + Myrinet CS19 Opteron246/2.0 + IB CS19 Opteron246/2.0 + SCI CS20 Opteron246/2.2 + Myrinet CS16 Itanium2/1300 + Myrinet SGI Altix 3700/Itanium2 1300

9.1

6.7 6.9

7.4

5.6

6

30%,68%

5.0 4.0

4 2

10.9

3.3 1.7 1.8

3.6 3.1

3.3 3.5

T3E128 = 2.13

1.9

0 16

Number of CPUs

Computational Chemistry Benchmarks

53

32 9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Commodity Comparisons with High-end Systems % of 32 CPUs of SGI Altix 3700 / 1300

CS18 - 49% of Altix

Cluster CS18 Pentium4/2800 Xeon + Gbit Ethernet PCHAN ANGUS DLMULTI DL_POLY-3 (Macromolecular) DLPOLY-3 (Ionic) DLPOLY_2 (Macromolecular) DLPOLY-2 (Ionic) NWChem (DFT Jfit) GAMESS-UK (DFT Forces) GAMESS-UK (HF Forces) GAMESS-UK (MP2 gradient) GAMESS-UK (DFT gradient) GAMESS-UK (DFT-Jfit) GAMESS-UK (DFT) GAMESS-UK (SCF)

0

Computational Chemistry Benchmarks

20

40

54

60

80

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Commodity Comparisons with High-end Systems % of 32 CPUs of SGI Altix 3700 / 1300 Cluster CS20 AMD Opteron 248/2.2 + Myrinet

CS20 - 100% of Altix

PCHAN ANGUS CPMD DLMULTI DL_POLY-3 (Macromolecular) DLPOLY-3 (Ionic) DLPOLY_2 (Macromolecular) DLPOLY-2 (Ionic) NWChem (DFT Jfit) GAMESS-UK (DFT Forces) GAMESS-UK (HF Forces) GAMESS-UK (MP2 gradient) GAMESS-UK (DFT gradient) GAMESS-UK (DFT-Jfit) GAMESS-UK (DFT) GAMESS-UK (SCF)

0

20

Computational Chemistry Benchmarks

40

60

55

80

100

120

140

160

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

SUMMARY z

Processor Performance Overview „ Single-processor performance, Performance Metrics

z

SPECfp and Computational Chemistry Benchmark (serial) „

z

Matrix’89 and Matrix’97 kernels (MMO, diagonalisation), Application kernels” (SCF, MD, QMC and JACOBI + STREAM), Application packages (GAMESS-UK, DL_POLY)

Increasing importance of Rate-based benchmarks: „ SPECfp_rate and Chemistry Rate Benchmark

z

Machine COSTS vs. Performance: URLs: „ Powerpoint presentation and Paper: http://www.cse.clrc.ac.uk/disco/hw-perf.shtml

z

Benchmarking & Evaluation of Commodity-based Systems „ CS20 Opteron248/2200 cluster with Myrinet delivers on average 100% of SGI Altix 3700/1300. „ Cost effectiveness of the Clusters (e.g., CS18) with Gbit Ethernet - delivers 49% of the SGI Altix 3700/1300.

Computational Chemistry Benchmarks

56

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Fortran Compilers - Performance Performance relative to GNU g77 on Pentium 4 Xeon/2666 OVERALL

G77: v0.5.26 pgf90: v4.0-2 ifc: v7.1

GAMES S -UK S TREAM JACOBI QMC Ke rne l MD Ke rne l S CF Ke rne l Diag -97 QHQ-97 MMO-97 Whe ts to ne -97 0

50

Computational Chemistry Benchmarks

100

150

57

200

250

300

350

400

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

Rate Benchmark: Diagonalisation Component AMD MP2000+ / 1667 13 AMD Opteron 244 / 1800

6

33

AMD Opteron 848 / 2200

43

Pentium4 Xeon / 3066

R-Diagonalisation

29 30

37

7

29

HP RX2600 Itanium2/1000-H

64

HP RX4640 Itanium2/1300-H

64

85

SGI Altix 3700 Itanium2/1300

42

Intel Tiger Itanium2/1500

83 41 58

117

100

Compaq Alpha ES45/1250

35

SUN Blade 2000 / 1056 Cu

24

SUN FireV880 / 900 Cu

21

SGI Origin3800/R14k-500

199

69

Decline in Rate with 32-bit commodity solutions

24 21

42 37

50

IBM p-series 690/1300

100

35

19 18

IBM p-series 690/1700

161

80

59

HP RX5670 Itanium2/1500-H

1-CPU 2-CPU 4-CPU

37

0

Computational Chemistry Benchmarks

50 33

100 38

100

200

58

300

9th December 2004

400

Computational Science and Engineering Department

Daresbury Laboratory

Rate Benchmark: DGEMM Component AMD MP2000+ / 1667

20

23

AMD Opteron 244 / 1800 AMD Opteron 848 / 2200

81

80

Intel Tiger Itanium2/1500

162 86

87

HP RX5670 Itanium2/1500-H

38

39

SUN Blade 2000 / 1056 Cu

31

SUN FireV880 / 900 Cu

28

77

Linear increase in Rate with number of Runs

28

56

32

Computational Chemistry Benchmarks

110

59

60

0

156

78

78

IBM p-series 690/1300

195

29

16 16

IBM p-series 690/1700

170 99

100

Compaq Alpha ES45/1250

166

85

87

SGI Altix 3700 Itanium2/1300

1-CPU 2-CPU 4-CPU

66

67

HP RX4640 Itanium2/1300-H

105 58

78

HP RX2600 Itanium2/1000-H

SGI Origin3800/R14k-500

58

61

Pentium4 Xeon / 3066

R-DGEMM

47

48

50

100

150

59

200

250

300

350

9th December 2004

400

Computational Science and Engineering Department

Daresbury Laboratory

Rate Benchmark: DL_POLY Component AMD MP2000+ / 1667

35

33

AMD Opteron 244 / 1800

68

AMD Opteron 848 / 2200

80

Pentium4 Xeon / 3066

80

58

HP RX2600 Itanium2/1000-H

R-DL_POLY

67

161

55

64

64

HP RX4640 Itanium2/1300-H

86

SGI Altix 3700 Itanium2/1300

90

Intel Tiger Itanium2/1500

86

173

89

177

113

HP RX5670 Itanium2/1500-H

113

100

Compaq Alpha ES45/1250 32

SUN FireV880 / 900 Cu

27

SGI Origin3800/R14k-500

64

27

127

26

45

0

Computational Chemistry Benchmarks

Linear increase in Rate with number of Runs

54 46

60

IBM p-series 690/1300

200

31

40

IBM p-series 690/1700

224

101

64

SUN Blade 2000 / 1056 Cu

1-CPU 2-CPU 4-CPU

60 45

117 90

100

200

60

300

400

9th December 2004

500

Computational Science and Engineering Department

Daresbury Laboratory

Rate Benchmark: GAMESS-UK Component AMD MP2000+ / 1667

37

AMD Opteron 244 / 1800

53

AMD Opteron 848 / 2200

R-GAMESS-UK

34 76

Pentium4 Xeon / 3066

66

67

HP RX2600 Itanium2/1000-H

59

84

SGI Altix 3700 Itanium2/1300

75

73

Intel Tiger Itanium2/1500

138

70

97

100

Compaq Alpha ES45/1250

62

96 53

SUN Blade 2000 / 1056 Cu

33

SUN FireV880 / 900 Cu

28

26

SGI Origin3800/R14k-500

30

6 16

IBM p-series 690/1700

67

0

179 102

32

General decline in Rate with number of Runs

54

85

IBM p-series 690/1300

113

71

80

HP RX5670 Itanium2/1500-H

1-CPU 2-CPU 4-CPU

19

68

HP RX4640 Itanium2/1300-H

103

50 40

50

Computational Chemistry Benchmarks

40 51

100

150

61

200

250

300

350

9th December 2004

400

Computational Science and Engineering Department

Daresbury Laboratory

Chemistry Rate Benchmark AMD MP2000+ / 1667

19

31

AMD Opteron 244 / 1800

49

57

43

58

Pentium4 Xeon / 3066

63

66

HP RX2600 Itanium2/1000-H HP RX4640 Itanium2/1300-H

149

76

77

SUN Blade 2000 / 1056 Cu

31

SUN FireV880 / 900 Cu

27

26

SGI Origin3800/R14k-500

32

18

30

34

0

Computational Chemistry Benchmarks

94

57 72

43

52

IBM p-series 690/1300

R-CHEM

52

68

IBM p-series 690/1700

106

54

57

Compaq Alpha ES45/1250

192

98

100

HP RX5670 Itanium2/1500-H

165

89

93

Intel Tiger Itanium2/1500

152

82

85

SGI Altix 3700 Itanium2/1300

1-CPU 2-CPU 4-CPU

120

67

71

AMD Opteron 848 / 2200

50

100

150

62

200

250

300

350

400

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

SPEC CPU 2000 - SPECfp2000_base Values relative to HP RX5670 Itanium2/1.5GHz 87

Intel D925XCV2 (3.8 GHz P4) 83

AMD MSI K8N Athlon64/2.6GHz HP DL145 / Opteron 2.4 GHz

73

HP RX4640 Itanium2/1600-9M

129 100

HP RX5670 Itanium2/1500 86

HP RX5670 Itanium2/1300 HP AlphaServer GS1280 7/1300

61

Sun Fire V440 / US IIIi 1600

57

Sun Java Wrkst. (Opteron/2.4 GHz)

78 72

Fujitsu PP900 GP64/1890 SGI Origin 3200/R14k-600

24

SGI Altix 3000 Itanium 2/1300

88

SGI Altix 3000 Itanium 2/1500

101 126

SGI Altix 3000 Bx2/1600-9M HP 9000/RP3440-1000

41 122

IBM eServer p5 570/1900 IBM p-series 690/1700

76

0

Computational Chemistry Benchmarks

50

63

100

9th December 2004

Computational Science and Engineering Department

Daresbury Laboratory

The GAMESS-UK and GAMESS-UK-99 Benchmark Pentium 4 Xeon / 3066

69

Pentium 4 Xeon (EM64T)/ 3200

78 91

83

Pentium 4 Xeon (EM64T)/ 3600

95

AMD Opteron 850/ 2400

95

AMD Opteron 248/ 2200

102 118 110

91

SGI Altix3700 Itanium2/1500

Performance relative to the HP RX5670 Itanium2 / 1.5 GHz

HP RX4640 Itanium2/1300-H HP RX2600 Itanium2/1500-L

92

95

85 85 94 92 100 100

HP RX5670 Itanium2/1500-H

105 101

HP RX1620 Itanium2/1600-H Intel Tiger Itanium2/1500

86 58 59

Compaq Alpha ES45/1250 SUN Blade 2000 / 1056 Cu

33

GAMESS-UK GAMESS-UK-99

40

29 28

SGI Origin3800/R14k-600

93

72 70

HP 9000/RP3440-1000

118 115

IBM eServer p5 570/1900 IBM p-series 690/1700

93

0

Computational Chemistry Benchmarks

50

64

100

100

9th December 2004