Towards Reproducible Scientific Benchmarking in Geophysics: the CREDO Toolkit

Patrick Sunter, Wendy Sharples, Steve Quenette, Wendy Mason, Jerico Revote
9 Nov 2010, eResearch Australasia Workshop
VPAC, Monash University

Computational Software Development Conversations

- Geophysicist: "How does the new Patch Recovery algorithm compare to the results in the Van Keken paper?"
- Or: "I'm sure the multigrid version of the code was running faster with a checkout from a couple of months back than it does with the latest one?"
- These conversations matter to both researchers and software engineers, and to the computational science we do as a group.


Why focus on benchmarks?

- What are benchmarks? Target problems with some sort of assessment metric (numerical, scientific, computational).
- In domains such as geophysics, with complex physics and arguments about modelling, benchmarking is a key part of the field.
- Benchmarks as 'boundary objects' (Blackstock et al., 2009).


Benchmarking, Testing & Reproducibility   Reproducibility motivations

(Fomel, Hennenfent, 2007):

  scientific integrity   Robust (& productive) software

development   Technology transfer

  Specifically re benchmarks of

a HPC App like Underworld:

  Database of what code could do,

Within the world of science, computation is now rightly seen as a third vertex of a triangle complementing experiment and theory. However, as it is now often practiced, one can make a good case that computing is the last refuge of the scientific scoundrel. – Randall LeVeque

when, where, how fast, with different algorithms   Records of what model ‘suites’ I ran and what pre and postprocessing necessary to get a certain result 4


Start from where you are: StGermain to Underworld

- 'Declarative' XML models combined with C code and plugins, inc. gLucifer visualisation
- Custom scripts to configure sets of runs and postprocessing (Gnuplot, VTK)
- Unit tests, plus some system tests in Perl scripts, hooked up to the Bitten C.I. system
- (See Quenette et al., 2007, 'Explaining StGermain', and Moresi et al., 2007, 'Computational approaches to studying non-linear dynamics of the crust and mantle')


(Example model files: BaseApps/RayleighTaylor.xml, BaseApps/SlabSubduction.xml)

Inspirations & early design decisions   Madagascar “an open-source software package for multidimensional data analysis and reproducible computational experiments.”   Pyre: “an extensible, object-oriented framework for specifying and staging complex, multi-physics simulations.”   Kepler: “designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines.”

  … re-use vs developing custom apps is always a trade-off   In this case light-weight integration of benchmarking, testing

and analysis was the goal …


… evolving CREDO workflow/architecture


System components

- Components:
  - Python package CREDO (a mix of OO and scripted code), with Matplotlib usage
  - CREDO scripts distributed in the code itself, connected to the SCons configuration tool, and thus to the Bitten C.I. system (report plugins)
  - Fairly 'agile' methodology of small targets, frequent releases, user feedback
  - Documentation, inc. A.P.I. & examples, generated using Sphinx
- Why Python? (vs. the 'swiss army chainsaw' of Perl):
  - Relates well to the goals of CREDO: explorative scripting, with (relatively) easy refactoring into more reproducible form
  - Scientific software ecosystem (MayaVI, VTK, SciPy, NumPy, Mystic)


What it looks like: system tests
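A minimal sketch of such a system-test script, assuming CREDO-style names (SysTestSuite, addStdTest, SysTestRunner, ReferenceTest) rather than quoting the exact API:

    # Sketch only: class and method names are assumed CREDO-style names,
    # not verified against the actual credo.systest API.
    from credo.systest import SysTestSuite, SysTestRunner, ReferenceTest

    suite = SysTestSuite("Underworld", "RegressionTests")

    # Each standard test wraps an Underworld XML model plus a pass/fail metric.
    suite.addStdTest(ReferenceTest, "RayleighTaylor.xml", runSteps=20)
    suite.addStdTest(ReferenceTest, "SlabSubduction.xml", runSteps=20)

    # The runner executes the models and records the results as XML.
    runner = SysTestRunner()
    runner.runSuite(suite)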


System tests (2)

- Declarative simple objects (e.g. "SysTestSuite") and lists also support reproducibility over time
- Known test suites can be modified post-hoc to try different solver options
- Basic knowledge of Python O.O. structures assists with this
- E.g. testing the standard Underworld system-test regression suite using the multigrid solver (see the sketch below)

(Diagram: Underworld TestSuite (30 SysTests), with multigrid mods applied: XMLs, solver opts, model parameters)
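A sketch of that post-hoc modification, continuing from the suite and runner in the earlier sketch; the attribute names (sysTests, mRun, modelInputFiles, solverOpts, paramOverrides) are assumptions standing in for the real ones:

    # Sketch only: attribute names below are assumptions used to illustrate
    # post-hoc modification of a known suite, not the verified CREDO API.
    mgXML = "MultigridSetup.xml"        # hypothetical extra XML enabling multigrid
    mgSolverOpts = "uzawa-mg.opt"       # hypothetical solver options file

    for sysTest in suite.sysTests:      # 'suite' and 'runner' from the earlier sketch
        sysTest.mRun.modelInputFiles.append(mgXML)    # extra XMLs
        sysTest.mRun.solverOpts = mgSolverOpts        # solver opts
        sysTest.mRun.paramOverrides["mgLevels"] = 3   # model parameters

    runner.runSuite(suite)              # re-run the regression suite with multigrid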

Towards Reproducible Scientific Benchmarking in Geophysics: the CREDO Toolkit

Benchmark: Rayleigh-Taylor convection


Benchmark: thermal convection

(Example input file: ThermalConvBenchmarkDim.xml)

Model results:
- Images
- Data

CREDO records (XML), sketched below:
- Test suite ran
- Models ran
- Test results
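To give a feel for what such an XML record could capture, here is a small sketch that writes one with Python's standard library; the element names and values are illustrative, not CREDO's actual record format:

    # Sketch only: element names and values are illustrative, not CREDO's schema.
    import xml.etree.ElementTree as ET

    record = ET.Element("sysTestSuiteRecord", name="ThermalConvBenchmark")

    model = ET.SubElement(record, "modelRun", inputFile="ThermalConvBenchmarkDim.xml")
    ET.SubElement(model, "param", name="Ra", value="1.0e4")

    result = ET.SubElement(record, "testResult", status="Pass")
    ET.SubElement(result, "metric", name="avgNusselt", value="4.9")  # illustrative value

    # Keeping this record alongside the images and data leaves a reproducible
    # trace of which suite ran, which models ran, and what the results were.
    ET.ElementTree(record).write("credo-record.xml")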


Design & development reflections   Trade-off between multiple package

goals not always easy   Creative tension b/w reproducibility & ‘explorability’   (Procedural script vs declarative objects)

  Fine-line regarding what to

expose & directly link to UW’s XML   Python’s lightweight package system facilitates separating ‘CREDO’ code from user modules   Documentation integrated closely into development 13


CREDO Future & eResearch bigger picture Toolkit improvement

Connections

  Current work

  Data & metadata

  Currently working on better

performance result tracking   Reporting of benchmarks (XLST, ReportLab)   Re-factor to work for computational

apps not based on StGermain   Computational cost, esp. multi-runs:   Grid (Grisu)   Parametrics -> Inversion (talk to

others)   Online file storage (e.g. ARCS) helpful also

  Output, XML records of both

models and system test / reproducibility   (AuScope grid) Python (w/ Numeric etc) should work well for data input conversion   Publishing & collaboration:

  Benchmark & analysis scripts

shareable, and potentially supports publications

  Reassess workflow tools (eg

Kepler) relation development


Thank you

https://www.mcc.monash.edu.au/trac/AuScopeEngineering/wiki/CREDO
https://www.mcc.monash.edu.au/credo-doc/

Questions?
