Towards Reproducible Scientific Benchmarking in Geophysics: the CREDO Toolkit
Patrick Sunter, Wendy Sharples, Steve Quenette, Wendy Mason, Jerico Revote
9 Nov 2010, eResearch Australasia Workshop
VPAC, Monash University
Computational Software Development Conversations
Geophysicist: "How does the new Patch Recovery algorithm compare to the results in the Van Keken paper?"
Or: "I'm sure the multigrid version of the code ran faster a couple of months back than it does with the latest checkout?"
These conversations matter to both researchers and software engineers, and to the computational science we do as a group.
Why a Benchmarks Focus?
What are benchmarks? Target problems with some sort of assessment metric (numerical, scientific, computational).
In domains such as geophysics, with complex physics and ongoing arguments about modelling choices, benchmarking is a key part of the field.
Benchmarks as 'boundary objects' (Blackstock et al., 2009).
Benchmarking, Testing & Reproducibility
Reproducibility motivations (Fomel & Hennenfent, 2007):
- scientific integrity
- robust (& productive) software development
- technology transfer
Specifically, re benchmarks of an HPC app like Underworld:
- a database of what the code could do, when, where, how fast, and with which algorithms
- records of what model 'suites' were run, and what pre- and post-processing was necessary to get a certain result
"Within the world of science, computation is now rightly seen as a third vertex of a triangle complementing experiment and theory. However, as it is now often practiced, one can make a good case that computing is the last refuge of the scientific scoundrel." – Randall LeVeque
Start from where you are: StGermain to Underworld
- 'Declarative' XML models, with C code and plugins, inc. gLucifer visualisation
- Custom scripts to configure sets of runs and postprocessing (Gnuplot, VTK) (see the sketch below)
- Unit tests, plus some system tests in Perl scripts, hooked up to the Bitten C.I. system
(See Quenette et al., 2007, 'Explaining StGermain', and Moresi et al., 2007, 'Computational approaches to studying non-linear dynamics of the crust and mantle'.)
(Example base apps: BaseApps/RayleighTaylor.xml, BaseApps/SlabSubduction.xml)
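To make the "custom scripts" point above concrete, a hedged sketch of the kind of ad hoc driver such a setup typically relies on follows; the XML file name, processor count and command-line flags here are placeholders for illustration, not the group's actual scripts or Underworld's exact options.

    import subprocess

    # Hypothetical illustration only: file names and flags below are
    # placeholders, not the group's actual scripts or Underworld's exact
    # command-line options.
    resolutions = [32, 64, 128]

    for res in resolutions:
        outdir = "output/res%d" % res
        # Launch one Underworld run per resolution in the sweep.
        subprocess.check_call(
            ["mpirun", "-np", "4", "Underworld", "RayleighTaylor.xml",
             "--elementResI=%d" % res, "--elementResJ=%d" % res,
             "--outputPath=%s" % outdir])
        # Each run's output is then handed to separate, hand-written
        # Gnuplot / VTK post-processing scripts, with the run parameters
        # recorded only in this script.

The pain point this illustrates is that the record of "what was run, how" lives only in throwaway scripts like this, which is what CREDO aims to capture systematically.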
Inspirations & early design decisions
- Madagascar: "an open-source software package for multidimensional data analysis and reproducible computational experiments."
- Pyre: "an extensible, object-oriented framework for specifying and staging complex, multi-physics simulations."
- Kepler: "designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines."
… re-use vs developing custom apps is always a trade-off. In this case, light-weight integration of benchmarking, testing and analysis was the goal …
… evolving CREDO workflow/architecture
System components
- Python package CREDO (mix of O.O. and scripted), Matplotlib usage
- CREDO scripts distributed in the code itself, connected to the SCons configuration tool, and thus to the Bitten C.I. system (report plugins)
- Fairly 'agile' methodology of small targets, frequent releases, user feedback
- Documentation, inc. A.P.I. & examples, generated using Sphinx
Why Python? (vs the 'swiss army chainsaw' of Perl):
- Relates well to the goals of CREDO: explorative scripting, with (relatively) easy refactoring into more reproducible form
- Scientific software ecosystem (MayaVI, VTK, SciPy, NumPy, Mystic)
What it looks like: system tests
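As a rough sketch of what such a system-test script can look like: only SysTestSuite is named explicitly in this deck, so SysTestRunner, addStdTest and the test classes below are assumptions about the CREDO API and may differ in detail.

    # Hedged sketch of a CREDO-style system-test script. Only SysTestSuite is
    # named in this deck; SysTestRunner, addStdTest, AnalyticTest and
    # ReferenceTest are assumed names and may not match the API exactly.
    from credo.systest import *

    def suite():
        suite = SysTestSuite("Underworld", "RegressionTests")
        # Analytic test: compare computed fields against a known analytic solution
        suite.addStdTest(AnalyticTest, ["CosineHillRotateBC.xml"])
        # Reference test: compare against previously saved 'expected' fields
        suite.addStdTest(ReferenceTest, ["RayleighTaylorBenchmark.xml"], nproc=2)
        return suite

    if __name__ == "__main__":
        SysTestRunner().runSuite(suite())

The point of the style is that the suite is a declarative object that can be imported, inspected and re-run, rather than a one-off shell script.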
System tests (2)
- Declarative simple objects (e.g. "SysTestSuite") and lists also support reproducibility over time
- Known test suites can have post-hoc modification to try different solver options, and basic knowledge of Python O.O. structures assists this
- E.g. testing the standard Underworld system-test regression suite (30 SysTests) with the multigrid solver: multigrid mods applied via XMLs, solver opts and model parameters (see the sketch below)
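As a self-contained illustration of this post-hoc modification pattern (deliberately not the real CREDO classes, just plain Python stand-ins; the file names are placeholders):

    # Stand-in classes for illustration only; the real CREDO "SysTestSuite" and
    # its tests are richer, but the post-hoc modification pattern is the same.
    class SysTest(object):
        def __init__(self, modelXML):
            self.inputFiles = [modelXML]   # model XML(s) making up the test
            self.solverOpts = None         # extra solver options file, if any

    class SysTestSuite(object):
        def __init__(self, name, sysTests):
            self.name = name
            self.sysTests = list(sysTests)

    # The standard regression suite, built declaratively elsewhere.
    suite = SysTestSuite("Underworld-Regression",
                         [SysTest("RayleighTaylorBenchmark.xml"),
                          SysTest("ThermalConvectionBenchmark.xml")])

    # Post-hoc modification: re-run the same known tests with multigrid mods
    # (an extra XML plus a solver-options file; names here are placeholders).
    for test in suite.sysTests:
        test.inputFiles.append("Multigrid.xml")
        test.solverOpts = "mg-options.opt"

    for test in suite.sysTests:
        print(test.inputFiles, test.solverOpts)

Because the suite is just data plus light behaviour, a few lines of ordinary Python are enough to retarget an entire 30-test regression run at a different solver configuration.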
Benchmark: Rayleigh-Taylor convection
Benchmark: thermal convection
ThermalConvBenchmarkDim.xml
Model results: images, data
CREDO records (XML): test suite ran, models ran, test results
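As a minimal sketch of the kind of XML record meant here: the element and attribute names are assumed for illustration, not CREDO's actual schema, and the numbers are placeholders rather than real benchmark results.

    # Minimal sketch of a CREDO-style benchmark record; element and attribute
    # names are assumed for illustration, not CREDO's actual XML schema, and
    # the numbers are placeholders, not real benchmark results.
    import xml.etree.ElementTree as ET

    record = ET.Element("sysTestSuiteRun", name="ThermalConvectionBenchmark")
    model = ET.SubElement(record, "modelRun",
                          inputXML="ThermalConvBenchmarkDim.xml", nproc="4")
    ET.SubElement(model, "testResult", metric="NusseltNumber",
                  value="1.23", expected="1.25", tolerance="0.05",
                  status="Pass")
    ET.ElementTree(record).write("ThermalConvBenchmark-record.xml",
                                 xml_declaration=True, encoding="utf-8")

Keeping this record alongside the images and data means a later run of the same suite can be compared directly against what was run, on what model, with what result.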
Design & development reflections
- Trade-off between multiple package goals not always easy
- Creative tension between reproducibility & 'explorability' (procedural scripts vs declarative objects)
- Fine line regarding how much of UW's XML to expose & directly link to
- Python's lightweight package system facilitates separating 'CREDO' code from user modules
- Documentation integrated closely into development
CREDO Future & eResearch bigger picture

Toolkit improvement (current work):
- Currently working on better performance result tracking
- Reporting of benchmarks (XSLT, ReportLab)
- Re-factor to work for computational apps not based on StGermain

Toolkit improvement (data & metadata):
- Output, XML records of both models and system tests / reproducibility (AuScope grid)
- Python (w/ Numeric etc.) should work well for data input conversion

Connections (current work):
- Computational cost, esp. multi-runs: Grid (Grisu)
- Parametrics -> Inversion (talk to others)
- Online file storage (e.g. ARCS) also helpful

Connections (data & metadata):
- Publishing & collaboration: benchmark & analysis scripts shareable, and potentially support publications
- Reassess workflow tools' (e.g. Kepler) relation to development
Thank you
https://www.mcc.monash.edu.au/trac/AuScopeEngineering/wiki/CREDO
https://www.mcc.monash.edu.au/credo-doc/
Questions?