Reproducibility of Geant4 simulations
Alberto Ribon CERN PH/SFT
PH/SFT group meeting, 08 October 2012
Outline
● Introduction: motivation & definition
● Strategy: tests, tools, and methods
● Results: Geant4 9.6, lessons
● Conclusion & Outlook
Acknowledgement
Several people in Geant4 have contributed to this work, in particular: Witek, Vladimir I., Gunter, Gabriele, and John
Motivation
● Simulations use pseudorandom numbers, not truly random ones
  – A finite and reproducible sequence of numbers that approximates the properties of truly random numbers, completely determined by a relatively small set of initial values, called the generator state
● The deterministic nature of pseudorandom numbers makes it possible to reproduce the same simulation over and over
  – This is necessary but not sufficient
  – There is a continuum of cases, i.e. partial reproducibility
● We should strive for fully reproducible simulations
  – It guarantees the possibility of debugging rare problems
  – It is one of the “quality metrics” expected of simulation codes
  – It allows parallel and sequential simulations to be compared
Definition
● Simulations usually consist of “runs” made of a set of “events”, for example:
  – A fully simulated p-p collision in an LHC experiment
  – A single hadron (e.g. a 20 GeV π⁻) impinging on a simplified calorimeter (e.g. Fe-Sci), inducing a hadronic shower
  – A single hadron-nucleus interaction
● We can distinguish between two types of reproducibility
  – Weak or run-level reproducibility: use the random state at the beginning of a run
  – Strong or event-level reproducibility: use the random state at the beginning of an event
● The goal is to achieve strong reproducibility (which of course implies weak reproducibility)
Geant4 9.5 (Dec 2011)
● Ideally, in any simulation project, one should start checking for reproducibility immediately; in practice, all the effort goes first into development and then into validation...
● In Geant4, we were able to reproduce most of the crashes and problems found in our tests or reported by users. But not always. Cases of non-reproducibility were also reported, from time to time, by the LHC experiments
● In January this year, we finally decided to tackle the problem
  – Geant4 is mature and used heavily in huge productions
  – Growing interest in parallel detector simulations
● When we started with Geant4 9.5, we found that:
  – Weak reproducibility always holds
  – Strong reproducibility is sometimes violated, more frequently:
    - in our recommended physics lists (e.g. FTFP_BERT) than in LHEP;
    - in recent versions of Geant4 than in older ones
Tests of reproducibility (1/2)
● Two types of strong-reproducibility tests
  – Process level: a single hadron-nucleus interaction
  – Physics-list level: a hadronic shower in a simplified calorimeter
● Similar algorithm:
  1. Run A: N events; for each event, save the status at the beginning, status_evt_i, and print a summary-number d_i, with i = 1, ... N
  for i = 1, N {
    2. Run B: 1 event, starting with the status status_evt_i, and print a summary-number d'_i
    3. Compare the 2 events: if d_i (@RunA) == d'_i (@RunB) then reproducibility is ok; else it is violated!
  }
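The test loop above can be sketched in Python as follows. This is a toy stand-in, not Geant4 code: `simulate_event` is a hypothetical event function, and the "status" is assumed to be the generator state returned by `random.Random.getstate()`.

```python
import random

def simulate_event(rng):
    """Hypothetical toy event: a summary-number built from random draws."""
    n_draws = rng.randint(5, 50)          # the number of draws varies per event
    return sum(rng.random() for _ in range(n_draws))

def strong_reproducibility_test(simulate, n_events, seed):
    """Return the indices of events whose replay differs from the original."""
    # Run A: N events, saving the generator state at the start of each one.
    rng = random.Random(seed)
    states, summaries = [], []
    for _ in range(n_events):
        states.append(rng.getstate())
        summaries.append(simulate(rng))

    # Run B: replay each event alone, starting from its saved state.
    violations = []
    for i, state in enumerate(states):
        rng_b = random.Random()
        rng_b.setstate(state)
        if simulate(rng_b) != summaries[i]:   # exact comparison, no tolerance
            violations.append(i)
    return violations

violations = strong_reproducibility_test(simulate_event, n_events=100, seed=2012)
print("violations:", violations)
```

The toy event depends only on the generator, so no violation is expected here. In the real test each Run B is a separate job, so anything cached during earlier events starts empty.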
Tests of reproducibility (2/2)
● Summary-number for an event: a double-precision number
  – Process level: a number computed (arbitrarily) from the 4-momenta of the secondaries produced in the hadron-nucleus interaction
    - A kind of “hash” of the 4-momenta of the final state, e.g.:
      for i = 1, K { result += i * (px + py + pz - Ekin); }
  – Physics-list level: simply a random number
    - This is enough because of the very large number of random drawings needed for simulating a hadronic shower
● In some cases, either for the summary-number at the process level, or for the printing of variables when debugging reproducibility (see next pages), it is necessary to print the exact hexadecimal value of the memory content of the double in order to detect tiny differences between the two runs, which would otherwise be cut away by the conversion to decimal...
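A minimal Python sketch of the two ideas on this slide: a weighted-sum "hash" of the final state, and hexadecimal printing of a double. The tuple layout `(px, py, pz, ekin)` and the numerical values are illustrative assumptions; `float.hex()` is used here as a stand-in for printing the memory content of a C++ double.

```python
def summary_number(secondaries):
    """Order-sensitive 'hash' of a final state; each secondary is (px, py, pz, ekin)."""
    result = 0.0
    for i, (px, py, pz, ekin) in enumerate(secondaries, start=1):
        result += i * (px + py + pz - ekin)
    return result

# Two made-up kinetic energies (MeV) that differ only in the 12th decimal place.
ekin_a = 35.199286783513
ekin_b = 35.199286783512

print(f"{ekin_a:.6f}  {ekin_b:.6f}")    # identical at this decimal precision
print(ekin_a.hex(), ekin_b.hex())       # the hexadecimal form exposes the difference

# The summary-numbers of two otherwise identical final states also differ.
d_a = summary_number([(1.0, 2.0, 3.0, ekin_a)])
d_b = summary_number([(1.0, 2.0, 3.0, ekin_b)])
print(d_a == d_b)
```

Weighting each term by its index `i` makes the sum sensitive to the order of the secondaries, not only to their values.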
How to debug non-reproducibility?
● Suppose we have found a reproducibility violation, i.e.
  – RunA': >= 2 events; summary-number of the last event d_a
  – RunB: 1 event; summary-number d_b, with d_b ≠ d_a
● What do we do now to find the reproducibility problem?
● The debugger does not help
  – Most of the time, there is nothing “wrong” in the content of any variable we could inspect in either run
● We can print information about the two events
  – the last one in RunA', and the only one in RunB, which should be the same but are actually different
  and compare the output with tkdiff to find out when and how they start to differ...
Printing information in Geant4
● At the physics-list level, Geant4 provides 6 verbosity levels (that can be set via run-time commands) to get increasingly detailed information
  – Minimum level: /tracking/verbose 1 to find out the Track ID
  – Middle level: /tracking/verbose 3 to find out the Step Number
  – Maximum level: /tracking/verbose 6 to find out the Physics Process
● Need a script to filter out irrelevant information, otherwise tkdiff would take hours to parse multi-MB files
● Once the process that causes the non-reproducibility is found (this can happen immediately if a process-level test detects it), we need to write some printing statements of key quantities in key places in the process source code
  – This is where we spend most of our time!
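The filtering step could look like the Python sketch below, which keeps only the relevant lines of two outputs and reports the first place where they differ. The log lines and the `keep` predicate are hypothetical; real Geant4 verbose output has a different layout, so the predicate would have to be adapted to it.

```python
from itertools import zip_longest

def first_divergence(lines_a, lines_b, keep=lambda line: True):
    """Return (index, line_a, line_b) of the first differing kept line, or None."""
    kept_a = [ln.rstrip("\n") for ln in lines_a if keep(ln)]
    kept_b = [ln.rstrip("\n") for ln in lines_b if keep(ln)]
    # zip_longest pads the shorter output with None, so a missing line counts
    # as a difference as well.
    for i, (a, b) in enumerate(zip_longest(kept_a, kept_b)):
        if a != b:
            return i, a, b
    return None

# Hypothetical log excerpts for the two events.
event_a = ["run header A", "track 1 step 1 Ekin=35.1", "track 1 step 2 Ekin=12.0"]
event_b = ["run header B", "track 1 step 1 Ekin=35.1", "track 1 step 2 Ekin=12.1"]

# Keep only lines that describe steps, dropping run-specific headers.
result = first_divergence(event_a, event_b, keep=lambda ln: ln.startswith("track"))
print(result)
```

Reporting only the first divergence is usually enough, since everything after it differs as a consequence.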
How does non-reproducibility arise?
● The two events (which violate reproducibility because they are not identical)
  – “EventA”: last event of RunA'
  – “EventB”: the only event of RunB
  coincide exactly up to a certain moment, when a physics process produces a set of final-state particles which are slightly different between the two events, e.g.
  – “EventA”: proton with Ekin = 35.199286783513... MeV
  – “EventB”: proton with Ekin = 35.199286783512... MeV
● Even a very tiny initial difference (e.g. μeV) implies different energy depositions, directions, positions, etc., which keep growing at each step, until the two particles end up in two different volumes and/or undergo different physics processes
  – The same sequence of random numbers is then used for sampling different quantities... : the rest of the two events becomes different!
Example of differences between the two events (1/2)
Example of differences between the two events (2/2)
Patterns of non-reproducibility (1/2)
Typically there is a computationally expensive quantity we need (e.g. a cross section), and to save CPU time we do the computation once, cache the value in a table, and then re-use it later on when needed instead of re-computing it
● Nothing wrong with caching per se
● But if the cache values are history-dependent (i.e. they depend on the previous events), then reproducibility can be violated
● In some cases, there are genuine mistakes in the cache values
  – Either in the way they are computed
  – Or in the way they are retrieved (e.g. wrong index manipulation)
  and in these cases not only is reproducibility violated, but even the simulation results can be wrong
● In most cases (at least in Geant4), however, the cache values become history-dependent because of an otherwise harmless approximation, which produces statistically meaningful results
Patterns of non-reproducibility (2/2)
Examples of these “dangerous” approximations
● cache_vector[ Z ] = function(Z, A)
  where the first isotope encountered for a given Z is used to compute the function, and then re-used for all other isotopes of the same element
  - use instead the lightest, or heaviest, or average isotope
● cache_vector[ bin_i ] = function( momentum )
  where the momentum of the first particle is used to compute the function, instead of the center of the momentum bin...
● cache_vector[ bin_i ] = h1_h2_cross_section( Ekin_i )
  where the center of the kinetic energy bin is assigned to the first particle to compute the h1 (projectile) – h2 (target) cross section, and then re-used also for the cross section of h2 (projectile) – h1 (target)
  - use instead the lightest, or heaviest of the two hadrons
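The second pattern can be illustrated with a toy Python sketch. `expensive_function`, `BIN_WIDTH`, and both cache classes are hypothetical names chosen for illustration; they do not correspond to Geant4 classes.

```python
def expensive_function(momentum):
    """Hypothetical stand-in for a costly calculation such as a cross section."""
    return 1.0 / (1.0 + momentum)

BIN_WIDTH = 1.0

class FirstParticleCache:
    """History-dependent: each bin stores the value for the first momentum seen."""
    def __init__(self):
        self.table = {}
    def get(self, momentum):
        bin_i = int(momentum // BIN_WIDTH)
        if bin_i not in self.table:
            self.table[bin_i] = expensive_function(momentum)
        return self.table[bin_i]

class BinCenterCache(FirstParticleCache):
    """History-independent: each bin stores the value at the bin centre."""
    def get(self, momentum):
        bin_i = int(momentum // BIN_WIDTH)
        if bin_i not in self.table:
            self.table[bin_i] = expensive_function((bin_i + 0.5) * BIN_WIDTH)
        return self.table[bin_i]

def value_for_last(cache_class, momenta):
    """Feed the momenta to a fresh cache and return the value for the last one."""
    cache = cache_class()
    return [cache.get(p) for p in momenta][-1]

# The same particle (p = 2.7), once after an earlier particle and once alone.
same_first = value_for_last(FirstParticleCache, [2.1, 2.7]) == value_for_last(FirstParticleCache, [2.7])
same_center = value_for_last(BinCenterCache, [2.1, 2.7]) == value_for_last(BinCenterCache, [2.7])
print(same_first, same_center)
```

With the first-particle cache the value returned for p = 2.7 depends on whether p = 2.1 came earlier, which is exactly what differs between an event inside a long run and the same event replayed alone. Both caches give statistically reasonable values; only the bin-centre one is independent of history.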
Non-reproducibility fixes for G4 9.6
Up to now 11 non-reproducibility fixes have been made. Of these, 3 are needed after G4 9.5 (i.e. after January). Now Geant4 is reproducible, with two exceptions: CHIPS (deprecated in G4 9.6) and neutron HP (very slow).
● (Decoupled) Chips quasi-elastic
● Starkov elastic final state model for π± > 1 GeV
● Ion ionization corrections
● Fission in Bertini intra-nuclear cascade
● Bertini intra-nuclear cascade, when hyperons are involved (it turned out to be a problem in G4PhaseSpaceDecayChannel)
● (Decoupled) Chips hadron-nucleon inelastic cross sections, used by FTFP (2 different problems)
● Multiple scattering (3 different problems)
● Binary intra-nuclear cascade
Conclusion & Outlook
● Several reproducibility violations in Geant4 have been fixed.
● The coming release Geant4 9.6 will be reproducible
  – Exceptions are: CHIPS (deprecated) and neutron HP
● Lessons:
  – Be careful when caching
  – Keep monitoring the reproducibility
    - otherwise it is quickly lost through new developments!
● Reproducibility tests of Geant4 are now run regularly
  – Every night (CDash Nightly)
  – Several times per day, for each proposed tag (CDash Continuous)
  – Monthly, at each reference tag: more tests and with high statistics