Reproducibility of Geant4 simulations - CERN Indico

Reproducibility of Geant4 simulations

Alberto Ribon CERN PH/SFT

PH/SFT group meeting, 08 October 2012

Outline

● Introduction: motivation & definition
● Strategy: tests, tools, and methods
● Results: Geant4 9.6, lessons
● Conclusion & Outlook

Acknowledgement

Several people in Geant4 have contributed to this work, in particular: Witek, Vladimir I., Gunter, Gabriele, and John

Motivation

● Simulations use pseudorandom numbers, not truly random ones
  ● A finite and reproducible sequence of numbers which approximates the properties of truly random numbers, completely determined by a relatively small set of initial values, called the generator state
● The deterministic nature of pseudorandom numbers makes it possible to reproduce the same simulation over and over
  ● This is necessary but not sufficient
  ● There is a continuum of cases, i.e. partial reproducibility
● We should strive for fully reproducible simulations
  ● It guarantees the possibility of debugging rare problems
  ● It is one of the “quality metrics” expected for simulation codes
  ● It allows parallel and sequential simulations to be compared
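The role of the generator state can be illustrated with a minimal sketch. It uses Python's standard `random` module purely as a stand-in for the generator of a simulation; the seed and the number of draws are arbitrary choices made for illustration. Saving the state and restoring it later reproduces exactly the same sequence.

```python
import random

def draw(n, rng):
    """Return the next n pseudorandom numbers from rng."""
    return [rng.random() for _ in range(n)]

rng = random.Random(12345)   # arbitrary seed, chosen for illustration
draw(1000, rng)              # advance the generator, as earlier events would
state = rng.getstate()       # the "generator state" at this point

first = draw(5, rng)
rng.setstate(state)          # restore the saved state
second = draw(5, rng)

print(first == second)       # the same sequence is produced again
```

This is only the "necessary" part: the simulation code itself must not carry any other history-dependent state.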


Definition

● Simulations usually consist of “runs” made of a set of “events”. An event can be, for example:
  ● A fully simulated p-p collision in an LHC experiment
  ● A single hadron (e.g. 20 GeV π⁻) impinging on a simplified calorimeter (e.g. Fe-Sci), inducing a hadronic shower
  ● A single hadron-nucleus interaction
● We can distinguish between two types of reproducibility
  ● Weak or run-level reproducibility
    – Use the random state at the beginning of a run
  ● Strong or event-level reproducibility
    – Use the random state at the beginning of an event
● The goal is to achieve strong reproducibility (which of course implies weak reproducibility)

Geant4 9.5 (Dec 2011)

● Ideally, in any simulation project, one should start checking for reproducibility immediately; in practice, all the effort goes first into development and then into validation...
● In Geant4, we were able to reproduce most of the crashes and problems we found in our tests, or reported by users. But not always.
● Cases of non-reproducibility were also reported, from time to time, by the LHC experiments
● In January this year, we finally decided to tackle the problem
  ● Geant4 is mature and used heavily in huge productions
  ● Growing interest in parallel detector simulations

When we started with Geant4 9.5, we found that:
● Weak reproducibility is always valid
● Strong reproducibility is sometimes violated, more frequently:
  - in our recommended physics lists (e.g. FTFP_BERT) than in LHEP;
  - in recent versions of Geant4 than in older ones

Tests of reproducibility (1/2)

● Two types of strong-reproducibility tests
  ● Process level: a single hadron-nucleus interaction
  ● Physics-list level: a hadronic shower in a simplified calorimeter
● Similar algorithm:
  1. Run A : N events; for each event, save the status at the beginning, status_evt_i, and print a summary-number d_i , with i = 1, ... N
  for i = 1, N {
    2. Run B : 1 event, starting with the status status_evt_i, and print a summary-number d'_i
    3. Compare the 2 events : if d_i (@RunA) == d'_i (@RunB) then reproducibility is ok; else it is violated!
  }
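The algorithm above can be sketched as follows. This is a toy illustration, not the actual Geant4 test: `simulate_event`, `run_a`, `run_b` and `check_strong_reproducibility` are hypothetical names, and Python's `random` module stands in for the simulation's generator.

```python
import random

def simulate_event(rng):
    """Toy 'event': consume a variable number of draws, return a summary number."""
    n_draws = 10 + int(rng.random() * 90)
    return sum(rng.random() for _ in range(n_draws))

def run_a(n_events, seed):
    """Run A: N events; save the generator state at the start of each event."""
    rng = random.Random(seed)
    states, summaries = [], []
    for _ in range(n_events):
        states.append(rng.getstate())
        summaries.append(simulate_event(rng))
    return states, summaries

def run_b(state):
    """Run B: a single event started from a saved state."""
    rng = random.Random()
    rng.setstate(state)
    return simulate_event(rng)

def check_strong_reproducibility(n_events=100, seed=2012):
    """Return the indices of the events whose summary numbers differ."""
    states, d = run_a(n_events, seed)
    return [i for i, s in enumerate(states) if run_b(s) != d[i]]

violations = check_strong_reproducibility()
print("violated events:", violations)
```

The toy event keeps no state between events, so no violation is expected here; a real test becomes interesting precisely when the simulated code does keep such state.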


Tests of reproducibility (2/2)

● Summary-number for an event : a double-precision number
  ● Process level: a number computed (arbitrarily) from the 4-momenta of the secondaries produced in the hadron-nucleus interaction
    – A kind of “hash” of the 4-momenta of the final state, e.g.: for i = 1, K { result += i * (px + py + pz - Ekin); }
  ● Physics-list level: simply a random number
    – This is enough because of the very large number of random drawings needed for simulating a hadronic shower
● In some cases, either for the summary-number at the process level, or for the printing of variables when debugging reproducibility (see next pages), it is necessary to print the exact hexadecimal value of the memory content of the double in order to be able to detect tiny differences between the two runs, which otherwise would be cut away by the conversion to decimal...
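A minimal sketch of both ideas, under stated assumptions: `summary_number` follows the pseudocode on the slide, with each secondary represented as a hypothetical `(px, py, pz, ekin)` tuple, and `hex_of_double` prints the 8 bytes of an IEEE-754 double. The two values `a` and `b` are constructed for illustration so that they differ only in the last bits.

```python
import struct

def summary_number(secondaries):
    """Toy 'hash' of a final state; each secondary is (px, py, pz, ekin)."""
    result = 0.0
    for i, (px, py, pz, ekin) in enumerate(secondaries, start=1):
        result += i * (px + py + pz - ekin)
    return result

def hex_of_double(x):
    """Hexadecimal dump of the 8 bytes of a double (big-endian)."""
    return struct.pack(">d", x).hex()

a = 35.199286783513
b = a + 2.0 ** -46     # differs from a only in the last bits of the mantissa

print("%.10f  %.10f" % (a, b))             # same text after decimal rounding
print(hex_of_double(a), hex_of_double(b))  # the byte patterns differ
```

Printing with a fixed number of decimals hides the difference, while the hexadecimal dump exposes it.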

How to debug non-reproducibility?

● Suppose we have found a reproducibility violation, i.e.
  ● RunA' : >= 2 events; summary-number of the last event d_a
  ● RunB : 1 event; summary-number d_b , with d_b ≠ d_a

What do we do now to find the reproducibility problem?
● The debugger does not help
  ● Most of the time, there is nothing “wrong” in the content of any variable we could inspect in either run
● We can print information about the two events
  ● the last one in RunA', and the only one in RunB, which should be the same but are actually different
  and compare the output with tkdiff to find out when and how they start to differ...

Printing information in Geant4

● At the physics-list level, Geant4 provides 6 verbosity levels (that can be set via run-time commands) to get increasingly more detailed information
  ● Minimum level: /tracking/verbose 1 to find out the Track ID
  ● Middle level: /tracking/verbose 3 to find out the Step Number
  ● Maximum level: /tracking/verbose 6 to find out the Physics Process
● Need a script to filter out irrelevant information, otherwise tkdiff would take hours to parse multi-MB files
● Once the process that causes the non-reproducibility is found (this can happen immediately if a process-level test detects it), we need to write some printing statements of key quantities in key places in the process source code
  ● This is where we spend most of our time!
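The filtering and comparison step could look like the sketch below. It is an assumption-laden illustration: the log lines are made up and do not reproduce real Geant4 verbose output, and `filter_lines` and `first_difference` are hypothetical helpers, not the script actually used.

```python
def filter_lines(lines, keywords):
    """Keep only the lines containing at least one of the keywords."""
    return [ln for ln in lines if any(k in ln for k in keywords)]

def first_difference(lines_a, lines_b):
    """Index of the first differing line, or None if the inputs are identical."""
    for i, (la, lb) in enumerate(zip(lines_a, lines_b)):
        if la != lb:
            return i
    if len(lines_a) != len(lines_b):
        return min(len(lines_a), len(lines_b))
    return None

# Made-up log lines for illustration only; not real Geant4 output.
log_a = ["track 1 step 1 ekin 35.199286783513", "timing 0.31 s", "track 1 step 2 ekin 12.5"]
log_b = ["track 1 step 1 ekin 35.199286783512", "timing 0.29 s", "track 1 step 2 ekin 12.4"]

kept_a = filter_lines(log_a, ["ekin"])
kept_b = filter_lines(log_b, ["ekin"])
print(first_difference(kept_a, kept_b))
```

Reducing each log to the relevant lines first keeps the comparison fast, and the first differing line points at the step where the two events start to diverge.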


How does non-reproducibility arise?

● The two events (which violate reproducibility because they are not identical)
  ● “EventA” : last event of RunA'
  ● “EventB” : the only event of RunB
  coincide exactly up to a certain moment, when a physics process produces a set of final-state particles which are slightly different between the two events, e.g.
  ● “EventA” : proton with Ekin = 35.199286783513... MeV
  ● “EventB” : proton with Ekin = 35.199286783512... MeV
● Even an initially very tiny difference (e.g. μeV) implies different energy depositions, directions, positions, etc. which keep growing at each step, until the two particles end up in two different volumes and/or undergo different physics processes
  ● The same sequence of random numbers is used for sampling different quantities... : the rest of the two events becomes different!
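The amplification mechanism described above can be mimicked with a toy example. Everything in it is an assumption made for illustration: `toy_event`, the threshold value and the two "branches" are hypothetical, and the sketch shows only how a tiny difference propagates, not where it comes from.

```python
import random

def toy_event(ekin, seed, threshold=35.1992867835125):
    """Toy event: the branch taken depends on ekin, and each branch
    consumes a different number of random draws."""
    rng = random.Random(seed)
    if ekin > threshold:
        trace = ["branch X", rng.random(), rng.random()]
    else:
        trace = ["branch Y", rng.random()]
    # Later sampling uses whatever comes next in the same sequence.
    trace.append(rng.random())
    return trace

event_a = toy_event(35.199286783513, seed=7)
event_b = toy_event(35.199286783512, seed=7)
print(event_a)
print(event_b)
```

Both events start from the same generator state and draw the same first number, but once they take different branches the same random numbers end up being used for different quantities.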

Example of differences between the two events (1/2)

Example of differences between the two events (2/2)

Patterns of non-reproducibility (1/2)

Typically there is a computationally expensive quantity we need (e.g. cross section), and to save CPU time we do the computation once, cache the value in a table, and then re-use it later on when needed instead of re-computing it

● Nothing wrong with caching per se
● But if the cache values are history-dependent (i.e. depending on the previous events), then reproducibility can be violated
● In some cases, there are genuine mistakes in the cache values
  – Either in the way they are computed
  – Or in the way they are retrieved (e.g. wrong index manipulation)
  and in these cases not only is reproducibility violated, but even the simulation results can be wrong
● In most cases (at least in Geant4), however, the cache values become history-dependent because of an otherwise harmless approximation, which produces statistically meaningful results

Patterns of non-reproducibility (2/2)

Examples of these “dangerous” approximations

● cache_vector[ Z ] = function(Z, A) where the first isotope encountered for a given Z is used to compute the function, and then re-used for all other isotopes of the same element
  - use instead the lightest, or heaviest, or average isotope
● cache_vector[ bin_i ] = function( momentum ) where the momentum of the first particle is used to compute the function, instead of the center of the momentum bin...
● cache_vector[ bin_i ] = h1_h2_cross_section( Ekin_i ) where the center of the kinetic energy bin is assigned to the first particle to compute the h1 (projectile) – h2 (target) cross section, and then re-used also for the cross section of h2 (projectile) – h1 (target)
  - use instead the lightest, or heaviest of the two hadrons
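The second pattern (a bin filled with the value computed for the first particle seen) can be sketched as follows. The class names, the `expensive` function and the bin width are hypothetical and chosen only for illustration; this is not Geant4 code.

```python
class FirstValueCache:
    """History-dependent: each bin stores f(first value seen in that bin)."""
    def __init__(self, func, bin_width):
        self.func, self.bin_width, self.table = func, bin_width, {}

    def get(self, x):
        b = int(x // self.bin_width)
        if b not in self.table:
            self.table[b] = self.func(x)   # depends on which particle came first
        return self.table[b]

class BinCenterCache(FirstValueCache):
    """History-independent: each bin stores f(center of the bin)."""
    def get(self, x):
        b = int(x // self.bin_width)
        if b not in self.table:
            self.table[b] = self.func((b + 0.5) * self.bin_width)
        return self.table[b]

def expensive(x):
    """Hypothetical stand-in for a costly cross-section calculation."""
    return 1.0 / (1.0 + x)

def last_value(cache_cls, momenta):
    """Value returned for the last particle, after processing all of them."""
    cache = cache_cls(expensive, bin_width=1.0)
    return [cache.get(p) for p in momenta][-1]

# Same final particle (p = 2.7), with and without an earlier particle in its bin.
print(last_value(FirstValueCache, [2.1, 2.7]), last_value(FirstValueCache, [2.7]))
print(last_value(BinCenterCache, [2.1, 2.7]), last_value(BinCenterCache, [2.7]))
```

With the first-value cache, the result for the last particle depends on what was processed before it, which is the RunA' versus RunB situation; with the bin-center cache it does not. Both give statistically reasonable values.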

Non-reproducibility fixes for G4 9.6

Up to now 11 non-reproducibility fixes have been made. Of these, 3 are needed after G4 9.5 (i.e. after January). Now Geant4 is reproducible, with two exceptions: CHIPS (deprecated in G4 9.6) and neutron HP (very slow).

● (Decoupled) Chips quasi-elastic
● Starkov elastic final state model for π± > 1 GeV
● Ion ionization corrections
● Fission in Bertini intra-nuclear cascade
● Bertini intra-nuclear cascade, when hyperons are involved (it turned out to be a problem in G4PhaseSpaceDecayChannel)
● (Decoupled) Chips hadron-nucleon inelastic cross sections, used by FTFP (2 different problems)
● Multiple scattering (3 different problems)
● Binary intra-nuclear cascade

Conclusion & Outlook

● Several reproducibility violations in Geant4 have been fixed.
● The coming release Geant4 9.6 will be reproducible
  ● Exceptions are: CHIPS (deprecated) and neutron HP
● Lessons:
  ● Be careful when caching
  ● Keep monitoring the reproducibility
    – otherwise it is quickly lost by new developments!
● Reproducibility tests of Geant4 are now run regularly
  ● Every night (CDash Nightly)
  ● Several times per day, for each proposed tag (CDash Continuous)
  ● Monthly, at each reference tag: more tests and with high statistics
