Single precision usage
R&D task number: G4RD3
Study of the impact of using single precision in simulation components having low data accuracy
Arithmetic precision is essential in operations that must account for effects with very small cumulative numerical contributions, such as certain NLO or NNLO theory calculations. This is not the case for many components of particle transport simulation, where the data for the models being simulated are known with quite low accuracy, and for which calculations in float would not, a priori, affect the precision of physics results. Here are a few examples:
- Except in the case of very thin layers, the boundaries of detector components are usually known with errors much larger than microns. In most cases, over- or under-estimating the material budget seen by a particle by less than one per mille has a negligible impact on the physics result. Most geometry calculations could therefore be done in single precision. At the very least, the shape and material parameters in geometry could be stored in single precision.
- The cross sections for most physics processes used in simulation are known at a (few) percent level of accuracy, so the precision of the observables depending on those (energy deposits, final states and their kinematics) is at the same level. Cross section tables could be stored in single precision.
- The magnetic field maps are known with errors at percent level for most HEP experiments. Performing integration of the particle transport equation using single precision field maps should not pose precision issues.
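To put a number on the geometry example above, the following C++ sketch measures the spacing between neighbouring representable values at a given coordinate. The function names are illustrative and not part of Geant4, and lengths are assumed to be expressed in millimetres.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next representable float above it.
// For a coordinate in mm, this is the finest position step float can resolve there.
inline float FloatSpacing(float x)
{
  assert(std::isfinite(x));
  return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Same quantity in double precision, for comparison.
inline double DoubleSpacing(double x)
{
  assert(std::isfinite(x));
  return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

Under these assumptions, a coordinate of 10^4 mm (10 m) is resolved to about 10^-3 mm in float, that is about one micron, while double resolves the same coordinate to roughly 2·10^-12 mm. The float resolution is therefore comparable to, or finer than, the boundary uncertainties quoted above for detectors of this size.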
Potential impact on computing performance
Most x86 architectures use x87 and/or SSE floating-point units for arithmetic operations. On x87, conversion to the internal extended format is needed for both float and double operands, so scalar arithmetic runs at about the same speed for both. Floats are represented on 32 bits, while doubles use 64 bits. The halved size affects memory operations and caching, especially for large data structures. Operating in single precision means fetching less data per instruction from memory and moving half as many bytes in memory copy operations. Data structures are more compact and therefore more likely to fit in caches at all levels. Vectorized code can execute twice as many simultaneous operations on floats as on doubles. Since data cache misses have a considerable impact on performance in Geant4, the potential gain from using single precision could be larger than 10-15%.
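The size argument can be made concrete with a small C++ sketch. `BoxParams` is a hypothetical parameter block, not a Geant4 class, and the 64-byte cache line is an assumption that holds on common x86 processors but is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical parameter block for a box-like shape: three half-lengths.
template <typename Real>
struct BoxParams {
  Real dx;
  Real dy;
  Real dz;
};

// Number of values of type Real that fit in one cache line.
// The default of 64 bytes is an assumption about the target hardware.
template <typename Real>
inline std::size_t ValuesPerCacheLine(std::size_t lineBytes = 64)
{
  assert(lineBytes >= sizeof(Real));
  return lineBytes / sizeof(Real);
}
```

With 4-byte floats and 8-byte doubles, `BoxParams<float>` takes half the space of `BoxParams<double>`, and one cache line holds 16 floats instead of 8 doubles. The same factor of two applies to the number of lanes in a SIMD register of fixed width.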
Potential caveats
While single precision may be enough for most of the numerical operations described above, a number of issues may arise case by case. A first category is the numerical stability of algorithms under certain data conditions. Certain algorithms are prone to catastrophic cancellation, or at least to a loss of significant bits that can degrade the precision of results to unacceptable levels. Another category is related to algorithm tuning that will have to be adapted to single precision, for example conditions for considering a particle as systematically results in certain conditions.
To deal with the above cases, unit tests have to be implemented in order to assess the stability of algorithms. For combined functionality, a validation procedure has to be put in place case by case. This study should also investigate compacting the data structures, particularly in the case of geometry.
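One form of the loss of significant bits mentioned above is absorption: a small increment added to a large value can vanish entirely in rounding. The following C++ sketch shows the kind of check a unit test could perform. The function names are illustrative only, and the positions are assumed to be in millimetres.

```cpp
#include <cassert>

// Advance a 1D position by a non-negative step, in the chosen precision.
template <typename Real>
Real Advance(Real position, Real step)
{
  assert(step >= Real(0));
  Real next = position + step; // rounded to the precision of Real
  return next;
}

// True if a non-zero step was lost to rounding, leaving the position unchanged.
template <typename Real>
bool StepAbsorbed(Real position, Real step)
{
  return step > Real(0) && Advance(position, step) == position;
}
```

For example, `StepAbsorbed<float>(1.0e5f, 1.0e-3f)` is true, because the float spacing near 10^5 is about 8·10^-3, while the same call in double is false. Step limits and tolerances tuned for double precision may therefore need to be revisited when an algorithm is moved to float.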
Directions to investigate
- Demonstrator for using single precision geometry data for geantino transport in a complex geometry.
- Performance of magnetic field integration in single versus double precision, using a realistic setup. Performance will be compared when storing the field data in float versus double, and also when doing the integration in float versus double.
- Study of storing and using cross section tables in float and performing interpolation in float. Performance will be compared to the double precision case.
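As a starting point for the cross section study, a table lookup can be written once and instantiated for both precisions. This is a minimal sketch assuming linear interpolation on a table sorted by increasing energy. It is not the interpolation scheme used in Geant4, and `Interpolate` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear interpolation in a table sorted by increasing energy.
// Energies outside the table range are clamped to the end points.
template <typename Real>
Real Interpolate(const std::vector<Real> &energy, const std::vector<Real> &xsec, Real e)
{
  assert(energy.size() == xsec.size() && energy.size() >= 2);
  if (e <= energy.front()) return xsec.front();
  if (e >= energy.back()) return xsec.back();
  std::size_t i = 1;
  while (energy[i] < e) ++i; // first bin edge at or above e
  const Real t = (e - energy[i - 1]) / (energy[i] - energy[i - 1]);
  return xsec[i - 1] + t * (xsec[i] - xsec[i - 1]);
}
```

Instantiating `Interpolate<float>` and `Interpolate<double>` on the same table allows both the timing and the numerical difference to be measured. For well-conditioned tables the difference is expected to be at the level of the float rounding error, far below the percent-level uncertainty of the cross sections themselves.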
Lead and main developers: Andrei Gheata, Guilherme Amadio
Effort estimate
An initial exploration is needed to obtain a realistic estimate per category. We expect the physics cases to progress faster than geometry, which involves more algorithms.