STS reco: Add padding between atomic counters to prevent false sharing.
Pad the atomic counters so that each one sits in its own cache line.
This prevents false sharing between CPU cores and speeds up the cluster creation and hit creation kernels:
- Cluster creation is now 250% (!) faster on CPU (32 threads, measured across 20 mCBM timeslices)
- Runtime of hit creation improved by about 40%
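
For illustration, the padding idea can be sketched roughly as below. This is not the code from this change: the names PaddedCounter, RecoCounters, nClusters, nHits, CounterDistance and AddCluster are hypothetical, and the 64-byte cache-line size is an assumption (typical on x86-64, not guaranteed elsewhere).

```cpp
#include <atomic>
#include <cstddef>

// Assumption: 64-byte cache lines. Adjust for the target architecture.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical wrapper: alignas forces both the alignment and the size of
// the struct up to one full cache line, so two adjacent PaddedCounter
// objects can never share a line.
struct alignas(kCacheLineSize) PaddedCounter {
  std::atomic<int> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "each counter should fill exactly one cache line");

// Hypothetical counter block: without the padding, both atomics would
// usually land in the same cache line, and threads updating different
// counters would keep invalidating each other's cached copy.
struct RecoCounters {
  PaddedCounter nClusters;
  PaddedCounter nHits;
};
static_assert(sizeof(RecoCounters) == 2 * kCacheLineSize,
              "counters should occupy two separate cache lines");

// Byte distance between the two counters inside one RecoCounters object.
inline std::size_t CounterDistance(const RecoCounters& c) {
  const char* a = reinterpret_cast<const char*>(&c.nClusters);
  const char* b = reinterpret_cast<const char*>(&c.nHits);
  return static_cast<std::size_t>(b - a);
}

// Atomically reserves the next cluster slot and returns its index
// (the counter value before the increment).
inline int AddCluster(RecoCounters& c) {
  return c.nClusters.value.fetch_add(1, std::memory_order_relaxed);
}
```

The cost is some wasted memory per counter, which is negligible for a handful of counters. Where the standard library provides it, std::hardware_destructive_interference_size (C++17) could replace the hard-coded constant.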