Bringing new life to ATLAS data

15 October 2021

The ATLAS Collaboration is breathing new life into its LHC Run-2 dataset, recorded from 2015 to 2018. Physicists will be reprocessing the entire dataset – nearly 18 PB of collision data – using an updated version of the ATLAS offline analysis software (Athena). Not only will this improve ATLAS physics measurements and searches, it will also position the Collaboration well for the upcoming challenges of Run 3 and beyond.

Athena converts raw signals recorded by the ATLAS experiment into simpler datasets for physicists to study. Its new and improved version has been in development for several years and includes multi-threading capabilities, more complex physics-analysis functions and lower memory consumption.

“Our aim was to significantly reduce the amount of memory needed to run the software, widen the types of physics analyses it could do, and – most critically – allow current and future ATLAS datasets to be analysed together,” says Zach Marshall, ATLAS Computing Coordinator. “These improvements are a key part of our preparations for future high-intensity operations of the LHC – in particular the High-Luminosity LHC (HL-LHC) run beginning around 2028, which will see ATLAS’ computing resources in extremely high demand.”

This latest version of Athena already makes good headway in reducing the computing resources required for data analysis. For example, the computationally intensive job of taking individual signals from the inner detector and chaining them together to form particle tracks is now two to four times faster. Less disk space is needed to store the results and overall the software runs more smoothly.

In addition, the new software supports ‘multi-threading’ of events. “While past software improvements enabled greater parallelisation in ATLAS data processing, this improvement allows us to process multiple events at once while concurrently analysing multiple parts of a collision event,” explains Marshall. “This modification required tens of thousands of code changes; it significantly lowers the memory consumption needed and allows for higher event throughput.”
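
As a rough illustration of the idea, and not of Athena's actual (C++) interfaces, the Python sketch below processes several toy events at once using threads that share a single copy of read-only detector information. The names `SHARED_CONDITIONS`, `reconstruct` and `process_events` are hypothetical, as are the event contents. Sharing one copy between threads, rather than loading it once per process, is the kind of saving the quote refers to; Python threads are used here only to show the structure, not to demonstrate a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for large read-only data (geometry, calibrations).
# Threads share this one copy; separate processes would each hold their own.
SHARED_CONDITIONS = {"calibration_scale": 1.02}


def reconstruct(event):
    """Toy 'reconstruction': sum the raw signals and apply the shared calibration."""
    scale = SHARED_CONDITIONS["calibration_scale"]
    return {"id": event["id"], "energy": sum(event["signals"]) * scale}


def process_events(events, n_threads=4):
    """Process several events concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(reconstruct, events))


# Eight made-up events, each with three raw signals.
events = [{"id": i, "signals": [1.0, 2.0, 3.0]} for i in range(8)]
results = process_events(events)
```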

The software improvements also feature new ways for physicists to study their data. For example, researchers will now, by default, be able to look for tracks that originate away from the collision point. These could be signatures of particles with long lifetimes and may lead to evidence of exciting beyond-the-Standard-Model physics processes. While such searches were possible with the earlier version of the ATLAS software, the heavy computing resources they required meant they could not always be carried out.

Finally, physicists have improved the databases that hold the time-dependent status information of the detector components. These databases – on which Athena relies – now incorporate an improved understanding of the detector’s operation during Run 2. “Every data-taking period is an opportunity for us to learn more about the detector and its subsystems,” says Song-Ming Wang, ATLAS Data Preparation Coordinator. “Revisiting these databases with the benefit of hindsight will allow us to provide even better performance.”

With the new Athena software now up and running, researchers have set out to reprocess the entire Run-2 dataset. This will take several months, given that the dataset amounts to nearly 18 PB. One challenge in handling all of these data lies in how they are stored and accessed. Raw data files are saved on magnetic tape at the CERN Data Centre, as well as at Worldwide LHC Computing Grid centres around the world. Instead of recalling large portions of the dataset to be processed at once – which would require significant (and expensive) storage space – the recall will be orchestrated so that only a small fraction of the data is processed at a time. Once the reprocessing is complete, physicists will use this same strategy to reprocess the billions of simulated events used in physics analyses.
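
The staging strategy can be pictured with a small Python sketch. It is a simplified illustration under stated assumptions, not the workflow system ATLAS actually uses: the functions `stage`, `process` and `release`, the file names and the batch size are all hypothetical placeholders. The point is only that the number of files held on disk never exceeds one batch.

```python
def batches(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def reprocess_in_batches(tape_files, batch_size, stage, process, release):
    """Stage, process and release one small batch at a time.

    Returns the outputs and the largest number of files on disk at once.
    """
    outputs, peak_on_disk = [], 0
    for batch in batches(tape_files, batch_size):
        staged = [stage(f) for f in batch]        # recall from tape to disk
        peak_on_disk = max(peak_on_disk, len(staged))
        outputs.extend(process(f) for f in staged)
        for f in staged:                          # free disk before the next batch
            release(f)
    return outputs, peak_on_disk


# Made-up file names and trivial stand-ins for the three steps.
tape_files = [f"raw_{i:03d}" for i in range(10)]
outputs, peak = reprocess_in_batches(
    tape_files,
    batch_size=3,
    stage=lambda f: f,
    process=lambda f: f.replace("raw", "reco"),
    release=lambda f: None,
)
```

With ten files and a batch size of three, all ten are reprocessed while at most three are ever staged at once.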

After all of this work, ATLAS will have a significantly improved dataset that will allow for crisper measurements, more powerful searches, and simpler combinations of past data with future data!
