ATLAS and Seal Storage Technology collaborate on new archival storage

28 October 2022

The Large Hadron Collider (LHC) at CERN is undergoing a major upgrade that will greatly increase the number of collisions and discovery potential of the accelerator. This new High-Luminosity LHC (HL-LHC) presents several great challenges for physicists, including the issue of long-term archival data storage. The ATLAS Experiment at CERN already generates tens of petabytes of data every year, stored centrally in CERN’s “Tier-0” Data Centre as part of the Worldwide LHC Computing Grid (WLCG). Magnetic tape is used for long-term Tier-0 archival storage due to its reliability and energy efficiency. A copy of the data is then stored at, and accessed from, “Tier-1” data sites around the world. Demand for archival storage capacity at these sites is expected to increase exponentially in the coming years, and is projected to quickly outpace the capacity afforded by a sustained budget model (see Figure 1).

To address this disparity between supply and demand at these Tier-1 sites, the ATLAS Collaboration has partnered with Seal Storage Technology in a pilot project to explore Seal’s decentralised cloud storage platform as an efficient and cost-effective option for archival data storage. In this R&D project, Seal has provided 10 pebibytes (PiB) of cloud-based storage capacity to the ATLAS Experiment, hosted in Seal’s data centres. Together with ATLAS and CERN scientists and engineers, the Seal engineering team has successfully completed the initial integration between Seal’s decentralised storage platform and CERN’s Rucio data management software and File Transfer Service (FTS).

“This is a great opportunity for ATLAS to integrate cutting-edge, commercial cloud storage resources into our distributed computing infrastructure,” said Alessandro Di Girolamo, ATLAS Computing Co-Coordinator. “We are delighted to collaborate with Seal Storage Technology on this R&D project, which has enormous possibilities for expansion and could provide significant distributed archival storage across WLCG sites for the HL-LHC era.”

Figure 1: Expected tape (archival) storage needs of the ATLAS Experiment at WLCG Tier-1 sites for the coming years, under two different models of software development (labelled “Conservative” and “Aggressive”). In both models, the needs outpace the expected storage capacity under a sustained budget model (black lines). (Image: ATLAS Collaboration/CERN)

Archival storage at large data centres typically relies on magnetic tapes, which have the advantage of being energy efficient and reliable. However, to meet the data-analysis requirements of the HL-LHC, physicists will need to be able to quickly access archival storage. Typically, data stored on tape are considered “cold” and must be recalled, i.e. copied onto disk, in order to be processed. Without separate data recall, the CPU would sit idle while waiting for the physical tape cassette to be mounted, wound to the appropriate point, and read. This necessary recall is the defining feature of tape, and introduces processing delays and complications when many users want to access different datasets stored on the same physical tape.

The Seal-hosted storage has already been tested as “warm” archival storage, directly feeding input to ATLAS data processing jobs. Archival storage without the recall delays of tape would provide significantly faster turn-around for the experiment’s data analyses when accessing archived data not already available on disk. Additionally, Seal’s storage helps ensure data integrity by regularly checking for data corruption or involuntary modification.

Development work is ongoing to improve transfer rates and robustness of the systems. Seal’s data platform uses Filecoin’s distributed ledger technology to ensure all data stored with Seal is immutable, verifiable and has a chain of custody. Data authenticity is paramount: all data centres, hardware and management are enterprise-grade, and Seal stores multiple copies across several global sites. Secure global distribution ensures there is no single point of failure for data stored with Seal, establishing the highest standards for data security and protection.

“As a leader in scientific advancement, the ATLAS Experiment at CERN is the ideal partner for Seal’s innovative decentralised cloud storage platform,” said Michael Horowitz, Seal Storage Technology CEO. “Seal is excited to continue this partnership and provide secure and reliable storage for data fuelling HL-LHC’s discoveries.”
