Slovenian Vega supercomputer processes ATLAS experimental data
June 16, 2022 — The ATLAS collaboration uses a global network of data centers – the Worldwide LHC Computing Grid – to perform data processing and analysis. These data centers are typically built from commodity hardware and cover the full spectrum of ATLAS data processing, from reducing the raw data coming out of the detector to a manageable size, to producing plots for publication.
While the Grid’s distributed approach has proven to be very effective, the computing needs of the LHC experiments are constantly increasing, which is why the ATLAS collaboration has explored the potential for integrating high-performance computing (HPC) centers into the Grid’s distributed environment. HPC harnesses the power of purpose-built supercomputers constructed from specialized hardware, and is widely used in other scientific disciplines.
However, HPC poses significant challenges for processing ATLAS data. Access to supercomputer facilities is generally more restricted than access to Grid sites, and their CPU architectures may not be suitable for ATLAS software. Their scheduling mechanisms favor very large jobs using several thousand nodes, which is atypical of ATLAS workflows. Finally, a supercomputer installation may be geographically distant from the storage hosting the ATLAS data, which can pose network problems.
Despite these challenges, ATLAS collaborators have successfully exploited HPC over the past few years, including several machines at the top of the famous Top500 list of supercomputers. Technological barriers were overcome by isolating the main computation from the parts requiring network access, such as data transfer. Software issues were resolved by using container technology, which allows ATLAS software to run on any operating system, and by the development of “edge services”, which allow computations to run offline without the need to contact external services.
The most recent HPC center to process ATLAS data is Vega – the first petascale EuroHPC JU machine, housed at the Institute of Information Sciences in Maribor, Slovenia. Vega began operations in April 2021 and consists of 960 nodes, each containing 128 physical processor cores, for a total of 122,880 physical cores or 245,760 logical cores. To put that into perspective, the total number of cores provided to ATLAS from Grid resources is around 300,000.
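As a quick sanity check on those figures, the totals follow directly from the node count (a minimal sketch; the variable names are illustrative, and the factor of two assumes two hardware threads per physical core):

```python
# Vega core-count arithmetic (illustrative variable names)
nodes = 960
cores_per_node = 128                      # physical cores per node

physical_cores = nodes * cores_per_node   # 960 * 128 = 122,880
logical_cores = physical_cores * 2        # assuming 2 hardware threads per core

print(physical_cores)  # 122880
print(logical_cores)   # 245760
```

With roughly 300,000 cores available to ATLAS from the Grid, Vega’s logical core count alone approaches that total, which is what makes the "essentially doubling" claim below plausible.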
Due to close ties with the ATLAS community of physicists in Slovenia, some of whom were heavily involved in the design and commissioning of Vega, the ATLAS collaboration was among the first users to be granted an official allocation of computing time. This benefited both the ATLAS collaboration, which was able to leverage a significant additional resource, and Vega, which received a steady and well-understood stream of jobs to help with the commissioning phase.
Vega was almost continuously busy with ATLAS jobs from the moment it came online; periods when fewer jobs were running were due either to other users on Vega or to a lack of ATLAS jobs to submit. This enormous additional computing power – essentially doubling ATLAS’s available resources – was invaluable, allowing multiple large-scale data-processing campaigns to run in parallel. As a result, the ATLAS collaboration is heading towards the restart of the LHC with a fully updated Run 2 dataset and the corresponding simulations, many of which have been significantly extended in statistical terms thanks to the additional resources provided by Vega.
It is a testament to the robustness of ATLAS’s distributed computing systems that they could be extended to a single site equivalent in size to the entire Grid. While Vega will eventually be devoted to other scientific projects, a portion of its capacity will remain dedicated to ATLAS. Moreover, this successful experience shows that ATLAS members (and their data) are ready to take advantage of the next available HPC center and exploit its full potential.