Aurora exascale system gets 'mini-me' testbed for researchers

We node you want to test it out. (We're here all week.)


Researchers waiting to get their hands on the much-delayed Aurora supercomputer at the US Argonne National Laboratory now have a new toy at their disposal: a mini-Aurora codenamed Sunspot.

Sunspot is a two-rack test and development system equipped with 128 nodes of the same technologies that will power Argonne's Aurora exascale supercomputer. Image by Argonne National Laboratory

Sunspot is a new test and development system that has been built to the exact same architecture as Aurora, the exascale supercomputer currently under construction at the Argonne Leadership Computing Facility (ALCF) in Illinois.

But while Aurora is planned to assimilate more than 10,000 nodes once fully completed, Sunspot can slip into just two datacenter racks with its 128 nodes.

Like Aurora, each node is configured with two Intel Xeon CPU Max (Sapphire Rapids) processors and six Intel Data Center GPU Max (Ponte Vecchio) accelerators, with HPE's Slingshot interconnect (inherited from Cray) linking everything together.

"Sunspot is basically a miniature version of Aurora," said ALCF's project director for Aurora, Susan Coghlan.

The idea is to give research teams a machine on which they can optimize code performance using the same hardware as Aurora while they are still waiting for the real thing.

Aurora was originally scheduled for delivery in 2018 as a system based on Intel's now-discontinued Xeon Phi chips. It was then redesigned around a new architecture intended to make it the first exascale supercomputer, one capable of performing a billion billion (10¹⁸) floating point calculations per second.

However, this incarnation slipped behind schedule due to delays in Intel getting its Sapphire Rapids Xeon Scalable processors out the door, and the AMD-based Frontier supercomputer at Oak Ridge National Laboratory in Tennessee eventually took the exascale prize.

Sunspot has apparently been on-site at Argonne since December, but before it was ready the development teams made use of a series of other testbed systems. These included Iris, Arcticus, and Florentia at Argonne itself, and Borealis at Intel's high-performance computing (HPC) lab in Oregon.

These systems continue to be useful for Aurora preparations, but it is apparently Sunspot's identical architecture that gives researchers the ideal environment for optimizing application performance for the exascale supercomputer.

"Sunspot is the first time we're seeing how everything is working together," Coghlan said. "We learn a lot from these runs. It gives us a chance to iron out some of the kinks before Aurora is ready for users."

ALCF's Aurora Early Science Program co-manager Tim Williams said this was important if teams are to start doing science on a new system from day one of its deployment.

"Testbeds like Sunspot allow researchers to carry out performance studies and scale up their workloads to run on much larger supercomputers while those systems are still being built," he explained.

According to Argonne, over 180 researchers from over 20 application development teams from the Early Science Program (ESP) and the US Department of Energy Exascale Computing Project (ECP) have now begun accessing the testbed for scaling and performance optimization research.

The ALCF team said it expects the applications' performance to improve as the teams continue their multi-node scaling and optimization work on Sunspot and other available computing resources.

As an example, the team is said to be using Sunspot's Intel DAOS (Distributed Asynchronous Object Storage) system to test and enhance I/O performance.

Argonne said Sunspot is expected to remain in service even after Aurora is declared fully operational, a milestone now scheduled for some time later this year. Like the ALCF's other test and development systems, Sunspot should remain a useful platform for users to optimize code performance before moving across to Aurora. ®
