Aurora dawns late: Half-baked entry secures second in supercomputer stakes

Half the machine, quadruple the anticipation for all-Intel super


SC23 After years of delays, Argonne National Laboratory's all-Intel "Aurora" supercomputer has finally graced the Top500 ranking of the world's most powerful publicly known supercomputers — just not where many had hoped to see it.

The system, which features Intel's high-bandwidth memory (HBM)-equipped Xeon Max processors and GPU Max accelerators, managed 585 petaFLOPS of double-precision performance in the Linpack benchmark, or at least half of it did. Argonne, which completed installation of Aurora in late June, has only submitted Linpack results for about half the system. The full system is expected to exceed two exaFLOPS of peak performance.

"Typically when you deploy systems like Aurora [with] 60,000 GPUs, it takes about seven to nine months to get to complete stability and correct attuning of a system," Ogi Brkic, the VP of Intel's supercomputing group, told journos during a pre-briefing. "We completed the build of a system in June; in a full month we were able to do a lot."

The system's arrival at Argonne this summer and the Top500 this fall comes after years of delays and redesigns. The machine was supposed to come online in 2021, but has been delayed repeatedly by Intel's challenges bringing chips to market.

At one point the system was slated to deliver 180 petaFLOPS of double precision performance using 50,000 "Knights Hill" Xeon Phi many-core CPUs, but later pivoted to a more traditional CPU-plus-GPU design.

But as our sibling site The Next Platform has pointed out on more than one occasion, while it may not be the most efficient system (to produce 585 petaFLOPS, it required 24.6 megawatts), it'll be one of the cheapest exascale-class supercomputers ever, coming in at $200 million. That's, of course, after Intel Federal, the prime contractor on the project, took a $300 million write-off on the scheme.

Despite the partial showing, Aurora has managed to claim the number two spot, ousting Japan's 442 petaFLOPS Fujitsu A64FX-based "Fugaku" supercomputer that previously held that position.

All of this means that Oak Ridge National Laboratory's 1.2 exaFLOPS Frontier system has retained the top spot in the biannual ranking for a fourth consecutive time. The system arrived at the summit of the Top500 in June 2022 and is powered by AMD's 64-core third-gen Epyc silicon and Instinct MI250X accelerators.

Unfortunately, those hoping to see how well Intel's complete system performs will have to wait until at least next June's Top500 ranking. And while we wait, Intel and Argonne will need to more than double the system's performance if they have any hope of beating Frontier, or of competing with Lawrence Livermore National Laboratory's AMD MI300A-powered "El Capitan" system.
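Why "more than double"? Using the rounded figures quoted above (585 petaFLOPS for Aurora's partial run, 1.2 exaFLOPS for Frontier), a simple ratio makes the point. This is a sketch with a helper name of our own invention, and it assumes Linpack scales linearly from half the machine to the whole, which real systems rarely manage perfectly.

```python
def required_speedup(current_pflops: float, target_pflops: float) -> float:
    """Factor by which the current Linpack score must grow to match a target score."""
    return target_pflops / current_pflops

# Aurora's half-system run vs Frontier's score, both in petaFLOPS, as quoted above
factor = required_speedup(585, 1200)
print(f"Aurora needs more than {factor:.2f}x its current score to pass Frontier")
```

Anything short of that roughly 2.05x multiplier leaves Aurora in second place.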

Top500 gets a shakeup

Aurora isn't the only new supercomputer contending for a place at the top of the heap on this fall's ranking. In fact, the Top500 received quite a shakeup compared to the last few years of relative computational stability.

Of the ten fastest supercomputers in the Top500 ranking, four systems (Aurora, Microsoft Azure's "Eagle," EuroHPC's "MareNostrum 5 ACC," and Nvidia's "EOS") are new or have been significantly upgraded since this spring.

Next to Aurora, Eagle is the most powerful of these new systems. The cloud-based super claimed the number three spot with a Linpack score of 561 petaFLOPS, which it squeezed from its 56-core Xeon 8480C processors and Nvidia H100 GPUs. In fact, Eagle is the highest-ranked cloud system in the history of the Top500 and the fastest H100-equipped system to compete for a spot in the top 10.

With that said, Eagle isn't the first cloud cluster to breach the top ten. Microsoft's Voyager-EUS2 claimed the number ten spot two years ago. However, with just 30 petaFLOPS of FP64 grunt, it was nearly 19x slower than Eagle.
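The "nearly 19x" figure falls straight out of the two Linpack scores quoted above. A minimal sketch, with variable names of our own choosing:

```python
eagle_pflops = 561.0    # Eagle's Linpack score in petaFLOPS, as quoted above
voyager_pflops = 30.0   # Voyager-EUS2's FP64 Linpack score in petaFLOPS, as quoted above

# How many Voyager-EUS2s it would take to match one Eagle
ratio = eagle_pflops / voyager_pflops
print(f"Eagle is about {ratio:.1f}x faster than Voyager-EUS2")
```

That's 18.7x, which rounds to the 19x cited.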

Given just how many H100s Microsoft has been deploying to power its AI search and enterprise products, Redmond's return to the upper echelons of the Top500 is hardly surprising.

The team working on the AMD-powered "LUMI" system at CSC in Finland has upgraded the kit on several occasions and has consistently managed to extract double-digit petaFLOPS performance improvements each year since the system's arrival on the Top500 in spring 2022. The system is now reportedly 150 percent faster than when it was first deployed.

Looking down the stack, EuroHPC's MareNostrum 5 ACC and Nvidia's EOS supers now slot in between the IBM-Nvidia-powered Summit and Sierra systems, with 138 petaFLOPS and 121 petaFLOPS respectively. These share a lot in common with Microsoft's Eagle platform, as they're using a combination of 4th-gen Intel Xeons, Nvidia's H100 accelerators, and InfiniBand networking.

This year's ranking also represents a bit of a role reversal for Intel and AMD. Previously, Intel CPUs were used in just two of the top 10 systems. Now Intel processors are used in five, while AMD processors power Frontier and LUMI; Fugaku uses custom Arm cores; and IBM's Power9 chips still underpin the Summit and Sierra machines.

More disruption to come

With the race to build ever-larger GPU clusters to power public and private AI development, next year's Top500 rankings look set for another shakeup.

In addition to Aurora, there are several new exascale and pre-exascale class systems slated to come online in the US and Europe over the next year. Two of the most anticipated are the El Capitan system we mentioned earlier and Europe's Jupiter system.

El Capitan was one of the first systems to showcase AMD's Instinct MI300A APU. The chip combines 24 of AMD's Zen 4 cores (the same ones used in its Genoa Epycs, down to the dies themselves) with six CDNA 3 GPU dies and 128GB of HBM3 memory. The Next Platform predicts the system will deliver 2.3 exaFLOPS of peak theoretical FP64 performance when fully operational.

Europe's first exascale super will also begin installation in 2024, though it's not yet clear whether the system will be finished and fine-tuned in time to rank at ISC or SC24. That system will be built by Atos and powered by SiPearl's Arm-based Rhea processors and Nvidia's GH200 Superchip. ®
