US seeks exascale systems 10 times faster than current state-of-the-art computers

China claims to have 10 in the pipeline and may pull ahead in HPC arms race


The US Department of Energy is looking for vendors to help build supercomputers up to 10 times faster than the recently inaugurated Frontier exascale system, to come on stream between 2025 and 2030, and even more powerful systems for the 2030s.

These details were disclosed in a request for information (RFI) issued by the DoE for computing hardware and software vendors, system integrators and others to "assist the DoE national laboratories (labs) to plan, design, commission, and acquire the next generation of supercomputing systems in the 2025 to 2030 time frame."

Vendors have until the end of July to respond.

For this RFI, the DoE says it is interested in the deployment of one or more supercomputers that can solve scientific problems five to 10 times faster than current state-of-the-art systems "or solve more complex problems, such as those with more physics or requirements for higher fidelity."

The current state of the art is perhaps best represented by the Frontier exascale system installed at the Oak Ridge National Laboratory, which was declared operational at the end of May and clocked at 1.102 exaFLOPS on the Linpack benchmark, but is expected to hit a peak theoretical performance in excess of 2 exaFLOPS in future.

In line with this, the DoE states that its rough estimate – based upon trends covering the past 20 years – includes traditional HPC systems at the 10-20 exaFLOPS level or beyond in the 2025+ time frame, and 100+ exaFLOPS and beyond in the 2030+ time frame, which it expects to be delivered "through hardware and software acceleration mechanisms."

Any such supercomputer will be expected to operate within a power envelope of 20-60MW, according to the DoE. For comparison, the Frontier system already consumes about 20MW, and can apparently hit a peak of over 30MW.
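
To put that power envelope in perspective, here is a back-of-envelope sketch using only the figures quoted above. It assumes Frontier's 1.102 exaFLOPS Linpack result is drawn at roughly 20MW, and the pairing of performance targets with power budgets is our own illustration rather than anything the DoE has specified.

```python
# Back-of-envelope efficiency arithmetic from the figures quoted in the article.
# Assumption: which performance target goes with which power budget is our own
# pairing for illustration; the RFI does not tie them together.

EXA = 1e18   # FLOPS in one exaFLOPS
MEGA = 1e6   # watts in one megawatt
GIGA = 1e9   # FLOPS in one gigaFLOPS


def gflops_per_watt(exaflops: float, megawatts: float) -> float:
    """Return energy efficiency in gigaFLOPS per watt."""
    return (exaflops * EXA) / (megawatts * MEGA) / GIGA


# Frontier: 1.102 Linpack exaFLOPS at about 20MW (approximate figures).
frontier = gflops_per_watt(1.102, 20)

# Easiest case: low end of the 10-20 exaFLOPS range, top of the 20-60MW envelope.
easiest = gflops_per_watt(10, 60)

# Hardest case: high end of the range, bottom of the envelope.
hardest = gflops_per_watt(20, 20)

print(f"Frontier today: {frontier:.1f} GFLOPS/W")
print(f"10 exaFLOPS at 60MW: {easiest:.1f} GFLOPS/W ({easiest / frontier:.1f}x Frontier)")
print(f"20 exaFLOPS at 20MW: {hardest:.1f} GFLOPS/W ({hardest / frontier:.1f}x Frontier)")
```

On those assumptions, even the most forgiving combination calls for roughly three times Frontier's performance per watt, and the most demanding for about 18 times.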

Systems should also be "sufficiently resilient" to hardware and software failures to minimize requirements for user intervention.

Interestingly, the DoE says that it is seeking to move away from "monolithic acquisitions" towards a model that would allow more rapid upgrade cycles of deployed systems to enable faster innovation on hardware and software.

One possible strategy is increased reuse of existing infrastructure so that upgrades are modular, with an acquisition process that allows continuous injection of technological advances into facilities, perhaps every 12 to 24 months rather than on a four- or five-year cycle.

This sounds somewhat similar to the approach that has been adopted for Europe's first publicly declared exascale computer, the Jupiter system being constructed in Germany by the European High Performance Computing Joint Undertaking (EuroHPC JU).

Jupiter will be based on a "dynamic, modular supercomputing architecture," according to the Forschungszentrum Jülich where it will be based, comprising a universal cluster module paired with a GPU accelerator module and storage modules, but it is planned to be expanded in future with additional modules that may include a quantum processing unit or a neuromorphic processing module.

The responses to this RFI will help the DoE and the national labs to update their long-term advanced computing roadmaps, as well as inform the requirements for one or more DoE requests for proposals (RFPs) for systems to be delivered in the 2025–2030 time frame.

The level of information requested from vendors by the DoE is quite detailed, covering not just the kind of processors, memory, storage, and interconnect options the vendors foresee using within the 2025 to 2030 time frame, but also what manufacturing processes they expect the chips to be made with, whether the processors will be some form of APU/XPU system-on-chip combination, the expectations for the bandwidth and power of interconnects, the potential node configuration, and so on.

Perhaps the DoE has been spurred on by fears that the US may fall behind China in the supercomputing arms race, as reported earlier this year.

While the US has three exascale systems in the pipeline, China aims to have up to 10 operational systems by 2025, and reports in the Financial Times claimed that US experts feared that China could beat them to important science and technology breakthroughs by fielding a larger number of exascale machines. ®
