Researchers weigh new benchmarks for Green500 amid shifting workload priorities

Just because it's super efficient at Linpack doesn't mean it'll be in everything


SC23 Is it time for the Green500 to expand its scope to account for more diverse workloads? This was one of the questions attendees grappled with at SC23.

Similar to the Top500, which ranks systems based on sheer performance, the Green500 weighs that performance against a system's power consumption in terms of gigaFLOPS per watt.
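The metric itself is simple: sustained benchmark throughput divided by average power draw during the run. A minimal sketch of the calculation, with a made-up system for illustration (the function name and figures are mine, not the Green500's):

```python
def gflops_per_watt(rmax_gflops: float, avg_power_watts: float) -> float:
    """Green500-style efficiency: sustained GFLOPS divided by average watts."""
    return rmax_gflops / avg_power_watts

# Hypothetical system: 50 petaFLOPS sustained while drawing 1 megawatt
print(gflops_per_watt(50_000_000, 1_000_000))  # 50.0 GFLOPS/W
```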

High Performance Linpack has long been the gold standard for measuring cluster performance on demanding double-precision workloads. So, when the Green500 launched in 2007, it made sense to use Linpack as the basis for evaluating the efficiency of these systems.

The problem is that Linpack is just one benchmark, and it isn't representative of all workloads. This is why we've seen benchmarks like High Performance Conjugate Gradient (HPCG) and HPL-MxP — formerly HPL-AI — crop up over the years to provide additional context for both traditional double-precision and mixed-precision workloads.

But while we've found new ways to benchmark supercomputers in terms of performance, the Green500 remains tied to Linpack. That, however, may not be the case for much longer.

Trying an alternative approach

Over the past 18-24 months there has been a growing movement to broaden the scope of the Green500 to alternative workloads, Wu-chun Feng explained during a presentation at SC23.

"Mike Heroux and Jack Dongarra in particular have broached this subject about looking at the Green500 using HPCG," he explained. "Satoshi Matsuoka has been talking about 'all these benchmarks are important; can we come up with some type of composite Green500 number of FLOPS per watt by somehow combining the numbers we get from the different benchmarks?'"
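Matsuoka's composite idea could take many forms. One simple possibility — purely illustrative, not any methodology the Green500 has proposed — is a geometric mean of per-benchmark efficiencies, which keeps the result in FLOPS per watt and stops one benchmark from dominating:

```python
from math import prod

def composite_efficiency(gflops_per_watt_scores: list[float]) -> float:
    """Geometric mean of per-benchmark GFLOPS/W figures (illustrative only)."""
    return prod(gflops_per_watt_scores) ** (1 / len(gflops_per_watt_scores))

# Hypothetical system: 60 GFLOPS/W on HPL, 0.6 on HPCG, 600 on HPL-MxP
print(composite_efficiency([60.0, 0.6, 600.0]))
```

A geometric mean is the natural choice here because HPL, HPCG, and HPL-MxP scores differ by orders of magnitude; an arithmetic mean would be swamped by the mixed-precision figure.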

HPCG, for its part, is designed to more closely reflect real-world performance across a wide variety of HPC workloads. Take a look at HPCG results and the scores are substantially lower than Linpack would suggest. In fact, Japan's Fugaku comes in fourth on the Linpack-based Top500 but first on the HPCG benchmark, beating out Frontier.

It's important to remember that for the Green500, the workload — whether it's Linpack or HPCG — is there just as much to measure power consumption as it is to measure performance. Different benchmarks will exercise infrastructure like accelerators and network fabrics to different degrees. As such, testing methodologies may need to be adjusted to accommodate alternative workloads.

While complex, Feng noted that "HPCG presents an opportunity to innovate from a software perspective in order to deliver energy efficiency."

While Feng didn't touch on HPL-MxP in much detail, there also appears to be an opportunity to address workloads that can take advantage of lower-precision floating point calculations to achieve a speedup compared to your typical FP64 application.

Looking at modern accelerators, it's not hard to see why. Nvidia's H100, for instance, sports up to 67 teraFLOPS of FP64, but drop down to FP8 and you're looking at 2 PFLOPS and roughly 4 PFLOPS with sparsity enabled.
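The scale of that gap is easy to put a number on. Taking those published peak figures at face value, the arithmetic below is my own back-of-the-envelope check, not Nvidia's:

```python
fp64_tflops = 67           # H100 peak FP64 throughput
fp8_tflops = 2_000         # H100 peak FP8, dense (2 PFLOPS)
fp8_sparse_tflops = 4_000  # FP8 with structured sparsity enabled

print(f"FP8 vs FP64: ~{fp8_tflops / fp64_tflops:.0f}x")                # ~30x
print(f"FP8 sparse vs FP64: ~{fp8_sparse_tflops / fp64_tflops:.0f}x")  # ~60x
```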

Scientists at the University of Bristol have demonstrated the advantages of running climate models at half precision. But, the biggest beneficiary of lower precision is undoubtedly AI training and inference, especially for models that take advantage of sparsity.

As such, it's not hard to imagine a system that's incredibly efficient in mixed-precision workloads but performs rather poorly in HPC benchmarks. But just like HPCG, incorporating HPL-MxP into the Green500 ranking will likely require new testing methodology.

Henri maintains its lead on the Green500

Despite the excitement surrounding Aurora's arrival on the Top500 ranking of supercomputers, there weren't nearly as many surprises with regard to this fall's Green500.

The Flatiron Institute's two-petaFLOP Henri system retained its top spot. The 31-kilowatt Lenovo ThinkSystem cluster managed to squeeze 65 gigaFLOPS per watt from its 5,920 Nvidia H100 and Ice Lake Xeon cores.
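Those figures hang together: multiply the efficiency back out by the power draw and you recover the system's throughput. A quick sanity check, nothing more:

```python
efficiency_gflops_per_watt = 65
power_watts = 31_000  # 31 kW

# Efficiency x power gives sustained throughput in GFLOPS
rmax_gflops = efficiency_gflops_per_watt * power_watts
print(f"{rmax_gflops / 1e6:.1f} PFLOPS")  # ~2.0 PFLOPS
```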

With that said, two systems have moved into the top ten most efficient supers. The first was EuroHPC's MareNostrum 5 ACC, which, in addition to claiming the number-eight spot on the Top500, displaced Frontier for sixth place on the Green500.

Built by Eviden, the system features a similar arrangement to Henri, pairing Nvidia's H100s with Intel's newer 4th-Gen Xeon Scalable processors. It achieved 54 gigaFLOPS per watt in the test.

South Korea's Olaf system was the other new system to break into the upper echelon of the Green500, claiming the number ten spot at 45 gigaFLOPS per watt.

Olaf is another Lenovo ThinkSystem machine, but instead of Intel's CPUs it pairs Nvidia H100 GPUs with AMD's 32-core Epyc Genoa processors. ®
