Azure-based 'AI supercomputer' to use Nvidia GPUs and software

Is it a super or a cloud service? Users would string together components as required


Microsoft and Nvidia say they are teaming up to build an "AI supercomputer" using Azure infrastructure combined with Nvidia's GPU accelerators, network kit, and its software stack.

The target market will be enterprises looking to train and deploy large state-of-the-art AI models at scale.

According to Nvidia, the multi-year project will aim to deliver "one of the most powerful AI supercomputers in the world," and will also make Azure the first public cloud to incorporate Nvidia's AI software stack.

To build this system, the pair will use Azure's GPU-based ND-series and NC-series virtual machines, but the project will involve bringing "tens of thousands" of Nvidia's A100 GPUs and its latest H100 GPUs to the platform.
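For a sense of what that looks like from the customer side, here is a minimal, hypothetical sketch of requesting one of Azure's existing GPU-backed ND A100 v4 instances with the Azure Python SDK (azure-mgmt-compute). The subscription, resource group, image, and network interface values are placeholders, and the Standard_ND96asr_v4 size shown is the current eight-A100 flavor rather than anything H100-specific.

```python
# Minimal, hypothetical sketch: requesting a single GPU-backed ND-series VM
# via the Azure Python SDK. Subscription, resource group, image and NIC
# values are placeholders; use SSH keys rather than passwords in practice.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

credential = DefaultAzureCredential()
compute = ComputeManagementClient(credential, subscription_id="<subscription-id>")

poller = compute.virtual_machines.begin_create_or_update(
    resource_group_name="my-ai-rg",        # placeholder resource group
    vm_name="nd-a100-node-0",
    parameters={
        "location": "eastus",
        "hardware_profile": {
            # ND A100 v4 size: eight A100 GPUs per VM. An H100-backed size
            # would be selected the same way once available.
            "vm_size": "Standard_ND96asr_v4",
        },
        "storage_profile": {
            "image_reference": {               # placeholder Linux HPC image
                "publisher": "microsoft-dsvm",
                "offer": "ubuntu-hpc",
                "sku": "2004",
                "version": "latest",
            },
        },
        "os_profile": {
            "computer_name": "nd-a100-node-0",
            "admin_username": "azureuser",
            "admin_password": "<password-or-ssh-key>",
        },
        "network_profile": {
            "network_interfaces": [
                {"id": "/subscriptions/.../networkInterfaces/my-nic"},  # placeholder NIC
            ],
        },
    },
)
vm = poller.result()
print(vm.name, vm.provisioning_state)
```

Scaling out would then be a matter of repeating the request, or using VM scale sets, across however many nodes a job needs.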

Nvidia claims that Microsoft's Azure is the first public cloud to incorporate its Quantum-2 InfiniBand networking switches for its AI-optimized virtual machines. The current Azure instances feature 200Gb/s Quantum InfiniBand and A100 GPUs, but future ones will have 400Gb/s Quantum-2 InfiniBand and the newer H100 GPUs.
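Jobs that span many of those instances typically communicate through Nvidia's NCCL library, which rides on that InfiniBand fabric. Below is a minimal sketch, assuming PyTorch with the NCCL backend and a launcher such as torchrun setting the usual RANK, WORLD_SIZE, and LOCAL_RANK environment variables; it simply runs an all-reduce across every GPU in the job as a sanity check.

```python
# Illustrative sketch: multi-node, multi-GPU communication as used on
# InfiniBand-connected instances. Assumes PyTorch with the NCCL backend and
# a launcher (e.g. torchrun) that sets RANK, WORLD_SIZE and LOCAL_RANK.
import os

import torch
import torch.distributed as dist


def init_distributed() -> int:
    # NCCL discovers and uses the InfiniBand fabric automatically when
    # present; nothing InfiniBand-specific is needed in the job script.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return local_rank


if __name__ == "__main__":
    local_rank = init_distributed()
    # All-reduce a buffer across every GPU in the job as a sanity check.
    tensor = torch.ones(1024 * 1024, device=f"cuda:{local_rank}")
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    if dist.get_rank() == 0:
        print(f"all-reduce completed across {dist.get_world_size()} GPUs")
    dist.destroy_process_group()
```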

The two companies offered no time frame for the project to become operational, and it appears that this so-called AI supercomputer may not actually exist as a single, discrete system.

Reading between the lines, it appears that Microsoft is simply adding to Azure a bunch of AI-optimized instances with Nvidia's latest hardware and software, which customers will string together as required for their specific project.

We asked Nvidia for clarification, and a spokesperson told us: "All new capabilities are in Azure instances, but the setup is such that enterprises will be able to scale those instances all the way up to supercomputing status." So that's an AI cloud service, then.

Nvidia's spokesperson added: "Customers can acquire resources as they normally would with a real supercomputer. For both you have a software layer that reserves resources. Just now it is in the cloud and not on a dedicated supercomputer. The most important thing is scalability. The resource on Azure can be scaled up to supercomputer standards with the same AI software, network capability and compute nodes."

The partnership will also see Nvidia use Azure resources to conduct research into generative AI models, which the company describes as a "rapidly emerging area" of AI in which foundation models such as Megatron-Turing NLG 530B are the basis for new algorithms capable of synthesizing text, computer code, images, and even video.

The model in question was developed by a team from Microsoft and Nvidia, using Nvidia's Megatron-LM transformer model and Microsoft's DeepSpeed deep learning optimization library, the latter of which the pair will also work on optimizing.
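For illustration, here is a minimal, hypothetical sketch of the DeepSpeed side of that pairing: a toy model wrapped with deepspeed.initialize and a ZeRO configuration. The model and settings here are ours for demonstration and bear no relation to the actual Megatron-Turing NLG 530B training setup.

```python
# Minimal, hypothetical sketch: wrapping a model with DeepSpeed. The toy
# model and ZeRO settings are illustrative only and are not the
# Megatron-Turing NLG 530B configuration.
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    # ZeRO partitions optimizer state (and, at stage 2, gradients) across
    # data-parallel ranks so larger models fit in GPU memory.
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# Returns a DeepSpeed engine that handles loss scaling, gradient
# accumulation and the optimizer step.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

inputs = torch.randn(4, 1024, device=engine.device, dtype=torch.half)
loss = engine(inputs).float().pow(2).mean()  # dummy loss for illustration
engine.backward(loss)
engine.step()
```

In practice a script like this would be launched with DeepSpeed's own launcher (or torchrun/MPI) across however many GPU instances the job requires.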

Scott Guthrie, Microsoft EVP for its Cloud + AI Group, said that AI will fuel the next wave of automation in enterprises, enabling organizations to do more with less.

"Our collaboration with Nvidia unlocks the world's most scalable supercomputer platform, which delivers state-of-the-art AI capabilities for every enterprise on Microsoft Azure."

Nvidia's VP of enterprise computing, Manuvir Das, said the partnership aimed to provide researchers and companies with infrastructure and software to exploit AI.

"The breakthrough of foundation models has triggered a tidal wave of research, fostered new startups and enabled new enterprise applications," he said.

To this end, the AI supercomputer/cloud service will support a broad range of applications and services, including Microsoft DeepSpeed and Nvidia's AI Enterprise software suite. The latter is already certified and supported on Azure instances with A100 GPUs, while support for Azure instances with the newer H100 GPUs will be added in a future software release, Nvidia said. ®
