Samsung slaps processing-in-memory chips onto GPUs for first-of-its-kind supercomputer

Korean tech giant claims big performance, energy efficiency gains with memory tech


Samsung has built a claimed first-of-its-kind supercomputer containing AMD datacenter GPUs affixed with its processing-in-memory chips, which the company said can significantly improve the performance and energy efficiency of training large AI models.

The supercomputer, disclosed Tuesday at an industry event in South Korea, comprises 96 AMD Instinct MI100 GPUs, each fitted with a processing-in-memory (PIM) chip, a new kind of memory that cuts down on the data that has to shuttle between processor and DRAM.

Choi Chang-kyu, the head of the AI Research Center at Samsung Electronics Advanced Institute of Technology, reportedly said the cluster trained Google's Text-to-Text Transfer Transformer (T5) language model 2.5 times faster, while using 2.7 times less power, than the same cluster configuration without the PIM chips.

"It is the only one of its kind in the world," Choi said.

Samsung has said its PIM tech has major implications for energy consumption and the environment, claiming it can reduce a cluster's annual energy use by 2,100 gigawatt-hours and, consequently, cut 960,000 tons of carbon emissions.

As always, we should reserve judgement until these claims can be tested and verified independently, but the company said such a reduction in power is equal to the amount of carbon it would take 16 billion urban trees to absorb over a decade.

One big reason why the PIM-powered supercomputer has so much horsepower is that each PIM chip uses high-bandwidth memory (HBM), which the industry is increasingly turning to for handling high-performance computing and AI workloads. Nvidia and AMD have used HBM in datacenter GPUs for multiple generations now, and Intel plans to introduce HBM in a forthcoming variant of server processors branded Xeon Max and a high-end datacenter GPU.

What makes Samsung's HBM-PIM chips different from other companies' HBM implementations is that each memory bank on the PIM chip contains a processing unit. This, according to the South Korean electronics giant, eases the bottleneck of shuttling data between processor and memory by shifting some of the computation into the memory itself.
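Samsung hasn't published the arithmetic behind its claims, but the traffic saving from in-memory compute is easy to sketch: if each memory bank reduces its own slice of data locally, only one partial result per bank has to cross the memory bus, instead of every operand. The toy model below uses made-up illustrative numbers (bank count, element sizes), not Samsung's figures:

```python
# Toy model of data movement for a reduction (e.g. a dot-product step),
# with and without processing-in-memory (PIM).
# All constants are illustrative assumptions, not Samsung's published specs.

ELEMENT_BYTES = 2            # assume fp16 operands
ELEMENTS_PER_BANK = 1_000_000
BANKS = 16                   # assumed banks per HBM stack

def bytes_moved_conventional():
    # Conventional HBM: every operand crosses the bus to the GPU's compute units.
    return BANKS * ELEMENTS_PER_BANK * ELEMENT_BYTES

def bytes_moved_pim():
    # PIM: each bank reduces its slice in place; only one partial
    # result per bank travels across the bus.
    return BANKS * ELEMENT_BYTES

conventional = bytes_moved_conventional()
pim = bytes_moved_pim()
print(f"conventional: {conventional} bytes, PIM: {pim} bytes, "
      f"reduction: {conventional // pim}x")
```

Real workloads mix reductions with operations that can't be pushed into memory, so the achievable saving is far smaller than this best case; the sketch only shows why bank-local compute shrinks bus traffic at all.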

Samsung hopes to spur industry adoption of its PIM chips by developing software that lets organizations use the tech in an integrated programming environment. To do this, it's relying on SYCL, a royalty-free, cross-architecture programming abstraction layer that also underpins Intel's C++ implementation for its oneAPI parallel programming model.

The company has been hyping up PIM for nearly three years now, and one other way it plans to take the tech to market is through what it has called the AXDIMM, short for accelerated DIMM.

We'll know if Samsung ends up making any inroads if PIM starts appearing in new supercomputers set up by research labs, academic institutions, and other organizations over the next few years. ®
