Samsung Expands PIM Ambitions

Article by: Gary Hilson

Samsung will bring HBM-PIM memory technology to more use cases, including power-hungry artificial intelligence and machine learning applications.

Samsung Electronics Co., Ltd. announced another step in making processing-in-memory (PIM) technology more mainstream. The first successful integration of its PIM-enabled High Bandwidth Memory (HBM-PIM) into a commercialized accelerator system is part of a vision for incorporating PIM technologies into other memory types.

Building on its announcement earlier this year of Aquabolt-XL, which incorporates an AI processing function into the company’s HBM2 Aquabolt, Samsung showcased its latest PIM advancements at Hot Chips 33. These included testing HBM-PIM in the Xilinx Virtex UltraScale+ (Alveo) AI accelerator, where it delivered a system performance gain of almost 2.5x and cut energy consumption by more than 60%.

In a roundtable for media and analysts ahead of the conference, Nam Sung Kim, senior vice president of Samsung’s memory business unit, outlined how the company sees Aquabolt-XL with HBM2-PIM being used in machine learning (ML) accelerators and other artificial intelligence (AI) applications. He said that meeting the rapidly growing memory bandwidth demands of emerging ML and AI applications has become more expensive and power-hungry because of various physical and thermal constraints. “Because of the limited number of PCB wires and the chip packages, along with the power and some constraints of those chip packages, it’s getting really hard and expensive to keep increasing the bandwidth.”

Kim said that by bringing the processor closer to memory, PIM can improve the performance and energy efficiency of memory-bound workloads. While PIM isn’t a new idea, a critical barrier to widespread industry adoption has been the changes it requires to the host processor and/or application code. Aquabolt-XL aims to clear those hurdles: it is architected as a drop-in replacement for HBM2 that is fully compatible with JEDEC-compliant HBM2 memory controllers, and Samsung is offering a software solution so that no application source code changes are required if the code is based on TensorFlow, he said. “We have demonstrated that a commercial processor integrated with Aquabolt-XL can run on TensorFlow-based code without any change.”

Aquabolt-XL is aimed at memory-bound workloads with low arithmetic intensity, such as speech recognition and natural language processing, but Kim said it isn’t meant to compete with the machine learning compute engines in AI accelerators. Instead, it’s designed to complement the processors’ compute capability. “We can improve performance and the efficiency of systems for a wide range of workloads.”
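To make “memory-bound” and “low arithmetic intensity” concrete, here is a generic roofline-style sketch. It is not Samsung’s method and says nothing about Aquabolt-XL internals. It assumes fp16 operands, ideal caching (each operand read once, each result written once), and a hypothetical accelerator with 100 TFLOP/s of peak compute and 1 TB/s of memory bandwidth. A matrix-vector product (a common memory-bound kernel) performs roughly one FLOP per byte moved, so bandwidth rather than compute caps its speed; a large matrix-matrix product reuses each operand many times and hits the compute ceiling instead.

```python
# Generic roofline illustration of arithmetic intensity (FLOPs per byte of memory traffic).
# Assumptions (hypothetical, not from the article): fp16 operands, ideal caching,
# and an accelerator with 100 TFLOP/s peak compute and 1 TB/s memory bandwidth.

BYTES_PER_ELEM = 2  # fp16


def gemv_intensity(m: int, n: int) -> float:
    """Matrix-vector product y = A @ x, with A of shape (m, n)."""
    flops = 2 * m * n                                # one multiply and one add per element of A
    bytes_moved = BYTES_PER_ELEM * (m * n + n + m)   # read A and x, write y
    return flops / bytes_moved


def gemm_intensity(n: int) -> float:
    """Square matrix-matrix product C = A @ B, all matrices n x n."""
    flops = 2 * n ** 3
    bytes_moved = BYTES_PER_ELEM * 3 * n * n         # read A and B, write C
    return flops / bytes_moved


def attainable_tflops(intensity: float, peak_tflops: float, bandwidth_tbs: float) -> float:
    """Roofline: throughput is capped by peak compute or by bandwidth x intensity.

    FLOP/byte multiplied by TB/s gives TFLOP/s.
    """
    return min(peak_tflops, intensity * bandwidth_tbs)


PEAK_TFLOPS, BANDWIDTH_TBS = 100.0, 1.0  # hypothetical accelerator
MACHINE_BALANCE = PEAK_TFLOPS / BANDWIDTH_TBS  # FLOP/byte needed to saturate compute

for name, ai in [("GEMV 4096x4096", gemv_intensity(4096, 4096)),
                 ("GEMM 4096", gemm_intensity(4096))]:
    perf = attainable_tflops(ai, PEAK_TFLOPS, BANDWIDTH_TBS)
    bound = "memory-bound" if ai < MACHINE_BALANCE else "compute-bound"
    print(f"{name}: {ai:.1f} FLOP/byte -> {perf:.1f} TFLOP/s ({bound})")
```

Under these assumptions the matrix-vector product uses only about 1% of the hypothetical accelerator’s peak compute, which is the kind of gap PIM targets by performing arithmetic where the data already sits.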

Samsung’s Aquabolt-XL HBM2-PIM is architected to be a drop-in replacement for HBM2 and is fully compatible with JEDEC-compliant HBM2 memory controllers. (Image source: Samsung)

Samsung is also looking beyond HBM for PIM applications. The company is using PIM to bring processing to the DRAM module itself in the form of its acceleration DIMM (AXDIMM), said Kim. This minimizes the movement of large amounts of data between the CPU and DRAM, boosting the energy efficiency of AI accelerator systems. Because the AI engine is built inside the buffer chip, the AXDIMM can process multiple memory ranks (sets of DRAM chips) in parallel instead of accessing just one rank at a time.

The goal is to retain a traditional form factor so that the AXDIMM can be dropped in as a replacement without the need for any system modifications. “We also provide necessary software stack, including Python APIs, libraries, and device drivers, to support the acceleration of applications such as deep learning recommendation models that require both high processing bandwidth and memory capacity.”

Like the HBM-PIM, Samsung’s AXDIMM can be dropped in as a DRAM replacement without the need for any system modifications and has an AI engine built inside the buffer chip to perform parallel processing of multiple memory ranks. (Image source: Samsung)

Kim said the AXDIMM is being tested on customer servers and has been shown to offer approximately twice the performance in AI-based recommendation applications along with a 40% decrease in system-wide energy usage.

More broadly, Samsung aims to create a PIM unit that supports multiple functions across a variety of DRAM types, including HBM3, DIMM-DDR5, GDDR6, and LPDDR5. Kim said LPDDR5-PIM could help bring AI to the edge in a variety of endpoint devices without the need for data center connectivity. Samsung has run simulation tests showing that LPDDR5-PIM can more than double performance while reducing energy usage by over 60% in applications such as voice recognition, translation, and chatbots.

The PIM concept has been around for a while, and what Samsung has developed with Aquabolt does offer additional options in some computing environments, said Bob O’Donnell, president and chief analyst at TECHnalysis Research. For now, however, it appears suited to specific inference workloads that will require some software support. “It’s certainly an intriguing concept. The numbers certainly look impressive in terms of the performance and the power versus traditional solutions.”

The potential need for software changes is the “gotcha,” he said, as it usually is with new chip architectures: what can be done with existing software is often limited. Having to do a lot of recompiling to take advantage of the performance gains could be an Achilles’ heel for Samsung’s HBM2-PIM. “Given those limitations, it certainly seems like they’re getting a lot closer to making it more of a commercial reality.”

O’Donnell said some potential customers are building massive, cutting-edge AI and ML models and already do a great deal of custom work because they need to get every bit of performance and capability out of the system, so they will be willing to do the custom software work. “It’s never going to become mainstream, not in its current kind of shape and form.”

Hyperscalers could be likely customers for Aquabolt-XL, said O’Donnell, since they tend to combine different processors, storage media, and memory in different ways to address specific workloads and might spin up a small number of them. “Once this is commercially viable, we’ll see how many people actually end up using it.” In the short term, he said, academic institutions and scientific researchers tend to be the ones building the really big AI models that would take advantage of HBM-PIM. “Conceptually, it points to the way to new types of computing models for more general-purpose computing. We’re a long way off from that, but this is how this stuff starts.”

This article was originally published on EE Times.

Gary Hilson is a general contributing editor with a focus on memory and flash technologies for EE Times.

