In-Memory Computing, AI Draws Research Interest

Article By : Gary Hilson

As interest in AI and in-memory computing significantly increases, ReRAM could be the key to unlocking the ability to imitate the human brain.

As interest in artificial intelligence (AI) and in-memory computing significantly increases, resistive random-access memory (ReRAM) could be the key to unlocking their ability to imitate the human brain—yet challenges remain.

Last year’s IEDM corralled many recent research papers on advancing a wide array of memory types, both emerging and incumbent. Not surprisingly, a great deal of them were dedicated to how memory could improve in-memory computing, AI and machine learning (ML), and even mimic the human brain.

ReRAM has been synonymous with neuromorphic computing, something Weebit Nano has expressed interest in pursuing for its technology, although it’s taking a backseat to the company’s other business priorities.

The University of Michigan, meanwhile, has been developing various ReRAM prototypes going back at least a decade. ReRAM offers high-density non-volatile storage and the potential for efficient in-memory computing, while ReRAM-enabled accelerators can solve the von Neumann bottleneck, explained Wei D. Lu, a professor in the electrical engineering and computer science department at the University of Michigan. His IEDM presentation outlined some of the devices as well as how parallelism could address both increasingly large AI models and the power, latency, and cost requirements of edge computing applications.

CPUs that exploit parallelism still encounter memory bottlenecks. While GPUs allow for faster memory access, Lu said a new computing architecture that fundamentally improves throughput and compute efficiency is needed. A memory processing unit (MPU) could dramatically increase parallelism and co-locate memory with logic, which allows for device-level computing and better facilitates in-memory computing.

An MPU could dramatically increase parallelism and co-locate memory with logic, which allows for device-level computing and better facilitates in-memory computing. (Image courtesy of University of Michigan)

ReRAM’s potential for in-memory computing lies in using a ReRAM array as a computing fabric, Lu said, as it can natively perform learning and inference functions. ReRAM also supports bidirectional data flow, while larger neural networks could be implemented using a modular system with a tiled MPU architecture to achieve higher throughput.
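To make the "array as computing fabric" idea concrete, here is a minimal sketch of an idealized crossbar, not Lu's MPU design. It assumes each cell behaves as a fixed conductance, so a cell passes a current equal to conductance times voltage and the currents on a shared wire simply add. The function names are hypothetical. Driving the rows yields a matrix-vector product; driving the columns of the same array yields the transpose product, which is one way to read the bidirectional data flow mentioned above.

```python
# Illustrative sketch only: an idealized crossbar with hypothetical names.

def crossbar_forward(conductances, row_voltages):
    """Column currents when voltages drive the rows.

    Each cell passes I = G * V (Ohm's law); currents on a shared
    column wire add up (Kirchhoff's current law).
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    return [
        sum(conductances[r][c] * row_voltages[r] for r in range(n_rows))
        for c in range(n_cols)
    ]


def crossbar_backward(conductances, col_voltages):
    """Row currents when voltages drive the columns instead.

    Same array, opposite direction: this is the transpose product.
    """
    return [sum(g * v for g, v in zip(row, col_voltages)) for row in conductances]


# 2 rows x 3 columns of conductances, arbitrary units
G = [[1.0, 2.0, 0.5],
     [0.0, 1.0, 2.0]]

forward = crossbar_forward(G, [1.0, 2.0])          # [1.0, 4.0, 4.5]
backward = crossbar_backward(G, [1.0, 0.0, 2.0])   # [2.0, 4.0]
```

In hardware the sums happen in the analog domain in a single read step, which is where the parallelism comes from; the loops here only emulate that arithmetic.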

Addressing ReRAM Challenges

There are several key challenges with ReRAM devices, however. For one thing, high-precision analog-to-digital converter-based readout circuits pose a significant challenge, while performance can suffer from device non-idealities including cell-to-cell variations. A third challenge is that the nonlinear and asymmetric conductance update observed in ReRAM devices can severely degrade training accuracy, Lu said.

Potential solutions to the first problem include multi-range quantization and binary neural networks. Architecture-aware training can address the performance issues caused by device non-idealities, as can implementing binary weights with a two-transistor, two-resistor (2T2R) architecture, which also helps with the third challenge, Lu said. Mixed-precision training could also address the second and third challenges because it offers a significant performance and computational boost by training large neural networks in lower-precision formats.
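A rough sketch of why differential binary weights tolerate cell-to-cell variation follows. It assumes a +1 or -1 weight is stored as a pair of cells, one in a high-conductance state and one in a low-conductance state, and that the sign is read by comparing the two. The conductance levels, the 20 percent spread, and the function names are all illustrative assumptions, not figures from Lu's presentation.

```python
import random

# Hypothetical conductance levels in arbitrary units.
G_ON, G_OFF = 100.0, 10.0


def program_pair(weight, rng, spread=0.2):
    """Store a +1/-1 weight as (g_pos, g_neg), with random cell-to-cell variation."""
    targets = (G_ON, G_OFF) if weight > 0 else (G_OFF, G_ON)
    return tuple(g * (1.0 + rng.uniform(-spread, spread)) for g in targets)


def read_sign(pair):
    """Differential read: only which cell conducts more matters."""
    g_pos, g_neg = pair
    return 1 if g_pos > g_neg else -1


rng = random.Random(0)
weights = [1, -1, -1, 1, 1, -1]
pairs = [program_pair(w, rng) for w in weights]
recovered = [read_sign(p) for p in pairs]   # same signs as `weights`
```

Because the read depends only on the comparison, any variation smaller than the gap between the two states leaves the stored weight intact, and no precise analog conductance update is needed.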


Phase-change memory (PCM) is also a candidate for improving in-memory computing. IBM Research Europe has been exploring the use of PCM to address the temperature sensitivity of analog in-memory computing. As outlined by Irem Boybat, a member of the in-memory computing group at IBM Research, there is a computational efficiency problem as neural networks for AI flourish. Deep learning is computationally intensive, and if the ongoing “AI revolution” is to be sustainable, it’s essential to embrace disruptive computing paradigms.

“The language models are growing exponentially in size,” Boybat said. This involves transporting vast amounts of data from memory to the processing unit, which is expensive and leaves a large carbon footprint, according to Boybat.

Analog in-memory computing blurs the line between memory and processing by performing certain computation tasks within the memory itself, which it achieves by exploiting the physical attributes of memory devices. PCM is a promising candidate for in-memory computing because it can store information in a very dense manner and consumes negligible static power, Boybat said. IBM Research has demonstrated two PCM-based in-memory computing chips in the past year.

Temperature sensitivity continues to be an area of research for the team, with mushroom-type PCM being used to study retention. Experiments with a resistive heater and temperature sensor placed under the chip have shown that no retention issues are expected in the 30 to 80 degrees Celsius range. IBM Research’s experiments investigated the impact of temperature variations and drift on multi-level PCM as employed for in-memory computing.

Supported by the IBM Research AI Hardware Center, the research team found that although PCM exhibits a conductance-dependent temperature sensitivity, normalized distributions of the conductance states remain relatively constant across the applied time-temperature profile. The researchers developed a reliable statistical model to capture the impact of the temperature on drift and conductance and verified it against PCM conductance measurements.

Using more than one million PCM devices, they demonstrated it was possible to achieve and maintain high inference accuracies for a variety of networks under ambient temperature variations ranging from 33 to 80 degrees Celsius using a simple compensation scheme.
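The article does not describe the compensation scheme itself. As a hedged illustration of what a simple scheme could look like, the sketch below assumes the power-law drift model commonly used for PCM, G(t) = G0 * (t / t0) ** -nu, and a single global scale factor estimated from a calibration read. The drift exponent, the function names, and the use of one shared factor are assumptions, not IBM's published method.

```python
def drifted(g0, t, nu=0.05, t0=1.0):
    """Assumed power-law drift: G(t) = G0 * (t / t0) ** -nu."""
    return g0 * (t / t0) ** -nu


def mvm(conductances, inputs):
    """Matrix-vector product as an in-memory array would compute it."""
    return [sum(g * x for g, x in zip(row, inputs)) for row in conductances]


def compensation_factor(reference_now, reference_initial):
    """Ratio of summed reference conductances at programming time versus now."""
    return sum(reference_initial) / sum(reference_now)


G0 = [[10.0, 20.0], [5.0, 15.0]]   # programmed conductances, arbitrary units
x = [1.0, 0.5]
t = 1e4                            # elapsed time in units of t0

G_t = [[drifted(g, t) for g in row] for row in G0]
ideal = mvm(G0, x)                 # result right after programming
raw = mvm(G_t, x)                  # result after drift, uniformly too small

# Calibration read: here every device doubles as a reference device.
flat0 = [g for row in G0 for g in row]
flat_t = [g for row in G_t for g in row]
alpha = compensation_factor(flat_t, flat0)
corrected = [alpha * y for y in raw]
```

With one shared drift exponent the correction is exact. Real devices show conductance-dependent and temperature-dependent behavior, so a single factor would only recover the average; the finding that normalized conductance distributions stay relatively constant is what makes this kind of global rescaling plausible.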

Mimicking the Human Brain

Another popular area of research that goes beyond in-memory computing is creating neural networks that are more in line with the human brain. Work around ReRAM-based brain-inspired computation (BIC), as presented by Ming Liu on behalf of many researchers at China’s Institute of Microelectronics of the Chinese Academy of Sciences and Fudan University, is being driven by the unprecedented pace of AI computing usage, which is doubling every three months, Liu said.

BIC will circumvent the von Neumann bottleneck in the mid and longer terms. (Image courtesy of China’s Institute of Microelectronics of the Chinese Academy of Sciences)

Increased use of AI computing makes brain-inspired hardware critical to sustain development. While new memory technologies can enhance existing hierarchies in the near term, BIC will circumvent the von Neumann bottleneck in the mid to longer terms; BIC encompasses computing in memory and neuromorphic computing.

Understanding BIC requires distinguishing between two kinds of AI algorithms: the neural networks of computer science and those of biology and neuroscience. An artificial neural network (ANN) processes continuous signals in the space domain, while a spiking neural network (SNN) is more bio-plausible in that it mimics how the brain works. ReRAM provides an ideal platform for BIC because of its rich switching dynamics, which could support large-scale integration, low-power periphery, and application-specific architectures for building out BIC chips and systems, Liu said.
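The distinction can be illustrated with two toy neurons. This is a generic textbook-style sketch rather than anything from Liu's presentation: the ANN neuron maps inputs to one continuous value, while the leaky integrate-and-fire neuron (a common SNN building block) accumulates input over time and emits discrete spikes. The threshold and leak values are arbitrary assumptions.

```python
def ann_neuron(inputs, weights):
    """ANN-style neuron: weighted sum followed by ReLU, one continuous output."""
    return max(0.0, sum(w * x for w, x in zip(weights, inputs)))


def lif_neuron(input_current, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: returns a 0/1 spike per time step."""
    v = 0.0                      # membrane potential
    spikes = []
    for i in input_current:
        v = leak * v + i         # decay the old potential, add new input
        if v >= threshold:
            spikes.append(1)     # fire, then reset
            v = 0.0
        else:
            spikes.append(0)
    return spikes


activation = ann_neuron([1.0, 2.0], [0.5, 0.25])          # 1.0
spikes = lif_neuron([0.5, 0.5, 0.5, 0.0, 0.0, 1.2])       # [0, 0, 1, 0, 0, 1]
```

The spiking neuron does work only when an input event arrives or a spike fires, which is the event-driven property that makes SNN hardware attractive for low-power operation.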

Soon, integrated SNN multicores will be possible, following on more than a decade of research at many other institutions beginning with emulating behaviours, Liu said. The computing density and energy efficiency of ReRAM SNNs offer great potential for high performance, and chips that combine event-driven representation and integrated multicores that can perform at lower power will be a reality. However, there’s still a lot of exploration at the architectural level to be done to develop BIC chips for real-life applications.

The characteristics of ReRAM make it a popular candidate for AI and applications where the goal is to mimic the human brain. But IEDM gave as much attention to magnetoresistive random-access memory (MRAM), with a full day of sessions and two IEEE Magnetics Society events at IEDM 2021 in recognition of the role the relationship between the microelectronics and magnetism communities plays in driving advancements forward.

On the ferroelectric random-access memory (FRAM) front, CEA-Leti announced what it claims is the world’s first demonstration of 16-kbit arrays at the 130nm node, bringing it closer to commercialization. The ultralow-power, fast, high-endurance, and CMOS-compatible back-end-of-line (BEOL) FRAM uses a new HfO2-based ferroelectric material that is also more environmentally friendly than lead zirconate titanate (PZT) because it is lead-free.

Potential use cases include embedded applications such as internet of things (IoT) devices and wearables. The work was supported by the EU’s 3eFERRO Consortium project that was designed to produce new ferroelectric material that makes FRAM a competitive non-volatile memory candidate for IoT applications.

And even though a lot of IEDM research papers gravitated toward using emerging memories in bleeding-edge applications such as AI, neuromorphic computing, and in-memory computing, advancing incumbent memories such as dynamic random-access memory (DRAM) remains a strong focus of many researchers.

Intel presented numerous papers at IEDM that covered improvements to scaling and efforts to bring new capabilities to silicon. Intel’s Components Research outlined efforts around the design, process, and assembly challenges of hybrid bonding interconnects, presenting a vision for a more than 10x interconnect density improvement in packaging. This follows Intel’s announcement in July regarding the introduction of Foveros Direct, which enables sub-10-micron bump pitches, providing an order of magnitude increase in the interconnect density for 3D stacking.

Other papers looked at how Intel is addressing the anticipated post-FinFET era with an approach to stacking multiple CMOS transistors that aims to achieve a 30 to 50 percent logic scaling improvement, continuing the advancement of Moore’s Law by fitting more transistors per square millimeter. Another effort to carry Moore’s Law into the coming angstrom era is research demonstrating how novel materials just a few atoms thick can be used to make transistors that overcome the limitations of conventional silicon channels, enabling millions more transistors per die area.

Intel also outlined research focused on bringing new capabilities to silicon by integrating GaN-based power switches with silicon-based CMOS on a 300mm wafer, which will enable low-loss, high-speed power delivery to CPUs while simultaneously reducing motherboard components and space.

This article was originally published on EE Times.

Gary Hilson is a general contributing editor with a focus on memory and flash technologies for EE Times.
