Combined with ongoing work on a training algorithm for analog neural networks, the result could be a huge reduction in power consumption.
Rain Neuromorphics has taped out a demonstration chip for its brain-inspired analog architecture that employs a 3D array of randomly connected memristors to compute neural network training and inference at extremely low power.
Switching to entirely analog hardware for AI computation could allow a massive reduction in the power consumed by AI workloads. While some commercial chips currently use analog processor-in-memory techniques, they require digital conversion between network layers, consuming significant power. The limitations of current analog devices also mean they can’t be used for training AI models since they are incompatible with back-propagation, the algorithm widely used for AI training. Rain’s aim is to build a complete analog chip, solving these issues with a combination of new hardware and a new training algorithm.
The company has changed direction on hardware over the last year. Analog computing chips use arrays of memristive elements, with commercial chips using memory cells such as flash. Rain previously used randomly deposited resistive nanowires, but has opted for resistive RAM (ReRAM) as the memristive element combined with 3D manufacturing techniques borrowed from NAND flash processes. The resulting configuration is based entirely on lithography.
The chip also uses technology developed for vertical bit lines in flash arrays, allowing ReRAM to scale to many-layered structures of memory (current flash arrays can have more than 100 layers).
CMOS layers at the bottom of the chip represent neurons. Columns built using the vertical bit lines borrowed from flash arrays represent axons. The columns are then coated in a memristive material. Randomly configured dendrites link the axons, connecting neurons to each other (the point where a dendrite in any layer touches a column can be thought of as a synapse).
“It’s very brain-like, honestly, and we are quite proud of that,” Rain Neuromorphics CTO Jack Kendall told EE Times. “It’s the same structure as axons and dendrites in the brain. We don’t think you can get much better than this.”
Why are the dendrite connections randomly dispersed throughout the chip?
“The reason randomness is important is if you have a very large neural network, you want to maintain a certain level of sparsity,” Kendall said. “But which neurons do you pick? If you pick in a controlled way, in a lattice or a regular pattern, you’re introducing a huge amount of bias or assumption into how you think that information should be processed, but that’s contrary to the entire goal of learning.
“The goal of learning should be to discover that pattern.”
“Random” isn’t quite correct. In fact, “randomly” distributed dendrites are the same in each chip since they are defined by the lithography mask used to make each layer. Some level of structure may also be introduced. One strategy is to introduce more short- than long-range connections, similar to connections in the brain. Rain’s algorithm will eventually determine the best connection patterns, but for now the company is focused on two approaches.
One uses known sparsity patterns, taking optimally pruned networks and replicating their patterns in the lithographic masks. Another is to start with biological motifs, though this is harder. One reason is that large sparse matrix multiplication (needed to assess the benefits of these biological motifs) is not suited to current AI accelerator hardware. Motifs including modularity and small-world-ness are on Rain’s to-do list.
“We know that these are all properties of biological brains,” Kendall said. “A lot of people have hypotheses about how they’re implemented in brains and what their function is. We can immediately begin testing a bunch of different patterns and we’ll be able to see what works and what doesn’t.”
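The idea of a sparse connection pattern that looks random but is identical on every chip, with more short- than long-range links, can be illustrated with a small sketch. It is not Rain’s mask-generation method. The function name, the one-dimensional neuron layout, the exponential fall-off and every parameter value are assumptions chosen only for illustration.

```python
import numpy as np

def make_connection_mask(n_neurons=200, peak_prob=0.5, length_scale=10.0, seed=0):
    """Return a symmetric boolean mask; True means two neurons are linked.

    Hypothetical illustration only: neurons sit on a 1-D line and the
    chance of a link decays exponentially with the distance between them.
    """
    # A fixed seed gives the same "random" pattern on every call, loosely
    # analogous to a pattern frozen into a lithography mask.
    rng = np.random.default_rng(seed)

    idx = np.arange(n_neurons)
    dist = np.abs(idx[:, None] - idx[None, :])

    # Short-range links are likely, long-range links are rare.
    prob = peak_prob * np.exp(-dist / length_scale)

    # Sample each pair once (upper triangle, no self-links), then mirror.
    draws = rng.random((n_neurons, n_neurons)) < prob
    upper = np.triu(draws, k=1)
    return upper | upper.T

mask = make_connection_mask()
print("fraction of possible links present:", mask.mean())
```

Changing `length_scale` shifts the balance between short- and long-range links, and a different `seed` stands in for a different mask set.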
Rain’s roadmap calls for developing chips with different sparsity patterns. That approach is similar to the brain, in which different parts process different tasks. Chips with different sparsity patterns would be tailored for different AI workloads. The company also plans to advance work on nanowires since the technology allows depositing high-density memory devices in a single step.
Commercial processor-in-memory chips use memory technologies like flash or DRAM. Rain’s use of ReRAM is based on its endurance, power consumption, write speed and retention properties.
“We have hit a wall with flash in terms of performance… we can stack more layers, [but] that’s not really the issue for training neural networks,” said Kendall. “More important is the limited endurance at a fundamental level, and the speed required to switch the device, and to a lesser extent the power consumed during the read operation. Those three are the bottlenecks for implementing training.”
Kendall noted that ReRAM also can scale down further at advanced process nodes, giving it a density advantage, and that it has the potential to improve endurance beyond that of flash.
The chip taped out this year has demonstrated Rain’s architecture in 180-nm CMOS, with 10,000 neurons. The team has performed memristor weight updates (training) and matrix multiplication (inference). The company’s training algorithm was more than three times faster than on a SONOS flash array, which the company claims is state-of-the-art accuracy.
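For readers unfamiliar with how a resistive array performs the matrix multiplication mentioned above, the textbook idealization is sketched below. Each cell passes a current equal to its conductance times the applied voltage (Ohm’s law), and currents sharing an output line add up (Kirchhoff’s current law). This is a generic model, not Rain’s circuit. The function name and the example values are hypothetical, and real devices add wire resistance, noise and nonlinearity that are ignored here.

```python
import numpy as np

def crossbar_output_currents(conductances, input_voltages):
    """Idealized resistive array (illustrative assumption, not Rain's design).

    conductances:   shape (n_inputs, n_outputs), in siemens
    input_voltages: shape (n_inputs,), in volts
    returns:        shape (n_outputs,), summed current per output line, in amps
    """
    conductances = np.asarray(conductances, dtype=float)
    input_voltages = np.asarray(input_voltages, dtype=float)

    # Ohm's law per cell: I = G * V, with each input voltage applied along a row.
    cell_currents = conductances * input_voltages[:, None]

    # Kirchhoff's current law: currents on a shared output line (column) add.
    return cell_currents.sum(axis=0)

# Hypothetical 2x2 array with microsiemens-scale conductances.
G = np.array([[1e-6, 2e-6],
              [3e-6, 4e-6]])
V = np.array([0.1, 0.2])
print(crossbar_output_currents(G, V))
```

The result equals the matrix-vector product `G.T @ V`, which is why the stored conductances can play the role of neural network weights.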
CEO Gordon Wilson said Rain’s technology has the potential to scale to tens of millions of neurons on a single chip by moving to 40-nm process technology, for example.
Rain’s new hardware will go hand-in-hand with the company’s algorithmic work. Last year, the company demonstrated that complete analog neural networks were indeed possible. The work is based on a technique called equilibrium propagation, which for the first time allowed training on analog processor-in-memory chips.
The initial version of equilibrium propagation works with energy-based models only. Those models are often compared to GANs (generative adversarial networks): they generate more varied images than GANs do and avoid mode collapse. However, they can tax computing resources because inference is iterative, requiring multiple loops through a network before an energy minimum is established. By contrast, other network types require just one forward pass for inference. With analog hardware, this iterative process happens very quickly, eliminating the computational cost.
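A toy example shows why settling into an energy minimum is costly on digital hardware. It uses a simple quadratic energy and plain gradient descent. It is not equilibrium propagation and not Rain’s model, and the function names, the matrix, the step size and the tolerance are all assumptions made for illustration.

```python
import numpy as np

def energy(state, coupling, drive):
    """Quadratic toy energy E(s) = 0.5 * s^T A s - b^T s (illustrative only)."""
    return 0.5 * state @ coupling @ state - drive @ state

def settle(coupling, drive, step=0.1, tol=1e-8, max_iters=10_000):
    """Descend the energy gradient until the state stops changing.

    Returns the settled state and the number of update steps taken.
    Assumes `coupling` is symmetric positive definite so a unique minimum exists.
    """
    state = np.zeros_like(drive, dtype=float)
    for n_steps in range(max_iters):
        grad = coupling @ state - drive   # dE/ds for the quadratic energy
        if np.linalg.norm(grad) < tol:    # at (numerically) the minimum
            return state, n_steps
        state = state - step * grad
    return state, max_iters

# Hypothetical two-unit system.
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
b = np.array([1.0, 0.0])

settled, n_steps = settle(A, b)
print("settled state:", settled)
print("update steps needed:", n_steps)
print("energy at minimum:", energy(settled, A, b))
```

Every update step is another pass over the whole system, whereas a conventional feed-forward layer needs a single matrix product per input. In an analog circuit, the settling is carried out by the circuit itself rather than by a software loop.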
Recent work has also demonstrated that an equilibrium propagation training algorithm can be used to train other network types as well as energy-based models.
“This all leads to a massive efficiency gain in our hardware, and we see that efficiency gain across all implementations and models that we are going to deliver,” said Wilson. “There are unique and special consequences for certain models like GANs, but we are seeing these efficiency gains everywhere from recommendation models and vision all the way up through generative models and more in the larger scale models.”
The company said its first-generation chips will support recommendation models, vision and sound models, as well as data center and “heavy edge” applications such as robotics. Rain will reduce power consumption associated with the scaling of AI workloads, Wilson added. Overall energy savings result from speedier computation as well as a reduction in overall power consumption.
The demo chip combines a ten-fold reduction in power footprint with a cut in inference time from hundreds of microseconds to hundreds of nanoseconds. Together, the result could be a thousand-fold reduction in energy consumption compared to GPU solutions, the company claims.
While Rain’s architecture has been demonstrated in silicon, “We still have a fair amount of engineering work ahead,” Wilson acknowledged. “But we’ve eliminated the scientific risk in terms of the question of whether it’s possible.”
The first Rain chips will deliver 125 million INT8 parameters for vision, speech, natural language processing and recommendation workloads, consuming less than 50 W. The company expects samples to be available in 2024, with silicon ready for commercial shipment in 2025.
This article was originally published on EE Times.
Sally Ward-Foxton covers AI technology and related issues for EETimes.com and all aspects of the European industry for EE Times Europe magazine. Sally has spent more than 15 years writing about the electronics industry from London, UK. She has written for Electronic Design, ECN, Electronic Specifier: Design, Components in Electronics, and many more. She holds a Master’s degree in Electrical and Electronic Engineering from the University of Cambridge.