Scaling and Low Power Requirements for Neuromorphic Computing

Article by: Chris Eliasmith and Terry Stewart

Practical neuromorphic hardware will require major reductions in power consumption...

A decade hence, our interactions with our digital environment will be much different from today’s. Agents that track the context of a conversation and touchless interfaces to devices (relatively rare today beyond smart speakers) could be ubiquitous ten years from now. By then, we suspect private tracking of complex conversations through sophisticated touchless interfaces will be a standard function of the intelligent agents in our homes and offices.

The popular press frequently hypes such a future. From an engineering standpoint, the immediate goal is making our current AI technologies efficient enough to transform hype into reality. Many current technologies that seek to address these challenges require enormous amounts of power, running multi-billion-parameter models on power-hungry GPUs.

Such resource use is simply not sustainable if we want billions of people to benefit from these technologies.

We call our proposed architecture the Semantic Pointer Architecture Unified Network, or Spaun. The more sophisticated the model we build, the more computationally demanding it becomes. One way to quantify this is to consider how much power would be used if we scaled these systems to the size of the human brain.

Our large-scale brain model, Spaun, has about 6 million neurons and 20 billion effective connections. This is much larger than typical neural networks, and bigger even than the largest such networks, like Google’s recent Meena architecture (with 2.6 billion parameters). Spaun is large because it performs many different tasks, including object recognition, ordered list memorization, taking intelligence tests, and following instructions. Unlike most artificial network models, Spaun can switch tasks on the fly. It does so by using principles derived from the brain: all of its parts reflect the processing of specific brain areas, and its neural and high-level behaviour mirror human and animal performance of the same tasks.
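
For readers who want a concrete sense of how such models are specified, here is a minimal sketch using the open-source Nengo library, the toolchain in which Spaun is built. The two-ensemble network below is purely illustrative; Spaun consists of millions of neurons organized in this same ensemble-and-connection style.

    import numpy as np
    import nengo

    # A minimal spiking network: one ensemble represents a sine wave,
    # and a second ensemble computes its square from spiking activity.
    with nengo.Network(label="minimal sketch") as model:
        stim = nengo.Node(lambda t: np.sin(2 * np.pi * t))
        a = nengo.Ensemble(n_neurons=100, dimensions=1)
        b = nengo.Ensemble(n_neurons=100, dimensions=1)
        nengo.Connection(stim, a)
        nengo.Connection(a, b, function=lambda x: x ** 2)
        probe = nengo.Probe(b, synapse=0.01)  # filtered spike output

    with nengo.Simulator(model) as sim:
        sim.run(1.0)
    print(sim.data[probe][-1])  # decoded estimate of sin(2*pi*t)**2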

Spaun performing an instruction-following task. The task stages are listed on the bottom. The thought bubbles show spiking activity and interpretations of that activity, demonstrating what is being represented. The colored patches show time-averaged activity of neurons in particular brain areas during the task.

Our main goal is to include more brain areas and more sophisticated behaviour. To do this, we need even more neurons and connections.

Brain Power
This raises a significant energy problem. The human brain has about 100 billion neurons and hundreds of trillions of connections. Based on Spaun’s power usage, we estimate that a network the size of a single human brain would take half a gigawatt of power using today’s GPUs. That’s equivalent to the power produced by an average nuclear power plant. Obviously, we can’t continue to run Spaun on GPUs and scale up its size and performance.
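
The arithmetic behind that estimate is easy to sketch. In the snippet below, Spaun’s power draw is an assumed placeholder rather than a measured figure; the point is the scaling method, which multiplies by the ratio of connection counts.

    # Back-of-envelope scaling from Spaun to brain scale.
    spaun_connections = 20e9    # Spaun: ~20 billion effective connections
    brain_connections = 500e12  # brain: hundreds of trillions (assume 500T)

    scale = brain_connections / spaun_connections  # ~25,000x

    # Assumed GPU power draw for running Spaun (hypothetical figure).
    spaun_power_watts = 20e3  # ~20 kW

    print(f"scale factor: {scale:,.0f}x")
    print(f"brain-scale power: {spaun_power_watts * scale / 1e9:.2f} GW")
    # -> roughly 0.5 GW, on the order of a nuclear plant's output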

Fortunately, modern power-sensitive optimization methods, along with neuromorphic hardware, demonstrate that it is possible to greatly reduce power consumption. We have had the opportunity to test several neuromorphic chips. While all share design elements inspired by the brain, including processing with spikes, massive parallelism, and a focus on communication (on and between chips), each also has its unique characteristics.

Intel’s Loihi chip, for example, is digital and asynchronous, and has been assembled into very large systems (100 million neurons). Braindrop, developed by Stanford University, is an analog/digital hybrid chip; in certain circumstances, analog computation promises much lower power consumption than digital. Braindrop demonstrates a 100x reduction in power consumption compared to Loihi for some tasks, although the currently available chips are quite small in scale (a few thousand neurons).

We’ve also worked with early designs of the SpiNNaker 2 chip developed at TU Dresden. Our experience with SpiNNaker 2 leads us to expect performance comparable to the Loihi chip, although final silicon is not yet available for direct testing. This chip supports both spiking and non-spiking networks, perhaps providing the best of both worlds.

What kind of performance can we expect from such chips? Using today’s technology, we have been able to show 100x power reductions compared to GPUs using neuromorphic devices. In particular, we have demonstrated audio, motor control, and visual processing on Intel’s Loihi chip, which mimics the architecture of the brain using electrical pulses known as spikes, whose timing modulates the strength of the connections between neurons.

We also achieved up to 100x power efficiency improvements across those tasks (admittedly, most of these networks are modest by today’s standards, requiring at most 240,000 neurons). We have also used other optimization techniques to decrease the memory demands of some of these networks by a further 20x, which should provide additional energy savings.
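
As a concrete illustration of the workflow, the open-source nengo-loihi backend lets a model built for ordinary CPU simulation be retargeted at Loihi with essentially no changes. The sketch below assumes the package is installed; with no Loihi board attached, it runs a software emulation of the chip.

    import nengo
    import nengo_loihi

    # Build a tiny spiking model, then run it through the Loihi backend.
    with nengo.Network() as model:
        stim = nengo.Node(0.5)  # constant input
        ens = nengo.Ensemble(n_neurons=100, dimensions=1)
        nengo.Connection(stim, ens)
        probe = nengo.Probe(ens, synapse=0.01)

    with nengo_loihi.Simulator(model) as sim:  # emulates Loihi if no board
        sim.run(0.5)
    print(sim.data[probe][-1])  # decoded output should settle near 0.5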

This suggests that we can approach a thousand-fold improvement using current techniques (although we have not yet measured systems that combine these methods). Such improvements would allow current large models to run on mobile devices, extremely large models to run efficiently on cloud computing infrastructure, and modest models to run for years at a time on a battery.

Sparsification
Development of practical neuromorphic hardware for efficient AI computation is ongoing. Based on our experience, several key research areas must be addressed. First, scaling to larger models requires hardware that can seamlessly transition from one to many chips. Techniques for “tiling” chips must be pursued for systems with billions of neurons.
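
A toy calculation shows why neuron placement matters when tiling. The connectivity pattern and all numbers below are hypothetical, meant only to show that assigning strongly connected neurons to the same chip can cut inter-chip traffic dramatically.

    import numpy as np

    rng = np.random.default_rng(1)
    n_neurons, n_chips = 4000, 4

    # Locality-biased connectivity: neurons mostly talk to near neighbours.
    pre = rng.integers(0, n_neurons, size=50_000)
    post = np.clip(pre + rng.integers(-200, 200, size=pre.size),
                   0, n_neurons - 1)

    def cross_chip_fraction(assignment):
        # Fraction of messages whose source and target are on different chips.
        return np.mean(assignment[pre] != assignment[post])

    random_placement = rng.integers(0, n_chips, size=n_neurons)
    blocked_placement = np.arange(n_neurons) * n_chips // n_neurons

    print(f"random placement:  {cross_chip_fraction(random_placement):.0%}")
    print(f"blocked placement: {cross_chip_fraction(blocked_placement):.0%}")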

Temporal sparsification, the process of reducing the number and size of messages sent over a period of time while ensuring that overall accuracy is unaffected, is critical to achieving energy efficiency. For “pure” neuromorphic designs, communication consists of sending essentially one bit (a single on or off “spike”) at a time. Our recent explorations with multi-bit spikes have demonstrated that optimal designs can include a combination of single- and multi-bit messages.
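
One simple form of temporal sparsification, offered here as a generic illustration rather than the specific scheme used on any of the chips above, is delta encoding: transmit a value only when it has drifted meaningfully from the last value sent.

    import numpy as np

    def delta_encode(signal, threshold):
        """Emit (time, value) messages only when the signal moves more
        than `threshold` from the last transmitted value; the threshold
        trades accuracy against message traffic."""
        last_sent = signal[0]
        messages = [(0, signal[0])]
        for t, x in enumerate(signal[1:], start=1):
            if abs(x - last_sent) > threshold:
                last_sent = x
                messages.append((t, x))
        return messages

    t = np.linspace(0, 1, 1000)
    msgs = delta_encode(np.sin(2 * np.pi * t), threshold=0.05)
    print(f"{len(msgs)} messages instead of {len(t)} samples "
          f"({100 * len(msgs) / len(t):.0f}% of the traffic)")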

Reducing the memory footprint of models is also critical, as is providing sufficiently fast and low-power memory access, possibly with new memory technology such as MRAM.
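
As one illustrative footprint-reduction technique (not the specific 20x optimization mentioned above), quantizing 32-bit floating-point weights to 8-bit integers cuts memory fourfold at a small, measurable cost in precision.

    import numpy as np

    rng = np.random.default_rng(0)
    weights = rng.normal(size=(1000, 1000)).astype(np.float32)

    # Map the weight range onto signed 8-bit integers.
    scale = np.abs(weights).max() / 127.0
    q_weights = np.round(weights / scale).astype(np.int8)

    print(f"float32: {weights.nbytes / 1e6:.1f} MB")
    print(f"int8:    {q_weights.nbytes / 1e6:.1f} MB")
    print(f"max error: {np.abs(q_weights * scale - weights).max():.4f}")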

Finally, our experience with analog chips has convinced us of their power efficiency for these kinds of applications, although commercial scaling remains a significant challenge.

While much work remains to be done, we are optimistic about future development if we continue to learn from the techniques embodied in neuromorphic hardware, pushing designs toward the power efficiency evident in nature.

For example, the Spaun model includes several techniques that have been shown to work on neuromorphic hardware, while solving some challenging AI problems. Several of those are being explored to drive commercial applications of neuromorphic computing.

Other techniques include sparse connectivity and messaging, combined analog and digital computation, and massive, expandable parallelism.

–Chris Eliasmith is a systems design engineer specializing in applied brain research at the University of Waterloo. Terry Stewart is an associate research officer with the National Research Council of Canada.
