Intel presents quantitative benchmark results for its Loihi neuromorphic chip for the first time. It intends to lead the way towards formal industry-wide neuromorphic benchmarks⋯
At Intel Labs Day today, Intel presented a summary of performance results comparing its neuromorphic chip, Loihi, to classical computing and mainstream deep learning accelerators for the first time. The results show that while Loihi may not offer much advantage over other approaches for feed-forward neural networks, big latency and power efficiency gains can be achieved for other workloads, such as recurrent neural networks. Intel hopes this first set of quantitative results will pave the way towards developing a formal neuromorphic benchmark for all types of neuromorphic hardware.
“After several decades of neuromorphic research, there have been lots of promises made about amazing AI capabilities, huge breakthroughs in efficiency, but very little published quantitative results to show if that’s truly the case, and if it is, where exactly do we get these gains?” Mike Davies, director of neuromorphic research at Intel, told EE Times.
“That’s been a mission in our research program,” he continued, “before we try to hastily speed the technology towards commercial applications, we’re taking a methodical measured research approach, where we try to first understand which of the many different directions that one can take in terms of neuroscience inspiration, will actually yield the most compelling results.”
Deep learning comparison
Is it really possible to make a meaningful comparison between results from neuromorphic chips and other computing hardware? Neuromorphic hardware is typically demonstrated running “exotic” algorithms such as spiking neural networks, which are very different to the types of algorithms found in deep learning.
“There’s confusion [about] neuromorphic research, because there’s an overlap between what we can run on a neuromorphic chip like Loihi, and what these deep learning models do,” Davies said. “In multiple respects, there’s ways that we can extract the learnings from the deep learning community and import them into the neuromorphic world.”
The Intel Neuromorphic Research Community (INRC), a community of more than 100 companies exploring neuromorphic computing using Intel’s Loihi hardware, has been able to run deep learning algorithms on Loihi as part of this work. Algorithms might be existing deep learning networks trained in the normal way, then converted to a format that Loihi can use, so they can be benchmarked. This is one method, but in fact there are several other ways deep learning algorithms can be run on Loihi (area 1 in the diagram below).
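To make the conversion idea concrete, here is a minimal sketch of one widely used approach, rate coding, in which a spiking neuron's firing rate stands in for a conventional ReLU activation. It is an illustration only: the article does not say which conversion method INRC members use, and the function names, the non-leaky integrate-and-fire model, and the soft-reset rule are all assumptions made for this example.

```python
def relu(x: float) -> float:
    """Conventional deep learning activation, shown for comparison."""
    return max(0.0, x)


def if_neuron_rate(input_current: float, steps: int = 1000,
                   threshold: float = 1.0) -> float:
    """Simulate a hypothetical non-leaky integrate-and-fire neuron driven by
    a constant input and return its firing rate in spikes per time step."""
    membrane = 0.0
    spikes = 0
    for _ in range(steps):
        membrane += input_current          # integrate the input
        if membrane >= threshold:          # fire when the threshold is crossed
            spikes += 1
            membrane -= threshold          # soft reset keeps the residual charge
    return spikes / steps


if __name__ == "__main__":
    for x in (-0.5, 0.0, 0.25, 0.75):
        print(x, relu(x), if_neuron_rate(x))
```

In this toy model the firing rate tracks the ReLU output for inputs between 0 and 1 and saturates at one spike per step above that, which is why converted networks typically need their activations rescaled.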
One is to use back propagation, the algorithmic technique behind much of deep learning’s success because it enables the fine-tuning of weights during training. Spiking neural networks, the type of network most often run on neuromorphic chips, can be formulated in a mathematically differentiable form, which allows back propagation to be applied to optimize the result.
Another option is to perform back propagation on-chip. This would be equivalent to how neural networks are trained offline today, but it would be used to train incrementally in the field, based on data acquired there.
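One common way to obtain a differentiable form is a so-called surrogate gradient: the spike itself stays a hard threshold, but the backward pass uses a smooth stand-in for its derivative. The sketch below is a toy, single-neuron illustration of that idea, not the method used in the Loihi results; the function names, the fast-sigmoid surrogate, and all parameter values are assumptions chosen for the example.

```python
def spike(v: float, threshold: float = 1.0) -> float:
    """Forward pass: a hard threshold. Its true derivative is zero almost
    everywhere, so ordinary back propagation gets no learning signal."""
    return 1.0 if v >= threshold else 0.0


def surrogate_grad(v: float, threshold: float = 1.0, beta: float = 2.0) -> float:
    """Backward pass: a smooth stand-in for the spike derivative that peaks
    at the threshold and decays with distance from it."""
    return 1.0 / (1.0 + beta * abs(v - threshold)) ** 2


def train_weight(w: float = 0.2, x: float = 1.0, target: float = 1.0,
                 lr: float = 0.5, steps: int = 50) -> float:
    """Gradient descent on one weight so the neuron learns to spike."""
    for _ in range(steps):
        v = w * x                          # membrane potential
        error = spike(v) - target          # derivative of 0.5 * (s - target)^2
        w -= lr * error * surrogate_grad(v) * x
    return w


if __name__ == "__main__":
    w = train_weight()
    print("learned weight:", w, "spikes:", spike(w * 1.0))
```

Because the surrogate is non-zero even when the neuron is silent, the weight keeps moving towards the threshold until the neuron fires, at which point the error, and therefore the update, drops to zero.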
Intel plotted performance (latency and power) results on a graph (below) from papers published by members of the INRC that included quantified comparisons between Loihi and CPUs, GPUs, the Movidius Neural Compute Stick, or IBM’s neuromorphic technology called TrueNorth. All the results are for applications where data samples arrive one by one (batch size of 1), similar to real-time biological systems.
“Each one of these [data points] is quite a bit of work and that’s why there’s not been very much of this done to date in the neuromorphic field,” Davies said. “It’s incredibly hard to get these measurements, find the right baseline comparison point, and really do this rigorous work. But we’ve been pushing our collaborators to do this because it’s very exciting to have such a plot.”
The size of each point on the graph represents the size of the network: larger markers use more Loihi chips, with the biggest representing 500+ chips. These Loihi systems are compared to single computing subsystems (single CPU/GPU plus memory). It’s not easy to make apples-to-apples comparisons, Davies said, because CPUs can add DRAM to help scale up, whereas Loihi can only add more Loihi chips.
Could more compute chips in each system improve CPU and GPU results?
“For this size of network, it’s unlikely,” Davies said. “The small data points that dominate this plot are all very small networks by conventional standards… in general, for the types of problems that we’re looking at, they don’t parallelize well in that way. The reason why the Loihi implementation scales well is because there’s very fine scale parallelism, and the communication is happening on a scale of microseconds between the neurons, and the architecture is able to deal with that.”
Parallel communication at very fine granularity is fundamental to Loihi’s architecture. Conventional architectures split work into coarse-grained blocks in order to parallelize it; for deep learning, this is often done by batching. Batching wouldn’t help here, Davies said, since the critical metric is the latency to process an individual data sample.
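The trade-off between batching and per-sample latency can be illustrated with a toy cost model. Everything in the sketch below is hypothetical: the fixed overhead, the per-sample cost, and the arrival interval are invented numbers, not measurements from any of the benchmarked systems.

```python
def batch_time_ms(batch_size: int, overhead_ms: float = 2.0,
                  per_sample_ms: float = 0.1) -> float:
    """Time to process one batch: a fixed cost plus a per-sample cost."""
    return overhead_ms + per_sample_ms * batch_size


def throughput_per_s(batch_size: int) -> float:
    """Samples processed per second once batches are flowing."""
    return 1000.0 * batch_size / batch_time_ms(batch_size)


def latency_ms(batch_size: int, arrival_interval_ms: float = 1.0) -> float:
    """Worst-case latency for a sample arriving in real time: the first
    sample waits for the rest of the batch to arrive, then for the whole
    batch to be computed."""
    wait_for_batch = (batch_size - 1) * arrival_interval_ms
    return wait_for_batch + batch_time_ms(batch_size)


if __name__ == "__main__":
    for b in (1, 8, 64):
        print(b, round(throughput_per_s(b)), round(latency_ms(b), 1))
```

Under these assumptions, larger batches amortize the fixed overhead and raise throughput, but every sample waits longer for its answer, which is the wrong direction when data arrives one sample at a time.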
The key insight from the results obtained so far is that Loihi offers little to no performance advantage for feed-forward networks, a type of neural network widely used in mainstream deep learning because they are easier to train on conventional deep learning accelerator hardware (see graph below).
“It’s quite remarkable that the data points split out so cleanly in this way, that the feed forward networks provide the least compelling gains, and in some cases, Loihi is worse,” Davies said.
The best gains were achieved running recurrent neural networks on Loihi systems, which used 1,000 to 10,000 times less energy and reached solutions 100 times faster.
Intel has taken the first step towards a neuromorphic benchmark by announcing it intends to open-source the software it uses for work like this. Open-sourcing this code will allow others to run the same workloads on their neuromorphic platforms, and lower the barrier to entry into neuromorphic computing and the INRC.
“We really are excited to be able to start comparing results that different groups are getting for their neuromorphic chips,” Davies said. “But for us, the priority initially has been benchmarking against conventional architectures to understand what we should put into a neuromorphic benchmark suite that we can then use to drive progress within the neuromorphic field.”
A big part of a future neuromorphic benchmark is understanding what types of algorithms should be included. For deep learning the candidates are more obvious – ResNet-50 is so widely used that it’s become a de facto benchmark, for example. There isn’t an equivalent in the neuromorphic space, as it’s more fragmented and the hardware is more algorithm-specific.
“I think it’s important that we establish actual methodologies, formalized benchmarks, drawing from this emerging class of workloads where we see benefits from neuromorphic hardware, and standardize there. But I think that’s the next step,” Davies said. “We certainly hope to lead the field in that direction… We’re not quite at that point yet; there’s still some further convergence that has to happen, particularly on the software end of things, to make that possible.”
With these results, Intel wants to show that Loihi can provide big performance gains across a range of complex, difficult brain-inspired workloads, even if it doesn’t quite know what those workloads will look like yet.
“At Intel, our goal more than anything else is to make sure that is a broad set of workloads,” Davies said. “We’re not looking to make a point accelerator for constraint satisfaction solving, or a robotic arm manipulator. We want this to be a new class of computer architecture, similar to the CPU or the GPU, but if [it] is well optimized it will inherently run a broad class of brain inspired, intelligent workloads really well.”