Hardware-Software Co-Design to Accelerate Binarized Neural Networks

Article By : Sally Ward-Foxton

BNNs, on the threshold of commercialization, will drastically reduce model memory footprint for endpoint applications.

Two British firms have partnered to accelerate the adoption of binarized neural networks (BNNs), a technology that will drastically reduce memory footprint for AI models in endpoint applications such as voice control and person detection.

XMOS (Bristol, UK) and Plumerai (London, UK) will work together to combine XMOS’ crossover processor for voice-controlled IoT devices, Xcore.ai, with Plumerai’s Larq software library for training BNNs.

The adoption of BNNs, which reduce parameters to 1-bit numbers, requires both new neural network models and special hardware that can support the 1-bit operations. Xcore.ai is one of the first non-ASIC parts with native support for the 1-bit vector arithmetic required for BNN inference.

“We’re making deep learning tiny and computationally radically more efficient,” Roeland Nusselder, CEO of Plumerai, told EETimes. “For this, we have been developing software for the most efficient form of deep learning, which is binarized neural networks.”

Plumerai, founded in 2017, employs 20 people spread between London, Amsterdam and Warsaw.

“[BNNs] need this hardware-software combination, this co-design. And that’s always been a chicken and egg problem,” Nusselder said. “There was no good hardware, that’s why there was no good software and vice versa. But now with this partnership with XMOS, that’s being solved.”

Binarized neural networks are ideal for endpoint applications as they reduce the memory footprint of AI models (Image: Shutterstock)

Binarized Networks
Binarized neural networks certainly hold plenty of potential. Apple acquired Seattle-based Xnor in January for exactly this technology in a deal reportedly worth $200 million.

BNNs are a very efficient form of deep learning that use single-bit weights and activations (-1 or +1). A deep learning model generally has tens of millions, or even hundreds of millions, of these parameters. While earlier work on neural networks used 32-bit floating point numbers for inference, 8-bit integer is the industry standard today, as it uses less memory and can be computed more efficiently. There is also work being done on quantising to 4 bits, but why not drastically reduce the model’s memory footprint by going all the way down to 1 bit?
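The back-of-envelope memory arithmetic behind this precision ladder can be made concrete. The sketch below uses an illustrative parameter count (not a figure from the article) and counts only parameter storage, ignoring activations and packing overhead:

```python
# Parameter memory at each bit width the article mentions.
# Illustrative figures only: 50M parameters is a stand-in for
# "tens of millions"; activations and overhead are ignored.

n_params = 50_000_000

def param_megabytes(bits: int, n: int = n_params) -> float:
    """Storage for n parameters at the given bit width, in megabytes."""
    return n * bits / 8 / 1e6

for bits in (32, 8, 4, 1):
    print(f"{bits:>2}-bit: {param_megabytes(bits):.2f} MB")
```

Going from 32-bit floats to 1-bit weights shrinks parameter storage by a factor of 32: the same 50M-parameter model drops from 200 MB to just over 6 MB.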

“It is much more efficient. But it’s challenging, because, crucially, you need new software algorithms to train binarized neural networks for you to really get the full potential out of binarized neural networks,” Nusselder said.

It isn’t simply a matter of taking an 8-bit model and converting it, or quantising it, to 1-bit.

“It’s trickier than that; you need to train [your model] from scratch in a binarized way,” Nusselder said. “That’s where our software and training algorithms come in.”
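One widely used approach to training from scratch in a binarized way (a sketch of the generic technique, not Plumerai's proprietary algorithm) is the straight-through estimator (STE): full-precision latent weights are kept during training, the forward pass uses only their signs, and the gradient pretends the sign function is the identity within a clipping range. All names and values below are illustrative, assuming only numpy:

```python
import numpy as np

# Toy straight-through-estimator (STE) update for one binarized
# weight vector. Latent full-precision "shadow" weights are trained;
# inference would use only their signs.

rng = np.random.default_rng(0)
w_latent = rng.normal(size=8)      # full-precision shadow weights
x = rng.normal(size=8)             # one toy input sample
target, lr = 1.0, 0.1
losses = []

for _ in range(100):
    w_bin = np.sign(w_latent)      # forward pass: binarize to -1/+1
    y = float(w_bin @ x)           # binarized dot product
    losses.append((y - target) ** 2)
    grad_y = 2.0 * (y - target)    # d(loss)/dy for squared error
    # STE: treat d(sign)/dw as 1 inside the clipping range, 0 outside
    grad_w = grad_y * x * (np.abs(w_latent) <= 1.0)
    w_latent = np.clip(w_latent - lr * grad_w, -1.0, 1.0)
```

The key point is that the non-differentiable sign function never blocks the gradient: updates flow to the latent weights, and a weight's binarized value flips only when its latent value crosses zero.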

Typical quantisation (say, from 32-bit to 8-bit) results in a reduction of parameter precision and a corresponding loss in prediction accuracy, so there is a tradeoff between optimising for compute efficiency and maintaining the level of prediction accuracy required by the application. This also applies to BNNs.

“There used to be a huge loss in accuracy, the first BNN models were at a much lower accuracy than 8-bit or 32-bit models,” Nusselder said. “But this degradation in accuracy has been completely removed. The most straightforward way to do it is to enlarge your binarized neural network. By making your BNN slightly larger than [the equivalent] 8-bit deep learning model, you can make up for this loss in accuracy. But obviously, if you enlarge your BNN too much, then it also becomes slower. This is what we’ve been doing a lot of work on, to enlarge in such a way that there’s no accuracy degradation, but it’s still faster than the original 8-bit deep learning model.”

This enlargement might mean doubling the number of parameters in the model, but considering the memory requirement has already been reduced by a factor of eight, the improvement is still roughly a factor of four.
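That factor-of-four claim is easy to verify. Starting from a hypothetical 8-bit model with N parameters (N bytes), binarizing shrinks it eightfold, and even doubling the parameter count to recover accuracy leaves a net saving of about 4×:

```python
# The article's arithmetic made explicit (hypothetical model size).

n_params = 10_000_000               # 8-bit baseline model
bytes_8bit = n_params               # one byte per parameter
bytes_bnn = (2 * n_params) // 8     # twice the parameters, 1 bit each

saving = bytes_8bit / bytes_bnn
print(saving)                       # → 4.0
```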

Training BNNs while maintaining prediction accuracy and minimizing model size is Plumerai’s specialty. The company’s Larq software library is made up of open-source Python packages for building, training and deploying BNNs. Plumerai also offers trained BNN models ready to use in applications such as person detection in security and HVAC systems.

Hardware Instructions
XMOS’ Xcore.ai, hardware designed to natively support 1-bit vector arithmetic, is the missing piece of the puzzle. While standard microcontrollers have dedicated instructions for 32-bit and 8-bit multiply-accumulate operations, most have no specific instructions for working with 1-bit numbers. With 1-bit values, multiplication can be done efficiently with the XNOR operation, and accumulation with a population count (popcount).
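The XNOR-popcount trick can be sketched in a few lines. With ±1 values packed as bits (1 for +1, 0 for -1), XNOR marks the positions where signs agree, and a popcount turns that into the dot product; the function and encoding below are illustrative, not Xcore.ai's actual instruction set:

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1,+1} vectors packed into ints."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask      # 1 wherever signs agree
    matches = bin(xnor).count("1")        # popcount
    return 2 * matches - n                # matches minus mismatches

# Cross-check against the naive ±1 dot product:
a = [+1, -1, +1, +1, -1, -1, +1, -1]
b = [+1, +1, -1, +1, -1, +1, +1, +1]
pack = lambda v: sum(1 << i for i, s in enumerate(v) if s == +1)
print(binary_dot(pack(a), pack(b), len(a)))   # → 0, same as sum(x*y)
```

In hardware, this replaces a row of multiply-accumulate units with a single wide XNOR and a popcount tree, which is why native 1-bit vector instructions make BNN inference so cheap.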

Nusselder’s view is that all microprocessors and microcontrollers designed for AI acceleration will eventually include specialised instructions for BNNs as they gain importance.

“[BNNs are] the future for efficient deep learning, I’m convinced of it,” he said. “BNNs will start with cheap, simple, low power tasks, with microcontrollers in the tinyML domain. From there, BNNs will eventually take over all other deep learning applications, including self-driving cars.”

Training algorithms for BNNs are improving rapidly and gaining attention.

“Before the end of last year, most of the papers were coming from Chinese companies… but now more companies in the US and Europe are starting to work on it,” Nusselder said. “The field keeps on accelerating, algorithms keep on improving, software keeps on improving… and now with XMOS’ chip, it is going to be quite exciting because [BNNs] can start to be commercialised.”
