Adapting the Microcontroller for AIoT

Article By : Sally Ward-Foxton

Will Arm continue to dominate going forward? Alternatives are springing up, and with the IoT set to expand, the market is set to expand as well.

What do you get if you cross AI with the IoT? The AIoT is the simple answer, but you also get a huge new application area for microcontrollers, enabled by advances in neural network techniques that mean machine learning is not limited to the world of supercomputers any longer. These days, smartphone application processors can (and do) perform AI inference for image processing, recommendation engines, and other complex features.

An ecosystem of billions of IoT devices will get machine learning capabilities in the next couple of years (Image: NXP)

Bringing this kind of capability to the humble microcontroller represents a huge opportunity. Imagine a hearing aid that can use AI to filter background noise from conversations, smart home appliances that can recognize the user’s face and switch to their personalized settings, and AI-enabled sensor nodes that can run for years on the tiniest of batteries. Processing the data at the endpoint offers latency, security and privacy advantages that can’t be ignored.

However, achieving meaningful machine learning with microcontroller-class devices is not an easy task. Memory, a key requirement for AI calculations, is often severely limited, for example. But data science is advancing quickly to reduce model size, and device and IP vendors are responding by developing tools and incorporating features tailored to the demands of modern machine learning.
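One of the main model-shrinking techniques in play here is post-training quantization, which stores weights as 8-bit integers instead of 32-bit floats. The sketch below is a toy symmetric per-tensor quantizer to show where the 4x size reduction comes from; it is an illustration of the general idea, not any particular vendor's scheme.

```python
import numpy as np

def quantize_int8(weights):
    """Toy symmetric int8 quantization: map float32 weights onto [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 codes."""
    return q.astype(np.float32) * scale

np.random.seed(0)
w = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is a quarter of float32: 1 byte per weight instead of 4
print(w.nbytes, q.nbytes)  # 4000 1000
```

The rounding error per weight is bounded by half a quantization step (scale / 2), which is why small networks often tolerate int8 storage with little accuracy loss.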

TinyML Takes Off
As a sign of this sector’s rapid growth, the TinyML Summit (a new industry event held earlier this month in Silicon Valley) is going from strength to strength. The first summit, held last year, had 11 sponsoring companies; this year’s event had 27, with slots selling out much earlier, according to the organizers. They also said that membership of their global monthly meetups for designers has grown dramatically.

“We see a new world with trillions of intelligent devices enabled by TinyML technologies that sense, analyze and autonomously act together to create a healthier and more sustainable environment for all,” said Qualcomm’s Evgeni Gousev, co-chair of the TinyML Committee, in his opening remarks at the show.

Gousev put this growth down to the development of more energy-efficient hardware and algorithms, combined with more mature software tools. Corporate and VC investment is increasing, as is startup and M&A activity, he noted.

Today, the TinyML Committee believes the tech has been validated and that initial products using machine learning in microcontrollers should hit the market in 2-3 years. ‘Killer apps’ are thought to be 3-5 years away.

A big part of the tech validation came last spring when Google demonstrated a version of its TensorFlow framework for microcontrollers for the first time. TensorFlow Lite for Microcontrollers is designed to run on devices with only kilobytes of memory (the core runtime fits in 16 KB on an Arm Cortex M3, and with enough operators to run a speech keyword detection model, takes up a total of 22 KB). It only supports inference (not training).
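The article's figures suggest a useful back-of-envelope budget: 16 KB for the core runtime, 22 KB including the operators for keyword detection, plus the model weights and working RAM on top. The parameter count and tensor-arena size below are illustrative assumptions, not TensorFlow's published numbers.

```python
# Rough flash/RAM budget for an int8 keyword-spotting model on a
# Cortex-M class part. Runtime sizes are from the article; the model
# parameter count and arena size are illustrative assumptions.
KB = 1024
runtime_core = 16 * KB       # TFLite Micro core runtime (per the article)
runtime_with_ops = 22 * KB   # core + operators for keyword detection

params = 18_000              # hypothetical tiny keyword model, int8 weights
model_flash = params * 1     # 1 byte per int8 weight
tensor_arena = 10 * KB       # working RAM for activations (assumed)

total_flash = runtime_with_ops + model_flash
print(f"flash ≈ {total_flash / KB:.1f} KB, arena RAM ≈ {tensor_arena // KB} KB")
# flash ≈ 39.6 KB, arena RAM ≈ 10 KB
```

Even with generous assumptions, the whole stack fits comfortably inside the flash and SRAM of a mid-range Cortex-M part, which is what makes inference-only deployment on microcontrollers plausible.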

Big Players
The big microcontroller makers are of course watching developments in the TinyML community with interest. As research enables neural network models to get smaller, the size of their opportunity gets bigger.

Most have some kind of support for machine learning applications. For example, STMicroelectronics has an extension pack, STM32Cube.AI, which enables mapping and running neural networks on its STM32 family of Arm Cortex-M based microcontrollers.

Renesas has its e-AI development environment which allows AI inference to be implemented on microcontrollers. It effectively translates the model into a form which is usable in its e2 studio, compatible with C/C++ projects.

NXP said it has customers using its lower-end Kinetis and LPC MCUs for machine learning applications. The company is embracing AI with hardware and software solutions, albeit primarily oriented around its bigger application processors and crossover processors (between application processors and microcontrollers).

Strong Arm-ed
Most of the established companies in the microcontroller space have one thing in common: Arm. The embedded processor core giant dominates the microcontroller market with its Cortex-M series. The company recently announced the brand new Cortex-M55 core which is designed specifically for machine learning applications, especially when used in combination with the Ethos-U55 AI accelerator. Both are designed for resource-constrained environments.

Arm Cores for AI

Used in tandem, Arm’s Cortex-M55 and Ethos-U55 have enough processing power for applications such as gesture recognition, biometrics and speech recognition (Image: Arm)

But how can startups and smaller companies seek to compete with the big players in this market?

“Not by building Arm-based SoCs! Because they do that really well,” laughed XMOS CEO Mark Lippett. “The only way to compete against those guys is by having an architectural edge… [that means] the intrinsic capabilities of the Xcore in terms of performance, but also the flexibility.”

While XMOS’ Xcore.ai, its newly released crossover processor for voice interfaces, will not compete directly with microcontrollers, the sentiment still holds true. Any company making an Arm-based SoC to compete with the big players had better have something pretty special in its secret sauce.

Scaling voltage and frequency
Startup Eta Compute released its much-anticipated ultra-low-power device, the ECM3532, during the TinyML show. It can be used for machine learning in always-on image processing and sensor fusion applications with a power budget of 100 µW. The chip uses an Arm Cortex-M3 core plus an NXP DSP core; either or both can handle ML workloads. The company’s secret sauce has several ingredients, but key is the way it scales both clock frequency and voltage on a continuous basis, for both cores. This saves a lot of power, particularly as it’s achieved without a PLL (phase-locked loop).
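Why scaling voltage along with frequency pays off so handsomely follows from the standard dynamic-power relation for CMOS logic, P = C·V²·f: dropping voltage and frequency together gives a roughly cubic saving. The operating points below are illustrative assumptions, not Eta Compute's actual numbers.

```python
# Dynamic CMOS power: P = C_eff * V^2 * f. Scaling V and f together
# yields much more than a linear saving. Operating points are assumed
# for illustration, not Eta Compute's actual specs.
def dynamic_power(c_eff, v, f):
    """Dynamic switching power in watts."""
    return c_eff * v**2 * f

C_EFF = 1e-10  # effective switched capacitance in farads (assumed)
fast = dynamic_power(C_EFF, 1.1, 100e6)  # 1.1 V at 100 MHz
slow = dynamic_power(C_EFF, 0.6, 20e6)   # 0.6 V at 20 MHz

print(f"power ratio: {fast / slow:.1f}x")  # power ratio: 16.8x
```

A 5x drop in frequency alone would save 5x; cutting the voltage at the same time turns that into nearly 17x, which is why fine-grained voltage-and-frequency scaling matters so much at a 100 µW budget.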

Eta Compute ECM3532 block diagram

Eta Compute’s ECM3532 uses an Arm Cortex-M3 core plus an NXP CoolFlux DSP core. The machine learning workload can be handled by either, or both (Image: Eta Compute)

With viable competitors to Arm now available, including the up-and-coming instruction set architecture from the RISC-V Foundation, why did Eta Compute choose an Arm core for ultra-low-power machine learning acceleration?

“The simple answer is that the ecosystem for Arm is just so well developed,” Eta Compute CEO Ted Tewksbury told EETimes. “It’s just much easier to go to production [with Arm] than it is with RISC-V right now. That situation could change in the future… RISC-V has its own set of advantages; certainly it’s good for the Chinese market, but we’re looking primarily at domestic and European markets right now with the ecosystem for [our device].”

Tewksbury noted that the major challenge facing the AIoT is the breadth and diversity of its applications. The market is rather fragmented, with many relatively niche applications commanding only low volumes. Altogether, however, the sector potentially extends to billions of devices.

“The challenge for developers is that they cannot afford to invest the time and the money in developing customized solutions for each one of those use cases,” Tewksbury said. “That’s where flexibility and ease of use become absolutely paramount. And that’s another reason why we chose Arm: because the ecosystem is there, the tools are there, and it’s easy for customers to develop products quickly and get them to market quickly without a lot of customization.”

After keeping its ISA under lock and key for decades, Arm announced last October that it would allow customers to build their own custom instructions for handling specialist workloads such as machine learning. This capability, in the right hands, may also offer the opportunity to further reduce power consumption.

Eta Compute can’t take advantage of this just yet, since the capability does not apply retrospectively to existing cores such as the Cortex-M3 the company is using. But could Tewksbury see Eta Compute using Arm custom instructions in future generations of product to further reduce power consumption?

“Absolutely, yes,” he said.

Alternative ISAs
RISC-V has been getting a lot of attention this year. The open-source ISA allows the design of processors without a license fee, while designs based on the RISC-V ISA can be protected as with any other type of IP. Designers can pick and choose which extensions to add, and can add their own customized extensions.

French startup GreenWaves is one of several companies using RISC-V cores to target the ultra-low power machine learning space. Its devices, GAP8 and GAP9, use 8- and 9-core compute clusters respectively.

GreenWaves GAP9 block diagram

The architecture of GreenWaves’ GAP9 ultra-low power AI chip now uses 10 RISC-V cores (Image: GreenWaves)

Martin Croome, vice president of business development at GreenWaves, explained to EETimes why the company uses RISC-V cores.

“The first reason is RISC-V gives us the ability to customise the cores at the instruction set level, which we use heavily,” said Croome, explaining that the custom extensions are used to reduce the power of both machine learning and signal processing workloads. “When the company was formed, if you wanted to do that with any other processor architecture it was either impossible or it was going to cost you a fortune. And the fortune it was going to cost you was essentially your investor’s money going to a different company, and that is very difficult to justify.”

GreenWaves’ custom extensions alone give its cores a 3.6x improvement in energy efficiency compared with unmodified RISC-V cores. But Croome also said that RISC-V has fundamental technical benefits simply because it is new.
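The kind of saving custom extensions buy can be sketched with a toy instruction-count model: a fused SIMD multiply-accumulate instruction replaces the separate load/multiply/add sequence a plain RV32IM core would issue per element. The counts below are illustrative assumptions, not GreenWaves' actual extension or its measured 3.6x figure.

```python
# Toy instruction-count model of why a custom SIMD MAC extension helps
# on ML inner loops. All counts are illustrative assumptions.
N = 64  # dot-product length, e.g. one small convolution filter

# Baseline RV32IM: load a, load b, multiply, accumulate -- per element
baseline = N * 4

# Hypothetical custom extension: one paired load plus one 4-way int8
# fused MAC processes 4 elements per iteration (3 instructions)
custom = (N // 4) * 3

print(f"{baseline} vs {custom} instructions -> {baseline / custom:.1f}x fewer")
# 256 vs 48 instructions -> 5.3x fewer
```

Fewer instructions executed means fewer fetches, decodes and register-file accesses, which is where much of the energy in a small in-order core goes; that is the mechanism behind the efficiency gains Croome describes.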

“It’s a very clean, modern instruction set. It doesn’t have any baggage. So from an implementation perspective, the RISC-V core is actually a simpler structure, and simple means less power,” he said.

Croome also cited control as an important factor. The GAP8 device has 8 cores in its compute cluster, and GreenWaves needs very fine, detailed control over the core execution to allow maximum power efficiency. RISC-V enables that, he said.

“In the end, if we could have done all of that with Arm, we would have done all of that with Arm, it would have been a much more logical choice… Because no-one ever got fired for buying Arm,” he joked. “The software tools are there to a level of maturity which is far higher than RISC-V… but that said, there’s now so much focus on RISC-V that those tools are increasing in maturity very fast.”

In summary, while some see Arm’s hold on the microcontroller market as weakening, in part due to increased competition from RISC-V, the company is responding by allowing some customized extensions and by developing new cores designed for machine learning from the outset.

In fact, there are both Arm and non-Arm devices coming to the market for ultra-low power machine learning applications. As the TinyML community continues to work on reducing neural network model size and developing dedicated frameworks and tools, this sector will blossom into a healthy application area that will support a variety of different device types.
