Edge computing will increasingly become an integral part of the digital transformation. Its main benefits are reduced processing latency, which enables real-time responses, and bandwidth savings, since only already-processed, and therefore smaller, data is sent to the data center.

Compared to GAP8, GreenWaves Technologies' currently shipping product, the new GAP9 reduces energy consumption by a factor of 5 while enabling inference on neural networks 10 times larger. GAP9 is built on the same GAP architecture as GAP8. GreenWaves' GAP8 is the industry's first ultra-low-power processor designed to bring high processing capacity to the edge at particularly low cost and power consumption, enabling battery-powered artificial intelligence (AI) in Internet of Things (IoT) applications.

“There's a lot of interest at the moment in moving AI processing to the edge for applications such as ADAS, security cameras, robots and phones. We are focusing on what we believe will be the next wave of edge devices, at what we call the very edge. These are the devices that are extremely power constrained. For example, wearables with very small rechargeable batteries that need weeks or months of battery life or sensors where power is unavailable or expensive to install. If you have to install cables to power a sensor, the cost of doing this is often many times more than the cost of the sensor itself. And then some sensors may be very distant from mains power. They might be in the middle of a field or something like that where getting power to them is almost impossible. You need to start having some other alternate power source, like PV panels or batteries with an extremely long life,” said Martin Croome, VP of marketing at GreenWaves Technologies.

GAP8 is an IoT application processor based on the open-source RISC-V instruction set architecture and PULP (Parallel Ultra-Low-Power Processing Platform). It opens up new possibilities for the next generation of connected devices. GAP8 enables the cost-effective development of intelligent devices that capture, analyze, classify and act on a fusion of rich data sources such as images, sounds, radar, infrared or vibrations. GAP8 consumes from a few tens of milliwatts in active mode down to a few microwatts in sleep mode, so devices can last for years on a battery. It is optimized to run a wide spectrum of image and audio algorithms, including convolutional neural network inference, with extreme energy efficiency.

“In GAP8, our first product, which is now in production, we’ve used the extensibility of the RISC-V instruction set architecture to improve the energy efficiency. GAP8 has eight RISC-V cores in a cluster and then an extra ninth RISC-V core which we use as a controller, a bit like an MCU for the chip. Signal processing and machine learning algorithms are good targets for parallelization in the cluster. If you parallelize a calculation on eight cores you would hope to get as close as possible to eight times the work done. This factor is known as speedup. We have focussed on getting very good speedup by implementing all of the task fork and synchronization primitives in hardware and a number of other optimisations. We then use the speedup that we achieve not to do more work, but to do the same amount of work at a lower clock speed. This in turn allows us to drop the voltage internally in the chip which gives us a quadratic power benefit. The chip integrates dynamic frequency and voltage control so we can save energy by continuously tuning the clock speed and voltage to the task that we are carrying out,” said Martin Croome.
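The energy argument in the quote can be made concrete with the textbook dynamic-power model P = C·V²·f. The sketch below is a simplified model, not GreenWaves' data: it assumes every active core has the same switched capacitance, that supply voltage scales linearly with frequency down to a hypothetical floor `v_min`, and that leakage can be ignored. It compares a task run on N cores at a reduced clock with the same task run on one core at the nominal clock, both finishing in the same time.

```python
def relative_dynamic_energy(cores: int, speedup: float, v_nominal: float, v_min: float) -> float:
    """Energy of an N-core run at a reduced clock, relative to a 1-core run at the nominal clock.

    Simplified model (assumption, not vendor data):
      * dynamic power per active core is C * V**2 * f, with the same C for every core;
      * the parallel run lowers the clock by the speedup so it finishes in the same time;
      * voltage scales linearly with frequency but never drops below v_min;
      * leakage power is ignored.
    Because both runs take the same time, the energy ratio equals the power ratio.
    """
    if cores < 1 or speedup <= 0:
        raise ValueError("cores must be >= 1 and speedup > 0")
    f_scale = 1.0 / speedup                  # clock relative to nominal
    v = max(v_min, v_nominal * f_scale)      # supply voltage after scaling, clamped at the floor
    return cores * f_scale * (v / v_nominal) ** 2


if __name__ == "__main__":
    # Hypothetical voltages, chosen only to illustrate the trend.
    for cores, speedup in [(1, 1), (8, 8), (8, 7), (8, 4)]:
        ratio = relative_dynamic_energy(cores, speedup, v_nominal=1.2, v_min=0.8)
        print(f"{cores} cores, speedup {speedup}: {ratio:.2f}x the single-core energy")
```

With these made-up voltages, an ideal speedup of 8 cuts energy to roughly 0.44x, while a speedup of only 4 leaves it near 0.89x, which is why good speedup matters. Setting `v_min` equal to `v_nominal` gives exactly 1.0x even with perfect speedup: in this model the saving comes from the lower voltage, not from the lower clock alone.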

The hierarchical architecture enables very low-power operation by combining a set of highly autonomous, intelligent I/O peripherals for connecting external devices with a cluster of 8 cores, whose architecture is optimized to execute vector and parallelized algorithms, and a specialized convolutional neural network accelerator (HWCE). All cores and peripherals can be switched on or off and have their voltage and frequency adjusted as required. Integrated DC/DC regulators and clock generators with ultra-fast reconfiguration times optimize power management. The cluster cores and the HWCE share access to a memory area and an instruction cache. Multiple DMA units allow fast, low-power, stand-alone transfers between memory areas, and a memory protection unit allows applications to run safely on GAP8.
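The cluster's job is to split one computation across its eight cores and then synchronize on the result. GAP8 implements the fork and synchronization primitives in hardware; the sketch below only emulates the same fork/join pattern in software, using Python's standard `concurrent.futures` thread pool, to show how a loop can be statically partitioned into one contiguous chunk per core. It illustrates the work split, not GAP8's performance (CPython threads will not deliver an 8x speedup), and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence, Tuple

NUM_CORES = 8  # matches the 8-core cluster described above


def chunk_bounds(n: int, core_id: int, num_cores: int = NUM_CORES) -> Tuple[int, int]:
    """Static block partitioning: core `core_id` handles indices [start, end) of range(n)."""
    chunk = (n + num_cores - 1) // num_cores  # ceiling division; trailing cores may get nothing
    start = min(core_id * chunk, n)
    end = min(start + chunk, n)
    return start, end


def partial_dot(a: Sequence[float], b: Sequence[float], core_id: int) -> float:
    """The work one core performs: a dot product over its own chunk."""
    start, end = chunk_bounds(len(a), core_id)
    return sum(a[i] * b[i] for i in range(start, end))


def parallel_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Fork one task per core, join (wait for all of them), then reduce the partial sums."""
    with ThreadPoolExecutor(max_workers=NUM_CORES) as pool:
        partials = list(pool.map(lambda k: partial_dot(a, b, k), range(NUM_CORES)))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_dot(list(range(10)), [1] * 10))
```

Contiguous chunks keep each core's memory accesses local and need no coordination beyond the final join; the cost is some load imbalance when the length is not a multiple of eight.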

The new GAP9 adds security features, including AES-128/256 cryptography, and full support for floating-point arithmetic on all cores, based on an innovative transprecision floating-point unit that handles 8-bit, IEEE 16-bit, BFloat16 and IEEE 32-bit formats, with support for vectorization. GAP9 handles sophisticated neural networks such as MobileNet V1 with ease, processing a 160 × 160 image with a channel scaling of 0.25 in just 12 ms at a power consumption of 806 μW/frame/second. GAP9 provides a 20-fold increase in effective memory bandwidth compared to GAP8, enabling significant improvements in detection accuracy by simultaneously analyzing streams of data from multiple sensors such as images, sounds and radar (figure 1).
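To make the transprecision formats more concrete, the sketch below rounds a value to IEEE 16-bit (half precision) and to BFloat16 using only Python's `struct` module, whose `'e'` format code is IEEE 754 binary16. BFloat16 keeps the 8-bit exponent of a 32-bit float but only 7 stored mantissa bits, so it trades precision for range; IEEE 16-bit keeps more precision but overflows above 65504. The round-to-nearest-even conversion shown is the common software approach and is an assumption here: it says nothing about how GAP9's floating-point unit rounds internally, and NaN inputs are not handled.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (raises OverflowError in CPython above 65504)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit BFloat16 pattern nearest to x (round to nearest, ties to even).

    NaN handling is omitted for brevity; x must fit in a 32-bit float.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    lsb = (bits >> 16) & 1        # lowest bit that survives truncation
    bits += 0x7FFF + lsb          # rounding bias: exact ties go to the even result
    return (bits >> 16) & 0xFFFF  # keep sign, 8-bit exponent and 7 mantissa bits


def bfloat16_bits_to_float(h: int) -> float:
    """Widen a BFloat16 pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.001, 70000.0):
        bf16 = bfloat16_bits_to_float(to_bfloat16_bits(value))
        try:
            fp16 = str(round_to_fp16(value))
        except (OverflowError, struct.error):
            fp16 = "overflow"
        print(f"{value}: BFloat16 -> {bf16}, IEEE 16-bit -> {fp16}")
```

Here 1.001 survives in IEEE 16-bit as about 1.00098 but collapses to 1.0 in BFloat16, while 70000.0 overflows IEEE 16-bit yet is representable (as 70144.0) in BFloat16, which is why a unit supporting both formats lets each workload pick the trade-off it needs.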


Figure 1: block diagram of GAP9

“GAP9 enables a new level of capabilities for embedding combinations of sophisticated machine learning and signal processing capabilities into consumer, medical and industrial product applications,” said Loic Lietar, CEO of GreenWaves Technologies. “The GAP family provides product designers with a powerful, flexible solution for bringing the next generation of intelligent devices to market.”

“We don't heavily use some kind of very esoteric architectural approach to accelerate just CNNs because we believe that that market is moving so fast. Every month there are a thousand new papers in the neural network space. And we want to make sure that we're able to accelerate all of those new things that are coming out this year and next year and not be something that's focused on last year's best idea. So we have a convolution hardware accelerator, which helps reduce energy consumption in some cases, but generally, the architecture is programmable.”

Face identification, which has attracted a lot of press for its use in security applications, is implemented here using a SqueezeNet-based convolutional neural network (CNN). CNNs are a family of neural networks widely used in computer vision and, more generally, with data that have spatial relationships. CNNs are organized in layers and are typically feed-forward, that is, acyclic.
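The core operation of a CNN layer is a 2D convolution: a small kernel slides over the input, and each output value depends only on a local neighborhood, which is how CNNs exploit spatial relationships. The sketch below is a generic single-channel, stride-1, no-padding convolution followed by a ReLU, chained into a strictly feed-forward stack. It is written in plain Python for readability and is not taken from SqueezeNet or from GreenWaves' implementation, which would use many channels and optimized kernels.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D convolution, stride 1, no padding.

    Like most CNN frameworks, this computes a cross-correlation (the kernel is not flipped).
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            # Each output depends only on the kh x kw neighborhood at (y, x).
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap: Matrix) -> Matrix:
    """Element-wise ReLU activation."""
    return [[max(0, v) for v in row] for row in fmap]


def forward(image: Matrix, kernels: List[Matrix]) -> Matrix:
    """Apply conv + ReLU layers in sequence: data flows one way, with no cycles."""
    x = image
    for k in kernels:
        x = relu(conv2d_valid(x, k))
    return x


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(conv2d_valid(img, [[-1, 0], [0, 1]]))
```

Each layer shrinks a valid-mode feature map by one less than the kernel size, so a 4 × 4 input passed through two 2 × 2 layers ends up as 2 × 2.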

Face detection can be triggered by a passive infrared (PIR) sensor to further reduce power consumption when no face is present. Once a face is detected, the algorithm outputs its coordinates, and the detected region is resized to a 128 × 128-pixel image. This image is the input to the face identification CNN, whose output is a signature of 512 16-bit parameters describing the detected face.
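The detection-to-identification flow described above can be sketched as follows. Only the 128 × 128 input size and the 512-value, 16-bit signature come from the text. The PIR flag, the nearest-neighbor resize, the hypothetical `run_cnn` callable standing in for the SqueezeNet-based network, storing signatures as signed 16-bit integers, and matching by squared Euclidean distance against enrolled signatures with a hypothetical `threshold` are all illustrative assumptions, not GreenWaves' implementation.

```python
from array import array
from typing import Callable, Dict, List, Optional, Sequence

INPUT_SIZE = 128      # CNN input resolution from the text (128 x 128 pixels)
SIGNATURE_LEN = 512   # number of parameters in a face signature, from the text

Image = List[List[int]]


def resize_nearest(img: Image, size: int = INPUT_SIZE) -> Image:
    """Nearest-neighbor resize of a detected face crop to size x size pixels."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // size][x * in_w // size] for x in range(size)]
            for y in range(size)]


def identify(signature: Sequence[int], enrolled: Dict[str, Sequence[int]],
             threshold: int) -> Optional[str]:
    """Return the closest enrolled identity, or None if nothing is close enough (assumed matcher)."""
    best_name, best_dist = None, None
    for name, ref in enrolled.items():
        dist = sum((s - r) ** 2 for s, r in zip(signature, ref))  # squared L2 distance
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist is None or best_dist > threshold:
        return None
    return best_name


def process_frame(pir_triggered: bool, face_crop: Optional[Image],
                  run_cnn: Callable[[Image], Sequence[int]],
                  enrolled: Dict[str, Sequence[int]], threshold: int) -> Optional[str]:
    """PIR gate -> resize -> CNN signature -> match. All helpers here are hypothetical."""
    if not pir_triggered or face_crop is None:
        return None  # no motion or no face: skip the expensive CNN entirely
    signature = array("h", run_cnn(resize_nearest(face_crop)))  # signed 16-bit storage
    if len(signature) != SIGNATURE_LEN:
        raise ValueError("unexpected signature length")
    return identify(signature, enrolled, threshold)
```

Stored as 16-bit values, each signature occupies 512 × 2 = 1,024 bytes, so every enrolled face costs about 1 KB of memory.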

To speed up time-to-market, GreenWaves Technologies has designed the GAPuino board to simplify development with GAP8. The GAPuino can be used as a replacement for a standard Arduino Uno board and can be connected to most Arduino Uno-compatible 3.3V or 5V shields. By adding an Arduino communication shield (Bluetooth, WiFi, LoRa...), it is possible to quickly prototype complete IoT applications with AI on battery-powered devices (figure 2).


Figure 2: GAPuino

Several accessories are available, such as a module with a low-power black-and-white QVGA camera and a sensor board that adds a varied set of sensors, including four digital microphones for audio applications. For GAP8 processors there is also a complete SDK, which includes a RISC-V GCC/GDB toolchain with extensions for the additional processor instructions, and support on the MCU side for two operating systems: PULP OS and a port of Mbed OS to RISC-V/GAP8.