Ambarella Targets AV Domain Controllers with Next-Gen AI Engine

Article By : Sally Ward-Foxton

The latest generation of the computer vision engine splits AI workloads and computer vision tasks to accelerate AV processing.

Illustrating the trend toward domain controllers in autonomous vehicles, Ambarella has launched its CV3 family of AV domain controllers designed to process up to 20 streams of image data at once. The new family of SoCs is based on the third generation of Ambarella’s CVFlow AI engine IP tailored for perception, multi-sensor fusion and path planning in L2+ to L4 vehicles.

As vehicle architectures move away from a single electronic control unit per feature toward zonal and larger, centralized domain controllers, and as more vehicle functionality relies on compute-intensive AI processing, vehicle processors are growing rapidly in compute capability. The flagship SoC in Ambarella’s new CV3 family includes an AI accelerator the company rates at 500 eTOPS (meaning its performance is equivalent to a 500-TOPS GPU). It also includes a vision processor, 16 Arm cores, a GPU and other hardware.

CV3 can connect and fuse information from multiple long-range roof cameras, multiple short-range surround-view cameras and multiple radar sensors, with processing to spare for other vision tasks such as driver monitoring.

CV3-High supports up to 20 high-resolution camera inputs. (Source: Ambarella)
CV3-High can also process multiple large neural networks simultaneously, including object detection, segmentation and path planning. (Source: Ambarella)

Ambarella calls its design philosophy “algorithm-first”. CTO Les Kohn told EE Times the company studied hundreds of open-source networks, its own internal networks and customer algorithms used with its earlier platforms in designing the latest generation.

“We looked at hundreds of networks across all different types of architectures, and by doing that, we make sure that the architecture is flexible enough to handle all those different networks and still operate very efficiently,” Kohn said. “Of course, the challenge is how do you trade off flexibility and efficiency? But I think the key is to really study in detail the way these networks work.”

Overall, customer algorithms were sufficiently similar to allow acceleration with the same engine, he said.

Alongside the CVFlow engine are 16 Arm Cortex-A78AE cores, a stereo and dense optical flow processor, an image signal processor, video codecs and a GPU. (Source: Ambarella)

Ambarella’s CV3-High SoC features an image signal processor capable of operating in challenging lighting and driving conditions. Also included are a stereo and dense optical flow accelerator for processing stereo cameras, 16 Arm A78AE cores, including a safety island, and video codecs. Finally, a GPU is used primarily for rendering visual representations of sensor output for parking assistance.

The CV3 series is the first to implement the third generation of the CVFlow accelerator engine. Unlike previous generations, the engine consists of two blocks: a neural vector processor (NVP) to handle AI workloads and a general vector processor (GVP) with floating-point support. The GVP offloads classical computer vision workloads from the NVP and floating-point workloads from the Arm CPUs. For example, radar processing is handled by the GVP; perception is then performed by the NVP. Both blocks are based on in-house IP.

Splitting workloads between the NVP and new GVP allows the former to be further optimized for convolution and matrix processing.
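
The division of labor described above can be pictured as a simple routing table. The following Python sketch is purely illustrative: the `Block` enum, the workload category names and the `route` function are hypothetical and are not part of any Ambarella SDK. Only the mapping itself (neural networks on the NVP; radar, classical computer vision and floating-point work on the GVP) is taken from the description above.

```python
from enum import Enum


class Block(Enum):
    """The two blocks of the third-generation CVFlow engine."""
    NVP = "neural vector processor"
    GVP = "general vector processor"


# Hypothetical workload categories (assumed names, for illustration only).
# The mapping follows the article: AI workloads go to the NVP, while radar,
# classical computer vision and floating-point work go to the GVP.
ROUTING = {
    "neural_network": Block.NVP,
    "radar_processing": Block.GVP,
    "classical_cv": Block.GVP,
    "floating_point": Block.GVP,
}


def route(workload: str) -> Block:
    """Return the block that would handle the given workload category."""
    try:
        return ROUTING[workload]
    except KeyError:
        raise ValueError(f"unknown workload category: {workload!r}") from None


# Example from the article: radar processing first, then perception.
pipeline = ["radar_processing", "neural_network"]
for stage in pipeline:
    print(f"{stage} -> {route(stage).name}")
```

In a real system the scheduling would be done by the vendor's toolchain rather than by a lookup table; the table only makes the split easy to see.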

Les Kohn (Source: Ambarella)

“We have optimized the internal memory system and the interconnect between those systems to remove bottlenecks and improve efficiency,” Kohn said. “We also re-optimized all the data paths inside. So it’s not so much a fundamental change in architecture, but in reworking the details to eliminate bottlenecks and optimize for core network processing.”

The new NVP also adds operations common to advanced networks that are only now beginning to be used for real-time applications, including graph networks and transformers.

The NVP delivers 500 eTOPS of 8-bit performance, or 1,000 eTOPS of 4-bit performance (a more realistic scenario is a mix of precisions used for different network layers, Kohn said). That represents a 42-fold performance boost over Ambarella’s second-generation SoC.
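
To get a feel for what a mix of precisions might mean for throughput, the following Python sketch estimates an effective rate when some fraction of a network's operations runs at 4-bit and the rest at 8-bit. It is a back-of-the-envelope model, not an Ambarella figure: the function name and the assumption that execution time scales inversely with the peak rate at each precision (ignoring memory and scheduling effects) are ours. Only the 500 and 1,000 eTOPS peak values come from the article.

```python
def effective_etops(frac_4bit: float,
                    etops_8bit: float = 500.0,
                    etops_4bit: float = 1000.0) -> float:
    """Estimate effective throughput for a mixed-precision workload.

    frac_4bit is the fraction of operations executed at 4-bit precision;
    the remainder is assumed to run at 8-bit. Time per operation is modeled
    as the inverse of the peak rate at each precision (an assumption).
    """
    if not 0.0 <= frac_4bit <= 1.0:
        raise ValueError("frac_4bit must be between 0 and 1")
    time_per_op = frac_4bit / etops_4bit + (1.0 - frac_4bit) / etops_8bit
    return 1.0 / time_per_op


for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{frac:.0%} of ops at 4-bit: ~{effective_etops(frac):.0f} eTOPS")
```

Because the slower 8-bit layers dominate the total time, the estimate is a weighted harmonic mean rather than a simple average: under this model, running half of the operations at 4-bit yields roughly 667 eTOPS, not 750.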

Future devices in the family will scale the size of the CVFlow engine, the image pipeline encoding and a mix of peripherals. Software will be transferable across the CV3 family for use in entry-level, mid-range and premium vehicles.

Overall, CV3-High consumes about 50 W of power and delivers four times the performance per watt of previous generations. Those gains were achieved in part via a transition to 5-nm process technology.
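
As a rough sanity check on these numbers, the Python sketch below divides the headline 500 eTOPS by the roughly 50 W power figure. This is our own arithmetic, not an Ambarella specification: the 500 eTOPS rating applies to the AI accelerator while the 50 W figure is for the SoC overall, and the article does not say how the fourfold comparison was measured, so the implied previous-generation value is only an illustration.

```python
CV3_HIGH_ETOPS = 500.0     # 8-bit AI performance quoted in the article
CV3_HIGH_WATTS = 50.0      # approximate power consumption quoted in the article
PERF_PER_WATT_GAIN = 4.0   # claimed improvement over previous generations


def perf_per_watt(etops: float, watts: float) -> float:
    """Return throughput per watt; watts must be positive."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return etops / watts


cv3_efficiency = perf_per_watt(CV3_HIGH_ETOPS, CV3_HIGH_WATTS)
# Implied by the fourfold claim under the assumptions stated above.
implied_previous = cv3_efficiency / PERF_PER_WATT_GAIN

print(f"CV3-High: ~{cv3_efficiency:.1f} eTOPS/W")
print(f"Implied previous generation: ~{implied_previous:.1f} eTOPS/W")
```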

The first SoCs in the Ambarella CV3 family are expected to be available for sampling during the first half of 2022.

This article was originally published on EE Times.

Sally Ward-Foxton covers AI technology and related issues for EETimes.com and all aspects of the European industry for EE Times Europe magazine. Sally has spent more than 15 years writing about the electronics industry from London, UK. She has written for Electronic Design, ECN, Electronic Specifier: Design, Components in Electronics, and many more. She holds a master’s degree in Electrical and Electronic Engineering from the University of Cambridge.

 
