Cadence inference core appears to outperform Arm’s ML core
SAN JOSE, Calif. — Cadence announced the DNA 100, an inference core with up to four times the multiply-accumulate (MAC) units and up to 12 times the performance of its Vision C5, launched last year. The core supports sparsity in weights and activations and can prune neural networks to deliver higher performance.
To date, high-end smartphones have led the way in adopting deep learning for inference jobs, with handset SoC vendors such as MediaTek using Cadence’s Vision P6 core. Designers are now working on AI acceleration in SoCs for surveillance cameras, smart speakers, cars, and AR/VR and IoT devices, said Lazaar Louis, a senior director of product management in Cadence’s Tensilica group.
On ResNet-50, Cadence clocked a 16-nm DNA 100 with 4,000 MACs at up to 2,550 frames/second and up to 3.4 TMACs/W. A single 16-nm core running at 1 GHz can deliver up to 8 TMACs/second (12 TMACs/second using network pruning), and multiple cores can be embedded in an SoC to reach hundreds of TMACs/second.
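As a rough sanity check on these figures, the sketch below computes raw peak throughput as MAC count times clock rate. It assumes, as a simplification the article does not state, that each MAC unit completes one multiply-accumulate per cycle; `peak_tmacs` is a hypothetical helper. Under that assumption, 4,000 MACs at 1 GHz give about 4 TMACs/second, so the quoted 8 and 12 TMACs figures would likely be effective rates that credit work avoided by skipping zeros and pruning, rather than raw MAC throughput.

```python
# Hypothetical helper. Assumes each MAC unit completes one multiply-accumulate
# per clock cycle, which the article does not state.
def peak_tmacs(num_macs: int, clock_hz: float) -> float:
    """Raw peak throughput in tera-MACs per second."""
    return num_macs * clock_hz / 1e12


raw = peak_tmacs(4_000, 1e9)  # 4,000 MACs at 1 GHz
print(f"raw peak: {raw:.1f} TMACs/s")

# Ratio of each quoted figure to the raw peak; values above 1.0 suggest
# effective throughput that credits skipped (zero or pruned) operations.
for label, quoted in [("quoted", 8.0), ("quoted with pruning", 12.0)]:
    print(f"{label}: {quoted / raw:.1f}x raw peak")
```

Running it prints a raw peak of 4.0 TMACs/s, with the quoted figures at 2.0x and 3.0x that peak.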
The numbers appear to beat those of Arm’s first ML core, which Arm said in May targets 4.6 tera-operations/second (TOPS) and 3 TOPS/W at 7 nm for high-end handsets.
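Because Cadence quotes MACs and Arm quotes operations, the figures are not directly comparable. A common convention counts one MAC as two operations (a multiply and an add); the sketch below applies that assumption, which the article does not confirm for either vendor, to express both in TOPS. `tmacs_to_tops` is a hypothetical helper. The result is only a rough comparison: Cadence’s numbers are for 16 nm while Arm’s are 7-nm targets, and Cadence’s 8-TMACs figure may not be raw throughput.

```python
# Assumption: one multiply-accumulate counts as two operations (multiply + add).
# This is a common convention, not something either vendor states here.
OPS_PER_MAC = 2


def tmacs_to_tops(tmacs: float) -> float:
    """Convert tera-MACs to tera-operations (works per second or per watt)."""
    return tmacs * OPS_PER_MAC


# Cadence figures from the article (16 nm), converted to TOPS and TOPS/W.
cadence = {"throughput": tmacs_to_tops(8.0), "efficiency": tmacs_to_tops(3.4)}
# Arm's stated targets (7 nm), already in TOPS and TOPS/W.
arm = {"throughput": 4.6, "efficiency": 3.0}

for metric in ("throughput", "efficiency"):
    print(f"{metric}: Cadence ~{cadence[metric]:.1f} vs Arm {arm[metric]:.1f}")
```

Under this convention, Cadence comes out at roughly 16 TOPS and 6.8 TOPS/W against Arm’s 4.6 TOPS and 3 TOPS/W; a different ops-per-MAC convention would shift the Cadence numbers proportionally.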
To deliver that performance, the Cadence core packs an upgraded MAC block and a new custom DSP. It also automatically skips sparse (zero-valued) weights and activations to maximize utilization of the MAC array. In addition, it can prune neural nets, so users who opt to retrain them can achieve further performance gains.
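The article does not describe how the DNA 100 implements zero skipping or pruning. As a software illustration of the general ideas only, the sketch below uses simple magnitude pruning (zeroing the smallest weights), a common technique that may differ from Cadence’s, and a dot product that issues a MAC only when both operands are nonzero. The function names and the sample weights and activations are hypothetical.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    k = int(sparsity * len(weights))
    # Indices of the k smallest-magnitude weights.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def sparse_dot(weights: list[float], activations: list[float]) -> tuple[float, int]:
    """Dot product that issues a MAC only when both operands are nonzero.

    Returns (result, number_of_macs_issued)."""
    acc, macs = 0.0, 0
    for w, a in zip(weights, activations):
        if w != 0 and a != 0:  # a zero operand contributes nothing; skip it
            acc += w * a
            macs += 1
    return acc, macs


# Hypothetical layer slice: weights, and activations with zeros (e.g. after ReLU).
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.3, -0.03]
activations = [1.0, 2.0, 0.0, 1.5, 0.5, 0.0, 2.0, 1.0]

_, skip_macs = sparse_dot(weights, activations)
pruned = magnitude_prune(weights, sparsity=0.5)
_, pruned_macs = sparse_dot(pruned, activations)
print(f"dense: {len(weights)} MACs, zero skipping: {skip_macs}, after pruning: {pruned_macs}")
```

With these sample values, a dense dot product issues 8 MACs, skipping zero activations cuts that to 6, and pruning half the weights cuts it to 3. Pruning changes the result slightly, which is why a pruned network is typically retrained to recover accuracy.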
The core supports 8-bit integer as well as 16-bit floating-point and integer formats. It runs graphs created with TensorFlow and Android neural-net frameworks. Cadence is developing support for Facebook’s Glow compiler and the associated PyTorch 1.0 framework, and its roadmap includes support for Amazon’s MXNet and other frameworks.
The DNA 100 core will be available to select customers in December, with general availability before April.
— Rick Merritt, Silicon Valley Bureau Chief, EE Times