“If you are not using sparsity and compression at this point, you are behind the curve”
SAN FRANCISCO – At the International Solid-State Circuits Conference (ISSCC) here, Samsung described a new neural-network accelerator for smartphones that matches blocks from rivals such as Huawei, and Toshiba detailed one for self-driving cars that pulls ahead of competitors such as Intel’s Mobileye.
A 5.5-mm² block in the latest 8-nm Exynos chip delivers 1.9 tera-operations/second (TOPS) at 8-bit precision while running at up to 933 MHz, said Jinook Song, a Samsung AI engineer. That is about the rating of the latest Kirin processor in Huawei phones and of the latest commercial IP blocks.
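The 1.9-TOPS figure is consistent with the usual peak-throughput arithmetic, in which each multiply-accumulate (MAC) counts as two operations: peak ops/s = MAC units × 2 × clock. The sketch below assumes 1,024 MAC units, a count not given in this article; with that assumption, 933 MHz works out to about 1.91 TOPS.

```python
def peak_tops(num_macs: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak dense throughput in tera-operations/second.

    Counts each multiply-accumulate as two operations (one multiply, one add),
    the common convention behind TOPS ratings.
    """
    return num_macs * ops_per_mac * clock_hz / 1e12

# Assumption: 1,024 MAC units (not stated in the article).
ASSUMED_MACS = 1024
CLOCK_HZ = 933e6  # from the article: up to 933 MHz

print(f"{peak_tops(ASSUMED_MACS, CLOCK_HZ):.2f} TOPS")  # 1.91 TOPS
```

If the real MAC count differs, the same formula still applies; only the first argument changes.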
However, the block reaches 6.937 TOPS when a neural network allows up to three-quarters of its weights to be pruned. Its efficiency ranges from 4.5 TOPS/W at 0.8 V, where it consumes 1.553 W, to 11.5 TOPS/W at 0.5 V, where it consumes 39 mW.
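Pruning raises effective throughput because a sparsity-aware engine can skip multiplications by zero weights. Under the idealized assumption that every pruned weight is skipped at no cost, removing a fraction s of the weights gives a speedup of at most 1/(1 − s). The sketch compares that bound with the reported figures and recovers efficiency as throughput divided by power; the function names are illustrative, and pairing the 6.937-TOPS peak with the 1.553-W operating point is an assumption.

```python
def ideal_sparse_tops(dense_tops: float, sparsity: float) -> float:
    """Upper bound on throughput if every zero weight is skipped for free."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    return dense_tops / (1.0 - sparsity)

def tops_per_watt(tops: float, watts: float) -> float:
    """Energy efficiency: throughput divided by power."""
    return tops / watts

DENSE_TOPS = 1.9         # article: dense 8-bit rating
REPORTED_SPARSE = 6.937  # article: with up to 75% of weights pruned

bound = ideal_sparse_tops(DENSE_TOPS, 0.75)
print(f"ideal bound: {bound:.2f} TOPS")                     # 7.60 TOPS
print(f"reported / bound: {REPORTED_SPARSE / bound:.0%}")   # 91%
# Assumption: the 6.937-TOPS peak is reached at the 0.8-V, 1.553-W point.
print(f"efficiency: {tops_per_watt(REPORTED_SPARSE, 1.553):.2f} TOPS/W")  # 4.47 TOPS/W
```

The roughly 4.5 TOPS/W result lines up with the low end of the quoted efficiency range, which is consistent with (though not proof of) that pairing.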
Samsung detailed its 8-nm Exynos deep-learning block and its performance. (Source: ISSCC)
Like mobile architectures from Cadence, Ceva, and Nvidia, the Samsung chip makes heavy use of pruning and quantization, running 8- and 16-bit operations to improve efficiency and exploit network sparsity. “If you are not using sparsity and compression at this point, you are behind the curve,” said Mike Demler, an analyst with the Linley Group who attended the session.
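Pruning and quantization are generic techniques, and the article does not say exactly how Samsung applies them. As a minimal sketch, assuming simple magnitude pruning and symmetric per-tensor 8-bit quantization (common choices, not confirmed for this chip), the code below zeroes the smallest-magnitude weights and maps the survivors to integers in [-127, 127].

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: w ~= q * scale, with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

w = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, -0.2, 0.04]
pruned = magnitude_prune(w, 0.75)  # keeps the 2 largest-magnitude weights
q, scale = quantize_int8(pruned)
print(pruned)  # [0.8, 0.0, 0.0, 0.0, -0.6, 0.0, 0.0, 0.0]
print(q)       # [127, 0, 0, 0, -95, 0, 0, 0]
```

Hardware that exploits sparsity would then skip the zero entries rather than multiply by them; a 16-bit variant would simply use a wider integer range.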
It is not clear whether the implementation is a full dataflow architecture, another trend in the latest IP blocks, Demler said.
The Samsung design appears in the latest Exynos chip and is expected to ship in at least some of the new handsets that the South Korean giant could announce as early as this week. To enhance parallelism, it uses two cores, each with two data-staging units sharing 512-KByte scratch pads.
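Scratch-pad size matters because it bounds how much of a layer's weights can stay on-chip at once, and that bound doubles when weights are 8-bit rather than 16-bit. The sketch below counts how many scratch-pad-sized tiles a layer's weights would need. The 512-KB budget per scratch pad and the example layer shape are illustrative assumptions, and the sketch ignores activations and partial sums, which would also compete for the space.

```python
import math

SCRATCHPAD_BYTES = 512 * 1024  # assumption: one 512-KB scratch pad as the budget

def weight_tiles(out_ch: int, in_ch: int, k: int, bits: int) -> int:
    """Number of scratch-pad-sized tiles needed to hold a conv layer's weights."""
    weight_bytes = out_ch * in_ch * k * k * bits // 8
    return math.ceil(weight_bytes / SCRATCHPAD_BYTES)

# Hypothetical 3x3 convolution with 256 input and 512 output channels.
for bits in (8, 16):
    print(bits, "bit:", weight_tiles(512, 256, 3, bits), "tiles")
# 8 bit: 3 tiles
# 16 bit: 5 tiles
```

Pruned, compressed weights would shrink these counts further, which is one reason sparsity and on-chip memory size are usually considered together.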
Toshiba pulls ahead of Mobileye but not Nvidia
Toshiba described an accelerator for advanced driver-assistance systems (ADAS) that delivers up to 20 TOPS and 2 TOPS/W, up from 1.9 TOPS and 564 GOPS/W for its 2015 chip.
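Comparing the two generations requires putting efficiency in the same units first, since the older figure is quoted in GOPS/W and the newer one in TOPS/W (1 TOPS/W = 1,000 GOPS/W). A minimal worked comparison using only the article's numbers:

```python
def gain(new: float, old: float) -> float:
    """Generation-over-generation improvement factor."""
    return new / old

throughput_gain = gain(20.0, 1.9)      # TOPS, 2015 chip vs. new chip
efficiency_gain = gain(2000.0, 564.0)  # 2 TOPS/W converted to 2,000 GOPS/W
print(f"throughput: {throughput_gain:.1f}x, efficiency: {efficiency_gain:.1f}x")
# throughput: 10.5x, efficiency: 3.5x
```

Throughput grew roughly three times faster than efficiency, so the new chip's higher peak comes partly from doing more work per watt and partly from drawing more power.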
Two new image-processing cores carry most of the work in the new version. Each core processes 8-Mpixel images at up to 40 frames/second.
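Those figures translate into a pixel rate and a per-frame time budget. The sketch treats "8 Mpixel" as exactly 8 million pixels and assumes, which the article does not state, that the two cores handle independent streams at full rate.

```python
PIXELS_PER_FRAME = 8e6  # article: 8-Mpixel images (taken as exactly 8 million)
FRAMES_PER_SECOND = 40  # article: up to 40 frames/s
NUM_CORES = 2           # article: two new image-processing cores

per_core = PIXELS_PER_FRAME * FRAMES_PER_SECOND
frame_budget_ms = 1000 / FRAMES_PER_SECOND
print(f"per core: {per_core / 1e6:.0f} Mpixel/s")                # 320 Mpixel/s
print(f"both cores: {NUM_CORES * per_core / 1e6:.0f} Mpixel/s")  # 640 Mpixel/s
print(f"time per frame: {frame_budget_ms:.0f} ms")               # 25 ms
```

At 40 frames/s, every processing stage for a frame has to fit in about 25 ms if the pipeline is to keep up with the camera.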
The 94.5-mm² chip is made in a 16-nm process. Compared with the 2015 design, it adds four DSPs, believed to be Ceva NeuPro cores, as well as three new specialty accelerator blocks.
The image processors in Toshiba’s latest ADAS processor pack a punch. (Source: ISSCC)
“I was surprised; there is a lot in this chip, and with four levels of processing, it looks pretty complete,” said analyst Demler.
For example, the Toshiba design packs eight Cortex-A53 cores and two Cortex-R4s, compared with four MIPS cores and no R4 equivalents on the latest Mobileye chips. In addition, the Mobileye chips consume 5 W and are made in a 7-nm process, making them more power-hungry and more expensive than the Toshiba SoC, he said.
However, it was not clear whether the 2.7-W rating for the Toshiba design covers the entire SoC. Mobileye also supports LiDAR and radar, while it was not clear which sensors beyond cameras the Toshiba chip supports. And Nvidia’s Xavier design is likely more powerful, he added.