Expects to add Caffe support in the future for its 20-TOPS/W chip
SAN JOSE, Calif. — Battery-powered devices will get a new option for hardware-accelerated speech interfaces next year if Kurt Busch hits his targets this year. Syntiant's chief executive aims to sample a novel machine-learning chip in 2018 and raise a Series B round to manufacture it in volume.
The startup is designing a 20-tera-operations-per-watt chip that uses 4- to 8-bit precision to speed up AI operations, initially for voice recognition. It computes TensorFlow neural-network jobs in the analog domain using an array of hundreds of thousands of NOR cells.
Syntiant will release a reference design pairing its sub-watt chip with an Infineon MEMS microphone. If it is successful, the two will collaborate on other designs. “We want to make it extremely easy to add voice control to any kind of device,” said Busch.
“Today, the ecosystem is only supported by devices plugged into the wall. Nobody is offering an always-on, battery-powered solution…We can be a leader enabling that,” said Busch.
Syntiant is using a processor-in-memory architecture defined by its CTO, Jeremy Holleman, a researcher at the University of North Carolina at Charlotte. Holleman published academic work in the area as far back as 2014 at the International Solid-State Circuits Conference.
Another startup with academic roots, Mythic, is taking a similar approach using a 40nm Fujitsu NOR cell. But it appears to be targeting imaging applications more than speech and has Lockheed Martin as a partner for use of its chips in drones.
IBM Research is working on a similar architecture based on ReRAM. Today’s emerging MRAM, memristor and other memories are reigniting academic work in processor-in-memory chips that dates back to the 1990s.
The architecture is gaining attention because it is ideal for executing at very low power the massively parallel multiply-accumulate operations in deep learning. Syntiant and Mythic both claim they will process machine learning jobs at orders of magnitude less power than digital chips.
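The appeal of in-memory computing can be seen in a short sketch. The example below is purely conceptual and not Syntiant's actual design: it models how an analog memory array that stores weights as cell conductances can perform an entire matrix-vector product of multiply-accumulates in one parallel step, where a digital chip would iterate over each MAC. The array dimensions and bit widths are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 256x64 "cell array" holding 4-bit weights
# (16 conductance levels, 0..15) -- dimensions are assumptions.
weights = rng.integers(0, 16, size=(256, 64))

# 8-bit activations driven onto the 256 input lines.
inputs = rng.integers(0, 256, size=256)

# In the analog domain, currents summing along each column perform all
# 256 * 64 multiply-accumulates simultaneously; here one matrix-vector
# product stands in for that single parallel step.
outputs = inputs @ weights  # shape (64,): one accumulated value per column
```

The point of the sketch is the ratio: one "cycle" of the array replaces 16,384 sequential multiply-accumulate operations, which is why both startups claim orders-of-magnitude power savings on deep-learning workloads.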
Startup leverages Google’s TensorFlow
An Achilles' heel of the approach is that the chips are hard to program, due both to their massive parallelism and to their use of analog computing. To overcome the hurdles, Syntiant's chip basically will act as "a silicon implementation of Google's TensorFlow framework," Busch said.
The company’s Syntiant Simulator is essentially an add-on for TensorFlow supporting the chip’s low-precision math and other unique hardware characteristics. Users will train neural nets on Google’s AI cloud service and download weights it produces to the chip via the simulator.
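The deploy step described above implies quantizing cloud-trained floating-point weights down to the chip's 4- to 8-bit precision before download. The following is a hedged sketch of what such a step could look like; the uniform symmetric scheme, the `quantize_weights` helper, and the bit width are illustrative assumptions, not Syntiant's actual format.

```python
import numpy as np

def quantize_weights(w, bits=4):
    """Uniform symmetric quantization of float weights to `bits` precision.

    Returns the integer codes the low-precision hardware would store,
    plus the scale factor needed to interpret its outputs.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for signed 4-bit
    scale = np.max(np.abs(w)) / qmax      # map the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

# Toy trained weights; q * scale approximates them within half a step.
w = np.array([0.9, -0.31, 0.05, -0.7])
q, scale = quantize_weights(w, bits=4)
```

A simulator add-on like the one Syntiant describes would also need to model the hardware's other analog quirks during training so that accuracy measured in TensorFlow predicts accuracy on the chip.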
The downside of the approach is that users will not be able to work in the many other AI frameworks, including those favored by Amazon, Baidu, Microsoft and others. However, Syntiant eventually expects to add support for other frameworks such as Caffe, which is popular in China.
All the architectures face two other challenges: the designs are more difficult to port to new processes, and they need to handle a basket of analog effects.
In addition, MRAM or other memories are expected to replace flash beyond the 28nm node. So, the startups face a potentially significant redesign early on in their road maps.
Syntiant aims to start with a microwatt-class device handling perhaps a few million weights and migrate to handling larger neural nets. However, it doesn’t expect the architecture will scale to handling training jobs or use in data centers.
“Our first device is supporting one [neural net] architecture for base functionality with an option to do two or three others. The first product targets speech and can do limited imaging jobs,” he said.
So far, the startup is staying mum on the types and sizes of neural nets it will support as well as its foundry and NOR supplier. Busch did say it is not using Intel's foundry service but "a COTS flow and a merchant foundry."
Intel Capital led the $5 million Series A round the startup closed in May to get to first samples. Busch is now working on a Series B round to fund general availability of the chip in 2019.
“It feels a lot like where we were in networking in 1995 or so,” said Busch, who started his career as a design engineer working on Ethernet and Token Ring chips.
"There was a lot of pent-up demand. Deep learning is a powerful methodology, and the current CPUs and GPUs are not optimized for its needs on parallelism and memory access," he added.
— Rick Merritt, Silicon Valley Bureau Chief, EE Times