AI Accelerator Targets Video Analytics at the Edge

Article By : Sally Ward-Foxton

The US AI accelerator startup, one of the first to talk about its compute-in-memory technique, has launched a 4-W chip for video analytics at the edge...

AI accelerator chip startup Mythic has launched its first product, a 35-TOPS “high-end edge” accelerator whose analog compute-in-memory architecture enables low power consumption and low cost, alongside low latency and deterministic behavior.

The M1108 uses Mythic’s analog compute-in-memory technique based on 40-nm Flash memory cells. It’s aimed at edge applications such as power-over-Ethernet security cameras that need to run sophisticated AI models within a tight power budget. Another likely application is video analytics boxes, which need to accelerate multiple AI models on high-resolution footage while managing power and heat.

Mythic M.2 card
Mythic M.2 card (Image: Mythic)

Mythic’s AI accelerator chip is capable of a substantial 35 TOPS. The market leader in this space, Nvidia’s Xavier AGX, clocks in at 32 TOPS.

The M1108’s typical power consumption is just 4 W, compared to Xavier AGX’s typical 10-30 W. Mythic’s solution is also smaller, since no external DRAM is required, and the company expects the M1108 to compare well on price because its 40-nm silicon doesn’t require a cutting-edge process node.

Mythic says its M1108 can run 870 fps on ResNet-50 (batch size 1) and 60 fps on Yolo v3-608×608 (batch size 1 video feed).

Speaking during the CogX festival in June 2019, Mythic CEO Mike Henry said that the company was planning to release samples around the end of that year. This puts today’s M1108 launch almost a year behind schedule. In an interview this week with EE Times, Henry put this delay down to the “Herculean” effort behind the scenes to build a software flow for the chip.

“We decided to hold off on formally launching [the M1108] until we could show highly competitive benchmark numbers,” Henry said. “Building the whole software ecosystem with a flexible compiler on a tile-based architecture and getting everything to work was a huge effort. We were showing off simpler [applications], such as keyword spotting, quite early, but the really big, powerful networks getting into the hundreds of frames per second and sub-10 millisecond latency, it’s a tremendous amount of work on the software to get to that point.”

Mythic AI Accelerator software flow
Software flow for Mythic’s AI accelerator, the M1108 (Image: Mythic)

The AI accelerator chip market has become increasingly competitive during this time. Has the delay cost the company customers?

Henry says not. Seeing other chips launch affirmed Mythic’s conviction that it has a unique balance of performance, power and cost, he said. Mythic’s view is that many other “edge” AI chips are really server-class products with 10-15 W power requirements, or else very low-end microwatt devices, leaving the more difficult “high-end edge” segment relatively empty of competition.

Analog compute
Mythic’s AI accelerator chip uses 108 compute tiles which rely on analog compute-in-memory techniques. In each tile, Mythic’s analog compute engine (ACE) sits alongside a digital SIMD vector engine, a 32-bit RISC-V processor, a network on chip (NoC) router and some local SRAM. The engine supports the equivalent of INT4, INT8 and INT16 operations and its overall capacity is 113 million weights, enough to run several separate complex AI models simultaneously. As computation is performed within Flash memory, external DRAM is not required.
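A quick back-of-envelope sketch makes these figures concrete. The only numbers taken from the article are the 113 million-weight capacity and the 108 tiles; the ResNet-50 parameter count is the widely cited figure for that model, not something Mythic stated, and the even per-tile split is an assumption for illustration:

```python
# Back-of-envelope check of the quoted on-chip capacity (illustrative;
# assumes weights are spread evenly across tiles, which Mythic does not state).
total_weights = 113_000_000   # quoted on-chip weight capacity
tiles = 108                   # quoted tile count
resnet50_params = 25_600_000  # widely cited ResNet-50 parameter count

per_tile = total_weights // tiles
print(f"~{per_tile:,} weights per tile")

# With one weight per storage cell, ResNet-50 occupies well under a quarter
# of the array -- consistent with the claim that several complex models
# can be resident on chip simultaneously.
fraction = resnet50_params / total_weights
print(f"ResNet-50 fills {fraction:.0%} of capacity")
```

On these assumptions, each tile holds roughly a million weights, and a ResNet-50-class model leaves most of the array free for other networks.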

Dataflow is engineered pre-runtime by the compiler. Since each tile contains its own resources, such as SRAM, there is no contention for shared memory, and the chip operates deterministically: parameters such as latency are known at compile time.
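The consequence of a statically scheduled, contention-free dataflow can be sketched in a few lines. The stage names and millisecond figures below are hypothetical, purely to illustrate the point that end-to-end latency reduces to a static sum the compiler can report before the chip ever runs:

```python
# Illustrative only -- hypothetical pipeline stages and latencies, not
# Mythic's. Because the compiler fixes tile assignments and routes before
# runtime and no resources are shared, total latency is a compile-time sum.
schedule = [
    ("conv1",  0.40),   # (stage name, latency in ms) -- made-up numbers
    ("block1", 1.10),
    ("block2", 1.30),
    ("fc",     0.25),
]

latency_ms = sum(ms for _, ms in schedule)
print(f"compile-time latency bound: {latency_ms:.2f} ms")
```

On a contended architecture, by contrast, the same sum would only be a lower bound, with the real figure depending on runtime traffic.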

Mythic AI Accelerator tile
The structure of each of the 108 tiles on Mythic’s chip. The analog compute engine (ACE) uses an array of Flash memory cells, ADCs and DACs (Image: Mythic)

Henry and Mythic’s senior VP product and business development Tim Vehling were at pains to point out that their chip uses a “true” compute-in-memory technique.

“We do the full store and compute all on the same memory storage unit, the same Flash transistor,” said Vehling. “Some other analog architectures might do analog computation, but they’re still using digital memory for parameter storage. So they are still doing some sort of a data fetch. They’re not going to get the same density and the same speed [as Mythic’s] true compute-in-memory.”
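The physics Vehling is describing can be sketched numerically. In the toy model below (illustrative only, not Mythic's actual circuit), each weight is stored as a cell conductance, a DAC drives each row with an input voltage, the per-cell currents I = G·V sum on the column wire by Kirchhoff's current law, and an ADC quantizes the accumulated column current back into the digital domain:

```python
import numpy as np

# Toy compute-in-memory model: weights as conductances, inputs as DAC
# voltages, the multiply-accumulate happening "for free" on the bit line,
# and an ADC quantizing the result. Illustrative only.
rng = np.random.default_rng(0)
weights = rng.integers(-128, 128, size=(64, 16))  # INT8 weights -> conductances
inputs = rng.integers(0, 256, size=64)            # 8-bit activations -> voltages

# Analog multiply-accumulate: one summed current per output column.
column_currents = inputs @ weights

def adc(x, bits=8):
    """Quantize an analog value to 2**bits uniform levels over its range."""
    lo, hi = x.min(), x.max()
    step = (hi - lo) / (2**bits - 1)
    codes = np.round((x - lo) / step).astype(int)
    return codes, lo, step

codes, lo, step = adc(column_currents)
reconstructed = codes * step + lo

# Quantization error is bounded by half an ADC step; the engineering
# challenge is achieving this accuracy at speed and scale.
err = np.max(np.abs(reconstructed - column_currents))
print("max ADC reconstruction error:", err)
```

The sketch also hints at why ADC quality matters so much (a point Henry returns to below): the ADC's resolution and linearity directly bound how faithfully the analog sum re-enters the digital domain.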

Mythic AI Accelerator Chip
Mythic’s M1108 chip (Image: Mythic)

Henry points to Mythic’s high-performance ADC technology as one of the key performance differentiators versus other analog compute offerings; it is what allows Mythic to scale up to the number of tiles needed to hit 35 TOPS.

Aside from raw performance, Mythic’s other key differentiator versus other analog approaches is accuracy.

“We’re hitting high enough levels of accuracy that we can show popular neural networks, like ResNet-50, that many have said would never run in the analog domain in a million years,” Henry said, adding that many analog compute chip companies do not talk about the accuracy they can achieve on complex neural networks. “I still am very skeptical that anybody other than us is going to be showing popular neural networks running close to INT8 accuracy in the analog domain. I think that’s going to be always our unique edge,” he said.

Security cameras
Mythic’s high-end edge market includes target applications such as security cameras, which Henry described as a “hotly contested” but high-volume market that is quickly adopting AI accelerators. This is something of a sweet spot for Mythic: these cameras often run on power over Ethernet, so the power budget is tight, designs are compact, and there is no active cooling, yet the application demands real-time processing with low latency.

Another key market is edge boxes for video analytics – perhaps processing video streams from an entire system of cameras. Power and cost constraints are still tight, Henry said, and these customers want to run multiple neural networks simultaneously.

The M1108’s nomenclature (first generation, 108 tiles) implies that further product generations, or different tile configurations, are in the pipeline. The company expects to build a family of products around the same 40-nm compute tile as the M1108. This particular part was sized for the security camera and video analytics “sweet spot,” and for the maximum space available on an M.2 card, Vehling said (the M1108 comes in a 19 × 19 mm package). He added that products with both smaller and larger tile counts are technically possible, but declined to confirm whether both are on the company’s roadmap. Multi-chip PCIe cards, however, definitely are.

Samples of the M1108 are available now, alongside single-chip M.2 and PCIe cards.