Roundup of AI Chip Startups: Growth to Maturity

Article By: Sally Ward-Foxton

Are some of the dozens of AI chip startups beginning to mature into companies with rounded offerings?

Over the last couple of years, there has been a widely documented Cambrian explosion of AI chip startups. Some of the earliest are beginning to mature, expanding their offerings with modules and cards built around their chips, announcing real-world design wins and building global distribution channels.

Here’s a roundup of some recent announcements that EE Times noted with interest.

Hailo modules
Israeli AI chip startup Hailo has launched two AI accelerator modules based on its Hailo-8 AI accelerator chip for edge applications. The modules are in standard M.2 and mini-PCIe formats and will suit fanless “edge boxes” in smart city, smart retail, smart home and Industry 4.0 applications. These edge boxes might perform tasks such as analyzing multiple video streams, in cases where the analysis must be done at the edge to reduce latency and avoid privacy issues.

The Hailo-8’s “structure-defined dataflow” architecture allows it to perform 26 TOPS at 3 TOPS/W. It is automotive qualified as suitable for ASIL-B applications, and Hailo has obtained AEC-Q100 Grade 2 qualification for it.
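As a back-of-envelope check, the two quoted figures together imply a power draw at peak throughput. The sketch below assumes the 26 TOPS and 3 TOPS/W figures apply at the same operating point, which the article does not confirm; the function and variable names are illustrative only.

```python
# Figures quoted in the article for the Hailo-8.
PEAK_TOPS = 26.0               # peak throughput, tera-operations per second
EFFICIENCY_TOPS_PER_W = 3.0    # quoted power efficiency


def implied_power_watts(tops: float, tops_per_watt: float) -> float:
    """Power draw implied by a throughput figure and an efficiency figure."""
    return tops / tops_per_watt


# Assumption: both figures describe the same operating point.
hailo8_watts = implied_power_watts(PEAK_TOPS, EFFICIENCY_TOPS_PER_W)
print(f"Implied power at peak: {hailo8_watts:.1f} W")
```

Under that assumption the chip would draw a little under 9 W at peak, which is consistent with the fanless edge boxes the modules target.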

Hailo recently released new figures which showed its modules roundly beating Intel Myriad-X and Google Edge TPU (Coral M.2) modules on various performance benchmarks, including EfficientNet-EdgeTPU, which was optimized for Google Edge TPU. Hardly surprising, given that the Hailo-8 boasts 26 TOPS and each of the competing modules offers just 4 TOPS peak performance. What was a little surprising was that Hailo’s internal tests placed the Google Edge TPU module at, on average, twice as performant as the Intel Myriad-X module.

Hailo benchmarked its chip against two market leaders. Bars show frames per second processed using each model. (Image: Hailo)

The Hailo-8 is already used in Foxconn’s edge box, BOXiedge, designed for processing video in edge applications. This fanless box features Socionext’s SynQuacer SCA11 parallel processor alongside the Hailo-8 for deep learning inference acceleration.

The Tel-Aviv-based startup was founded in 2017 and now has more than 100 employees. It has raised more than $88m to date from strategic investors such as NEC and ABB.

Groq cards
Groq is now shipping its tensor streaming processor (TSP) silicon as a server node which integrates eight of its PCIe cards into a single chassis for AI inference in the data center. Groq’s TSP is one of the most powerful chips in the industry at 1 POPS (1000 TOPS). Per Groq’s figures, it can achieve 18,900 IPS (inferences per second) on ResNet-50 v2 at batch size one, making it the fastest commercially available AI accelerator chip.

Groq’s processor has a simplified hardware design with execution planning happening in software. Groq’s compiler essentially orchestrates all the dataflow and timings required to make sure calculations happen without stalls, making latency and performance predictable.

The company recently announced it had taken on more funding, though it declined to reveal how much was raised, and hinted that it was broadening its applicable markets to include automotive.

The Groq node is a 5U box containing eight TSP chips. It offers up to 6 POPS of AI inference performance. (Image: Groq)

The new Groq node combines eight Groq cards (eight TSP chips) and offers 6 POPS of performance while consuming 3.3 kW of power in a 5U form factor. Groq says this combination of performance and power consumption means a significant benefit in terms of total cost of ownership (TCO) for data centers.
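To put the node figures in more familiar units, the quoted 6 POPS and 3.3 kW can be reduced to a performance-per-watt number and a per-card share. This is only a rough sketch using the numbers quoted above; the variable names are illustrative, and the article does not explain why the node total is lower than eight times the 1 POPS chip figure.

```python
# Figures quoted in the article for the Groq node.
NODE_POPS = 6.0          # node performance, peta-operations per second
NODE_POWER_W = 3300.0    # node power consumption (3.3 kW) in watts
CARDS_PER_NODE = 8       # one TSP chip per card

node_tops = NODE_POPS * 1000              # 1 POPS = 1000 TOPS
tops_per_watt = node_tops / NODE_POWER_W  # node-level efficiency
tops_per_card = node_tops / CARDS_PER_NODE

print(f"Node efficiency: {tops_per_watt:.2f} TOPS/W")
print(f"Per-card share of node performance: {tops_per_card:.0f} TOPS")
```

On these figures the node works out to roughly 1.8 TOPS/W, with each card contributing about 750 TOPS of the node total.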

Graphcore distributors
British AI accelerator startup Graphcore has announced it has built a global network of channel partners.

The company was one of the first to release silicon, back in 2018, and announced a second-generation chip over the summer. The Colossus Mark 2 is designed to place Graphcore firmly in competition with market leader Nvidia, achieving around 250 TFLOPS for AI training in the data center. Graphcore’s system-level solution, the IPU Machine, is a 1U server blade containing four Colossus Mark 2 chips which offers a petaflop of AI compute at FP16 precision. There is also the IPU-POD, which combines sixteen IPU Machines that can work together or in parallel.
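The arithmetic behind these figures is simple to reproduce. The sketch below uses the approximate per-chip number quoted above and assumes straight-line scaling; the IPU-POD total it prints is a derived estimate, not a figure Graphcore is quoted as giving here, and the variable names are illustrative.

```python
# Figures quoted in the article (the per-chip number is approximate).
CHIP_TFLOPS = 250.0      # Colossus Mark 2, "around 250 TFLOPS"
CHIPS_PER_MACHINE = 4    # chips per IPU Machine
MACHINES_PER_POD = 16    # IPU Machines per IPU-POD

# Four chips at about 250 TFLOPS each give the quoted petaflop per machine.
machine_pflops = CHIP_TFLOPS * CHIPS_PER_MACHINE / 1000

# Assumption: performance scales linearly across machines in a pod.
pod_chips = CHIPS_PER_MACHINE * MACHINES_PER_POD
pod_pflops = machine_pflops * MACHINES_PER_POD

print(f"IPU Machine: {machine_pflops:.0f} PFLOPS")
print(f"IPU-POD: {pod_chips} chips, about {pod_pflops:.0f} PFLOPS if scaling is linear")
```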

Graphcore’s IPU-POD houses sixteen IPU Machines (64 IPU chips) suitable for HPC applications (Image: Graphcore)

Graphcore says its IPUs are already installed in financial services, healthcare, consumer internet, academic research and many other fields.

Graphcore’s Elite Partner Program is a network of distributors and resellers for the IPU Machine. They include server manufacturers Dell, Inspur and 2CRSI as well as scientific supercomputer builder Atos, European distributor Boston Limited, systems integrator Business Systems International (BSI), Chinese distributor Digital China, scientific compute distributor Lambda, Hong Kong-based Macnica Cytech, supplier to US federal customers Meadowgate Technologies, Korean distributor Megazone, British HPC technology distributor OCF, HPC server builder Penguin Computing, distributor Tech Data Europe and US, and mission-critical/US government specialists Wildflower International.
