Software hides the FPGA hardware from data scientists and neural network developers, and ships with Xilinx Alveo cards aimed at replacing GPUs in AI accelerators...
AI software startup Mipsology is working with Xilinx to enable FPGAs to replace GPUs in AI accelerator applications using only a single additional command. Mipsology’s “zero effort” software, Zebra, converts GPU code to run on Mipsology’s AI compute engine on an FPGA without any code changes or retraining necessary.
Xilinx announced today that it is shipping Zebra with the latest build of its Alveo U50 cards for the data center. Zebra already supports inference acceleration on other Xilinx boards, including Alveo U200 and Alveo U250.
“The level of acceleration that Zebra brings to our Alveo cards puts CPU and GPU accelerators to shame,” said Ramine Roane, Xilinx’s vice president of marketing. “Combined with Zebra, Alveo U50 meets the flexibility and performance needs of AI workloads and offers high throughput and low latency performance advantages to any deployment.”
FPGAs have historically been notoriously difficult for non-specialists to program, but Mipsology wants to make them a plug-and-play solution as easy to use as a CPU or GPU. The idea is to make it as easy as possible to switch from other types of acceleration to FPGA.
“The best way to see [Mipsology] is that we do the software that goes on top of FPGAs to make them transparent in the same way that Nvidia did with CUDA and cuDNN to make the GPU completely transparent for AI users,” said Mipsology CEO Ludovic Larzul, in an interview with EE Times.
Crucially, this can be done by non-experts, without deep AI expertise or FPGA skills, as no model retraining is needed to transition.
“Ease of use is very important, because when you look at people’s AI projects, they often don’t have access to the AI team who designs the neural network,” Larzul said. “Typically if someone puts in place a system of robots, or a video surveillance system… they have some other teams or other parties developing the neural networks and training them. And once they get [the trained model], they don’t want to change it because they don’t have the expertise.”
Why would Xilinx support third-party software when it already has its own neural network accelerator engine (XDNN)?
“The pitch in one sentence is: we are doing better,” Larzul said. “Another sentence would be: ours works.”
Mipsology has its own compute engine within Zebra, which supports customers’ existing convolutional neural network (CNN) models, unlike XDNN which Larzul said has support for plenty of demos but is less well-suited to custom neural networks. This, he said, made getting custom networks up and running with XDNN “painful”. While XDNN can compete in applications where there is no threat from GPUs, Zebra is intended to enable FPGAs to take on GPUs head-on based on performance, cost and ease of use.
Most customers’ motivation to change from GPU solutions is cost, Larzul said.
“They want to lower the cost of the hardware, but don’t want to have to redesign the neural network,” he said. “There is a non-recurring cost [that’s avoided] because we are able to replace GPUs transparently, and there is no re-training or modification of the neural network.”
FPGAs also offer reliability, in part because they are less aggressive on silicon real estate and often run cooler than other accelerator types including GPUs, according to Larzul. This is especially important in the data center where long-term maintenance costs are significant.
“Total cost of ownership is not just the price of the board,” Larzul said. “There is also the price of making sure the system is up and running.”
Zebra is also aiming to make FPGAs compete on performance. While FPGAs typically offer fewer TOPS (tera operations per second) than other accelerators, they are able to use those TOPS more efficiently thanks to Zebra’s carefully designed compute engine, Larzul said.
“That’s something that most of the ASIC start-ups accelerating AI have forgotten — they are doing a very big piece of silicon, trying to pack in more TOPS, but they haven’t thought about how you map your network on that to be efficient,” he said, noting that Zebra’s FPGA-based engine is able to process more images per second than a GPU with 6x the amount of TOPS.
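The arithmetic behind that claim can be sketched with back-of-envelope numbers. The figures below are hypothetical, chosen only to show how a device with far fewer peak TOPS can still deliver more images per second once utilization is taken into account; they are not measured results for any Xilinx or Nvidia part.

```python
# Hypothetical numbers for illustration only, not measured figures:
# delivered throughput depends on how many of the peak TOPS are actually used.

def throughput(peak_tops, utilization, ops_per_image):
    """Images/sec = usable operations per second / operations per image."""
    return peak_tops * 1e12 * utilization / ops_per_image

OPS_PER_IMAGE = 7.7e9  # roughly ResNet-50 inference, ~7.7 GOPs per image

# A big-TOPS accelerator that maps the network poorly (low utilization)...
gpu_fps = throughput(peak_tops=120, utilization=0.10, ops_per_image=OPS_PER_IMAGE)

# ...versus a device with 6x fewer TOPS but a well-mapped compute engine.
fpga_fps = throughput(peak_tops=20, utilization=0.80, ops_per_image=OPS_PER_IMAGE)

print(f"120 TOPS at 10% utilization: {gpu_fps:.0f} images/s")
print(f" 20 TOPS at 80% utilization: {fpga_fps:.0f} images/s")
```

Under these assumed utilization figures, the smaller device wins despite a 6x peak-TOPS deficit, which is the shape of the comparison Larzul describes.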
How is this achieved? While Larzul did not give exact details, he did say Zebra does not rely on pruning, since the resulting accuracy loss is too great to be acceptable without retraining, and it avoids extreme quantization (below 8-bit) for the same reason.
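The quantization trade-off mentioned above is easy to demonstrate. The sketch below uses a generic uniform symmetric quantizer on made-up weight values (it says nothing about Zebra's actual internals): as the bit width drops below 8, the rounding grid gets coarser and the error grows, which is why sub-8-bit schemes generally need retraining to recover accuracy.

```python
# Generic uniform symmetric quantization sketch (not Zebra's actual scheme).

def quantize_dequantize(x, bits):
    """Round each value onto a symmetric grid with 2**(bits-1)-1 positive levels."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit
    scale = max(abs(v) for v in x) / qmax   # map the largest magnitude to qmax
    return [round(v / scale) * scale for v in x]

weights = [0.731, -0.442, 0.015, 0.988, -0.207, 0.356]  # made-up example values

for bits in (8, 4, 2):
    approx = quantize_dequantize(weights, bits)
    err = max(abs(a - w) for a, w in zip(approx, weights))
    print(f"{bits}-bit: max abs error = {err:.4f}")
```

At 8 bits the worst-case rounding error is a fraction of a percent of the weight range; at 2 bits small weights collapse to zero entirely, mirroring the accuracy cliff Larzul alludes to.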
Zebra’s engine accelerates CNNs, which are mostly used by image and video processing applications today, but Zebra can also be applied to BERT (Google’s natural language processing model), which uses similar mathematical concepts. Future iterations of Zebra may cover other types of neural network including LSTM (long short-term memory) and RNNs (recurrent neural networks), but this is harder to achieve since RNNs are mathematically more diverse.
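One way to see why a CNN engine can extend to BERT is that both workloads reduce to large matrix multiplies. The toy example below (illustrative only, not Mipsology's implementation) expresses a 1-D convolution as a matmul via the classic "im2col" trick, the same core operation a Transformer's projection layers use.

```python
# Illustrative only: convolution rewritten as the matrix multiply that
# dominates both CNN and Transformer (BERT) workloads.

def matmul(A, B):
    """Plain matrix multiply: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, 0.25]

# "im2col": each row is a sliding window of the input...
patches = [signal[i:i + len(kernel)] for i in range(len(signal) - len(kernel) + 1)]

# ...so the convolution becomes one matmul against the kernel column.
conv = [row[0] for row in matmul(patches, [[k] for k in kernel])]

# Sanity check against the direct sliding-window definition:
direct = [sum(k * signal[i + j] for j, k in enumerate(kernel))
          for i in range(len(signal) - len(kernel) + 1)]
print(conv, direct)
```

RNNs and LSTMs, by contrast, mix matmuls with sequential, state-carrying recurrences and gating, which is one reading of Larzul's point that they are "mathematically more diverse" and harder to map onto the same engine.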
Team from EVE
Mipsology was founded in 2015, with around 30 people working on R&D in France, and a small team in California mainly covering business development. The company has received funding totalling $7m, $2m of which was a prize from a French government innovation competition in 2019.
Mipsology’s core team is from EVE — an ASIC emulator company acquired by Synopsys in 2012 for its ZeBu (Zero Bug) hardware-assisted verification products, at that time a competitor for Cadence’s Palladium verification platform. According to Larzul, EVE technology was used by almost all the major ASIC companies to verify ASICs during the design cycle; this technology relied on thousands of FPGAs connected together to reproduce ASIC behavior.
Mipsology has 12 patents pending and works closely with Xilinx; Zebra is also compatible with third-party accelerator cards such as Western Digital small-form-factor (SFF U.2) cards and Advantech cards like the Vega-4001.