What affects hyper-scalers’ accelerator choices, and what hardware are they actually choosing?
Deploying, managing and orchestrating compute acceleration chips at scale is not easy. Counterintuitively, cloud providers’ economic trade-offs favor non-performance aspects of accelerator product offerings, including OS drivers.
Liftr Insights has been tracking the instance types and sizes offered by the top four infrastructure-as-a-service (IaaS) cloud providers over the last year: Alibaba Cloud, Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure.
In June 2019, we added the remaining few accelerator instance types: those that can be mixed and matched with a variety of processor-only instance types. After three quarters of detailed telemetry on the big four cloud offerings, three big trends have emerged.
Software drivers affect processor choice
Kevin Krewell mentioned the importance of compiler expertise in his presentation at the recent Linley Conference. However, efficient use of accelerator hardware by an application is only part of the overall software solution required to manage and orchestrate accelerator chips across cloud geographies.
In the top four clouds, every accelerator chip (regardless of type or manufacturer) has been attached exclusively to Intel Xeon processors for at least the past year, with one very recent exception. Azure broke ranks in February with its first production deployments of AMD Radeon Instinct MI25 GPUs paired with second-generation AMD EPYC "Rome" processors.
The challenge at hyper-scale is software driver support for different processor models running different OS distributions and versions for multiple versions of each accelerator chip.
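The size of that support matrix grows multiplicatively with every axis. A quick sketch with hypothetical fleet dimensions (real cloud fleets are larger on every axis; these counts are illustrative, not Liftr Insights data):

```python
from itertools import product

# Hypothetical fleet dimensions; real cloud fleets vary and are larger.
processors = ["Xeon Skylake", "Xeon Cascade Lake", "EPYC Rome"]
os_images = ["Ubuntu 16.04", "Ubuntu 18.04", "CentOS 7", "Windows Server 2019"]
accelerators = ["GPU rev A", "GPU rev B", "FPGA rev A"]
driver_versions = ["current", "previous"]

# Every combination is a distinct driver configuration that must be
# qualified, monitored and patched independently at hyper-scale.
matrix = list(product(processors, os_images, accelerators, driver_versions))
print(len(matrix))  # 3 * 4 * 3 * 2 = 72 configurations
```

Even this toy fleet yields 72 configurations to validate; adding one new accelerator revision or OS image multiplies, rather than adds to, the qualification burden.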
Software drivers affect accelerator choice
While compilers and acceleration APIs must be performant, accelerator drivers must be stable and reliable. Hyper-scale customers want to see rapid turnaround for bug fixes, bulletproof quality assurance and, above all, process control.
In the processor market, reliability, availability and serviceability (RAS) has been one of the biggest impediments to Arm processor adoption. Accelerators are no different. Ensuring driver RAS at hyper-scale requires a much different skill set than designing performant compilers does. And it takes time to develop the skills and process control to demonstrate a history of stable behavior.
The result is Nvidia’s 86 percent share of instance types offered by the top four clouds. This share contrasts with a highly fragmented competitive field of FPGAs (Intel and Xilinx), GPUs (legacy AMD parts and, very recently, Radeon Instinct) and the clouds’ own in-house designs (today, that’s Google Cloud Tensor Processing Unit [TPU] and AWS Inferentia).
Usability, RAS affect software development tools
Here again, it is not enough to have performant compilers behind an accelerator’s developer tools. We assume every accelerator chip development team has access to reasonably good compiler developers and at least average developer-tool designers.
Development tools must be usable by a large number of potential customers and must behave as developers expect them to.
Nvidia’s CUDA provides a flexible underpinning for tools developers to support a very wide variety of dev tools across Nvidia’s GPU product line. Nvidia’s share of the accelerator market increased slightly over the past year as overall accelerator-based deployments grew by almost 70 percent in the top four clouds.
Azure supports AMD’s Radeon Instinct MI25 in one type family (NVas v4) but only on Windows, and the type family’s fractional GPU-per-instance configurations are typical of virtual desktop environments. AMD has demonstrated solid support of actual enterprise desktop environments, and its advanced GPU virtualization features make its GPUs competitive for virtual desktops.
Access to in-house designed deep learning accelerators is enabled only through deep learning frameworks. Google enables developer access to its Cloud TPUs through TensorFlow (production) and PyTorch (beta). AWS enables developer access to its Inferentia chip through its own AWS Neuron software developer kit (SDK), which AWS has integrated with TensorFlow, PyTorch and MXNet.
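This framework-mediated model is worth dwelling on: developers never program the chip directly; the chip’s SDK registers kernels with the framework, and the framework dispatches graph operations to them. A minimal Python sketch of that pattern follows (all names here are hypothetical illustrations, not AWS’s or Google’s actual APIs):

```python
# Illustrative sketch only, not any vendor's actual SDK. It shows the
# gatekeeping pattern: the framework owns the model graph and dispatches
# each operation to whatever kernels an accelerator SDK has registered,
# so the chip is reachable only through the framework integration.

def cpu_matmul(a, b):
    # Reference CPU path used when no accelerator kernel is registered.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

CPU_FALLBACK = {"matmul": cpu_matmul}

class Backend:
    """A framework-style kernel registry for one accelerator target."""

    def __init__(self, name):
        self.name = name
        self.kernels = {}

    def register(self, op_name):
        # Decorator an accelerator SDK uses to claim an op for its hardware.
        def wrap(fn):
            self.kernels[op_name] = fn
            return fn
        return wrap

    def run(self, op_name, *args):
        # Dispatch to the SDK's kernel if present, else fall back to CPU.
        fn = self.kernels.get(op_name, CPU_FALLBACK[op_name])
        return fn(*args)

# A hypothetical in-house accelerator claims matmul through the registry.
npu = Backend("hypothetical-npu")

@npu.register("matmul")
def npu_matmul(a, b):
    # Stand-in for a call into the accelerator's runtime.
    return cpu_matmul(a, b)

print(npu.run("matmul", [[1, 2]], [[3], [4]]))  # [[11]]
```

The design consequence matters for the market analysis above: an in-house chip is only as accessible as the framework versions its SDK keeps current, which again puts software maintenance, not raw performance, on the critical path.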
AWS deployed Inferentia to production in December 2019, so it has only a short history. However, Google’s Cloud TPU deployment footprint has not changed since Liftr Insights started tracking it in June 2019.
The clouds are split on FPGA strategy
Azure offers an Intel Arria 10 FPGA-based instance type (PBs). But Azure only enables access to that type through a small set of pre-developed deep learning inferencing models: ResNet 50, ResNet 152, DenseNet-121, VGG-16 and SSD-VGG. Azure deployed its FPGA instance type to production in November 2019.
Alibaba Cloud and AWS offer general-purpose FPGA instance types and partner with third parties to offer both FPGA development tools and pre-developed applications in app marketplaces. There are two challenges. First, FPGA development skills are rare, unlike familiarity with GPU dev tools and deep learning modeling frameworks. Second, FPGA marketplace apps must show clear advantage over competitive GPU-based applications.
Alibaba Cloud slightly reduced its Intel Arria 10 FPGA deployment in February 2020; it had been stable since tracking began in March 2019. Over that same period, Alibaba Cloud almost doubled its Xilinx Virtex UltraScale+ FPGA deployments. AWS increased its Xilinx Virtex UltraScale+ FPGA deployments by about 20 percent in October 2019. Remember that these changes apply to a very small percentage of total accelerator deployments.
As a side-note, Liftr Insights has not yet recorded newer FPGA chips in the top four public IaaS lineup, nor has it recorded deployments of pre-announced instance types based on other deep learning accelerators (such as Graphcore’s Colossus).
Top tier cloud providers worldwide are designing their own in-house deep learning accelerators. We believe vendor-neutral machine learning model formats like ONNX will be a key enabler for the proliferation of both training and inferencing chip designs.
AWS and GCP have already deployed their own designs as public IaaS instance types.
Alibaba Cloud revealed its first attempt at an inferencing accelerator, the Hanguang 800, in September 2019, but it has not yet made it available in a public instance type. Alibaba Group recently committed to invest $28 billion over three years on semiconductor and OS development and to continue building out its data center infrastructure.
Baidu announced in December 2019 that its Kunlun AI deep learning accelerator will be manufactured by Samsung. Baidu is also working with Huawei to ensure that Baidu’s PaddlePaddle deep learning framework will run on Huawei’s Kunpeng server processors and presumably also Huawei’s Ascend series of deep learning accelerators.
Outside of IaaS, in the software-as-a-service cloud, Facebook is working with the Open Compute Project (OCP) accelerator module (OAM) working group to develop standardized training and inferencing platforms.
The OAM training platform is designed to house a wide range of large, high-wattage merchant deep learning accelerators using an interchangeable module that integrates an accelerator chip plus heat sink, including accelerators from AMD, Intel/Habana, Graphcore and Nvidia.
Likewise, the OAM inferencing platform is designed to house a wide range of small low-wattage inferencing accelerators in a standard M.2 physical carrier.
Facebook has designed its own Glow compiler to optimize inferencing models developed in standard frameworks such as PyTorch to each specific M.2-based inferencing accelerator.
Open infrastructure such as OCP’s OAM will enable lower-tier clouds to better compete with giants such as AWS, Azure, GCP and Alibaba Cloud.
— Paul Teich is principal analyst at Liftr Insights.