Real-time AI has computing requirements that can’t be met by CPUs and GPUs...
Computing infrastructure is undergoing a major shift as a wave of real-time services grows to become part of our everyday lives. From intelligent personal assistants providing instantaneous information using natural language to retailers generating information on customer shopping behaviour through in-store analytics, these real-time services present a huge market opportunity for service providers.
To derive value from these services, data and insights must be instantly accessible, and they will largely be delivered through AI. In response, cloud giants like Amazon Web Services (AWS), Microsoft, Alibaba and SK Telecom are developing the computing infrastructure to deliver those services.
Data center operators must now optimize computing to meet real-time response requirements. Hence, IT architectures also must address varied and quickly evolving workloads and algorithms—largely driven by AI—along with increasing integration of computing into storage and networking.
For their part, service providers need an infrastructure platform that offers differentiation and delivers high throughput and low latency, along with a flexible software and hardware stack. That stack must handle workloads ranging from recurrent neural networks, long short-term memory (LSTM) networks and convolutional neural networks to query acceleration based on the Apache Spark cluster computing framework.
To achieve this level of differentiation, service providers are building their own hardware and software stacks. For example, the AWS Advanced Query Accelerator is a data analytics platform with a bespoke software and programmable hardware stack. SK Telecom recently developed AI-enabled speech and video analytics on a custom software and programmable hardware stack.
The next wave of computing needs to be adaptive, where software and hardware merge, and where both are programmable to achieve real-time performance, maximum throughput, low latency and power efficiency. With the growth of real-time solutions, advances in AI, increasingly complex workloads and an explosion of unstructured data, a shift is underway in the data center toward adaptable acceleration of computing, storage and networking.
Academic researchers are leveraging high performance computing (HPC) as a path to solving some of the world’s most complex problems. Accelerating time to insight and deploying HPC at scale requires incredible amounts of raw computing capability, energy efficiency and adaptability.
In a quest to answer the world’s most challenging scientific questions, a consortium of some 20,000 scientists at the European Laboratory for Particle Physics (CERN) is attempting to reconstruct the origin of the universe. To do this, researchers must push the limits of technology.
The Large Hadron Collider is the largest particle accelerator in the world. The 27-kilometer ring is composed of superconducting magnets that accelerate particles to unprecedented energy levels. Each proton traverses the ring 11,000 times per second, approaching the speed of light. At four points on the ring, protons collide every 25 nanoseconds. The conditions of each collision are captured by particle detectors.
The trigger system that filters this detector data is implemented in two layers. The first trigger requires a fixed, extremely low-latency AI inference capability of about three microseconds per event. It also requires massive bandwidth.
CPUs and GPUs cannot meet these requirements. So, 100 meters underground, in an area shielded from radiation, a network of FPGAs runs algorithms designed to instantaneously filter the data generated and identify novel particle substructures as evidence of the existence of dark matter and other physical phenomena. These FPGAs run both classical and convolutional neural networks to receive and align sensor data, perform tracking and clustering, and run machine learning object identification and trigger functions, all before formatting and delivering the event data. The result is extremely low-latency inference on the order of 100 nanoseconds.
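The timing figures quoted in this section can be checked with some back-of-the-envelope arithmetic. The following is a minimal sketch that uses only the numbers given in the text (27 kilometers, 11,000 laps per second, 25 nanoseconds, about three microseconds). The variable names are illustrative and are not taken from any CERN software.

```python
# Figures quoted in the article; times are kept as integer nanoseconds.
RING_LENGTH_KM = 27            # approximate ring circumference
LAPS_PER_SECOND = 11_000       # approximate laps per proton per second
BUNCH_SPACING_NS = 25          # one collision every 25 ns
FIRST_TRIGGER_LATENCY_NS = 3_000   # about three microseconds per event

SPEED_OF_LIGHT_KM_S = 299_792.458  # physical constant

# Distance covered per second, and how close that is to light speed.
proton_speed_km_s = RING_LENGTH_KM * LAPS_PER_SECOND
fraction_of_c = proton_speed_km_s / SPEED_OF_LIGHT_KM_S

# Collision rate implied by the spacing: 1e9 ns per second / 25 ns.
collisions_per_second = 1_000_000_000 // BUNCH_SPACING_NS

# Number of new collisions that occur while the first trigger is still
# deciding on one event. A fixed-latency trigger has to be pipelined
# deeply enough to hold this many events at once.
events_in_flight = FIRST_TRIGGER_LATENCY_NS // BUNCH_SPACING_NS

print(proton_speed_km_s)       # 297000
print(collisions_per_second)   # 40000000
print(events_in_flight)        # 120
```

With these rounded inputs the proton speed comes out at roughly 99 percent of the speed of light. The 40 million decisions per second, each with a fixed deadline, help explain why the text stresses both latency and bandwidth.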
Storage for Real-Time Analysis
The adoption of high-speed storage and increased performance requirements for data-intensive applications have created CPU, memory and storage bottlenecks. Hence, the focus is shifting from computing horsepower to processing data through computational storage. That has implications for improved application performance and overall infrastructure efficiency.
A viable solution is to move computing closer to the data. Integrating data analytics with storage significantly reduces system-level data bottlenecks and increases parallelism while reducing overall power requirements. This approach has attracted vendors such as IBM and Micron Technology, which have developed accelerated storage and computational storage products where processing takes place near the data. Samsung Electronics has launched SmartSSD to enable high-performance accelerated computing closer to flash storage while overcoming CPU and memory limitations. Samsung’s SmartSSD increases speed and efficiency and lowers operating costs by pushing intelligence to where data reside.
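The benefit of moving computing closer to the data can be illustrated by counting the bytes that cross the storage interface. The sketch below is a toy model, not a representation of any vendor's product: the record layout, the `is_hot` filter and the dataset are all hypothetical and chosen only to make the arithmetic visible.

```python
import struct

# Hypothetical fixed-size record: 4-byte unsigned id + 4-byte float reading.
RECORD = struct.Struct("<If")   # 8 bytes per record

# Hypothetical dataset of 10,000 records "stored on the drive".
records = [RECORD.pack(i, float(i % 100)) for i in range(10_000)]


def is_hot(raw: bytes) -> bool:
    """Example analytics filter: keep readings of 95.0 or higher."""
    _, reading = RECORD.unpack(raw)
    return reading >= 95.0


# Conventional path: every record is shipped to the host CPU, which filters.
bytes_moved_host_filter = sum(len(r) for r in records)

# Near-data path: the filter runs next to the storage, so only the
# matching records have to cross the interface.
matches = [r for r in records if is_hot(r)]
bytes_moved_near_data = sum(len(r) for r in matches)

print(bytes_moved_host_filter)  # 80000
print(bytes_moved_near_data)    # 4000
```

In this toy case the near-data path moves one-twentieth of the bytes. The actual saving depends entirely on how selective the query is.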
With the advent of virtualized computing and containerized workloads, networking has become far more complex. As these environments scale beyond a single server, they must employ sophisticated overlay networks. Overlay networks are virtualized systems that are dynamically created and maintained using the concept of packet encapsulation. Supervising this encapsulation adds a burden on the OS or virtualization kernel. When combined with traditional networking tasks, these approaches consume nearly 30 percent of a server’s raw CPU cycles.
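Packet encapsulation, as described above, amounts to wrapping a tenant's original frame in an extra header that identifies the virtual network it belongs to. The sketch below assumes a VXLAN-style 8-byte header (VXLAN is one common overlay format and is not named in the text). It omits the outer Ethernet, IP and UDP headers that a real overlay packet also carries, and the function names are illustrative.

```python
import struct

VNI_VALID_FLAG = 0x08  # VXLAN "I" flag: the network identifier is valid


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prefix a frame with an 8-byte VXLAN-style header."""
    if not 0 <= vni < 2**24:
        raise ValueError("virtual network identifier must fit in 24 bits")
    # Layout: flags (1 byte), reserved (3), identifier (3), reserved (1).
    header = struct.pack("!II", VNI_VALID_FLAG << 24, vni << 8)
    return header + inner_frame


def decapsulate(packet: bytes):
    """Return (vni, inner_frame) from an encapsulated packet."""
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VNI_VALID_FLAG:
        raise ValueError("identifier flag not set")
    return vni_word >> 8, packet[8:]


frame = b"tenant payload"
packet = encapsulate(frame, vni=42)
print(len(packet) - len(frame))   # 8
print(decapsulate(packet))        # (42, b'tenant payload')
```

Every packet entering or leaving a virtual machine or container goes through a step like this, which is where the per-packet CPU burden comes from.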
A common means of managing overlay networks is Open vSwitch (OvS), an open-source virtual switch. FPGA-based SmartNICs (smart network interface cards) have the computational capacity to relieve the host CPU of the aforementioned 30-percent overhead. In simple terms, three servers with SmartNICs handling OvS have roughly the computational power of four servers running on standard NICs.
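The three-versus-four comparison follows directly from the 30-percent figure. Here is a minimal sketch of the arithmetic, assuming the SmartNIC offloads the entire networking overhead and that capacity scales linearly with server count. Both are simplifications.

```python
# Work is measured in "percent of one server's CPU" so the math stays integer.
FULL_SERVER = 100
NETWORK_OVERHEAD = 30   # percent of CPU cycles spent on networking (from the text)

# With a standard NIC, only the remainder is left for applications.
usable_per_standard_server = FULL_SERVER - NETWORK_OVERHEAD   # 70

# With a SmartNIC handling OvS, the whole CPU is assumed to be available.
usable_per_smartnic_server = FULL_SERVER                      # 100

capacity_four_standard = 4 * usable_per_standard_server
capacity_three_smartnic = 3 * usable_per_smartnic_server

print(capacity_four_standard)    # 280
print(capacity_three_smartnic)   # 300
```

Under these assumptions, three SmartNIC-equipped servers slightly exceed four standard ones, which is why the comparison is approximate rather than exact.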
FPGA-based SmartNICs can also be leveraged to offload security and encryption tasks normally executed on the server CPU. Security comes in the form of deep packet inspection, resulting in dropped packets if they pose a threat. That approach could augment or even replace traditional firewall software enterprises now run on their servers. Additionally, SmartNICs can easily offload various encryption and decryption tasks.
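In its simplest form, deep packet inspection means looking inside each packet's payload and dropping the packet when it matches a known threat pattern. The sketch below shows that idea in software only. The signatures and the `should_forward` function are hypothetical, and a SmartNIC would implement the matching in hardware with far more sophisticated rules.

```python
# Hypothetical threat signatures, for illustration only.
SIGNATURES = (b"/etc/passwd", b"DROP TABLE")


def should_forward(payload: bytes) -> bool:
    """Return False (drop) if the payload contains any known signature."""
    return not any(signature in payload for signature in SIGNATURES)


incoming = [
    b"GET /index.html HTTP/1.1",
    b"GET /../../etc/passwd HTTP/1.1",
    b"name=x'; DROP TABLE users; --",
]

# Packets that pass inspection are forwarded; the rest are dropped.
forwarded = [p for p in incoming if should_forward(p)]
print(len(forwarded))   # 1
```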
New world order
In the new era of real-time services, meeting demand using only CPUs, even multicore CPUs, is not practical because of cost, power consumption and the limits of CPU-only scaling. For many workloads, throwing more CPU-based servers at the problem simply won’t deliver the required performance.
As Moore’s law grinds to a halt, next-generation CPUs offer little hope of closing the gap. Adaptable computing accelerators are therefore a viable solution, promising to meet the broad computing demand while scaling to help manage operating costs.