AI processors demand ever more power. One way to cope is to redesign data centers around 48-V racks, though that has created new power conversion challenges.
The challenges of powering artificial intelligence (AI) processors lie in maintaining efficiency and enabling the highest quality of algorithm execution. AI processors need massive power, and any decrease in energy efficiency means higher losses across the entire power distribution network (PDN). In an interview with EE Times, Robert Gendron, P.E., corporate vice president of Vicor, highlighted how the addition of artificial intelligence, machine learning, and deep learning in data centers has caused rack power to jump by over 200% to the 20-kW range, leading operators to re-evaluate their PDNs using new 48-V solutions. Redesigning racks and data centers around 48 V has solved the high-current PDN problem but has resulted in new power conversion challenges.
Soaring demands on the PDN
Power delivery and power efficiency have become the largest concerns in large-scale computing systems (Figure 1). The industry has witnessed a dramatic increase in the power consumed by processors with the advent of ASICs and GPUs processing complex AI functions. Rack power demands have also scaled in proportion to the AI capability used in large-scale learning and inferencing deployments. In most cases, power delivery is now the limiting factor in computing performance as new CPUs look to consume ever-increasing currents. Optimal power delivery entails not just the distribution of power but also efficiency, size, cost, and thermal performance.
To support large amounts of data processing, traditional PDNs are subjected to enormous power demands, which affects thermal management. Two options are being adopted: reducing resistance by shortening the cables of the PDN, or increasing the operating voltage to reduce current. Modern designs are adopting the second option to meet the stringent power demands in data centers more effectively.
“Currently power demands are far outpacing traditional power delivery networks,” said Gendron. “Switching to a 48V architecture and adopting more innovative approaches to power delivery is the only way to deliver high-performance power to meet the staggering AI/HPC demands.”
When processor power started to increase dramatically in 2015, the Open Compute Project (OCP) consortium, whose members include most cloud, server, and CPU companies, continued to evolve its 12-V rack design. The response was to switch from cables to busbars and deploy more 12-V single-phase AC converters inside the rack to minimize the PDN distance and resistance to the server blades. The main change was that, because of the increased power, the single-phase AC was derived from the individual phases of a three-phase supply to the rack. Subsequently, the introduction of AI in data centers with 500-A to 1,000-A processors led some companies to switch to 48-V distribution. This reduced the PDN current to 250 A for a 12-kW rack, but introduced new challenges to the power conversion of the entire system. Because the PDNs feeding the blades are switching to 48 V, a power conversion change is required on the blade. In any case, switching from 12-V to 48-V distribution reduces the input current requirement by a factor of 4 and, for the same PDN resistance, reduces I²R losses by a factor of 16.
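The factor-of-4 and factor-of-16 figures follow from I = P/V and P_loss = I²R. The following is a minimal Python sketch of that arithmetic, assuming a purely resistive distribution path with the same resistance at both voltages. The 1-mΩ path resistance and the function names are illustrative assumptions, not figures from the interview.

```python
def bus_current(power_w: float, bus_voltage_v: float) -> float:
    """Current drawn from the bus for a given load power (I = P / V)."""
    return power_w / bus_voltage_v


def conduction_loss(power_w: float, bus_voltage_v: float, resistance_ohm: float) -> float:
    """I^2 * R loss in a purely resistive distribution path."""
    current = bus_current(power_w, bus_voltage_v)
    return current ** 2 * resistance_ohm


RACK_POWER_W = 12_000        # 12-kW rack, as in the text
PATH_RESISTANCE_OHM = 1e-3   # assumed 1 mOhm, for illustration only

i_12 = bus_current(RACK_POWER_W, 12.0)
i_48 = bus_current(RACK_POWER_W, 48.0)
loss_12 = conduction_loss(RACK_POWER_W, 12.0, PATH_RESISTANCE_OHM)
loss_48 = conduction_loss(RACK_POWER_W, 48.0, PATH_RESISTANCE_OHM)

print(f"12-V bus: {i_12:.0f} A, {loss_12:.1f} W lost")
print(f"48-V bus: {i_48:.0f} A, {loss_48:.1f} W lost")
print(f"current ratio: {i_12 / i_48:.0f}x, loss ratio: {loss_12 / loss_48:.0f}x")
```

The loss ratio does not depend on the assumed resistance, because the same value appears in both the 12-V and the 48-V case and cancels out.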
48-V architecture adoption
The use of 48 V originates from the rechargeable backup battery systems that power telecommunications equipment. The architecture traditionally used in these systems is called intermediate bus architecture. It consists of an isolated, unregulated bus converter that converts 48 V to +12 V, which is then fed to a bank of multi-phase buck regulators that handle the conversion from 12 V and the regulation for the point of load (PoL). As the currents of AI processors and CPUs increased, the density of the power delivery solution at the PoL became the most critical element in AI applications because of the PDN resistance between the regulator and the PoL. PDN losses are a dominant factor in the efficiency and performance of the DC/DC regulator design.
To reduce losses, Vicor suggests using a 48-V pre-regulation module (PRM) followed by a fixed-ratio (1/K factor) voltage transformation stage (VTM). This proprietary architecture allows the performance of each stage to be optimized.
The PRM uses a zero-voltage switching topology, while the VTM uses a proprietary high-frequency Sine Amplitude Converter (SAC) topology. The VTM can be seen as a DC/DC transformer with a ratio of 1/K for voltage and K for current. The VTM offers high power density and can be placed very close to the processor.
Because the VTM implements the SAC topology, its emissions are low and narrowband compared with those of multi-phase switches and their associated inductors. It also provides greater power density than multi-phase designs, with a single VTM replacing six multi-phase switch stages. The VTM fits in a small footprint, well within the layout constraints of advanced processors supporting four-channel memory, without encroaching on the memory subsystem’s layout areas.
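The "DC/DC transformer" view of the VTM can be sketched as an ideal, lossless stage that divides voltage by K and multiplies current by K. The Python below is only an idealized model under that assumption. The value K = 48, the 48-V and 10-A input figures, and the function name `ideal_vtm` are hypothetical choices for illustration, not Vicor specifications.

```python
from typing import Tuple


def ideal_vtm(v_in: float, i_in: float, k: float) -> Tuple[float, float]:
    """Ideal fixed-ratio stage: output voltage is v_in / K, output current is i_in * K."""
    return v_in / k, i_in * k


K = 48.0      # hypothetical ratio, chosen so that 48 V in gives 1 V out
V_IN = 48.0   # assumed stage input voltage
I_IN = 10.0   # assumed stage input current

v_out, i_out = ideal_vtm(V_IN, I_IN, K)
p_in = V_IN * I_IN
p_out = v_out * i_out

print(f"input:  {V_IN:.1f} V at {I_IN:.1f} A ({p_in:.0f} W)")
print(f"output: {v_out:.2f} V at {i_out:.0f} A ({p_out:.0f} W)")
```

In this ideal model, input and output power are equal. A real converter has losses, so its output power is somewhat lower than its input power.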
Depending on the processor current, engineers can choose between lateral power delivery (LPD) and vertical power delivery (VPD). In LPD, the current multiplier is located alongside the AI processor, either on the same substrate or directly on the motherboard within a few millimeters, allowing the PDN resistance to be reduced to about 50 µΩ. For even higher performance, VPD moves the current multiplier directly underneath the processor and also integrates high-frequency ground capacitors. This type of current multiplier is called a geared current multiplier. VPD reduces the PDN resistance to 5–7 µΩ, freeing AI processors to harness their full power.
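To give a rough sense of what these resistance figures mean, the Python sketch below computes the I²R loss and the voltage drop across the PDN for the LPD and VPD values quoted above. It assumes a steady 1,000-A load, the upper end of the processor current range mentioned earlier in the article, and treats the PDN as a simple lumped resistance. The function names are illustrative.

```python
def pdn_loss_w(current_a: float, resistance_ohm: float) -> float:
    """Power dissipated in the PDN resistance (I^2 * R)."""
    return current_a ** 2 * resistance_ohm


def pdn_drop_mv(current_a: float, resistance_ohm: float) -> float:
    """Voltage drop across the PDN resistance (I * R), in millivolts."""
    return current_a * resistance_ohm * 1e3


LOAD_CURRENT_A = 1000.0  # assumed steady load current

# PDN resistances quoted in the text, in ohms
cases = {
    "LPD, about 50 uOhm": 50e-6,
    "VPD, 7 uOhm": 7e-6,
    "VPD, 5 uOhm": 5e-6,
}

for label, r_ohm in cases.items():
    loss = pdn_loss_w(LOAD_CURRENT_A, r_ohm)
    drop = pdn_drop_mv(LOAD_CURRENT_A, r_ohm)
    print(f"{label}: {loss:.1f} W lost, {drop:.1f} mV drop")
```

Under these assumptions, moving from LPD to VPD cuts the PDN loss from about 50 W to between 5 W and 7 W.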
Maximizing AI processor performance
A typical Vicor VR solution for advanced AI processor acceleration modules is shown in Figure 4. The Vicor VR comprises three powertrain modules, one MCD (Modular Current Driver) and two MCMs (Modular Current Multipliers), which together provide a 48-V input to 0.8-V output VR capable of delivering up to 650 A continuous current and over 1,000 A peak current. Like jet fuel for a plane, this level of power delivery ensures that the AI processor can operate at optimum clock rates and maximize performance.
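As a back-of-the-envelope check on these figures, the Python sketch below computes the power delivered at 0.8 V and the corresponding current drawn from the 48-V bus. It assumes a lossless converter, which a real VR is not, and takes the peak current as exactly 1,000 A even though the text says "over 1,000 A". The function names are illustrative.

```python
V_IN = 48.0           # bus voltage, from the text
V_OUT = 0.8           # processor rail voltage, from the text
I_CONTINUOUS = 650.0  # continuous output current, from the text
I_PEAK = 1000.0       # lower bound on the quoted peak output current


def load_power_w(v_out: float, i_out: float) -> float:
    """Power delivered to the processor (P = V * I)."""
    return v_out * i_out


def ideal_input_current_a(power_w: float, v_in: float) -> float:
    """Input current for a lossless converter; a real VR draws somewhat more."""
    return power_w / v_in


p_cont = load_power_w(V_OUT, I_CONTINUOUS)
p_peak = load_power_w(V_OUT, I_PEAK)

print(f"continuous: {p_cont:.0f} W, {ideal_input_current_a(p_cont, V_IN):.2f} A from the 48-V bus")
print(f"peak:       {p_peak:.0f} W, {ideal_input_current_a(p_peak, V_IN):.2f} A from the 48-V bus")
```

Hundreds of amperes at the processor correspond to only some tens of amperes or less on the 48-V side, which is why the high-current portion of the PDN can be confined to the last few millimeters.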
“If our technology had not been employed in these advanced AI applications, the number of multi-phase VR devices would have exceeded the board size and would not maintain the same form factor. In addition, the noise contribution would have most likely been too high to maintain signal integrity,” said Gendron.
Using a Vicor NBM2317 maintains compatibility with legacy 12-V server rack power distribution while providing 48 V to the Vicor VR. This 12-V to 48-V converter can also run in the reverse direction, enabling 48-V to 12-V conversion. Conventional power architectures are not keeping pace with today’s power-hungry AI processors and their adoption within cloud computing. The Vicor power approach enables 48-V distribution and a VR that supports advanced AI processing needs. Departing from the conventional multi-phase design used with CPUs, the Vicor solution was developed specifically to address a new class of processors quickly migrating into cloud servers.
A new approach to powering AI/HPC is needed. Distributing 12 V across a cloud server rack is no longer tenable as leading companies push the envelope on power. Powering today’s ASICs and GPUs requires more than swapping out parts for higher-power ones. The most effective solutions start with high-voltage power, incorporate innovative architectures and topologies, and use high-density power modules that are highly efficient.
This article was originally published on EE Times.
Maurizio Di Paolo Emilio holds a Ph.D. in Physics and is a telecommunication engineer and journalist. He has worked on various international projects in the field of gravitational wave research. He collaborates with research institutions to design data acquisition and control systems for space applications. He is the author of several books published by Springer, as well as numerous scientific and technical publications on electronics design.