Extrapolating to predict the future is risky; can server racks now dissipating tens of kW and thus requiring thousands of amps continue on their growth trend?
It’s well known that the power requirement of data server racks has increased significantly over the past few years (I’ll avoid the tired and meaningless cliché of saying “exponentially” here). While the use of lower voltages and lower-power processes has reduced per-function power, the demands of the systems they implement have increased at a higher rate than that drop. The result is that the power-needs treadmill keeps going faster and faster.
How much has rack power increased? There is no single answer, but one source (Reference 1) says that its survey shows it has gone from about 2.5 kW/rack in 2011 to almost 10 kW/rack in 2020 – and that’s just the average (Figure 1); about 15% of those surveyed were at 20 kW/rack or higher. (Other online surveys and sources had similar numbers.) There are also numerous predictions about the expected increase in dissipation/rack over the next few years, but those are speculative, of course, and so are “all over the place.”
No matter how you look at it, that’s a lot of heat to dissipate, and liquid cooling is becoming a standard way of dealing with it. Of course, such cooling only turns your problem of getting rid of rack heat into someone else’s problem; after all, the heat removed from the rack by the liquid cooling has to go somewhere else and be dealt with somehow.
But here’s the part that troubles me personally: while I can grasp the issue of somehow dissipating tens of kilowatts per rack, I can’t grasp what it takes to create this much heat: namely, thousands of amps of input current. After all, only a small part of that heat dissipation is due to power-supply inefficiency (supplies are running at over 90% efficiency). Instead, most of the heat is being dissipated by the transistors which are providing the rack functions, not the supply.
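To put rough numbers on that split between supply loss and load dissipation, here is a minimal Python sketch. It assumes a single conversion stage with a fixed efficiency, which simplifies a real rack's power chain. The 20-kW load and the exact 90% efficiency are illustrative values (the text says only "over 90%"), and the function name `heat_split` is made up for this example.

```python
def heat_split(load_power_w, supply_efficiency):
    """Return (input_power_w, supply_loss_w) for a supply feeding a load.

    Assumption: one conversion stage with a fixed efficiency.
    """
    if not 0 < supply_efficiency <= 1:
        raise ValueError("supply_efficiency must be in the range (0, 1]")
    input_power_w = load_power_w / supply_efficiency
    supply_loss_w = input_power_w - load_power_w
    return input_power_w, supply_loss_w


# Illustrative values: 20 kW delivered to the load, 90% efficient supply.
input_w, loss_w = heat_split(load_power_w=20_000, supply_efficiency=0.90)
load_share = 20_000 / input_w  # fraction of total heat produced by the load

print(f"input power: {input_w / 1e3:.1f} kW")
print(f"supply loss: {loss_w / 1e3:.1f} kW")
print(f"load share of total heat: {load_share:.0%}")
```

Under these assumptions, the supply itself accounts for roughly 2.2 kW of about 22.2 kW drawn; the other 20 kW is dissipated by the load.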
The point really hit me when I saw a brief article in IEEE Spectrum about a wafer-size, trillion-transistor megachip for AI acceleration from Cerebras Systems, Figure 2 (Reference 2). I’m generally not interested in such high-density, high-count devices — after all, “analog-centric” EEs prefer to think about a few good linear components rather than a zillion-transistor digital device — but one number really hit me hard: this Cerebras IC needs 20 kA to function.
A quick run-through of the numbers shows this amount of current makes sense not only for this IC but also for racks in general. A 20-kW rack using 1-V rails draws 20 kA to reach that 20-kW number. That’s a lot of current to deliver, even before you need to worry about dissipating the resultant heat. The implications for power sourcing and distribution are intense.
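The arithmetic behind that 20-kA figure is simply I = P/V. The short Python sketch below repeats it for the 1-V rail in the text; the 12-V and 48-V entries are not from the article and are included only as assumed comparison points to show how strongly the current depends on the voltage at which the power is delivered. The function name `rail_current_a` is hypothetical.

```python
def rail_current_a(power_w, rail_voltage_v):
    """Current in amps needed to deliver power_w at rail_voltage_v (I = P / V)."""
    if rail_voltage_v <= 0:
        raise ValueError("rail_voltage_v must be positive")
    return power_w / rail_voltage_v


RACK_POWER_W = 20_000  # the 20-kW rack from the text

# 1.0 V is the rail from the text; 12 V and 48 V are assumed
# comparison values, not figures taken from the article.
for rail_v in (1.0, 12.0, 48.0):
    amps = rail_current_a(RACK_POWER_W, rail_v)
    print(f"{rail_v:5.1f} V rail -> {amps:8.1f} A")
```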
I’ve always had great respect for both high-voltage and high-current designs, and for the challenges that go well beyond designing and building a supply for such levels. With high voltage, in addition to the obvious lethal issues, there are issues related to arcing, flashover, insulator performance, dielectric breakdown, humidity effects, numerous mandates related to minimum creepage and clearance distances (Reference 3), and much more. For high currents, there’s a corresponding set of issues including IR drop (100 A through just one milliohm is a 100-millivolt loss, which is 10% of a nominal 1-V rail), thermal considerations and coefficients, thermal cycling (which may affect the short- and long-term integrity of connections), and other maladies.
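The IR-drop example in the paragraph above (100 A through one milliohm on a 1-V rail) can be checked with a few lines of Python. The sketch also inverts the calculation to estimate how little path resistance a 20-kA, 1-V feed could tolerate; the 1% allowed drop used there is an assumed budget chosen for illustration, not a figure from the article, and both function names are hypothetical.

```python
def ir_drop(current_a, resistance_ohm, rail_voltage_v):
    """Return (drop_v, drop_fraction_of_rail, loss_w) for a resistive path."""
    drop_v = current_a * resistance_ohm          # V = I * R
    loss_w = current_a ** 2 * resistance_ohm     # P = I^2 * R
    return drop_v, drop_v / rail_voltage_v, loss_w


def max_path_resistance_ohm(current_a, rail_voltage_v, allowed_fraction):
    """Largest path resistance that keeps the drop within allowed_fraction."""
    return allowed_fraction * rail_voltage_v / current_a


# Values from the text: 100 A, 1 milliohm, nominal 1-V rail.
drop_v, fraction, loss_w = ir_drop(100, 1e-3, 1.0)
print(f"drop: {drop_v * 1e3:.0f} mV ({fraction:.0%} of rail), loss: {loss_w:.0f} W")

# Assumed budget: allow a 1% drop on a 1-V rail carrying 20 kA.
r_max = max_path_resistance_ohm(20_000, 1.0, 0.01)
print(f"max path resistance: {r_max * 1e6:.2f} micro-ohms")
```

With that assumed 1% budget, the whole current path would have to stay at or below about half a micro-ohm, which gives a sense of why bus bars and terminations get so much attention.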
In short (is that perhaps a bad word to use here?), such high voltages and currents require much more than “just” a power supply. They also need sophisticated power-distribution paths, bus bars, terminations, and more, with every part of the current path modeled, built, and verified, beginning with basic tests such as a high-potential (hipot) test (formally called a dielectric-strength test), a partial-discharge test, and an insulation-resistance test (usually referred to as a megohm or “megger” test).
There is something ironic in seeing this trillion-transistor AI-accelerator chip and then looking at the basic numbers for the “real” (non-AI) intelligence of the human brain. The human brain is estimated to have under 100 billion neurons yet dissipates only about 20 W (Reference 4); that’s about 20% of an adult’s total power needs. Clearly, there is a lot we don’t understand about the human brain and its function, and its per-neuron power efficiency is one of those specifics.
Have you ever been involved in a project with extremely high currents, even at low voltage? What were the “surprises” you encountered? What extraordinary design considerations did you add? Do you think rack-power density will continue to grow at such high rates? Or will there be an asymptotic limit, followed by totally new approaches to implementing data center racks and their circuitry, such as optical-based computing?
This article was originally published on EE Times.
Bill Schweber is an electronics engineer who has written three textbooks on electronic communications systems, as well as hundreds of technical articles, opinion columns, and product features. In past roles, he worked as a technical website manager for multiple EE Times sites and as both Executive Editor and Analog Editor at EDN. At Analog Devices, he was in marketing communications; as a result, he has been on both sides of the technical PR function, presenting company products, stories, and messages to the media and also as the recipient of these. Prior to the marcom role at Analog, Bill was Associate Editor of its respected technical journal, and also worked in its product marketing and applications engineering groups. Before those roles, he was at Instron Corp., doing hands-on analog- and power-circuit design and systems integration for materials-testing machine controls. He has a BSEE from Columbia University and an MSEE from the University of Massachusetts, is a Registered Professional Engineer, and holds an Advanced Class amateur radio license. He has also planned, written, and presented online courses on a variety of engineering topics, including MOSFET basics, ADC selection, and driving LEDs.