Do AI accelerators save energy? And is overall electricity consumption declining, or are data centers simply squeezing more compute into the same power envelope?
It’s easy to forget that AI requires a staggering amount of computing resources, and therefore power, as most of the heavy lifting is done in the cloud.
A study last year from the University of Massachusetts at Amherst found that training one large AI model for natural language processing (BERT) uses enough electricity to produce as much CO2 as a round-trip trans-Atlantic flight for one person. That’s just one model, albeit a transformer network, trained one time; models are typically tuned and retrained many times during development. Add neural architecture search, an AutoML technique that uses AI to tune models, to a decent-size transformer and the total jumps to almost as much CO2 as the lifetime emissions of five American cars.
AI accelerators promise to make AI processing much more computationally efficient. And as AI workloads increase, data centers are adopting these new specialized accelerators.
But do AI accelerators save energy? And is overall electricity consumption declining, or are data centers simply squeezing more compute into the same power envelope?
The amount of energy used in AI computations is dependent on several things, explained David Turek, vice president of technical computing for IBM Cognitive Systems. “The manipulation of your strategy for how you want to train the model affects how much energy you’re going to be consuming,” Turek said. Metrics like “calculations-per-watt are not particularly informative, in the sense that there are many different stratagems that one can impose to try to reduce the overall energy consumption.”
A combination of the entire system architecture and the context of the application dictates the actual energy budget, he added. “The hierarchy of computational power, from model training to model deployment has a direct impact on the infrastructure, and that has a direct impact on the amount of energy consumed.”
Far from the classic view of AI systems, where one model is trained once and then deployed elsewhere for inference, the reality is that typical systems train many models many times and may run inference on multiple models simultaneously to get the best result.
After deployment, techniques such as federated learning are sometimes used to handle incremental model updates at the edge, rather than back at the data center. The power profile is then dependent on what processing is deployed at the edge.
In other words, the amount of energy consumed by training a particular AI model is not straightforward. But data center infrastructure is fixed, so making workflow adjustments is the optimal way to save energy, Turek said.
Possible approaches include: merging AI models with classical high-performance computing to reduce the total amount of computing required; trading off the time taken to complete a job in order to reduce use of high-energy AI accelerator hardware such as GPUs; and avoiding retraining in the data center using techniques such as federated learning.
“It’s about intelligence applied to the workflows that you can use from a management perspective to orchestrate optimal ways to deploy the energy available to you and your fixed system,” Turek said. Operators “can make scheduling assignments on their hardware infrastructure by taking into account energy budgets and energy consumption.”
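As a rough illustration of the kind of energy-aware scheduling Turek describes, the sketch below picks the hardware option that uses the least energy while still meeting a deadline and an energy budget. The `Option` class, the `pick_option` function, and every wattage and run time are hypothetical, chosen only to show the trade-off between completion time and energy; none of it comes from IBM.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Option:
    """One hypothetical way to run a job: a hardware mix and its run time."""
    name: str
    power_watts: float  # assumed average draw while the job runs
    hours: float        # assumed time to finish the job

    @property
    def energy_kwh(self) -> float:
        # Energy = average power x time, converted from Wh to kWh.
        return self.power_watts * self.hours / 1000.0


def pick_option(options: Iterable[Option],
                deadline_hours: float,
                energy_budget_kwh: float) -> Optional[Option]:
    """Return the lowest-energy option that meets both limits, or None."""
    feasible = [o for o in options
                if o.hours <= deadline_hours
                and o.energy_kwh <= energy_budget_kwh]
    return min(feasible, key=lambda o: o.energy_kwh, default=None)


# Illustrative figures only, not measurements.
OPTIONS = [
    Option("8 GPUs", power_watts=3000, hours=4),     # 12.0 kWh
    Option("2 GPUs", power_watts=900, hours=12),     # 10.8 kWh
    Option("CPU only", power_watts=400, hours=100),  # 40.0 kWh
]

relaxed = pick_option(OPTIONS, deadline_hours=24, energy_budget_kwh=15)
urgent = pick_option(OPTIONS, deadline_hours=6, energy_budget_kwh=15)
print(relaxed.name, urgent.name)
```

With these made-up numbers, accepting a longer completion time lets the scheduler choose the smaller GPU configuration, while a tight deadline forces the larger, higher-power one.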
Server maker Supermicro’s annual survey of data center environmental practices published late last year revealed opportunities for energy efficiency. Those opportunities are being missed, according to Michael McNerney, vice president of marketing and network security at Supermicro.
“We think there are some really basic best practices that could deliver a lot of value to customers,” McNerney said. “One that’s counterintuitive to a lot of people who’ve been running data centers for a long time is that the systems we build today can run hotter than traditional data center environments.”
Current equipment design means typical cooling to between 23 and 25°C (73 to 77°F) is no longer necessary to maintain performance and reliability. While some “green” data centers operate at extreme temperatures, even a small change can help save energy by reducing demand for air conditioning.
Multi-node systems are another way to conserve energy, where multiple servers are run with shared infrastructure. That configuration reduces the number of large power supplies and fans required. Multi-node systems are more energy efficient, can run hotter, and provide higher power density.
Supermicro’s survey also found that the average power density per rack is currently 15kW, the average server inlet temperature is 23.5°C, and servers are refreshed every 4.1 years. Data centers with highly optimized green designs (operated by 12 percent of survey respondents) had a power density above 25kW per rack and an average inlet temperature of 26.5°C, and swapped servers every 2 to 3 years. Hence, Supermicro concludes that most data centers still have a ways to go in terms of optimizing for energy efficiency.
Surprisingly, most respondents did not see energy consumption as a key success metric. “One thing we’ve seen is that companies’ facilities budget is separate from the acquisition costs of hardware and the capital acquisition costs of systems, which is separate from the headcount costs. And I think people are aware of them, but they don’t necessarily optimize them all together,” McNerney said.
“Larger data centers have a better insight into total cost of ownership, but, even still, if an increase in capital acquisition budget can decrease their energy budget… that’s sometimes a hard connection to make.”
McNerney does not see overall data center power consumption dropping any time soon. “The long-term trend is that the consumption of certain online services will keep pace with the improvements in efficiency, or you’ll actually see growth of the power consumption based on increased utilization for 5G [and] AI,” he said.
Energy vs. Power
Data center owners are looking to improve energy efficiency across the board since electricity accounts for up to 25 percent of their operating costs, said Paresh Kharya, Nvidia’s director of product management for accelerated computing.
One widely used metric for gauging energy savings is power usage effectiveness (PUE), the ratio of the total energy consumed by a data center facility to the energy used directly by its computing equipment. A PUE rating of 1, meaning no energy goes to overhead such as cooling, is the goal.
“Over the years, we’ve seen PUEs in hyperscale data centers approaching 1 or 1.1, so they are becoming really efficient,” Kharya said. “Enterprise data centers have also made significant strides in this area—they went from more than a [PUE rating of] 2 to much less than 2 in most cases.”
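To make the PUE figures concrete: PUE is conventionally computed as total facility energy divided by the energy delivered to IT equipment, so a value of 2 means half of the electricity goes to overhead such as cooling and power conversion. The short sketch below works through that arithmetic; the function names and the meter readings are assumptions for illustration, not data from Nvidia or any operator.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be below IT energy")
    return total_facility_kwh / it_equipment_kwh


def overhead_share(pue_value: float) -> float:
    """Fraction of total energy that does not reach the IT equipment."""
    return 1.0 - 1.0 / pue_value


# Hypothetical monthly meter readings in kWh, chosen only for illustration.
enterprise = pue(total_facility_kwh=200_000, it_equipment_kwh=100_000)
hyperscale = pue(total_facility_kwh=110_000, it_equipment_kwh=100_000)

print(f"enterprise: PUE {enterprise:.2f}, overhead {overhead_share(enterprise):.0%}")
print(f"hyperscale: PUE {hyperscale:.2f}, overhead {overhead_share(hyperscale):.0%}")
```

Under these assumed readings, moving from a PUE of 2 to 1.1 cuts the overhead share of the electricity bill from half to roughly a tenth.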
Hyperscale data centers have highly optimized rack and cooling designs, and can operate at huge scale, conferring the ability to optimize and use sophisticated techniques beyond the reach of most enterprise data centers. “A lot of that innovation is also coming to enterprises, and we’re seeing energy efficiency improved significantly there as well,” Kharya said.
Given that companies pay for energy, not power, Kharya argued that mean time to solution is an important factor. For example, training the ResNet-50 model for image recognition on a CPU-only server can take up to three weeks, whereas a server equipped with an Nvidia V100 GPU can do it in less than a day, he added.
“The individual server with our GPU on it would consume more energy [than the CPU equivalent], but it would provide a dramatically faster time to solution. So overall energy consumption [for AI workloads] would go down by a factor of 20 to 25 using GPU accelerators,” he argued.
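The distinction between power and energy comes down to one multiplication: energy is average power times run time. The sketch below applies it to the ResNet-50 comparison. The run times loosely follow the figures quoted above (about three weeks versus under a day), but the server wattages are assumptions picked for illustration, so the resulting ratio is indicative only.

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy in kWh from average power in watts and run time in hours."""
    return power_watts * hours / 1000.0


# Run times approximate the article's figures; wattages are assumed values.
cpu_kwh = energy_kwh(power_watts=500, hours=21 * 24)  # about three weeks
gpu_kwh = energy_kwh(power_watts=1000, hours=12)      # under a day

print(f"CPU-only server: {cpu_kwh:.0f} kWh")
print(f"GPU server:      {gpu_kwh:.0f} kWh")
print(f"energy ratio:    {cpu_kwh / gpu_kwh:.1f}x")
```

Even with the GPU server assumed to draw twice the power, the much shorter run leaves it using a small fraction of the energy for the same training job.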
Data center operators are therefore focused on ensuring all systems are operating efficiently in order to squeeze maximum computing out of costly infrastructure, according to Allyson Klein, Intel’s general manager of data platforms marketing.
“Data center operators’ primary goal is to maximize the performance they get out of their infrastructure,” Klein said. “Achieving efficient performance lies at the system-and-rack level as well as ensuring the entire data center is working together to maximize performance-per-watt.”
Hence, understanding the cross-section of data center workloads is key to deploying the right infrastructure to meet performance and power requirements. The ideal result is more compute capacity that consumes less power, without idle infrastructure consuming power.
Often, the trade-off is between using integrated acceleration in a CPU for a workload or seeking a discrete accelerator. “Accelerators add power, but can be more efficient overall if utilized full time,” Klein said. “If an accelerator does a lot of work and utilization is high, it may make sense, if the customer is also open for the investment in the infrastructure,” Klein added. “If an accelerator is not going to be consistently tapped, a CPU solution may be a better total investment as the accelerator sits idle too often, pulling power without providing any performance output.”
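Klein's point about utilization can be sketched with a simple model: an accelerator that sits idle still draws power, and that idle energy has to be spread over however few jobs it completes. The functions and all figures below are hypothetical, and the model assumes the CPU is shared with other workloads so that only its marginal energy per job is counted.

```python
from typing import Optional


def accelerator_wh_per_job(active_w: float, idle_w: float,
                           jobs_per_hour: float, utilization: float) -> float:
    """Energy per job for a device busy `utilization` of the time, idle otherwise."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    wh_per_hour = active_w * utilization + idle_w * (1 - utilization)
    jobs_done = jobs_per_hour * utilization
    return wh_per_hour / jobs_done


def break_even_utilization(active_w: float, idle_w: float,
                           jobs_per_hour: float,
                           cpu_wh_per_job: float) -> Optional[float]:
    """Utilization above which the accelerator beats the CPU on energy per job."""
    # Solve (active*u + idle*(1-u)) / (rate*u) = cpu_wh_per_job for u.
    denom = cpu_wh_per_job * jobs_per_hour - active_w + idle_w
    if denom <= 0:
        return None  # the accelerator loses even when fully busy
    u = idle_w / denom
    return u if u <= 1 else None


# Assumed figures: 300 W active, 50 W idle, 100 jobs/hour at full load,
# versus a CPU that spends a marginal 10 Wh on each job.
busy = accelerator_wh_per_job(300, 50, 100, utilization=1.0)
mostly_idle = accelerator_wh_per_job(300, 50, 100, utilization=0.05)
threshold = break_even_utilization(300, 50, 100, cpu_wh_per_job=10)
print(f"busy: {busy:.1f} Wh/job, mostly idle: {mostly_idle:.1f} Wh/job")
print(f"break-even utilization: {threshold:.1%}")
```

In this toy model the accelerator is far more efficient when kept busy, but below the break-even utilization its idle draw makes each job cost more energy than running it on the CPU.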
For most deployments, Klein said AI is just one of hundreds of thousands of different workloads. While Intel offers both CPUs and specialized AI accelerators (via Habana Labs), diverse workloads are the reason why Xeon Scalable (CPU) platforms are touted as the most efficient from a power and investment standpoint.
“The foundation of Intel’s AI strategy is built on Xeon Scalable processors with AI optimization built in and extensive software optimizations for machine and deep learning,” Klein said.
While the energy consumption of AI accelerators such as GPUs is large, they offer computing efficiencies that reduce overall energy consumption for comparable AI workloads. Although AI represents a growing portion of data center jobs, everyday workloads remain quite diverse.
AI workloads will benefit the most from accelerators, but CPUs continue to win more sockets in both hyperscale and enterprise data centers. The reason is flexibility. With AI workloads expanding and new 5G applications generating more unstructured data, it’s unlikely data center energy consumption will decline any time soon.