Turning Datacenter Chips into System Sensors

Article By: proteanTecs

Smart microprocessors are bringing power savings and cost reductions to manufacturers and system operators.

It’s no secret that lower power consumption means lower operating costs, but turning that concept into reality takes foresight and data.

In datacenters, excessive power consumption is a multi-dimensional problem for operators. The energy needed to run the hardware racks up utility costs while managers try to maintain an optimal facility temperature. And that is before counting the costs of excessive component wear-out, caused by heat buildup inside a server or switch that is not running in an optimized manner.

Today, power, performance, reliability, and cost are traded off against one another. This problem is not new, but an emerging field of deep data is paving the way for a more favorable equation.

A Zero-Sum Game

Not all chips are created equal. Lot-to-lot, wafer-to-wafer, and even on-chip variation (OCV) are common in advanced SoCs, requiring manufacturers to look at each die individually. Some dies run faster and some slower; some leak more current, others less. To account for these differences, engineers define operating parameters and guard bands that put the different chips in their most optimized operating condition, qualifying them for market deployment while keeping quality and reliability in mind.

But today’s methods lack resolution and depth. Power characterization is performed on select samples, often taken from engineering lots intentionally skewed to represent the different process technology ‘corners’ (extremes). The best-known techniques provide only coarse visibility because they collate and average many parameters together, making it hard to correlate measurements with exact manufacturing parameters.

To overcome these limitations, production teams add guard bands to their measurements to absorb the inaccuracies. Lifetime aging effects are factored in as well: aging degrades performance over time, which in turn leads Datacenters to run chips at even higher voltages.
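To make the cost of stacked margins concrete, here is a minimal sketch that compares one population-wide voltage (worst-case chip plus measurement guard band plus aging margin) with a voltage tailored to each chip. All chip names, voltages, and margins are hypothetical example values, not proteanTecs data, and the sketch assumes that per-chip visibility plus in-field monitoring allows a smaller single margin. It uses the standard approximation that dynamic power scales with V² at fixed frequency and capacitance, and it ignores leakage.

```python
"""Hypothetical illustration: population-wide guard band vs per-chip voltage.

All voltages, margins, and chip names are made-up example values.
"""

# Minimum voltage (V) at which each example chip meets its target frequency.
chip_vmin = {"chip_a": 0.70, "chip_b": 0.74, "chip_c": 0.78}

GUARD_BAND = 0.05       # assumed margin for measurement inaccuracy (V)
AGING_MARGIN = 0.03     # assumed up-front margin for end-of-life aging (V)
PER_CHIP_MARGIN = 0.02  # assumed smaller margin once each chip is measured
                        # individually and aging is tracked in the field (V)


def population_voltage(vmins, guard, aging):
    """One voltage for every chip: worst-case Vmin plus all stacked margins."""
    return max(vmins) + guard + aging


def per_chip_voltage(vmin, margin):
    """Voltage tailored to a single chip's own measured Vmin."""
    return vmin + margin


def relative_dynamic_power(v, v_ref):
    """Dynamic power ratio, using P ~ V^2 at fixed frequency and capacitance."""
    return (v / v_ref) ** 2


v_pop = population_voltage(chip_vmin.values(), GUARD_BAND, AGING_MARGIN)
savings = {}
for name, vmin in chip_vmin.items():
    v_chip = per_chip_voltage(vmin, PER_CHIP_MARGIN)
    savings[name] = 1.0 - relative_dynamic_power(v_chip, v_pop)
    print(f"{name}: {v_chip:.2f} V vs {v_pop:.2f} V "
          f"-> {savings[name]:.0%} less dynamic power")
```

Even the slowest example chip saves some dynamic power here, because it no longer carries margins sized for the whole population's uncertainty.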

It’s clear that a data revolution is needed.

By embedding intelligence in advanced SoCs, Universal Chip Telemetry (UCT) is turning chips into smart system sensors. Datacenter operators and their supply chains can finally gain the visibility needed to reduce power without compromise.

Personalized Power

UCT by proteanTecs introduces the concept of visibility from within by combining on-chip monitors with AI-driven insights. A cloud-based analytics platform applies machine learning algorithms to the UCT data, providing a high-coverage, high-resolution picture of the system’s power, performance, and reliability throughout its lifecycle.

Much as genome sequencing revolutionized healthcare, UCT allows electronics manufacturers to understand each chip’s unique DNA, enabling precision profiling. Rather than relying on full-population statistics, engineers can perform a personalized, per-chip assessment.

Figure 1: Precision profiling based on UCT allows for end-to-end power reduction

If an SoC can reach its target speed at a lower voltage, proteanTecs’ UCT-based platform can suggest a lower operating voltage during characterization that still keeps the chip within spec. This reduces overheating and early wear-out and significantly lowers operational costs. Conversely, for chips that run slow, the platform suggests a voltage that brings them into spec instead of having them scrapped.
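The per-chip decision described above can be sketched as a simple search over characterization data. This is a hypothetical illustration, not the proteanTecs algorithm: it assumes each chip reports a timing margin (in picoseconds) at several candidate voltages, and the names `select_voltage`, `MIN_MARGIN_PS`, and `V_MAX` as well as all numbers are invented for the example.

```python
"""Hypothetical per-chip voltage selection from characterization data.

Assumes each chip reports a timing margin (ps) measured at several
candidate voltages. Names, units, and thresholds are illustrative only.
"""
from typing import Dict, Optional

MIN_MARGIN_PS = 10.0  # assumed minimum timing margin to count as "within spec"
V_MAX = 0.90          # assumed highest voltage the part may be run at


def select_voltage(margin_by_voltage: Dict[float, float],
                   min_margin: float = MIN_MARGIN_PS,
                   v_max: float = V_MAX) -> Optional[float]:
    """Return the lowest allowed voltage whose measured margin meets spec.

    Fast chips land on a low voltage (saving power); slow chips may need a
    higher one, up to v_max. None means no allowed voltage brings the chip
    into spec.
    """
    for v in sorted(margin_by_voltage):
        if v <= v_max and margin_by_voltage[v] >= min_margin:
            return v
    return None


fast_chip = {0.70: 4.0, 0.75: 12.0, 0.80: 20.0, 0.85: 28.0}
slow_chip = {0.75: -6.0, 0.80: 2.0, 0.85: 8.0, 0.90: 14.0}

print(select_voltage(fast_chip))  # 0.75
print(select_voltage(slow_chip))  # 0.9
```

The same function covers both cases in the text: for a fast chip it returns a lower voltage, and for a slow chip it returns a raised voltage that rescues the part, or `None` if even `V_MAX` is not enough.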

Once each chip has been matched to its optimal operating voltage at time zero, UCT continuously monitors the systems in the field for ongoing, predictive optimization.

Datacenters—Power(ed) by UCT

UCT gives Datacenter operators foresight into component degradation, allowing for accurate power and thermal management in the field.

Continuous performance monitoring of timing margins, voltage and clock integrity, workloads, application effects, and thermal gradients enables corrective action while the system is in mission mode. This comprehensive understanding of aging effects drives proactive, data-driven updates to the operating voltage required to achieve the target frequency. Dynamic voltage scaling is performed without disrupting system operation.
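One way to picture margin-driven voltage updates in the field is a small closed-loop controller with hysteresis. The sketch below is a generic illustration under stated assumptions, not the UCT implementation: the `MarginController` class, the margin band, the step size, and the voltage limits are all hypothetical, and a real controller would also respect regulator slew rates and platform power policy.

```python
"""Hypothetical closed-loop voltage adjustment driven by in-field margin readings.

Thresholds, step size, and voltage limits are assumptions for illustration.
"""
from dataclasses import dataclass


@dataclass
class MarginController:
    voltage: float             # current supply voltage (V)
    low_margin: float = 10.0   # below this margin (ps), raise the voltage
    high_margin: float = 25.0  # above this margin (ps), it is safe to lower it
    step: float = 0.01         # voltage change per adjustment (V)
    v_min: float = 0.65        # assumed lower voltage limit (V)
    v_max: float = 0.95        # assumed upper voltage limit (V)

    def update(self, measured_margin: float) -> float:
        """Nudge the voltage so the measured margin stays inside the band."""
        if measured_margin < self.low_margin:
            self.voltage = min(self.voltage + self.step, self.v_max)
        elif measured_margin > self.high_margin:
            self.voltage = max(self.voltage - self.step, self.v_min)
        # Inside the band the voltage is left unchanged; this hysteresis
        # keeps the loop from oscillating on small measurement noise.
        return self.voltage


ctrl = MarginController(voltage=0.80)
# Example margins: ample at first, then shrinking as the silicon ages.
for margin in [30.0, 27.0, 18.0, 12.0, 8.0, 6.0]:
    v = ctrl.update(margin)
    print(f"margin {margin:5.1f} ps -> {v:.2f} V")
```

Early on, the surplus margin lets the controller trim the voltage; as aging erodes the margin, it adds voltage back only as much as needed, instead of carrying a worst-case aging margin from day one.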

For the first time, Datacenters can gain a deep and actionable understanding of how software affects system performance. The chips can now ‘sense’ software-related issues, helping ensure that updates were correctly tested and implemented. With this new data, operators receive guidance on corrective action to optimize workloads and ensure reliability, enabling a transition to predictive maintenance strategies.
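As a deliberately simple stand-in for the machine-learning analytics the article describes, the sketch below flags a software update if on-chip timing margins drop noticeably after it is deployed. It assumes margin samples (in picoseconds) are collected under comparable workloads before and after the update; the function name `update_degraded_margin`, the 3 ps threshold, and the sample values are hypothetical.

```python
"""Hypothetical check for a software update's effect on on-chip timing margins.

Assumes margin samples (ps) come from comparable workloads before and after
the update. A plain mean comparison stands in for real analytics.
"""
from statistics import mean


def update_degraded_margin(before, after, max_drop_ps=3.0):
    """Return (flagged, drop): flagged is True if the mean margin fell
    by more than max_drop_ps after the update."""
    drop = mean(before) - mean(after)
    return drop > max_drop_ps, drop


before_update = [22.0, 21.5, 23.0, 22.5]
after_update = [17.0, 18.0, 16.5, 17.5]
flagged, drop = update_degraded_margin(before_update, after_update)
print(flagged, round(drop, 2))
```

In practice, workload mix, temperature, and voltage would all need to be controlled for before attributing a margin shift to the software itself.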

Power Aware Performance

Today’s Datacenters demand round-the-clock resilience, while maintaining the economic equation required for scale. Power management, like unplanned downtime, has gone from being an inconvenience to a painful liability.

With newer chip manufacturing technologies and complex architectures, silicon aging can mean that voltages must be increased to maintain the same level of performance. Any resulting increase in power density immediately takes a toll on cooling, whose requirements are already more stringent than ever before.
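Why does a voltage bump hit cooling so hard? The standard first-order model of CMOS dynamic power, with activity factor α, switched capacitance C, supply voltage V, and clock frequency f, shows that power grows with the square of voltage. The 5% increase below is only an illustrative figure, and leakage power, which also rises with voltage, is left out.

```latex
P_{\text{dyn}} = \alpha \, C \, V^{2} f
\qquad\Longrightarrow\qquad
\frac{P'_{\text{dyn}}}{P_{\text{dyn}}} = \left(\frac{V'}{V}\right)^{2}
= (1.05)^{2} = 1.1025
```

So raising the voltage by 5% to compensate for aging, at the same frequency, increases dynamic power by roughly 10%, all of which must be removed as heat.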

With new UCT data streaming from the chips themselves, deep, full-lifecycle analytics make power optimization possible from design to field. Manufacturers can apply personalized assessment during production, while cloud providers extend the lifetime of electronic systems in the field. UCT data isn’t just extending the longevity and reliability of today’s Datacenter hardware; it is also cutting costs and boosting sustainability.
