Analyse RTL clock gating to cut processor power
The AMD low-power core design team used a power analysis solution to analyse pre-synthesis RTL clock-gating quality, find opportunities for improvement, and generate reports that the engineering team could use to reduce the operating power of the design. Because it targets pre-synthesis RTL, the analysis can be run more often and over a larger number of simulation cycles, and it finishes more quickly and with fewer machine resources than tools that rely on synthesised gates. The focus on clock gating and the quick turnaround of RTL analysis allowed AMD to achieve measurable power reductions for typical applications of a new, low-power x86 AMD core.
The AMD Jaguar x86 core is a processor aimed at system-on-a-chip designs for low-power markets and cloud clients. It is built on a 28-nm process and has a small die area (3.1 mm²). Compared with the previous generation of this core, AMD Bobcat, Jaguar has many blocks redesigned for improved power efficiency, including the IC loop buffer, store queue, and L2 clocks. The Jaguar compute unit (CU) includes four independent Jaguar cores and a shared-cache unit with four L2 databanks and an L2 interface tile. The L2 interface block runs at the core clock speed. The L2 databanks run at half the core clock to save power and are clocked only when required, which reduces power even further.
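The saving from half-rate, on-demand clocking of the L2 databanks can be approximated with the usual first-order dynamic power relation, in which switching power scales linearly with clock frequency and with the fraction of time the clock is running. The Python sketch below is illustrative only: the function name `relative_clock_power` and the 30% activity figure are assumptions, not AMD data, and the model ignores leakage, voltage changes, and the overhead of the gating logic itself.

```python
def relative_clock_power(freq_ratio, active_fraction):
    """Clock switching power relative to an always-on, full-rate clock.

    First-order model: dynamic power is proportional to C * V^2 * f, so with
    capacitance and voltage held fixed, power scales with the frequency ratio
    and with the fraction of cycles in which the clock is not gated off.
    """
    if freq_ratio < 0.0:
        raise ValueError("freq_ratio must be non-negative")
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return freq_ratio * active_fraction


if __name__ == "__main__":
    # Hypothetical numbers: a bank clocked at half rate and only needed
    # 30% of the time, compared with a full-rate, never-gated clock.
    half_rate_only = relative_clock_power(0.5, 1.0)
    half_rate_gated = relative_clock_power(0.5, 0.3)
    print(f"half rate, always on: {half_rate_only:.2f}")   # 0.50
    print(f"half rate, gated:     {half_rate_gated:.2f}")  # 0.15
```

Under these assumed numbers, halving the clock rate alone halves the bank's clock power, and gating the clock when the bank is idle cuts it further in proportion to how rarely the bank is needed.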
Figure 1: AMD Jaguar compute core architecture.
Because the design goals for this generation of the core included higher frequency and more instructions per clock cycle (IPC), designers worked on timing and on minimising the gates between flops. The power goal at the start of the project was to lower typical application power by 10%. Ultimately, using a design methodology that included deployment of PowerPro from Calypto, AMD was able to lower typical power by approximately 20% while increasing frequency at the given voltage by over 10%.
The power analysis flow
In AMD's overall design flow, engineering managers would, at selected intervals, pick a tag from which to run synthesis, and a snapshot of the relevant RTL code would be run through PowerPro. Because PowerPro can analyse RTL in a matter of hours, AMD could run weekend regressions both to confirm that all of the simulations passed and to analyse the power of the RTL design. The team then increased clock-gating efficiency by iteratively adjusting the existing clock gates based on the PowerPro recommendations. These weekend regressions also allowed rapid analysis of design alternatives, resulting in significant performance and power improvements. Some of these optimisations could not have been made at the gate level, and others might not have been detected and targeted without the PowerPro reports.
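To make the idea of clock-gating efficiency concrete, the Python sketch below computes one common definition of the metric: the fraction of flop clock cycles that gating suppresses over a simulation run. It is not PowerPro's report format or necessarily the metric AMD tracked. The `RegisterGroup` type, both function names, the block names, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RegisterGroup:
    name: str
    flops: int           # flip-flops sharing one clock gate
    enabled_cycles: int  # cycles in which the gated clock actually toggled


def clock_gating_efficiency(groups, total_cycles):
    """Fraction of flop clock cycles suppressed by gating (0.0 to 1.0)."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    possible = sum(g.flops for g in groups) * total_cycles
    if possible == 0:
        return 0.0
    clocked = sum(g.flops * g.enabled_cycles for g in groups)
    return 1.0 - clocked / possible


def worst_offenders(groups, top=3):
    """Rank groups by flop-cycles still clocked, largest first.

    Groups at the head of the list are the first candidates for a
    tighter enable condition.
    """
    ranked = sorted(groups, key=lambda g: g.flops * g.enabled_cycles, reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    # Made-up activity for a 10,000-cycle simulation.
    groups = [
        RegisterGroup("store_queue", flops=512, enabled_cycles=2500),
        RegisterGroup("loop_buffer", flops=256, enabled_cycles=1000),
        RegisterGroup("ungated_ctrl", flops=64, enabled_cycles=10000),
    ]
    efficiency = clock_gating_efficiency(groups, total_cycles=10000)
    print(f"clock-gating efficiency: {efficiency:.1%}")  # 73.8%
    for g in worst_offenders(groups, top=2):
        print(g.name, g.flops * g.enabled_cycles)
```

Tracking a number like this from one weekend regression to the next would show whether each round of clock-gate adjustments actually reduced the clocked flop-cycles, and the ranking points at the blocks where a tighter enable is likely to pay off most.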