Nvidia's 64bit Denver: The second coming of Tegra K1
Two Denver cores will ship this year in an SoC that upgrades Nvidia's Tegra K1 and targets tablets. The existing 32bit chip targets Android and is used in an Acer Chromebook, Google's Project Tango tablet, Xiaomi's MiPad, and Nvidia's own Shield tablet.
Nvidia claims the 64bit Tegra K1 will sport PC-class performance in mobile systems for gaming, business apps, and content creation. In benchmarks Nvidia showed, Denver was nearly on par with an Intel Haswell processor and beat an Apple A7 series SoC by 10 per cent to 25 per cent.
Nvidia showed benchmarks only against x86 and 32bit ARM SoCs.
The company did not give any comparisons with a standard A57 64bit core from ARM. Targeting servers and networking gear, AMD has just started to sample SoCs using the A57, and Applied Micro has started sampling its custom 64bit ARM SoC.
Until benchmarks against standard and custom 64bit ARM SoCs emerge, it's not clear whether Denver will help Nvidia improve its position in mobile systems, where it significantly trails leader Qualcomm.
Denver can execute as many as seven instructions per clock, running at up to 2.5GHz. It packs a 128KB instruction plus 64KB data L1 cache and a 2MB, 16-way set-associative L2 cache.
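As a back-of-envelope check on those figures, the sketch below works out the theoretical peak instruction rate and the number of sets in the L2 cache. The peak rate is an upper bound rather than a measured result, and the 64-byte cache line is an assumption made purely for illustration, since no line size is given here.

```python
# Back-of-envelope arithmetic from the figures quoted in the article.
# LINE_BYTES is an assumption for illustration; the article gives no line size.

PEAK_IPC = 7                 # instructions per clock (upper bound)
CLOCK_HZ = 2.5e9             # 2.5GHz maximum clock
L2_BYTES = 2 * 1024 * 1024   # 2MB L2 cache
L2_WAYS = 16                 # 16-way set associative
LINE_BYTES = 64              # assumed cache line size


def peak_instructions_per_second(ipc, clock_hz):
    """Theoretical ceiling: every clock retires the maximum instruction count."""
    return ipc * clock_hz


def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets = total size / (ways per set * bytes per line)."""
    return size_bytes // (ways * line_bytes)


peak = peak_instructions_per_second(PEAK_IPC, CLOCK_HZ)
sets = cache_sets(L2_BYTES, L2_WAYS, LINE_BYTES)

print(f"peak: {peak / 1e9:.1f} billion instructions/s")
print(f"L2 sets (assuming {LINE_BYTES}-byte lines): {sets}")
```

Real code rarely sustains the peak figure, which is why the benchmark comparisons matter more than the raw width and clock.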
The most novel aspect of Denver is an optimised execution feature used as an alternative to a full out-of-order design. It handles a variety of optimisations such as renaming registers, unrolling loops, breaking false code dependencies, and removing unused computations.
The optimiser chains related routines and uses 128MB of main memory, securely partitioned before an operating system boots. "We see a 2x speed-up or better with optimised routines," said Darrell Boggs, chief architect on the project, in a talk at the annual Hot Chips conference here.
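To give a feel for one of the optimisations listed, removing unused computations, here is a toy sketch on straight-line code. It is not Nvidia's implementation: the three-field instruction format and the `remove_unused` helper are hypothetical, and the sketch assumes every operation is free of side effects.

```python
# Toy illustration of "removing unused computations" on straight-line code.
# The (dest, op, sources) instruction format is hypothetical, not Denver's,
# and every operation is assumed to have no side effects.

def remove_unused(instructions, live_out):
    """Drop instructions whose result is never read later.

    instructions: list of (dest, op, sources) tuples in program order
    live_out: names whose values are still needed after the sequence
    """
    live = set(live_out)
    kept = []
    # Walk backwards so each instruction sees what is needed after it.
    for dest, op, sources in reversed(instructions):
        if dest in live:
            kept.append((dest, op, sources))
            live.discard(dest)      # this write satisfies the later reads
            live.update(sources)    # its inputs are now needed in turn
        # Otherwise the result is never used, so the instruction is dropped.
    kept.reverse()
    return kept


program = [
    ("t1", "add", ("a", "b")),
    ("t2", "mul", ("a", "a")),    # t2 is never read: an unused computation
    ("t3", "sub", ("t1", "c")),
]

optimised = remove_unused(program, live_out={"t3"})
for dest, op, sources in optimised:
    print(dest, "=", op, *sources)
```

The backward walk is the usual way to do this in a single pass, because whether a result is needed depends only on the instructions that follow it.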
The new core marks the end of Nvidia's use of a companion core, something it pioneered with its early 32bit ARM SoCs. ARM continues to pursue the approach with its big.LITTLE pairings of high-performance and low-power cores, in both 32bit and 64bit versions.
Among other techniques, Denver can reuse memory pipelines for integer traffic, and it has a pre-fetcher to compensate for cache misses.
Denver is a microcoded, seven-wide superscalar 64bit ARM core.
- Rick Merritt