Qualcomm will describe at Hot Chips the Falkor dual-CPU custom ARM core in its Centriq server processor slated to ship later this year.
SAN JOSE, Calif. – Qualcomm will describe the custom ARM core inside its first server processor at Hot Chips this week. The Falkor CPU is at the heart of the company’s 10-nm Centriq 2400, a 48-core SoC that will ship later this year, targeting big data centers.
To date, a handful of companies have tried to gain footholds in servers with ARM-based products. They have generally failed so far because their parts could not match the performance of Intel’s x86-based Xeon. However, earlier this year Microsoft’s data center group announced it is testing SoCs from Qualcomm and rival Cavium.
It’s still unclear how Qualcomm will fare. The company did not provide any performance, power consumption or price information on its parts.
The ARM chips debut at a moment of unusually high competition. AMD just started shipping Epyc, its x86 server SoC with up to 32 cores, while Intel just refreshed Xeon with its Skylake architecture. Both are made in 14-nm processes.
Perhaps the most interesting disclosure about the 64-bit Falkor is that it consists of two custom ARMv8 cores. Thus, the 48-core Centriq is actually made up of 24 dual-core processors running at about 1V.
The dual-core approach is roughly similar to one used by AMD’s Bulldozer, a 2010 x86 core that struggled to compete with rival Intel. The dual Qualcomm CPUs share an L2 cache and ring interconnect with more than 250 GBytes/second aggregate bandwidth.
Each out-of-order Falkor core can dispatch up to three instructions and one direct branch per cycle. The cores use a pipeline that supports 128-bit loads and stores and varies in length depending on the operation.
The cores and L2 caches can run in independent power states. Each is fed from a shared supply rail through a block head switch or low-dropout regulator, and a hardware state machine manages them to speed state transitions.
While the dual cores share one L2 cache in each block, the L3 is a central cache running on the SoC’s ring bus. Qualcomm did not provide sizes of the caches.
The SoC supports six DDR4 channels at 2,667 MTransfers/s and 32 PCI Express Gen 3 lanes. AMD bests both Qualcomm and Intel by providing eight DDR4 channels and 128 PCIe Gen 3 lanes on Epyc.
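For a rough sense of scale, those interface counts can be turned into peak theoretical bandwidth with simple arithmetic. The sketch below is illustrative only. The function names are hypothetical, and it assumes the standard 64-bit (8-byte) DDR4 data bus per channel and PCIe Gen 3 signaling at 8 GT/s per lane with 128b/130b encoding. The results are upper bounds, not figures Qualcomm disclosed.

```python
def ddr4_peak_gbytes_per_s(channels, mtransfers_per_s, bus_bytes=8):
    """Peak theoretical DDR4 bandwidth in GBytes/s.

    Assumes each channel moves bus_bytes per transfer (8 bytes for a
    standard 64-bit DDR4 channel; an assumption, not from the article).
    """
    return channels * mtransfers_per_s * 1e6 * bus_bytes / 1e9


def pcie3_peak_gbytes_per_s(lanes):
    """Peak theoretical PCIe Gen 3 bandwidth per direction in GBytes/s.

    Assumes 8 GT/s per lane and 128b/130b line encoding.
    """
    bits_per_s_per_lane = 8e9 * 128 / 130
    return lanes * bits_per_s_per_lane / 8 / 1e9


# Centriq 2400 figures from the article: six channels at 2,667 MT/s, 32 lanes.
centriq_mem = ddr4_peak_gbytes_per_s(channels=6, mtransfers_per_s=2667)
centriq_pcie = pcie3_peak_gbytes_per_s(lanes=32)

print(f"Centriq DDR4 peak: {centriq_mem:.1f} GBytes/s")
print(f"Centriq PCIe Gen 3 peak: {centriq_pcie:.1f} GBytes/s per direction")
```

Under those assumptions, the Centriq's memory interface tops out at about 128 GBytes/s and its PCIe lanes at about 31.5 GBytes/s in each direction. The same functions could be applied to Epyc's eight channels and 128 lanes once a memory data rate is chosen for it.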
The Centriq includes a south bridge block supporting SATA, USB and other I/Os. The chip fits in a 55mm x 55mm LGA package.
The SoC supports ARM’s virtualization, TrustZone security and instruction extensions to accelerate crypto operations. In addition, it stores boot load and authentication code in an integrated ROM.