PCIe Generation 4 is just beginning to hit the market in processors and GPUs, yet many companies are already anticipating PCIe Gen 5 within a couple of years, and the specification for PCIe Gen 6 is in development. Remarkably, the PCI Special Interest Group (PCI-SIG) has set a goal of doubling data throughput with each of these generations, even as the industry pushes the limits of board and packaging technology. That is not an easy goal to achieve, but there is enough commitment and optimism among the PCI-SIG members to maintain the pace.

The SIG announced that official PCIe 4.0 compliance testing will be available this August. The PCIe Gen 4 spec supports 16 gigatransfers per second (GT/s), and the forthcoming Gen 5 doubles that to 32 GT/s (using NRZ encoding). With PCIe Gen 6, the group plans to double the rate again to 64 GT/s per lane. To reach that speed, the committee chose to adopt PAM4 signaling, already proven in the telecommunications industry for 56-Gbit/s interfaces.
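As a rough sanity check on those transfer rates, usable per-lane throughput can be estimated from the raw rate and the line encoding. The sketch below assumes the 128b/130b encoding used since PCIe Gen 3 for all three generations; PCIe Gen 6's PAM4/FLIT-based encoding has different overhead in practice, so its figure is only a ballpark:

```python
# Rough per-lane PCIe throughput estimates (sketch, not spec-exact).
# Gens 3-5 use 128b/130b line encoding; Gen 6's PAM4 signaling with
# FLIT-based encoding is approximated here with the same efficiency.

def lane_gbps(gt_per_s, efficiency=128 / 130):
    """Usable gigabytes per second for one PCIe lane."""
    return gt_per_s * efficiency / 8  # 8 bits per byte

for gen, rate in [("Gen 4", 16), ("Gen 5", 32), ("Gen 6", 64)]:
    x16 = 16 * lane_gbps(rate)
    print(f"PCIe {gen}: {lane_gbps(rate):.2f} GB/s per lane, "
          f"~{x16:.0f} GB/s for an x16 slot")
```

This yields roughly 2 GB/s per lane for Gen 4 (about 31.5 GB/s for an x16 slot), doubling with each generation.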

The faster speeds of PCIe Gen 4 and beyond are important to keep pace with the changes in the data center. Processors are integrating higher numbers of CPU and accelerator cores. They need more memory, storage, and interconnect bandwidth to scale.

Expanding the bit width of PCIe is not a realistic option because wider links consume more package pins and board space. The key is to keep making each PCIe lane faster.

At some point, the burden of board layouts and the limitations of copper interconnects may require direct coax cabling or optical cabling, but those options are expensive today. PCIe must be fast, but it also must be cost-efficient for mainstream PCs, laptops, and servers.

PCIe also serves as a platform on which other standards compete, because it is a primary link among CPUs, GPUs, FPGAs, and accelerators in heterogeneous computing. Even though there are other proprietary and standards-based buses, such as NVLink and OpenCAPI, most still rely on the PCIe physical layer as their base technology.

In particular, the timing of the new PCIe generations has a direct impact on two competing accelerator connection standards that allow CPUs and accelerators to share memory: the Cache Coherent Interconnect for Accelerators (CCIX) and the Compute Express Link (CXL).

The CCIX standard is available today and runs on PCIe Gen 4. Presently, the major silicon proponent for CCIX is Xilinx, but many other vendors have signed up for the CCIX group. Earlier this year, Intel released the competing CXL specification that will run over PCIe Gen 5.

CCIX has the early lead, considering it has already been implemented in Xilinx and Huawei products and the CCIX consortium counts more than 50 members. But Intel has assembled a formidable array of systems companies (mostly Intel customers) behind CXL, even though products using CXL probably will not ship until 2021. In many ways, the CXL spec is a subset of the CCIX standard, yet it has Intel's backing.

The CCIX standard was created to offer a balanced approach in which all compute elements are peers, with a symmetric level of coherency. With CCIX, you can create a mesh network of CPUs and accelerators, giving all compute elements equal capability.

Because of its asymmetrical design, CXL does not fully support near-memory processing, and it does not support memory expansion with fine-grained data sharing at all. That last capability is useful for speeding up database applications with accelerators.

Another key benefit of CCIX is that it leverages the existing PCIe infrastructure, while CXL will require a dynamic hardware bypass of the PCIe data link layer. Those changes will require significant new compliance tests, and the companies providing IP for PCIe controllers are only now able to evaluate the changes required to support CXL.

Intel is pushing the CPU-centric CXL as a standard, but it was developed entirely inside Intel. The key feature Intel and its partners cite is that CXL minimizes memory latency over the PCIe bus. It achieves that lower latency by replacing the PCIe data-link layer (DLL) on the CPU side and bypassing the PCIe driver; the PCIe logic must be able to swap the PCIe DLL for the CXL control logic. This approach does save a few nanoseconds, but at the price of flexibility.

The asymmetrical control logic puts more of the burden on the host processor, an approach used in the past with PCI and USB, both of which Intel also helped define. Intel has set PCIe Gen 5 as the target physical layer because that is the best time to intercept implementation in Intel's own silicon. That said, there appears to be no reason CXL couldn't run on PCIe Gen 4.

Several chip and system suppliers are members of both camps, including Arm, Huawei, and Mellanox. Server chip companies such as AMD, Ampere, and Marvell have not made their direction clear. Meanwhile, NVIDIA continues to promote its proprietary NVLink interface and is acquiring Mellanox, while IBM is committed to OpenCAPI for Power servers. AMD also has its own Infinity Fabric, but to date it has been reserved for internal use only.

The good news is all these coherent connections rely heavily on the work of the PCI SIG and its development of advanced signaling and mechanical standards. And so far, that progress looks safe for the next few years.

Kevin Krewell is a principal analyst for Tirias Research.