CXL Spec Grows, Absorbs Others to Collate Ecosystem

By Gary Hilson

CXL is arguably the fastest-evolving specification in the computing world, but even with many vendors developing CXL products, there is a lot of work to be done to build out the ecosystem.

The Compute Express Link (CXL) protocol is arguably the fastest-evolving specification in the computing world, with its third iteration published a little more than three years after the protocol’s inception. But even with many vendors developing CXL products, there is a lot of work to be done to build out the ecosystem.

The recent Flash Memory Summit provided a forum for the latest features of the protocol, as well as myriad vendors outlining how they are contributing to the ecosystem. It was also a platform to announce further consolidation of related standards under the CXL group, which recently became a formal consortium.

Regardless of where vendors fit into this ecosystem, a recurring theme was that the CXL spec is revolutionary rather than evolutionary, unlike protocols such as PCI Express (PCIe) that have been steadily plugging away for more than a decade.

CXL 3.0 has added advanced switching and fabric capabilities, efficient peer-to-peer communications, and fine-grained resource sharing across multiple compute domains. Overall, it supports even more disaggregation, which will have a significant impact on the data center.

First introduced in March 2019, CXL is an industry-standard interconnect that offers coherency and memory semantics using high-bandwidth, low-latency connectivity between the host processor and devices like accelerators, memory buffers, and I/O interfaces. It runs on the standard PCIe physical layer and uses a flexible processor port that can auto-negotiate to either the standard PCIe transaction protocol or the alternative CXL transaction protocols.

What’s becoming apparent is that the CXL interconnect promises to have a significant impact on the data center as it strives to keep up with exponentially growing data and computation requirements that can’t be met cost-effectively by just adding more and more memory. With the number of cores in CPUs growing and the need for more bandwidth to memory, the memory itself needs to be more efficient, CXL Consortium president Siamak Tavallaei said in an interview with EE Times. CXL provides the necessary methodology for an infrastructure that’s sufficient for data centers and large cloud computing environments.

Since the development work began on CXL 2.0, different teams within the consortium have tackled old use cases and developed new ones to leverage the protocol, which has led to the features in the latest version, he said. CXL is now at the point where it’s no longer just PowerPoint presentations and a written specification. “Silicon-based solutions are already in development, validation, and qualification,” Tavallaei added.

The CXL transaction layer comprises three dynamically multiplexed protocols on a single link. (Source: CXL Consortium)
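Those three protocols are CXL.io, which carries PCIe-style discovery, configuration, and I/O traffic; CXL.cache, which lets a device coherently cache host memory; and CXL.mem, which lets the host issue loads and stores to device-attached memory. As a purely conceptual illustration (the multiplexing actually happens in hardware at the flit level, and the type names and handlers below are invented for the sketch), demultiplexing one link into three protocol engines looks roughly like this:

```c
/* Conceptual sketch only: the enum values, flit layout, and handler
 * names are invented for illustration. Real CXL multiplexing is done
 * in hardware at the flit level, not in software like this. */
#include <stdio.h>

typedef enum {
    CXL_IO,     /* PCIe-style traffic: discovery, config, DMA */
    CXL_CACHE,  /* device coherently caching host memory */
    CXL_MEM     /* host load/store access to device memory */
} cxl_protocol;

typedef struct {
    cxl_protocol proto;        /* which protocol this flit carries */
    unsigned char payload[64]; /* payload bytes (size illustrative) */
} cxl_flit;

/* One link delivers a stream of flits; the transaction layer
 * demultiplexes each one to the matching protocol engine. */
static void dispatch(const cxl_flit *flit) {
    switch (flit->proto) {
    case CXL_IO:    puts("route to CXL.io engine");    break;
    case CXL_CACHE: puts("route to CXL.cache engine"); break;
    case CXL_MEM:   puts("route to CXL.mem engine");   break;
    }
}

int main(void) {
    cxl_flit f = { .proto = CXL_MEM };
    dispatch(&f); /* prints: route to CXL.mem engine */
    return 0;
}
```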

From memory pooling to sharing

A key value proposition of CXL is that it is a common, standard way of moving data; the interconnects that previously did this job well were proprietary.

Just as important is backward compatibility, said Tavallaei, even with all the added features introduced in CXL 3.0.

At a high level, CXL provides a method for multiport devices. While memory pooling was an integral part of CXL 2.0, the notion of fabric is introduced within CXL 3.0.

The first iteration of CXL was designed for point-to-point connection, but the evolution to date has led to “fanning out” of capabilities with a more complex formation of devices, switches, and processors. Security became more important in CXL 2.0, which led to the addition of integrity and data encryption (IDE) over the link.

All this momentum has led to the formation of multiple working groups within the consortium to enable the new features in CXL 3.0, Tavallaei said.

With CXL 2.0, memory pooling did not allow data to move from one virtual hierarchy to another, but with CXL 3.0, multiple devices connected to a switch can now talk to each other: Switches can be cascaded and interconnected using fabric ports, he said, creating a larger fabric that interconnects a large ensemble of devices, including accelerators, memory, and storage.

But even as new features have been announced in CXL 3.0, most vendors are only now getting CXL 2.0 products out the door and mastering features such as pooling.

CXL 2.0 augments CXL 1.1 with enhanced fanout support and a variety of additional features with switches acting as the lynchpin, supporting multiple hosts and devices in a single layer. (Source: CXL Consortium)

Parag Beeraka, director of segment marketing at Arm, said memory pooling is how CXL is enabling a new data center architecture and addressing the rising costs as more memory is required.

“DRAM is one of the highest-expense items in the data center, so anything that can increase efficiency of already existing hardware will indirectly contribute to reduced total cost of ownership,” he said. And with hyperscale workloads becoming more diverse, there’s a need for more configurability. “You don’t want to build machines for specific workloads but rather be able to configure general-purpose servers to different workloads.”

Not unlike how small amounts of “hot” data were once worth the expense of flash storage when NAND was rather pricey, CXL also opens the door for routing data to different memory and storage resources.

“One can make appropriate choices on enabling the right memory based on the workload,” Beeraka said. With CXL, more memory can be added to servers or memory can be pooled. “Pooling solutions will really help enable higher memory capacities.”

Tiering and disaggregation drive data center efficiencies

The concept of memory tiering is not unlike how Intel had positioned Optane as a level between DRAM and flash. Even with both Intel and Micron Technology deciding to abandon development of the underlying 3D XPoint technology, in favor of focusing on CXL, no less, the move goes to show that adding new tiering options has legs.

The efficiency of near memory is growing, Beeraka said; CXL-enabled memory disaggregation works from a data center point of view, and DRAM costs come down slightly with memory expansion.

The other exciting benefit of pooling and disaggregation is that application performance requirements can be met as optimally as possible, said Sid Karkare, AMD’s director of cloud business development.

It’s possible to tier DDR4 and DDR5 DRAM so that the cost of adding DDR5 is mitigated by allocating it only when necessary. You can also adjust for latency requirements: Some applications may be able to handle higher latencies. With pooling, system memory composability increases.
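To make the tiering idea concrete: on Linux, CXL-attached memory typically appears as a CPU-less NUMA node, so software can steer allocations between near and far tiers using standard NUMA interfaces. The minimal sketch below assumes such a setup; the node IDs are placeholders for whatever the real topology reports (numactl --hardware will show it).

```c
/* Minimal sketch: steering allocations between memory tiers with
 * libnuma. Assumes a Linux host where CXL-attached memory appears
 * as a CPU-less NUMA node; the node IDs below are placeholders.
 * Build with: gcc tiering.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

#define NEAR_NODE 0  /* direct-attached DRAM: lowest latency */
#define FAR_NODE  1  /* hypothetical CXL expander: more capacity, higher latency */

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    /* Latency-sensitive working set stays on near memory ... */
    char *hot = numa_alloc_onnode(1 << 20, NEAR_NODE);

    /* ... while capacity-hungry, latency-tolerant data goes far. */
    char *cold = numa_alloc_onnode(64 << 20, FAR_NODE);

    if (!hot || !cold) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    memset(hot, 1, 1 << 20);   /* touch so pages are actually placed */
    memset(cold, 1, 64 << 20);

    numa_free(hot, 1 << 20);
    numa_free(cold, 64 << 20);
    return 0;
}
```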

Another challenge that CXL can solve is stranded memory—memory that has not been optimally attached to a given server.

“How do you allocate that memory on demand as required and, in general, sort of reduce the overall capex costs for a data center?” Karkare said. Page migration plays a role in tiered memory and can be accomplished in software, or you can do it in hardware, he said. “There are pros and cons to both approaches.”

With software, the application has a better understanding of when it sees a slowdown in performance. “If you do it in hardware, then the performance is better,” he said. “We have seen both approaches being explored in the CXL ecosystem.”
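The software approach Karkare describes can be sketched with the Linux move_pages(2) interface, which asks the kernel to relocate specific pages to a chosen NUMA node. The CXL node ID below is again a hypothetical placeholder, and real tiering software would pick victim pages from access statistics rather than hardcoding one.

```c
/* Minimal sketch of software-directed page migration via
 * move_pages(2). Assumes the CXL tier is NUMA node 1 (placeholder).
 * Build with: gcc migrate.c -lnuma */
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CXL_NODE 1  /* hypothetical far-memory node ID */

int main(void) {
    long page_size = sysconf(_SC_PAGESIZE);

    /* A page the application has decided is cold. */
    char *buf = aligned_alloc(page_size, page_size);
    if (!buf)
        return 1;
    memset(buf, 0, page_size); /* touch it so it is backed by a page */

    void *pages[1]  = { buf };
    int   nodes[1]  = { CXL_NODE };
    int   status[1] = { 0 };

    /* Ask the kernel to demote the page; pid 0 means "this process". */
    if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) != 0) {
        perror("move_pages");
        return 1;
    }
    printf("page now resides on node %d\n", status[0]);

    free(buf);
    return 0;
}
```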

Micron sees CXL as an enabler of pliability in the data center, said Ryan Baxter, senior marketing director for data center at the company. “It boils down to server mix and the types of problems that customers in the ecosystem are really wanting to solve.”

A good example is artificial-intelligence servers and how they must advance between now and 2025. The necessary amounts of storage and memory can be accessed today, with CXL acting as the higher-performance interface to enable memory expansion. Baxter said today’s storage isn’t fast enough for applications that must deliver real-time answers in use cases such as fraud detection and recommendation engines. “That means memory. That means DRAM.”

However, there’s a limit to how many additional memory channels can be added to a CPU or a server. “And this is where CXL comes into play,” Baxter said. “We believe CXL enables a significant degree of platform pliability that really gets us to where we need to be.”

Otherwise, he said, the answer is stacking DRAM, and that becomes extremely expensive. “The ASP per gigabit becomes non-linear.”

Micron’s customers are looking to “flatten” the memory space and lean on CXL as a memory channel. “The industry’s driving a new sort of heterogeneous architecture,” Baxter said. “CXL allows you to dial up the right combination of compute and the right combination of memory in the right place at the right time.”

SK Hynix also sees CXL as a gateway toward efficient use of computing, acceleration, and memory resources, said Uksong Kang, the company’s VP of memory planning, because it allows for memory bandwidth and capacity expansion, memory media differentiation, and control differentiation. It also allows for what Hynix calls “memory as a service” (MaaS).

Aside from being able to add memory capacity via a CXL channel, the protocol is memory-agnostic and non-deterministic, so there is more flexibility as to the type of memory that can be added, he said. “We can either choose to have standard memories such as DDR5, or we can have even custom memory media as needed.” Having a choice of memory allows for balancing tradeoffs in performance, capacity, and power design.

Having a second tier of memory also allows for more control differentiation and the integration of more features, such as error-correction control, security functions, low-power functions, acceleration, or a computation engine, Kang said. “By doing local computation, we can prevent data from moving back and forth between the CPU and the memory.”

Local computation increases power efficiency and performance. MaaS is applicable once the infrastructure and ecosystem are ready for memory pooling, he said, as CXL enables memory capacity to be allocated through memory pool virtualization or by building a composable, scalable rack of memory pool appliances that can be populated with different types of memory media.

Growing ecosystem faces uncertainties

Kang sees the industry being at the ecosystem-enabling stage.

As the market expands, there will be opportunities for diverse types of memory solutions. “Even though we know that CXL is going to be a game-changer in the future, there are many uncertainties about what the market volume is going to be,” he said.

The ecosystem, of course, comprises more than just memory; it also includes other critical components, such as controllers and retimers.

While Micron announced it would focus its efforts on CXL instead of further developing 3D XPoint technology, it has yet to formally announce a CXL product.

Samsung’s first CXL offering is a DDR5 DRAM-based memory module targeted at data-intensive applications, such as AI and high-performance computing, that need server systems that can significantly scale memory capacity and bandwidth.

Rambus has been quick out of the gate with IP to help build the ecosystem for CXL, integrating controller and PHY technology from its acquisitions of PLDA and AnalogX, respectively; these technologies complement the company’s expertise in server memory interface chips.

Astera Labs only just announced that its Leo CXL Memory Accelerator Platform has begun pre-production sampling for customers and strategic partners. The platform is designed to address processor memory bandwidth bottlenecks and capacity limitations by allowing CPUs to access and manage CXL-attached DRAM and persistent memory so that centralized memory resources are used more efficiently; that access can also be scaled up without degrading performance.

Samsung was early out of the gate with a CXL memory offering: a DDR5 DRAM module, with the expectation that DDR5 will be the most cost-effective solution in terms of bandwidth, capacity expansion, speed, reliability, and power efficiency once CXL gains widespread traction. (Source: Samsung)

Building the CXL ecosystem is about more than just different products.

The protocol is heavily intertwined with PCIe: CXL 1.0 aligns with the 32-GT/s signaling rate of PCIe Gen5. Tavallaei said further development of CXL will seek collaboration with those working on the PCIe specification, whose seventh iteration is already in development and expected to double the data rate once again.

The CXL Consortium also just announced a joint work group with the JEDEC Solid State Technology Association on development of DRAM and persistent memory, with the aim to reduce duplication of efforts.

Another group that was doing a lot of overlapping work with the CXL Consortium was the Gen-Z Consortium. Late last year, both parties agreed the Gen-Z specifications and assets would be transferred to the CXL Consortium. Gen-Z predates CXL and uses memory-semantic communications to move data between memories on different components with minimal overhead, including memory devices, processors, and accelerators.

Similarly, the OpenCAPI standard is also being absorbed into CXL, even though it, too, predates CXL by several years. OpenCAPI was one of the earlier standards for a cache-coherent CPU interconnect and was an extension of IBM’s existing Coherent Accelerator Processor Interface (CAPI) technology, which the company opened to the rest of the industry under the control of a consortium.

 

This article was originally published on EE Times.

Gary Hilson is a general contributing editor with a focus on memory and flash technologies for EE Times.

 
