CXL Put Through Its Paces

Article By: Gary Hilson

A growing list of vendors has demonstrated interconnect interoperability via the Compute Express Link spec.

The Compute Express Link (CXL) has evolved to the point where the pipeline of enabling technologies is emerging.

The recent SC21 supercomputing conference provided an opportunity for several vendors to demonstrate their contributions to the growing CXL ecosystem, with technologies spanning controllers, testing and validation, and memory.

While CXL’s value proposition centers on making disaggregated resources, including memory, available on demand for a given workload, another focus is composability, a capability frequently associated with heterogeneous computing.

For example, endpoint controller IP from Cadence Design Systems emerged as the CXL specification was taking shape, said Gopi Krishnamurthy, a Cadence architect for PCIe and CXL. “A major challenge in developing this IP while the specification was still evolving in 2020 was the lack of CXL 2.0 host platform availability for [interoperability] testing.” Cadence partnered with Intel to simulate testing with future processor designs that implement CXL support. “Through simulation, Intel could ensure backward compatibility against its existing CXL 1.1 solutions,” Krishnamurthy said.

The interoperability simulation involved dropping the Cadence CXL endpoint’s register-transfer level (RTL) design into Intel’s RTL simulation environment, replacing the endpoint bus functional model. The partners then developed a joint test plan that included use cases and traffic patterns for CXL 1.1/2.0 link training and discovery, CXL.mem, CXL.cache and mixed CXL.mem/CXL.cache traffic.

Co-simulations used FPGAs that were tested on early samples of Intel’s Xeon processor codenamed Sapphire Rapids. “We have the application logic to support all three classes of CXL traffic,” Krishnamurthy noted.

The demonstrations also employed a Teledyne LeCroy analyzer, connected between host and device, to monitor traffic flowing through the link.

Synopsys employed a full-speed trace and analysis using a Teledyne LeCroy CXL analyzer to demonstrate a complete CXL connection using Synopsys DesignWare CXL Root Complex and Endpoint Controller IP.

Teledyne LeCroy demonstrated how its protocol analyzer validates and debugs links running on the CXL protocol. The company’s Summit T516 protocol analyzer and Summit Z516 protocol exerciser can run CXL 2.0 compliance tests, and can also be used for PCI Express 5.0 protocol analysis.

The entire CXL protocol analysis can be automated using Teledyne LeCroy’s LinkExpert software, which includes a graphical user interface representing all traffic on the CXL link, from the logical physical layer and outbox negotiations through the data link layers for CXL.io, CXL.cache and CXL.mem devices to the transaction layers.

The automated compliance tests provide detailed reports on trace files, verifying that devices meet CXL specifications.

Ensuring CXL interoperates in deployment, not just in the lab, is the focus of Mobiveil, a consortium member developing standard silicon IP blocks for data centers, networking and storage applications. CEO Ravi Thummarukudy said Mobiveil strives to keep its IP blocks current with the latest specifications, while integrating its IP blocks with ASIC and FPGA designs.

“It is critical for our customers to interoperate their CXL design with other CXL based solutions, such as CPUs, switches, ASIC and FPGAs in the system,” Thummarukudy said.

Mobiveil checks its designs using third-party verification IP and by porting its controller design to FPGA platforms, then conducting interoperability tests with other CXL component suppliers. Among the validations was a demonstration that its CXL 1.1 controller IP worked with Intel’s Sapphire Rapids Xeon processor using an interoperability framework in a data center. Thummarukudy said the company confirmed that its controller IP interfaced with Sapphire Rapids using the CXL protocol, successfully executing data transfers on CXL.mem and CXL.cache paths.

Mobiveil’s COMPEX controller supports host and device mode for applications such as data center accelerators, memory expanders and AI and machine learning applications. The controller IP also supports dual mode, meaning it can be configured to operate either as a host, or as any device type.

Composable

That flexibility speaks to CXL’s promise of composable architectures that optimize component usage. That’s the focus of Elastics.cloud, a startup targeting functionality across tiered data centers. Getting the most out of memory and storage without overprovisioning to create a more heterogeneous computing environment is also enabled by CXL.

Elastics.cloud demonstrated how the CXL interconnect can link servers to use a memory appliance for data-intensive workloads. (Source: Elastics.cloud)

Elastics.cloud concentrates on memory disaggregation and pooling by combining its proprietary logic with the CXL interconnect. It is also using the Intel Sapphire Rapids CPU to run the CXL.io protocol. An FPGA card with 32 gigabytes of memory accessible by the host processor is inserted in the server I/O slot, connecting to the host CPU via CXL. At the same time, a similar x86 server with its own FPGA card was used as a memory appliance. The two FPGA cards are connected with a cable running standard Ethernet.

Not only is the application server able to access the 32 gigabytes of expanded memory on its own FPGA card, it can also access remote pooled memory on the memory appliance FPGA card. That configuration allows data-intensive workloads to optimize memory utilization. Elastics.cloud will implement additional functionality enabling processors, accelerators, memory, storage and networking components to leverage its technology as a way to optimize resources, thereby matching workload requirements via the CXL spec.

CXL’s three protocols can be used alone or in combination for specific use cases, such as accelerators with memory to support dense computation, or memory buffers to support memory capacity expansion and storage-class memory. (Source: CXL Consortium)
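As a rough illustration of how the three protocols combine, the sketch below maps the device types defined in the CXL specification (Type 1, Type 2 and Type 3) to the protocols each one uses. The type taxonomy comes from the specification rather than from the demonstrations described here, and the names `Protocol`, `DEVICE_PROTOCOLS` and `protocol_names` are hypothetical, chosen only for this example.

```python
from enum import Flag, auto


class Protocol(Flag):
    """The three CXL protocols (hypothetical enum for illustration)."""
    IO = auto()     # CXL.io: discovery, configuration and PCIe-style I/O
    CACHE = auto()  # CXL.cache: device coherently caches host memory
    MEM = auto()    # CXL.mem: host load/store access to device memory


# Protocol combinations per device type, as defined in the CXL specification.
DEVICE_PROTOCOLS = {
    "type1": Protocol.IO | Protocol.CACHE,                 # caching accelerators
    "type2": Protocol.IO | Protocol.CACHE | Protocol.MEM,  # accelerators with memory
    "type3": Protocol.IO | Protocol.MEM,                   # memory buffers/expanders
}


def protocol_names(device_type):
    """Return the sorted names of the protocols a device type uses."""
    flags = DEVICE_PROTOCOLS[device_type]
    return sorted(p.name for p in Protocol if p in flags)
```

In this sketch, `protocol_names("type3")` returns `["IO", "MEM"]`, matching the memory-buffer use case in the caption, while an accelerator with memory (`"type2"`) uses all three protocols.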

GigaIO is also exploring CXL benefits in the cloud and data centers through composability by unlocking server resources. That allows disaggregated resources to be assigned to servers on-the-fly via software. It also creates “virtual servers” for specific jobs; once the task is completed, resources are returned to the pool for the next task.
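The compose-and-release cycle described above can be sketched as a simple resource pool. This is a minimal illustration under stated assumptions, not GigaIO’s software: the `ResourcePool` class, its methods and the resource names (`memory_gb`, `accelerators`) are all hypothetical.

```python
class ResourcePool:
    """Toy model of a pool of disaggregated resources (hypothetical)."""

    def __init__(self, **capacity):
        self.free = dict(capacity)  # resources not currently assigned
        self.leases = {}            # virtual server name -> resources held

    def compose(self, name, **request):
        """Assign resources from the pool to a named virtual server."""
        if name in self.leases:
            raise ValueError(f"{name} is already composed")
        # Collect every resource for which the pool cannot meet the request.
        short = {k: v for k, v in request.items() if self.free.get(k, 0) < v}
        if short:
            raise RuntimeError(f"insufficient resources for {name}: {short}")
        for kind, amount in request.items():
            self.free[kind] -= amount
        self.leases[name] = dict(request)

    def release(self, name):
        """Return a virtual server's resources to the pool."""
        for kind, amount in self.leases.pop(name).items():
            self.free[kind] += amount


# Usage: compose a virtual server for one job, then return its resources.
pool = ResourcePool(memory_gb=64, accelerators=4)
pool.compose("job-a", memory_gb=32, accelerators=2)
pool.release("job-a")
```

After `release`, the pool is back at its full capacity and the same resources can be composed into a differently shaped virtual server for the next task.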

GigaIO is also boosting resource utilization, reaping the benefits of heterogeneous computing: required computing resources can be composed as needed, then released for the next job.

The demonstration highlights how CXL can build a universal, dynamic fabric where a processor on one server can shift to another to tap memory with lower protocol overhead and cache coherency. That allows GigaIO’s framework to compose resources as needed. The company said it expects to have full memory pooling capability with cache coherency in the latter half of next year.

Samsung, meanwhile, is among the first to release a CXL product. Its DDR5 DRAM-based memory module targets data-intensive applications like AI and high-performance computing requiring servers that can significantly scale memory capacity and bandwidth. It has also released a Scalable Memory Development Kit for heterogeneous memory systems.

Samsung’s CXL implementation was integrated with an SAP HANA in-memory database running on an Intel Sapphire Rapids CPU. (Source: Samsung)

Andrew Chan, director of Samsung’s memory lab, said the company integrated its memory device, featuring a coherent load/store interface, with an SAP HANA in-memory database running on an Intel Sapphire Rapids CPU.

The integration “validates the CXL ecosystem for this new type of memory device,” Chan said, creating opportunities to rebalance computing, memory capacity, power and performance in data centers.

Introduced in 2019, CXL 1.1 was used for many of the demonstrations. The 2.0 version, released in November 2020, adds switching support to enable device fan-out, memory scaling, expansion and the migration of resources, as well as memory pooling support to maximize memory utilization. It also reduces memory overprovisioning and provides standardized management of the persistent memory interface while enabling simultaneous operation alongside DDR, which frees up DDR for other uses.

Underscoring how the memory spec has gained traction, the CXL Consortium recently signed a letter of intent to absorb the assets of the Gen-Z Consortium. The transition is expected to be complete by next summer.

This article was originally published on EE Times.

Gary Hilson is a freelance writer and editor who has written thousands of words for print and pixel publications across North America. His areas of interest include software, enterprise and networking technology, research and education, sustainable transportation, and community news. His articles have been published by Network Computing, InformationWeek, Computing Canada, Computer Dealer News, Toronto Business Times, Strategy Magazine, and the Ottawa Citizen.

