AMD Allies with Ranovus on Data Center Photonics Module

Article By : Brian Santo

Ranovus and AMD/Xilinx have demonstrated a module that promises to accelerate AI workloads in data centers while drawing less power and costing less money.

Ranovus Inc. and its customer AMD/Xilinx demonstrated a module that combines the former’s Odin Analog Drive CPO 2.0 and Xilinx’s Versal ACAP operating at 800G. The short story is that this thing should make artificial intelligence (AI) workloads in data centers go a lot faster, without consuming quite so much power as otherwise — and do it (Ranovus contends) for a lot less money.

The long story is… longer. Ranovus and AMD have not only shown that their own technologies work, but their demonstration is additional early proof that co-packaged optics (that’s the “CPO” in the description of Odin) is a workable concept.

The co-packaged optical (CPO) demonstration system built by Xilinx and Ranovus. It incorporates the former’s Versal ACAP with the latter’s Odin Analog-Drive CPO 2.0. (Source: Ranovus)

“Co-packaging is well under way,” said Ranovus president and CEO Hamid Arabzadeh. The trend has three thrusts, he said: to help improve the performance of traditional Ethernet networking, to satisfy the increasing demands of AI training, and to support the trend of memory and I/O disaggregation.

Odin is a monolithic photoelectronic chip that incorporates drivers and other transistor-based circuitry along with waveguides and lasers. Ranovus introduced the 2.0 version at the end of 2021.

Versal ACAP is Xilinx’s more-than-just-an-FPGA system on a chip (SoC). It incorporates programmable logic, processor cores, interface circuitry, and other support functions. Xilinx calls its Versal an adaptive compute acceleration platform, or ACAP. As the name implies, it’s designed to accelerate workloads with power and flexibility — AI workloads specifically.

Together, Odin and Versal ACAP are the key elements in one instantiation of a CPO. The concept was devised to address multiple challenges that data centers are facing.

The amount of data pouring into data centers ceaselessly increases, so data centers need to move data faster. Increased power consumption correlates with increased data transmission rates, however, and with internet traffic growing at its current rate, power consumption with today’s data center architectures will become untenable within five to 10 years.

Photonics vendors have different technological approaches to building lasers and waveguides, but their data center customers have no interest in being forced to choose from among several proprietary approaches. More to the point, they don’t want to be locked into any particular vendor — data center operators want plug-and-play everywhere they can get it. Hence the idea for CPO modules that can be swapped in and out for repair or replacement.

So the Optical Internetworking Forum (OIF — a group of data center operators and their vendors) established its Co-Packaging Framework Implementation Agreement (IA) project late in 2020 to “study the application spaces and relevant technology considerations for co-packaging of communication interfaces with one or more ASICs.”

At the end of 2021, the OIF defined the first project under the umbrella of the Co-Packaging Framework: the 3.2T Co-Packaged Optical Module IA, which the organization said will “define a 3.2T co-packaged optical module that targets Ethernet switching applications utilizing 100G electrical lanes.” The OIF published its co-packaging framework document just a few weeks ago. Anyone who wants details on specifications can refer to that document, but the umbrella concept is this: the organization wants to encourage innovation inside co-packaged modules, while requiring that any CPO module look like any other from an I/O perspective.

Ranovus and AMD (which is close to concluding its purchase of Xilinx) announced their joint CPO demo in conjunction with the OFC show (March 6-10) to highlight, as Ranovus said, “the arrival of the CPO 2.0 solution for AI/ML platforms that demand power efficient, high throughput and high density optical interconnect.”

Arabzadeh observed that many existing interfaces top out at about 200G. Odin scales from 800 Gbps to 3.2 Tbps in the same footprint, he explained, due in large part to the company’s ability to embed lasers directly into the silicon substrate of its ICs. Arabzadeh believes his company’s approach is unique, starting with its quantum dot lasers and extending to the module’s ability to provide N x 100 Gbps PAM4 optical I/O channels for Ethernet switch and ML/AI silicon in a single packaged assembly.
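As a rough sketch of what that scaling means in lane terms, the arithmetic below maps the quoted aggregate rates onto 100-Gbps PAM4 lanes; the lane counts are inferred from the figures in this article, not taken from Ranovus documentation.

```python
# Back-of-the-envelope lane arithmetic for an Nx100-Gbps PAM4 optical I/O module.
# Illustrative only: the numbers come from the figures quoted in this article.

LANE_RATE_GBPS = 100  # one PAM4 optical lane, per the Nx100G figure above

def lanes_needed(aggregate_gbps: int) -> int:
    """Number of 100G lanes required to reach a given aggregate rate."""
    return aggregate_gbps // LANE_RATE_GBPS

for aggregate in (800, 3200):  # the 800G demo and the 3.2T top end, same footprint
    print(f"{aggregate} Gbps aggregate -> {lanes_needed(aggregate)} x {LANE_RATE_GBPS}G lanes")

# 800 Gbps aggregate -> 8 x 100G lanes
# 3200 Gbps aggregate -> 32 x 100G lanes
```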

By integrating the optics into the package with Versal Premium ACAPs, AMD and Ranovus say they are able to drastically reduce power, simplify board routing, and cut cost.

Asked about the costs, Arabzadeh said that existing 400G modules are priced at about $800 per unit, or roughly $2 per gigabit. “We’d be one-tenth of that,” he said. He held up a standalone Ranovus laser, then referred to the demo Ranovus/Xilinx CPO module and said, “This individual laser costs more than the module, because of packaging.”
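For a sense of what that claim implies per gigabit, here is the arithmetic spelled out; the one-tenth multiplier is Arabzadeh’s claim, not a published price.

```python
# Cost-per-gigabit arithmetic from the figures quoted above. Illustrative only;
# the "one-tenth" multiplier is Arabzadeh's claim, not a price list.

existing_module_price_usd = 800    # ~$800 for a 400G pluggable module, per the article
existing_module_rate_gbps = 400

cost_per_gig = existing_module_price_usd / existing_module_rate_gbps
print(f"Existing 400G module: ${cost_per_gig:.2f} per Gbps")              # ~$2.00

claimed_cpo_cost_per_gig = cost_per_gig / 10                              # "one-tenth of that"
print(f"Claimed CPO cost:     ${claimed_cpo_cost_per_gig:.2f} per Gbps")  # ~$0.20
```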

Ranovus is also controlling costs with its production approach. “Everything is wafer-level,” Arabzadeh said. The company identifies known-good die on the wafer, selects those, and doesn’t have to bother with the rest.

Regarding power, Arabzadeh said the module draws roughly 4 W, whereas the electronics the module replaces draw roughly 14 W to 17 W.
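Taken at face value, those figures work out to a reduction of roughly 71 to 76 percent per module; a quick sanity check of the arithmetic:

```python
# Quick sanity check of the quoted power figures. Illustrative arithmetic only,
# based on the numbers given in the interview.

cpo_module_watts = 4.0
replaced_electronics_watts = (14.0, 17.0)  # the quoted 14 W to 17 W range

for baseline in replaced_electronics_watts:
    savings = baseline - cpo_module_watts
    print(f"vs {baseline:.0f} W baseline: saves {savings:.0f} W "
          f"({savings / baseline:.0%} reduction)")

# vs 14 W baseline: saves 10 W (71% reduction)
# vs 17 W baseline: saves 13 W (76% reduction)
```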

Ranovus also has a version of its chip that lacks the integrated lasers. The module demonstrated with AMD/Xilinx is optimized for AI workloads, but “if you look at Ethernet, it’s all based on modules. That’s just the ecosystem,” Arabzadeh said. So data center operators including Facebook and Microsoft still want a version with standalone external lasers, he explained.

Ranovus has one approach to go after the Ethernet segment of the market, and just demonstrated another aimed at the AI market. What about the third thrust in the CPO market? Arabzadeh said Ranovus will be making “an announcement for disaggregated server architectures, with pooled memory, and relying on CXL” later.

This article was originally published on EE Times.

Brian Santo is Editor-in-Chief of EE Times. He has been writing about technology for over 30 years, for a number of publications including Electronic News, IEEE Spectrum, and CED; this is his second stint with EE Times (the first was 1989-1997). A former holder of a Radio Telephone Third Class Operator license, he once worked as an engineer at WWWG-AM. He is based in Portland, OR.

 
