With networking and connectivity becoming a bottleneck in the data center, how do we get the humble network switch to 51.2 Tb?
While we typically associate low power with battery-operated devices such as smartphones, smart watches and laptops, there are several other, less obvious applications where low power has a significant impact on our daily lives. One such example is all the “plumbing” and communications infrastructure, often referred to as high-performance computing, managed by network switches inside a modern hyper-scale data center.
With the explosive growth of online activities driven by work from home, many industry sectors are reporting huge growth in internet usage and e-commerce. We work, learn and play from home while embracing e-commerce and online delivery, telemedicine, virtual fitness and a host of other virtual events and experiences. And all of it seems to have moved to the cloud.
In the early 2010s, close to 40 percent of large companies surveyed said they expected to exceed their IT capacity within two years. Almost a decade later, virtually all businesses, regardless of size or sector, rely heavily on technology to scale and streamline their operations. More than ever, access to massive volumes of data is vital to their success. To increase their ability to process all this data quickly, these businesses must secure more computing and storage capacity from cloud providers, who are building out massive data centers while accelerating deployment of next-generation technology.
When we think of a hyper-scale data center, the first thing that typically comes to mind is the trusty server CPU. Performance and power savings come from very predictable x86 scaling. We also have witnessed the migration of processing power to FPGAs, GPUs and, more recently, custom systems-on-chip (SoCs) designed in-house by internet giants. With every subsequent technology development, processors historically have made improvements in the very predictable manner defined by Moore’s Law. Other essential components in a hyper-scale data center are wired and wireless connectivity, networking and storage. These also exhibit a natural progression of improvement via the latest Ethernet and networking standards, as well as the latest memory, high-speed connectivity and storage technologies.
The rush to the cloud is centered around the server CPU, artificial intelligence, advanced memories and multi-chip packaging. Frequently, however, the performance limitation is not CPU performance or the type of advanced memory technology adopted. Rather, the network and connectivity are the bottleneck. How fast data can move between servers within a rack, between racks, between buildings, across campuses and ultimately to the internet is also a key factor.
The unsung hero underpinning this critical infrastructure is the network switch. Within a short span of five years, we have seen network switch host speed double every two years—from 3.2 Tb in 2015 to 12.8 Tb in 2019 to 25.6 Tb in 2020.
We’re not far from 51.2 Tb deployment, especially with advances in high-speed SerDes development resulting in single-lane 112 G long-reach capabilities. This translates to module bandwidth trending from 100 G in 2015 to 200/400 G in 2019. We are now on the cusp of major 400 G to 800 G speed deployment over the next two to three years. This is coupled with improvements in optical components transitioning beyond 28 to 56 Gbaud optics that started in 2019. All these changes coincide with the transition from non-return-to-zero coding to higher-modulation PAM4 (pulse amplitude modulation, 4-level) coding that is far more efficient.
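The coding transition matters because raw line rate is simply symbol rate times bits per symbol: NRZ carries one bit per symbol, while PAM4 carries two. A back-of-envelope sketch (raw rates only, ignoring the FEC and encoding overhead real links carry):

```python
# Illustrative only: raw line rate = symbol rate (Gbaud) x bits per symbol.
# Real standards add FEC and encoding overhead, which this sketch ignores.

def line_rate_gbps(gbaud: float, bits_per_symbol: int) -> float:
    """Raw line rate in Gb/s for a given symbol rate and modulation."""
    return gbaud * bits_per_symbol

print(f"28 Gbaud NRZ : {line_rate_gbps(28, 1):.0f} Gb/s")   # 28 Gb/s
print(f"28 Gbaud PAM4: {line_rate_gbps(28, 2):.0f} Gb/s")   # 56 Gb/s
print(f"56 Gbaud PAM4: {line_rate_gbps(56, 2):.0f} Gb/s")   # 112 Gb/s
```

Doubling the bits per symbol is what lets 56 Gbaud optics and electrical lanes reach the 112 G SerDes class without doubling the analog bandwidth of the channel.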
A quick survey of what’s available in the commercial market reveals that the majority of 12.8 Tb SoCs are manufactured at the 16-nm process node. For 25.6 Tb, SoCs moved to 7 nm beginning in late 2019, entering volume production in 2020. First-generation 25.6 Tb SoCs used 50 G SerDes, the best technology then available. More recent announcements indicate that 100 G SerDes chips have finally arrived, and a transition from 50 G to 100 G SerDes as well as migration from 7- to 5-nm process technology is expected.
The benefits are quite significant. Consider a 25.6 Tbps switch: If it depends on a 50 G SerDes, the device will require 512 lanes. With a 100 G SerDes, the number of lanes is reduced to 256. The reduction in die area and power consumption resulting from this dramatic cut in lane count is significant. Each of these network switch ASICs consumes a lot of power, greater than 300 W!
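The lane-count arithmetic is straightforward: aggregate switch bandwidth divided by per-lane SerDes rate. A small illustrative calculation (raw rates only):

```python
# Illustrative: how many SerDes lanes a switch ASIC needs for a given
# aggregate bandwidth. Raw rates only; overheads ignored.

def lanes_required(switch_tbps: float, serdes_gbps: float) -> int:
    """Lanes needed to carry switch_tbps of bandwidth at serdes_gbps per lane."""
    return round(switch_tbps * 1000 / serdes_gbps)

print(lanes_required(25.6, 50))    # 512 lanes with 50 G SerDes
print(lanes_required(25.6, 100))   # 256 lanes with 100 G SerDes
print(lanes_required(51.2, 100))   # 512 lanes: 51.2 Tb at 100 G matches
                                   # today's 25.6 Tb lane count at 50 G
```

This is why the 112 G long-reach SerDes is the enabling step for 51.2 Tb: it holds the lane count, and therefore the beachfront and a large share of the I/O power, at the level already proven in 25.6 Tb designs.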
The next plateau is 51.2 Tb. So, how do we get there?
New design paradigm
It is expected that 51.2 Tb switch ASIC fabrication will start at 5 nm, eventually migrating to 3 nm. This is predominantly influenced by the longer development cycles and alignment to foundry advanced-process deployment schedules. It is also dependent on both the availability and adoption of the 112 G SerDes over the 56 G SerDes to improve “lane count versus die size versus power” considerations.
Another possibility is that the next-generation network switch will adopt a disaggregated approach and use multiple die rather than one large monolithic die. This approach would help in two ways. The smaller the die, the higher the yield, especially when die size is pushed to the lithography/reticle limits; improved yield translates into lower costs. And the ability to reuse silicon-proven high-speed SerDes in chiplet form will help accelerate time to market and improve the odds of success in early deployment of 51.2-Tb switch ASICs.
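The yield argument can be made concrete with the classic Poisson yield model, Y = exp(-A * D), where A is die area and D is defect density. The defect density below is an assumed, purely illustrative figure, not a foundry number:

```python
# Hedged sketch: Poisson die-yield model, Y = exp(-A * D).
# A = die area in cm^2, D = defect density in defects/cm^2.
# D is an assumed illustrative value, not real foundry data.
import math

def poisson_yield(die_area_cm2: float, defect_density: float) -> float:
    """Fraction of die expected to be defect-free."""
    return math.exp(-die_area_cm2 * defect_density)

D = 0.1  # assumed defects/cm^2, for illustration only
print(f"8 cm^2 monolithic die yield: {poisson_yield(8.0, D):.0%}")  # ~45%
print(f"2 cm^2 chiplet yield:        {poisson_yield(2.0, D):.0%}")  # ~82%
```

Under this simple model, four smaller chiplets each yield far better than one reticle-limit die of the same total area, which is the economic core of the disaggregation argument.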
This shift will, however, necessitate a re-thinking of design methodology. The move from single- to multi-chip design requires greater attention to die, substrate and package design constraints and boundaries. The high-speed nature of these complex SoCs will create additional design and verification burdens. At 100 G and above, it is no longer just a matter of SPICE simulation. Designers must account for the effects of inductance, parasitics, transmission-line effects (terminations), crosstalk, the dielectric coefficients of various materials and S-parameters, as well as ensuring application access to channel models.
This results in more complicated thermal designs. It is no longer a matter of managing temperature inside a chip. Also required is monitoring temperature gradients across the die and the location of thermal hot spots. Temperature must now be addressed in its totality, from die to interposer to package substrate to heat sink. Even the selection of die-attach materials and thermal grease for the heat sink are design considerations. At this level of design complexity, there is no room for trial and error.
High-speed network switch SoCs wouldn’t be possible without a number of technology innovations. Besides the obvious high-speed I/O (SerDes), a fundamental set of hard IP is needed to succeed. Other enabling innovations include high-performance processor cores, high-density on-chip memory, high-speed interconnect (fabric) and memory bandwidth, along with SoC integration.
SoC design platforms must also include IP cores such as 112G-LR PHY, 56G-LR PHY, high-bandwidth memory Gen 2/3 PHY and PCI Express 5.0/4.0 PHY. Additionally, low-power die-to-die PHY IP is needed to support multi-die integration and logic and I/O disaggregation for multi-die implementation. To manage this necessary transition to the 25.6 Tb/s switch, and eventually to the 51.2 Tb/s switch, we need new design methodologies. These include AI-driven design tools, advanced packaging and other aspects of chip design long taken for granted.
Now is the time to kick it up a notch and start our innovation engines.
This article was originally published on EE Times.
Tom Wong is director of marketing for design IP at Cadence Design Systems.