Memory system moves beyond the 100GbE restraint
While power consumption considerations drive the system-level trade-offs in memory system design, MoSys has developed an architecture that integrates in-order request queues, a weighted-round-robin scheduler, and multi-cycle macro offload elements in a serial memory, reducing processor power consumption per bit and alleviating the challenge of I/O as a source of delays.
All interrelated system-level trade-offs, including performance, pin count, and area, ultimately are driven by power consumption considerations. At 100GbE and 400GbE, network chip vendors must consider end-to-end solutions for equipment OEMs. To remain competitive, OEMs plan to introduce multi-terabit systems that aggregate multiple 100Gbit/s ports on each line card.
Two current technology trends, 100Gbit/s line speeds in network appliances and the transition to IPv6, compound design complexity. At both the network SOC and OEM appliance levels, solutions have to deliver performance, network management, and quality of service. Crucial parameters include absolute delay, delay jitter, minimum delivered bandwidth, and packet loss. Network engineers monitor and manage networks based on these parameters, which also serve as the basis of contractual service-level agreements.
The IPv6 standard emerged in response to the rapidly diminishing number of addresses available in IPv4. The move to 128-bit addresses requires more complex processing, including IP search functions in network appliances, which in turn require significantly larger address tables. Interestingly, a recent survey of the global Regional Internet Registry community identified "Vendor Support" as the biggest hurdle to IPv6 adoption.
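To make the IP search function concrete, here is a minimal sketch of a longest prefix match over a small IPv6 table, using Python's standard `ipaddress` module. The table contents, next-hop names, and the function name `longest_prefix_match` are illustrative assumptions, and the linear scan is chosen for readability only; it is not how a line-rate appliance would implement the search.

```python
import ipaddress


def longest_prefix_match(table, address):
    """Return the next hop of the most specific prefix containing address.

    table maps prefix strings (e.g. "2001:db8::/32") to next-hop labels.
    Returns None when no prefix matches. Linear scan, for illustration only.
    """
    addr = ipaddress.ip_address(address)
    best = None  # (prefix length, next hop) of the best match so far
    for prefix, next_hop in table.items():
        network = ipaddress.ip_network(prefix)
        if network.version != addr.version:
            continue  # never compare IPv4 entries against IPv6 addresses
        if addr in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, next_hop)
    return best[1] if best else None


# Hypothetical forwarding table using the IPv6 documentation prefix.
EXAMPLE_TABLE = {
    "::/0": "default",
    "2001:db8::/32": "port-1",
    "2001:db8:abcd::/48": "port-2",
}

if __name__ == "__main__":
    print(longest_prefix_match(EXAMPLE_TABLE, "2001:db8:abcd::1"))
    print("IPv6 address space is", 2 ** (128 - 32), "times larger than IPv4")
```

Each lookup here touches every table entry, which hints at why larger IPv6 tables translate directly into more memory accesses per packet.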
At 100Gbit/s line speeds, minimum-size packets can arrive every 6.7ns. The challenge for packet processors, then, is to handle a high rate of read/write interface transactions (at least two per packet arrival) without adding delays to the system. Unfortunately, every off-chip transaction costs an order of magnitude more power than an on-chip access.
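The 6.7ns figure is consistent with back-to-back minimum-size Ethernet frames. The sketch below reproduces the arithmetic, assuming a 64-byte frame plus 8 bytes of preamble and 12 bytes of inter-frame gap on the wire; the function names are illustrative only.

```python
MIN_FRAME_BYTES = 64       # minimum Ethernet frame, including FCS
PREAMBLE_BYTES = 8         # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # mandatory idle time between frames


def packet_interval_ns(line_rate_gbps, frame_bytes=MIN_FRAME_BYTES):
    """Time between back-to-back frame arrivals, in nanoseconds."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    # bits divided by Gbit/s gives nanoseconds directly
    return wire_bits / line_rate_gbps


def memory_transactions_per_second(line_rate_gbps, accesses_per_packet=2):
    """Worst-case off-chip access rate, assuming a fixed number per packet."""
    packets_per_second = 1e9 / packet_interval_ns(line_rate_gbps)
    return packets_per_second * accesses_per_packet


if __name__ == "__main__":
    for rate in (100, 400):
        print(rate, "Gbit/s:", packet_interval_ns(rate), "ns per packet,",
              round(memory_transactions_per_second(rate) / 1e6, 1),
              "M transactions/s")
```

Under these assumptions, two accesses per packet at 100Gbit/s already imply close to 300 million off-chip transactions per second.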
Because of all the above factors, it's time to consider a new architectural approach that shifts the traditional relationship between memory and the packet processor. In prior system architectures, designs optimised read/write accesses. With an intelligent serial-memory architecture, the processor transmits instructions, and the intelligent serial memory transmits results in return. This means that there are fewer processor read/write interactions with off-chip memory, which significantly reduces processor power consumption per bit. In addition, it eases the persistent challenge of I/O as the source of delays.
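The difference between the two models can be illustrated with a toy transaction counter. In this sketch, a per-flow byte counter is updated either by a processor-side read followed by a write, or by a single command that the memory executes internally. The `OffChipMemory` class and its `increment` command are hypothetical and stand in for whatever instruction set a real device exposes; one command and its returned result are counted as one transaction.

```python
class OffChipMemory:
    """Toy model that counts transactions crossing the chip boundary."""

    def __init__(self, size):
        self.cells = [0] * size
        self.transactions = 0

    def read(self, addr):
        self.transactions += 1
        return self.cells[addr]

    def write(self, addr, value):
        self.transactions += 1
        self.cells[addr] = value

    def increment(self, addr, delta):
        """Hypothetical in-memory command: read-modify-write done internally."""
        self.transactions += 1
        self.cells[addr] += delta
        return self.cells[addr]


def update_with_read_write(mem, addr, packet_lengths):
    """Traditional model: the processor reads, adds, and writes back."""
    for length in packet_lengths:
        mem.write(addr, mem.read(addr) + length)


def update_with_commands(mem, addr, packet_lengths):
    """Instruction model: the processor sends one command per packet."""
    for length in packet_lengths:
        mem.increment(addr, length)


if __name__ == "__main__":
    lengths = [64, 128, 1500]
    classic, smart = OffChipMemory(4), OffChipMemory(4)
    update_with_read_write(classic, 0, lengths)
    update_with_commands(smart, 0, lengths)
    print(classic.transactions, smart.transactions)
```

Both paths leave the same value in the counter, but the command-based path crosses the chip boundary half as often in this simplified model.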
This approach addresses the frequency gap between processor and DRAM. While processor frequency has increased by 75%/year, DRAM frequency has increased by only 7%/year. This widening gap between processor and memory is known as the "memory wall," and it becomes acute at 100GbE and above. Traditionally, designers added external memory to work around the resulting latency. At 100GbE, however, there simply aren't enough pins to handle the parallel interface with DRAM.
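To see how quickly such a gap compounds, the following sketch takes the two growth rates quoted above and assumes, purely for illustration, that both hold constant from year to year. The function name `frequency_gap` is an assumption of this example.

```python
def frequency_gap(years, cpu_growth=0.75, dram_growth=0.07):
    """Ratio of processor to DRAM frequency gain after the given years.

    Assumes constant annual compounding of both growth rates, which is a
    simplification made for this example.
    """
    return ((1 + cpu_growth) / (1 + dram_growth)) ** years


if __name__ == "__main__":
    for years in (1, 3, 5):
        print(years, "years:", round(frequency_gap(years), 1), "x")
```

Under this assumption, the relative gap grows by more than a factor of ten within five years.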
Serial interfaces transfer more data per pin and per watt than parallel I/O, resulting in higher interconnect and energy efficiency. When purpose-built for the task, serial transmission need not incur a latency penalty. At MoSys, we've developed an intelligent memory architecture to address the power, I/O, and latency issues above. Our approach combines three elements in a serial memory: in-order request queues, a weighted-round-robin scheduler, and multi-cycle macro offload. This architecture can streamline bandwidth- and latency-intensive functions such as buffering and table indexing, and can offload recursive or iterative functions such as exact match and longest prefix match.
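As a rough software illustration of how in-order request queues and a weighted-round-robin scheduler interact, consider the sketch below. It is a generic textbook-style model, not the MoSys implementation; the class name, queue names, and weights are all assumptions made for the example.

```python
from collections import deque


class WeightedRoundRobinScheduler:
    """Serve several FIFO request queues in proportion to their weights."""

    def __init__(self, weights):
        # weights maps queue name -> maximum requests served per round
        self.weights = dict(weights)
        self.queues = {name: deque() for name in self.weights}

    def submit(self, queue_name, request):
        # FIFO append keeps requests in order within each queue
        self.queues[queue_name].append(request)

    def drain(self):
        """Yield (queue_name, request) pairs until every queue is empty."""
        while any(self.queues.values()):
            for name, weight in self.weights.items():
                queue = self.queues[name]
                # Serve up to `weight` requests, fewer if the queue runs dry
                for _ in range(min(weight, len(queue))):
                    yield name, queue.popleft()


if __name__ == "__main__":
    scheduler = WeightedRoundRobinScheduler({"lookup": 2, "buffer": 1})
    for request in ("L0", "L1", "L2"):
        scheduler.submit("lookup", request)
    for request in ("B0", "B1"):
        scheduler.submit("buffer", request)
    for name, request in scheduler.drain():
        print(name, request)
```

With weights of 2 and 1, the lookup queue receives up to two service slots per round for each slot given to the buffer queue, while requests within each queue are still served strictly in arrival order.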
- Michael Sporer
Director of IC Marketing, MoSys