Cisco ups ante in networking ASIC design
Cisco Systems Inc. has debuted a router that packs a wide variety of networking services onto a custom 40-core processor. The Ethernet giant aims to leverage its expertise in designing complex ASICs to leapfrog the competition in the $5 billion market for edge routers.
The Quantum Flow Processor takes Cisco's work on ASICs for its networking systems to a new level, surpassing in some ways the technology in mainstream server CPUs from Intel Corp. and Sun Microsystems Inc. Analysts said the move is a savvy one for Cisco, although some complained that the company is keeping details of its new chip sketchy.
Cisco claims it spent $250 million and five years developing its Aggregation Services Router 1000, $100 million of that on the flow processor alone. The router handles functions including firewall, IPSec virtual private networking, deep-packet inspection and session border control at rates up to 20Gbit/s.
"There are half a dozen or so appliances all being used to provide these functions at the edge of carrier and end-user networks," said Pankaj Patel, general manager of Cisco's service provider group. "Our value proposition is to put them in one small form factor box to reduce capital and operating expenses," he added.
Competitors such as Juniper Networks and Redback Networks—and Cisco's existing 7600 series routers—typically slot multiple cards in a chassis or stack appliances in a rack to handle all the features increasingly being processed on the network edge, said Eve Griliches, a telecom analyst at International Data Corp.
"The more integrated you get it, the better performance you get when you are trying to run all the services at once, and eventually users will want to run all these services at once," Griliches said.
40-core custom chip
Key to the system is the 1.3 billion transistor flow processor, an 80W chip made in a 90nm process at Texas Instruments Inc. and designed using Cisco's customer-owned tooling. Each of its 40 Tensilica cores can handle up to four threads, far beyond the raw thread-level parallelism of Sun's 65nm Niagara or Intel's 45nm Penryn server CPUs.
"We looked outside and internally to see if there is anything we could use but nothing came close," said Nikhil Jayaram, director of engineering in Cisco's mid-range routing group. "Other architectures were about packet processing, but we wanted to do flow processing of stateful traffic," he added.
"Multicore processors and complex aggregation routers are converging in a way that means that the most complex communication processing chips now dwell at the edge of the public network," said Loring Wirbel, director of the EE Times market intelligence unit. "The center of the network now means big, dumb, high-speed bit-pushing, while all the smarts reside at the edge of the public network, and core routers like Cisco's CRS-1 are no longer the premier platforms for high-performance network processors," he added.
The company hopes the processor will be used in a wide range of routers and be actively upgraded in the field for years. But success in the dynamic edge-networking market, which is growing at double-digit rates, is not assured, said Griliches of IDC.
"The market is littered with router makers who have tried to deliver all the services in one box, but have not done so sufficiently well because it is not easy to do. Putting everything in one chip is a step in the right direction," Griliches said. "A lot of their competitors will be moving in this direction," she added.
The company's track record in ASICs has been measured and successful, said Bryan Lewis, a VP of research at Gartner.
"Cisco is doing fewer ASIC designs than they have done in the past, but the revenue per ASIC design is growing, causing them to be one of the top buyers in total ASICs and the top buyer in wired comms," said Lewis. "In other words, Cisco is very selective in what ASICs they take on each year, but each design they do internally has been very successful in generating significant revenue for them," he added.
The flow processor appears to have an edge on merchant network processors, but it's difficult to tell because the company is guarded about releasing substantive details on the proprietary part.
"Most NPUs are still working largely at layer 2 and 3 mainly forwarding packets and not doing a lot of upper-level processing," said Bob Wheeler, analyst with The Linley Group.
Intel and Cavium Networks have designed 10G network processors that approach what Cisco is delivering. The Intel IXP 2800 uses 16 programmable cores to run services on cards. It was transitioned to startup Netronome, which is developing a 20G version.
Cavium's Octeon uses 16 MIPS cores that can handle some layer 4-7 service jobs. It sports an embedded pattern-matching engine, but requires off-chip TCAMs for packet classification.
For Cisco, "the challenge was turning a multi-processor into a network processor," said Jayaram, a former chip designer at Digital Equipment Corp.
As many as 100 engineers took part in the project, including former microprocessor designers from AMD, Cyrix, Intel and Sun, as well as the team that designed the multicore ASIC for Cisco's CRS-1 core router.
The group pushed detailed chip design to a new level even for Cisco, one of the top captive ASIC design companies in the world. They worked on circuit and memory designs, did their own chip layout and RTL—even designed their own package, another Cisco first.
"One of our biggest challenges was signal integrity and the package plays a very big role in that," said Jayaram. "A poorly designed package can really bite you in power and signal integrity, but our substrate is almost invisible from an SI perspective," he added.
Keeping the 1.2GHz processor fed was another issue. Cisco opted for a flat memory model using multiple channels of second-generation reduced latency DRAMs and various memory blocks inside the chip.
"I suspect we use more on- and off-chip memory than anyone else," said Jayaram.
The flat model for system DRAM helps keep programming the device in C simple compared with some network processors that use fragmented banks of TCAMs and other memory structures.
By using up to four threads per core, the chip can hide some of the latency that comms processing incurs through its many memory accesses. Most computer processors use only two threads per core.
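As a rough illustration of why extra threads per core help, consider an idealized model in which each thread computes for a fixed number of cycles and then stalls on a memory access, and the core switches to another ready thread at no cost. The sketch below is not based on Cisco data: the cycle counts are assumed values chosen only to show the shape of the effect, and the function name is hypothetical. Only the core and thread counts come from the figures quoted above.

```python
def core_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles a core spends on useful work in an idealized model.

    Each thread alternates between `compute_cycles` of work and
    `stall_cycles` of waiting on memory; thread switches are free.
    """
    if threads < 1 or compute_cycles < 1 or stall_cycles < 0:
        raise ValueError("threads and compute_cycles must be >= 1, stall_cycles >= 0")
    return min(1.0, threads * compute_cycles / (compute_cycles + stall_cycles))


CORES = 40              # from the article
THREADS_PER_CORE = 4    # from the article
total_threads = CORES * THREADS_PER_CORE  # hardware threads across the chip

# Assumed workload: 20 cycles of compute per 60-cycle memory stall.
for n in (1, 2, 4):
    print(f"{n} thread(s) per core: {core_utilization(n, 20, 60):.0%} busy")
print(f"total hardware threads: {total_threads}")
```

Under these assumed numbers a single thread keeps the core busy a quarter of the time, two threads half the time, and four threads are enough to cover the stall entirely. Real workloads have variable stall times and switching overhead, so the actual benefit would be smaller.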
The choice of Tensilica over MIPS or ARM as a core supplier turned out to be a close call. "They were fairly similar but the Tensilica architecture had some benefits when you dip down into the gory details of network processing," said Jayaram.
The cores are linked on what is "effectively a high-performance crossbar switch," he said. Processors using more than 40 cores typically move to more complex structures such as a mesh.
Externally the chip sports four 10Gbit SPI 4.2 ports to ship traffic in and out at rates up to 20Gbit/s, thanks to a Cisco proprietary feature for linking two interconnects. A next-generation version of the chip will use a derivative of the Interlaken interconnect to deliver traffic at rates of up to 40Gbit/s in and out of the chip.
"We did a lot of work to future proof this design" with all the blocks ready for 40G flows, said Jayaram.
The chip is geared for key comms tasks such as tree lookups, hashing functions and high-bandwidth, low-latency access to DRAM. Much of its secret sauce takes the form of complex algorithms for flexibly handling a wide variety of content flows, some of which are passed through directly and others of which get detailed processing.
Other ASICs on the board include some packet framers and other generally minor parts. Cisco added a virtualization layer to its IOS router software as a way to deliver fault-tolerant redundancy on the system without requiring multiple flow processors.
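Cisco has not published its algorithms, so the following is only a generic sketch of what hash-based, stateful flow handling can look like. Packets are keyed by the common five-tuple, per-flow state lives in a hash table, and the first packet of a flow decides whether later packets pass straight through or get detailed processing. All names, the policy labels and the choice of port 5060 (commonly used for SIP signaling) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, NamedTuple


class FlowKey(NamedTuple):
    """Five-tuple identifying a flow; hashable, so usable as a dict key."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass
class FlowState:
    action: str          # "pass" or "inspect" (hypothetical policy labels)
    packets: int = 0
    byte_count: int = 0


class FlowTable:
    def __init__(self, inspect_ports: FrozenSet[int] = frozenset({5060})):
        self._flows: Dict[FlowKey, FlowState] = {}  # Python dict = hash table
        self._inspect_ports = inspect_ports

    def _classify(self, key: FlowKey) -> str:
        # Assumed toy policy: inspect flows to selected ports, pass the rest.
        return "inspect" if key.dst_port in self._inspect_ports else "pass"

    def handle_packet(self, key: FlowKey, length: int) -> str:
        state = self._flows.get(key)
        if state is None:                 # first packet: classify once
            state = FlowState(action=self._classify(key))
            self._flows[key] = state
        state.packets += 1                # later packets reuse stored state
        state.byte_count += length
        return state.action

    def flow(self, key: FlowKey) -> FlowState:
        return self._flows[key]


table = FlowTable()
web = FlowKey("10.0.0.1", "192.0.2.7", 40000, 80, "tcp")
sip = FlowKey("10.0.0.2", "192.0.2.9", 40001, 5060, "udp")
print(table.handle_packet(web, 1500), table.handle_packet(sip, 300))
```

The point of the sketch is the contrast Jayaram draws between packet processing and flow processing: the decision is made per flow and remembered, rather than recomputed for every packet.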
Cisco has filed for 42 patents on the new router, most of them on the processor. The company said it has shipped 60 million routers since 1986.
The rapid rise of network traffic will propel the need for the new system, the company said. Cisco estimates global IP demand will grow from 7 exabytes per month in 2007 to 29 exabytes per month in 2011, fueled in part by consumer video. The 2011 figure is more than 1,100 times the amount of traffic that traversed the Internet backbone in the U.S. in the year 2000, Cisco estimates.
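For a sense of what those estimates imply, the short calculation below derives the overall growth multiple and the equivalent compound annual growth rate from the two figures Cisco quotes. It assumes smooth year-over-year compounding across the four years, which is a simplification and not something Cisco states.

```python
start_eb = 7.0    # exabytes per month in 2007 (Cisco estimate)
end_eb = 29.0     # exabytes per month in 2011 (Cisco estimate)
years = 2011 - 2007

growth_multiple = end_eb / start_eb
# Compound annual growth rate, assuming steady compounding each year.
cagr = growth_multiple ** (1 / years) - 1

print(f"{growth_multiple:.2f}x overall, about {cagr:.1%} per year")
```

The two estimates work out to roughly a fourfold increase, or a little over 40 percent growth per year.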
The company has mustered support for the new router from multiple end users or potential users including Lufthansa Airlines and financial firm Wachovia. A Cisco press release quotes one telecom executive saying the router represents a class of design needed for future carrier networks.
"We believe it will be necessary for the edge of network to perform dynamic quality control to flexibly and securely enable aggregation of traffic from broadband services and converged communications," said Shin Hashomoto, an executive VP with Nippon Telegraph and Telephone in a prepared statement.
The Cisco ASR 1000 will be generally available in April with pricing starting at $35,000.
- Rick Merritt