AI Startup Emerges From Stealth With System-Level Accelerator

Article By : Sally Ward-Foxton

SambaNova, one of the AI chip startup “unicorns,” has emerged from stealth mode after three years to announce its first product, a system-level AI accelerator for hyperscale and enterprise data centers and high performance computing (HPC) applications. SambaNova’s business model includes selling various configurations of the DataScale rack-based system, as well as renting them out for a monthly subscription in an offering the company calls “Dataflow-as-a-service”.

Founded in Palo Alto, California in 2017, SambaNova has been in stealth mode until now, though the company has released some details about its “software defined hardware” chip architecture. The startup has raised $456 million in three rounds of funding to date, and is reportedly valued at more than $2.5 billion.

SambaNova RDU chip

SambaNova’s Cardinal SN10 reconfigurable dataflow unit (RDU) (Image: SambaNova)

DataScale is built on SambaNova’s Cardinal SN10 reconfigurable dataflow unit (RDU) chip. SambaNova still hasn’t given away much about this chip, with VP product Marshall Choy telling EE Times only that each chip offers “hundreds of teraflops and hundreds of megabytes of on-chip memory with direct access to terabytes of off-chip memories.” Choy argued that SambaNova’s customers, to an extent, do not care about the details of the chip; they are buying or renting the rack-based system which is SambaNova’s first product.

The DataScale system is based on multiples of the 8-chip node dubbed SN10-8, combined with the company’s SambaFlow software stack and a high-speed fabric for direct connection between RDU chips. It’s shipped as a fully assembled hardware system in quarter, half, full and multi-rack configurations.

SambaNova competitors, including Cerebras and fellow unicorn Graphcore, have also launched system-level data center products this year. However, SambaNova appears unworried about being late to the party. Choy pointed to SambaNova's strategic investment from Google Ventures and Intel Capital, and to the company's intent to build a business that can deliver on a multi-year product roadmap; both, he said, matter more than being first to market.

“The market is huge,” Choy said. “I think the AI compute space is going to be very similar to other emerging technology spaces where it’s going to take multiple generations of product to win in the market. You could certainly lose in one generation, but you won’t win in one. We’ve taken the long game approach.”

Record-breaking
SambaNova is claiming a number of leading performance figures for its new system, including:

  • World record BERT-Large training performance – 28,800 samples per second throughput at batch size of 1000 – using a 64-RDU chip system (8x SN10-8 nodes).
  • A 100-billion-parameter NLP model was trained on a single SN10-8 node (eight RDU chips with a total of 12 TB of memory); training the same model required 412 state-of-the-art GPUs with 32 TB of memory, according to SambaNova.
  • For recommendation systems, world record DLRM inference throughput of 8632 samples per second (for the 8-chip SN10-8 node).
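SambaNova has not published how it arrives at the 12 TB figure, but a rough back-of-envelope sketch shows why training a 100-billion-parameter model demands terabytes of memory for model state alone. The breakdown below assumes mixed-precision training with an Adam-style optimizer; these are illustrative assumptions, not the company's methodology.

```python
# Rough estimate of training memory for a 100B-parameter model.
# Assumes mixed precision + Adam-style optimizer (illustrative only).
PARAMS = 100e9

BYTES_PER_PARAM = (
    2    # fp16 weights
    + 2  # fp16 gradients
    + 4  # fp32 master copy of weights
    + 8  # fp32 Adam moments (m and v)
)

state_tb = PARAMS * BYTES_PER_PARAM / 1e12
print(f"~{state_tb:.1f} TB of model state alone")  # ~1.6 TB
```

Activations, workspace buffers, and data batches come on top of this, which is how total requirements climb toward the double-digit-terabyte range quoted above.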

SambaNova is also claiming various unmatched accuracy results. While it’s unusual to see prediction accuracy figures quoted for a hardware system (this is a function of the neural network), there is a hardware angle in that bigger chips can run bigger models, which in general means better accuracy.

“It’s not just about the hardware, it’s about the system, meaning the software stack as well,” Choy said. “[Hardware capacity] is a big part of it, but a lot of this is also orchestrated by the compiler and the runtime stack. Part of our philosophy has really been a complete system view.”

For example, language processing models are more accurate if they can use more parameters. If SambaNova can fit a bigger language processing model onto its chip, the result is a more accurate model, which is the metric customers ultimately care about, Choy said.

SambaNova's figures show a scientific computer vision model (the CosmicTagger network, with 1280 × 2048-pixel images) running on one RDU at 90.23% prediction accuracy. The size of the chip means the network can be trained on higher-resolution input images (the RDU can handle convolutions up to 50k × 50k).

“This accuracy level is being achieved because we don’t have to down sample the image resolution and can maintain all the richness of data and information in those high res images,” Choy said.
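The trade-off Choy describes is easy to quantify. As a hypothetical illustration (the 256 × 256 target is a common vision-network input size, not a figure from SambaNova), shrinking a CosmicTagger-resolution image to fit a smaller accelerator discards the vast majority of its pixels:

```python
# Hypothetical illustration of the downsampling trade-off:
# fitting a 1280 x 2048 image into a typical 256 x 256 network input.
native = 1280 * 2048     # 2,621,440 pixels at full resolution
downsampled = 256 * 256  # 65,536 pixels after resizing

retained = downsampled / native
print(f"{retained:.1%} of the original pixels survive")  # 2.5%
```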

For recommendation models like DLRM, typically the representation space or the embeddings have to be compressed, meaning decisions have to be made about which features to cut, at the expense of accuracy. This can be minimized if the chip is bigger. SambaNova can run DLRM at 80.46% accuracy.
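To see why embedding compression is usually unavoidable, consider the size of a single uncompressed table. The numbers below are hypothetical (a large item catalog and a common embedding width), not SambaNova's or any benchmark's:

```python
# Illustrative sizing of one DLRM-style embedding table (hypothetical numbers):
# a single categorical feature can dwarf on-chip memory, forcing compression.
vocab = 500_000_000  # e.g., unique item IDs in a large catalog
dim = 64             # embedding vector width
bytes_fp32 = 4       # bytes per fp32 value

table_gb = vocab * dim * bytes_fp32 / 1e9
print(f"one table: {table_gb:.0f} GB")  # 128 GB for this one feature
```

A model has many such categorical features, so hardware with more memory capacity can keep more of the embedding space intact rather than hashing or pruning it away.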

“For recommendation, it’s very clear cut from a business value perspective that even fractions of a percent improvement in accuracy can yield faster and bigger revenue streams in the case of an online retailer,” Choy said.

Alongside the rack-based product, SambaNova has introduced a new service offering where it rents out hardware for a monthly fee, which it is calling “Dataflow-as-a-service.” The hardware is shipped to the customer’s data center but managed and maintained by SambaNova.

SambaNova DataScale Racks

SambaNova DataScale systems at multi-rack scale (Image: SambaNova)

“Customers wanted to have a quick and easy way to use DataScale, but wanted to use a service-oriented model from both a technology and procurement perspective,” Choy said.

For a minimum of $10,000 per month, SambaNova has three workload-specific subscription tiers aimed at NLP, high-res computer vision and recommender systems. All offer the same performance, and run behind the customer’s firewall, in their own data center.

“We have different tiers of service that are dependent upon changing data set size and other attributes,” Choy said. “The customer doesn’t have to worry about sizing or anything like that. Whatever service level they want, we bring in the right level of hardware to do that, and we patch and support and maintain it for them.”

SambaNova already has revenue customers for DataScale, Choy said. The company has previously spoken about its systems installed at the Argonne National Laboratory, the Department of Energy’s National Nuclear Security Administration (NNSA), Lawrence Livermore National Laboratory (LLNL), and Los Alamos National Laboratory (LANL).

“At Argonne National Laboratory we’re working on important research efforts including those focused on cancer, Covid-19, and many others, and using AI to automate parts of the development process is key to our success,” said Rick Stevens, associate laboratory director, Argonne National Laboratory, in a statement. “The SambaNova DataScale architecture offers us the ability to train and infer from multiple large and small models concurrently and deliver orders of magnitude performance improvements over GPUs.”

SambaNova DataScale is available now.
