Entering the era of foundation models will mean more reliance on pre-trained models combined with fine-tuning.
At the AI Hardware Summit in Santa Clara, California, SambaNova Systems executives unveiled new silicon and talked about the company’s bid to support foundation models, a type of large language model that can be adapted for multiple tasks.
Powering the next generation of SambaNova rack-scale systems will be a second-generation version of the company’s dataflow-optimized reconfigurable dataflow unit (RDU). The Cardinal SN30 RDU has a bigger compute die, with 86 billion transistors per chiplet on the same TSMC 7-nm process node, and its on-chip memory has doubled to 640 MB. The result is a 688-TFLOPS (BF16) processor tailored for huge models. The package contains two compute chiplets and 1 TB of direct-attached DDR memory (not HBM). Overall, that delivers up to 6× the performance of first-gen systems.
This device will power new generations of SambaNova DataScale servers for AI training, inference, and fine-tuning, shipping as rack-scale systems.
At the show, Kunle Olukotun, CTO and co-founder of SambaNova, presented the killer application for these next-gen systems: foundation models.
“We are entering a new era of AI, and it’s being enabled by foundation models,” he said.
The term “foundation model” was coined at the Stanford Center for Research on Foundation Models. It refers to a special type of large language model. If such a model is trained on a sufficiently large and diverse dataset, it can be adapted to perform multiple language-based tasks as diverse as question answering, summarization, and sentiment analysis.
“This completely blows up the traditional task-centric model of machine learning that we’ve been using up to now, where every task had a particular model that you trained for it,” Olukotun said. “With foundation models, you can take a single model and adapt it to the particular task, [allowing you to] replace thousands of individually task-specific models with a single model, which means management is easier and you can much more easily transform your AI capabilities to match new tasks that come about.”
The scale of foundation models, which are generally larger than 10 billion parameters, presents challenges for companies wishing to use them.
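To put that scale in perspective, the rough arithmetic below estimates how much memory the weights of a model at the 10-billion-parameter threshold would occupy in BF16, then compares it with the SN30 capacities quoted earlier. It is a back-of-envelope sketch rather than a SambaNova sizing method: it counts weights only (no gradients, optimizer state, or activations), uses decimal units, and the function names and the 100-billion-parameter case are hypothetical illustrations.

```python
# Back-of-envelope estimate of weight memory; weights only, decimal units.
BYTES_PER_BF16 = 2  # BF16 stores each value in 16 bits

# Capacities quoted in the article for the Cardinal SN30 package.
ON_CHIP_BYTES = 640_000_000        # 640 MB of on-chip memory
DDR_BYTES = 1_000_000_000_000      # 1 TB of direct-attached DDR


def weight_bytes(num_params: int, bytes_per_param: int = BYTES_PER_BF16) -> int:
    """Return the bytes needed to hold the model weights alone."""
    return num_params * bytes_per_param


def fits(num_params: int, capacity_bytes: int) -> bool:
    """Return True if the BF16 weights fit within the given capacity."""
    return weight_bytes(num_params) <= capacity_bytes


if __name__ == "__main__":
    # 10B is the threshold cited in the article; 100B is a hypothetical larger case.
    for params in (10_000_000_000, 100_000_000_000):
        gigabytes = weight_bytes(params) / 1e9
        print(
            f"{params / 1e9:.0f}B params: {gigabytes:.0f} GB of BF16 weights, "
            f"fits on-chip: {fits(params, ON_CHIP_BYTES)}, "
            f"fits in DDR: {fits(params, DDR_BYTES)}"
        )
```

Under these assumptions, a 10-billion-parameter model needs about 20 GB for its weights alone, far more than 640 MB of on-chip memory but well within 1 TB of DDR.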
“It is very difficult to actually aggregate the hardware resources and to get the software programming right, to get the machine-learning expertise in order to actually train it correctly and then deploy it, maintain it, and do the training and inference and constant management of these models,” said Rodrigo Liang, SambaNova co-founder and CEO.
With today’s technology, training foundation models from scratch can take months. SambaNova intends to shortcut this by supplying its pre-trained models together with hardware that lets companies fine-tune those models on their own private data, improving accuracy on the tasks each customer needs.
SambaNova has, broadly speaking, two offerings. The first is DataScale infrastructure: racks of servers equipped with SambaNova’s hardware plus its software stack. This suits model-centric organizations, including capital markets, pharma, and HPC customers. The second is Dataflow-as-a-Service: the same racks of servers and software, plus pre-trained foundation models that customers can fine-tune and deploy on the hardware. This is for data-centric companies that don’t want to spend the time and effort building and maintaining their own models. SambaNova maintains the models on the customer’s behalf, but once a model has been fine-tuned, it is unique to that customer.
SambaNova systems are already installed at the Lawrence Livermore National Laboratory (LLNL), and the lab announced it will be upgrading to the next generation.
“We look forward to deploying a larger, multirack system of the next generation of SambaNova’s DataScale systems,” said Bronis de Supinski, CTO of Livermore Computing at LLNL. “Integration of this solution with traditional clusters throughout our center will enable the technology to have deeper programmatic impact. We anticipate a 2× to 6× performance increase, as the new DataScale system promises to significantly improve overall speed, performance, and productivity.”
Argonne National Laboratory is also deploying a multirack next-gen system in the ALCF AI Testbed, where it can be tested on a variety of use cases.
This article was originally published on EE Times.
Sally Ward-Foxton covers AI technology and related issues for EETimes.com and all aspects of the European industry for EETimes Europe magazine. Sally has spent more than 15 years writing about the electronics industry from London, UK. She has written for Electronic Design, ECN, Electronic Specifier: Design, Components in Electronics, and many more. She holds a master’s degree in Electrical and Electronic Engineering from the University of Cambridge.