SambaNova Systems today announced Samba-1, a one-trillion-parameter model that could well be one of the largest large language models (LLMs) ever released.
Samba-1 is not a single model like OpenAI’s GPT-4; rather, it is a combination of more than 50 high-quality AI models assembled in what SambaNova calls a Composition of Experts architecture. While the overall size of Samba-1 is massive, the model can be highly customized and tuned for specific enterprise use cases.
SambaNova Systems isn’t just building LLMs; the company’s core foundation is in hardware. In September the company announced its SN40L AI chip, which aims to compete against industry leader Nvidia with a highly efficient approach to training and inference. The new Samba-1 model will be part of the SambaNova Suite, which enables organizations to customize and deploy models.
“What we’re also doing now is actually giving you pre-composed, pre-trained and pre-optimized finished models that allow you to do a high performance and high scale deployment for production and inferencing without having to do all of that work of fine-tuning, aligning and actually optimizing the hardware,” Rodrigo Liang, co-founder and CEO of SambaNova, told VentureBeat.
How Samba-1 uses a Composition of Experts to build a massive LLM
Samba-1 is composed of more than 50 AI models that have been individually trained and then optimized to work together.
This includes models from SambaNova as well as open-source models that have been curated for specific enterprise tasks. Among the models that are part of Samba-1 are Llama 2, Mistral, DeepSeek Coder, Falcon, DePlot, CLIP and Llava.
“We’ve taken the best of the best,” Liang said. “We figured out which ones are the best for enterprises, and then we put them together and optimize them into a single 1 trillion parameter model.”
Liang added that the individual component models can work in concert inside Samba-1, with the answer produced by one model becoming the input to the next in a single thread.
The idea of chaining multiple LLMs together to get an output is not a new one. The popular open-source LangChain technology does exactly that: it chains LLMs together. Liang argued, however, that the Samba-1 Composition of Experts approach differs greatly from the LangChain method.
Liang explained that with LangChain, the model chain has to be predetermined, so the user has to predict what chain of models to use for a given prompt. With Samba-1, the individual experts can be dynamically chained together based on the prompt and responses, allowing for more flexibility.
Going a step further, he noted that Samba-1 allows users to explore different perspectives by getting input from models trained on different datasets.
“It can dynamically create 50 LangChain chain equivalents just to explore the results,” he said.
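SambaNova has not published Samba-1’s routing internals, but the dynamic chaining Liang describes can be sketched in a few lines of Python. The Expert class, the keyword-based route helper and the hop limit below are illustrative assumptions only; the point of the sketch is that the next model is chosen from the previous model’s output rather than fixed up front, as it would be in a predetermined LangChain-style chain.

```python
# Minimal sketch of dynamic expert chaining (illustrative only; not
# SambaNova's implementation). Each hop routes on the *current* text,
# so the chain is decided at run time rather than predetermined.
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Expert:
    name: str
    keywords: Set[str]               # stand-in for the expert's training domain
    generate: Callable[[str], str]   # stand-in for a model call


def route(text: str, experts: List[Expert]) -> Optional[Expert]:
    """Pick the expert whose domain best matches the text, if any."""
    if not experts:
        return None
    scored = [(sum(kw in text.lower() for kw in e.keywords), e) for e in experts]
    score, best = max(scored, key=lambda pair: pair[0])
    return best if score > 0 else None


def dynamic_chain(prompt: str, experts: List[Expert], max_hops: int = 3) -> str:
    """Each expert's answer becomes the next input, one thread end to end."""
    text, remaining = prompt, list(experts)
    for _ in range(max_hops):
        expert = route(text, remaining)
        if expert is None:
            break
        remaining.remove(expert)     # move on to a different expert next hop
        text = expert.generate(text)
    return text


# Toy experts standing in for the 50+ models composed inside Samba-1.
experts = [
    Expert("coder", {"code", "python", "sql"}, lambda t: f"[code draft] {t}"),
    Expert("summarizer", {"summarize", "report"}, lambda t: f"[summary] {t}"),
]
print(dynamic_chain("summarize this python code review", experts))
```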
Composition of Experts is not a Mixture of Experts
The Composition of Experts approach should not be confused with the Mixture of Experts approach used by some LLM providers, such as Mistral.
According to Liang, a Mixture of Experts implies that a single model is trained across multiple datasets. This can allow data from one dataset to leak into the model, violating the security and privacy of the other datasets.
In contrast, a Composition of Experts keeps each expert model separately trained on its own secure dataset, so the security restrictions of the training data propagate to the expert model. Liang said that the Composition of Experts approach is not just about training the models but also about deploying them and running inference securely and privately.
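To make the distinction concrete, here is a minimal Python sketch of how training-data restrictions might propagate from each expert’s dataset to inference time. The ExpertModel fields and the group-based filter are hypothetical constructs for illustration; SambaNova has not detailed its actual access-control mechanism.

```python
# Hypothetical sketch: each expert inherits the access controls of the one
# private dataset it was trained on, and the router only ever considers
# experts the caller is entitled to query.
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ExpertModel:
    name: str
    trained_on: str           # the single private dataset this expert saw
    allowed_groups: Set[str]  # inherited from that dataset's permissions


def permitted_experts(experts: List[ExpertModel], user_groups: Set[str]) -> List[ExpertModel]:
    """Filter out experts whose training data the caller may not access,
    so one tenant's data cannot surface through another tenant's prompt."""
    return [e for e in experts if e.allowed_groups & user_groups]


experts = [
    ExpertModel("hr-policies", trained_on="hr_docs", allowed_groups={"hr"}),
    ExpertModel("sales-notes", trained_on="crm_exports", allowed_groups={"sales"}),
    ExpertModel("public-faq", trained_on="website_faq", allowed_groups={"hr", "sales", "eng"}),
]

print([e.name for e in permitted_experts(experts, user_groups={"sales"})])
# -> ['sales-notes', 'public-faq']
```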
Not everyone needs a trillion parameters
While Samba-1 provides a trillion parameters, that’s not necessarily what an organization might want or need to deploy.
By using multiple specialized models together rather than a single large model, Liang said that Samba-1 can offer broad capabilities with high efficiency.
“What we’re seeing is that following every prompt does not require the entire trillion parameters to be activated all at once,” he said. “What we’re seeing now with the results is an incredible level of efficiency, footprint reduction, power reduction and bandwidth improvement because you’re only using the expert that’s required instead of the entire model that brings everything else in.”
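A rough back-of-the-envelope calculation illustrates the efficiency argument. The per-expert size and weight precision below are hypothetical round numbers chosen for illustration, not figures SambaNova has disclosed.

```python
# Illustrative arithmetic only: if a single expert answers a prompt, only its
# weights need to be exercised, not the full trillion-parameter composition.
TOTAL_PARAMS = 1_000_000_000_000   # ~1T parameters across the composition
EXPERT_PARAMS = 7_000_000_000      # assume a ~7B-parameter expert handles the prompt
BYTES_PER_PARAM = 2                # 16-bit weights

active_fraction = EXPERT_PARAMS / TOTAL_PARAMS
active_memory_gb = EXPERT_PARAMS * BYTES_PER_PARAM / 1e9

print(f"share of parameters active for this prompt: {active_fraction:.1%}")  # 0.7%
print(f"approximate weight memory touched: {active_memory_gb:.0f} GB")       # 14 GB
```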
The SambaNova approach allows customers to train models on their own private data and then deploy those customized models. This enables enterprises to create differentiated, proprietary assets optimized for their business needs.
“On Samba-1 now you’re able to actually have your own private model of a trillion parameter size and own it in perpetuity, once it’s trained on your data it is yours forever,” Liang said.
Author: Sean Michael Kerner
Source: VentureBeat
Reviewed By: Editorial Team