Vectara grounds AI accuracy with Boomerang vector embedding

AI hallucinations are a major obstacle to enterprise AI adoption. After all, no organization wants its generative AI efforts to produce inaccurate results.

Among the many organizations trying to solve the problem of AI hallucination is Vectara, which emerged from stealth in October 2022 and is led by one of the co-founders of big data vendor Cloudera.

In May, the company added a grounded search capability to its generative AI platform, with the aim of delivering retrieval augmented generation (RAG) results grounded in the user’s own data.

Today the company is going a step further in its effort to reduce the risk of AI hallucination with the debut of Boomerang, which it describes as a neural information retrieval model. Boomerang takes a new approach to generating the vector embeddings used to retrieve facts for large language models (LLMs), with the goal of higher accuracy and less hallucination.

“It’s a retrieval model; it’s fundamentally there to serve the following purpose: the user sends a query into some kind of knowledge base and relevant information comes back out of the knowledge base,” Amin Ahmad, co-founder and CTO of Vectara, told VentureBeat. “So there’s that kind of boomeranging action.”

Boomerang is the encode block in this picture: it takes text and converts it into vectors (embeddings) that represent the meaning behind the text. The Generate block below it is the LLM, which produces the final output as a function of the user’s prompt and the retrieved facts. (Image credit: Vectara)
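The retrieve-then-generate flow that Ahmad describes and the caption summarizes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Vectara’s implementation: `encode` is a hypothetical stand-in for an embedding model like Boomerang that uses simple word counts instead of a neural network, and `VectorIndex` and `grounded_prompt` are illustrative names. The sketch stops at building the prompt that a real system would send to an LLM.

```python
import math
import re
from collections import Counter


def encode(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector.

    A neural encoder such as Boomerang learns vectors that capture meaning;
    this toy only captures exact word overlap.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorIndex:
    """Toy in-memory knowledge base storing (text, vector) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Encode step: each text enters the index as a vector.
        self.items.append((text, encode(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Retrieval step: the query goes in, the k most similar texts come back.
        q = encode(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def grounded_prompt(question: str, facts: list[str]) -> str:
    """Generate-step input: the user's question plus the retrieved facts.

    A real system would send this prompt to an LLM; here we only build it.
    """
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer using only the facts below. If they are not sufficient, say so.\n"
        f"Facts:\n{context}\n\nQuestion: {question}"
    )


index = VectorIndex()
index.add("Vectara emerged from stealth in October 2022.")
index.add("Boomerang is a neural information retrieval model.")
index.add("Grounded generation retrieves facts before generating an answer.")

question = "When did Vectara emerge from stealth?"
facts = index.search(question, k=1)
print(grounded_prompt(question, facts))
```

With `k=1`, the question retrieves only the October 2022 sentence, so the answer has a single source it can cite. The toy encoder treats “emerge” and “emerged” as unrelated words and succeeds only through the other shared words; learned embedding models are designed to close gaps like that.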

Advancing the state of the art for vector embeddings

The new Boomerang engine is intended to make Vectara’s GenAI platform more accurate, building on the company’s grounded generation approach.

“The way grounded generation works is you take your data and you put it in a special vector database, or a meaning space, which is the term we use,” Amr Awadallah, co-founder and CEO of Vectara, told VentureBeat. “And if you can’t map your data properly inside of this meaning space, then when the user question comes in, you are not going to get the proper facts coming back.”

Boomerang is Vectara’s new in-house model for generating vector embeddings that represent the meaning behind words, regardless of language. Creating vector embeddings is a critical step, and all the big LLM vendors have their own embedding models. OpenAI, for example, has its ada embedding models, which have also been steadily improved in recent years.
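Awadallah’s point that data must be mapped properly into the meaning space, regardless of language, can be illustrated with a toy comparison. The sketch below is hypothetical throughout: `CONCEPTS` is a hand-written lookup table standing in for what a neural embedding model learns from training data, and the documents and Spanish query are made up. A word-matching encoder finds nothing for the Spanish query, while a concept-level encoder retrieves the English price list.

```python
import math
from collections import Counter
from typing import Callable

# Hypothetical hand-written "meaning space": words from different languages map
# to shared concept IDs. A neural embedding model learns this kind of mapping
# from data rather than from a lookup table.
CONCEPTS = {
    "car": "VEHICLE", "automobile": "VEHICLE", "coche": "VEHICLE",
    "price": "COST", "cost": "COST", "precio": "COST",
}


def lexical_encode(text: str) -> Counter:
    """Vector of raw words: 'coche' and 'automobile' never overlap."""
    return Counter(text.lower().split())


def concept_encode(text: str) -> Counter:
    """Vector of concepts: synonyms and translations share a dimension."""
    return Counter(CONCEPTS.get(word, word) for word in text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], encode: Callable[[str], Counter]) -> tuple[str, float]:
    """Return the best-scoring document and its cosine similarity to the query."""
    q = encode(query)
    scored = [(doc, cosine(q, encode(doc))) for doc in docs]
    return max(scored, key=lambda pair: pair[1])


docs = ["automobile price list for 2023", "battery warranty terms"]
query = "precio del coche"  # Spanish: "price of the car"

print(retrieve(query, docs, lexical_encode))  # score 0.0: no shared words, so no real match
print(retrieve(query, docs, concept_encode))  # finds the price list via shared concepts
```

The lexical encoder’s “best” result is just the first document in a tie at zero, which is the failure Awadallah describes: the data is stored, but the right facts do not come back.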

Awadallah explained that Boomerang is an upgrade over the company’s previous engine and produces vector embeddings of higher quality and accuracy. The core enterprise benefit, he said, is that Boomerang yields what he called better facts.

“Because now we have way better facts, everything else improves, the hallucination probability goes down and the explainability becomes way better on the output side,” he said.

The path toward zero hallucinations

Precisely how Boomerang creates better vector embeddings is a complex matter.

“The way that we got to this new model from the previous model we had is through application of a large number of new and additional techniques, as well as a lot more varying and diverse training data,” Ahmad said.

Ahmad noted that Vectara aims to publish research papers detailing some of the new methods behind the Boomerang vector embedding approach. Awadallah echoed his co-founder, noting that the company did in fact develop new techniques that will be detailed in future academic research.

“There was a lot of research, a lot of trial and error, a lot of things that didn’t work and things that did work, that got us to this point where we now can exceed a couple of the most advanced companies in this space,” Awadallah said.

Vectara claims that Boomerang outperforms larger models in cross-lingual retrieval and better understands content in hundreds of languages and dialects. While the updated platform makes strides toward reducing the risk of hallucination, Vectara still has more work to do.
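The article does not say how Vectara measured its cross-lingual results. As general background, retrieval models are commonly compared with ranking metrics such as recall@k and mean reciprocal rank (MRR). The sketch below computes both on made-up rankings from two hypothetical models; the document IDs and results are invented for illustration and say nothing about Boomerang’s actual performance.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1/rank of the first relevant result per query (0 if none is found)."""
    total = 0.0
    for ranked, relevant in runs:
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs) if runs else 0.0


# Made-up rankings for two hypothetical models on the same two queries.
# Each entry pairs a ranked list of document IDs with the set of relevant IDs.
runs_a = [(["d3", "d1", "d7"], {"d1"}), (["d2", "d9", "d4"], {"d2"})]
runs_b = [(["d1", "d3", "d7"], {"d1"}), (["d5", "d6", "d2"], {"d2"})]

print(recall_at_k(["d3", "d1", "d7"], {"d1"}, k=1))  # 0.0
print(recall_at_k(["d3", "d1", "d7"], {"d1"}, k=2))  # 1.0
print(mean_reciprocal_rank(runs_a))  # 0.75
print(mean_reciprocal_rank(runs_b))  # about 0.667
```

Higher is better for both metrics; MRR rewards a model for putting the right fact first, which matters when only the top few results are passed to the LLM.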

“Hallucination is not 0% and we want it to be 0%, so we will be continuing our research in terms of how to get hallucination to be significantly minimized, which is critical for business contexts,” Awadallah said.


Author: Sean Michael Kerner
Source: VentureBeat
Reviewed By: Editorial Team
