
Got It AI’s ELMAR challenges GPT-4 and LLaMa, scores well on hallucination benchmarks



Conversational AI startup Got It AI has released its latest innovation, ELMAR (Enterprise Language Model Architecture), an enterprise-ready large language model (LLM) that can be integrated with any knowledge base for dialog-based chatbot Q&A applications. The company claims that ELMAR is notably smaller than GPT-3 and can run on-premises, making it a cost-effective solution for enterprise customers.

In addition, the LLM’s commercial viability is enhanced by its independence from Meta’s LLaMA and Stanford’s Alpaca, whose licenses restrict commercial use.

“ELMAR was conceived because we heard from our enterprise customers in our pipeline that they didn’t want their data to leave their ‘premises,’” Peter Relan, chairman of Got It AI, told VentureBeat. “Hence, we said let’s build a commercially viable, small model that could be run ‘on-prem,’ but match available LLMs in accuracy on key enterprise use cases.”

ELMAR also includes truth-checking and post-processing on responses to reduce the rate of incorrect answers served to users. Compared with currently available LLMs, ELMAR runs on less expensive hardware, making it a more accessible option for enterprises; beta testers can sign up for pilots.


On par with big tech LLMs

Got It AI claims that ELMAR offers several benefits to enterprises seeking to incorporate a language model. First, because the model is small, the hardware required to operate ELMAR is significantly less expensive than that needed for OpenAI’s GPT-4. Second, ELMAR can be fine-tuned on the target dataset, eliminating the need for costly API-based models and keeping inference costs from ballooning.
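As a rough illustration of that fine-tuning workflow (a generic sketch, not Got It AI’s code; the model name, data file and hyperparameters below are placeholder assumptions), an enterprise could adapt a small open model to its own Q&A data with the Hugging Face Transformers library and keep everything on local hardware:

```python
# Generic sketch: fine-tune a small open causal LM on an in-house Q&A file.
# "EleutherAI/gpt-neo-1.3B" and "qa_pairs.jsonl" are placeholders, not ELMAR.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_name = "EleutherAI/gpt-neo-1.3B"           # placeholder small model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token        # GPT-style models have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Each line of qa_pairs.jsonl (hypothetical): {"text": "Question: ...\nAnswer: ..."}
dataset = load_dataset("json", data_files="qa_pairs.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="local-finetune",             # output stays on-premises
        per_device_train_batch_size=2,
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point of such a setup is that both the training data and the resulting weights stay on local hardware, which is the property Relan emphasizes.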

“We are not saying very powerful models aren’t needed,” Relan told VentureBeat. “We are saying all that power is not necessary for key enterprise use cases and requirements.”


To advance the conversation around the accuracy of language models, Got It AI compared ELMAR to OpenAI’s ChatGPT, GPT-3, GPT-4, GPT-J/Dolly, Meta’s LLaMA and Stanford’s Alpaca in a study measuring hallucination rates. The study showed that a smaller but fine-tuned LLM can perform just as well on dialog-based use cases, using a 100-article test set that is now available to beta testers.

“Recently, it was suggested that smaller and older models like GPT-J can deliver ChatGPT-like experiences. In our experiments, we did not find this to be the case. Despite fine-tuning, such models performed significantly worse than other more advanced models,” said Chandra Khatri, head of conversational AI research and cofounder of Got It AI. “It is not just about the data, but also about modern model architectures and training techniques.”

Earlier, in January, the company developed what it calls “TruthChecker,” a fine-tuned post-processor built on a small language model. It compares responses generated by any language model against the ground truth in the target dataset and flags answers that appear incorrect, misleading or incomplete, a phenomenon known as “hallucination.”
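The general shape of such a post-processor can be sketched without knowing Got It AI’s internals: score each generated answer against the ground-truth passages it should be grounded in, and flag it when support is weak. The snippet below is only an illustration of that idea, using an off-the-shelf sentence-embedding model and a similarity threshold as stand-ins for TruthChecker’s fine-tuned language model; the model name and threshold are assumptions.

```python
# Illustration of a truth-check-style post-processor, NOT Got It AI's TruthChecker:
# flag an answer when no ground-truth passage supports it strongly enough.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # generic embedding model (assumption)

def flag_possible_hallucination(answer: str,
                                ground_truth_passages: list[str],
                                threshold: float = 0.6) -> bool:
    """Return True when the answer lacks strong support in the ground truth."""
    answer_emb = encoder.encode(answer, convert_to_tensor=True)
    passage_embs = encoder.encode(ground_truth_passages, convert_to_tensor=True)
    best_support = util.cos_sim(answer_emb, passage_embs).max().item()
    return best_support < threshold

# Hypothetical usage with one knowledge-base passage.
passages = ["Refunds are processed within 5 business days of the return."]
print(flag_possible_hallucination("Refunds take about a month.", passages))
```

A production checker would be more sophisticated (the article describes a fine-tuned small language model rather than a similarity threshold), but the contract is the same: take a generated answer plus the target dataset, return a flag.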

Got It AI’s study revealed that smaller open-source LLMs perform poorly on specific tasks unless they are fine-tuned on target datasets.

“When we used Alpaca, an open-source model, for a Q&A task on our target 100 articles set, it resulted in a significant fraction of answers being incorrect or hallucinations, but did better after fine-tuning. On the other hand, ELMAR, when fine-tuned on the same dataset, produced accurate results, equivalent to ChatGPT-3,” said Khatri.

Got It AI’s hallucination rate comparison. Image Source: Got It AI
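For concreteness, a hallucination rate of the kind compared above can be read as the fraction of answers a checker flags on the test set. A toy tally, assuming per-answer boolean flags like those produced by the sketch earlier (not Got It AI’s evaluation code):

```python
# Toy tally under the assumption of one boolean hallucination flag per answer.
def hallucination_rate(flags: list[bool]) -> float:
    """Fraction of answers flagged as possible hallucinations."""
    return sum(flags) / len(flags) if flags else 0.0

print(hallucination_rate([False, True, False, False]))  # 0.25
```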

“We picked our approach to be such that ELMAR’s model, training and data are not constrained by the licenses of LLaMA and Alpaca-like models and data,” said Relan. “It was not easy. We had to thread the needle and then find the right combination of a commercializable model, training techniques and data.”

The TruthChecker Playground is now available for users to evaluate the AI’s functionality.

Empowering businesses with greater LLM control

Got It AI’s ELMAR language model allows businesses to configure their own pre-processors and put measures in place to secure their language model architecture against attacks.

“The pre-processor will be tuned, configured and controlled by the enterprise,” Relan told VentureBeat. “So the enterprise user sets its policies for removing data, such as personally identifiable information (PII).”
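A pre-processor of the kind Relan describes can be as simple as a set of redaction policies applied before any prompt reaches the model. The sketch below is a hypothetical example of such a policy layer; the policy names and regular expressions are illustrative assumptions, not ELMAR’s actual configuration.

```python
# Hypothetical enterprise-configurable pre-processor: redact PII before the
# prompt reaches the model. Policy names and patterns are illustrative only.
import re

PII_POLICIES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def preprocess(prompt: str, enabled_policies: set[str]) -> str:
    """Apply only the redaction policies the enterprise has enabled."""
    for name, pattern in PII_POLICIES.items():
        if name in enabled_policies:
            prompt = pattern.sub(f"[REDACTED {name.upper()}]", prompt)
    return prompt

print(preprocess("Reach me at jane.doe@example.com or 555-123-4567.",
                 {"email", "us_phone"}))
```

Because the policy set is data rather than code, each enterprise can enable, disable or extend rules to match its own compliance requirements, which is the control Relan describes.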

The ELMAR model has been put through its paces against several knowledge bases, such as Zendesk and Confluence, as well as large PDF documents.

Following successful alpha feedback, Got It AI plans to soon launch ELMAR’s beta program with enterprise pilots across multiple industries, gathering feedback on which types of pre-processing and post-processing “alignment” work across all industries and which are industry- or enterprise-specific.

The company aims to improve ELMAR’s speed, accuracy and cost-effectiveness for training, with plans to scale up the model post-beta cycle. “There’s lots of work ahead,” said Relan.



Author: Victor Dey
Source: VentureBeat

