Refuel AI nabs $5M to create training-ready datasets with LLMs

June 16, 2023

Refuel AI, a company using large language models (LLMs) to generate high-quality training data for AI models, today came out of stealth with $5.2 million in seed funding. The company said it will use the round to grow its team and build out its platform’s capabilities, preparing it for commercial launch in July.

Founded by Stanford graduates Nihit Desai and Rishabh Bhargava, Refuel has also opened access to AutoLabel, an open-source library designed to let AI teams label data in their own environment with any LLM they choose.

Both offerings target the data bottlenecks that slow AI development and keep enterprises from embedding the technology into their products and business functions.

Every AI company needs AI-ready data

Today, every company is racing to be an AI company, working with in-house experts and third-party vendors to develop models that target business-specific use cases. The work can be challenging, but every AI project has the same starting point: clean, labeled data. Get that right, and the rest of the project becomes much easier to bring to life.

But while companies have plenty of data at their disposal, not all of it is training-ready by default. It has to be cleaned and annotated before it can be used to train a model, a task typically handled by human teams that takes weeks to months. That approach doesn't scale to the demands of AI today.

“Many teams [we spoke to] had all these incredible ideas for models they wanted to train and products they wanted to build — if only they had the data ready for training. That’s when we knew making clean, labeled data available at the speed of thought was what we wanted to focus on,” Bhargava told VentureBeat.

In 2021, the duo founded Refuel and built a dedicated platform that uses specialized LLMs to automate the creation and labeling of datasets across businesses and use cases, with quality the company says is on par with or better than that of human annotators.

According to the company, enterprise users will be able to use the platform by uploading their datasets and instructing the LLMs to label the data. They can also provide labeling guidelines and a few examples to help ensure that only high-quality, training-ready data comes out.
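
The workflow described above (guidelines plus a few labeled examples, followed by LLM labeling) resembles standard few-shot prompting. The Python sketch below assumes that pattern and shows one way to assemble such a prompt and validate the model's reply. The names `LabeledExample`, `build_labeling_prompt` and `parse_label` are hypothetical illustrations, not part of Refuel's platform or the AutoLabel library, and the actual LLM call is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledExample:
    """A hand-labeled seed example (hypothetical structure, for illustration)."""
    text: str
    label: str


def build_labeling_prompt(guidelines: str, labels: List[str],
                          examples: List[LabeledExample], item: str) -> str:
    """Assemble a few-shot prompt asking an LLM to label a single item."""
    lines = [guidelines.strip(), "", "Allowed labels: " + ", ".join(labels), ""]
    for ex in examples:
        # Each seed example shows the model the expected input/output format.
        lines += [f"Input: {ex.text}", f"Label: {ex.label}", ""]
    # The unlabeled item goes last; the model is expected to complete "Label:".
    lines += [f"Input: {item}", "Label:"]
    return "\n".join(lines)


def parse_label(completion: str, labels: List[str]) -> Optional[str]:
    """Return the first line of the reply if it is an allowed label, else None."""
    stripped = completion.strip()
    answer = stripped.splitlines()[0].strip() if stripped else ""
    return answer if answer in labels else None


seeds = [
    LabeledExample("I was charged twice this month", "billing"),
    LabeledExample("My order hasn't arrived yet", "shipping"),
]
prompt = build_labeling_prompt(
    "Classify each customer message by topic.",
    ["billing", "shipping", "other"],
    seeds,
    "Can I get a refund on my last invoice?",
)
print(prompt)
# Send `prompt` to the LLM of your choice, then validate its reply:
print(parse_label("billing\n", ["billing", "shipping", "other"]))  # billing
```

Rejecting any answer outside the allowed label set is a simple guard against free-form model output; items that come back as `None` could be routed to a human reviewer instead.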

“Within an hour, they [users] will have enough data to start training their AI models, which they can then seamlessly connect into their model training infrastructure. As these teams collect more data (especially from production), they can re-route it into Refuel for labeling, measuring performance and improving their datasets for model re-training,” Bhargava, Refuel's CEO, added.
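
Measuring performance in a loop like this typically means comparing LLM-produced labels against a small human-labeled sample. The sketch below assumes labels are stored as dictionaries keyed by item ID; `agreement_rate` and `disagreements` are hypothetical helpers for illustration, not Refuel's implementation.

```python
from typing import Dict, List


def agreement_rate(llm_labels: Dict[str, str], gold_labels: Dict[str, str]) -> float:
    """Share of items present in both mappings where the LLM label matches the human (gold) label."""
    shared = llm_labels.keys() & gold_labels.keys()
    if not shared:
        raise ValueError("no overlapping item IDs to compare")
    matches = sum(llm_labels[item] == gold_labels[item] for item in shared)
    return matches / len(shared)


def disagreements(llm_labels: Dict[str, str], gold_labels: Dict[str, str]) -> List[str]:
    """Sorted IDs of shared items where the two labels differ, e.g. to queue for human review."""
    shared = llm_labels.keys() & gold_labels.keys()
    return sorted(item for item in shared if llm_labels[item] != gold_labels[item])


# Example: three items also have a human label; "d" has no gold label yet.
llm = {"a": "billing", "b": "shipping", "c": "other", "d": "billing"}
gold = {"a": "billing", "b": "billing", "c": "other"}
print(agreement_rate(llm, gold))  # 2 of 3 shared items match
print(disagreements(llm, gold))   # ['b']
```

Only items with both labels are compared, so newly labeled production data without a human check does not distort the score.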

In private beta tests with select enterprises, the offering sped up data creation and labeling by up to 100%, according to the company. Bhargava didn't name these companies but noted that Refuel AI is seeing interest from multiple verticals, from social media and fintech to healthcare, HR and ecommerce.

The road ahead

With this round, which was co-led by General Catalyst and XYZ Ventures, Refuel plans to grow its engineering team from six to 12 members and further invest in the platform and its LLM infrastructure to prepare for a commercial launch by the end of July. The company will also invest the capital in its open-source library and community.

“As a concrete example, we’re organizing a competition to push the boundaries of LLM-powered data labeling, with prizes up to $10,000,” Bhargava noted.

In the data labeling space, the company currently competes with players such as Tasq AI, Snorkel AI and SuperAnnotate.

Author: Shubham Sharma
Source: Venturebeat
