Monte Carlo Data wants to make sure the vector databases powering AI models stay reliable

November 9, 2023

San Francisco-based Monte Carlo Data, a company providing enterprises with automated data observability solutions, today announced new platform integrations and capabilities to expand its coverage and help teams deliver strong, trusted AI products.

At its annual IMPACT conference, the company said it will soon offer support for Pinecone and other vector databases, giving enterprises the ability to keep a close eye on the lifeblood of their large language models.

It also announced an integration with Apache Kafka, the open-source platform designed to handle large volumes of real-time streaming data, as well as two new data observability products: Performance Monitoring and Data Product Dashboard.

The observability products are now available to use, but the integrations will debut sometime in early 2024, the company confirmed.

Monitoring vector databases

Today, vector databases are key to high-performing LLM applications. They store numerical representations of text, images, videos and other unstructured data (often called embeddings) and act as an external memory to enhance model capabilities. Multiple vendors provide vector databases to help teams build their LLM applications, including MongoDB, DataStax, Weaviate, Pinecone, RedisVector, SingleStore and Qdrant.

But if the data stored in a vector database breaks or becomes outdated, the model that queries it for search can veer off track and return inaccurate results.
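To make the failure mode concrete, here is a minimal sketch of embedding lookup using a toy in-memory store. It is not Pinecone's or Monte Carlo Data's API; the `nearest` helper, the document IDs and the three-dimensional vectors are all made up for illustration. It shows how one corrupted embedding can quietly change which document a query retrieves.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest(store, query, k=1):
    """Return the IDs of the k stored vectors most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(item[1], query),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
healthy_store = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "careers-page": [0.0, 0.1, 0.9],
}

# Same store, but one embedding was overwritten with zeros by a bad pipeline run.
broken_store = dict(healthy_store)
broken_store["refund-policy"] = [0.0, 0.0, 0.0]

query = [0.8, 0.2, 0.0]  # stands in for an embedded question about refunds
healthy_hit = nearest(healthy_store, query)
broken_hit = nearest(broken_store, query)
```

With the healthy store the query lands on the refund document; with the broken one it silently falls back to an unrelated document, and nothing in the application raises an error.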

This is where Monte Carlo Data’s new integration, which is set to become generally available in early 2024 with initial support for Pinecone’s vector database, comes in.

Observability to ensure reliable and trustworthy info

Once connected to the platform, the integration allows users to deploy Monte Carlo Data’s observability smarts and track whether the high-dimensional vector information hosted in the database is reliable and trustworthy.

It monitors, flags and helps resolve data quality issues (if any), thereby ensuring that the LLM application delivers the best possible results.
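Monte Carlo Data has not published how the integration works, so the following is only a sketch of the kind of checks a vector data monitor might run, under assumed rules: every record should have the expected dimension, contain only finite values, not be an all-zero vector, and have been refreshed recently. The record layout (`id`, `values`, `updated_at`) and the function name are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

def check_vector_records(records, expected_dim, max_age, now):
    """Return a list of (record_id, issue) pairs for records that look unhealthy."""
    issues = []
    for rec in records:
        vec = rec["values"]
        # Structural checks: at most one of these is reported per record.
        if len(vec) != expected_dim:
            issues.append((rec["id"], "dimension_mismatch"))
        elif any(math.isnan(x) or math.isinf(x) for x in vec):
            issues.append((rec["id"], "non_finite_value"))
        elif all(x == 0 for x in vec):
            issues.append((rec["id"], "zero_vector"))
        # Freshness check: reported independently of the structural checks.
        if now - rec["updated_at"] > max_age:
            issues.append((rec["id"], "stale"))
    return issues

now = datetime(2023, 11, 9, tzinfo=timezone.utc)
records = [
    {"id": "a", "values": [0.2, 0.4, 0.1], "updated_at": now - timedelta(days=1)},
    {"id": "b", "values": [0.2, 0.4], "updated_at": now - timedelta(days=1)},
    {"id": "c", "values": [0.2, float("nan"), 0.1], "updated_at": now - timedelta(days=2)},
    {"id": "d", "values": [0.3, 0.3, 0.3], "updated_at": now - timedelta(days=90)},
]
found = check_vector_records(records, expected_dim=3, max_age=timedelta(days=30), now=now)
```

In this sample, record `a` passes, while `b`, `c` and `d` are flagged for a dimension mismatch, a non-finite value and staleness respectively.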

In an email conversation with VentureBeat, a company spokesperson confirmed that no customers are currently using the vector database integration, but there’s a long list of enterprises that have expressed excitement for it.

“As is the case with all of the integrations and functionality we build, we’re working closely with our customers to make sure vector database monitoring is done in a way that is meaningful to their generative AI strategies,” they added.

Notably, a similar integration has also been built for Apache Kafka, allowing teams to ensure that the streaming data feeding AI and ML models in real time for specific use cases is up to the mark.

“Our new Kafka integration gives data teams confidence in the reliability of the real-time data streams powering these critical services and applications, from event processing to messaging. Simultaneously, our forthcoming integrations with major vector database providers will help teams proactively monitor and alert to issues in their LLM applications,” Lior Gavish, the co-founder and CTO of Monte Carlo Data, said in a statement.

New products for better data observability

Beyond the new integrations, Monte Carlo Data also announced Performance Monitoring capabilities as well as a Data Product Dashboard for its customers.

The former drives cost efficiencies by allowing users to detect slow-running data and AI pipelines. They can essentially filter queries related to specific DAGs, users, dbt models, warehouses or datasets and then drill down to spot issues and trends to determine how performance was impacted by changes in code, data and warehouse configurations.
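As a rough illustration of that filter-then-drill-down workflow, the sketch below groups slow queries from a made-up query log by a chosen attribute. The log fields (`dbt_model`, `warehouse`, `runtime_s`), the model and warehouse names, and the threshold are assumptions for the example, not Monte Carlo Data's schema.

```python
from collections import defaultdict
from statistics import mean

def slow_queries_by(query_log, field, threshold_s):
    """Group queries at or above the runtime threshold by the given field."""
    groups = defaultdict(list)
    for q in query_log:
        if q["runtime_s"] >= threshold_s:
            groups[q[field]].append(q["runtime_s"])
    # Summarize each group with a count and an average runtime.
    return {
        key: {"count": len(times), "avg_runtime_s": mean(times)}
        for key, times in groups.items()
    }

# Hypothetical query log entries.
query_log = [
    {"dbt_model": "orders_daily", "warehouse": "etl_wh", "runtime_s": 120.0},
    {"dbt_model": "orders_daily", "warehouse": "etl_wh", "runtime_s": 180.0},
    {"dbt_model": "customers", "warehouse": "bi_wh", "runtime_s": 4.0},
    {"dbt_model": "embeddings_refresh", "warehouse": "ml_wh", "runtime_s": 95.0},
]

by_model = slow_queries_by(query_log, field="dbt_model", threshold_s=60.0)
by_warehouse = slow_queries_by(query_log, field="warehouse", threshold_s=60.0)
```

Switching the `field` argument from `dbt_model` to `warehouse` reuses the same log to answer a different question, which is the essence of drilling down by dimension.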

Meanwhile, the latter allows customers to identify the data assets feeding a particular dashboard, ML application or AI model, track their health over time, and report on their reliability to business stakeholders via Slack, Teams and other collaboration channels, driving faster resolutions when needed.

The rise of observability for AI

Monte Carlo Data’s observability-centric updates, particularly support for popular vector databases, come at a time when enterprises are going all in on generative AI. Teams are tapping tools like Microsoft’s Azure OpenAI service to make their own generative AI play and power LLM applications targeting use cases like data search and summarization.

This surge in demand has made visibility into the data efforts driving the LLM applications more important than ever.

Notably, California-based Acceldata, Monte Carlo Data’s key competitor, is also moving in the same direction. It recently acquired Bewgle, an AI and NLP startup founded by ex-Googlers, to deepen data observability for AI and add AI capabilities to its own product.

“Data pipelines that feed the analytics dashboards today are the same that will power the AI products and workflows that enterprises will build in the next five years…(However), for great AI outcomes, high-quality data flowing through reliable data pipelines is a must. Acceldata is in the path of critical AI and analytics pipelines and will be able to add AI observability for its customers who will deploy AI models at rapid velocity in the next few years,” Rohit Choudhary, the CEO of the company, previously told VentureBeat.

Other notable vendors competing with Monte Carlo Data in the data observability space are Cribl and Bigeye.

Author: Shubham Sharma
Source: Venturebeat
Reviewed By: Editorial Team
