
A blueprint for the perfect Gen AI data layer: Insights from Intuit

In VentureBeat’s reporting on generative AI, one enterprise company in particular stands out for the speed and skill with which it has deployed the technology at scale.

That company is Intuit. In September, Intuit introduced an assistant driven by large language models (LLMs), called Intuit Assist, across all of its products, including TurboTax, QuickBooks, Credit Karma, and Mailchimp. In June, it announced its own Gen AI operating system, which orchestrates LLM activity across the entire company: a complete vision that, as far as I’m aware, came well before that of any other major company.

I recently interviewed Alon Amit, Intuit’s VP of Product Management, about arguably the most important part of any company’s journey to realize Gen AI success: building a best-practice data management layer.

Amit explains that Intuit spent several years working through this data layer to make sure its data was well integrated, accurate, governed, and not duplicated. Only after doing this could LLMs draw on that data to enable personalized interactions with Intuit’s 100 million small business and consumer customers.

During the interview, Amit presented a single slide depicting Intuit’s data layer. The slide shows what a best-practice data layer should look like, at least according to Intuit.

If you’re an enterprise data leader, I encourage you to click on the video link above, because Amit walks us step by step through the most important areas the company is working on, including the areas it needs to perfect in 2024. (The interview was part of our AI Unleashed event; the event’s full video is included above.)

Here are some cliff notes, based on what stood out to me:

1. The Data Map Registry: Intuit built this universal repository of every data asset, real-time and batch, that the company produces, including all data schemas. It ensures assets are well governed; for example, the owner and purpose of each asset are known. Amit conceded that this process hasn’t been perfected, but said Intuit expects to “hit very close to a hundred percent” by the end of next year.

2. Culture of caring about “data as a product”: Aided by this data map, Intuit instilled a culture among its developers, product managers, engineers, and others in which any data the company generates, not just the data inside products shipped to customers, is considered “product.”

3. Data schema changes are governed uniformly: The schemas of all data coming into Intuit’s data ecosystem, whether click-stream data or third-party data, are governed the same way, to ensure that changes don’t break downstream data systems, such as those needed to support generative AI. This data inflow, shown on the left side of the slide, includes Intuit’s own “domain events,” for example when Intuit’s developers create an event bus for real-time data flowing from an application. All of this is automatically populated into Intuit’s data lake.

4. Governed data derivation: Derivation is a generic term for essentially any transformation applied to data beyond the source data. It includes, for example, computations for analytics, extraction of features for AI models, and attributes for marketing campaigns. Because derivations are governed, a developer who derives a feature that is already in the data registry is told that the feature already exists, which avoids duplication.

5. Real-time data derivation: This is on the roadmap for 2024, and Amit was careful to say that the company’s quest for perfection isn’t over. Intuit is working to build “real time paved paths for data derivation”: the ability for developers to ensure that when a customer asks a question, or when an expert is offering support, Intuit knows the actions the user has taken in near real time.
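To make items 1, 3, and 4 more concrete, here is a minimal Python sketch of a registry that requires an owner and purpose for every asset, rejects schema changes that could break downstream consumers, and tells a developer when an equivalent derivation already exists. Every name in it (DataRegistry, DataAsset, Derivation, change_schema, derive, and the sample "clickstream" asset) is hypothetical and chosen for illustration; this is not Intuit’s implementation. The rules it enforces are simplifying assumptions: schema changes may only add fields, and two derivations count as duplicates when they read the same inputs with the same canonical logic string. Real-time derivation (item 5) is not modeled.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataAsset:
    """One entry in a hypothetical data-map registry."""
    name: str
    owner: str
    purpose: str
    mode: str                # "real-time" or "batch"
    schema: Dict[str, str]   # field name -> type name


@dataclass(frozen=True)
class Derivation:
    """A transformation computed from registered source assets."""
    name: str
    inputs: Tuple[str, ...]  # names of registered source assets
    logic: str               # canonical description of the transformation


class DataRegistry:
    """Illustrative sketch only; not Intuit's actual system."""

    def __init__(self) -> None:
        self.assets: Dict[str, DataAsset] = {}
        self.derivations: Dict[str, Derivation] = {}

    def register(self, asset: DataAsset) -> None:
        # Assumed governance rule: every asset must declare an owner and a purpose.
        if not asset.owner or not asset.purpose:
            raise ValueError(f"{asset.name}: owner and purpose are required")
        self.assets[asset.name] = asset

    def change_schema(self, name: str, new_schema: Dict[str, str]) -> List[str]:
        """Apply a schema change only if it keeps existing fields intact.

        Adding fields is accepted; removing or retyping an existing field
        is rejected because downstream consumers may depend on it.
        Returns the problems found; an empty list means the change was applied.
        """
        asset = self.assets[name]
        problems = []
        for fld, typ in asset.schema.items():
            if fld not in new_schema:
                problems.append(f"removed field '{fld}'")
            elif new_schema[fld] != typ:
                problems.append(f"retyped field '{fld}': {typ} -> {new_schema[fld]}")
        if not problems:
            self.assets[name] = replace(asset, schema=dict(new_schema))
        return problems

    def derive(self, derivation: Derivation) -> Optional[str]:
        """Register a derivation unless an equivalent one already exists.

        Two derivations count as equivalent here when they read the same
        inputs and use the same canonical logic string (a simplification).
        Returns the name of the existing equivalent, or None if registered.
        """
        missing = [i for i in derivation.inputs if i not in self.assets]
        if missing:
            raise KeyError(f"unregistered inputs: {missing}")
        key = (tuple(sorted(derivation.inputs)), derivation.logic)
        for existing in self.derivations.values():
            if (tuple(sorted(existing.inputs)), existing.logic) == key:
                return existing.name  # tell the developer it already exists
        self.derivations[derivation.name] = derivation
        return None


if __name__ == "__main__":
    registry = DataRegistry()
    registry.register(DataAsset("clickstream", "web-team", "product analytics",
                                "real-time", {"user_id": "string", "page": "string"}))
    print(registry.change_schema("clickstream", {"user_id": "string", "page": "string", "ts": "int"}))
    print(registry.change_schema("clickstream", {"user_id": "int"}))
    print(registry.derive(Derivation("views_per_user", ("clickstream",), "count page by user_id")))
    print(registry.derive(Derivation("user_view_count", ("clickstream",), "count page by user_id")))
```

Run as a script, the first schema change (adding a `ts` field) is accepted and returns an empty list; the second is rejected because it retypes `user_id` and drops `page` and `ts`; and the second derivation returns `views_per_user` instead of registering a duplicate. The additive-only rule mirrors the backward-compatibility checks that many schema registries offer, but which rules Intuit actually applies is not described in the interview.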

Author: Matt Marshall
Source: VentureBeat
Reviewed By: Editorial Team
