How to succeed with AI and machine learning at scale

Presented by Hewlett Packard Enterprise (BlueData)

You’ve heard all the great benefits that AI can provide for enterprises today. From detecting fraud to predicting machine failure to understanding customer behavior — AI has the potential to deliver game-changing business value in a variety of different areas.

You may have even dabbled with AI and machine learning (ML) models in a few pilot projects. But has your organization actually delivered on the promise of AI with tangible business benefits? If not, you aren’t alone; most of your peers are facing similar issues.

Gartner predicts that over the next year “80% of AI projects will remain alchemy, run by wizards whose talents will not scale in the organization.” We have seen this happen time and again in enterprise organizations embarking on AI projects: they set up an innovation lab to ‘Do AI’, only to realize later that they haven’t been able to operationalize their ML models into real-world business processes.

Only operational ML models — models that have been integrated with business functions in production — deliver business value. So, what does it take to succeed with AI / ML at scale and operationalize your ML models? Here are a few key considerations:

1. Define your business objectives

Many AI / ML projects fail to deliver due to inflated expectations of what AI can do. Before starting an AI initiative, identify the goals of the project. Start with the business goals — what metrics are you trying to improve? For instance, are you trying to reduce your customer churn rate? Reduce cases of fraud? Reduce the time spent on processing customer applications? From the very outset, it’s important to clearly identify the use case, define measurable goals, benchmark current performance, and then realistically define success criteria.

2. Ensure stakeholder alignment

AI projects can also fail due to a lack of consensus among stakeholders. Once you’ve identified the use case, map out the different stakeholders who need to be involved. To figure this out, you’ll need a plan for how the output of the machine learning model (detection, classification, segmentation, prediction, or recommendation) will be used and who will use it. There’s no point in having an ML system crunch numbers and predict the insurability of a prospective client when the output is unusable, inaccessible, or simply not planned to be a part of the decision-making process. It’s essential to plan out how the predictions will be made accessible to the downstream tools, processes, and people.

3. Hire the right staff and set them up for success

The shortage of available data science talent has been well documented, and hiring for that role remains a fundamental challenge. But success with AI / ML requires more than data science skills. From data prep and model building to training and inference, it’s a team sport requiring multiple roles, including data engineers, ML architects, and operations. Organizing and scaling the team effectively is another challenge. Do you have the right people and skills in-house to take the project from idea to implementation? You’ll need to determine whether to build up the skills in-house, through hiring and retraining, or to bring in a third party to help complete the project in a given amount of time. Building up the skillset helps with scale in the long run, whereas third-party advisory services may help get the project up and running quickly.

4. Provide the right technology and tools

Time and again, we have seen data science projects stumble due to a lack of planning on the technology front. And it’s not just having the right technology and tools for building and developing models; the operationalization and production deployment aspects of ML models often represent the toughest challenge for any AI project. You need to consider the entire ML lifecycle. There are several aspects of this:

Data: Ensure that you have the right data for your use case. For instance, if you are building and training an ML model to detect cancer, you’ll need very large volumes of high quality, labeled images for that use case. Similarly, to identify anomalies in contract documents, you’ll need to prepare and provide access to high quality, labeled text data.

Tools: It’s not about picking just one tool; there is a multitude of tools in the AI / ML and data science ecosystem, and which tool to use really depends on the use case. Yes, TensorFlow is great, but it cannot solve all problems. The ML space is continuously evolving, and your technology stack needs to support multiple frameworks, including TensorFlow, Keras, PyTorch, and many more. At the same time, your architecture should allow for the creation of collaborative workspaces for the different personas (data scientists, data engineers, ML architects, software engineers) who will be involved in the ML lifecycle.

Infrastructure: Public cloud services do offer some advantages, but the cloud is not a panacea for large-scale AI / ML projects in enterprise organizations. Increasingly, enterprises are adopting a hybrid cloud approach: using on-premises or cloud infrastructure depending on the use case, the stage in the ML lifecycle, and where the data they need is located. This allows them to leverage what they’ve already built on-premises while taking advantage of the agility and elasticity offered by public cloud services. The use of cloud-native technology such as containers for AI / ML workloads has also greatly improved the speed of development while allowing the flexibility to ‘build anywhere and deploy everywhere.’

Standardized processes: Machine learning workflows differ from typical software engineering workflows. Most enterprises lack standardized ML processes for model management, monitoring, and retraining. This often hinders collaboration and leads to delayed or lost business value. In addition, ML models are trained on historical data, so their accuracy tends to degrade over time as the underlying data changes. Detecting these deviations requires specialized debugging tools, along with processes to retrain the models once pre-defined thresholds have been crossed. Setting up the processes, tooling, and infrastructure to store multiple model versions, trigger retraining, and seamlessly update models in production is critical to ML operationalization.
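To make the idea of a pre-defined retraining threshold concrete, here is a minimal Python sketch of one way such a check could work. It is an illustration only, not a description of HPE ML Ops or any other product: the class name DriftMonitor, the 5-point accuracy drop, and the window size are all assumptions chosen for the example, and it presumes that true outcomes eventually become available to compare against predictions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DriftMonitor:
    """Hypothetical helper: tracks rolling accuracy of a deployed model."""

    baseline_accuracy: float  # accuracy measured when the model was deployed
    max_drop: float = 0.05    # assumed threshold: tolerated drop in accuracy
    window_size: int = 100    # assumed number of recent outcomes to consider
    outcomes: deque = field(default_factory=deque)

    def record(self, predicted, actual) -> None:
        """Store whether a prediction matched the outcome observed later."""
        self.outcomes.append(predicted == actual)
        if len(self.outcomes) > self.window_size:
            self.outcomes.popleft()  # keep only the most recent window

    def rolling_accuracy(self) -> Optional[float]:
        """Return accuracy over the window, or None until the window is full."""
        if len(self.outcomes) < self.window_size:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        """True once accuracy has fallen more than max_drop below baseline."""
        accuracy = self.rolling_accuracy()
        if accuracy is None:
            return False
        return (self.baseline_accuracy - accuracy) > self.max_drop
```

In practice a check like this would run on a schedule, and a positive result would kick off the retraining and model-versioning steps described above. Accuracy is only one possible signal; teams often also watch for shifts in the input data itself.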

Here at Hewlett Packard Enterprise (HPE), we have a broad portfolio of hardware, software, and expertise to support our enterprise customers on their AI journey. We offer the most comprehensive edge to cloud, AI-optimized infrastructure. With HPE Machine Learning Operations (HPE ML Ops), we provide a secure, turnkey container-based software platform for the entire ML lifecycle. And with HPE Pointnext advisory and consulting services, we help ensure success for AI / ML projects with a focused team of data scientists and domain experts.


Author: Matheen Raza, Hewlett Packard Enterprise
Source: Venturebeat
