Reimagining the data center for the age of generative AI

This article is part of a VB special issue. Read the full series here: The future of the data center: Handling greater and greater demands.

Today, any conversation about artificial intelligence is bound to include the rise of ChatGPT, the ubiquitous chatbot built on OpenAI’s GPT series of large language models (LLMs). But how can you feed the demands of this kind of generative AI technology in your data center? 

The chatbot launched late last year and is making waves with its content-generation capabilities. People are using ChatGPT and competing bots from other vendors to get complex questions answered as well as to automate tasks such as writing software code and producing marketing copy.

But with all the possibilities inherent in this generative AI technology, using foundational models to their full potential has been difficult. Most of the models out there have been trained on publicly available data, which makes them less than ideal for specific enterprise applications like querying sensitive internal documents.

Enterprises want these models to work on internal corporate data. But does that mean they have to go all in and build them from scratch? Let’s dive in.

Building large language models: A costly affair within data centers

Building an LLM such as GPT-3 or GPT-4 requires multiple steps, starting with compute-heavy training that demands hundreds, if not thousands, of expensive GPUs clustered together in data center servers for several weeks or months.

“The initial training requires a very significant amount of computing power. For example, the BLOOM model, a 176-billion parameter open-source alternative to GPT-3, required 117 days of training on a 384-GPU cluster. This is roughly equivalent to 120 GPU years,” Julien Simon, chief evangelist at Hugging Face, told VentureBeat.
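The GPU-year figure in the quote can be checked with simple arithmetic. The following is a minimal sketch in Python; the function name `gpu_years` is hypothetical, and it assumes every GPU in the cluster ran for the full training period and that a year has 365 days.

```python
# Rough GPU-time arithmetic for the BLOOM figures quoted above.
# Assumption: all GPUs are busy for the whole run; a year has 365 days.
DAYS_PER_YEAR = 365


def gpu_years(num_gpus: int, training_days: float) -> float:
    """Return total accelerator time in GPU-years (GPU count x wall-clock years)."""
    return num_gpus * training_days / DAYS_PER_YEAR


bloom = gpu_years(num_gpus=384, training_days=117)
print(f"BLOOM: {bloom:.1f} GPU-years")  # prints "BLOOM: 123.1 GPU-years"
```

The result, about 123 GPU-years, is consistent with the "roughly 120" in the quote.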

As a model grows, so does the number of accelerators needed to train and retrain it. Google, for instance, used 6,144 of its TPU v4 chips to train its 540-billion-parameter PaLM model. The process also demands expertise in advanced training techniques and tools (such as Microsoft DeepSpeed and Nvidia Megatron-LM), which may not be readily available in the organization.

Once training is done, these chips are still needed to run inference on the model on an ongoing basis, which adds further to the cost. To put it into perspective, buying just 500 of Nvidia’s DGX A100 multi-GPU servers, which are commonly used for LLM training and inference, at $199,000 apiece would mean spending about $100 million on the project. On top of this, the additional power draw and thermal output of the servers will add to the total cost of ownership.

That’s a lot of investment in data center infrastructure, especially for companies that are not dedicated AI organizations and are only looking to LLMs to accelerate certain business use cases.
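The hardware figure above, and the way power draw feeds into total cost of ownership, can be sketched in a few lines of Python. This is an illustrative back-of-the-envelope model, not a costing method from the article. The server count and unit price come from the text. The function names, the 6.5 kW per-server draw, the $0.10 per kWh electricity price and the PUE (power usage effectiveness) multiplier are all assumptions chosen for illustration and should be replaced with real vendor and facility numbers.

```python
# Back-of-the-envelope cost sketch for an on-site LLM cluster.
HOURS_PER_YEAR = 24 * 365


def hardware_cost(num_servers: int, price_per_server: float) -> float:
    """Up-front purchase cost of the servers."""
    return num_servers * price_per_server


def annual_energy_cost(num_servers: int, kw_per_server: float,
                       usd_per_kwh: float, pue: float = 1.0) -> float:
    """Yearly electricity cost if every server runs continuously.

    pue scales IT power up to facility power (cooling, distribution).
    """
    return num_servers * kw_per_server * pue * HOURS_PER_YEAR * usd_per_kwh


# Figures from the article: 500 servers at $199,000 each.
capex = hardware_cost(num_servers=500, price_per_server=199_000)
print(f"Hardware: ${capex:,.0f}")  # prints "Hardware: $99,500,000"

# Hypothetical inputs: 6.5 kW per server, $0.10/kWh, PUE of 1.5.
energy = annual_energy_cost(500, kw_per_server=6.5, usd_per_kwh=0.10, pue=1.5)
print(f"Energy per year (assumed inputs): ${energy:,.0f}")
```

The purchase price comes to $99.5 million, which is the "about $100 million" cited above. The energy line only shows how the recurring cost scales with server count, power draw and facility overhead; its output is only as good as the assumed inputs.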

The ideal approach toward a data center for the age of AI

Unless a company has unique, high-quality datasets that could yield a model with a competitive advantage worth the investment, the best path is to fine-tune existing open-source LLMs for specific use cases on the organization’s own data, such as corporate documents and customer emails.

“A good counterexample is the BloombergGPT model, a 50 billion-parameter [model] trained by Bloomberg from scratch … How many organizations can confidently claim that they have the same amount of unique high-quality data? Not so many,” Hugging Face’s Simon said.

“Fine-tuning, on the other hand, is a much more lightweight process that will require only a fraction of the time, budget and effort. The Hugging Face hub currently hosts over 250,000 open-source models for a wide range of natural language processing, computer vision and audio tasks. Chances are you’ll find one that is a good starting point for your project,” he said.

If an enterprise does see value in building an LLM from scratch, it should start small and use managed cloud infrastructure and machine learning (ML) services instead of buying expensive GPUs for on-site deployment right away.

“We initially used cloud-hosted MLOps infrastructure, which enabled us to spend more time developing the technology as opposed to worrying about hardware. As we have grown and the architecture of our solution has settled down from the early rapid research and development days, it has now made sense to tackle local hosting [of] the models,” Bars Juhasz, CTO and cofounder of content generator Undetectable AI, told VentureBeat.

The cloud also provides more training options to choose from, going beyond Nvidia GPUs to those from AMD and Intel as well as custom accelerators such as Google TPU and AWS Trainium.

On the other hand, in cases where local laws or regulations mandate staying away from the cloud, on-site deployment with accelerated hardware such as GPUs will be the default first choice.

Planning remains key

Before rushing to invest in GPUs, skills, or cloud partners for domain-specific LLMs and applications based on them, it is important for technical decision-makers to define a clear strategy by collaborating with other leaders in the enterprise and with subject matter experts. It is helpful to focus on the business case for the decision, and have an approximate idea of what the current and future demands of such workloads would be.

With this kind of planning, enterprises can make informed decisions about when and how to invest in training an LLM. This includes aspects like what kind of hardware to choose, where they can use pre-existing models developed by others, and who might be the right partners on their AI journeys.

“The landscape of AI/ML is moving incredibly quickly … If the inclusion of these new technologies is treated with the traditional mindset of future-proofing, it is likely the solution will be antiquated relatively quickly. The specialized nature of the technologies and hardware in question means a better choice may be to first develop the solution outlook, and upgrade their data centers accordingly,” Juhasz said.

“It can be easy to buy into the hype and trend of adopting new technology without dignified reason, but this will undoubtedly potentially lead to disappointment and dismissal of real use cases that the business could benefit from in the future,” he said. “A better approach may be to remain level-headed, invest time in understanding the technologies in question, and work with stakeholders to assess where the benefits could be reaped from integration.”

Author: Shubham Sharma
Source: Venturebeat
