
Speech AI, supercomputing in the cloud, and GPUs for LLMs and generative AI among Nvidia’s next big moves



At its GTC 2023 conference, Nvidia revealed its plans for speech AI, with large language model (LLM) development playing a key role. Continuing to grow its software prowess, the hardware giant announced a suite of tools to aid developers and organizations working toward advanced natural language processing (NLP).

On the software side, the company unveiled NeMo and DGX Cloud; on the hardware side, the Hopper GPU. NeMo, part of the Nvidia AI Foundations cloud services, creates AI-driven language and speech models. DGX Cloud is an infrastructure platform designed for delivering premium services over the cloud and running custom AI models. Rounding out Nvidia’s new lineup of AI hardware, the much-awaited Hopper GPU is now available and poised to accelerate real-time LLM inference.


Dialing up LLM workloads in the cloud

Nvidia’s DGX Cloud is an AI supercomputing service that gives enterprises immediate access to the infrastructure and software needed to train advanced models for LLMs, generative AI and other groundbreaking applications.


DGX Cloud provides dedicated clusters of DGX AI supercomputing paired with Nvidia’s proprietary AI software. This service in effect allows every enterprise to access its own AI supercomputer through a simple web browser, eliminating the complexity associated with acquiring, deploying and managing on-premises infrastructure.

Moreover, the service includes support from Nvidia experts throughout the AI development pipeline. Customers can work directly with Nvidia engineers to optimize their models and resolve development challenges across a broad range of industry use cases.

“We are at the iPhone moment of AI,” said Jensen Huang, founder and CEO of Nvidia. “Startups are racing to build disruptive products and business models, and incumbents are looking to respond. DGX Cloud gives customers instant access to Nvidia AI supercomputing in global-scale clouds.”

ServiceNow uses DGX Cloud with on-premises Nvidia DGX supercomputers for flexible, scalable hybrid-cloud AI supercomputing that helps power its AI research on large language models, code generation and causal analysis.

ServiceNow also co-stewards the BigCode project, a responsible open-science LLM initiative whose models are trained with Nvidia’s Megatron-LM framework.

“BigCode was implemented using multi-query attention in our Nvidia Megatron-LM clone running on a single A100 GPU,” Jeremy Barnes, vice president of product platform, AI at ServiceNow, told VentureBeat. “This resulted in inference latency being halved and throughput increased 3.8 times, illustrating the kind of workloads possible at the cutting edge of LLMs and generative AI on Nvidia.”
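For readers unfamiliar with the technique Barnes describes, multi-query attention shares a single key and value projection across all attention heads, which shrinks the key-value cache and speeds up autoregressive decoding. The sketch below is a minimal, illustrative PyTorch module, not ServiceNow’s or Megatron-LM’s actual implementation.

```python
# Minimal sketch of multi-query attention (MQA). Illustrative only; causal
# masking and dropout are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Queries keep one projection per head...
        self.q_proj = nn.Linear(d_model, d_model)
        # ...but keys and values are shared across all heads (the "multi-query"
        # part), shrinking the KV cache and speeding up inference.
        self.k_proj = nn.Linear(d_model, self.d_head)
        self.v_proj = nn.Linear(d_model, self.d_head)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)  # (b, h, t, d)
        k = self.k_proj(x).unsqueeze(1)  # (b, 1, t, d), broadcast across heads
        v = self.v_proj(x).unsqueeze(1)  # (b, 1, t, d)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)
```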

Barnes said that ServiceNow aims to improve user experience and automation outcomes for customers.

“The technologies are developed in our fundamental and applied AI research groups, who are focused on the responsible development of foundation models for enterprise AI,” Barnes added. 

DGX Cloud instances start at $36,999 per instance per month.

Streamlining speech AI development

The Nvidia NeMo service is designed to help enterprises combine LLMs with their proprietary data to improve chatbots, customer service and other applications. Part of the newly launched Nvidia AI Foundations family of cloud services, NeMo lets businesses augment their LLMs with proprietary data, so they can frequently update a model’s knowledge base through reinforcement learning without starting from scratch.

“Our current emphasis is on customization for LLM models,” said Manuvir Das, vice president of enterprise computing at Nvidia, during a GTC pre-briefing. “Using our services, enterprises can either build language models from scratch or utilize our sample architectures.”

This new functionality in the NeMo service empowers large language models to retrieve accurate information from proprietary data sources and generate conversational, humanlike responses to user queries.

NeMo aims to help enterprises keep pace with a constantly changing landscape, unlocking capabilities such as highly accurate AI chatbots, enterprise search engines and market intelligence tools. With NeMo, enterprises can build models for NLP, real-time automated speech recognition (ASR) and text-to-speech (TTS) applications such as video call transcriptions, intelligent video assistants and automated call center support.
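As a point of reference, the open-source NeMo toolkit (related to, but distinct from, the NeMo cloud service described here) already exposes pretrained speech models through a Python API. Below is a minimal sketch of ASR transcription, assuming the toolkit is installed; the checkpoint name and audio file are illustrative, and API details may vary across NeMo versions.

```python
# Minimal ASR sketch with the open-source NeMo toolkit.
# Requires: pip install "nemo_toolkit[asr]"
import nemo.collections.asr as nemo_asr

# Load a pretrained English ASR checkpoint by name.
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")

# Transcribe a local 16 kHz mono WAV file; returns a list of strings.
transcripts = asr_model.transcribe(["call_recording.wav"])
print(transcripts[0])
```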

The NeMo architecture. Image source: Nvidia

NeMo can assist enterprises in building models that can learn from and adapt to an evolving knowledge base independent of the dataset that the model was initially trained on. Instead of requiring an LLM to be retrained to account for new information, NeMo can tap into enterprise data sources for up-to-date details.

This capability allows enterprises to personalize large language models with regularly updated, domain-specific knowledge for their applications. It also includes the ability to cite sources for the language model’s responses, enhancing user trust in the output.
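Nvidia has not published the NeMo service’s API, but the retrieval pattern it describes — fetching fresh enterprise documents at query time and citing them rather than retraining the model — can be sketched generically. Every name below (search_enterprise_index, generate) is a hypothetical placeholder, not a NeMo interface.

```python
# Generic retrieval-augmented answering sketch; hypothetical helpers, not NeMo APIs.
from typing import List

def search_enterprise_index(query: str, top_k: int = 3) -> List[dict]:
    """Hypothetical search over an internal knowledge base. Each hit carries
    the passage text and a source identifier so answers can cite it."""
    raise NotImplementedError  # backed by the enterprise's own search stack

def answer_with_citations(query: str, generate) -> str:
    hits = search_enterprise_index(query)
    context = "\n".join(f"[{h['source']}] {h['text']}" for h in hits)
    prompt = (
        "Answer using only the context below and cite the bracketed sources.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return generate(prompt)  # `generate` is whatever LLM endpoint the enterprise uses
```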

Developers using NeMo can also set up guardrails to define the AI’s area of expertise, providing better control over the generated responses.
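A guardrail of this kind can be as simple as a topic check that runs before a query ever reaches the model. The snippet below is a generic illustration of that pattern, not the NeMo service’s actual guardrail mechanism; the topic list and classifier are assumptions.

```python
# Illustrative topical guardrail: reject queries outside the assistant's
# declared area of expertise before they reach the LLM.
ALLOWED_TOPICS = {"billing", "invoices", "payments"}  # hypothetical domain

def is_on_topic(query: str, classify_topic) -> bool:
    # `classify_topic` could be a small classifier or an LLM call that maps
    # the query to a topic label; it is assumed here, not provided by NeMo.
    return classify_topic(query) in ALLOWED_TOPICS

def guarded_answer(query: str, classify_topic, generate) -> str:
    if not is_on_topic(query, classify_topic):
        return "I can only help with billing-related questions."
    return generate(query)
```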

Nvidia said that Quantiphi, a digital engineering solutions and platforms company, is working with NeMo to build a modular generative AI solution to help enterprises create customized LLMs to improve worker productivity. Its teams are also developing tools that enable users to search for up-to-date information across unstructured text, images and tables in seconds.

LLM architectures on steroids? 

Nvidia also announced four inference GPUs optimized for a diverse range of emerging LLM and generative AI applications. These GPUs are aimed at helping developers quickly create specialized AI-powered applications that deliver new services and insights. Each GPU is optimized for specific AI inference workloads and paired with specialized software.

Of the four GPUs unveiled at GTC, the Nvidia H100 NVL is tailored specifically for LLM deployment, making it an apt choice for deploying massive LLMs, such as ChatGPT, at scale. The H100 NVL boasts 94GB of memory with transformer engine acceleration, and offers up to 12 times faster inference performance on GPT-3 than the previous-generation A100 at data center scale.

Moreover, the GPU’s software layer includes the Nvidia AI Enterprise software suite. The suite encompasses Nvidia TensorRT, a high-performance deep learning inference software development kit, and Nvidia Triton Inference Server, open-source inference-serving software that standardizes model deployment.
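To make the serving side concrete, here is a hedged sketch of querying a model hosted by Triton Inference Server through its Python HTTP client. The tritonclient package and its calls are real; the model name, tensor names and shapes are illustrative and would need to match the deployed model’s configuration.

```python
# Sketch of a Triton HTTP inference request.
# Requires: pip install "tritonclient[http]" numpy
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Input/output tensor names must match the served model's config.pbtxt;
# "input_ids", "logits" and "llm_ensemble" are placeholders.
tokens = np.array([[101, 2054, 2003, 1037, 14246, 2226, 102]], dtype=np.int64)
infer_input = httpclient.InferInput("input_ids", list(tokens.shape), "INT64")
infer_input.set_data_from_numpy(tokens)

result = client.infer(
    model_name="llm_ensemble",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("logits")],
)
print(result.as_numpy("logits").shape)
```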

The H100 NVL GPU will launch in the second half of this year.



Author: Victor Dey
Source: VentureBeat
