Apple, a company practically synonymous with technological innovation, has once again positioned itself at the forefront of the AI revolution.
The Cupertino, Calif.-based company recently announced significant strides in artificial intelligence research through two new papers introducing techniques for 3D avatar generation and efficient language model inference. The advancements could enable more immersive visual experiences and allow complex AI systems to run on consumer devices such as the iPhone and iPad.
In the first research paper, Apple scientists propose HUGS (Human Gaussian Splats) to generate animated 3D avatars from short monocular videos (i.e. videos taken from a single camera). “Our method takes only a monocular video with a small number of (50-100) frames, and it automatically learns to disentangle the static scene and a fully animatable human avatar within 30 minutes,” said lead author Muhammed Kocabas.
HUGS represents both the human and the background scene using 3D Gaussian splatting, an efficient rendering technique. The human model is initialized from a statistical body shape model called SMPL, but HUGS allows the Gaussians to deviate from that template, enabling it to capture details like clothing and hair.
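For readers curious about the mechanics, here is a minimal sketch of how per-vertex Gaussians might be seeded from an SMPL mesh and then allowed to drift during optimization. The names and initial values (`GaussianCloud`, `smpl_vertices`, the starting scales) are illustrative assumptions, not Apple's released code:

```python
# Illustrative sketch: one 3D Gaussian per SMPL vertex, free to drift.
# All names and initial values here are assumptions for exposition.
import numpy as np

class GaussianCloud:
    """Gaussians seeded on the SMPL body surface with learnable offsets."""

    def __init__(self, smpl_vertices: np.ndarray):
        n = smpl_vertices.shape[0]                 # SMPL has 6,890 vertices
        self.centers = smpl_vertices.copy()        # means start on the body surface
        self.offsets = np.zeros((n, 3))            # learnable drift for hair, clothing
        self.scales = np.full((n, 3), 0.01)        # per-axis Gaussian extent
        self.rotations = np.tile([1.0, 0.0, 0.0, 0.0], (n, 1))  # unit quaternions
        self.colors = np.full((n, 3), 0.5)         # fit to the video frames
        self.opacities = np.full((n, 1), 0.1)

    def positions(self) -> np.ndarray:
        # The optimized centers deviate from the SMPL template, which is what
        # lets the avatar capture geometry the body model cannot represent.
        return self.centers + self.offsets
```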
A novel neural deformation module animates the Gaussians in a realistic fashion using linear blend skinning. This coordinated movement avoids artifacts while reposing the avatar. According to Kocabas, HUGS “enables novel-pose synthesis of human and novel view synthesis of both the human and the scene.”
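In the paper, a neural module predicts the deformation; the sketch below assumes the skinning weights are already given and shows only the core linear blend skinning step, in which each Gaussian center is moved by a weighted blend of rigid bone transforms so that neighboring Gaussians move together. The function name and array shapes are assumptions for illustration:

```python
# Minimal linear blend skinning (LBS) over Gaussian centers. The weighted
# blend of bone transforms moves nearby Gaussians coherently, which is what
# keeps the reposed avatar free of tearing artifacts.
import numpy as np

def skin_gaussian_centers(
    centers: np.ndarray,          # (N, 3) Gaussian means in the rest pose
    skin_weights: np.ndarray,     # (N, B) per-Gaussian bone weights, rows sum to 1
    bone_transforms: np.ndarray,  # (B, 4, 4) rigid transform per bone for the target pose
) -> np.ndarray:
    # Homogeneous coordinates so one matrix applies rotation and translation.
    homo = np.concatenate([centers, np.ones((centers.shape[0], 1))], axis=1)  # (N, 4)
    # Blend the bone matrices per Gaussian: (N, B) @ (B, 16) -> (N, 4, 4).
    blended = (skin_weights @ bone_transforms.reshape(-1, 16)).reshape(-1, 4, 4)
    posed = np.einsum("nij,nj->ni", blended, homo)                            # (N, 4)
    return posed[:, :3]
```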
Compared to previous avatar generation methods, HUGS is up to 100 times faster in training and rendering. The researchers demonstrate photorealistic results after optimizing the system for just 30 minutes on a typical gaming GPU. HUGS also outperforms state-of-the-art techniques like Vid2Avatar and NeuMan on 3D reconstruction quality.
The new 3D modeling capability is an impressive achievement from Apple's researchers. The real-time performance and the ability to create avatars from in-the-wild videos could unlock new possibilities for virtual try-on, telepresence, and synthetic media in the relatively near future. Imagine the possibilities of creating novel 3D scenes like this right from your iPhone's camera!
Bridging the memory gap in AI inference
In the second paper, Apple researchers tackled a key challenge in deploying large language models (LLMs) on devices with limited memory. Modern LLMs such as GPT-4 contain hundreds of billions of parameters, making inference expensive on consumer hardware.
The proposed system minimizes data transfer from flash storage into scarce DRAM during inference. “Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks,” explained lead author Keivan Alizadeh.
Two main techniques are introduced. "Windowing" reuses activations from recent inferences, while "row-column bundling" reads larger blocks of data by storing associated rows and columns together. On an Apple M1 Max CPU, these methods speed up inference by 4-5x compared to naive loading; on a GPU, the speedup reaches 20-25x.
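The sketch below illustrates the windowing idea in isolation: weights for neurons that fired within the last few tokens stay resident in DRAM, so only newly activated neurons trigger a flash read, and each read is assumed to fetch a neuron's bundled up-projection row and down-projection column in one contiguous access. The class and the `read_bundle_from_flash` callback are hypothetical stand-ins, not the paper's actual interface:

```python
# Toy sliding-window cache for sparsely activated feed-forward neurons.
# `read_bundle_from_flash` stands in for a real storage read; per the paper's
# row-column bundling, one read returns a neuron's up-row and down-column.
from collections import OrderedDict

class NeuronWindowCache:
    def __init__(self, window_tokens: int, read_bundle_from_flash):
        self.window_tokens = window_tokens
        self.read_bundle = read_bundle_from_flash  # neuron_id -> (up_row, down_col)
        self.cache = OrderedDict()                 # neuron weights resident in DRAM
        self.history = []                          # active neuron sets, newest last

    def load_for_token(self, active_neurons: set):
        # Only neurons not already resident trigger a flash read.
        for n in active_neurons:
            if n not in self.cache:
                self.cache[n] = self.read_bundle(n)
        # Slide the window: evict neurons unused for `window_tokens` steps.
        self.history.append(active_neurons)
        if len(self.history) > self.window_tokens:
            expired = self.history.pop(0)
            still_needed = set().union(*self.history)
            for n in expired - still_needed:
                self.cache.pop(n, None)
        return {n: self.cache[n] for n in active_neurons}
```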
“This breakthrough is particularly crucial for deploying advanced LLMs in resource-limited environments, thereby expanding their applicability and accessibility,” said co-author Mehrdad Farajtabar. The optimizations could soon allow complex AI assistants and chatbots to run smoothly on iPhones, iPads, and other mobile devices.
Apple’s strategic vision
Both papers demonstrate Apple’s growing leadership in AI research and applications. While promising, experts caution that Apple will need to exercise great care and responsibility when incorporating these technologies into consumer products. From privacy protection to mitigating misuse, the societal impact must be considered.
If Apple integrates these innovations into its product lineup, it will be doing more than enhancing its devices: it will be anticipating the future needs of AI-infused services. By allowing more complex AI models to run on devices with limited memory, the company could set the stage for a new class of applications and services that leverage the power of LLMs in ways that were previously unfeasible.
Furthermore, by publishing this research, Apple is contributing to the broader AI community, which could stimulate further advancements in the field. It’s a move that reflects Apple’s confidence in its position as a tech leader and its commitment to pushing the boundaries of what’s possible.
If applied judiciously, Apple’s latest innovations could take artificial intelligence to the next level. Photorealistic digital avatars and powerful AI assistants on portable devices once seemed far off — but thanks to Apple’s scientists, the future is rapidly becoming reality.
Author: Michael Nuñez
Source: Venturebeat
Reviewed By: Editorial Team