AI & Robotics News

Microsoft’s AI generates 3D objects from 2D images

The AI research labs at Facebook, Nvidia, and startups like Threedy.ai have at various points tried their hand at the challenge of 2D-object-to-3D-shape conversion. But in a new preprint paper, a team hailing from Microsoft Research details a framework that they claim is the first “scalable” training technique for 3D models from 2D data. They say it can consistently learn to generate better shapes than existing models when trained exclusively on 2D images, which could be a boon for video game developers, ecommerce businesses, and animation studios that lack the means or expertise to create 3D shapes from scratch.

In contrast to previous work, the researchers sought to take advantage of fully featured industrial renderers — i.e., software that produces images from display data. To that end, they train a generative model for 3D shapes such that rendering the shapes produces images matching the distribution of a 2D data set. The generator model takes in a random input vector (values representing the data set’s features) and generates a continuous voxel representation (values on a grid in 3D space) of the 3D object. It then feeds the voxels to a non-differentiable rendering process, which thresholds them to discrete values before they’re rendered using an off-the-shelf renderer (Pyrender, which is built on top of OpenGL).
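As a rough illustration of that last, non-differentiable step, the sketch below shows how a continuous voxel grid might be thresholded into a surface mesh and rendered offscreen with Pyrender. The threshold value, camera pose, and lighting here are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import trimesh
import pyrender
from skimage import measure

def render_voxels(voxels, threshold=0.5, resolution=256):
    """Threshold a continuous (D, H, W) occupancy grid and render it offscreen.

    The threshold, camera distance, and lighting are illustrative choices.
    """
    # Discretize the continuous occupancy values and extract a surface mesh.
    verts, faces, _, _ = measure.marching_cubes(voxels, level=threshold)
    verts -= verts.mean(axis=0)  # center the mesh at the origin
    mesh = pyrender.Mesh.from_trimesh(trimesh.Trimesh(vertices=verts, faces=faces))

    # Minimal scene: the mesh, one directional light, and one camera.
    scene = pyrender.Scene()
    scene.add(mesh)
    scene.add(pyrender.DirectionalLight(color=np.ones(3), intensity=3.0))
    camera_pose = np.eye(4)
    camera_pose[2, 3] = 2.5 * voxels.shape[0]  # pull the camera back along +z
    scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3.0), pose=camera_pose)

    # Offscreen, OpenGL-backed rendering returns an RGB image and a depth map.
    renderer = pyrender.OffscreenRenderer(resolution, resolution)
    color, depth = renderer.render(scene)
    renderer.delete()
    return color
```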

A novel proxy neural renderer directly renders the continuous voxel grid generated by the 3D generative model. As the researchers explain, it’s trained to match the rendering output of the off-the-shelf renderer given a 3D mesh input.
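One way to picture that training setup is as a regression problem: the proxy network learns to reproduce whatever the fixed, non-differentiable renderer outputs for the same shape. The minimal PyTorch sketch below assumes a toy ProxyRenderer architecture and a hypothetical offtheshelf_render helper (for example, a wrapper around the Pyrender pipeline above); the paper's actual network and losses are more elaborate.

```python
import torch
import torch.nn as nn

class ProxyRenderer(nn.Module):
    """Toy stand-in for a proxy neural renderer: maps a (1, 64, 64, 64)
    continuous voxel grid to a single-channel 64x64 image. The real
    architecture is more elaborate; this only illustrates the setup."""

    def __init__(self, image_size=64):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Sequential(
            nn.Conv3d(1, 8, 4, stride=2, padding=1),   # 64^3 -> 32^3
            nn.ReLU(),
            nn.Conv3d(8, 16, 4, stride=2, padding=1),  # 32^3 -> 16^3
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * 16 ** 3, image_size * image_size),
        )

    def forward(self, voxels):
        return self.net(voxels).view(-1, 1, self.image_size, self.image_size)

proxy = ProxyRenderer()
optimizer = torch.optim.Adam(proxy.parameters(), lr=1e-4)

def train_step(voxels, offtheshelf_render):
    """`offtheshelf_render` is a hypothetical wrapper around the
    non-differentiable renderer (e.g. the Pyrender pipeline above); it
    must return target images shaped like the proxy's output."""
    target = offtheshelf_render(voxels)          # no gradients flow through this
    pred = proxy(voxels)                         # differentiable approximation
    loss = nn.functional.mse_loss(pred, target)  # match the fixed renderer
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```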

Above: Couches, chairs, and bathtubs generated by Microsoft’s model.

Image Credit: Microsoft

In experiments, the team employed a 3D convolutional GAN architecture for the generator. (GANs are two-part AI models in which a generator produces synthetic examples from random noise sampled from a distribution, and a discriminator, fed both those synthetic examples and real examples from a training data set, attempts to distinguish between the two.) Drawing on a range of synthetic data sets generated from 3D models as well as a real-life data set, the researchers synthesized images from different object categories, which they rendered from different viewpoints throughout the training process.
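For a concrete sense of what a 3D convolutional GAN generator looks like, here is a minimal PyTorch sketch that upsamples a latent vector into a continuous voxel grid with transposed 3D convolutions. Channel counts, grid resolution, and latent size are illustrative assumptions rather than the team's actual configuration, and the matching discriminator is omitted.

```python
import torch
import torch.nn as nn

class VoxelGenerator(nn.Module):
    """Minimal 3D convolutional generator: a latent vector is upsampled
    with transposed 3D convolutions into a continuous voxel occupancy grid.
    Channel counts and the 32^3 output resolution are illustrative."""

    def __init__(self, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            # Treat the latent vector as a 1x1x1 volume with z_dim channels.
            nn.ConvTranspose3d(z_dim, 256, 4, stride=1, padding=0),  # -> 4^3
            nn.BatchNorm3d(256), nn.ReLU(),
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1),    # -> 8^3
            nn.BatchNorm3d(128), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),     # -> 16^3
            nn.BatchNorm3d(64), nn.ReLU(),
            nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1),       # -> 32^3
            nn.Sigmoid(),  # continuous occupancy values in [0, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1, 1))

# Sample random latent vectors and generate a batch of continuous voxel grids.
generator = VoxelGenerator()
voxels = generator(torch.randn(4, 128))  # shape: (4, 1, 32, 32, 32)
```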

Above: Mushrooms generated by the model.

Image Credit: Microsoft

The researchers say that their approach takes advantage of the lighting and shading cues the images provide, enabling it to extract more meaningful information per training sample and produce better results in those settings. Moreover, it’s able to produce realistic samples when trained on data sets of natural images. “Our approach … successfully detects the interior structure of concave objects using the differences in light exposures between surfaces,” wrote the paper’s coauthors, “enabling it to accurately capture concavities and hollow spaces.”

They leave for future work incorporating color, material, and lighting prediction into the system, which would extend it to more “general” real-world data sets.


Author: Kyle Wiggers.
Source: VentureBeat
