
Amazon’s AI generates images of clothing to match text queries

Generative adversarial networks (GANs) — two-part AI models consisting of a generator that creates samples and a discriminator that attempts to differentiate between the generated samples and real-world samples — have been applied to tasks from video, artwork, and music synthesis to drug discovery and misleading media detection. They’ve also made their way into ecommerce, as Amazon revealed in a blog post this morning. Scientists at the tech giant describe a GAN that generates clothing images to match product descriptions, which they say could be used to refine customer text queries. For instance, a shopper could search for “women’s black pants,” then add the word “petite” and the word “capri,” and the images on-screen would adjust accordingly with each new word.
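For readers unfamiliar with the setup, the generator-discriminator loop can be summarized in a minimal PyTorch sketch like the one below. The layer sizes, random stand-in data, and single training step are illustrative assumptions, not details of Amazon's system.

    import torch
    import torch.nn as nn

    LATENT_DIM, IMG_DIM = 64, 784  # e.g. a flattened 28x28 image (assumed sizes)

    # Generator: maps random noise to a synthetic sample.
    G = nn.Sequential(
        nn.Linear(LATENT_DIM, 256), nn.ReLU(),
        nn.Linear(256, IMG_DIM), nn.Tanh(),
    )

    # Discriminator: scores how "real" a sample looks, in (0, 1).
    D = nn.Sequential(
        nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1), nn.Sigmoid(),
    )

    bce = nn.BCELoss()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

    real = torch.rand(32, IMG_DIM)        # stand-in for a batch of real images
    noise = torch.randn(32, LATENT_DIM)

    # Discriminator step: push real samples toward 1, generated samples toward 0.
    fake = G(noise).detach()
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator output 1 on fakes.
    g_loss = bce(D(G(noise)), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

Training alternates these two steps; over time the generator's samples become harder for the discriminator to distinguish from real data.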

It’s not unlike the GAN model commercialized by startup Vue.ai, which susses out clothing characteristics and learns to produce realistic poses, skin colors, and other features. From snapshots of apparel, it’s able to generate model images in every size up to five times faster than a traditional photo shoot.

Amazon’s proposed system — ReStGAN — is a modification of an existing system, StackGAN, which splits image generation into two stages. A first GAN generates a low-resolution image directly from the text, after which a second GAN upsamples it into a higher-resolution version with textures and natural coloration. The GANs are paired with a long short-term memory (LSTM) model that processes the words of a query in order, enabling the system to refine its images as successive words are added to the input. And to make synthesis from the descriptions easier, the system is restricted to three product classes — pants, jeans, and shorts — for which the training images are standardized (i.e., the backgrounds are removed and the images are cropped and resized so that they’re alike in shape and scale).
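In rough outline, the generator side of such a two-stage, text-conditioned pipeline might look like the Python sketch below. The LSTM encoder, layer sizes, vocabulary, and resolutions are illustrative assumptions, and the discriminators and adversarial training a real system would need are omitted.

    import torch
    import torch.nn as nn

    class TextEncoder(nn.Module):
        # LSTM that consumes a query token by token, so the embedding
        # (and hence the generated image) can be refined word by word.
        def __init__(self, vocab=1000, emb=64, hid=128):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            self.lstm = nn.LSTM(emb, hid, batch_first=True)

        def forward(self, tokens):
            _, (h, _) = self.lstm(self.embed(tokens))
            return h[-1]  # final hidden state summarizes the query so far

    # Stage 1: generate a coarse low-resolution image from the text embedding.
    stage1 = nn.Sequential(nn.Linear(128, 3 * 16 * 16), nn.Tanh())

    # Stage 2: upsample the coarse image to a higher resolution and add texture.
    stage2 = nn.Sequential(
        nn.Upsample(scale_factor=4),
        nn.Conv2d(3, 3, kernel_size=3, padding=1), nn.Tanh(),
    )

    enc = TextEncoder()
    query = torch.randint(0, 1000, (1, 4))   # stand-in for a tokenized 4-word query
    coarse = stage1(enc(query)).view(1, 3, 16, 16)
    fine = stage2(coarse)                    # shape: (1, 3, 64, 64)
    print(fine.shape)

Because the encoder is recurrent, feeding it a longer query (the same tokens plus one new word) yields an updated embedding, which is what lets the on-screen images adjust with each added word.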

[Image: ReStGAN]

The research team trained the system in an unsupervised fashion, meaning the training data consisted of product titles and images that didn’t require any additional human annotation. They increased its stability using an auxiliary classifier that categorized images generated by the model according to three properties: apparel type (pants, jeans, or shorts), color, and whether they depicted men’s, women’s, or unisex clothing. And they grouped colors in the LAB color space, which is designed so that the distance between points corresponds to perceived color differences, forming the basis for a lookup table that maps visually similar colors to the same features of the textual descriptions.
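As an illustration of the color-grouping idea, the sketch below maps arbitrary RGB values to the nearest entry in a small hypothetical palette by measuring Euclidean distance in LAB space, using scikit-image's rgb2lab. The palette names and values are assumptions for demonstration, not Amazon's actual lookup table.

    import numpy as np
    from skimage.color import rgb2lab

    # Hypothetical palette of named colors, as RGB triples in [0, 1].
    palette = {
        "black": (0.0, 0.0, 0.0),
        "navy":  (0.0, 0.0, 0.5),
        "white": (1.0, 1.0, 1.0),
        "red":   (0.8, 0.1, 0.1),
    }
    names = list(palette)

    # Convert the palette to LAB, where Euclidean distance approximates
    # perceived color difference.
    lab_palette = rgb2lab(np.array([palette[n] for n in names]).reshape(1, -1, 3))[0]

    def nearest_color(rgb):
        """Map an RGB value to its perceptually closest palette name."""
        lab = rgb2lab(np.array(rgb, dtype=float).reshape(1, 1, 3))[0, 0]
        return names[int(np.argmin(np.linalg.norm(lab_palette - lab, axis=1)))]

    # A very dark blue maps to "navy" rather than "black".
    print(nearest_color((0.05, 0.05, 0.4)))

Grouping visually similar shades this way means that, say, several near-identical dark blues in product images all map to the same color feature in the text descriptions.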

The ability to retain old visual features while adding new ones is one of the novelties of the system, according to the researchers; the other is the color model, which yields images whose colors better match the textual inputs. In experiments, the team reports that ReStGAN classified product type and gender 22% and 27% more accurately, respectively, than the previous best-performing models based on the StackGAN architecture. In the case of color, accuracy improved by 100%.


Author: Kyle Wiggers.
Source: VentureBeat
