Generative adversarial networks (GANs) — two-part AI systems consisting of generators that create samples and discriminators that attempt to distinguish between the generated samples and real-world samples — have countless uses, and one of them is producing synthetic data. Researchers at Uber recently leveraged this in a paper titled “Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data,” which proposes a tailored GAN — Generative Teaching Networks (GTNs) — that generates data or training environments from which a model learns before being tested on a target task. They say that GTNs speed up neural architecture search by up to nine times compared with approaches that use real data alone, and that GTNs are competitive with state-of-the-art approaches that achieve top performance while using “orders of magnitude” less computation.
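For readers unfamiliar with the setup, here is a minimal sketch of the generator/discriminator split described above, written in PyTorch with toy dimensions and random stand-in data (none of it comes from Uber's code): the discriminator is trained to label real samples as real and generated samples as fake, while the generator is trained to fool it.

```python
# Toy GAN training step: generator vs. discriminator (illustrative sketch only).
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64  # assumed toy sizes

generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_batch = torch.randn(32, data_dim)  # stand-in for real samples

# Discriminator step: push real samples toward label 1, generated samples toward 0.
fake_batch = generator(torch.randn(32, latent_dim)).detach()
d_loss = bce(discriminator(real_batch), torch.ones(32, 1)) + \
         bce(discriminator(fake_batch), torch.zeros(32, 1))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# Generator step: try to make the discriminator label generated samples as real.
fake_batch = generator(torch.randn(32, latent_dim))
g_loss = bce(discriminator(fake_batch), torch.ones(32, 1))
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```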
As the contributing authors explain in a blog post, most architecture searches require “substantial” resources because they evaluate candidate models by training them on a data set until their performance no longer improves. This process might be repeated for thousands of candidate architectures or more in a single search, which is both computationally expensive and incredibly time-consuming. Some algorithms avoid part of the cost by training each candidate for only a short time and taking the resulting performance as an estimate of its true performance, and this evaluation can be sped up further by tapping machine learning — i.e., GTNs — to create the training data.
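To make that proxy evaluation concrete, here is a minimal sketch, assuming a toy search space (an MLP whose hidden width is the only choice) and random stand-in data rather than anything from the paper: each candidate is trained for only a handful of steps, and its final loss serves as a cheap estimate of its quality.

```python
# Proxy evaluation for architecture search: short training runs as quality estimates.
import torch
import torch.nn as nn

def make_candidate(hidden):
    # A "search space" with a single choice: the hidden layer width.
    return nn.Sequential(nn.Linear(32, hidden), nn.ReLU(), nn.Linear(hidden, 10))

# Stand-in training data (random for illustration).
x, y = torch.randn(256, 32), torch.randint(0, 10, (256,))

def proxy_score(model, steps=8):
    """Train briefly and return the final loss as a cheap performance estimate."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        loss = loss_fn(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

candidates = {h: make_candidate(h) for h in (16, 64, 256)}
scores = {h: proxy_score(m) for h, m in candidates.items()}
best = min(scores, key=scores.get)  # keep the architecture with the lowest proxy loss
print(scores, "-> best hidden size:", best)
```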
GTNs succeed by creating data that can be unrealistic yet still helpful for learning. They can blend information about many different types of an object into a single example, or focus training mostly on the hardest examples, and the model-in-training is then evaluated on real-world data. Furthermore, they use a learning curriculum — a set of training examples presented in a specific order — which improves performance over generators that produce an unordered, random distribution of examples.
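The meta-learning loop behind this can be sketched roughly as follows; this is a simplification under assumed details (a linear learner, a directly learned tensor of synthetic examples, fixed labels, and a short unrolled inner loop), not Uber's implementation. The synthetic batches are presented in a fixed order, a fresh learner is trained on them, and the learner's loss on real data is backpropagated through the unrolled training to update the synthetic data itself.

```python
# Simplified outer/inner loop: learn synthetic training data that teaches a learner well.
import torch
import torch.nn.functional as F

# Stand-in "real" evaluation set (random for illustration).
real_x, real_y = torch.randn(512, 32), torch.randint(0, 10, (512,))

# Learnable curriculum: an ordered sequence of 4 synthetic batches of 16 examples each.
syn_x = torch.randn(4, 16, 32, requires_grad=True)
syn_y = torch.randint(0, 10, (4, 16))          # labels kept fixed for simplicity
meta_opt = torch.optim.Adam([syn_x], lr=1e-2)

for meta_step in range(100):
    # A fresh learner (a linear classifier) is trained from scratch each meta-iteration.
    w = torch.zeros(32, 10, requires_grad=True)
    lr_inner = 0.1
    # Inner loop: train the learner on the synthetic curriculum, in order.
    for b in range(syn_x.shape[0]):
        inner_loss = F.cross_entropy(syn_x[b] @ w, syn_y[b])
        (grad_w,) = torch.autograd.grad(inner_loss, w, create_graph=True)
        w = w - lr_inner * grad_w              # unrolled gradient step
    # Outer loop: evaluate the trained learner on real data and update the synthetic data.
    meta_loss = F.cross_entropy(real_x @ w, real_y)
    meta_opt.zero_grad()
    meta_loss.backward()                        # gradients flow back into syn_x
    meta_opt.step()
```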
In experiments, the team says that models trained by GTNs achieved 98.9% accuracy on the popular open-source MNIST data set in 32 steps (about 0.5 seconds) of training, during which they ingested 4,096 synthetic images just once (less than 10% of the images in the MNIST training set). Evaluated on another data set — CIFAR-10, an image classification benchmark commonly used to measure architecture search performance — models learned up to four times faster on GTN data than on real data at the same performance level, even when compared with an optimized real-data learning algorithm. Moreover, performance on GTN data turned out to be generally predictive of true performance — reaching in 128 steps on GTN-generated data a level of accuracy that would have required 1,200 steps on real data.
“Because GTNs evaluate each architecture faster, they are able to evaluate more total architectures within a fixed compute budget. In every case we tried, using GTN-generated data proved to be faster and led to higher performance than using real data. That result held even when we gave the real-data control ten days of compute compared to two-thirds of a day for GTN,” wrote the coauthors. “Through our research, we showed that GTN-generated training data creates a fast … method that is competitive with state-of-the-art … algorithms, but via an entirely different approach. Having this extra tool of GTNs in our … toolbox can help Uber, all companies, and all scientists around the world improve the performance of deep learning in every domain in which it is being applied.”
Author: Kyle Wiggers
Source: VentureBeat