Generative AI language models like OpenAI’s GPT-2 produce impressively coherent and grammatical text, but controlling the attributes of this text — such as the topic or sentiment — requires architecture modification or tailoring to specific data. That’s why a team of scientists at Uber, Caltech, and the Hong Kong University of Science and Technology devised what they call the Plug and Play Language Model (PPLM), which combines a pretrained language model with one or more attribute classifiers that guide novel text generation.
Preliminary results in a preprint paper show that PPLM is able to control a “range” of topics and sentiment styles, importantly without sacrificing fluency and while retaining the flexibility to steer text generation with any combination of differentiable attribute models.
Their research builds on that published by Google and the University of Michigan late last year, which investigated an architecture that could generate sentences from a given sample and change the mood, complexity, tense, or even voice while preserving the original text’s meaning. And it could inform work on Plato, Uber’s platform for developing and testing conversational AI, which was released in July with connectors that integrate with existing machine learning and model-tuning frameworks.
PPLM uses two kinds of small and easy-to-construct attribute models, according to the researchers: a bag of words, or a representation that describes the occurrence of words within a text snippet; and a linear discriminator, or a method that characterizes two or more classes of objects or events. Optimization is performed after the fact and the control is fine-grained, with a strength parameter that determines how strong the attribute influence should be.
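To make the bag-of-words idea and the strength parameter concrete, here is a minimal toy sketch in Python. It is not the researchers' code. The vocabulary, the logits, and the function names (`bow_gradient`, `steer`) are all hypothetical, and the sketch nudges next-word scores directly, which is a simplification that does not reproduce the paper's actual update procedure. It only illustrates how following the gradient of a bag-of-words score, scaled by a strength value, shifts probability toward on-topic words.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bow_gradient(logits, bag_ids):
    # Gradient of log(sum of probabilities of bag words) w.r.t. each logit.
    # In-bag entries are positive, out-of-bag entries are negative.
    probs = softmax(logits)
    bag = set(bag_ids)
    mass = sum(probs[i] for i in bag)
    return [(p / mass if i in bag else 0.0) - p for i, p in enumerate(probs)]

def steer(logits, bag_ids, strength, steps=1):
    # Gradient ascent on the bag-of-words score; `strength` is the step size
    # (a stand-in for the article's strength parameter, an assumption here).
    z = list(logits)
    for _ in range(steps):
        grad = bow_gradient(z, bag_ids)
        z = [zi + strength * gi for zi, gi in zip(z, grad)]
    return z

# Hypothetical five-word vocabulary and a "space" bag of words.
VOCAB = ["the", "orbit", "kitchen", "planet", "vote"]
SPACE_BAG = [1, 3]  # indices of "orbit" and "planet"

base_logits = [2.0, 0.0, 0.5, 0.0, 0.5]
steered_logits = steer(base_logits, SPACE_BAG, strength=2.0, steps=3)

base_mass = sum(softmax(base_logits)[i] for i in SPACE_BAG)
steered_mass = sum(softmax(steered_logits)[i] for i in SPACE_BAG)
print(f"probability of space words: {base_mass:.3f} -> {steered_mass:.3f}")
```

Setting `strength` to zero leaves the distribution untouched, and larger values push more probability onto the bag, which is the kind of dial the researchers describe.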
“PPLM can be thought of as a generic machine for flipping literary critic models … that can judge the worthiness of text they read into author models … that can write something similarly worthy. In this sense, PPLM’s machinery is quite generic, permitting use of any differentiable attribute models, which will enable diverse and creative applications beyond the examples given here,” explained Uber AI Labs researchers Eric Frank and Colan Chen in a blog post, adding that any number of attribute controllers can be combined with a state-of-the-art generative model. “With this research, we introduce … an on-the-fly configurable method for controlled language generation that allows flexible combination of a large, pre-trained language model with one or more … models.”
In a series of experiments, given a bag of words attribute model and the topic “space” with different prefixes (“prior to this,” “an illustration of,” “the relationship between”), PPLM generated coherent paragraph-long samples. More impressively, the framework successfully switched from “space” to subjects like “military” and “science” while deftly handling prefixes that started the sentence nowhere near the intended topic. It also controlled the generation toward a mixed topic about “winter,” “politics,” and “kitchen,” and in a separate test that tapped a sentiment discriminator trained on a popular open source data set, PPLM steered text generation toward both positive and negative sentiments.
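The sentiment test can be pictured the same way with a toy linear discriminator. The sketch below is an illustration under stated assumptions, not the team's implementation: the weights, bias, and three-number "hidden" vector are made up, and `steer_hidden` is a hypothetical helper. It shows how the gradient of a linear classifier's log-probability can push a representation toward either the positive or the negative class.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def positive_probability(hidden, weights, bias):
    # Linear discriminator: p(positive | hidden) = sigmoid(w . h + b).
    score = sum(w * h for w, h in zip(weights, hidden)) + bias
    return sigmoid(score)

def steer_hidden(hidden, weights, bias, strength, target_positive=True):
    # One gradient-ascent step on log p(target class | hidden).
    # d/dh log sigmoid(s)       = (1 - p) * w
    # d/dh log (1 - sigmoid(s)) = -p * w
    p = positive_probability(hidden, weights, bias)
    scale = (1.0 - p) if target_positive else -p
    return [h + strength * scale * w for h, w in zip(hidden, weights)]

# Made-up discriminator parameters and representation (assumptions).
weights = [0.8, -0.5, 0.3]
bias = -0.1
hidden = [0.2, 0.4, -0.1]

before = positive_probability(hidden, weights, bias)
toward_positive = positive_probability(
    steer_hidden(hidden, weights, bias, strength=1.0, target_positive=True),
    weights, bias)
toward_negative = positive_probability(
    steer_hidden(hidden, weights, bias, strength=1.0, target_positive=False),
    weights, bias)
print(f"p(positive): {toward_negative:.3f} < {before:.3f} < {toward_positive:.3f}")
```

The same discriminator serves both directions; only the target class changes, which mirrors how a single sentiment model was used to steer toward positive and negative text.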
PPLM is applicable to a range of scenarios, the researchers note, including those involving biased or offensive speech. Prior studies have demonstrated that adversarial attacks can make language models produce racist output, but in a test involving 10 adversarial triggers and evaluators tasked with flagging the toxicity of 500 samples, PPLM reduced the fraction of generated toxic speech from 63.6% to 4.6%.
Of course, PPLM might be used by malicious actors to create more toxic language, which the researchers assert is a risk inherent to such general-purpose technologies. “We believe that on balance this work creates a net positive by advancing the discussion of model capabilities, properties, and potential solutions to risks faced by current models,” wrote Frank and Chen. “There has recently been substantial discussion around the ethics of capable language models, both in their potential to recapitulate problematic social biases and for them to be directly abused for societal harm.”
The Uber research team has released the code accompanying their paper, a programming notebook in Google Colab, and an online demo developed in collaboration with Hugging Face. Said demo, which is publicly available, provides a dozen attribute models.
Author: Kyle Wiggers