Simplifying a text’s grammar and structure is a skill most of us acquire in school, but AI typically has a tougher go of it owing to a lack of linguistic knowledge. That said, scientists at Facebook AI Research and Inria have developed a simplification model dubbed ACCESS (AudienCe-CEntric Sentence Simplification), which they claim lets users customize text length, amount of paraphrasing, lexical complexity, syntactic complexity, and other parameters while preserving coherence.
“Text simplification can be beneficial for people with cognitive disabilities such as aphasia, dyslexia, and autism, but also for second language learners and people with low literacy,” wrote the researchers in a preprint paper detailing their work. “The type of simplification needed for each of these audiences is different … Yet, research in text simplification has been mostly focused on developing models that generate a single generic simplification for a given source text with no possibility to adapt outputs for the needs of various target populations. [We] propose a controllable simplification model that provides explicit ways for users to manipulate and update simplified outputs as they see fit.”
To this end, the team tapped seq2seq, a general-purpose encoder-decoder framework that maps an input sequence to an output sequence. They prepended a special token to the beginning of each source sentence, whose value encodes the ratio of a parameter (like length) calculated on the target sentence with respect to its value on the source sentence. They then conditioned the model on four such parameters: length, paraphrasing, lexical complexity, and syntactic complexity.
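The control-token idea is simple to sketch. Below is a minimal, illustrative Python example of the length control; the token format and the 0.05-step bucketing here are assumptions for illustration, not necessarily the authors’ exact implementation:

```python
# Minimal sketch of control-token conditioning for length.
# The token name and bucketing step are illustrative assumptions.

def length_ratio(source: str, target: str) -> float:
    """Ratio of target length to source length, in characters."""
    return len(target) / max(len(source), 1)

def bucket(value: float, step: float = 0.05) -> float:
    """Discretize a continuous ratio so tokens come from a fixed vocabulary."""
    return round(round(value / step) * step, 2)

def prepend_control_token(source: str, target: str) -> str:
    """Prepend a special token encoding the target/source length ratio.

    At training time the ratio is computed from the reference pair; at
    inference time the user sets it to request shorter or longer output.
    """
    token = f"<NbChars_{bucket(length_ratio(source, target))}>"
    return f"{token} {source}"

src = "The incumbent was victorious in the municipal election."
tgt = "The current mayor won the election."
print(prepend_control_token(src, tgt))
# "<NbChars_0.65> The incumbent was victorious in the municipal election."
```

At inference time, the same tokens act as knobs: setting a length ratio of, say, 0.8 requests output roughly 80 percent as long as the input, and analogous tokens steer the other three parameters.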
In the course of experiments, the team trained a Transformer model on the WikiLarge data set, which contains 296,402 automatically aligned complex-simple sentence pairs from English Wikipedia and Simple English Wikipedia. They evaluated it on validation and test sets taken from TurkCorpus, in which each complex sentence has eight paraphrased simplifications (without sentence splitting, oversimplified structures, or reduction of content) created by Amazon Mechanical Turk workers.
The team reports that on SARI, a popular metric that compares predicted simplifications with both the source and the target references, ACCESS scored 41.87, a “significant” improvement over the previous state of the art (40.45). And on FKGL, a readability metric that doesn’t account for grammaticality or meaning preservation, it posted the third-best score (7.22).
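For context, FKGL (Flesch-Kincaid Grade Level) is a fixed formula over word, sentence, and syllable counts; the rough Python sketch below shows how it is computed, using a naive vowel-run syllable heuristic, so its scores are only approximate:

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a crude heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level: lower means more readable.

    FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

print(round(fkgl("The cat sat on the mat. It was happy."), 2))
```

SARI itself is more involved, scoring the n-grams a system keeps, adds, and deletes against the references, so it is best computed with an established implementation such as the EASSE evaluation library.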
“We confirmed through an analysis that each parameter has the desired effect on the generated simplifications,” wrote the researchers, who believe their method paves the way toward adapting text simplification to audiences with different needs. “This paper showed that explicitly conditioning … on parameters such as length, paraphrasing, lexical complexity or syntactic complexity increases their performance significantly for sentence simplification.”
Author: Kyle Wiggers
Source: VentureBeat