Meta’s Fundamental AI Research (FAIR) team is releasing several new AI models and tools for researchers to use, centered on audio generation, text-to-vision, and watermarking.
“By publicly sharing our early research work, we hope to inspire iterations and ultimately help advance AI in a responsible way,” the company said in a press release.
Audio creation model JASCO and watermarking tools
First up, Meta is releasing a new AI model called JASCO, which is short for Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation.
JASCO can take different audio inputs, such as a chord progression or a beat, to shape the final AI-generated sound. According to a paper from FAIR’s researchers, JASCO lets users adjust features of a generated track, such as chords, drums, and melody, to home in on the sound they want, all through text.
FAIR plans to release the JASCO inference code as part of its AudioCraft AI audio model library under an MIT license, and the pre-trained model under a non-commercial Creative Commons license.
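Since the inference code is slated for AudioCraft, conditioned generation could eventually look something like the sketch below. This is purely illustrative: the `JASCO` class, checkpoint name, and keyword arguments are assumptions modeled on AudioCraft’s existing MusicGen interface, not a published API.

```python
# Hypothetical sketch of JASCO-style conditioned text-to-music generation,
# modeled on the MusicGen interface already in AudioCraft. The class name,
# checkpoint, and keyword arguments below are assumptions for illustration.
from audiocraft.models import JASCO  # assumed import path

model = JASCO.get_pretrained("facebook/jasco")  # assumed checkpoint name
model.set_generation_params(duration=8)         # assumed, mirrors MusicGen

# Symbolic conditioning described in the paper: a chord progression with
# start times (in seconds) that the generated track should follow.
chords = [("C", 0.0), ("Am", 2.0), ("F", 4.0), ("G", 6.0)]

wav = model.generate(
    descriptions=["warm lo-fi piano track"],  # text prompt steers the overall style
    chords=chords,                            # assumed kwarg for chord conditioning
)
```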
The company will also launch AudioSeal, which adds watermarks to AI-generated speech; it is one of Meta’s tools for identifying content made with AI.
“We believe [AudioSeal] is the first audio watermarking technique designed specifically for the localized detection of AI-generated speech, making it possible to pinpoint AI-generated segments within a longer audio snippet,” the company said in a press release.
AudioSeal can pinpoint AI-generated segments within a longer sound clip. Meta said this localized approach “allows for faster and more efficient detection,” speeding up detection by as much as 485 times. Unlike the other models in this release, AudioSeal will be released under a commercial license.
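As a rough sketch of that watermark-and-detect workflow, the example below is based on the interface documented in the open-source audioseal repository; the model card names and call signatures are assumptions drawn from that code and may differ from the final release.

```python
# Sketch of embedding and detecting an audio watermark with AudioSeal, based on
# the interface documented in the open-source audioseal repository. Model card
# names and call signatures are assumptions drawn from that code.
import torch
from audioseal import AudioSeal

generator = AudioSeal.load_generator("audioseal_wm_16bits")
detector = AudioSeal.load_detector("audioseal_detector_16bits")

# A five-second mono clip at 16 kHz, shaped (batch, channels, samples);
# in practice this would be the AI-generated speech to be marked.
audio = torch.randn(1, 1, 16000 * 5)
sample_rate = 16000

# Embed an imperceptible watermark into the waveform.
watermark = generator.get_watermark(audio, sample_rate)
watermarked = audio + watermark

# Detection returns a score for the clip; the detector's frame-level outputs
# are what make it possible to localize watermarked segments in longer audio.
score, message = detector.detect_watermark(watermarked, sample_rate)
print(score)
```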
Meta also hopes researchers will build on its other new models for working with images and text.
Chameleon model released to the public
FAIR will also release two sizes of its multimodal text model Chameleon to the public under a research-only license.
Chameleon 7B and 34B can handle tasks that require both visual and textual understanding, such as image captioning.
However, in a press release, Meta said that it will not release the Chameleon image generation model “at this time,” and only the text-related models will be accessible.
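For a sense of how researchers might use the released text-oriented checkpoints, here is a minimal captioning sketch. It assumes the weights can be loaded through a Hugging Face Transformers-style interface; the class names, checkpoint identifier, and prompt format are assumptions, not details confirmed by Meta’s release.

```python
# Illustrative sketch of image captioning with Chameleon 7B. The class names,
# checkpoint id, and "<image>" prompt token are assumptions about how the
# research release might be wrapped; adjust to the actual distribution.
from PIL import Image
from transformers import ChameleonProcessor, ChameleonForConditionalGeneration

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b")

image = Image.open("photo.jpg")
prompt = "Describe this image.<image>"  # interleaved text-and-image input

inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```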
The company will also give researchers access to its multi-token prediction approach, which trains language models to predict multiple future words at once rather than one at a time. It will be available only under a non-commercial, research-only license.
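The core idea is straightforward to sketch: a shared trunk feeds several output heads, and head k is trained to predict the token k steps ahead. The toy PyTorch example below illustrates that training objective; the architecture, sizes, and names are illustrative assumptions, not Meta’s actual implementation.

```python
# Toy sketch of multi-token prediction: one shared trunk, n_future heads, and
# head k predicts the token k steps ahead. Sizes and layout are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_model, n_future, seq_len = 1000, 128, 4, 64

embed = nn.Embedding(vocab, d_model)
trunk = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(n_future))

tokens = torch.randint(0, vocab, (2, seq_len))                    # (batch, seq)
causal = nn.Transformer.generate_square_subsequent_mask(seq_len)  # no peeking ahead
hidden = trunk(embed(tokens), src_mask=causal)                    # shared representation

loss = 0.0
for k, head in enumerate(heads, start=1):
    logits = head(hidden[:, :-k])   # positions that still have a target k steps ahead
    targets = tokens[:, k:]         # the token k steps in the future
    loss = loss + F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
(loss / n_future).backward()
```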