Stability AI is opening up its generative AI efforts for audio today with the release of Stable Audio Open 1.0.
Stability AI is perhaps best known for its Stable Diffusion text-to-image generation technology, but that’s only one part of the company’s broader portfolio, which includes multiple models for code, text and audio. In Sept. 2023, Stability AI first publicly launched Stable Audio as a text-to-audio generative AI tool. Stable Audio 2.0 was released on April 3, bringing more clarity and length to the generated audio.
While the full Stable Audio tool is available for general commercial use and can generate audio up to three minutes long, the new Stable Audio Open is significantly more limited. Stable Audio Open is not intended for creating full songs; instead, it focuses on shorter pieces such as sound effects.
As the name implies, Stable Audio Open is an open model, though it’s not technically open source. Rather than an Open Source Initiative (OSI)-approved license, Stable Audio Open is available under the Stability AI non-commercial research community license agreement. That license provides open access to the model but limits what users can do with it.
“Our goal with Stable Audio Open is to provide audio researchers and producers with hands-on access to one of our generative audio models in order to accelerate research, adoption, and practical creative use of these incredible new tools,” Zach Evans, head of audio research at Stability AI, told VentureBeat.
What exactly is Stable Audio Open?
Stable Audio Open is a specialized model optimized for creating things like drum beats, instrument riffs, ambient sounds and other audio samples for music production and sound design.
Unlike Stability AI’s commercial Stable Audio product, which produces longer, coherent musical tracks of up to three minutes, Stable Audio Open focuses on generating high-quality audio clips of up to 47 seconds from text prompts.
Stability AI also took a responsible approach to training: the model was trained on audio data from Freesound and the Free Music Archive, ensuring that no copyrighted or proprietary material was used without permission.
Unleashing creativity with fine-tuning on Stable Audio Open
One of the key benefits of the Stable Audio Open release is that users can fine-tune the model on their own custom audio data. For instance, a drummer could fine-tune the model on samples of their own drum recordings to generate new, unique beats.
Fine-tuning Stable Audio Open is enabled by the Stable Audio Tools library, which is released under an actual open-source license. The Stable Audio Open model weights are available now on Hugging Face.
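For readers experimenting with the released weights, here is a minimal sketch of how the 47-second limit might be enforced when preparing text prompts. It assumes the `prompt`, `seconds_start` and `seconds_total` conditioning keys used in the example on the model's Hugging Face page; `build_conditioning` and `MAX_SECONDS` are hypothetical names, and the code does not call the Stable Audio Tools library itself.

```python
import math
from typing import TypedDict

# Maximum clip length reported for Stable Audio Open.
MAX_SECONDS = 47


class Conditioning(TypedDict):
    """Text-and-timing conditioning entry (key names assumed, not verified against every library version)."""
    prompt: str
    seconds_start: int
    seconds_total: int


def build_conditioning(prompt: str, seconds: float) -> Conditioning:
    """Hypothetical helper: pair a text prompt with a clip length capped at MAX_SECONDS."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Round up to whole seconds, then cap at the model's stated maximum.
    total = min(math.ceil(seconds), MAX_SECONDS)
    return {"prompt": prompt, "seconds_start": 0, "seconds_total": total}


if __name__ == "__main__":
    requests = [
        ("128 BPM tech house drum loop", 30),
        ("rain on a tin roof, ambient", 90),  # longer than the cap, so it is clamped
    ]
    for text, secs in requests:
        print(build_conditioning(text, secs))
```

A list of such dictionaries would then be handed to the library's generation routine; check the current Stable Audio Tools documentation for its exact function names and parameters.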
“The audio research team is constantly working on ways to improve the quality and controllability of our generative audio models,” Evans said. “We look forward to further commercial and open model releases that reflect the progress made by our research.”
Author: Sean Michael Kerner
Source: VentureBeat
Reviewed By: Editorial Team