Resemble AI is launching Rapid Voice Cloning, a new feature of its platform that significantly speeds up the process of generating voice clones. The company operates in the AI voice space with a focus on enterprise users.
Available today, Rapid Voice Cloning can clone a voice from a relatively short audio sample and produce output in about a minute. Resemble says the move marks a significant development that will make voice cloning more accessible and let more users create custom voices for their applications. The company expects it to have an impact across fields such as content creation, personalization and accessibility.
Resemble published multiple voice clone samples showcasing the prowess of the new technology. VentureBeat also tested the feature to see how it really works.
How does the new AI voice cloning feature work?
On Resemble’s web platform, users can create a digital replica of their voice by uploading an audio sample or recording a series of sentences. The company has offered this feature for a while, but the process was slow. Users had to record around 25 sentences or upload at least three minutes of voice content, and the system then took another hour or so to produce a clone.
With Rapid Voice Cloning, getting started is much easier. Users only need to provide a clear audio sample of the target voice lasting anywhere from 10 seconds to 1 minute. According to the company, the underlying model captures the voice’s characteristics, including accent, from the sample and returns a clone ready for downstream use cases in about a minute.
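Resemble’s announcement does not describe its upload interface, so the sketch below only illustrates the input requirement just described: a hypothetical client-side check, assuming the sample is saved as an uncompressed WAV file, that a clip falls inside the 10-second to 1-minute window before it is submitted. The function names and constants are our own, not part of any Resemble API, and the code relies only on Python’s standard `wave` module.

```python
import wave

# Hypothetical pre-upload check (not part of any Resemble API). The bounds
# mirror the 10-second to 1-minute input window reported for Rapid Voice Cloning.
MIN_SECONDS = 10.0
MAX_SECONDS = 60.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_rapid_clone_window(path: str) -> bool:
    """True if the clip lasts between 10 seconds and 1 minute, inclusive."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS


if __name__ == "__main__":
    import sys

    # Report the length of each WAV file passed on the command line.
    for clip in sys.argv[1:]:
        seconds = wav_duration_seconds(clip)
        status = "ok" if fits_rapid_clone_window(clip) else "outside 10-60 s window"
        print(f"{clip}: {seconds:.1f} s ({status})")
```

For example, `python check_clip.py sample.wav` (both file names hypothetical) prints each clip’s length and whether it fits the window. Compressed formats such as MP3 would need a separate decoder, since `wave` reads only WAV files.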
“While other state-of-the-art models often struggle to replicate the nuances and subtleties of different accents, Resemble AI’s advanced machine learning algorithms excel in this area. By analyzing and learning from just a 10-second voice sample, our Rapid Voice Cloning can create an AI-generated voice that faithfully mimics the unique intonations, pronunciations, and cadences of the original speaker’s accent,” the company noted in a blog post announcing the feature.
The company published several samples comparing its offering with Microsoft’s VALL-E and Coqui’s XTTS-v2 voice cloning models, complete with the input voice sample and the text used for each clone. The results were impressive. However, when we created a free test account to see how the technology performs in practice, there were some clear gaps.
In our tests, the system required recording at least three long sentences, with no option to record a shorter 10-second sample. Processing was quick, but the system did not recognize the speaker’s Indian accent and treated the input as American English by default, which affected the accent of the output voice. This may be addressed over time, since the company says Rapid Voice Cloning will support most English accents.
Notably, the company will continue to offer the original cloning feature under the name Professional Voice Cloning. This option has longer input requirements and takes more time, but it supports all English accents and works for both text-to-speech and speech-to-speech use cases. Rapid cloning supports only text-to-speech generation.
Use across different categories
Given Rapid Voice Cloning’s speed and dramatically reduced sample requirements, Resemble AI expects more users to adopt the technology and to iterate and deploy faster. The company expects the biggest adoption to come from content creators, who may use it to generate voiceovers, dubbing, narration and dialogue for podcasts, videos, audiobooks or e-learning materials. It also says businesses can use the technology to build better accessibility and personalization experiences.
“For example, a fitness app could use Rapid Voice Cloning to create a personalized AI coach that speaks to each user in a familiar voice, providing encouragement and guidance. Similarly, a virtual assistant could adapt its voice to match the user’s preferences, creating a more intimate and tailored interaction,” the company stated.
While it remains to be seen how the technology will be adopted, Resemble is not the only company cutting down the time needed to generate voice clones. ElevenLabs, another major player in the category, offers a feature called Instant Voice Cloning that needs at least a minute of clear audio to generate a clone almost instantly. Like Resemble, ElevenLabs offers a professional version of the tool, which covers more languages and accents.
Currently, Resemble AI lets users create one voice clone for free. Additional clones require a paid plan, with prices ranging from $29 to $499 per month. The company also offers a pay-as-you-go personal plan and an enterprise plan with custom pricing.
Author: Shubham Sharma
Source: Venturebeat
Reviewed By: Editorial Team