Can You Mimic Any Voice? You Can With Lyrebird AI Software
Recreating a voice is no longer difficult now that specialized software is widely accessible. All you really need is a laptop and an internet connection. With an app like Lyrebird AI from the Canadian start-up Descript, it’s actually very easy for anyone who’s interested to pull off.
So yes, it is possible to mimic any voice, as long as you have the right (software) tools.
While the world of deepfake audio is still in its relatively early stages, there are already some hyper-realistic tools available. One of the most influential at the moment is the aforementioned Lyrebird AI, named after the famous Australian bird that is highly skilled at mimicking sounds from its immediate surroundings, including human voices.
This article will explore what it takes to mimic a voice and the core concepts behind audio deepfakes and voice cloning, and it will take a deeper look at the application of Lyrebird AI. You can also start experimenting with the software tool yourself, since it’s downloadable from the official website of the developer. Trying out Lyrebird AI is free, but if you want to use it longer, a monthly fee will apply. Let’s explore the world of deepfake audio and check out what this tool is really capable of.
The Rise Of Audio Deepfakes
Most people associate deepfake technology with videos and images, since those are the media most often featured when people paint dark pictures of a future filled with deepfake-altered content.
However, there is another emerging branch of deepfakes that is gaining popularity just as quickly, mainly because of its wide range of potential applications across many different industries. We are talking about deepfake audio, in which Artificial Intelligence (AI) and machine learning algorithms are used to make anyone say anything, at any time.
That sounds as cool as it is frightening, because it is exactly that. With a simple application like Lyrebird AI, it is now easier than ever to mimic a voice as precisely as possible. Record a bit of audio, and the software will extrapolate all the sounds and intonations it needs to make that person say pretty much anything you type into the app. It’s an impressive piece of technology that says a lot about the future we are heading toward: a future where voice cloning will be as normal as any other app on your smartphone.
What Is The Concept Of Voice Cloning?
The concept of voice cloning is not new – speech synthesis has been an important technology for many decades. The artificial production of human speech is often done through the use of Text-To-Speech (TTS) systems, which have been drastically improving in quality over time.
The latest improvement in speech synthesis uses AI and machine learning to produce near-realistic (but computer-generated) voices. Voice cloning is a subset of these AI-generated outputs, and the concept is also known as ‘voice doubling’.
Voice cloning essentially takes an audio file of an individual voice and uses it as source material for creating eerily similar AI-generated audio recordings of that same voice. With just several hours of source material (audio recordings of an individual voice), a deepfake application like Lyrebird AI is capable of cloning the voice, allowing it to be used for the creation of other deepfake audio outputs.
In essence, all voice cloning does is take audio file A and extract the intonations, emotions, and other subtle nuances from that voice. It then uses a set of algorithmic rules to form a completely new set of words or sentences, never spoken by the person in the source material. For example, say the audio file tells a story about why rabbits are cute. The AI-generated voice clone can then tell any other type of story in that same voice: it could talk about the latest political news, the weather, or anything else that is used as TTS input.
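The data flow described above can be sketched in a few lines of Python. This is only a toy illustration, not how Lyrebird AI or any real voice cloning system works internally: real tools learn a voice with neural networks, while this sketch reduces a "voice" to two simple numbers. All names here (`Recording`, `VoiceProfile`, `extract_profile`, `plan_utterance`) are hypothetical and made up for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Recording:
    """A source recording: what was said, how long it took, and its average pitch."""
    transcript: str
    duration_s: float
    mean_pitch_hz: float


@dataclass(frozen=True)
class VoiceProfile:
    """Toy stand-in for the characteristics a cloning tool extracts from a voice."""
    pitch_hz: float
    words_per_second: float


def extract_profile(recordings):
    """Keep only how the speaker sounds; the words of the source are discarded."""
    total_words = sum(len(r.transcript.split()) for r in recordings)
    total_seconds = sum(r.duration_s for r in recordings)
    return VoiceProfile(
        pitch_hz=mean(r.mean_pitch_hz for r in recordings),
        words_per_second=total_words / total_seconds,
    )


def plan_utterance(profile, text):
    """Combine brand-new text with the stored profile (assumed TTS input format)."""
    word_count = len(text.split())
    return {
        "text": text,
        "pitch_hz": profile.pitch_hz,
        "duration_s": word_count / profile.words_per_second,
    }


if __name__ == "__main__":
    source = [Recording("rabbits are cute because they are fluffy", 3.5, 210.0)]
    profile = extract_profile(source)
    print(plan_utterance(profile, "the weather is nice today"))
```

The point of the sketch is the separation of concerns: the rabbit story only contributes the profile, and the new sentence about the weather comes entirely from the typed text.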
Voice cloning is a method to make anyone say anything, at any time, with a minimal need for source audio and a few lines of code. It’s no wonder that deepfake audio software is regarded as a potentially revolutionary and highly dangerous tool.
As of the time of writing, the most anticipated and widely used example of deepfake audio software is Descript’s application Lyrebird AI. It allows anyone to mimic any voice, using a very simple software TTS tool and a short voice recording (or an existing audio file).
About Lyrebird AI Software
Developed by a small Montreal-based start-up, Lyrebird AI has quickly climbed the ranks within the world of deepfake audio applications. The first tech demo came out in early 2017, and since then the program has only gotten better and more accurate at mimicking human voices. Not only that, it is also, impressively enough, capable of shifting emotional cadence while generating brand new sentences.
And all you really need is a small piece of audio as source material. A few minutes is more than enough (before, this used to take at least half a day). According to Lyrebird’s co-founder Jose Sotelo, the app’s learning model was “built to determine the factors that make every voice unique.” The start-up Descript explained in an interview with Wired that they like to call this the DNA of a voice. Every sample is analyzed to pick out its unique features.
Once you provide the raw data input (the audio of your voice), the algorithm gets to work behind the scenes. It takes a few minutes for the data to be processed, after which your digitized voice can play back anything you type into the application.
The TTS functionality is really the core of Lyrebird, since it’s where the user determines what the voice will say. In the blink of an eye, you can make yourself (or any other person whose voice you put into the software) say anything. It doesn’t even need to be remotely close to the topic you talked about in the input audio. The source material is only used to extract the key characteristics (the ‘fingerprint’) and voice nuances that are unique to each individual.
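To get a feel for why the topic of the input audio doesn’t matter, here is a minimal sketch of a voice ‘fingerprint’ in plain Python. It is an assumption-heavy toy: the "voices" are synthetic sine waves, and the fingerprint is just an estimated pitch plus loudness, whereas a real tool would extract far richer features. The function names (`synth_voice`, `fingerprint`, `distance`) are hypothetical and not part of Lyrebird AI.

```python
import math

SAMPLE_RATE = 8000  # samples per second, chosen arbitrarily for this toy


def synth_voice(pitch_hz, duration_s, loudness):
    """Generate a sine wave as a stand-in for a recorded voice."""
    n = int(SAMPLE_RATE * duration_s)
    return [loudness * math.sin(2 * math.pi * pitch_hz * i / SAMPLE_RATE)
            for i in range(n)]


def fingerprint(samples):
    """Return (estimated pitch in Hz, RMS loudness) for a list of samples."""
    # A pure tone crosses zero twice per cycle, so crossings / 2 per second ~ pitch.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    duration_s = len(samples) / SAMPLE_RATE
    pitch_hz = crossings / (2 * duration_s)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return (pitch_hz, rms)


def distance(fp_a, fp_b):
    """Largest relative difference across the fingerprint features."""
    return max(abs(a - b) / max(a, b) for a, b in zip(fp_a, fp_b))


if __name__ == "__main__":
    # Same "speaker", two recordings of different length (different content).
    alice_short = fingerprint(synth_voice(220, 0.5, 0.8))
    alice_long = fingerprint(synth_voice(220, 2.0, 0.8))
    # A different "speaker" with a lower voice.
    bob = fingerprint(synth_voice(110, 1.0, 0.8))
    print("alice vs alice:", distance(alice_short, alice_long))
    print("alice vs bob:  ", distance(alice_short, bob))
```

In this toy setup, two recordings of the same speaker end up with nearly identical fingerprints regardless of their length, while a different speaker lands far away. That is the intuition behind using the source audio only for its characteristics and not for its content.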
In another popular interview, a journalist from Bloomberg touches upon many of the core functionalities, as well as the critiques and risks associated with this interesting new application. The interview also features a real-world test of the accuracy of the final product, in which the mimicked voices are tried out on relatives of the interviewer. On the surface, mimicking a voice seems like harmless fun, but the real implications and ethics of these tools are risky, to say the least. Check out the Bloomberg interview with the founders and core developers in the YouTube video below:
Try To Mimic Any Voice With Lyrebird AI
If you want to try the software out for yourself, you can find more information on the official webpage for Lyrebird AI on Descript’s website. That’s also where you can download the tool and experiment with it more. There are three pricing plans for the tool, starting with a (forever) free plan that covers up to 3 hours of speech.
Producers can start an annual plan for $10 USD a month that provides access to full audio and video editing tools. This also includes transcriptions. There is, however, a limit of 10 hours of transcription recordings for each individual producer account, so you might want to upgrade to a custom plan when you need more TTS production hours per month. Another option is to simply register multiple producer accounts, or go for the team plan that is more convenient for larger businesses.
For teams of multiple people, there is a more extensive professional plan for $15 USD a month for each user. This also includes extra software features. Do remember that when you choose monthly billing, the prices might differ from the annual plans.
Also note that these prices are likely to change over time, so check the official pricing page for up-to-date pricing plans. For all plans there is no need to enter any credit card details, which is always a nice thing to see from a regular user’s perspective.
We highly encourage deepfake enthusiasts to try out the free plan and see where we are with the progression of deepfake audio creation. The results that can be generated with such a user-friendly application are truly unprecedented, and it just goes to show how easy and accessible it is for anyone to start using deepfake audio and voice cloning for themselves. The tricky question, however, is what purpose people will use these types of tools for.
While the vast majority of people will start mimicking voices for fun or a media-related project, there is a small minority with bad intentions that will use tools like these for purposes such as voice phishing (“vishing”). That creates enough of a risk for these types of AI software tools to potentially become banned in the future, once governments start to regulate the industry more. With adoption comes stricter regulation, so do enjoy your copy of this Lyrebird AI voice software while it lasts and is still ‘legalized’.