According to Meta, Voicebox can take an audio sample as short as two seconds and match its style for text-to-speech generation. If true, that makes it more sophisticated than other speech synthesis tools like Speechify or ElevenLabs, which normally require considerably more data before they can generate a quality synthetic voice.

In Meta’s promotional clip, one of the voices being modified sounds uncannily like Zuckerberg himself. However capable the model truly is, hearing Zuck brings to mind some of the deepfakes modeled after the Meta CEO.

Unlike many of the company’s other recent AI releases, Voicebox isn’t going open source at its debut, which suggests Meta could be restricting the release because of the potential for harm. While some folks online have used similar programs to craft synthesized voice clips of their favorite characters for fun, others have used them in harassment campaigns against the voice actors themselves. So Meta could be trying to prevent harm, or it could be saving this potentially lucrative model for some future enterprise.

According to the Voicebox research paper, the system was trained on more than 50,000 hours of unfiltered, unenhanced speech from English audiobooks and another 60,000 hours of multilingual audiobooks. That’s why, in Meta’s video, the synthetic speech sounds less conversational and more like somebody reading a child a bedtime story. The researchers said they would eventually scale the model to include more casual speech.

The model is also limited in that users cannot separately control which voice the AI mimics and the emotional tone it borrows from a different speech sample.

But what is most concerning is that Meta doesn’t seem to address the elephant in the room in its latest paper. The researchers did not say which audiobooks were used to train the AI or where they came from. It’s also unclear how many individual titles those tens of thousands of hours represent, though the figure could run into the thousands.

Gizmodo reached out to Meta for more information about which audiobooks were used in the training data. A Meta spokesperson said they were “public domain” audiobooks, though the company declined to say where it obtained them.

Voice actors have not been happy with the proliferation of AI, and are especially concerned about contracts that allow companies to synthesize their voices without compensation. Apple has already taken heat for quietly launching a series of books narrated by AI-generated voices. The tech giant has reportedly approached several major audiobook publishers to create these new AI-narrated stories.

Considering that audiobook revenue has been growing by double digits year after year, and that creative industries are salivating at the prospect of reducing labor costs, this latest model could prove yet another headache for voice professionals.



