
A Year On: OpenAI Has Yet to Unveil Its Voice Cloning Technology

by admin

In late March 2024, OpenAI unveiled a “small-scale preview” of Voice Engine, an AI service that the company claimed could replicate a person’s voice from just 15 seconds of audio. Nearly a year later, the tool remains in preview, with no public launch date and no assurance that it will ever be released.

OpenAI’s reluctance to roll out the service widely may point to concerns about misuse, but it may also reflect an effort to avoid attracting regulatory scrutiny. The company has been criticized in the past for prioritizing “shiny products” over safety and for rushing releases to beat competitors to market.

An OpenAI spokesperson told TechCrunch that the company is still testing Voice Engine with a select group of “trusted partners.”

“We’re learning from how [our partners are] utilizing the technology to enhance its effectiveness and safety,” the spokesperson said. “It’s exciting to witness the various applications, from speech therapy and language learning to customer support and AI avatars in video games.”

Delayed Release

Voice Engine, which underpins the voices offered through OpenAI’s text-to-speech API and powers ChatGPT’s Voice Mode, produces natural-sounding speech that closely mimics the original speaker. It converts written text into speech, subject to OpenAI’s content guidelines. Its release, however, has been delayed repeatedly, and it has consistently missed its projected launch dates.

As OpenAI explained in a June 2024 blog post, the Voice Engine model predicts the most likely sounds a speaker will make for a given text, taking into account different voices, accents, and speaking styles. This lets the model generate not only spoken versions of text but also “spoken utterances” that reflect how different kinds of speakers would say it.

OpenAI originally planned to launch Voice Engine, then called Custom Voices, on March 7, 2024, according to an early draft blog post seen by TechCrunch. The plan was to give up to 100 “trusted developers” access before a broader rollout, prioritizing those building applications with a “social benefit” or demonstrating “innovative and responsible” uses of the technology. OpenAI had even trademarked Voice Engine and set pricing: $15 per million characters for “standard” voices and $30 per million characters for “HD quality” voices.

At the last moment, however, OpenAI postponed the announcement. It ultimately introduced Voice Engine a few weeks later without a sign-up option, limiting access to a group of about 10 developers who, according to OpenAI, had begun working with the company in late 2023.

“We aspire to initiate a conversation regarding the responsible use of synthetic voices and how society can adapt to these emerging capabilities,” OpenAI wrote in its late-March 2024 blog post announcing Voice Engine. “Based on these discussions and the feedback from our limited testing, we will make a more informed decision about whether and how to scale this technology.”

In Development

OpenAI has been developing Voice Engine since 2022. The company claims to have demonstrated the tool to “global policymakers at the highest levels” in the summer of 2023 to highlight both its potential and the associated risks.

Several partners currently use Voice Engine, including Livox, a startup building devices that help people with disabilities communicate more naturally. Livox CEO Carlos Pereira told TechCrunch that the company ultimately struggled to integrate Voice Engine into its products because the technology requires an internet connection, which many of Livox’s clients lack. Even so, he called the technology “incredibly impressive.”

“The voice quality and the ability to generate voices in multiple languages is unparalleled—especially for our clients with disabilities,” Pereira said via email. “It is truly the most impressive and user-friendly [tool for] creating voices that I’ve encountered […] We hope that OpenAI develops an offline version soon.”

Pereira said Livox has not received any guidance from OpenAI on a possible launch timeline for Voice Engine, nor any sign that the company plans to start charging for it. So far, Livox has not had to pay for its usage.

In that June 2024 blog post, OpenAI suggested that one reason for Voice Engine’s continued delay was the potential for misuse during the 2024 U.S. election cycle. After consulting with stakeholders, OpenAI built several safety measures into Voice Engine, including watermarking to trace the origin of generated audio.

According to OpenAI, developers must obtain “explicit consent” from the original speaker before using Voice Engine, and they should clearly disclose to their audience that the voices are AI-generated. However, the company has not said how it plans to enforce these rules, and policing compliance at scale could prove difficult even for an organization with OpenAI’s resources.

OpenAI has also floated plans for a “voice authentication system” to verify speakers and a “no-go” list to block the generation of voices that closely resemble those of prominent individuals. Both projects are technically ambitious, and mishandling either could hurt a company that has already faced criticism over its safety practices.

Effective filtering and identity verification are fast becoming prerequisites for any responsible release of voice cloning technology. According to one source, AI voice cloning was the third fastest-growing scam of 2024. The technology has enabled fraud and been used to bypass bank security checks, while privacy and copyright laws struggle to keep pace. Malicious actors have used voice cloning to create incendiary deepfakes of celebrities and politicians, some of which have spread rapidly across social media.

OpenAI might launch Voice Engine next week, or it may never launch it at all. The company has repeatedly signaled that it may keep the service limited in scope. Whatever the reason, whether public perception or safety concerns, Voice Engine’s preview has become one of the longest in OpenAI’s history.

Compiled by Techarena.au.