Postponing the rollout of Voice Engine tech is intended to reduce the risk of election-year misinformation.
OpenAI’s latest tool, which can create a lifelike voice clone from 15 seconds of audio, has been deemed too risky for widespread release as the lab aims to limit misinformation during a crucial year of elections around the world.
Voice Engine was first developed in 2022, and an early version powers the text-to-speech feature in ChatGPT. However, its full capabilities have not been publicly disclosed, partly because of OpenAI’s cautious approach to a wider release.
OpenAI stated in an unsigned blog post, “We aim to initiate discussions on the responsible implementation of synthetic voices and how society can adjust to these advancements. Through these discussions and outcomes from small-scale trials, we will make informed decisions regarding the potential deployment of this technology on a larger scale.”
The company’s post highlighted real-world uses of the technology by various partners that have been granted access to integrate it into their apps and products.
Age of Learning, an education technology firm, uses it to generate scripted voiceovers. HeyGen, an “AI visual storytelling” app, lets users generate translated content that keeps the original speaker’s accent and voice. For instance, English speech generated from a French speaker’s audio sample retains a French accent.
Researchers at the Norman Prince Neurosciences Institute in Rhode Island also used a low-quality 15-second clip of a young woman giving a presentation to “restore her voice”, which she had lost because of a vascular brain tumor.
“We have chosen to provide a preview rather than widely release this technology at present,” OpenAI stated, saying the aim was to strengthen societal resilience against the challenges posed by increasingly convincing generative models. It also encouraged measures such as phasing out voice-based authentication for access to sensitive information like bank accounts.
OpenAI also advocated for the exploration of “policies safeguarding individuals’ voice usage in AI” and “raising public awareness regarding the capabilities and limitations of AI technologies, including potential deceptive content.”
OpenAI stated that Voice Engine generations are watermarked, enabling the organization to trace the origin of any generated audio. For now, it added, “our agreements with these partners necessitate explicit and informed consent from the original speaker, and we do not permit developers to create avenues for individual users to generate their own voices.”
Although OpenAI’s tool stands out for its technical simplicity and the small amount of original audio needed to generate a convincing clone, competing alternatives are already available to the general public.