More advanced, spoken conversations are coming to ChatGPT
Far from the kind of robotic voice that people have come to associate with digital assistants like Alexa or Siri, the ChatGPT advanced voice mode sounds remarkably lifelike. It responds in real time, can adjust to being interrupted, can make giggling noises when a user makes a joke, and can judge a speaker’s emotional state based on their tone of voice. (During the initial demo, it also sounded suspiciously like Scarlett Johansson.)
Starting on Tuesday, advanced voice mode — which works with the most powerful version of the chatbot, ChatGPT-4o — will begin rolling out to a small group of subscribers to the app’s paid “Plus” tier, with the aim of making it available to all Plus users in the fall.
ChatGPT does have a less sophisticated voice mode already. But the rollout of a more advanced voice mode could mark a major turning point for OpenAI, transforming what was already a significant AI chatbot into something more akin to a virtual personal assistant with which users can hold natural, spoken conversations in much the same way that they would chat with a friend. The ease of conversing with ChatGPT’s advanced voice mode could encourage users to engage with the tool more often, and pose a challenge to virtual assistant incumbents like Apple and Amazon.
But introducing a more advanced voice mode for ChatGPT also comes with big questions: Will the tool reliably understand what users are trying to say, even if they have speech differences? And will users be more inclined to blindly trust a human-sounding AI assistant, even when it gets things wrong?
OpenAI initially said it had planned to begin the advanced voice mode rollout in June, but said it needed “one more month to reach our bar to launch” to test the tool’s safety and ensure it can be used by millions of people while still maintaining real-time responses.
The company said that in recent months it has trialed the AI model’s voice capabilities with more than 100 testers seeking to identify potential weaknesses, “who collectively speak a total of 45 different languages, and represent 29 different geographies,” according to a Tuesday statement.
Among its safety measures, the company said voice mode won’t be able to use any voices beyond four, pre-set options that it created in collaboration with voice actors — to avoid impersonation — and will also block certain requests that aim to generate music or other copyrighted audio. OpenAI says the tool will also have the same protections as ChatGPT’s text mode to prevent it from generating illegal or “harmful” content.
Advanced voice mode will also have one major difference from the demo that OpenAI showed in May: users will no longer be able to access the voice that many (including the actor herself) believed sounded like Johansson. While OpenAI has maintained the voice was never intended to sound like Johansson and was created with the help of a different actor, it paused use of the voice “out of respect” after the actor complained.
The launch of ChatGPT’s advanced voice mode comes after OpenAI last week announced it was testing a search engine that uses its AI technology, as the company continues to grow its portfolio of consumer-facing AI tools. The OpenAI search engine could eventually pose a major competitive threat to Google’s dominance in online search.