2024-05-24: Scarlett Johansson And What Really Happened

So OpenAI may have used Scarlett Johansson’s voice for ChatGPT’s Sky voice setting.

So much digital ink has been spilled and will be spilled on this topic. But I felt I must write a coda.

First, the voices themselves.

Second, polling results.


On the one hand

> OpenAI created the voices using voice actors who were well informed and fairly compensated, including with future royalties

> there are five voices, of which Sky was only one

> some people do not find the voices similar

On the other hand

> the product was modeled on the movie Her

> ScarJo was asked to be the voice but turned it down

> OpenAI employees mentioned the movie on Twitter

> Reddit and the public have mistaken the voice for ScarJo’s

> there will definitely be emails/Slack showing references to ScarJo during the long product development cycle

> US case law is on the side of the celeb over the impersonator

On the third hand

> OpenAI docs show concern over voice output rollout

> the voice output likely can imitate any voice with a short input snippet

> they have demo’ed the same capability for images already: a single input image is enough to generate images with character consistency

> voice is likely to work the same way, since the model is multimodal in and out

> a single short voice sample is enough to spoof any voice

It’s the Napster moment for voice tech.

> You can block it off for a year, but very soon any phone will be able to spoof any voice on command.

> If you can tell the AI model “use Morgan Freeman’s voice on helium” and it does and that’s part of the base capability of an open source model released next year, what do you do?

> open source models with this tech are 😂 free speech

> do you tell the phone manufacturer that this software cannot run on the phone?

> do you sue developers who access this capability on the model?

> do you sue the individual users?

> do you segment human voices? Are there 8 billion unique voices in the world so that there is no collision?

> do you award the voice to the person with more money? With a higher number of social media followers?

> perhaps there can only be 10,000 voice owners in the world and they all have to be members of SAG-AFTRA, and no device should ever be able to generate any other voice without paying one of the owners.


ChatGPT-4o can do voices as easily as it maintains character consistency in images.

This quick voice cloning ability seems to come from a product called Voice Engine, demonstrated on May 23 by OpenAI in Paris, of all places.

So there you have it: 24 months until open-source emotive voice cloning is widely available.

Regardless of what happens with Scarlett Johansson, I’ve got to think the alternative for OpenAI would have been to say: “You can choose any voice you want for ChatGPT by playing any sample voice into it.” Would that have worked out any better?

The reason I keep saying it’s a Napster moment is that it’s clear now that the ease of use of this tech, which was formerly gated behind years of monastic programming experience, means that everyone will be able to do it. And as with the prohibition of so many other things in public life, the legal system, which really requires the consent of the governed to function properly, will not be able to keep up with such rapid change.

