
The dangers of voice deepfakes in the November election

The growth of generative AI has fueled audio cloning technology that could affect the U.S. election, and recent incidents show that existing safeguards are not effective.

More than 130 million people viewed a video featuring a deepfake audio recording of Kamala Harris after X owner Elon Musk reposted it on his social media platform.

While the original creator labeled the video as a parody on YouTube, Musk failed to add that label, which might have led some viewers to believe the audio was authentic rather than a voice clone.

The incident shows how dangerous audio deepfakes could be in the lead-up to the U.S. presidential and local elections in November.

"What this particular incident highlights is that the AI safety standards that are being proposed are not sufficient to protect election integrity," said Rahul Sood, chief product officer at Pindrop Security, an IT security company.

The growth of voice cloning and deepfakes

Audio deepfakes have grown exponentially in the past few years, especially with the rise of generative AI and large language models (LLMs).

They have also already affected several elections worldwide. In Slovakia, for example, a fake audio recording of a top candidate claiming he had rigged the election spread widely.

"The big step function or inflection point really happened with the new class of LLMs and GenAI that we're all experiencing in the world of tech," Sood said.

Generative AI essentially created the synthetic voice and synthetic audio market, which has become prevalent and now lets users create cloned voice audio for as little as $1 a month.

While synthetic audio was easier to detect a few years ago, the rapid advance of voice cloning technology has made it easy for anyone to create a convincing deepfake online.

"Synthetically generated audio has crossed what is called the uncanny valley," Sood said. "Uncanny valley is a term of news and research to basically make the point that it is hard now for a human to be confident whether a piece of audio, video or machine is human or not."

Audio deepfake technology is becoming more prevalent and likely to threaten the integrity of the 2024 election.

Examples of deepfakes during the election

That uncanny valley effect could influence the U.S. election.

For example, in January, a deepfake audio recording of President Joe Biden was used in a robocall that encouraged New Hampshire voters not to vote in the state's primary.

The technology used to create the January deepfake came from AI voice cloning vendor ElevenLabs.

However, the voice cloning technology used to create the deepfake of Vice President Harris came from open source text-to-speech system TorToise AI, according to Pindrop.

Pindrop also discovered through its analysis that not all the audio in the video of Harris was fake.

"This is what we call a partial deepfake," Sood said.

Harris' deepfake audio shows the weakness of existing safeguards, Sood continued.

First, most voice cloning platforms require the consent of the person whose voice is being cloned, and such consent is required by law for commercial purposes in some states, including California, and in the European Union. In Harris' case, it doesn't appear that any consent was given, Sood said.

Additionally, not all voice AI engines provide a consent feature, and some platforms that do offer one do not subject their consent technology to third-party testing.

Watermarking, another AI safety approach, is far from standard, though most generative AI vendors have considered using it or have recently started to use it. The Harris deepfake audio lacks a watermark.

Deepfakes and the 2024 election

The current pervasiveness of the technology means that it could have a significant impact on the 2024 election.

"A lot of experts are talking about how the 2024 election is going to be the deepfake election in ways that we haven't seen before," Futurum Group analyst Lisa Martin said.

While previous elections have been controversial for other reasons -- notably the tumult over vote counting in the 2020 presidential election -- the quality of deepfakes, particularly audio deepfakes, was not nearly as high as it is now.


In 2024, the technology has reached a point at which it's unclear to an average voter whether a piece of audio they hear on TikTok or YouTube is fake, Martin said.

"If it sounds real, then they assume that it's real," she said. "The last thing that we all need was for this to really become a massive problem -- the access to deep fake technology in the AI that now all actors have at their disposal."

Another problem is that most voters do not believe they can be easily fooled by an audio recording, said Alexios Mantzarlis, director of the Security, Trust and Safety Initiative at Cornell Tech, a graduate campus and research center of Cornell University in New York City.

"Most people now know that an image can be edited," Mantzarlis said. "I don't know that everyone is quite as used to or ready for audio fakes that are as credible and as good as they are now."

Audio deepfakes aimed at manipulating the federal election are less concerning than deepfakes targeting local and state elections, Mantzarlis said.

"I'm less worried, frankly, about these more easily confirmable voices than, perhaps, of the voices of local elected officials and kind of spreading closer to the election," he said.

This is because the voices of federal candidates like Kamala Harris or Donald Trump can be easily identified. It's more likely that those deepfakes will be reported on, Mantzarlis added.

The audio deepfakes that are likely to be missed are those of candidates running for lesser positions at the state and local level.

"It'll just be easier to ascertain where a presidential candidate was and have more clips of their voice readily available to try and run against a detector than there may be of, say, state secretary of state or some county official who's responsible for voting procedures in a [small] county," Mantzarlis said.

The responsibility of tech vendors

The deepfake audio of Harris stands as a warning sign of what's possible and of the work that tech vendors still must do as the technology evolves, Mantzarlis continued.

"They have a responsibility to help, responsibility to avoid, to work on prevention," he said. Many vendors have promised and made commitments to the White House to label and watermark AI-generated content.

"I don't think we're there yet with audio in particular," he said. "I'm not sure that the detection is necessarily good enough -- that it's consistent across tools."

Tech vendors also have a responsibility to raise awareness, Martin said.

"There's this mass population that needs to be consistently reminded by social media, by the mainstream media as well, that they have to be looking at things unfortunately, with suspicion," she said. "The social media networks and mainstream media need help start the balance between leveraging AI for positive purposes and then being able to detect when something is clearly a deepfake."

AI vendors such as Deep Media AI and Pindrop are known for deepfake detection tools that can help the average person discern whether an audio or video clip is a deepfake.

In addition to vendors developing detection technology, governments are taking action.

States including Florida and Colorado have passed bills requiring campaigns to disclose whether they're using deepfakes in their political ads.

In a recent media interview, Senate Majority Leader Chuck Schumer hinted that two deepfake election bills could be attached to funding bills in September.

The bills would make it illegal to create AI-generated audio or visual depictions of federal candidates designed to influence an election or solicit campaign funds.

The Federal Trade Commission has solicited proposals for products, policies and procedures aimed at protecting consumers from being harmed by voice cloning.

Esther Ajao is a TechTarget Editorial news writer and podcast host covering artificial intelligence software and systems.
