Voice Cloning Apps Make It Easy for Criminals to Steal Your Voice, CR Finds
Four popular voice cloning apps do not do enough to make sure users have the speaker’s consent
You’ve probably heard about “grandparent scams”: An unsuspecting senior gets a frantic phone call from an imposter claiming to be a family member, sounding remarkably like them and saying they need money—quickly—to get out of a jam. Can you send the funds now, via a gift card, a wire transfer, or just plain cash?
Investment scams in which you’re led to believe that celebrities like Tom Hanks, Elon Musk, and Dolly Parton are vouching for a particular financial opportunity are also becoming increasingly common. What makes both types of scams possible is rapidly improving, ever-cheaper technology that allows anyone to make a realistic clone of another person’s voice without their permission. The voice sounds like the person it’s purported to be in almost every way—but it’s not.
Voice Clones Now Convincingly Mimic Emotion
Just three years ago, computer-generated voices were clunky and robotic. Think of Amazon’s Alexa, the popular voice assistant that features mostly monotone voices devoid of emotions such as anger, surprise, or curiosity.
Those old voices had telltale signs of not being human, researchers found. They included awkward conversation flow, unusually short or long pauses, and technical flaws—referred to by deepfake experts as “artifacts”—such as imperceptible high frequencies or background white noise.
But voice clones have improved dramatically, and, buoyed by the technology’s low cost, scammers are taking advantage: they can now create synthetic voices in mere minutes and send them out into the world.
“A lot has changed. Not only is the generated audio very stable for a long conversation, it can be very expressive in its emotions,” says Surya Koppisetti, a senior applied scientist at Reality Defender, a deepfake detection company based in New York. “Human perception of what is a fake voice [like Alexa] and what isn’t is no longer good enough.”
For example, there have been dramatic improvements in voice chatbots. These are oftentimes automated customer service voices or “virtual companions” that can be used for real-life therapy—they “listen” to what you say, interpret the meaning and intent of your comment, and compare it with their own modeled data to generate what they hope is an appropriate and meaningful response.
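For readers who want to see the mechanics, here is a deliberately simplified sketch of that listen-interpret-respond loop in Python. It is purely illustrative: the keyword matching below stands in for the language model a real chatbot would use, and no actual speech-to-text or text-to-speech service is involved.

```python
# Illustrative only: a toy version of the "listen, interpret, respond" loop
# described above. A real voice chatbot would replace the keyword matching
# with a trained language model and wrap the loop in speech-to-text and
# text-to-speech stages; this reflects no specific vendor's code.

INTENT_RESPONSES = {
    "refund": "I'm sorry to hear that. Let me start a refund request for you.",
    "hours": "Our support line is open from 8 a.m. to 8 p.m. Eastern time.",
}

def interpret(user_text: str) -> str:
    """Guess the caller's intent; a real system would use a language model."""
    text = user_text.lower()
    for intent in INTENT_RESPONSES:
        if intent in text:
            return intent
    return "unknown"

def respond(intent: str) -> str:
    """Pick a reply; a voice bot would then hand this text to a synthesizer."""
    return INTENT_RESPONSES.get(intent, "Could you tell me a bit more about that?")

if __name__ == "__main__":
    for caller_line in ["What are your hours?", "I want a refund, please."]:
        print("Caller:", caller_line)
        print("Bot:   ", respond(interpret(caller_line)))
```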
One chatbot product made by Sesame AI, with voices called Maya and Miles, goes even further, creating what is called “voice presence”—reading emotional cues, using natural timing and pauses in conversation, adjusting tone and style to match each situation, and maintaining consistency.
The result, experts say, is compelling fake voices that are nearly indistinguishable from the real thing and that could easily be misused by criminals.
"These voices can be stunning. But you just need one app that isn’t doing enough to protect people and then is used for fraud and crime," says Hany Farid, a computer science professor at the University of California, Berkeley who specializes in voice deepfakes and co-authored a soon-to-be-published study on the subject.
Not all of these advancements have made their way to voice cloning apps yet, but even today’s clones can be hard to spot. Case in point: Farid and a team of researchers from Berkeley recently used ElevenLabs to create voice clones of 220 real people. When they had volunteers listen to both the real voices and the AI-cloned fakes, they found that just 60 percent of people could correctly identify a voice as fake—only slightly better than a coin flip.
The Berkeley team found that engaging the fake voice in longer conversations and asking open-ended questions helped improve deepfake detection. But the other tactics that people used to try to identify fake voices—listening for inflections and accents, breathing, background noises, and “disfluencies,” or ums and ahs—weren’t that helpful because they either weren’t present in the voice recordings or didn’t exactly match what people expected to hear.
But there’s evidence that the technology has improved enough to account for things like inflections and accents. For example, Teleperformance SE, the largest call center operator in the world, is working on a new AI-enabled feature that would “neutralize,” or remove, its Indian employees’ accents in real time when U.S. callers phone in with customer service questions about their iPhone or Android phones, according to Bloomberg.
How One Deepfake Detection Company Created a Voice Clone of a CR Journalist
Reality Defender, a deepfake detection company based in New York, took a Consumer Reports journalist's voice and created a realistic deepfake in minutes using its software.
Apps Marketed as Useful for ‘Pranks’
Even though cyber fraud and scams are vastly underreported, there are signs that both the number of victims and the amount of money lost to criminals are rising at an accelerating rate. There were more than 850,000 reports of “imposter scams” from U.S. consumers in 2023, resulting in nearly $2.7 billion in losses, according to the Federal Trade Commission. That’s 12 percent more reports than the year before.
Some companies are taking steps to protect themselves and their clients. Reality Defender has an unnamed bank client that uses detection software in near real time to analyze callers’ voices during customer support conversations to see whether they are deepfaked. Last year, the company found that 960 out of a sample of 1 million bank calls were manipulated in some way—a rate of nearly 0.1 percent.
That might sound like a small percentage, but the average amount of money lost to a single deepfake attack on a financial institution was about $600,000 last year, according to Regula, an identity verification company that works with banks. Based on its work with clients, Reality Defender found that fully automated deepfaked-voice cyber attacks on banks took, on average, just 3 minutes, cost the scammers just $2.51 per attempt, and succeeded in stealing money about 20 percent of the time.
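To put those figures in perspective, here is the arithmetic behind them as a short, purely illustrative Python snippet. The numbers are the ones reported above by Reality Defender and Regula; combining them is a back-of-the-envelope exercise, not a formal estimate.

```python
# Back-of-the-envelope arithmetic using only the figures cited above.

flagged_calls = 960        # manipulated calls Reality Defender found
sample_size = 1_000_000    # calls sampled at one bank client
print(f"Share of calls flagged: {flagged_calls / sample_size:.2%}")  # ~0.10%

cost_per_attempt = 2.51    # dollars per automated attack attempt
success_rate = 0.20        # roughly 1 in 5 attempts succeeds
avg_loss = 600_000         # average loss per successful deepfake, per Regula

# What a scammer spends, on average, for each successful theft...
print(f"Attacker spend per success: ${cost_per_attempt / success_rate:.2f}")
# ...versus what the targeted institution stands to lose.
print(f"Average loss per success:   ${avg_loss:,}")
```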
Voice cloning apps say they do take steps to limit non-consensual deepfakes, including embedding security features, like a watermark, in the audio clips and using both human and automated moderation to look for indicators of potentially harmful content, such as words related to sexually explicit material, suicide, and fraud and scams.
Yet some AI voice companies explicitly market their products for deception. PlayAI listed “pranks” as a use case for its AI voice tools in a company blog post (the post was deleted after CR called attention to it), and Speechify also suggests prank phone calls as a use case for its tools. “There’s no better way to prank your friends than by pretending you’re someone else. Voice changer apps allow you to make real-time changes to your voice and trick your friends into thinking you’re a different person when you make a fake call online,” reads one page of Speechify’s company website.
Other companies, including Microsoft and OpenAI, have developed voice cloning tools but, so far, have held them back from wider public release for fear of misuse.
And the real-world impacts of deepfake voice clones are already being felt. According to news reports, a sophisticated deepfake scheme targeted Taiwan’s presidential election in 2023; a 2024 robocall in New Hampshire used a convincing voice clone of former President Biden to urge people not to vote; and last year a finance worker in Hong Kong was tricked into sending $25 million to fraudsters who used audio deepfakes to pose as the company’s chief financial officer.
How to Protect Against Voice Scams
The future for consumers navigating fraudulent deepfakes is worrisome, experts say.
Increasingly sophisticated cyber attacks are “multimodal,” meaning they use a combination of email, text, voice, video, and photo to create a multipronged attack on someone’s bank or investment accounts. Most consumers are unaware of how convincing scams have become, and financial institutions are oftentimes playing catch-up to new schemes.
It’s also getting easier and easier to scam people by pairing stolen pieces of personal information with deepfaked voice clones. Cyber criminals are often small groups of hackers operating out of overseas call centers or working for foreign adversaries like North Korea and Russia. And they have a growing repository of hacked and leaked data to draw on, all of which makes scams even more convincing.
For example, a 2024 hack of National Public Data, a background check company, exposed some 3 billion records containing names, addresses, phone numbers, and Social Security numbers. That company has since shut down, but all of the data—including the personal info of nearly every U.S. citizen—was eventually put up for sale online.
Pair that stolen data with a 3-second clip of your voice and internet thieves have all they need to wage an effective cyber attack, says Arun Vishwanath of the Cyber Hygiene Academy.
So what can everyday people do to protect themselves from convincing deepfakes?
It’s probably unrealistic to purge from the internet all of the personal information that can be found about you. And it’s equally challenging to delete all of the audio and video clips of yourself that can readily be found online—say, an Instagram Reel of a family outing or a YouTube video of a work event.
But it’s important to be aware of the prevalence of deepfake voice scams; to set up two-factor authentication, or multiple layers of security, on all of your financial accounts; and to be wary of phone calls or emails that ask you for money or for account info and access.
In addition, cybersecurity researchers are calling for improved security measures on the part of financial institutions, social media companies, and voice cloning app makers themselves. Voice cloning apps could put new security protocols into place, such as authenticating that the speaker whose voice is being cloned has agreed to the terms and conditions, and collecting customers’ credit card info to verify them and, if necessary, track them down, says Grace Gedye, a CR policy analyst. Cybersecurity experts have also called for the creation of “suspicion scores” for emails and texts, and for security verification measures that go beyond two-factor authentication, for example, by requiring you to confirm your identity through a phone call and a text or email to a different device.
Some existing state and federal laws already apply to the use of AI-enabled voice tools that are used for fraud and impersonation.
A few states have even begun updating their “right of publicity” laws, which give people the right to control the commercial use of their image, voice, and likeness. Tennessee passed the Ensuring Likeness Voice and Image Security (ELVIS) Act, which, among other provisions, clarified that “voice” means “a sound . . . that is readily identifiable and attributable to a particular individual, regardless of whether the sound contains the actual voice or a simulation of the voice of the individual.”
The ELVIS Act applies not only to people who make non-consensual deepfakes but also to those who create any “algorithm, software, tool, or other technology, service, or device, the primary purpose or function of which is the production of an individual’s photograph, voice, or likeness without authorization from the individual.”
In April 2024, an FTC rule went into effect that prohibits the impersonation of a business or government entity, and gives the agency stronger enforcement tools to go after scammers. The FTC has proposed amending that rule to extend those protections to the impersonation of individual people.
But it’s unclear whether the current FTC will have the staff or take up the mandate to go after such voice fraud and scams; at least a dozen probationary employees of the FTC have been laid off, including workers in the Bureau of Consumer Protection, The Verge reported last week. In response to questions from CR, the FTC said there is no update on the proposed impersonation rule change and provided no comment on the agency’s staffing levels.
Editor’s Note: This article, published March 10, 2025, has been updated to include responses from Resemble AI, Descript, and ElevenLabs.
Editor’s Note: Our work on privacy, security, AI, and financial technology issues is made possible by the vision and support of the Ford Foundation, Omidyar Network, Craig Newmark Philanthropies, and Alfred P. Sloan Foundation.