
Could voice verification quickly become bad security?

Image manipulation has come so far, so quickly, that the name of the industry-standard editing tool has become a verb; just as you might “Google” to find information online, if you’re editing an image, you “photoshop” it. Adobe probably doesn’t like its image editing software becoming a verb, because that brings with it trademark implications, but it’s also undeniably benefitting from all that free publicity.

Photoshop is the tool of choice for image professionals, but it’s also widely used to create “fake” images, whether it’s comedy meme production or more sinister images for disingenuous purposes. As such, just because you see an image online doesn’t mean it’s an authentic reproduction of that scene, whether it’s a highly photoshopped image of a glamour model, or the sudden insertion of figures into historical photographs.

What if you could apply “photoshopping” to our other senses, though?

Researchers working for Chinese search giant Baidu (essentially the Chinese version of Google, and easily that country’s largest search engine) have revealed their latest research into synthesising human-like voices.

If you’ve used any of the voice assistants available on smartphones and desktops, such as Microsoft’s Cortana or Apple’s Siri, you’d be well aware that while they’re technically impressive, they’re also notably fake. Speech synthesis has come a long way, but their voices still sound stilted and robotic, especially if they have to pronounce complex words.

That robotic speech could quickly become a thing of the past, with Baidu’s researchers recently announcing upgrades to its artificial speech generation engine, Deep Voice. Where previously it took more than half an hour of recorded speech to reliably synthesise a passable facsimile of a person’s voice, they claim it’s now feasible to produce a passable speech sample with just 3.7 seconds of speech from an individual. It’s so good, in fact, that it can fool automated systems around 95% of the time.

There are some caveats to that for now, however. Automated systems only look for very basic tones, so a sample that’s good enough to fool them isn’t likely to fool a human being. Baidu says it needs more samples for that, ideally around 100 or so, but at roughly 3.7 seconds each, that’s just a little over six minutes of talking time.

The results are pretty impressive, especially when more samples are used. If you’re keen to listen in, there’s a gallery of synthesised samples available online.

Baidu’s research could bring voices back to those who have lost them, or provide more nuanced communication for folks who lack the power of speech, as well as making AI-assisted translations much smoother. There is a darker side, though. Just like photoshopped fakes, the prospect of creating audio fakes that sound like a given person said something raises significant privacy and authenticity issues. Many systems already use voice authentication for secure logins, and just like biometric measurements such as your fingerprint or irises, your voice isn’t something you can readily change to a significant degree.

What all of this means is that voice is likely to rapidly become a deprecated method for single factor authentication. While the precise details of the Baidu team’s implementation aren’t exactly known, the principle is simple: if a voice can be mimicked, it can’t be trusted.

As such, it’s likely that we’ll see even more of a shift towards multi-factor authentication, so that even if your voice can be faked, you’ve got other methods of authentication. Certainly, if you’re currently using such a system, or you have to interact with one in your daily life, it’d be wise to ask the folks in charge what their future plans are, because it’s not as though this technology is going to go away. Indeed, it’s only likely to get better, and right now, even in its limited form, it’s pretty impressive.
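
If you’re curious what “multi-factor” looks like in practice, here’s a minimal Python sketch of the idea. It isn’t how any particular bank or service does it; the `LoginAttempt` class, the `verify_login` function, the 0.9 threshold and the six-digit code are all made up for illustration. The point is simply that a convincing voice on its own shouldn’t be enough to get in.

```python
import hmac
from dataclasses import dataclass


@dataclass
class LoginAttempt:
    voice_score: float   # hypothetical 0.0-1.0 similarity score from a voice matcher
    submitted_code: str  # one-time code the user typed in


def verify_login(attempt: LoginAttempt, expected_code: str,
                 voice_threshold: float = 0.9) -> bool:
    """Grant access only if BOTH the voice and the one-time code check out."""
    voice_ok = attempt.voice_score >= voice_threshold
    # compare_digest avoids leaking information through timing differences
    code_ok = hmac.compare_digest(attempt.submitted_code.encode(),
                                  expected_code.encode())
    return voice_ok and code_ok


# A cloned voice scores well, but the attacker doesn't have the code.
cloned = LoginAttempt(voice_score=0.97, submitted_code="000000")
genuine = LoginAttempt(voice_score=0.95, submitted_code="492817")

print(verify_login(cloned, expected_code="492817"))   # False
print(verify_login(genuine, expected_code="492817"))  # True
```

In a real system the second factor would come from something like an authenticator app or a hardware key rather than a hard-coded string, but the logic is the same: faking one factor isn’t enough.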

