Blog: Artificial Intelligence Can Now Copy Your Voice
Voice spying, voice cloning and reading our mental health from our voices are just the beginning
Voice technologies aren’t just becoming the new customer touch point, they’re becoming a new way to gather data from global citizens. Baidu’s AI can clone your voice in seconds.
Microsoft has a vision for conversational AI in the future of the operating system. So do Huawei and many others. Smart speakers are probably spying on us, gathering insights for more than just improving the product.
Honestly, these systems can even tell if you have PTSD just from your voice, just as AI with facial recognition can read the emotions on your face. Meanwhile, AI is giving us the ability to create fake humans: personas with apparently human features that don’t really exist. Getting catfished on the internet by a fake human is a real possibility in the not-too-distant future.
- It takes just 3.7 seconds of audio to clone a voice.
- Today’s intelligent assistants are full of skills and they will get much smarter in the 2020s.
AI isn’t just monetizing the internet for ad giants like Google, Facebook and Amazon. Artificial intelligence is about to create a fake world of even more complexity.
Smart speakers are going to explode in popularity in China in 2019, with Alibaba, Baidu and Xiaomi leading the way among others. Alibaba isn’t just like Amazon, it’s bigger. As Alibaba matures in the cloud and as Huawei’s profits increase, these two companies will eventually pose a real threat to the AI dominance of Google and Microsoft.
AI is creating a new world and we don’t really know its dangers. We’re just going ahead like children into a world where AI regulation will become nearly impossible.
Now, with advances in artificial intelligence, the world is becoming more artificial, and you can’t be sure whether what you see or hear is real or a fabrication of artificial intelligence and machine learning. From incredible ads of the future to “entities” we meet online, AI will transform our world, making it not just more immersive, but more confusing, complex and manipulative.
The line between convenience and hacking humans (the opposite of enhancing us) is very real. It’s so profitable to use AI to gain an edge over other firms and reach people that the commercial weaponization of AI and our most intimate data is all but inevitable.
Baidu’s research team used voice cloning techniques to develop the AI system, which they expect will have noteworthy applications in personalizing human-machine interfaces.
Baidu’s research arm announced yesterday that its 2017 text-to-speech (TTS) system Deep Voice has learned how to imitate a person’s voice using a mere 3.7 seconds of voice sample data. (Synced is a great publication for AI.)
Huawei has been working on emotionally intelligent AI for years. Amazon, Apple and Samsung are racing toward smarter interactions with their personal assistant AIs via earbuds, with 2019 being a pivotal year for the product from all three providers.
As with most machine learning systems, the more data voice cloning tools such as Deep Voice receive to train with, the more realistic the results. Meanwhile, companies like Spotify are integrating podcasts into their content recommendations, which will let them gather data on some of our core interests.
The smart home invasion of Alexa and Google Home devices is nothing short of a treasure chest of our most intimate data: information on demand and insights on users that were previously impossible to obtain. All of this is thanks to the AI voice interface, which is more immediate and will become ubiquitous in human societies, smart cities and the IoT in the next twenty years.
The technique, known as voice cloning, could be used to personalize virtual assistants such as Apple’s Siri, Google Assistant, Amazon Alexa and Baidu’s Mandarin virtual assistant platform DuerOS, which supports more than 50 million devices in China with human-machine conversational interfaces.
The frontiers of human interaction with AIs are broad and deep, with incredible implications for customer relationships. The world of AIs we are building will be incredible and potentially very transparent with regard to our data.
Google unveiled Tacotron 2, a text-to-speech system that combines a deep neural network with the company’s speech generation method, WaveNet. The network turns text into a spectrogram, a visual representation of audio, and WaveNet uses that spectrogram to generate the audio. WaveNet is also used to generate the voice for Google Assistant.
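For readers wondering what a spectrogram actually is, here is a minimal sketch in plain Python. It is not Google’s code and has nothing to do with how Tacotron 2 or WaveNet are implemented; the function name `spectrogram`, the frame sizes and the 1,000 Hz test tone are all assumptions chosen for illustration. It simply slices a sound into short overlapping frames and measures how much of each frequency is present in every frame, which is the grid of numbers a spectrogram image is drawn from.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin."""
    # Hann window, tapers each frame so its edges do not smear the spectrum
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive discrete Fourier transform, non-negative frequencies only
        row = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        rows.append(row)
    return rows

# Illustrative input (an assumption): a pure 1,000 Hz tone sampled at 8,000 Hz
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(1024)]

spec = spectrogram(tone)

# Find the frequency bin with the most energy across all frames
peak_bin = max(range(len(spec[0])), key=lambda k: sum(row[k] for row in spec))
peak_hz = peak_bin * SAMPLE_RATE / 64  # 64 is the default frame_len used above
print(len(spec), "frames,", len(spec[0]), "bins, strongest frequency:", peak_hz, "Hz")
```

A real voice would light up many bins that shift over time, and that changing pattern is what a speech system learns to reproduce.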
It’s becoming impossible to tell the difference between an AI and a human, and that’s incredibly problematic in a world where cybersecurity threats are only increasing. The internet of things side of connectivity in the 4th industrial revolution comes at a cost to privacy, and it brings risks of censorship, data harvesting, fraud, identity theft and consumer manipulation through ever more personalized digital advertisements.