Blog: Scientists Are Helping AI Outsmart Hackers That Abuse Vulnerabilities In Their Training – India Times
"Adversarial attack" is the name given to a particular kind of hack: one that feeds an AI maliciously crafted input designed to trigger a glitch in its behaviour.
They’re not commonplace yet, but they could be soon, which is why researchers are working on ways to stop them.
These adversarial attacks could take a variety of forms. Maybe it’s audio hidden inside a song that directs your Alexa to send someone money, or a pattern on a roadside sign meant to confuse your self-driving car. The point is, they’re malicious at best and incredibly dangerous at worst.
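The basic trick can be sketched with a deliberately simple toy model. Everything below — the weights, the "image", the class names — is made up for illustration; real attacks of this kind (such as the fast gradient sign method) target deep networks, but the mechanism of many tiny, coordinated nudges is the same idea:

```python
import numpy as np

# Hypothetical toy model: a linear classifier over a 64-"pixel" image.
# score > 0 means "dog", otherwise "cat".
w = np.linspace(-1.0, 1.0, 64)      # made-up weights, one per pixel

def classify(x):
    return "dog" if w @ x > 0 else "cat"

# An "image" the model confidently calls a dog.
x = 0.5 + 0.02 * w                  # pixel values stay in [0.48, 0.52]

# FGSM-style perturbation: nudge every pixel slightly in the direction
# that lowers the "dog" score. No single pixel moves by more than 0.03,
# far too little for a human to notice, yet the prediction flips.
eps = 0.03
x_adv = x - eps * np.sign(w)

print(classify(x))       # dog
print(classify(x_adv))   # cat
```

No individual pixel change is visible, but because every pixel is pushed in exactly the direction the model is most sensitive to, the small changes add up and the label flips.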
Regular cyber defense doesn’t work here, because the hackers are preying on the vulnerabilities of a neural network rather than a human mistake or security loophole. That’s why researchers at Carnegie Mellon University have been working on making AI harder to fool with these tricks. In the process, they think they’ve even uncovered why the tricks work in the first place.
Zico Kolter, a computer scientist at the university, says some AIs are too smart for their own good. They spot patterns humans wouldn’t even notice, and may even interpret them as instructions. That’s why an AI can be vulnerable to these attacks even if its human developer thinks it’s perfectly designed.
To defend against this, the team created a special set of training data, including images that look like one thing to a human but that a computer interprets as another. For instance, a picture might look like a dog, except that on closer inspection the AI would identify cat-like fur. The team then mislabeled some of these pictures as the thing they looked like to the AI, rather than what they actually were, and trained it on that data. Thanks to this training, the AI was then able to generalise and correctly identify images nearly half the time.
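The relabeling experiment can be mimicked with a toy, entirely hypothetical dataset — this is not the researchers' actual setup, just a two-feature stand-in where whole images are reduced to one big "macro" number and one tiny "micro" number:

```python
import numpy as np

# Toy stand-in for the relabeling experiment (hypothetical data).
rng = np.random.default_rng(42)
n = 1000

def make_images(micro_labels, macro_labels):
    # Big, human-visible feature plus some noise.
    macro = np.where(macro_labels == 1, 1.0, -1.0) + 0.3 * rng.normal(size=n)
    # Tiny, sub-perceptual feature a human would never notice.
    micro = 0.01 * np.where(micro_labels == 1, 1.0, -1.0)
    return np.column_stack([macro, micro])

# Training set: each image LOOKS like the opposite class (its macro
# feature disagrees with the label), but its micro feature matches the
# label -- so the only signal consistent with the labels is the micro one.
y_train = rng.integers(0, 2, n)
X_train = make_images(y_train, 1 - y_train)

# Fit a least-squares linear classifier on this "mislabeled" data.
A = np.column_stack([X_train, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, 2.0 * y_train - 1.0, rcond=None)

# Ordinary test set, where macro and micro features agree with the label.
y_test = rng.integers(0, 2, n)
X_test = make_images(y_test, y_test)
pred = (np.column_stack([X_test, np.ones(n)]) @ coef > 0).astype(int)

print("test accuracy:", (pred == y_test).mean())
```

Because the human-visible macro feature contradicts the training labels, the classifier latches onto the invisible micro feature instead — and since that feature also tracks the true class in normal images, the model still scores well on a correctly labeled test set (essentially perfectly in this noiseless toy; the real experiment recovered the labels far less often).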
The experiment showed the researchers that image-recognition AI has two modes of classification. One is based on macro features that humans also rely on, like the shape of the ears and tail, while the other is based on micro features too subtle for humans to see. These micro features form the basis of adversarial attacks.
The AI isn’t just being confused by additions to what it perceives; it’s actively spotting patterns we don’t know exist and acting on them.
Now that we know this for sure, we can change the way we train AI to account for it, says paper co-author Andrew Ilyas of MIT. Basically, what we need to do is modify the training data so it only includes features humans would recognise. That way the AI wouldn’t learn, and therefore wouldn’t act on, those subtle patterns.
It’s not our programming of AI that’s the problem, it’s that our training data is flawed.