Blog: Only Intelligent Data Can Power Artificial Intelligence – Datanami
Our rapidly growing mastery of artificial intelligence has the potential to address society’s thorniest problems. But time and again, AI-powered projects have failed the test of fairness as algorithms have amplified the social, ethnic, and gender biases inherent in their data. Bad data inevitably translates into wrong answers.
The onus is on us, the data-driven organizations at the forefront of the AI revolution, to develop models and ethics that foster transparency, equality, and trust. These are critical to delivering fair and sound AI-driven solutions.
Without them, biased data never gets challenged. A data set might be unrepresentative or outdated. It might be tainted by the conscious or unconscious biases of the people who selected it, or those of society at large.
And when biased results come from AI, the errors can be harder to spot given the complexity and speed of the algorithms and, often, the inability of outsiders (or of any human, for that matter) to examine their logic.
Worse, humans tend to overestimate the capabilities of AI applications. Drivers lulled into complacency by their vehicle’s “autopilot” features aren’t prepared to take over the instant something goes wrong at speed. A passenger jet’s aerodynamic problem gets a software fix reliant on a single sensor, which feeds bad data and causes a tragedy.
Many judges now routinely get an AI recommendation before ruling on bail, punishment, and parole. These recommendations are supposed to be strictly advisory, but the temptation to treat the computer’s clear answer as definitive can be strong. Worse, some of the algorithms in use appear incapable of sifting out systemic racial bias, stacking the deck against minority defendants.
Sometimes the stacked deck is just a matter of unfamiliarity. When researchers from MIT and Stanford tested commercial facial recognition programs, they found vastly higher error rates for dark-skinned women than for light-skinned men. Another AI effort set up to screen job applicants for a major tech employer reportedly had to be scrapped after systematically downgrading women, because female applicants had been less prevalent in the past.
Such failures erode the credibility of all AI solutions, including those used in New Jersey and elsewhere to do away with cash bail and keep more people out of prison. They also risk incurring the wrath of regulators. In Europe, the General Data Protection Regulation, or GDPR, generally bars the “processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs…or sexual orientation.” Violators can be fined up to 4% of global annual revenue. In the U.S., Facebook is facing claims that algorithms allowing home sellers and landlords to target specific zip codes amount to “digital redlining,” the digital-era version of housing discrimination.
AI applications trained to spot patterns in language and personal data have time and again adopted the biases embedded in that data. But the correlations they spot do not necessarily reflect causation. Software taught to treat people from poorer and higher-crime neighborhoods as statistically more of a threat will be de facto racist.
The problem is serious enough that the technology consultancy IDC expects spending on AI governance and compliance staff to double in the next two years.
Humans are, understandably, wary of AI. Elon Musk has famously described AI as humanity’s “biggest existential threat.” In a recent consumer sentiment survey on technology and data privacy, 62% of respondents said they disagree with AI being used to determine societal decisions such as criminal justice, healthcare, and state laws. Notably, those with a higher education seem to be more distrustful of the technology (64% of respondents with a Bachelor’s or higher degree) than those with a high school diploma or less (54%).
Companies deploying AI applications can save time and money, and address public concerns, by insisting on fairness and transparency from the start. Algorithms that can’t be clearly explained and independently assessed will not be trusted with critical decisions. Apparent bias will not get the benefit of the doubt on intent.
Fortunately, these problems have solutions. A facial recognition program can be given extra training so that it doesn’t end up discriminating against a particular shape or color simply because it’s less common. Software making parole recommendations can be armed with an ‘adversarial learning’ subroutine checking for systematic bias.
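To make the two ideas above concrete, here is a minimal sketch, not a description of any real system. It first measures the error rate separately for each group, which is one simple way to check for systematic bias, and then computes inverse-frequency sample weights so that a less common group counts as much as a common one during extra training. The function names, group labels, and toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def error_rate_by_group(groups, y_true, y_pred):
    """Return the fraction of wrong predictions for each group label."""
    totals = Counter(groups)
    errors = defaultdict(int)
    for group, truth, prediction in zip(groups, y_true, y_pred):
        if truth != prediction:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def inverse_frequency_weights(groups):
    """Weight each sample so that every group carries the same total weight."""
    counts = Counter(groups)
    n_samples = len(groups)
    n_groups = len(counts)
    return [n_samples / (n_groups * counts[group]) for group in groups]


# Hypothetical toy data: group "b" is rarer and is misclassified more often.
groups = ["a", "a", "a", "a", "a", "a", "b", "b"]
y_true = [1, 0, 1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]

rates = error_rate_by_group(groups, y_true, y_pred)
weights = inverse_frequency_weights(groups)
print(rates)    # per-group error rates
print(weights)  # one weight per sample, larger for the rarer group
```

A large gap between the per-group error rates is a signal to investigate, not proof of the cause. The weights could then be passed to any training routine that accepts per-sample weights. Reweighting is only one of several possible remedies and is no substitute for collecting more representative data.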
Most importantly, we must be vigilant about the quality of our data and establish rigorous data quality standards and controls for the data we use to develop AI models. We can achieve that by focusing on three main areas:
- Data governance: Every company needs a clear, transparent data policy, and the multi-tenant nature of the cloud presents a unique opportunity to expand the availability of governed data to power AI models in a more transparent way.
- Explainable AI: Advanced machine learning algorithms like deep learning neural networks are incredibly powerful, but lack transparency and thus carry regulatory, ethical, and adoption risks. We must develop and use AI explainability tools, in addition to partnering with risk and compliance professionals early in the development process, so that no models are deployed into production without the proper vetting and approval.
- Full transparency and collaboration: The main concern with AI models is that they are developed behind the thick walls of secrecy of a select few private companies. Co-founded by Musk in 2015, OpenAI is one organization that fosters openness of research and collaboration in the space under the premise that the more data you have at your disposal to build your models, the fairer and more powerful those models will be.
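As one illustration of what a data quality control might look like in practice, the sketch below compares each group's share of a training set against a reference share and flags groups that fall short. It is a simplified example under stated assumptions: the function name, the `gender` attribute, the reference shares, and the 5-point tolerance are all hypothetical, and a real governance process would choose its own attributes and thresholds.

```python
from collections import Counter


def representation_gaps(records, attribute, reference_shares, tolerance=0.05):
    """Flag groups that are under-represented relative to a reference.

    Returns {group: (observed_share, reference_share)} for every group whose
    observed share is below its reference share by more than `tolerance`.
    """
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if expected - observed > tolerance:
            gaps[group] = (observed, expected)
    return gaps


# Hypothetical training set: 8 records for one group, 2 for the other.
records = [{"gender": "m"}] * 8 + [{"gender": "f"}] * 2
reference = {"m": 0.5, "f": 0.5}  # assumed target shares, for illustration only

gaps = representation_gaps(records, "gender", reference)
print(gaps)  # {'f': (0.2, 0.5)}
```

A check like this could run automatically whenever a data set is registered for model training, so that an unrepresentative sample is challenged before a model is built on it.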
The bottom line is that AI will without a doubt be part of our future. So instead of putting our heads in the sand, we need to channel our fear into developing AI responsibly, under a solid framework of ethics, data governance, and transparency.
About the author: Matt Glickman is the vice president of product for Snowflake Computing. For over 20 years, Matt led the development of business-critical analytics platforms at Goldman Sachs. As a managing director, he was responsible for the entire data platform of Goldman’s $1-trillion asset management business. Matt also co-led Goldman’s risk management platform where his team built Goldman’s first, company-wide data warehouse that helped it navigate the 2008 financial crisis. Matt holds a bachelor’s degree in Computer Science and Applied Mathematics from Binghamton University.