Blog: Artificial Intelligence Needs Data Diversity – Forbes
Artificial intelligence (AI) algorithms are generally hungry for data, a trend which is accelerating. A new breed of AI approaches, called lifelong learning machines, is being designed to pull data continually and indefinitely. But this is already happening with other AI approaches, albeit with human intervention. A steady stream of data is the fuel for coveted results.
But, with the ever-increasing importance of data, the stakes of data bias are growing ever higher. AI companies have a moral obligation to their customers, and to themselves, to actively address data bias.
Our Current Problem
The examples of mistakes in this arena are numerous and egregious: Google’s Photos application classified African Americans as gorillas; Amazon’s internal recruiting application downgraded female candidates; Microsoft’s AI chatbot adopted racist and anti-Semitic verbiage in response to conversations on Twitter; and Amazon’s facial recognition software mislabeled 28 members of Congress as criminals. All of these instances, along with similar issues, have exposed the underbelly of bias creeping into AI results, to the chagrin of the leaders and stakeholders of these companies.
Failure to address, and even anticipate, these issues will not only yield sub-par products; it will also encourage Luddites to reject AI entirely. Furthermore, legal repercussions have the potential to dwarf the large fines that have been imposed on big AI companies.
Machine learning methods do not have built-in biases, but data typically does. For instance, looking at U.S. mugshot photos alone, an AI algorithm could easily infer an incorrect relationship between skin color and incarceration. Indeed, a particularly egregious example was an algorithm that was used to assist in sentencing guidelines. Lacking any precautions to be race-blind, the algorithm learned to recommend stricter guidelines disproportionately for minorities.
How To Solve It
The most practical way to address the issue of data bias is to actively confront it in either the collection or curation phases for AI data. Algorithms can propagate or even amplify biases in their data sources. Therefore, the data should be diversified to reduce bias.
Data collection and preparation should be done by a team with diverse experience, backgrounds, ethnicity, race, age, and viewpoints. The view of someone from a less developed or developing country in Asia is going to be different from the view of someone from a Western country. An illustrative example was a robotic vacuum cleaner in South Korea that sucked up the hair of a woman sleeping on the floor. The non-diverse team involved in the training data collection did not anticipate or consider the scenario of people sleeping on the floor, although it is very common in some cultures. It is up to businesses, and to the business leaders who form the development teams, to ensure diversity while building their AI systems.
Another important type of diversity is intellectual diversity. This includes academic discipline, risk tolerance, political perspective, collaboration style — any of the individual characteristics which make us all unique. This type of diversity is known to enhance creativity and productivity growth, but it also improves the likelihood of detecting and correcting bias. Intellectual diversity can even exist within a single human who has developed a multidisciplinary background and experiences dealing with a broad range of people. The value of such people will increase as AI continues to affect a great range of ventures.
It is very hard to avoid bias completely, but minimizing it is possible with proper attention and effort. When a bias is found, data scientists have an obligation to balance it. They can adjust the sample distribution, relabel samples, change confidence weighting or employ other mitigation strategies appropriate to the AI methods employed.
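As one illustration of adjusting the sample distribution, the sketch below assigns each training sample a weight so that every group contributes the same total weight, however many samples it has. The function name `balancing_weights` and the toy group labels are assumptions made for this example; it is a minimal sketch of one common reweighting scheme, not a complete mitigation strategy.

```python
from collections import Counter


def balancing_weights(groups):
    """Return one weight per sample so that each group carries equal total weight.

    `groups` is a list with one group label per sample (hypothetical labels here).
    The weights sum to the number of samples, so the overall scale is unchanged.
    """
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    # A sample from a group with few members gets a proportionally larger weight.
    return [total / (n_groups * counts[g]) for g in groups]


# Toy data: group "a" is three times as common as group "b".
groups = ["a", "a", "a", "b"]
weights = balancing_weights(groups)
for group, weight in zip(groups, weights):
    print(group, round(weight, 3))
```

Weights like these can be passed to any training routine that accepts per-sample weights. Whether reweighting, resampling or relabeling is the better choice depends on the data and the model, and the choice should be documented along with the reasoning behind it.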
It is also important that these mitigating factors be documented and well-vetted internally, if not publicly. De-biasing based on a misconception is an altogether new evil. So, every opportunity to find such misconceptions or errors should be taken.
De-biasing data is unfortunately not as simple as collecting it. This step will take a human eye and cannot easily be automated away. Data engineers ought to analyze their distributions thoroughly, checking for unusual distributions or highly correlated variables. Particularly when people are involved, results should be checked for unusual correlations with factors such as race, gender, age, location, religion and sexuality, regardless of whether or not these variables are inputs to the model.
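A simple first check of this kind is to compare how often a model produces a favorable result for each group, even when the group attribute was never a model input. The sketch below is a minimal example under stated assumptions: the function names, the toy predictions and the group labels are all hypothetical, and the ratio it computes is only one of several possible disparity measures.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Fraction of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred)
    return {g: positives[g] / totals[g] for g in totals}


def disparity_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means all groups are equal."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received any positive predictions
    return min(rates.values()) / highest


# Toy model outputs (1 = favorable result) and a group attribute kept aside for auditing.
predictions = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["x", "x", "x", "x", "y", "y", "y", "y"]

rates = positive_rate_by_group(predictions, groups)
print(rates)
print(round(disparity_ratio(rates), 3))
```

A ratio well below 1.0 does not prove bias on its own, but it flags a result that a human reviewer should examine. The point at which a gap is large enough to act on is a judgment call that the team should make and document.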
A New Future for AI
Not too long ago, the main challenge for AI was whether it would work. Now that its efficacy is better established, what one should do with AI is often more important than what one could do. Prevention, removal or mitigation of bias begins with good data curation and continues throughout the AI development life cycle. For the foreseeable future, this means that AI needs humans in the loop — humans who enhance diversity and add resilience to bias.