Blog: Life’s not fair. Is machine learning making it worse?
Machine learning. It’s the buzzword at every tech meetup and the term that sounds like magic to many people. Machines that learn, awesome! Machines that learn from objective data are fairer and faster in their decision making, and might see connections that humans fail to see. They can take in far more information at a time, and most of all, they are not plagued by the many fallacies that affect human decision making, right?
You probably know where I’m going with this, and you probably know that the last question was a rhetorical one. Machine learning algorithms can very well be plagued by the same biases humans suffer from. The difference, though, is that the biases in human decision making are well known to anyone who’s ever taken a psychology 101 class. Machine learning algorithms, on the other hand, are hyped for their performance, and the biases that can occur in them have been undiscovered or ignored for a long time. Only fairly recently has the fairness of machine learning algorithms been critically assessed.
“Why should I care, though? I don’t make machine learning models,” you might think. That’s a valid question. However, think about your daily routine. First, your alarm clock wakes you up; you might use an alarm app on your phone that analyses your sleep to pick the right moment to wake you. You then look at your phone, which unlocks automatically because it recognises your face or your fingerprint. Before you actually get out of bed, you check your social media. Not coincidentally, you immediately see that one of your best friends posted something, because their updates always show up first in your feed. You get dressed and have breakfast, and turn on your “Discover Weekly” playlist on Spotify, which is tailored to your taste in music. Before you leave for work, you check the traffic on your route in Google Maps and decide to wait another 20 minutes in the hope of spending less time in traffic. You may have guessed it already: all of this is powered by machine learning. You might not build the models, but you definitely use them.
Machine learning impacts every one of us daily, in ways we don’t even think about. And as we move into an age where models decide whether we get a job, get a loan, or get out of prison, machine learning concerns all of us, and everyone should be able to critically assess the decisions a machine learning algorithm makes. Machine learning is everywhere. Pedro Domingos sums it up nicely in his book The Master Algorithm: “People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world.”
So, this blog is about fairness in machine learning. But what does fairness mean when it comes to machine learning?
There is no simple answer to this question. Several definitions have been proposed for what it means for a machine learning algorithm to be fair. Take COMPAS, for example: a machine learning system that calculates a recidivism risk score for defendants, which plays a part in the decision whether to release prisoners on parole. COMPAS was heavily criticised as unfair after ProPublica found that, among all defendants who were released on parole and did not reoffend, Black defendants were far more likely than white defendants to have received a high risk score. When confronted with this criticism, Northpointe, the company that made COMPAS, countered that its system is fair: the scores mean the same regardless of race (e.g. a risk score of 7 means that a defendant has a 60% risk of reoffending, regardless of race).
This sounds contradictory. Can a machine learning algorithm produce outcomes that are both fair and unfair? Several researchers argued in the Washington Post that it’s not only possible, it’s inevitable: whenever two groups differ in their underlying rates of reoffending, it is mathematically impossible for any algorithm to satisfy both fairness criteria at the same time. This raises the question: which definition of fairness should we focus on?
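To see how both claims can be true at once, here is a toy calculation. The numbers are made up for illustration, not taken from the real COMPAS data, but they show how a score can be equally calibrated for two groups (Northpointe’s criterion) while the false positive rates among non-reoffenders differ sharply (ProPublica’s criterion), simply because the groups’ underlying reoffence rates differ.

```python
def calibration(high_reoffend, high_total):
    """P(reoffends | labelled high risk): what a high score 'means'."""
    return high_reoffend / high_total

def false_positive_rate(high_no_reoffend, total_no_reoffend):
    """P(labelled high risk | does not reoffend)."""
    return high_no_reoffend / total_no_reoffend

# Group A: 100 defendants, 50 reoffend; 50 flagged high risk, of whom 30 reoffend.
# Group B: 100 defendants, 20 reoffend; 20 flagged high risk, of whom 12 reoffend.
cal_a = calibration(30, 50)                       # 0.6
cal_b = calibration(12, 20)                       # 0.6 -> scores mean the same for both groups
fpr_a = false_positive_rate(50 - 30, 100 - 50)    # 20 / 50 = 0.40
fpr_b = false_positive_rate(20 - 12, 100 - 20)    #  8 / 80 = 0.10

print(cal_a, cal_b)   # equal: Northpointe's criterion holds
print(fpr_a, fpr_b)   # unequal: ProPublica's criterion fails
```

With different base rates (50% vs 20% reoffending), keeping the scores calibrated forces the non-reoffenders of the higher-base-rate group to be flagged far more often. That is the impossibility in miniature.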
A study by researchers from Stanford and ETH Zurich tried to answer which definition is the most appropriate one in the context of our society. They did this by asking ‘lay people’ which definition of fairness they found most appropriate. Their conclusion: when presented with several fairness criteria, people tend to go with the mathematically simplest one, even when offered more sophisticated and complicated alternatives. People who don’t work with algorithms every day gravitate towards demographic parity: the requirement that every group receives the favourable outcome at the same rate. Judged by a simple criterion like that, Northpointe’s defence of COMPAS could be said to hold up. However, I guess everyone can see that the criticism from ProPublica is also justified.
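Part of demographic parity’s appeal is exactly how simple it is to state and to check. A minimal sketch, with made-up decisions (1 = favourable outcome, 0 = unfavourable):

```python
def positive_rate(decisions):
    """Fraction of a group that received the favourable outcome."""
    return sum(decisions) / len(decisions)

# Hypothetical decisions for two demographic groups.
group_a = [1, 0, 1, 1, 0]
group_b = [1, 1, 0, 1, 0]

# Demographic parity holds when the rates match.
print(positive_rate(group_a), positive_rate(group_b))
```

Note what this criterion ignores: it says nothing about whether the decisions were correct, which is part of why researchers consider it simplistic.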
I could go on and on about this discussion, but regardless of which definition we choose, it’s a fact that many machine learning models are unfair in ways everyone can agree are wrong. From facial recognition that works almost perfectly on the faces of white men (an error rate of only 0.8%) but has an error rate of up to 34.7% on the faces of Black women, to word embedding models that associate African-American names with ‘unpleasant’ words and European-American names with ‘pleasant’ ones, machine learning models are starting to hurt and discriminate against real people in real ways, and we need to be aware of it.
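Gaps like the facial recognition one stay hidden as long as you only report a single aggregate error rate. A simple habit that exposes them is to break the evaluation down per group. A minimal sketch, with made-up records (the group labels and data are illustrative, not the real benchmark):

```python
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, prediction, label) tuples.
    Returns the error rate for each group separately."""
    errors, totals = defaultdict(int), defaultdict(int)
    for group, pred, label in records:
        totals[group] += 1
        if pred != label:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# Hypothetical evaluation data: (group, predicted label, true label).
data = [
    ("lighter-skinned men", 1, 1), ("lighter-skinned men", 0, 0),
    ("darker-skinned women", 1, 0), ("darker-skinned women", 1, 1),
]
print(error_rates_by_group(data))
# The overall error rate (1 error in 4) would hide that all the
# errors fall on one group.
```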
Luckily, awareness of unfairness in machine learning is starting to grow. Through interviews and surveys, Microsoft researchers set out to find out what the industry actually needs, not only to be aware of unfairness, but also to proactively take responsibility for the results their models produce. It turns out that the industry lacks concrete, domain-specific guidelines for checking models for fairness. One interviewee admitted that they don’t systematically check their models for fairness at all: “You’ll just have to put your model out there, and then you’ll know if there’s fairness issues if someone raises hell online.” That this approach is problematic goes without saying. Companies, especially large ones, need to formulate their own fairness criteria before they start programming.
While we wait for such domain-specific guidelines, there are things IT companies can start taking into account right now. As Joy Buolamwini explains in her TED talk, there are three general things to think about when we want to make software and machine learning models that perform equally well for all people. First, who codes matters: create teams of diverse individuals, so that the blind spots of some are more likely to be noticed by others. Second, how we code matters: think about fairness at every step of the way. As the Dutch expression goes: prevention is better than cure. Lastly, why we code matters: is the model you’re trying to build going to make the world worse? Can it be used to deepen inequality?
It’s time to start shifting our focus from “can we do this?” to “should we do this?”. I hope this blog has, for some of you, created awareness of what it means for a machine learning model to be fair. I like to be realistic, and I know that not every machine learning model is going to make the world a better place to live in. My goal is simply not to make machine learning models that make the world a worse place to live in. That’s not too much to ask, right?