Blog: Fairness in the Age of Algorithms
Bias, Prediction, and Justice
Of all of the exciting work taking place in the field of data science, the machine learning algorithm (MLA) is one of the technological advancements that has garnered the most attention — and, to many, it is the area of data science that holds the most promise for the future. However, as with all powerful technologies, MLAs also carry the risk of becoming destructive forces in the world. In the words of Paul Virilio, a French cultural theorist who has written extensively about technology,
“When you invent the ship, you also invent the shipwreck; when you invent the plane you also invent the plane crash; and when you invent electricity, you invent electrocution…Every technology carries its own negativity, which is invented at the same time as technical progress.”
Earlier applications of MLAs included email spam filtering, image recognition, and recommender systems for movies, dining, and shopping. In these types of low-stakes settings, the concept of fairness isn’t really relevant; a prediction is merely correct or incorrect. The cost of errors in these systems is relatively low — usually minor inconvenience or annoyance at worst. However, the cost of errors in MLAs has dramatically increased as they have begun to be applied to human beings rather than words and pixels. Despite the seeming objectivity of the process of training MLAs to maximize prediction accuracy on training data, it sometimes results in algorithms that, while computationally correct, produce outputs that are biased and unjust from a human perspective. And in high-stakes settings, MLAs that don’t produce “fair” results can do enormous damage.
Fairness is an elusive concept. In the practice of machine learning, the quality of an algorithm is often judged based on its accuracy (the percentage of correct results), its precision (the ability not to label as positive a sample that is negative), or its recall (the ability to find all the positive samples). Deciding which of these three measures is the best proxy for fairness is not always straightforward, and improvements in one metric can cause decreases in others. Moving beyond the machine learning context to frame the discussion in a broader ethical and legal context only makes determining what is truly fair more complicated.
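To make the trade-off between these metrics concrete, here is a minimal sketch in plain Python. The labels, scores, and thresholds are invented for illustration and do not come from any real system; the `evaluate` helper is likewise hypothetical.

```python
def evaluate(y_true, scores, threshold):
    """Return (accuracy, precision, recall) at a given decision threshold."""
    # Anything scored at or above the threshold is predicted positive.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical ground-truth labels and model scores (illustrative only).
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

strict = evaluate(y_true, scores, threshold=0.75)
lenient = evaluate(y_true, scores, threshold=0.35)

for name, (acc, prec, rec) in [("strict", strict), ("lenient", lenient)]:
    print(f"{name}: accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

On this toy data both thresholds yield 75% accuracy, but the strict threshold has perfect precision and only 50% recall, while the lenient one has perfect recall and roughly 67% precision. Accuracy alone cannot tell you which kind of error the system is making, which is part of why no single metric is an obvious proxy for fairness.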
A large piece of the challenge is that MLAs can only be as fair as the data itself. If the underlying data is biased in any way, there is a risk that its structural inequalities will not only be replicated but possibly even amplified in the algorithm. Machine learning engineers must be aware of their own blind spots and implicit assumptions; all the small decisions they make about finding, organizing, and labeling training data for these models can be as impactful as their choice of machine learning techniques. Even more problematic, however, is the issue that societal problems like racial bias, discrimination, and exclusion are deeply rooted in the world around us — and consequently, they are inherent in the data that we extract from the world. For instance, there is evidence that, despite similar rates of drug use, blacks are arrested at four times the rate of whites for drug-related offenses. Even if engineers flawlessly collected this data and trained a perfectly predictive machine learning model with it, the resulting algorithm would still be tainted with the bias embodied in the environment that produced the data.
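The arithmetic behind that last point can be sketched in a few lines of Python. All of the numbers below are hypothetical and chosen only to mirror the four-to-one arrest disparity described above; they are not real statistics, and the group names and the `observed_arrest_rate` helper are placeholders.

```python
def observed_arrest_rate(offense_rate, arrest_prob_given_offense):
    """Share of a group that ends up with an arrest record."""
    return offense_rate * arrest_prob_given_offense


# Assumption: both groups have the same underlying rate of drug use.
offense_rate = 0.10

# Assumption: enforcement differs, so one group is four times as likely
# to be arrested for the same behavior.
arrest_prob = {"group_a": 0.05, "group_b": 0.20}

observed = {
    group: observed_arrest_rate(offense_rate, prob)
    for group, prob in arrest_prob.items()
}
ratio = observed["group_b"] / observed["group_a"]

for group, rate in observed.items():
    print(f"{group}: observed arrest rate = {rate:.3f}")
print(f"disparity in the training labels: {ratio:.1f}x")
```

A model trained to predict the arrest label perfectly would reproduce this fourfold gap in its risk estimates, even though the underlying behavior is identical by construction. The model is "accurate" with respect to the data and still unfair with respect to the people.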
Achieving algorithmic fairness, it seems, is as difficult as achieving fairness in human-led decision-making systems. Human systems are biased in all of the ways that algorithmic systems are biased — since both are human creations — and human decision-makers are additionally biased in ways that machines are not. However, in some ways, algorithmic systems present greater challenges because their implementation often renders them both less visible and less transparent than human processes. Often, people are unaware that an algorithm is being used to make a decision which affects them — and even if they are, the algorithm is presented as a complex, unknowable “black box”, which is impossible to see, much less understand. As Cathy O’Neil, author of Weapons of Math Destruction and a proponent of algorithmic accountability, said in a recent interview,
“When you have something that’s important and secret, it’s almost always going to be destructive.”
One widely discussed and controversial example of this problem is in the area of predictive policing. The criminal justice system has increasingly adopted predictive services built on advanced machine learning systems. For example, risk-assessment algorithms that estimate a defendant’s likelihood of recidivism are now commonly used to inform judges’ decisions at every step of the process, including setting bond amounts and determining sentences. Ideally, these algorithms would help any individual judge to mete out justice more even-handedly, and they could help to eradicate inconsistencies in punishment across an entire court system.
However, an investigative report by ProPublica found that a program called COMPAS, one of the most widely used criminal risk-assessment tools in the United States, tended to reinforce the racial biases found in the law enforcement data upon which it was trained rather than increase fairness, even though race was not one of the inputs into the system. In addition, ProPublica found that the judges who relied on these risk assessments typically did not understand how the scores were computed, and therefore could not effectively question them or identify and correct for their inherent biases.
Northpointe, the company that created COMPAS, argues that it is, in fact, fair, because defendants whom it predicted to be at higher risk of re-offending did so at approximately equal rates, regardless of race. In other words, its precision (the share of high-risk predictions that turn out to be correct, also called positive predictive value) appears to be racially neutral.
However, ProPublica’s argument is that COMPAS is not fair, because black defendants who did not re-offend were nearly twice as likely as whites to have been classified as medium or high risk. In other words, its false positive rate (the share of people who did not re-offend but were nonetheless flagged as risky) appears to be racially biased.
Because the overall recidivism rate for black defendants is higher than for white defendants, unless the algorithm is 100% accurate (which is not achievable in practice), it is mathematically impossible to simultaneously meet both Northpointe’s and ProPublica’s definitions of fairness.
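The conflict follows directly from the definitions of the metrics. Writing p for a group's base rate of re-offending, PPV for precision, TPR for recall, and FPR for the false positive rate, the definitions imply FPR = p / (1 - p) × (1 - PPV) / PPV × TPR. The sketch below plugs in hypothetical numbers (not COMPAS figures) to show that if two groups share the same PPV and TPR but have different base rates, their false positive rates cannot match. The function name is illustrative.

```python
def implied_false_positive_rate(base_rate, ppv, tpr):
    """False positive rate forced by a base rate, precision (PPV), and recall (TPR).

    Derived from PPV = TPR*p / (TPR*p + FPR*(1 - p)), solved for FPR.
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr


# Hypothetical inputs: equal precision and recall across groups,
# but different base rates of re-offending.
shared_ppv = 0.6
shared_tpr = 0.7

fpr_higher_base = implied_false_positive_rate(0.5, shared_ppv, shared_tpr)
fpr_lower_base = implied_false_positive_rate(0.4, shared_ppv, shared_tpr)

print(f"FPR at base rate 0.5: {fpr_higher_base:.3f}")
print(f"FPR at base rate 0.4: {fpr_lower_base:.3f}")

# The only escape is perfect prediction: with PPV = 1 the FPR is 0
# for every group, whatever its base rate.
fpr_perfect = implied_false_positive_rate(0.5, 1.0, 1.0)
```

With these made-up inputs, the group with the higher base rate ends up with a false positive rate of about 0.47, against about 0.31 for the other group. Equalizing precision across groups forces the false positive rates apart, and vice versa, so choosing between the two definitions is a value judgment rather than an engineering detail.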
Furthermore, there are many questions that go far beyond the domain of machine learning engineering. For example, is it right to make decisions in an individual case based on other defendants’ outcomes? Is it acceptable to set bail, determine sentencing, and grant or deny parole using characteristics that might be associated with race or socioeconomic status, such as level of education or the criminal record of a person’s parents? Can we determine an appropriate punishment for a past action based on the statistical likelihood of a future action? And at what point does such an assessment cease to predict the future and instead start to cause it?
It seems that there are three clear — though challenging — steps that must be taken to improve algorithmic fairness.
First, we must do better to ensure the quality of the data being used to train the algorithms. For instance, all subjects should have an equal chance of being represented in the data, which means that additional effort may be required to obtain data from underrepresented groups. Models also must be retrained periodically with new data to start to root out historical biases, despite the added expense that this incurs.
Second, within the field of machine learning, processes must be established and standardized across the industry to eradicate as much bias as possible from the engineering process. This could — and should — include a variety of approaches, including unconscious bias training for machine learning engineers similar to the training that intelligence analysts routinely undergo; engineering protocols akin to the protocols of scientific research, such as rigorous peer review; and independent post-implementation auditing of algorithmic fairness which judges the overall quality of an algorithm not only by the standard engineering metrics, but also by how it impacts the most vulnerable people affected by it.
Third, MLAs must be brought into the light in our society, so that we are all aware of when they are being used in ways that impact our lives: a well-informed citizenry is essential to holding the groups that create and use these algorithms accountable for ensuring their fairness. We are constitutionally guaranteed rights to due process, equal protection, and privacy; we should interpret these rights to include the right to know what data about ourselves is being used as input and the right to access any output that is generated about ourselves when MLAs are used in constitutionally protected contexts. In the case of predictive policing instruments like COMPAS, this level of transparency may be irreconcilable with the needs of for-profit companies like Northpointe to protect their intellectual property. Therefore, it may be necessary to completely remove the ownership of these MLAs from the private sector and instead treat them as a public resource that is accessible by all.
Taking these steps will require profound changes throughout our society, by many stakeholders and across many domains. There is much work to be done even to frame the questions that need to be asked, much less to find answers to them. In a world governed by laws and conventions that never envisioned the power of the MLA, the active responsibility to continuously strive for fairness in the design and implementation of machine learning systems belongs to everyone who works in or with them. As MLAs become more prevalent in our society, it will become increasingly critical that the humans in the loop address this issue and ensure that this technology fulfills its promise to do good, rather than its potential to do harm.
- Mitigating algorithmic bias in predictive justice: 4 design principles for AI fairness by Vyacheslav Polonski, PhD
- A Gentle Introduction to the Discussion on Algorithmic Fairness by Gal Yona
- How big data is unfair by Moritz Hardt
- Machine Bias by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner, ProPublica (May 2016)
- DataFramed Podcast: Weapons of Math Destruction (with Cathy O’Neil) (November 2018)
- Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms by Kate Crawford and Jason Schultz, Boston College Law Review (January 2014)
- Statement of Concern About Predictive Policing by ACLU and 16 Civil Rights, Privacy, Racial Justice, and Technology Organizations
- A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear. by Sam Corbett-Davies, Emma Pierson, Avi Feller and Sharad Goel (October 2016)
- LAPD to scrap some crime data programs after criticism by Mark Puente, Los Angeles Times (April 2019)