I was going to write a blog about how algorithms* can be fair. But if 2016 was the year in which politics went crazy and decided that foreigners were the source of all problems, it looks like 2017 has already decided that the real problem is that foreigners are being assisted by evil algorithms.
So let’s be clear. In the current climate people who believe that data can make the world a better place need to stand up and say so. We can’t let misinformation and ludditism wreck the opportunities for the world going forwards.
And there is a world of misinformation!
For example, there is currently a huge amount of noise about algorithmic fairness (Nesta here , The Guardian here et al). I’ve already blogged a number of times about this (1, 2, 3), but decided (given the noise) that it was time to gather my thoughts together.
(Most of) Robocop’s prime directives (Image from Robocop 1987)
tldr: Don’t believe the hype, and don’t rule out things that are fairer than what happens at the moment.
Three key concepts
So here are some concepts that I would suggest we bear in mind:
- The real world is mainly made up of non-algorithmic decisions, and we know that these are frequently racist, sexist, and generally unfair.
- Intersectionality is rife, and in data terms this means multicolinearity. All over the place.
- No one has a particularly good definition of what fairness might look like. Even lawyers (although there are a number of laws about disproportionate impact even then it gets tricky).
On the other side, what are the campaigners for algorithmic fairness demanding? And what are their claims?
Claim 1: if you feed an algorithm racist data it will become racist.
At the most simple level yes. But (unlike in at least one claim) it takes more than a single racist image for this to happen. In fact I would suggest that generally speaking machine learning is not good at spotting weak cases: this is the challenge of the ‘black swan’. If you present a single racist example then ML will almost certainly ignore it. In fact, if racism is in the minority in your examples, then it will probably be downplayed further by the algorithm: the algorithm will be less racist than reality.
If there are more racist cases than non-racist cases then either you have made a terrible data selection decision (possible), or the real problem is with society, not with the algorithm. Focus on fixing society first.
Claim 2: algorithmic unfairness is worse/more prevalent than human unfairness
Algorithmic unfairness is a first world problem. It’s even smaller scale than that. It’s primarily a minority concern even in the first world. Yes, there are examples in the courts in the USA, and in policing. But if you think that the problems of algorithms are the most challenging ones that face the poor and BAME in the judicial system then you haven’t been paying attention.
Claim 3: to solve the problem people should disclose the algorithm that is used
Um, this gets technical. Firstly, what do you mean by the algorithm? I can easily show you the code used to build a model. It’s probably taken from CRAN or Github anyway. But the actual model? Well if I’ve used a sophisticated technique, a neural network or random forrest etc, it’s probably not going to be sensibly interpretable.
So what do you mean? Share the data? For people data you are going to run headlong into data protection issues. For other data you are going to hit the fact that it will probably be a trade secret.
So why not just do what we do with human decisions? We examine the actual effect. At this point learned judges (and juries, but bear in mind Bayes) can determine if the outcome was illegal.
And in terms of creation? Can we stop bad algorithms from being created? Probably not. But we can do what we do with humans: make sure that the people teaching them are qualified and understand how to make sensible decisions. That’s where people like the Royal Statistical Society can come in…
People will say “you’re ignoring real world examples of racism/sexism in algorithms”. Yes, I am. Plenty of people are commenting on those, and yes, they need fixing. But very very rarely do people compare the algorithm with the pre-existing approach. That is terrible practice. Don’t give human bias a free pass.
And most of those examples have been because of (frankly) beginners mistakes. Or misuse. None of which are especially unique to the world of ML.
So let’s stand up for algorithms, but at the same time remember that we need to do our best to make them fair when we deploy them, so that they can go on beating humans.
* no, I really can’t be bothered to get into an argument about what is, and what is not an algorithm. Let’s just accept this as shorthand for anything like predictive analytics, stats, AI etc…