In defence of algorithms

I was going to write a blog about how algorithms* can be fair. But if 2016 was the year in which politics went crazy and decided that foreigners were the source of all problems, it looks like 2017 has already decided that the real problem is that foreigners are being assisted by evil algorithms.

So let’s be clear. In the current climate, people who believe that data can make the world a better place need to stand up and say so. We can’t let misinformation and Luddism wreck the opportunities for the world going forwards.

And there is a world of misinformation!

For example, there is currently a huge amount of noise about algorithmic fairness (Nesta here, The Guardian here, et al.). I’ve already blogged a number of times about this (1, 2, 3), but decided (given the noise) that it was time to gather my thoughts together.


(Most of) Robocop’s prime directives (Image from Robocop 1987)

tldr: Don’t believe the hype, and don’t rule out things that are fairer than what happens at the moment.

Three key concepts

So here are some concepts that I would suggest we bear in mind:

  1. The real world is mainly made up of non-algorithmic decisions, and we know that these are frequently racist, sexist, and generally unfair.
  2. Intersectionality is rife, and in data terms this means multicollinearity. All over the place.
  3. No one has a particularly good definition of what fairness might look like. Not even lawyers: there are a number of laws about disproportionate impact, but even then it gets tricky.
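To make the multicollinearity point concrete, here is a minimal sketch in Python. The data is invented purely for illustration, and the "postcode" proxy is a hypothetical example of my own. It shows that a seemingly neutral variable can be correlated with a protected attribute, so simply dropping the protected column does not remove its signal.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: 1 = member of a protected group, 0 = not.
protected = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical proxy variable, e.g. "lives in postcode area X" (1 = yes).
proxy = [1, 1, 1, 0, 0, 0, 0, 1]

r = pearson(protected, proxy)

# With only two predictors, the variance inflation factor is 1 / (1 - r^2).
vif = 1 / (1 - r ** 2)

print(f"correlation = {r:.2f}, VIF = {vif:.2f}")
```

The higher the correlation, the harder it is to separate the effect of the proxy from the effect of the protected attribute, which is one reason "just remove the sensitive column" is rarely enough.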

On the other side, what are the campaigners for algorithmic fairness demanding? And what are their claims?

Claim 1: if you feed an algorithm racist data it will become racist.

At the simplest level, yes. But, contrary to at least one claim that has been made, it takes more than a single racist image for this to happen. In fact I would suggest that generally speaking machine learning is not good at spotting weak cases: this is the challenge of the ‘black swan’. If you present a single racist example then ML will almost certainly ignore it. In fact, if racism is in the minority in your examples, then it will probably be downplayed further by the algorithm: the algorithm will be less racist than reality.
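A deliberately crude sketch of that argument, in Python. Everything here is an assumption for illustration: the history is made up, and the "model" is just a majority rule per group rather than any real ML technique. A model that outputs probabilities instead of hard decisions would tend to reproduce the 10% rate rather than erase it, so treat this as the simplest case only.

```python
from collections import Counter

def fit_majority_rule(records):
    """Learn, for each group, the most common decision in the training data."""
    counts = {}
    for group, decision in records:
        counts.setdefault(group, Counter())[decision] += 1
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}

def reject_rate(decisions):
    """Fraction of decisions that are rejections."""
    return sum(d == "reject" for d in decisions) / len(decisions)

# Hypothetical history: everyone is equally qualified, but 10% of
# group B were rejected because of biased human decisions.
history = (
    [("A", "approve")] * 100
    + [("B", "approve")] * 90
    + [("B", "reject")] * 10
)

model = fit_majority_rule(history)

# Reject rate for group B in the historical (human) decisions.
observed_b = reject_rate([d for g, d in history if g == "B"])
# Reject rate for the same people if the learned rule decided instead.
predicted_b = reject_rate([model[g] for g, _ in history if g == "B"])

print(f"human reject rate for B: {observed_b:.0%}")
print(f"model reject rate for B: {predicted_b:.0%}")
```

Flip the proportions so that rejections are the majority for group B and the same rule learns to reject everyone in that group, which is the case discussed next.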

If there are more racist cases than non-racist cases then either you have made a terrible data selection decision (possible), or the real problem is with society, not with the algorithm. Focus on fixing society first.

Claim 2: algorithmic unfairness is worse/more prevalent than human unfairness

Algorithmic unfairness is a first world problem. It’s even smaller scale than that. It’s primarily a minority concern even in the first world. Yes, there are examples in the courts in the USA, and in policing. But if you think that the problems of algorithms are the most challenging ones that face the poor and BAME in the judicial system then you haven’t been paying attention.

Claim 3: to solve the problem people should disclose the algorithm that is used

Um, this gets technical. Firstly, what do you mean by the algorithm? I can easily show you the code used to build a model. It’s probably taken from CRAN or GitHub anyway. But the actual model? Well, if I’ve used a sophisticated technique (a neural network, a random forest, etc.) it’s probably not going to be sensibly interpretable.

So what do you mean? Share the data? For people data you are going to run headlong into data protection issues. For other data you are going to hit the fact that it will probably be a trade secret.

So why not just do what we do with human decisions? We examine the actual effect. At this point learned judges (and juries, but bear in mind Bayes) can determine if the outcome was illegal.
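Examining the actual effect does not require seeing inside the model at all. As a rough sketch in Python: compare the rate of favourable outcomes between groups. The outcome data is hypothetical, and the 0.8 cut-off is an assumption borrowed from the "four-fifths" rule of thumb in US employment guidance. It is a screening heuristic, not a legal test, and the right threshold (if any) is for the courts to argue about.

```python
def selection_rate(outcomes):
    """Share of favourable outcomes (1 = favourable, 0 = unfavourable)."""
    return sum(outcomes) / len(outcomes)

def impact_ratio(group_outcomes, reference_outcomes):
    """Selection rate of a group relative to a reference group."""
    return selection_rate(group_outcomes) / selection_rate(reference_outcomes)

# Hypothetical decisions, however they were made (human or model).
reference = [1] * 60 + [0] * 40  # 60% favourable
group = [1] * 42 + [0] * 58      # 42% favourable

ratio = impact_ratio(group, reference)

# Assumed screening threshold (the "four-fifths" rule of thumb).
THRESHOLD = 0.8
flagged = ratio < THRESHOLD

print(f"impact ratio = {ratio:.2f}, flagged for review = {flagged}")
```

The same check works on the pre-existing human process, which is rather the point: run it on both and compare.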

And in terms of creation? Can we stop bad algorithms from being created? Probably not. But we can do what we do with humans: make sure that the people teaching them are qualified and understand how to make sensible decisions. That’s where people like the Royal Statistical Society can come in…

Final thoughts

People will say “you’re ignoring real world examples of racism/sexism in algorithms”. Yes, I am. Plenty of people are commenting on those, and yes, they need fixing. But very very rarely do people compare the algorithm with the pre-existing approach. That is terrible practice. Don’t give human bias a free pass.

And most of those examples have been because of (frankly) beginner’s mistakes. Or misuse. Neither of which is unique to the world of ML.

So let’s stand up for algorithms, but at the same time remember that we need to do our best to make them fair when we deploy them, so that they can go on beating humans.


* No, I really can’t be bothered to get into an argument about what is, and what is not, an algorithm. Let’s just accept this as shorthand for anything like predictive analytics, stats, AI etc…



STS forum. The strangest technology conference you’ve never heard of

At the beginning of October I was in Kyoto (yes, I can hear the tiny violins) attending the STS Forum on behalf of my employers.

What is the STS Forum?  Well this was the 12th meeting of a group focused on linking universities, technology companies, and governments to address global problems. The full name is Science and Technology in Society.

And it’s a really high level kind of thing. The opening was addressed by three prime ministers. There are more university vice-chancellors/provosts/rectors than you could imagine.  If you aren’t a professor then you’d better be a minister. No Nobel prize?  Just a matter of time.

So it’s senior. But is it about technology? Or at least the technology that I’m familiar with?

PM Abe addresses STS Forum

The usual players?

Well the first challenge is the sponsors.  A bunch of big companies. Huawei, Lockheed Martin, Saudi Aramco, Toyota, Hitachi, NTT, BAT, EDF.

All big, all important (I leave it up to you to decide if they’re good).  But are these really who you’d expect? Where are IBM?  Oracle? SAP? Even Siemens? Never mind Microsoft, Apple, or (dare I say it) LinkedIn, Facebook etc…

I daren’t even mention the world of big data: MongoDB, Cloudera or others.

Panels and topics

Then there are the panelists.  90% male. (In fact the median number of women on a panel is zero).  They are largely old.  None of them seem to be ‘real world’ experts – most are in Government and academia.

The topics are potentially interesting, but I’m nervous about the big data one. It’s not clear that there are any actual practitioners here (I will feed back later!)

Attendees and Ts

I have never been to a technology conference that is so suited. Even Gartner has a less uptight feel. Over 1000 people and not a single slogan. Wow. I feel quite daring wearing a pink shirt. And no tie.

What could they do?

I’m not saying it’s a bad conference. But I’m not sure it’s a technology conference, and I’m 100% certain it’s not a tech conference.

If they want it to be a tech conference then they need to take some serious action on diversity (especially gender and age)*. They also need to think about inviting people who feel more comfortable in a T-shirt. The ones with slogans. And who know what xkcd is.

And this seems to be the biggest problem: the conference seems to highlight the gulf between the three components that they talk about (the triple helix) – universities, government, big business – and the markets where the theory hits the road. The innovators, the open source community, the disruptors.

On to the Big Data session

Well that was like a flashback to 2013. Lots of Vs, much confusion. Very doge.

It wasn’t clear what we were talking about big data for. Plenty of emphasis on HPC but not a single mention of Hadoop.

Some parts of the room seemed concerned about the possible impact of big data on society. Others wanted to explore if big data was relevant to science, and if so, how.  So, a lot of confusion, and not a lot of insight…