Why the Prime Minister is wrong: the maths

 

Since this post was written we’ve had several new terrorist attacks in the UK, most recently in Manchester and London. These are horrific events, and it’s natural that people want to do something. In each case there has been a call for the internet companies to ‘do more’, without ever being clear exactly what that means. Perhaps it means taking down posts. Perhaps it means reporting suspects. But whatever stance you take, the maths is still the maths, which makes this post that I wrote in 2014 more valid than ever…

Yesterday the UK suggested that an unnamed internet company could have prevented the murder of a soldier by taking action based on a message sent by the murderer.

It’s assumed that the company in question was Facebook.

The problem is that the maths tells us that this is simply wrong. It couldn’t have, and the reason why takes us to a graveyard near Old Street.


Buried in Bunhill Fields is Thomas Bayes, a non-conformist preacher and part-time statistician who died in 1761. He devised a theorem (now known as Bayes’ Theorem) that helps us to understand the real probability of infrequent events, based on what are called prior probabilities. And (thank God) events like this murder are infrequent.
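In its simplest form the theorem looks like this – written here in terms of our problem, where “flagged” means a hypothetical model has marked someone as a terrorist:

P(terrorist | flagged) = P(flagged | terrorist) × P(terrorist) / P(flagged)

In words: the chance that a flagged person really is a terrorist depends not only on how good the test is, but on how rare terrorists are in the first place. That rarity is the prior probability, and it’s what the rest of this post is about.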

For the sake of argument let’s imagine that Facebook can devise a technical way to scan through messages and posts and determine if the content is linked to a terrorist action. This, in itself, isn’t trivial. It requires a lot of text analytics, understanding idiom, distinguishing a throwaway “I could kill them” from a genuinely murderous “I could kill them”, and so on.

But Facebook has some clever analysts, so let’s assume that they build a test. And let’s be generous: it’s 98% accurate. I’d be very happy if I could write a text analytics model that was that accurate, but they’re the best. Actually let’s make it 99% accurate. Heck! Let’s make it 99.9% accurate!

So now we should be 99.9% likely to catch events like this before they happen?

Wrong.

So let’s look at what Bayes and the numbers tell us.

The first number of interest is the actual number of terrorists in the UK. The number is certainly small – the murder itself is the only recent event of its kind.

But recently the Home Secretary, Theresa May, told us that 44 terrorist events have been stopped in the UK by the security services. I will take her at her word. Now let’s assume that this means there have been 100 actual terrorists. Again, you can move that number up or down, as you see fit, but it’s certainly true that there aren’t very many.

The second number is the number of people in the UK. There are (give or take) 60 million.

(I’m going to assume that terrorists are just as likely, or unlikely, as the population as a whole to use Facebook. This may not be true, but it’s a workable hypothesis.)

So what happens when I apply my very accurate model?

Well, the good news is that I identify all of my terrorists – or at least, on average, 99.9 of them. Pretty good.

But the bad news is that I also identify 60,000 non-terrorists as terrorists. Even a 99.9% accurate model is wrong 0.1% of the time, and 0.1% of 60 million people is 60,000. These are the false positives that my model throws up.

So the actual chance that a person flagged by the model really is a terrorist is just 0.17% – roughly 100 genuine hits buried in some 60,100 flags.
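If you want to check the arithmetic, here’s a minimal sketch in Python using the same illustrative assumptions as above – 100 terrorists, a population of 60 million, and a model that is 99.9% accurate on terrorists and non-terrorists alike. These are the made-up numbers from this post, not real data:

```python
# Rough check of the numbers above, assuming the model's accuracy applies
# equally to terrorists (true positive rate) and non-terrorists (true negative rate).

population = 60_000_000   # people in the UK, give or take
terrorists = 100          # assumed number of actual terrorists
accuracy = 0.999          # the very generous 99.9% model

true_positives = terrorists * accuracy                         # ~99.9 terrorists flagged
false_positives = (population - terrorists) * (1 - accuracy)   # ~60,000 innocents flagged

# Bayes' Theorem: probability that a flagged person really is a terrorist
p_terrorist_given_flag = true_positives / (true_positives + false_positives)

print(f"People flagged: {true_positives + false_positives:,.0f}")
print(f"Chance a flagged person is a terrorist: {p_terrorist_given_flag:.2%}")
# -> roughly 60,100 people flagged, and a 0.17% chance
```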

Now this is surely a huge advance over where we were – but imagine what would happen if we suddenly dropped 60,000 leads on the police. How would they be able to investigate? How would the legal system cope with generating these 60,000 warrants (yes, you would still need a warrant to see the material)?

And let’s be clear: if we’re more pessimistic about the model’s accuracy, things get worse, fast. A 99% accurate model (still amazingly good) drops the chance that a flagged person is really a terrorist to 0.017%. At 98% it’s 0.008%, and at a plausible 90% it would be 0.0015%. The maths is clear. Thank you, Rev. Bayes.
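Running the same sketch across those accuracies shows how quickly things collapse (again using the illustrative figures of 100 terrorists and 60 million people):

```python
population = 60_000_000
terrorists = 100

for accuracy in (0.999, 0.99, 0.98, 0.90):
    true_positives = terrorists * accuracy
    false_positives = (population - terrorists) * (1 - accuracy)
    p = true_positives / (true_positives + false_positives)
    print(f"{accuracy:.1%} accurate: {false_positives:,.0f} false positives, "
          f"{p:.4%} chance a flag is real")
```

The false positives balloon from about 60,000 to around six million, while the chance that any one flag is real shrinks towards nothing.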