Why the Prime Minister is wrong: the maths

 

Since this post was written there have been several further terrorist attacks in the UK, most recently in Manchester and London. These are horrific events, and it’s natural that people want to do something. In each case there has been a call for the internet companies to ‘do more’, without it ever being clear exactly what that means. Perhaps it means taking down posts. Perhaps it means reporting suspects. But whatever stance you take, the maths is still the maths, which makes this post that I wrote in 2014 more valid than ever…

Yesterday the UK government suggested that an unnamed internet company could have prevented the murder of a soldier by taking action based on a message sent by the murderer.

It’s assumed that the company in question was Facebook.

The problem is that the maths tells us that this is simply wrong. It couldn’t have, and the reason why takes us to a graveyard near Old Street.


Buried in Bunhill Fields is Thomas Bayes, a non-conformist preacher and part-time statistician who died in 1761. He devised a theorem (now known as Bayes’ Theorem) that helps us to understand the real probability of infrequent events, based on what are called prior probabilities. And (thank God) events like this murder are infrequent.
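
For our purposes the theorem can be stated like this (a standard statement of it, with ‘flagged’ standing in for whatever test we apply to a person’s messages):

    P(terrorist | flagged) = P(flagged | terrorist) × P(terrorist) / P(flagged)

In words: the chance that a flagged person really is a terrorist depends not only on how good the test is, but also on how rare terrorists are in the first place – the prior probability.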

For the sake of argument let’s imagine that Facebook can devise a technical way to scan through messages and posts and determine if the content is linked to a terrorist action. This, in itself, isn’t trivial. It requires a lot of text analytics: understanding idiom, distinguishing an exasperated “I could kill them” from a literal “I could kill them”, and so on.

But Facebook has some clever analysts, so let’s assume that they build a test. And let’s be generous: it’s 98% accurate. I’d be very happy if I could write a text analytics model that was that accurate, but they are the best. Actually, let’s make it 99% accurate. Heck! Let’s make it 99.9% accurate! (By ‘accurate’ I mean that it correctly classifies 99.9% of terrorists’ messages and 99.9% of everyone else’s.)

So now we should be 99.9% likely to catch events like this before they happen?

Wrong.

So let’s look at what Bayes and the numbers tell us.

The first number of interest is the actual number of terrorists in the UK. The number is certainly small – this murder is the only recent event of its kind.

But recently the Home Secretary, Theresa May, told us that 44 terrorist events have been stopped in the UK by the security services. I will take her at her word. Now let’s assume that this means there have been 100 actual terrorists. Again, you can move that number up or down, as you see fit, but it’s certainly true that there aren’t very many.

The second number is the number of people in the UK. There are (give or take) 60 million.

(I’m going to assume that terrorists are just as likely, or unlikely, as the population as a whole to use Facebook. This may not be true, but it’s a workable hypothesis.)

So what happens when I apply my very accurate model?

Well, the good news is that I identify all of my terrorists – or at least 99.9 of them on average (99.9% of the 100). Pretty good.

But the bad news is that I also identify 60,000 non-terrorists as terrorists (0.1% of 60 million). These are the false positives that my model throws up.

So the chance that a person identified by the model really is a terrorist is just 0.17%.
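
That figure is just Bayes’ Theorem with the numbers above plugged in:

    P(terrorist | flagged) ≈ 99.9 / (99.9 + 60,000) = 99.9 / 60,099.9 ≈ 0.17%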

Now this is surely a huge advance over where we were – but imagine what would happen if we suddenly dropped 60,000 leads on the police. How would they be able to investigate? How would the legal system cope with generating these 60,000 warrants (yes, you would still need a warrant to see the material)?

And let’s be clear: if we’re more pessimistic about the model’s accuracy, things get worse fast. A 99% accurate model (still amazingly good) drops the chance that a flagged person is actually a terrorist to 0.017%. At 98% it’s 0.008%, and at a plausible 90% it would be 0.0015%. The maths is clear. Thank you Rev Bayes.
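
If you want to check these figures yourself, here is a minimal sketch in Python – not anything Facebook actually runs, just the arithmetic above, assuming 100 terrorists among 60 million people and that the model’s quoted accuracy is both its true positive rate and its true negative rate:

    # Reproduce the calculation: P(terrorist | flagged) for various model accuracies.
    # Assumptions (as in the post): 100 terrorists among 60 million people, and the
    # model's accuracy is both its true positive and true negative rate.
    population = 60_000_000
    terrorists = 100

    for accuracy in (0.999, 0.99, 0.98, 0.90):
        true_positives = terrorists * accuracy
        false_positives = (population - terrorists) * (1 - accuracy)
        p_flagged_is_terrorist = true_positives / (true_positives + false_positives)
        print(f"{accuracy:.1%} accurate: {false_positives:,.0f} false positives, "
              f"P(terrorist | flagged) = {p_flagged_is_terrorist:.4%}")

However good the model, the rarity of terrorists relative to the general population dominates the result.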


6 comments on “Why the Prime Minister is wrong: the maths”

  1. On Newsnight last night (BBC2 on UK TV – I don’t think it’s available online, sadly)…

    There was a politician trying to explain the thinking a bit more. He said Facebook already has a mechanism to find people who seem to be discussing terrorism, which they use to block them. Presumably this is some combination of the Bayesian filtering you describe to whittle down the numbers, combined with human checks (I suppose – unless Facebook is confident enough in its filtering to block immediately without human checks). So his point was that Facebook should report such incidents to the relevant authorities. He’s saying they should use the same mechanism to whittle down the numbers *which already exists*, whether that’s automated filtering and/or human checks.

    To me that seems like a reasonable point. That’s not to say I agree with the suggestion. There are serious freedom-of-speech and privacy implications to governments trying to claim such snooping powers, but… well, maybe that’s what the debate should be about, not whether Facebook can technically do it.

    • Thanks for the comment. But imagine what happens when MI5 – who already fail to cope with their existing 2000 cases – have to deal with an additional 60,000 or more? And there is a real civil rights issue because in most cases there will be no evidence of a crime taking place…

  2. Enjoyed the post, thanks. I know it doesn’t affect the theory but the 60m assumption doesn’t stack up. I don’t use Facebook so you clearly need to deduct one 🙂 (I’m not a terrorist either btw)

    • An excellent point: many people don’t use Facebook. But the same may be true for terrorists! I’ve made a (big) assumption that terrorists and non-terrorists are pretty similar. When you think about it this isn’t a bad assumption: after all if terrorists were clearly different then you wouldn’t need their Facebook activities to spot them. However, you could argue with it. Terrorists are likely to be younger than the population as a whole, and younger people are more likely to be on Facebook than older people. On the other hand, terrorists have good reasons not to put stuff on Facebook! So, yes, it is an assumption…

  3. Duncan, nice article and really great food for thought!

    I am not very good at maths – so I don’t really understand the Bayes theorem, but I fear that you might be out by a significant factor – as it is not the number of members of the UK population which is important, but the number of Facebook posts (which will be made by a smaller percentage of the population).

    So, from various sources, it is widely held that 24M in the UK visit Facebook each day. Let’s just say that they only visit once a day (probably on the low side) and that only 25% of the 24M unique visits result in a post. That would be 6,000,000 UK Facebook posts to analyse in a day.

    If you can identify non-terrorist posts with 99.9% accuracy, then that’s 6,000 posts that have to be checked for potentially terrorist content per day. Now, each of those potential terrorist posts doesn’t necessarily represent one unique person – let’s just say that 50% of those posts have duplicate posters. That’s 3,000 unique individuals to be checked out each day, or 1,095,000 people per year.

    Now, that figure could drop a bit, because some people would make dubious posts on more than one day per year, and the penetration of Facebook in the UK is reckoned to be about 58% – so that’s 42% of the population that you would never have to check. But even if you halved the number of people you need to check, I reckon you are looking at something more like 500,000 warrants a year.

    • Thanks for the thoughts. The nice thing about the maths is that the only things that matter are the accuracy of the test and the ratio of terrorists to non-terrorists. As such we can debate what the right numbers might be. I deliberately didn’t bring time into it – although it adds a different dimension. I also suspect that your numbers actually need to be revised upwards, for two reasons: posts aren’t the only way to interact with Facebook (likes, follows, comments on posts, messages), and Facebook isn’t the only platform. If the Government wants Facebook to do this, we have to assume that they also want Twitter, YouTube, Google (more for Gmail than anything else) etc. This hugely increases the workload that MI5/6 and the courts have to deal with.

      My final thought is actually “500,000 additional warrants a year? Wow!” – and checking 42% of the population???? Well, I don’t think those numbers are right, but if they were then the entire court and police system would break down.
