Who hates Donald Trump the most?

By now you’re probably aware of the UK Government petition about Donald Trump. It has currently been signed by over 500,000 people, making it the most successful petition in the UK…

But who actually hates him most? The site conveniently provides a map, so you can see where people dislike him… but constituencies aren’t all equal, and you always run the risk of doing a heatmap of population densities:

[Image: heatmap comic. Source: xkcd.com/1138]

So I thought I’d explore further.  Firstly, which are the top constituencies by proportion of the population who hate Donald?

Constituency                       MP                     Signed  Percentage
Bethnal Green and Bow              Rushanara Ali MP         1874  1.50%
Bristol West                       Thangam Debbonaire MP    1779  1.43%
Brighton Pavilion                  Caroline Lucas MP        1405  1.36%
Hackney South and Shoreditch       Meg Hillier MP           1496  1.27%
Islington North                    Jeremy Corbyn MP         1284  1.24%
Cities of London and Westminster   Rt Hon Mark Field MP     1364  1.24%
Glasgow North                      Patrick Grady MP          876  1.22%
Hackney North and Stoke Newington  Ms Diane Abbott MP       1556  1.22%
Islington South and Finsbury       Emily Thornberry MP      1237  1.20%
Edinburgh North and Leith          Deidre Brock MP          1279  1.20%
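As a rough sketch of how a ranking like this can be produced (not the code actually used for this post), the idea is simply to divide signatures by population before sorting, so that big constituencies don't win by default. The function name and the three example rows below are made up for illustration.

```python
def rank_by_share(rows, top=10):
    """Rank constituencies by the percentage of the population who signed.

    rows: iterable of (name, signed, population) tuples.
    Returns (name, signed, share_percent) tuples, highest share first.
    """
    ranked = [
        (name, signed, 100.0 * signed / population)
        for name, signed, population in rows
        if population > 0  # skip rows with no usable population figure
    ]
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked[:top]


# Made-up figures, for illustration only (not real constituencies)
example = [
    ("Big City Central", 1500, 120000),
    ("Small Town", 400, 25000),
    ("Rural Shire", 150, 100000),
]

for name, signed, share in rank_by_share(example):
    print(f"{name}: {signed} signed, {share:.2f}%")
```

Note that "Big City Central" has the most signatures in the made-up data, but "Small Town" comes top once you normalise by population.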

OK, so we see some of the usual faces there: Jeremy Corbyn, Diane Abbott, Caroline Lucas. Maybe it’s just a political thing.  Let’s look at the bottom constituencies (again by the proportion who are keen on the bewigged one):

Constituency             MP                         Signed  Percentage
Blaenau Gwent            Nick Smith MP                  92  0.13%
Barnsley East            Michael Dugher MP             119  0.13%
Makerfield               Yvonne Fovargue MP            126  0.13%
Boston and Skegness      Matt Warman MP                129  0.13%
Wentworth and Dearne     Rt Hon John Healey MP         119  0.12%
Walsall North            Mr David Winnick MP           117  0.12%
Doncaster North          Rt Hon Edward Miliband MP     123  0.12%
Easington                Grahame Morris MP             102  0.12%
Kingston upon Hull East  Karl Turner MP                112  0.12%
Cynon Valley             Rt Hon Ann Clwyd MP            78  0.11%

Well, I wouldn’t (necessarily) have expected to see Ed Miliband’s and Ann Clwyd’s constituencies there.

So let’s go slightly more data sciency. What is the correlation between Labour vote and proportion hating Donald?

Rather than just looking at that, let’s look at a whole range of stuff:

[Figure: correlation matrix]

Woah! What was that?  Well that’s the correlations between a whole load of stuff and the percentage who hate Donald. Percentage is the second column (or row), and a blue box means a positive correlation, and a red box a negative one.

So let’s look at some of the more interesting ones, and sort them from highest to lowest. Remember that these are at a Constituency level.

  • PopulationDensity   0.64
  • Green               0.55
  • FulltimeStudent     0.53
  • Muslim              0.37
  • LD                  0.19
  • Lab                 0.07
  • Con                -0.11
  • Ethnicity White    -0.48
  • Christian          -0.60
  • UKIP               -0.61
  • Retired            -0.67
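For anyone who wants to see what sits behind a single number in that list, here is a minimal sketch of a Pearson correlation between one constituency-level variable and the percentage who signed. I'm assuming Pearson is the measure in play (it is the usual default), and the two short lists are invented values, not the real data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented values: one entry per (hypothetical) constituency
retired_share = [10, 15, 20, 25, 30]        # % of residents who are retired
signed_share = [1.2, 0.9, 0.7, 0.4, 0.2]    # % who signed the petition

print(round(pearson(retired_share, signed_share), 2))
```

A value near -1 means the two move in opposite directions, near +1 means they move together, and near 0 means there is no linear relationship to speak of.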

So it would seem that the strongest correlation isn’t with Muslim populations, it’s actually strongest with built up areas, then with 2015 Green voters, then with full time students, and only then with Muslim areas.

And it’s probably not surprising that those least likely to hate Donald are constituencies with lots of retired people, a strong UKIP presence (some immigration is OK then, it would seem), a large number of people identifying as Christian, and white people.

Taking it one small step further, and running our old friend linear regression, we see a model like this (Labour and Con voters removed to avoid tons of correlation, LD voters removed out of sympathy):

Term         Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     0.557      0.0705     7.89  E-14
Green          0.0201      0.0016    12.78  < 2e-16
Muslim         0.0036      0.0012     3.04  0.0025
Density        0.0035      0.0003    12.54  < 2e-16
White          0.0033      0.0007     4.78  2.2E-6
Student        0.0023      0.0011     2.05  0.041
Christian     -0.0021      0.0007    -2.97  0.0031
Lab           -0.0036      0.0003   -11.14  < 2e-16
UKIP          -0.0118      0.0007   -16.85  < 2e-16
Retired       -0.0173      0.0020    -8.55  < 2e-16

A slightly different story – but looking at the key standouts: if you want to find a constituency that really hates Donald, first look for Green voters, then a densely populated area, with quite a sizeable Muslim and white community, but keep away from those UKIP voters, especially the retired ones.
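To make the table a bit more concrete, here is a small sketch of how a fitted linear model like this turns constituency characteristics into a predicted signing percentage: intercept plus each coefficient times its variable. The coefficients are copied from the table above; the function name is my own, and because the units each predictor was fitted in aren't stated here, the inputs are placeholders rather than real constituency values.

```python
# Coefficients copied from the regression table above
INTERCEPT = 0.557
COEFFICIENTS = {
    "Green": 0.0201,
    "Muslim": 0.0036,
    "Density": 0.0035,
    "White": 0.0033,
    "Student": 0.0023,
    "Christian": -0.0021,
    "Lab": -0.0036,
    "UKIP": -0.0118,
    "Retired": -0.0173,
}


def predict(features):
    """Predicted signing percentage for one constituency.

    features: dict mapping predictor name to its value, in the same
    (unstated) units the model was fitted on. Missing predictors count as 0.
    """
    return INTERCEPT + sum(
        coef * features.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )


# Placeholder inputs: the effect of raising one predictor while holding the rest fixed
baseline = {"Green": 2.0, "UKIP": 15.0, "Retired": 15.0}
greener = dict(baseline, Green=12.0)

print(round(predict(greener) - predict(baseline), 3))  # 10 units of Green * 0.0201
```

Each coefficient is the change in the predicted percentage for a one-unit change in that variable with everything else held constant, which is why the sign on White can differ from its raw correlation.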

*Data for the petition harvested at about noon, 11 December 2015
** Other data from ONS based on the 2011 Census
*** Featured image Donald Trump by Gage Skidmore
**** Northern Ireland doesn’t provide breakdown of petition numbers

The day the (medical) data broke free…


Today is a good day for data – at least in healthcare. At last the data from the NHS is being set free.

For my international colleagues and friends it’s worth pointing out some things about the NHS.  The UK National Health Service* is actually a very large and complex organisation that cares for health needs. The main arms are the GP services and the Hospital services.  GPs are self-employed and effectively contracted by the NHS. Hospitals are islands unto themselves within regional groupings.  Above all lie funding and commissioning structures. Sounds complex? From a data perspective it certainly is. The data that the system generates is often handwritten, frequently held in isolated systems, and is barely available for joined-up services, never mind research.

On the positive side, it’s free** at point of use, and generally does a good job.

There have been signs for a while that the NHS has been starting to think about data.

  • Dr Carl Reynolds (@drcjar) at http://openhealthcare.org.uk/ has been leading the way on doing good things with health data, including running NHS hack days.  If you want to get involved, the next one is in Liverpool on the 22-23 September.
  • The UK set up the BioBank project, which aims to provide a longitudinal study of people who aren’t necessarily ill.  If you think about it, it’s fairly obvious that most people who go to their doctor are ill – BioBank aims to understand the factors in their lives that were the same as, or different from, other people’s before and after they were ill.
  • Dr Ben Goldacre (@bengoldacre) has been leading a crusade to get clinical research data (even from trials that are abandoned or not published) into the public domain so that it can be used to compare outcomes.

But now the Government has gone much, much further and has created the Clinical Practice Research Datalink. In addition to having a funky website this aims to bring together data from the NHS so that this vast set of data can be used to improve health outcomes.

Of course there is a very, very big cloud hanging over this. How do you anonymise patient data so that it is still useful?  Simply removing names and addresses won’t deal with the issue, as Ross Anderson of Cambridge University identifies (the Guardian again – don’t say they aren’t fair and balanced!).
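To illustrate why stripping names and addresses isn't enough, here is a minimal sketch that checks how small the groups get when you combine a few innocuous-looking fields. This is the idea behind what is usually called k-anonymity, which is my choice of illustration rather than anything CPRD or Ross Anderson specifically describes. The field names and records are invented.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same values
    for the given fields (the 'k' in k-anonymity).

    A result of 1 means at least one person is unique on those fields
    alone, even though no name or address is stored.
    """
    if not records:
        raise ValueError("no records to check")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Invented, already "anonymised" records: no names, no addresses
records = [
    {"postcode_district": "CB1", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"postcode_district": "CB1", "birth_year": 1970, "sex": "F", "diagnosis": "diabetes"},
    {"postcode_district": "CB1", "birth_year": 1985, "sex": "M", "diagnosis": "asthma"},
    {"postcode_district": "CB2", "birth_year": 1985, "sex": "M", "diagnosis": "flu"},
]

print(smallest_group(records, ["sex"]))
print(smallest_group(records, ["postcode_district", "birth_year", "sex"]))
```

On sex alone everyone hides in a group of two, but add postcode district and birth year and some records become unique. Anyone who knows those three facts about a neighbour can then read off the diagnosis.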

But I think, on balance, I disagree with Ross. I’ve come to the conclusion that we can’t rely on privacy, and that the exchange of a guarantee on privacy for free medical care is probably reasonable in itself. Especially as the guarantee isn’t really worth much these days.  When you add to this the potential benefits to research, then the answer is even more obvious. How many people would be happy to give up their privacy if they knew that one day they, or their kids, might be relying on the treatment that resulted?

*Actually there are three, NHS England and Wales, NHS Scotland, and NHS Northern Ireland, but let’s assume they are the same thing for this argument. NHS E&W is by far the largest.

**Nearly.

LonData III: the MoshiMonsters paradox

Are you familiar with MoshiMonsters?  It is an online pet type game/junior social network developed by MindCandy.  If not, then you probably don’t have kids aged between 6 and 12…

At LonData III we were lucky enough to have a presentation from Toby Moore, CTO of MoshiMonsters, who took us through the world of data that the game generates, and how MindCandy got to where they are.

Toby took us through their aim of moving from no data, through big data, right data, predictive data, and eventually to strategic data.

At the beginning of their story they had lots of data, but no ETL, no reporting, and no analysis.  They realised they had to move forwards, and put in place a technology stack of:

  • MS SQL Server as an ETL platform
  • Hadoop for data storage
  • MS SQL for analysis/reporting

This still didn’t resolve their problems, and so they are moving to QlikView to give users direct access to their data.

So this is a Big Data play, right? Lots of data? Hadoop?  It must be!

Is this Big Data, or just big data?

There are lots of things that are great about this story – and let me be clear that none of my comments in any way take away from the amazing success of MoshiMonsters…

I like:

  • The fact that data is so important to them
  • The willingness to give end users direct access to data

But I think it fails to be Big Data because

  • They don’t try to experiment using the data
  • They don’t do predictive analysis (although they use six-sigma statistical approaches to identify issues)
  • There is very limited analysis
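For readers who haven't met the six-sigma style of check mentioned above, one common form is a control chart: take a stable baseline, then flag anything that falls more than three standard deviations from its mean. The sketch below is only my guess at the general shape of such a check, not a description of what MindCandy actually runs, and the metric values are invented.

```python
from statistics import mean, stdev


def out_of_control(baseline, new_values, sigmas=3.0):
    """Return the new values lying outside mean +/- sigmas * stdev of the baseline."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline period
    lower = centre - sigmas * spread
    upper = centre + sigmas * spread
    return [value for value in new_values if value < lower or value > upper]


# Invented daily metric (say, thousands of logins) over a quiet baseline period
baseline = [100, 102, 98, 101, 99, 100, 103, 97]

# Three later days: two ordinary, one clearly unusual
print(out_of_control(baseline, [101, 95, 120]))
```

This tells you that something unusual happened, which is useful for spotting issues, but it says nothing about why, or about what would happen if you changed something. That is the gap between monitoring and the experimental, predictive work I'd associate with Big Data.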

Data kills Creativity? Really?

In fact the most worrying issue was a C. P. Snow-like divide: on the one side Creativity.  On the other Data.

This came up several times in the presentation – they would never burden their creative staff with data. They don’t think that segmenting their customers, or analysing their behaviour is the way to go. They don’t test out alternative strategies on the website.

Partly this is because they are extremely sensitive to the nature of their customers (young children) who aren’t the same as the people paying the bills (adults). They say they try to avoid pressuring their customers out of the freemium and into the paying segments*.

I’ve got to say, I really don’t believe this divide to be true. Yes, an anally retentive approach to analysis might kill creativity, but anyone that anal probably doesn’t understand the limits of their analysis. Analysis leaves many, many grey areas.  And on the other hand creativity cannot work in a vacuum.

I came away somewhat disturbed by their approach, whilst still being in admiration of their success and drive. I don’t believe that Big Data approaches can be separated from creativity!

The conclusion:

  1. Is Hadoop necessary for Big Data? Possibly, but it isn’t sufficient.
  2. Is volume necessary for Big Data? Not on an absolute scale, although it helps.
  3. Is attitude necessary for Big Data? Yes, absolutely!
  4. Is it creative? Hell Yes!

 

 

Some thoughts on the Aprimo Offer Exchange

Firstly, let me say that this is not intended as a screw you analysis: the Aprimo Offer Exchange is a good idea, produced on a limited budget by some cool people. Instead this is intended to ask important questions about it and to try and suggest opportunities for improvement…

What is the Offer Exchange? Well those of us at Teradata Partners are being given a chance to sign up and then like various companies on Facebook. When we do this we are sent emails/in app offers from those companies. So far so good: the offers are varied, and some are pretty good. They come from a wide range of companies including Starbucks and local pubs (so useful!).

But it has its problems:

1: Targeted Marketing != Spam

I’m getting spammed. There is no other word for it. I had thirty emails on the first day. Now, yes, I consented, but I expect marketing organisations to treat my consent with respect!

2: Confusing offers

I’m getting multiple offers from the same company at the same time…

3: No sign of a measurement ability

This is my main gripe: there is no obvious way that the companies concerned can identify that an offer sent to me has been responded to. How can the targeting improve if there is no measurement? How can you run A/B tests?
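To show what measurement would buy you, here is a minimal sketch of the sort of A/B comparison that becomes possible once redemptions can be tied back to the offer that was sent. It uses a standard two-proportion z-test; the function name and the counts are invented, and nothing here reflects how the Offer Exchange actually works.

```python
from math import erf, sqrt


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test comparing response rates of offers A and B.

    conv_*: number of people who responded; n_*: number who were sent the offer.
    Returns (z, two_sided_p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in responses; test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Invented counts: offer A redeemed by 50 of 1000 recipients, offer B by 80 of 1000
z, p = ab_test(50, 1000, 80, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Without a way to count who responded, neither the 50 nor the 80 exists, and there is nothing to compare.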

So for improvements: sort out measurement, model responses, segment and target, limit contacts!