West coast laggard = East coast majority = European innovator

I was going to think about some of the things that have really grabbed me this year, or which will be grabbing me next year… And then I realised that I’m not exactly as cutting edge as I thought I was.  But it’s certainly a matter of perspective, and that alone is a lesson:

Don’t assume that the people blogging, tweeting or apping are your core customers

I’m sure I’ll come back to that one over 2012, because it’s certainly an assumption that most companies seem to be making, partly because they have no way of linking social media to core data.  But anyway, you might like to laugh at my choices:

Twitter


I’m late to this one.  Really late.  But then it appears that most of Europe (unless you are a Bieber believer) is too.  I was introduced to the delights by @smfrogers of The Guardian whilst at #strataconf in Santa Clara.  Everyone in the audience was tweeting, and it provided direct feedback for the presenters and a verbatim report of what people found interesting.  Of course there was no indication that the speakers were aware of this, and what people found interesting was often not what the speakers expected (compare the meaningful pauses in presentations with the Twitter feed…).

I tried this myself in Europe, arranging for simultaneous tweeting and presenting.  But no one was paying attention.  Or alternatively they were all paying attention to me rather than my tweets, which may be a good thing.

The quantified self

Sometime in December I realised that my wife has abs.  This is a surprise, and has made me feel more than a little ashamed.  How did she achieve this?  Through self-awareness, helped by an app (myfitnesspal).  Essentially it’s an online database of foods and exercises, with the calories for each, that lets you track your consumption.  Boy is it addictive.  I’m hooked, and now realise that bread is both delicious and evil.

Having dipped my toe in the water I find I want more.  I want my finances to be easily and well organised.  My travel (that’s started with Tripit).  And I want it made easy. And to be on my iPhone.

Crowdsourcing


I’d never really thought very positively about crowdsourcing until I came across Waze.  It’s essentially a user-driven (that was a bad pun) GPS system.  Users contribute just by driving around, and can also feed in incidents such as crashes, improvements to maps etc…

I had tried using Twitter as a traffic monitor, and found it at least better than the radio, but when your motorways have the same naming convention as US military hardware you tend to get some odd results. #M16 anyone?  Waze is much better… even ignoring gamification, which it also has.

So, my big predictions for 2012: Twitter becoming mainstream in Europe, quantified self becoming meaningful, and crowdsourcing taking off.  And a happy New Year!


Making data science a sport – why Kaggle makes my blood boil

So, if you haven’t heard of them there is this company called Kaggle who run data mining competitions.  “Making data science a sport”.  Essentially they offer prizes (or help people who want to offer prizes) to people who can build the best predictive model on a set of data. 

Sometimes the prize is modest – say $10 000 – other times it is huge.  The biggest prize at the moment is $3M for health prediction.

The concept of a data mining prize most famously started with the Netflix challenge – where teams of people could compete to identify the best films to offer people.

I am a Data Miner.  I like building models.  I like predicting things from data.  So why does the thought of Kaggle make my blood boil?  And should it? 

Well, I spend a lot of time talking to people about data mining.  I explain what I think the best approach is (I strongly recommend the CRISP-DM methodology, link currently unavailable, but we’re working on it).  And the thing that hits me time and time again is that building the model – the bit that Kaggle has elevated to the pinnacle of the process – is just about the least important part of it all.  There are numerous techniques available, all of which are well understood.  There are many software vendors (SAS, IBM, Revolution, KXEN), and there are just as many opinions on the best algorithm.  But.

  • If you have the wrong business question* then no algorithm will fix your problem.
  • If you have the wrong data then no algorithm will fix your problem.
  • Conversely, if you have the right question and the right data it’s pretty hard to get it wrong even if you make a poor algorithm choice.

Does this make Kaggle bad?  No, it doesn’t.  But if this is making data science a sport, we might want to think about the value of putting large sums of money into things that are essentially of little real value.  And no, I don’t mean golf.

*Business question = important thing that you want to make predictions about.