LonData III: the MoshiMonsters paradox

Are you familiar with MoshiMonsters? It is an online pet game and junior social network developed by MindCandy. If not, then you probably don’t have kids aged between 6 and 12…

At LonData III we were lucky enough to have a presentation from Toby Moore, CTO of MoshiMonsters, who took us through the world of data that the game generates, and how MindCandy got to where they are.

Toby described their journey from no data, through big data and right data, to predictive data and, eventually, strategic data.

At the beginning of their story they had lots of data, but no ETL, no reporting, and no analysis.  They realised they had to move forwards, and put in place a technology stack of:

  • MS SQL Server as an ETL platform
  • Hadoop for data storage
  • MS SQL for analysis/reporting

This still didn’t resolve their problems, and so they are moving to QlikView to give users direct access to their data.

So this is a Big Data play, right? Lots of data? Hadoop?  It must be!

Is this Big Data, or just big data?

There are lots of things that are great about this story – and let me be clear that none of my comments in any way take away from the amazing success of MoshiMonsters…

I like:

  • The fact that data is so important to them
  • The willingness to give end users direct access to data

But I think it fails to be Big Data because:

  • They don’t try to experiment using the data
  • They don’t do predictive analysis (although they use six-sigma statistical approaches to identify issues)
  • There is very limited analysis

Data kills Creativity? Really?

In fact the most worrying issue was a C. P. Snow-like divide: on one side, Creativity; on the other, Data.

This came up several times in the presentation – they would never burden their creative staff with data. They don’t think that segmenting their customers, or analysing their behaviour is the way to go. They don’t test out alternative strategies on the website.

Partly this is because they are extremely sensitive to the nature of their customers (young children) who aren’t the same as the people paying the bills (adults). They say they try to avoid pressuring their customers out of the freemium and into the paying segments*.

I’ve got to say, I really don’t believe this divide is real. Yes, an anally retentive approach to analysis might kill creativity, but anyone that anal probably doesn’t understand the limits of their analysis. Analysis leaves many, many grey areas. On the other hand, creativity cannot work in a vacuum.

I came away somewhat disturbed by their approach, whilst still being in admiration of their success and drive. I don’t believe that Big Data approaches can be separated from creativity!

The conclusion:

  1. Is Hadoop necessary for Big Data? Possibly, but it isn’t sufficient.
  2. Is volume necessary for Big Data? Not on an absolute scale, although it helps.
  3. Is attitude necessary for Big Data? Yes, absolutely!
  4. Is it creative? Hell yes!




Better regulation – wanted everywhere!

It seems that the UK is leading the world in one area: our weak and ineffective regulators.

Let’s take a look at the roll-call of shame:

The Financial Services Authority

Area of responsibility: Making sure that banks play by the rules.

Area of incompetence: Disappointingly, they actually seem to be doing good work on payment protection insurance. On the other hand, they did allow a global financial meltdown on their watch, which isn’t the best advert for competence.

Self regulation?  No, set up by government

The Press Complaints Commission

Area of responsibility: Keeping the written press to the straight and narrow.

Area of incompetence: If you’ve only been exposed to these jokers through the Leveson Inquiry, you’re lucky. They were essentially a private club of editors making sure that other editors could do exactly what they wanted without fear of censure. They guaranteed their own ineffectiveness through breathtakingly narrow terms of reference, and through stunningly weak punishments in the one-in-a-million* chance that they ruled against a paper.

Self regulation? At its best.

The Advertising Standards Authority

Area of responsibility: Press, display and web advertising. It should be honest.

Area of incompetence: I’ve dealt with them on a couple of occasions, and some interesting things stand out. They are happy to close complaints without reference to the complainants (a neat trick that my bank should consider!). They believe that a term written in 5pt type on page 13 of a contract is as visible as thirty-foot-high advertising. And they use the blanket excuse that “people expect adverts to lie, so lies are OK in adverts”.

Self regulation? Independent

The Information Commissioner’s Office

Area of responsibility: data protection

Area of incompetence: In this age of data you would think that the ICO would be one of the most important regulators there is. Private companies and public organisations can access and hold vast amounts of personal data – and the use of this data can have huge impacts on people’s lives.  Yet the ICO seems reluctant to take on big companies.

They do fine public bodies, but the irony, of course, is that those fines simply come back to us, the public. On the one occasion I can remember the ICO fining a private company, it did so after it knew the company had ceased trading (i.e. there was no-one left to actually pay the fine).

To make matters worse, the ICO is notoriously reluctant to give advice about the legality of an action before it’s taken. Why does this matter? Because the landscape is constantly changing. By refusing to give advice, the ICO is discouraging honest businesses and leaving the field open for dishonest businesses**.

It’s time for the ICO to learn the lesson from bad regulators and get its house in order.

*And not in a Pratchett sense

**Or those who simply don’t care



Big brother, or why bother?

A few days ago the UK Government half-announced* that it wants to change the law to allow additional monitoring of electronic communications by GCHQ, the UK’s spy central.

Details of exactly what has been proposed are limited, but I’m going to try to pick through what the proposals could mean from an analytical perspective. I’m not going to get into the territory of right or wrong, protection from terrorism or unwarranted breach of civil liberties. I’ll leave that to other, frequently less qualified, observers.

What has been proposed?

Broadly speaking the proposals seem to require ISPs and other electronic communications providers to give access (potentially in real time) to information about the sender and receiver of communications, possibly the size or length of the communication, and the time and date.

All of this can be done without a warrant.  If they want to know the contents then they have to apply for a warrant.

If this sounds familiar as a data set, it should: it’s pretty much the same data set that telcos use when they do Social Network Analysis**. In the industry the resulting structure is known as the social graph. It is made up of nodes (people) and edges (communications between them).
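To make the idea concrete, here is a minimal Python sketch of turning communications metadata into a social graph. The record layout (sender, receiver, timestamp, size) and all the names in it are hypothetical. They were chosen to mirror the fields described above. Nothing in the sketch touches message contents.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical metadata records: (sender, receiver, timestamp, size_in_bytes).
# Note that there is no message content anywhere in these records.
records = [
    ("alice", "bob", datetime(2012, 4, 2, 9, 15), 1200),
    ("bob", "alice", datetime(2012, 4, 2, 9, 20), 800),
    ("alice", "carol", datetime(2012, 4, 2, 21, 5), 300),
    ("alice", "bob", datetime(2012, 4, 3, 18, 40), 950),
    ("dave", "alice", datetime(2012, 4, 3, 8, 0), 4500),
]


def build_social_graph(records):
    """Return {sender: {receiver: count}}.

    Nodes are people; each directed edge is weighted by how many times
    the sender contacted the receiver.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for sender, receiver, _timestamp, _size in records:
        graph[sender][receiver] += 1
    # Convert to plain dicts so that looking up an unknown person
    # does not silently add a new node.
    return {sender: dict(edges) for sender, edges in graph.items()}


graph = build_social_graph(records)
print(graph["alice"])  # {'bob': 2, 'carol': 1}
```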

Analytically what could they work out?

I’ve done extensive work in social network analysis, and it is an incredibly powerful analytical tool.  It’s not perfect, but it can be used for a number of predictions and analyses:

Your network

At its most basic level, this gives the security services easy access to everyone in your social circle. Or social circles. And if they know anything about any person in that circle, then they know something about you. So do you care if they know about that book club? Possibly not. What about that musical theatre society?****


Once you know the social graph, you can predict who is the most important person in terms of influence. Sometimes this is a little counter-intuitive. For example, at college, is the influential person the one who makes lots of calls, or the one who receives lots of calls? Would you want people outside your circle knowing that?
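One way to explore the "makes calls versus receives calls" question is to compare raw call counts with a centrality measure such as PageRank. PageRank scores a person highly if they are contacted by people who are themselves well connected. The sketch below is illustrative only. The call graph is invented, and PageRank is just one of several centrality measures, not necessarily what telcos or the security services would actually use.

```python
# Hypothetical call graph: caller -> list of people they called.
calls = {
    "amy": ["ben", "cat", "dan", "eve"],  # makes the most calls
    "ben": ["fay"],
    "cat": ["fay"],
    "dan": ["fay"],
    "eve": ["fay"],
    "fay": ["amy"],                       # receives the most calls
}


def out_degree(graph):
    """How many outgoing call edges each person has."""
    return {node: len(targets) for node, targets in graph.items()}


def in_degree(graph):
    """How many incoming call edges each person has."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def pagerank(graph, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a directed graph."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Pass this node's rank on, split evenly among its targets.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node that calls nobody spreads its rank over everyone.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


degrees_out = out_degree(calls)
degrees_in = in_degree(calls)
scores = pagerank(calls)
print(max(degrees_out, key=degrees_out.get))  # amy
print(max(degrees_in, key=degrees_in.get))    # fay
print(max(scores, key=scores.get))            # fay
```

On this toy graph amy makes the most calls and fay receives the most. PageRank puts fay first (about 0.33) with amy a close second (about 0.31). So "who calls the most" and "who matters most" need not be the same person.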


It’s pretty easy to spot the family relationships in social graphs, even when you don’t listen to the contents of calls (or read the contents of emails).

Types of relationship

But you can go further and identify different types of relationship. Is it a work relationship or a social one? A casual relationship or a deep one? It all becomes visible…


And how about changes?  A French colleague noted that this information could be used to spot “the minute that someone fell in love, when they moved in together, when the relationship was in difficulty, and when they split up.”

The end of anonymity

Of course these connections allow you to link known people/data to unknown people/data.  Take Facebook as an easy example.  Imagine that you don’t tell me which school you went to.  If I want to know I can look at your connections – if more than 50% of them went to school x then it’s a good bet that you did too.  And once I start building up that picture then it’s very hard to preserve anonymity.
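The school example is essentially a majority vote over a person's connections. Below is a minimal sketch using the same "more than 50%" rule. It assumes a simple friends list and a partial map of disclosed attributes; all names and schools are hypothetical.

```python
from collections import Counter

# Hypothetical data: who is connected to whom, and the schools that
# some people have chosen to reveal. "you" has not disclosed a school.
friends = {
    "you": ["ann", "bill", "cara", "dev", "ed"],
}
known_school = {
    "ann": "School X",
    "bill": "School X",
    "cara": "School X",
    "dev": "School Y",
    # "ed" has not disclosed a school either
}


def infer_attribute(person, friends, known, threshold=0.5):
    """Guess `person`'s attribute if more than `threshold` of their
    connections share a value; return None otherwise."""
    connections = friends.get(person, [])
    if not connections:
        return None
    values = Counter(known[c] for c in connections if c in known)
    if not values:
        return None
    value, count = values.most_common(1)[0]
    # The share is measured against *all* connections, matching the
    # "more than 50% of them" rule in the text.
    if count / len(connections) > threshold:
        return value
    return None


print(infer_attribute("you", friends, known_school))  # School X
```

Three of the five connections (60%) went to School X, so the guess is School X. Raising the threshold to 0.6 would make the same data inconclusive.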

Some technical issues (because it’s always fun to poke holes)

Real time? Really?

This always amuses me.  What do they mean by real time, and who is going to pay for it?  Do they mean real time access, but not historical access, or are they going to require the same level of detail to be stored?

Matching individuals

I am @duncan3ross on Twitter. I’m duncan.3.ross on Facebook. But I’m TheSheep on UKPollingReport. A key issue for the security services will be knowing that those three are the same person. For most home users that’s easy: log the IP address. But if I were trying to hide, I would use disposable SIM-based smartphones to separate my online identities. It would require a bit of fieldcraft, but it wouldn’t be too difficult.

Of course, if you assume that one of the things the security services want to do is build up a correspondence between various online identities, then they can probably work around this, provided they do it for everyone, all the time.
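Linking identities by shared IP address amounts to grouping accounts into connected components. A union-find (disjoint-set) structure does this neatly. This is a hypothetical sketch: the log format is invented, and the addresses come from the documentation-only ranges. Real traffic would be complicated by addresses shared across households, offices and mobile networks, so treat it as illustrative only.

```python
# Hypothetical connection log: (ip_address, account) pairs. The addresses
# come from the ranges reserved for documentation.
log = [
    ("192.0.2.10", "twitter:@duncan3ross"),
    ("192.0.2.10", "facebook:duncan.3.ross"),
    ("192.0.2.10", "ukpollingreport:TheSheep"),
    ("198.51.100.7", "twitter:someone_else"),
    ("203.0.113.5", "twitter:burner_account"),  # e.g. on a disposable SIM
]


def link_identities(log):
    """Group accounts that were ever seen from the same IP address."""
    parent = {}

    def find(account):
        parent.setdefault(account, account)
        while parent[account] != account:
            parent[account] = parent[parent[account]]  # path halving
            account = parent[account]
        return account

    def union(a, b):
        parent[find(a)] = find(b)

    first_account_at_ip = {}
    for ip, account in log:
        find(account)  # make sure every account is registered
        if ip in first_account_at_ip:
            union(account, first_account_at_ip[ip])
        else:
            first_account_at_ip[ip] = account

    groups = {}
    for account in list(parent):
        groups.setdefault(find(account), set()).add(account)
    return list(groups.values())


for group in link_identities(log):
    print(sorted(group))
```

The burner account ends up in a group of its own, which is exactly the point made above. Linking it would need some other shared signal, collected for everyone, all the time.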

Internationalisation and Blackberry

How far will this law reach? Will it be relevant if my server is in Bermuda? If I’m a US company? And what about RIM (assuming it still exists)? It has regularly battled with the Indian and various Gulf governments over the privacy of BBM communications…


I hope this has opened your eyes to what could be done, without any judicial oversight, and without ever reading the content of messages.  Now you can decide if it’s a good or a bad thing.


*Half-announcements are popular with politicians because they give a politician the opportunity to change their mind under the guise of consultation, testing the waters, etc. They also let them tell people to “wait until you’ve seen the full (i.e. different) proposals” if the proposals turn out to be a real stinker.

**Not, and I repeat this, not Social Media analysis.  This doesn’t require you to use Facebook***.  It can be done really effectively with phone calls.

***Although you can if you want.

****Yes, that’s a euphemism.