Why Big Data *isn’t* like CRM

It gives me great pleasure to be able to disagree with a learned document from MIT. Or a Professor from the Wharton Business School.  So both at once?  Joy!  I accept that this is a character flaw, but there we have it.

So what has got me so annoyed?

Well, this article has Peter Fader likening Big Data to the failures of CRM.  Now I was there.  I worked in CRM.  And you, Big Data, are no CRM.

So why is Prof Fader so anti Big Data?

Some of the reasons are just plain dumb.  Yes, more data is not always the same as better data, but deliberately ignoring data is a crazy idea.

What else could it be?  Well (without wanting to go ad hominem on him) it’s often the case that standing out against received wisdom is a better way to make your mark in academia than going with the flow.  Don’t believe in the Higgs Boson?  You’ll get airtime much faster than the thousands who do. Don’t believe in Big Data?  Perhaps MIT will do an article with you…

But perhaps, just perhaps he has some good points.

Prof Peter Fader looking dynamic, but wrong (Wharton/Peter Olson)

So let’s explore (for a moment) why CRM failed.

The failures of CRM

When I started out in CRM, Peppers and Rogers had just released the seminal, and still brilliant One-to-One Future.  They argued that companies who made the leap to treating their customers as individuals, who learned from the data that customers provided, would be leaders.  To my mind this idea never failed.  We can look to the world around us and ask the question: which companies actually implemented that one-to-one vision?  Precious few.

So what went wrong?  Why does Prof Fader link the words “frustration,” “disaster,” “expensive,” and “out of control” to CRM?

It’s because for many, including the software company I worked for at the time, CRM became a technology solution and not a business philosophy.

And often the technology didn’t work quite as well as people hoped.  And when it did, companies assumed that putting software in place but changing nothing else was a good approach.  It wasn’t: they just enabled marketeers to do bad things more efficiently.

And if you haven’t seen a lesson for Big Data there then you haven’t been paying attention: Big Data does not equal Hadoop.  If it does then we are in danger of running down the CRM rabbit hole, and Prof Fader will be right.  And I will be denying ever disagreeing with him.

A discussion on Big Data – Teradata Universe 2012

The following notes are recreated from the Big Data Panel Session held at the Teradata Universe conference in Dublin, April 2012.

The panel consisted of Dr Judy Bayer (Director Strategic Analytics, Teradata), Tom Fastner @tfastner (Senior Architect, eBay), Navdeep Alam @YoshiNav (Director Data Architecture, Mzinga), and Professor Mark Whitehorn (University of Dundee).

It was moderated by me… so any false memory syndrome is laid at my door.  Note: I have edited it slightly to turn a conversation into something a bit more readable. I hope the contributors will forgive me!

Let’s start with an easy question: what one word sums up Big Data for you?

Mark: Fun

Judy: For this decade – noisy

Navdeep: Bringing together

Tom: Fun(damental)

Navdeep: Big Data is bringing together technologies, it requires interoperability between systems such as Teradata and Hadoop, SQL and MapReduce, it’s also bringing people together.

What makes it fundamental? And fun?

Tom: If you go back to Crossing the Chasm, we are on the left side of the chasm: the innovators. It is fundamental to get our job right as we are doing it first.

Mark: And I can’t believe people pay me to do this, it’s such fun.

You mentioned noise, Judy, why is that?

Judy: Big Data has always been around, it’s defined as much by current capabilities as anything. And each generation of big data brings noise and confusion as to what to do with it.

Audience: It’s also all about telling a story from the data.

So what makes a good Data Scientist?

Tom: There are six characteristics, they are a special breed who need to be: curious, questioning, good communicator, open minded, someone who can dig deeper…

We have five to ten concurrent users of Hadoop and these are the data scientists. I sit next to one and he’s constantly going “Wow!”.  But they also cause the most problems with their big questions.

Judy: I’d add creativity, a passion to experiment and fail, and a passion for finding the stories in data.

Mark (stroking his luxuriant facial hair): A beard and sandals! No: someone who can think outside the box and be adventurous.

Navdeep: They need to be insatiable when it comes to data.  They also need to be a cry-baby – in that they cannot be satisfied, they should always want more data, hardware, more resources.

The McKinsey Global Institute report from 2011 showed a huge skill shortfall for Big Data Analysis – would you agree?

Navdeep: There is clearly a shortage of skills. You need to mix business and technology, so collaboration is key.

Tom: Yes!

Mark: In 2008 I was at a conference when someone asked what the academic world was doing to fix this problem. In response the University of Dundee set up a Masters course in business intelligence.

Audience: Do Data Scientists exist without realising it? Is Data Science a rebranding of existing skills like data mining?

Judy and I have had disputes about whether Data Scientists actually exist…

Judy: Well, I believe analysts are born not made, but they need training to fulfil their potential. When it comes to Data Science I think there may be something new here. Data Scientists will be better at collaboration than traditional Data Miners. But we’re in the infancy of the subject, with data and tools that don’t really exist yet. In many ways this is a parallel with the early days of data mining.

Tom: Take Kaggle for example, it’s interesting because of the collaboration between the individuals in the teams. You have to form teams and build on skillsets to produce the best algorithm to solve the problem.

Audience: This is probably re-branding, you need an analyst who can work across areas…

Audience: I find Data Science a restrictive term, it doesn’t capture the art side and the creativity that is required – people are rejecting the easy to use GUI tools and going back to R and programming languages.

Which brings us on to a related topic: what is the most important big data analytical technology?

Navdeep: Massively parallel processing, with fault tolerance on commodity hardware and with support for virtual machines. In other words, removing the complexity of parallel processing, allowing organisations outside of the military and academia to experiment.

Judy: Visualisation – for example ways of visualising network graphs.

Tom: It isn’t a single technology, it’s an eco-system, and it’ll take many years to develop.

Mark: R – we need languages that let us use this data.

But isn’t there a danger that these languages restrict usage to a niche specialism?

Mark: Good fundamental languages will allow tools to be built on top.

Do those tools exist?  Judy, do you see visualisation as a mature technology? It’s clear that part of the data science skill set is telling stories but the visualisation doesn’t seem to be quite there yet.

Judy: Some of the visualisation you see has too much wow factor (trying to be clever) but isn’t easy enough to understand.  It needs to be easy to communicate but also to be actionable.

Mark: The work of Hans Rosling is a brilliant example of clear visualisation.

Navdeep: It’s clear that BI tools are not sufficient alone, custom visualisation needs to be written.

Audience: Have we collected the right data? Do we need to look at what we have and keep everything, or just what’s relevant?

Tom: There are limits to what you can actually store. eBay do delete historical data and certain things like pictures. Some data can be reproduced rather than stored.

Mark: It’s a balance. In the case of proteomics it is relatively more expensive to produce than store – and reprocessing may be required at a later date.

Navdeep: Cloud storage is expensive – so at Mzinga we focus on keeping behavioural data that can’t be reproduced. We use a private cloud solution to store our archives. In the case of Facebook data, we use Hadoop to process it, and keep the results. Currently we purge the source data when it is over 5 years old.  We try to recognise what’s valuable and hard to reproduce, and keep that.

If Big Data Fails in 2012 – what will be the cause?

Judy: Keeping everything, our data and businesses, siloed. Not recognising that we have to integrate everything we have.

Mark: Stupidity! We can do it, we can get value. Technically it works, it is people who could cause it to fail.

Navdeep: It comes down to a lack of people who understand and can use the tech. People are needed to drive innovation.

Tom: People, and expectations set by management. It takes time to grow and it is being done successfully.  Big Data is a buzz word that will not go away.

Do you see anything that worries you about Big Data?  What about data protection or security?

Tom: We have a lot of data at eBay, but need to be cautious over what we do with the users’ data to avoid alienating them. As a result there are lots of rules regarding what can and can’t be done with the data.

Navdeep: We’ve worked with security agencies, and understand the need to be careful.  It’s important to respect the different laws in different countries.

Judy: Privacy and security will increase in relevance but won’t cause big data to fail. Ways will be found to increase privacy – and laws will need to change to cope with the new world.

Isn’t it our job as data professionals to think about what is reasonable and ethical? Thinking about the Target case, a good comment was: don’t use data to do anything indirectly that you wouldn’t do directly.

Finally, if you were starting a big data project tomorrow and could do anything at all, what would you do?

Navdeep: I would study the universe.  For decades we’ve had measurements from sensors, so I would take all the information and build some analyses and correlations. There is a huge opportunity to bring all this data together.

Mark: Proteomics! But as we’re already doing it I would opt for quantum data, bringing probability theory to the subatomic world.

Tom: Natural Language Processing, understanding language – can it be done in a database? Could that be more efficient than Hadoop?

Judy: Analytics that would do good for society, for example using analysis to increase literacy. But I’ve got to back Mark too: proteomics, it’s awesome.

Thank you

Additional reporting by @chillax7

NoSQL? NOSQL? How about NOHadoop?

Comrades (for that is how all good exhortations to action begin), it is time for us to stand up against a Heresy that is sweeping the world of Big Data Science.

The Heresy is that there is only one God, and its name is Hadoop. This yellow elephant is being taken by some to be the alpha and omega of data science.  Just the other day an eminent blogger started a comment by saying “If Logo of BIG data is Elephant, What is the Logo of Analytics?”

And this is annoying.

It’s like saying the logo of driving is a prancing horse (sorry Daimler).  Or calling a computer tablet an iPad.  Well forget that last one. But you get the idea.  Hadoop may be the fastest example of eponymy ever; it has almost become a generic brand name. “Lets get us some of those there Hadoops” can virtually be heard coming from the boardrooms of the Global 3000.

But only almost.

There is still time for the business idea of big data science to triumph if all of us non-Hadoop struck folks get together.

So, if you like MongoDB, if you think SQL has a few tricks up its sleeve, if you R a data mining pirate, if you think the use comes first and the technology comes second, then join the Not Only Hadoop campaign.

Say #NOHadoop