Last year I did some quick and dirty analysis of the topics that were going to be covered at Strata NY, and as I’m leaving for California at the weekend for another Strata conference, I thought I’d try again. Why? Well, to my mind Strata is by far the most interesting conference about Data Science, so hopefully this will have some predictive power as to where the market is going.
Given my theory that the west coast leads and Europe lags, that should be even truer of a California event.
As a word of caution, these word clouds were created in Wordle. If you don’t like Wordle, stop reading. Yes, I know it’s limited, but it’s colourful and you can gain some quick insights from it.
The big topics
When you take all the track sessions together, this is what you see:
No surprise that data is still BIG, and that Hadoop is a huge topic (there are two tracks dedicated to it), but it’s nice that analytics is significant, as is data science.
Data Science track
Perhaps the most surprising thing in the Data Science track is that data scientists don’t like calling it Big Data. They also seem to be keener on data mining than on analytics as a topic. I’m heartened by this – I’m not a big fan of b*g data as a phrase…
Business and Industry
Back in the ‘real’ world Big is Big. Some odd words appear too: democratizing? In business?
A simple glance at this would suggest that the world of visualisation is still searching for the right toolset. Whereas Hadoop seems to have firmly won the architecture war, there is no such dominance on the visual side. It’s also interesting that visualisation experts aren’t bothered by ‘big’. Perhaps there isn’t anything unique about b*g data from a visualisation perspective, or perhaps they just haven’t hit on it yet.
The Hadoop tracks
There are two separate Hadoop tracks, applied and technical. Spot the difference:
To my mind the interesting thing is what isn’t there in the applied sessions: business cases. That list still looks pretty techy to me.
Privacy and Policy
As with Strata NY, you almost wonder whether this is from the same conference. I wish this were a bigger track, because until we take privacy seriously, we are all at risk (especially in Europe).
No surprise that social and web data are prominent, since for many people that’s all data science is about. But where are the manufacturing and location data? (OK, geo is starting to creep in.)
Last time I was quite dismissive of the Sponsored track. But on mature reflection – and now being employed by a company that sponsors tracks – I’m looking at this as a guide to where people are putting their money for the next 12 months or so. Where are businesses willing to invest in data science?
Unfortunately, many of them seem to be blowing their marketing budget trying to own the word ‘big’. Also interesting is how little prominence Hadoop gets, given its significance elsewhere in the agenda.
I’ve taken the text from the short presentation overviews on the Strataconf.com site, and removed venue and speaker information. I’ve also tried to take out words like ‘talk’, ‘session’, ‘panel’ and, for reasons I hope are obvious, ‘data’. I’ve made all words lower case and stripped out a few oddities (this isn’t meant to be a perfect analysis).
I then limited the number of words, usually to 75, but to fewer for the smaller tracks.
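For anyone who wants to repeat the exercise, here is a rough sketch of those preparation steps in Python: lower-case the overviews, drop the excluded words, count what’s left and keep the top 75. The function name `top_words`, the exclusion set and the rule for splitting text into words are my own assumptions for illustration. This isn’t the exact process used for the clouds above, and Wordle itself did the drawing.

```python
import re
from collections import Counter

# Hypothetical exclusion list built from the words mentioned above;
# the full list used for the clouds isn't published.
EXCLUDED = {"talk", "session", "panel", "data"}


def top_words(overviews, limit=75, excluded=EXCLUDED):
    """Return the `limit` most frequent (word, count) pairs across the overviews."""
    counts = Counter()
    for text in overviews:
        # Lower-case everything and treat any run of letters as a word
        # (an assumed tokenising rule; punctuation and digits are discarded).
        words = re.findall(r"[a-z]+", text.lower())
        # Skip excluded words and one-letter fragments such as the 's' in "it's".
        counts.update(w for w in words if len(w) > 1 and w not in excluded)
    # Highest counts first; ties keep the order in which words were first seen.
    return counts.most_common(limit)


overviews = [
    "Hadoop and big data analytics: a panel on Hadoop.",
    "Data science talk on big analytics.",
]
print(top_words(overviews, limit=3))
```

Note that the sketch does not remove generic English words like ‘and’ or ‘on’, so a real run would need a list of those too. Passing a smaller `limit` covers the smaller tracks.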