7 Things from Strata Santa Clara

This is the fifth time I’ve made the pilgrimage to Strata – I was lucky enough to be at the very first event, here in Santa Clara, and that’s made me think about how things have changed over the last two years.

Two years ago big data was new and shiny. About 1600 people turned up at the south end of Silicon Valley to enthuse about what we could do with data.

Now we’re talking about legacy Hadoop systems, and data science (big data is so 2011), but what else has changed?

1)   Hadoop grew up

The talk this year wasn’t about this new shiny thing called Hadoop. It was about which distro was the best (with a new Intel distro being announced), and which company had the biggest number of coders working on the original open source code. Seriously, there were almost fistfights over the bragging rights.

As a mark of the new seriousness, the sports jacket to t-shirt ratio was up. But don’t worry, the PC to Mac ratio was still tending to zero (the battle was between Air and Pro).

2)   NoSQL is very much secondary to Hadoop

The show is extremely analytically oriented (in a database sense… but more of that later). The NoSQL vendors are there, but attract a fraction of the attention.

3)   SQL is back

Yes, really. It turns out it is useful for something after all.

4)   Everyone is looking for ways to make it actually work (and work fast)

Hadoop isn’t perfect, and there is a wide range of companies trying to make it work better. Oddly, there is a fetishisation of speed. Odd because this is something that the data warehouse companies went through in the days when it was all about the TPC benchmarks. No, people: scaling doesn’t just mean big and fast. It means usable by lots of people, and a whole raft of other things.

Greenplum were trying to convince us that theirs was fastest. Intel told us that SAP HANA was faster, and more innovative. Really. And the list went on.

Rather worryingly, there seems to be a group out there who want to reinvent the dead end of TPC* but for Hadoop.

5)   There’s a lot of talk about Bayes, but not many data miners 

I ran a session on data mining. Only a handful of people out of about 200 in the room would admit to being data miners. This is terrifying, as data scientists are trying to do analytical tasks. Coding a random forest does not make you a data miner!

6)   Data philanthropy is (still) hot

We had a keynote from Code for America, and lots of talk about ethics, black hats, etc. I ran a Birds of a Feather table on Doing Good With Data. A group of us were talking about the top secret PROJECT EPHASUS.

7)   The Gartner Hype Cycle is in the trough of disillusionment

At least as far as big data is concerned. The show sold out. And the excitement was still there. Data Science has arrived.


* For a fun review about the decision that Teradata made to withdraw from TPC benchmarks, try this by my esteemed colleague Martin Willcox.