Yesterday I almost had a heart attack when an esteemed colleague (who shall remain nameless) came out with the statement: “Data Scientists are people who are hardcore Hadoop coders”… Had I misheard him? Or was I so out of step with the world that I had totally misunderstood data science?
This matters to me all the more for two reasons:
- My job title (full disclosure: I made it up) is Director Data Science.
- I’m busy trying to recruit data scientists for my team.
Well, to be honest, I could probably live with being wrong about the first of these. LinkedIn will never find out, so that’s OK.
But while recruiting I have had to decide what it is I’m looking for in candidates. So here it is… in descending order of importance, a data scientist will be:
The first and most important trait is curiosity. Insane curiosity. In many walks of life evolution selects against the kind of person who decides to find out what happens “if I push that button”. In Data Science it selects for it.
In my own analytical experience nothing comes close to the feeling of discovering something new (even if other people have already been there). In the 5th form it was working out how to prove what the square root of -9 was. At university… well, there was too much to drink there. But at work it was discovering that we could push complex analytics onto an MPP system; that complex things (cars) failed with the same distribution as simple things (their components); that social networks could be used to predict some things, and that they couldn’t be used to predict others.
And that last one is important too: the joy of disproving something!
I expect any data scientist to have a background in, and an understanding of, complex analytics. I don’t mean reporting. I’ve nothing against reporting; it’s important and someone has to do it. But not a data scientist. I’m after people who can build a model that predicts something, or who can cluster data, who know the tricks of creating a good dataset, and who can tell when a model result is too good to be true. And, importantly, people who can tell me whether a result is statistically significant or just one of those things.
With Big Data, “those things” will come up more and more often as our data gets bigger: the more variables and comparisons we look at, the more patterns we will find purely by chance.
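To make the “those things” point concrete, here is a minimal sketch, assuming that “bigger” means more candidate variables rather than more rows. It correlates features that are pure random noise against a target that is also pure noise, and counts how many clear a conventional p < 0.05 bar anyway. The helper names, the 200-row sample and the normal-approximation cut-off of 1.96/√n for the correlation are my own illustrative choices, not anything from a particular tool.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def count_chance_hits(n_rows, n_features, seed=0):
    """Count noise features that look 'significant' against a noise target.

    Hypothetical helper for illustration: every feature is random, so any
    hit is 'just one of those things'.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    # Normal approximation to the two-sided p < 0.05 cut-off for |r|
    # when there is no real relationship (reasonable for large n_rows).
    threshold = 1.96 / n_rows ** 0.5
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson_r(feature, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for n_features in (10, 100, 1000):
        hits = count_chance_hits(n_rows=200, n_features=n_features)
        print(f"{n_features:>5} noise features -> {hits} 'significant' at p < 0.05")
```

Roughly 5% of the noise features pass the test whatever their number, so the absolute count of spurious “findings” grows with the number of variables examined. Telling those apart from real effects is exactly the judgement I want from a data scientist.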
I have no use for people who are unable to communicate with non-specialists. It’s hard enough discussing these topics within the community; we need people who can explain them to those outside it: the users of the services we will provide.
Of course, communication is two-way, and the data scientist needs to listen too.
The data scientist needs to provide additional value above and beyond what’s happening already. You can provide a fantastic new way of predicting churn that will cost only $1 million and uses data sources that are already in use? And it doesn’t outperform the existing methods? Hmmm.
Novelty, either in ways of thinking, or in terms of the data and approaches to be used, is vital.
A data scientist also needs to be business focused. By business I mean “focused on the overall objectives of the organisation you’re working with or in”, but that’s a bit long-winded. Again, data scientists need to get their heads out of the algorithms and into the business problems. If you can tell me the correct parameterisation for a novel take on SVM, but can’t tell me the top three issues for a business (and how big data can help fix them), then you aren’t a data scientist, you’re an academic.
Last and least: coding. Yes, it would be nice if you could code Hadoop. Or C#. Or R. But this is a passing phase brought on by a lack of good interfaces, not a permanent state of affairs. So, if you have this skill, good for you. But if you only have this skill, it’s time to get out into the world.