Some things I learned at Teradata

Over the last three and a half years I have led a fantastic team of data scientists at Teradata. But now it’s time for me to move on… so what did I learn in my time? What are the key Data Science messages that I’m going to take with me?

A lot of people don’t get it

What makes a good data scientist? One definition is that it is someone who can code better than a statistician, and do better stats than a coder. Frankly that’s a terrible definition, which really says you want someone who is bad at two different things.

In reality the thing that makes a good data scientist is a particular worldview: one that appreciates the insight that data provides, and that is aware of the challenges and joys of data. A good data scientist will always want to jump into the data and start working on finding new questions, answers, and insights. A great data scientist will want to do that, but will start by thinking about the question instead! If you throw a number at a good data scientist you’ll get a bunch of questions back…

Many people don’t have that worldview. And no matter how good they get at coding in R they will never make a good data scientist.

Data science is the Second Circle of data.

It’s one for the problem, two for the data, three for the technique

One of my favourite dislikes is the algorithm fetishist. A key learning from working across different customers and industries is that when analytical projects fail it’s very rarely because the algorithm was suboptimal. Usually it has been because the problem wasn’t right – or because the data didn’t match the problem.

Where the choice of algorithm does matter is in how the solution will be used (and potentially in how it will be productionised), rather than in simple measures of performance.

Don’t be afraid of the simple answer

Yes, you know how to run an n-path. Or do Markov chain analysis. Or build a random forest. But if the answer can be generated from a simple chart, why would you use those other techniques? To show how clever you are?

There is another side – being aware that the simple answer may be wrong, and that the lure of simplicity is dangerous in itself. But usually if you get it then you know about that…

And of course there is also something to be said about the idea that the best ideas seem simple, but only after you’ve found them.

Stories are powerful

When you’re trying to sell an analytical approach (or even analytical software or hardware) the story you tell is vital. And the story might not be where the actual value is. Because to tell the story best you often use the edge cases. The best example comes from some work a colleague was doing. The actual analysis was great, but the thing that sold it to the client was a one-off event (albeit one that was ongoing) of such astonishing stupidity that it instantly caught the imagination. Everyone could immediately see that it was both crazy, and also that it was bound to happen. And it had been found through analysis.

I really wish I could tell you what it was! Buy me a drink sometime and you might find out…

Some of you may say that you’re not selling analytics. But if you’re a data scientist you are – to your boss, your co-workers, people you want to impress… and if you’re selling analysis you need to tell stories.

You still need to munge that data

So much time is spent dealing with data. This is one of the reasons that so many data scientists still use SQL (and it’s also a reason why logical modelling is still more attractive than late binding – I’m lazy and want someone else to have done some of the work first).

I wish it wasn’t the case. And I wish that tools were better at it than they are.

Don’t look for data scientists, look for data science people

Remember, when you want to recruit (and retain) data scientists, that they are people. I’ve been lucky at Teradata to work with some fantastic people – in my team, in the wider company, and at our clients.

I have a concern, however, that we (the data science community) are undervaluing some people, and as a result overlooking fantastic talent. A recent survey on data science salaries by O’Reilly included a regression model, and one of the key findings was that if you were a woman your salary dropped by $13k. For no reason whatsoever.

This seems bizarre to me, as I have had the privilege to work with some fantastic women in data: Judy Bayer, Fran Bennett, Garance Legrand, Kaitlin Thaney, Yodit Stanton and many many more*.

Data Science can change the world

Teradata believe in data philanthropy – the idea that if more social organisations use data for decisions they will make better decisions, and that tech companies can play a part in helping them achieve this. Because of this they have supported DataKind and DataKind UK.

This is really important – because there are a bunch of challenges in helping charities and not-for-profits when it comes to data. The last thing these organisations need is well-intentioned, but damaging, solutionism being dumped on them by West Coast gurus. There is nothing wrong in Elon Musk working on big issues through things like Tesla, but there is a whole bunch more that can be achieved if we can find sensitive ways to work with the people who deal with social problems every day.

In my work with DataKind I’ve seen what data can do for charities, and this, in turn, has made me a better data scientist.

Where am I going?

I’m about to start a new career leading the data team at Times Higher Education – where we produce the leading ranking of universities across the world. I’ve loved my time at Teradata, and I’ve learnt some important stuff, but it’s time for a change!

*sorry if I didn’t mention you here…
