Fair data – fair algorithm?

In my third post about the ethics of data science I’m heading into more challenging waters: fairness.

I was pointed to the work of Sorelle Friedler (thank you @natematias, @otfrom, and @mia_out) on trying to remove unfairness in algorithms by addressing the data that goes into them rather than trying to understand the algorithm itself.

I think this approach has some really positive attributes:

  • It understands the importance of the data
  • It recognises that the real world is biased
  • It deals with the challenges of complex algorithms that may not be as amenable to interpretation as the linear model in my previous example.

Broadly speaking – if I’ve understood the approach correctly – the idea is this…

Rather than trying to interpret an algorithm, let’s see if the input data could be encoding for bias. In an ideal world I would remove the variables gender and ethnicity (for example) and build my model without them.

However, as we know, in the real world there are lots of variables that are very effective proxies for these variables. For example, height would be a pretty good start if I was trying to predict gender…

And so that is exactly what they do! They use the independent variables in the model to see if you can classify the gender (or sexuality, or whatever) of the record.
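
To make that concrete, here is a minimal sketch (in Python, with invented file and column names) of what such a test could look like: train a simple classifier to predict the protected attribute from the other input features and see whether it does much better than chance. The real approach ties the “how predictable is too predictable” threshold to the legal disparate impact measure; this only illustrates the idea, and assumes the categorical features have already been encoded as numbers.

    # Rough sketch, not the authors' code: can the other features recover gender?
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("applicants.csv")            # hypothetical dataset
    X = df.drop(columns=["gender", "outcome"])    # the model's other input features
    y = df["gender"]                              # the protected attribute

    # Balanced accuracy well above 0.5 suggests the features act as proxies for gender
    clf = LogisticRegression(max_iter=1000)
    print(cross_val_score(clf, X, y, scoring="balanced_accuracy", cv=5).mean())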

If you can classify the gender then the more challenging part of the work begins: correcting the data set.

This involves ‘repairing’ the data, but in a way that preserves the ranking of the variable… and the challenge is to do this in a way that minimises the loss of information.
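
To give a flavour of what ‘repairing’ might involve, here is a simplified sketch of a rank-preserving repair for one numeric column. It pulls each group’s values towards a common target distribution (here the pooled quantiles, which is a simplification of the paper’s approach) while keeping the ordering within each group; a lam parameter between 0 and 1 controls how far you go, which is exactly the information-loss trade-off.

    import numpy as np
    import pandas as pd

    def repair_column(df, column, group_col, lam=1.0):
        """Rank-preserving repair of one numeric column (simplified sketch).

        Each value is moved towards the value at the same within-group
        quantile of a common target distribution, so the ordering within
        each group is preserved while the groups are pulled together.
        lam=1.0 is a full repair; lam=0.0 leaves the data untouched.
        """
        probs = np.linspace(0, 1, 101)
        target = np.quantile(df[column], probs)       # pooled quantiles as the target
        repaired = df[column].astype(float).copy()
        for _, idx in df.groupby(group_col).groups.items():
            vals = df.loc[idx, column].astype(float)
            ranks = vals.rank(pct=True)               # within-group quantile of each value
            fully_repaired = np.interp(ranks, probs, target)
            repaired.loc[idx] = (1 - lam) * vals + lam * fully_repaired
        out = df.copy()
        out[column] = repaired
        return out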

It’s really interesting, and well worth a read.

Some potential difficulties

Whilst I think it’s a good approach, and may be useful in some cases, I think there are some challenges it needs to address, both at a technical and at a broader level. First, though, let’s deal with a few obvious ones:

  • This approach is focused on a US legal definition of disparate impact, and that has implications for the approach taken
  • The concept of disparate impact is itself a contentious ethical position, with arguments for and against it
  • Because the approach is based on a legal situation, it doesn’t necessarily deal with wider ethical issues.

Technical challenges

As always, the joy of technical challenges is that you can find technical solutions. So here we go:

  • The focus of the work has been on classifiers – where there is a binary outcome. But in reality we’re entering the world of probability, where decisions aren’t clear cut. This is particularly important when considering how to measure the bias: where do you put the cutoff? (See the sketch after this list.)
  • Non-linear and other complex models also tend to work differentially well in different parts of the problem space. If you’re using non-linear models to determine whether data is biased then you may have a model that passes because the average discrimination is fair (i.e. below your threshold) but where there are still pockets of discrimination.
  • The effect of sampling is important, not least because some discrimination happens to groups who are very much in the minority. We need to think carefully about how to cope with groups that are (in data terms) significantly under-represented.
  • What happens if you haven’t recorded the protected characteristic in the first place? Maybe because you can’t (iPhone data generally won’t have this, for example), or maybe because you didn’t want to be accused of the bias that you’re now trying to remove.  There is also the need to be aware of the biases with which this data itself is recorded…
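
On the first point above, a quick sketch (with invented, synthetic data) of why the cutoff matters: the four-fifths rule compares selection rates between groups, but with a probabilistic model the selection rate – and therefore the measured disparate impact – moves with the threshold you choose.

    import numpy as np

    def disparate_impact(scores, protected, threshold):
        """Ratio of positive-outcome rates: protected group vs everyone else."""
        positive = scores >= threshold
        return positive[protected].mean() / positive[~protected].mean()

    # Synthetic scores with a small shift against the protected group
    rng = np.random.default_rng(0)
    protected = rng.random(1000) < 0.3
    scores = np.clip(rng.normal(0.5 - 0.05 * protected, 0.2, 1000), 0, 1)

    # The same model can pass the 80% rule at one cutoff and fail it at another
    for t in (0.3, 0.5, 0.7):
        print(t, round(disparate_impact(scores, protected, t), 2))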

The real difficulties

But smart people can think through approaches to those.  What about the bigger challenges?

Worse outputs have an ethical dimension too:

If you use this approach you get worse outputs. Your model will be less accurate. I would argue that when considering this approach you also need to consider the ethical impact of a less predictive model. For example, if you were assessing creditworthiness then you may end up offering loans to people who are not going to be able to repay them (which adversely affects them as well as the bank!), and not offering loans to people who need them (because your pool of money to lend is limited). This is partially covered by the idea of the ‘business necessity’ defence in US law, but when you start dealing with probabilities it becomes much more challenging. The authors do have the idea of partially adjusting the input data, so that you limit the impact of correcting the data, but I’m not sure I’m happy with this – it smacks a bit of being a little bit pregnant.

Multiple protected categories create greater problems:

Who decides what protected categories are relevant? And how do you deal with all of them?

The wrong algorithm?

Just because one algorithm can classify gender from the data doesn’t mean that a different one will predict using gender. We could be discarding excellent, discrimination-free models because we fear they might discriminate, rather than because they do. This is particularly important as often the model will be used to support current decision making, which may be more biased than the model that we want to use… We run the risk of entrenching existing discrimination because we’re worried about something that may not be discriminatory at all (or at least is less discriminatory).

Conclusions

If it sounds like I think this approach is a bad one, let’s be clear, I don’t. I think it’s an imaginative and exciting addition to the discussion.

I like its focus on the data, rather than the algorithm.

But, I think that it shouldn’t be taken in isolation – which goes back to my main thesis (oh, that sounds grand) that ethical decisions need to be taken at all points in the analysis process, not just one.

Sexist algorithms

Can an algorithm* be sexist? Or racist? In my last post I said no, and ended up in a debate about it. Partly that was about semantics, what parts of the process we call an algorithm, where personal ethical responsibility lies, and so on.

Rather than heading down that rabbit hole, I thought it would be interesting to go further into the ethics of algorithmic use…  Please remember – I’m not a philosopher, and I’m offering this for discussion. But having said that, let’s go!

The model

To explore the idea, let’s do a thought experiment based on a parsimonious linear model from the O’Reilly Data Science Salary Survey (and you should really read that anyway!)

So, here it is:

 70577 intercept
 +1467 age (per year above 18; e.g., 28 is +14,670)
 –8026 gender=Female
 +6536 industry=Software (incl. security, cloud services)
–15196 industry=Education
 –3468 company size: <500
  +401 company size: 2500+
+32003 upper management (director, VP, CxO)
 +7427 PhD
+15608 California
+12089 Northeast US
  –924 Canada
–20989 Latin America
–23292 Europe (except UK/I)
–25517 Asia

The model was built from data supplied by data scientists across the world, and is in USD.  As the authors state:

“We created a basic, parsimonious linear model using the lasso with R2 of 0.382.  Most features were excluded from the model as insignificant”

Let’s explore potential uses for the model, and see if, in each case, the algorithm behaves in a sexist way.  Note: it’s the same model! And the same data.

Use case 1: How are data scientists paid?

In this case we’re really interested in what the model is telling us about society (or rather the portion of society that incorporates data scientists).

This tells us a number of interesting things: older people get paid more, California is a great place, and women get paid less.

–8026 gender=Female

This isn’t good.

Back to the authors:

“Just as in the 2014 survey results, the model points to a huge discrepancy of earnings by gender, with women earning $8,026 less than men in the same locations at the same types of companies. Its magnitude is lower than last year’s coefficient of $13,000, although this may be attributed to the differences in the models (the lasso has a dampening effect on variables to prevent over-fitting), so it is hard to say whether this is any real improvement.”

The model has discovered something (or, more probably, confirmed something we had a strong suspicion about).  It has noticed, and represented, a bias in the data.

Use case 2: How much should I expect to be paid?

This use case seems fairly benign.  I take the model, and add my data. Or that of someone else (or data that I wish I had!).

I can imagine that if I moved to California I might be able to command an additional $15,000 or so. Which would be nice.
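
For what it’s worth, the published coefficients make this a back-of-the-envelope calculation. Here is a minimal sketch that just adds up the relevant terms; the flag names are my own invention and only the coefficients listed above are included.

    # Coefficients from the model above (USD); flag names are invented.
    COEFS = {
        "gender_female": -8026,
        "industry_software": 6536,
        "industry_education": -15196,
        "company_lt_500": -3468,
        "company_2500_plus": 401,
        "upper_management": 32003,
        "phd": 7427,
        "california": 15608,
        "northeast_us": 12089,
        "canada": -924,
        "latin_america": -20989,
        "europe_ex_uki": -23292,
        "asia": -25517,
    }

    def expected_salary(age, **flags):
        """Intercept + age term + whichever flags are switched on."""
        salary = 70577 + 1467 * (age - 18)
        for name, on in flags.items():
            if on:
                salary += COEFS[name]
        return salary

    # e.g. a 30-year-old with a PhD, working in California:
    print(expected_salary(30, phd=True, california=True))   # -> 111216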

Use case 3: How much should I pay someone?

On the other hand, this use case doesn’t seem so good. I’m using the model to reinforce the bad practice it has uncovered. In some legal systems this might actually be illegal, as if I take the advice of the model I will be discriminating against women (I’m not a lawyer, so don’t take this as legal advice: just don’t do it).

Even if you aren’t aware of the formula, if you rely on this model to support your decisions, then you are in the same ethical position, which raises an interesting challenge in terms of ethics. The defence “I was just following the algorithm” is probably about as convincing as “I was just following orders”.  You have a duty to investigate.

But imagine the model was a random forest. Or a deep neural network. How could a layperson be expected to understand what was happening deep within the code? Or for that matter, how could an expert know?

The solution, of course, is to think carefully about the model, adjust the data inputs (let’s take gender out), and measure the output against test data. That last one is really important, because in the real world there are lots of proxies…
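
Here is a minimal sketch of that check (invented file and column names, and the categorical encoding step is skipped): train without gender, then compare the predictions by gender on held-out data. If a gap persists, proxies among the remaining features are still doing the work.

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("salaries.csv")               # hypothetical dataset
    X = df.drop(columns=["salary", "gender"])      # gender removed from the inputs
    y = df["salary"]

    X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
        X, y, df["gender"], test_size=0.3, random_state=0)

    model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
    preds = pd.Series(model.predict(X_test), index=X_test.index)

    # A large gap here means gender is still being encoded by proxies,
    # even though it was never given to the model.
    print(preds.groupby(g_test).mean())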

Use case 4: What salary level would a candidate accept?

And now we’re into really murky water. Imagine I’m a consultant, and I’m employed to advise an HR department. They’ve decided to make someone an offer of $X and they ask me “do you think they will accept it?”.

I could ignore the data I have available: that gender has an impact on salaries in the marketplace. But should I? My Marxist landlord (don’t ask) says: no – it would be perfectly reasonable to ignore the gender aspect, and say “You are offering above/below the typical salary”**. I think it’s more nuanced – I have a clash between professional ethics and societal ethics…

There are, of course, algorithmic ethics to be considered. We’re significantly repurposing the model. It was never built to do this (and, in fact, if you were going to build a model to do this kind of thing it might be very, very different).

Conclusions

It’s interesting to think that the same model can effectively be used in ways that are ethically very, very different. In all cases the model is discovering/uncovering something in the data, and – it could be argued – is embedding that fact. But the impact depends on how it is used, and that suggests to me that claiming the algorithm is sexist is (perhaps) a useful shorthand in some circumstances, but very misleading in others.

And in case we think that this sort of thing is going to go away, it’s worth reading about how police forces are using algorithms to predict misconduct.

*Actually, to be more correct, I mean a trained model…

** His views are personal, and not necessarily a representation of Marxist thought in general.

The ethics of data science (some initial thoughts)

Last night I was lucky enough to attend a dinner hosted by TechUK and the Royal Statistical Society to discuss the ethics of big data. As I’m really not a fan of the term I’ll pretend it was about the ethics of data science.

Needless to say there was a lot of discussion around privacy, the DPA and European Data Directives (although the general feeling was against a legalistic approach), and the very real need for the UK to do something so that we don’t end up having an approach imposed from outside.

People first

[Image: portrait of Immanuel Kant]
Kant: not actually a data scientist, but something to say on ethics

Both Paul Maltby and I were really interested in the idea of a code of conduct for people working in data – a bottom-up approach that could inculcate a data-for-good culture. This is possibly the best time to do this – there are still relatively few people working in data science, and if we can get these people now…

With that in mind, I thought it would be useful to remind myself of the data-for-good pledge that I put together, and (unsuccessfully) launched:

  • I will be Aware of the outcome and impact of my analysis
  • I won’t be Arrogant – and I will avoid hubris: I won’t assume I should, just because I can
  • I will be an Agent for change: use my analytical powers for positive good
  • I will be Awesome: I will reach out to those who need me, and take their cause further than they could imagine

OK, way too much alliteration. But (other than the somewhat West Coast Awesomeness) basically a good start. 

The key thing here is that, as a data scientist, I can’t pretend that it’s just data. What I do has consequences.

Ethics in process

But another way of thinking about it is to consider the actual processes of data science – here adapted loosely from the CRISP-DM methodology.  If we think of things this way, then we can consider ethical issues around each part of the process:

  • Data collection and processing
  • Analysis and algorithms
  • Using and communicating the outputs
  • Measuring the results

Data collection and processing

What are the ethical issues here? Well: ensuring that you collect data with permission, or at least in a way that is transparent; being careful about repurposing data (especially important for data exhaust); thinking carefully about the biases that may already exist; and planning for how the data will eventually be used.

Analysis and algorithms

I’ll be honest – I don’t believe that data science algorithms are racist or sexist. For a couple of reasons: firstly those require free-will (something that a random forest clearly doesn’t have), secondly that would require the algorithm to be able to distinguish between a set of numbers that encoded for (say) gender and another that coded for (say) days of the week. Now the input can contain data that is biased, and the target can be based on behaviours that are themselves racist, but that is a data issue, not an algorithm issue, and rightly belongs in another section.

But the choice of algorithm is important. As is the approach you take to analysis. And (as you can see from the pledge) an awareness that this represents people and that the outcome can have impact… although that leads neatly on to…

Using and communicating the outputs

Once you have your model and your scores, how do you communicate its strengths, and more importantly its weaknesses? How do you make sure that it is being used correctly and ethically? I would urge people to compare things against current processes rather than theoretical ideals. For example, the output may have a gender bias, but (assuming I can’t actually remove it) is it less sexist than the current system? If so, it’s a step forwards…

I only touched on communication, but really this is a key, key aspect. Let’s assume that most people aren’t really aware of the nature of probability. How can we educate people about the risks and the assumptions in a probabilistic model? How can we make sure that the people who take decisions based on that model (and they probably won’t be data scientists) are aware of the implications?  What if they’re building it into an automated system? Well in that case we need to think about the ethics of:

Measuring the results

And the first question would be, is it ethical to use a model where you don’t effectively measure the results? With controls?

This is surely somewhere where we can learn from both medicine (controls and placebos) and econometricians (natural experiments). But both require us to think through the implications of action and inaction.
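
As a minimal illustration of the ‘with controls’ point (all names invented): when a model goes live, randomly route a fraction of cases through the existing process, so that there is something to measure the model’s real-world impact against.

    import numpy as np

    rng = np.random.default_rng(42)

    def assign_arm(case_ids, control_fraction=0.1):
        """Randomly keep a control group on the existing process."""
        use_model = rng.random(len(case_ids)) >= control_fraction
        return {cid: ("model" if flag else "control")
                for cid, flag in zip(case_ids, use_model)}

    # Later, compare outcomes between the two arms before trusting the model.
    print(assign_arm(["case-1", "case-2", "case-3"]))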

Using Data for Evil IV: The Journey Home

If you’re interested in talking through ethics more (and perhaps from a different perspective) then all of this will be a useful background for the presentation that Fran Bennett and I will be giving at Strata in London in early June.  And to whet your appetite, here is the hell-cycle of evil data adoption from last year…

[Image: the hell-cycle of evil data adoption]