As machine learning algorithms are increasingly used to inform critical decisions across high-impact domains, there has been rising concern that their predictions may unfairly discriminate based on legally protected attributes, such as race and gender. Scholars have responded by introducing numerous mathematical definitions of fairness against which an algorithm can be tested. A number of tools have also been introduced to automatically test an algorithm’s predictions against various fairness definitions and provide “pass/fail” reports.

However, these efforts have not achieved any consensus on how to address the challenge of unfair outcomes in algorithmic decisions. Given that it is mathematically impossible to meet some of the fairness conditions simultaneously, the reports often provide conflicting information on the algorithm’s fairness. “De-biasing” techniques (pre-, in-, and post-processing) presume that “unacceptable” bias can be easily identified, separated, and surgically removed, but reality is not that simple.

It is increasingly apparent that fairness cannot be summarized into one equation because it depends on the complex real-life context. This discovery is nothing new; the first debates on algorithmic fairness arose in the 1960s and fizzled out with the conclusion that there is “no statistic that could unambiguously indicate whether or not an item is fair.” From a legal standpoint, the approach of automating “fairness testing” is incompatible with the requirements of EU non-discrimination law, which relies heavily on context-sensitive, intuitive, and ambiguous evidence.

In a recent paper, Lee, Floridi, and Singh (2020) argue that there is a gap between the fairness metrics used in computer science literature and fairness as defined in ethical philosophy and in welfare economics. In particular, there is no clear separation between which inequalities and biases are “acceptable” and which are “unacceptable” in a society with layers of existing inequalities. The lack of ambiguity in the mathematical definitions of fairness bears little resemblance to real-life contexts.

This blog post will provide a high-level summary of the paper’s findings on the gaps between the ethical philosophers’ and computer scientists’ definitions of fairness. The paper also addresses the gap between computer scientists’ and welfare economists’ definitions of fairness and discusses the role of autonomy and welfare in assessing ethical success (“key ethics indicators (KEIs)”), which will be discussed in Part 2 of this series.

Challenge of separating “acceptable” inequalities and biases from the “unacceptable” 

The notion of fairness is based on the egalitarian foundation that humans are fundamentally equal and should be treated equally. How equality should be measured and to what extent it is desirable have been a source of debate in both philosophical ethics from a moral standpoint and welfare economics from a market efficiency standpoint. What are the relevant criteria based on which limited resources should be distributed?

Consider the layers of inequality in Table 1. Two individuals are unequal on multiple levels that may affect the target outcome of interest, whether it is credit-worthiness, predicted performance at a job, or insurance risk. It is possible that the differences in the observed outcome are attributable to one or more of the above inequalities.

Building an algorithm to predict the outcome could result in a faithful representation of these inequalities and the resulting replication and perpetuation of the same inequality through decisions informed by its predictions. However, which of the inequalities should be allowed to influence the model’s prediction?

Layers of bias that inaccurately skew the predicted outcome of a group

In addition to the differences in distributions in the ground truth, there may be biases in the model development lifecycle that exacerbate the existing inequalities between two groups. The challenge is that in many cases, the patterns associated with the target outcome are also associated with one’s identity, including race and gender.

Suresh and Guttag (2020) have recently grouped these biases into six categories: historical, representation, measurement, aggregation, evaluation, and deployment. Historical bias refers to past discrimination and inequalities, and the remaining five biases, displayed in Table 3, align with the phases of the model development lifecycle (data collection, feature selection, model build, model evaluation, and productionisation) in which the predictions may be inaccurately skewed. Table 3 gives examples from racial discrimination in lending processes to demonstrate each type of bias.

The mitigation strategy depends on whether we believe α0 − αn and β0 − βn need to be actively corrected to rebalance the inequalities and bias. It is important to understand the source of the bias in order to address it.

Overall, fairness cannot be formalised without (i) identifying which inequalities and biases exist that affect the outcome of interest, and (ii) assessing which of them should be retained, such as income differences, and which of them should be actively corrected, such as treatment inequalities.

The next section will discuss the failure of existing fairness definitions to consider the source of the disparities in outcome metrics (bias and/or inequality) and the lack of sensitivity to the contextual nuances beyond equalisation of the outcome of interest.

Philosophical perspectives on equality

Table 2 gives examples of philosophical perspectives and their perceptions of what types of inequality are acceptable. While this is far from comprehensive, it reveals a subjective debate with nuances and complexities insufficiently addressed in existing algorithmic fairness literature.

Limitations of mathematical fairness formalisations

Existing mathematical definitions of fairness, while loosely derived from a notion of egalitarianism, lack the nuances and context-specificity present in philosophical discourse. To demonstrate this gap, the paper walks through a use case: a lender building a model to predict a prospective borrower’s risk of default on a loan. In this case, the False Positives (FP) represent lost opportunity (predicted default, but would have repaid), and the False Negatives (FN) represent lost revenue (predicted repayment, but defaulted).

The calculations of error rates used in the metrics are defined below, with some of the most commonly cited fairness definitions in Table 4:

  • True Positive Rate (TPR) = TP/(TP + FN)
  • True Negative Rate (TNR) = TN/(FP + TN)
  • False Positive Rate (FPR) = FP/(FP + TN) = 1 – TNR
  • False Negative Rate (FNR) = FN/(FN + TP) = 1 – TPR
  • Positive Predictive Value (PPV) = TP/(TP+FP)
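
As a minimal sketch, the rates listed above can be computed from the four confusion-matrix counts of a single group. The class and function names, and the counts themselves, are illustrative assumptions rather than anything taken from the paper; "positive" follows the lending use case and means a predicted default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for one group; 'positive' means predicted default."""
    tp: int  # predicted default, did default
    fp: int  # predicted default, would have repaid (lost opportunity)
    tn: int  # predicted repayment, did repay
    fn: int  # predicted repayment, defaulted (lost revenue)


def error_rates(c: ConfusionCounts) -> dict[str, float]:
    """Return the five rates for one group (assumes every denominator is non-zero)."""
    return {
        "TPR": c.tp / (c.tp + c.fn),
        "TNR": c.tn / (c.fp + c.tn),
        "FPR": c.fp / (c.fp + c.tn),
        "FNR": c.fn / (c.fn + c.tp),
        "PPV": c.tp / (c.tp + c.fp),
    }


# Hypothetical counts for a group of 100 applicants (not data from the paper).
group_a = ConfusionCounts(tp=30, fp=10, tn=50, fn=10)
rates_a = error_rates(group_a)
for name, value in rates_a.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on each group's counts gives the per-group values that group fairness definitions then compare.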

There is a clear gap between fairness as defined in ethical philosophy and as formalised in the mathematical definitions.

For example, the equal opportunity metric, while it sounds attractively similar to Rawlsian EOP, fails to address discrimination that may already be embedded in the data. Rawlsian EOP also assumes that inequalities in native talent and ambition may result in unequal outcomes, which is not addressed in the equalisation of false negative rates. Each group fairness metric, including equal odds, positive predictive parity, and positive / negative class balance, requires different assumptions about the gap between the observed and unobservable: “if there is structural bias in the decision pipeline, no [group fairness] mechanism can guarantee fairness.” In many domains in which there are concerns over unfair algorithmic bias, including credit risk and employment, there has often been a documented history of structural and societal discrimination, which may affect the underlying data through biases previously discussed.
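
The group metrics named above can also conflict with one another. The sketch below assumes the usual readings of two of them (equal odds as equal TPR and FPR across groups, positive predictive parity as equal PPV across groups) and uses hypothetical base rates of default. It applies Bayes' rule to show that when the observed base rates differ, equalising TPR and FPR forces the PPVs apart.

```python
def ppv_from_rates(base_rate: float, tpr: float, fpr: float) -> float:
    """PPV implied by a group's base rate of default and its TPR and FPR (Bayes' rule)."""
    true_pos = base_rate * tpr            # share of the group: defaulters flagged as default
    false_pos = (1.0 - base_rate) * fpr   # share of the group: repayers flagged as default
    return true_pos / (true_pos + false_pos)


# Hypothetical setting: both groups receive identical TPR and FPR,
# so equal odds holds by construction.
TPR, FPR = 0.75, 0.20

# Hypothetical observed base rates of default; these may themselves reflect
# the historical and measurement biases discussed earlier.
ppv_a = ppv_from_rates(base_rate=0.40, tpr=TPR, fpr=FPR)
ppv_b = ppv_from_rates(base_rate=0.20, tpr=TPR, fpr=FPR)

print(f"PPV group A: {ppv_a:.3f}")  # 0.714
print(f"PPV group B: {ppv_b:.3f}")  # 0.484
```

In this illustration a "predicted default" label is correct far more often for group A than for group B, so positive predictive parity fails even though equal odds is satisfied. Which of the two a lender should prioritise is not something the arithmetic can settle.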


Academic literature on algorithmic ethics has predominantly focused on fairness without an in-depth consideration of the components of fair vs. unfair inequalities, and without the contextualisation of the decision. Overall, these metrics do not give any information on which layers of inequalities and which types of biases they are attempting to correct, risking over- or under-correction. A deeper engagement with the ethical assumptions being made in each model is necessary to understand the drivers of the unequal outcomes. What types of inequalities are acceptable depends on the context of the model.

The decision on the target state (i.e. the way it ought to be) is an ethical decision with mathematically inevitable trade-offs between objectives of interest. Heidari et al. dismiss the distinction between relevant vs. irrelevant features in practice as out of scope for their paper: “Determining accountability features and effort-based utility is arguably outside the expertise of computer scientists.” On the contrary, Lee, Floridi, and Singh argue that computer scientists and model developers must be actively engaged in the discussion on what layers of inequality should and should not influence the model’s prediction, as this directly influences not only the model design and feature selection but also the selection of performance metrics. Decision-makers should be informed about the value judgements, assumptions, and consequences of their design, better enabling conversations with regulators and with broader society regarding what comprises an ethical decision in the particular context.