Machine learning (ML) is increasingly being used to inform important decisions in financial services (FS). Algorithms can predict who will default on a loan, who should be hired, and what price each customer is willing to pay for a product or service. They can learn far more from data than humans can digest, identifying patterns in unexpected ways. Those patterns are sometimes associated with who we are, including our race and gender.

AI offers many new opportunities and benefits in FS (see: our report with the World Economic Forum), but there is a risk that algorithms may identify patterns that exist because of past discriminatory and/or exclusionary practices. For example, an AI recruiting tool trained on historical data may be biased against women, and it has been found that facial recognition software in self-driving cars is worse at detecting darker skin tones.

Scholars have responded to these risks by introducing numerous definitions of fairness and their corresponding mathematical formalisations, such as equalised odds, positive predictive parity, and counterfactual fairness. New techniques have also been introduced for pre-processing (purging the data of bias before the algorithm is trained) and for post-processing (correcting bias in predictions after the algorithm is built), sacrificing some accuracy for greater equity. The premise is that if the bias can be measured mathematically, it can be removed. Practitioners have adopted these definitions to produce reports showing pass/fail results for each condition.
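As a rough sketch, two of these conditions can be checked directly on a model's predictions. Equalised odds compares error rates (e.g. true-positive rates) across groups, while positive predictive parity compares precision. The data below is a hypothetical toy example, purely for illustration:

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Per-group true-positive rate (TPR) and positive predictive value (PPV)."""
    rates = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        tpr = yp[yt == 1].mean()  # P(pred=1 | actual=1, group=g)
        ppv = yt[yp == 1].mean()  # P(actual=1 | pred=1, group=g)
        rates[g] = {"TPR": tpr, "PPV": ppv}
    return rates

# Toy labels and predictions for two groups "A" and "B" (hypothetical)
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1, 0, 0])
group  = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

for g, r in group_rates(y_true, y_pred, group).items():
    print(g, r)
```

Equalised odds asks the TPR (and false-positive rate) to match across groups; positive predictive parity asks the PPV to match. A pass/fail report is essentially a check of whether these per-group numbers are (approximately) equal.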

However, in reality, it is not that simple. No algorithm can pass all of these tests, because the definitions are mathematically incompatible with one another: satisfying one requires forgoing another. Selecting a single definition is itself problematic, because fairness is not a binary, absolute, one-size-fits-all condition. It is a complex notion that philosophers have debated for millennia, from Aristotle to Rawls. Consumers disagree considerably about what it means to be fair, and decisions often involve multiple competing objectives. Many existing definitions assume a clear distinction between “legitimate” features (e.g. income) and “irrelevant” features (e.g. race), but in practice, proxies for the prediction target are often closely intertwined with proxies for personal characteristics. Existing fairness definitions also fail to address discrimination already embedded in the data. These biases cannot be removed through a purely technical solution (pre- or post-processing); rather, they require inspecting and changing existing policies and processes.
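A small numerical example illustrates the incompatibility. Suppose a model has the same true- and false-positive rates for two groups, so equalised odds holds; if the groups have different underlying base rates of default, Bayes' rule forces their positive predictive values apart, so positive predictive parity must fail. The rates below are hypothetical:

```python
def ppv(tpr, fpr, base_rate):
    """Bayes' rule: P(actual=1 | pred=1) given per-group error rates."""
    p = base_rate
    return tpr * p / (tpr * p + fpr * (1 - p))

# Identical error rates for both groups -> equalised odds is satisfied
tpr, fpr = 0.8, 0.1

print(ppv(tpr, fpr, 0.30))  # group with 30% base rate: ~0.774
print(ppv(tpr, fpr, 0.10))  # group with 10% base rate: ~0.471
```

With the same error rates, a positive prediction means something different for each group, so no amount of threshold-tuning can satisfy both definitions at once (unless base rates are equal or the model is perfect).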

Discriminatory bias can derive not only from algorithmic design but also from data collection mechanisms, differential treatment, and biased feedback. Limited marketing to minority groups can result in selection bias. Human bias in judgement can be crystallised in data sets: a paired audit study in Chicago showed that black loan applicants with equivalent characteristics were quoted lower loan amounts and received less information and assistance in applying for a mortgage. A default risk model is also trained only on accepted loans; it is often unknown whether those who were denied a loan would have defaulted. Algorithms need to be trained, but training them on historical data can replicate the same bias, worsening inequalities at an unprecedented scale. Because the source of the bias is not algorithmic, it cannot simply be solved algorithmically. Beyond algorithmic design, de-biasing requires non-technical solutions, such as a new outreach strategy, changes to processes, training of human decision-makers, and an analysis of whether the data is representative of all potential customers.
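The accepted-loans problem can be made concrete with a toy simulation (all numbers synthetic and purely illustrative, not real lending data). If a past policy accepted only higher-income applicants, the default data the model learns from understates the population default rate, and the outcomes of rejected applicants are simply never observed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic incomes and a default probability that falls with income
income = rng.normal(50, 15, n)
p_default = 1 / (1 + np.exp((income - 40) / 5))
default = rng.random(n) < p_default

# Historical policy: only applicants with income > 45 were accepted,
# so default outcomes exist only for that subset
accepted = income > 45

print("population default rate:      ", round(default.mean(), 3))
print("observed (accepted-only) rate:", round(default[accepted].mean(), 3))
```

A model trained on the accepted subset has no labelled examples of the applicants the old policy screened out, so it cannot learn whether that screening was justified; that gap is a data-collection problem, not a modelling one.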

These problems are not unique to ML. The alternative to an ML model may be a worse model with poorer performance and a worse impact on minority groups. Fairness should be considered in relation to an alternative, rather than as an absolute goal. It is important to move away from the false simplicity of fairness as a mathematical condition and take seriously the practical and ethical trade-offs in each decision-making model.

Specifying the benefits and risks would give decision-makers actionable insight into which model best reflects their values and risk profile. Some objectives can be measured, such as increasing aggregate financial inclusion or decreasing loan denial rates for racial minorities. Others are qualitative - for example, regulatory considerations and the relative 'explainability' of each algorithm.

Who is accountable for ensuring that the risk of unfair outcomes is sufficiently considered and addressed? A data scientist typically focuses on the key performance metrics set by the business, so the initiative in identifying and managing these risks needs to come from the top. The principles of fairness, transparency, and explainability are important, but they are only meaningful when operationalised in enterprise risk management processes. Only when AI is appropriately governed will leaders have the confidence to innovate.

This risk of algorithmic unfairness is also an opportunity. Human decision-making can be mired in cognitive biases that are difficult to track; by contrast, an algorithm is inherently auditable, and when the ethical and practical objectives are clearly defined, it is possible to test whether it achieves the desired outcome. This is an opportunity for FS leaders and regulators to meaningfully define and formalise what it means to implement a fair decision-making system.

At Deloitte, we have leveraged our wide range of expertise to help clients tackle these ethical considerations – from regulatory strategy to privacy, technology, and cyber risks. Our proprietary AI Governance Framework helps accelerate the gap analysis of existing controls for managing the new risks AI introduces to FS businesses. Customised training on AI risks and ethics is available to client teams, both for FS leaders and for AI practitioners.

Michelle Seng Ah Lee works in Risk Analytics leading the AI ethics offerings at Deloitte UK. She is also pursuing a PhD at the University of Cambridge on this topic in the Computer Science and Technology department. She can be reached at: michellealee@deloitte.co.uk

Further analysis from Deloitte: