In this period of accelerated digital transformation and increasing adoption of artificial intelligence (AI) in financial services, biased and discriminatory credit risk algorithms are increasingly making headline news. One US credit card was investigated after complaints that it gave higher credit limits to men than to women. Poor risk management of AI-driven decision-making can expose firms to regulatory scrutiny, potential fines, and reputational damage (see our white paper on AI and risk management). Further, there may be additional costs from customer attrition.
A new study suggests that customers move twice as much money away from banks that use algorithms in loan application decisions when told that those algorithms may be affected by proxy data linked to gender, ethnicity, and social media usage. Participants in the study viewed the use of information that may act as a proxy for protected characteristics as unfair and punished the banks by reallocating their earnings to a different bank.
This report on “The perception of fairness of algorithms and proxy information in financial services” was published as part of the partnership between the UK Centre for Data Ethics and Innovation (CDEI) and the Behavioural Insights Team (BIT). It forms part of the year-long CDEI review into bias in algorithmic decision-making, due to be published in the summer.
The control group in the experiment were told that:
- Bank A uses financial information, such as a person’s salary, whether they are employed, and whether they have debt, to decide an individual’s application.
- Bank B uses advanced computing techniques and a broader range of personal information than Bank A to make decisions about loan applications. This allows it to make predictions about an individual's application.
The three experimental groups were additionally told that:
- However, by including information which may link to gender, for example, salary and occupation, Bank B may end up offering different levels of credit to men and women;
- However, by including information which may link to ethnicity, for example, salary, post code, and occupation, Bank B may end up offering different levels of credit to people of different ethnicities; and
- However, by including social media data, Bank B may end up offering different levels of credit to people with a greater social media presence compared to people with a smaller social media presence.
This blog post summarises the most interesting insights revealed in this study and the potential implications for a lender.
Insight 1: customers have a negative perception of algorithms’ use of large amounts of personal information in loan approval. 17% of those in the control group reallocated their funds, suggesting that the statement that Bank B uses “advanced computing techniques” and “a broader range of personal information” is sufficient to sway consumers.
Insight 2: customers financially punish banks, by switching financial service providers, when prompted that their algorithms may discriminate on the basis of gender and ethnicity. This “punishment” is smaller than the amount customers move when told that a bank is avoiding taxes, but greater than that of the control group, who were told only that algorithms and large personal data sets are being used.
Figure 1: effect of treatment compared to tax avoidance and control scenarios
Despite the relative punishment of Bank B, Bank A and the control version of Bank B are not necessarily free from the risk of discriminatory bias either. Salary is mentioned as an example of a proxy for gender, yet Bank A also uses salary information in decision-making. If men have higher salaries, it is plausible that all three scenarios would result in more credit being offered to men than to women.
Postcode and occupation are also arguably relevant financial information, e.g. in mortgage applications, and may be used by Bank A. The discomfort may lie in the fact that non-financial information that is not causally related to the outcome of interest is being considered. For example, insurance companies have been accused of charging higher premiums to people with Hotmail accounts and to people named Mohammed.
Some financial information may also be a proxy for ethnicity: a paired audit study found that in some US metropolitan areas, black and Hispanic applicants were steered towards loan types that are more expensive to finance, creating a strong association between race/ethnicity and loan type. In reality, Bank B may even offer more credit to women and minority applicants if their risk is found to be over-estimated in the current market.
Whether giving different levels of credit to men and women is fair depends on the definition of fairness. What do “similar” applicants look like? What are justifiable sources of unequal outcomes, e.g. salary? Defining and understanding fairness is complicated, as discussed in a previous blog post. A concept we may intuitively believe we understand is difficult to formalise. There are many ways to define fairness mathematically, and many of them are impossible to satisfy at the same time.
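To make this concrete, the sketch below (an illustrative example of ours, not taken from the CDEI/BIT report) computes two common group fairness metrics on hypothetical loan decisions: the demographic parity gap (difference in approval rates between groups) and the equal opportunity gap (difference in approval rates among applicants who would have repaid). When the groups have different repayment base rates, a model can keep one gap small while the other remains large.

```python
import numpy as np

# Hypothetical loan data: group membership, true repayment outcome, model approval.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)                        # 0 = group A, 1 = group B
repaid = rng.binomial(1, np.where(group == 0, 0.8, 0.6))     # different repayment base rates
approved = rng.binomial(1, np.where(repaid == 1, 0.9, 0.2))  # approvals track repayment only

def demographic_parity_gap(approved, group):
    """Difference in approval rates between the two groups."""
    return abs(approved[group == 0].mean() - approved[group == 1].mean())

def equal_opportunity_gap(approved, repaid, group):
    """Difference in approval rates among applicants who would have repaid."""
    tpr = lambda g: approved[(group == g) & (repaid == 1)].mean()
    return abs(tpr(0) - tpr(1))

print("Demographic parity gap:", round(demographic_parity_gap(approved, group), 3))
print("Equal opportunity gap: ", round(equal_opportunity_gap(approved, repaid, group), 3))
```

In this simulated example, approvals depend only on the repayment outcome, so the equal opportunity gap is close to zero, yet the demographic parity gap is not, because the groups differ in their base rates. Which gap should matter is exactly the kind of definitional choice the fairness literature debates.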
What is important to note here is that customers are sensitive to the messaging around potential discrimination, but they may not have a sufficient understanding of its causes to identify its risk factors. This aligns with the BIT recommendation for CDEI to “consider further testing to understand the impact of framing on perceptions of fairness.”
Insight 3: those historically most likely to be disadvantaged feel most strongly that it is unfair for a bank to use proxy information for gender and ethnicity, although the results are not statistically conclusive due to the limited sample size. For example, women punish the bank that used information which could act as a proxy for sex more strongly than men do.
Insight 4: people are consistently more concerned about the use of social media data than about the potential proxies for gender and ethnicity. This is not related to the customer’s age or frequency of social media usage. Again, it may not be clear how social media data are associated with an individual’s creditworthiness, increasing the scepticism around their usage. For example, the frequency of check-ins at high-end restaurants for a foreign national with limited UK credit history may be used as a positive sign of their spending capability. This may improve the model’s accuracy but is heavily affected by biases, e.g. what people choose to share on social media. Proxy analysis should consider how these metrics may be associated with both protected features (e.g. gender, ethnicity) and the outcome of interest (e.g. likelihood of repayment), as sketched below.
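As a rough illustration of what such a proxy analysis might look like (our own sketch on assumed data, not an approach described in the report), the snippet below checks how strongly a candidate feature, here a hypothetical social media check-in count, correlates with a protected attribute and with the repayment outcome. A feature that tracks the protected attribute while adding little predictive signal about repayment is a prime candidate for exclusion or further scrutiny.

```python
import numpy as np

# Hypothetical data: a candidate feature (social media check-in frequency),
# a protected attribute, and the repayment outcome. All names and values are illustrative.
rng = np.random.default_rng(1)
protected = rng.integers(0, 2, size=500)                     # 0/1 group membership
checkins = rng.poisson(np.where(protected == 1, 6.0, 3.0))   # feature tracks the protected attribute
repaid = rng.binomial(1, 0.7, size=500)                      # outcome independent of the feature here

# A simple proxy screen: correlation of the feature with the protected attribute
# versus its correlation with the outcome of interest.
corr_protected = np.corrcoef(checkins, protected)[0, 1]
corr_outcome = np.corrcoef(checkins, repaid)[0, 1]
print(f"Correlation with protected attribute: {corr_protected:+.2f}")
print(f"Correlation with repayment outcome:   {corr_outcome:+.2f}")
```

In practice such a simple screen would sit alongside model-based checks, for example how much of a feature’s predictive contribution flows through the protected attribute, but it illustrates the two associations a proxy analysis needs to weigh.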
Insight 5: when customers believe that the more complex algorithm is more accurate, they view it as fairer, and this holds even when prompted that it may affect gender and ethnicity groups differently. People are also more likely to keep money in Bank B if they believe its algorithm is more accurate, despite the prompt that it may result in different levels of credit based on gender or ethnicity. It is worth noting that Bank B is still viewed less favourably than Bank A overall: even those who believe Bank B to be more accurate move money, on average, to Bank A.
BIT notes that more needs to be done to communicate accuracy to customers in a meaningful way, without incentivising organisations to overclaim their models’ accuracy.
Figure 2: money moved to Bank A by perceived relative accuracy
Insight 6: CDEI uses this report to highlight the tension between organisations’ hesitation to collect sensitive information (gender, ethnicity) and their need to test for potential proxy bias. While CDEI’s recommendation on this point is still pending the publication of its report in the summer, the ICO guidance on the AI auditing framework recommends technical approaches to mitigating discrimination risk, while acknowledging that these would involve processing special category data, which requires an appropriate lawful basis under the GDPR.
Summary
This study demonstrates and quantifies the potential cost to a lender of proxies perceived as unfair. Consumers are likely to punish a lender if prompted that it uses advanced computing techniques, a broad range of personal information, potential proxies for gender and ethnicity, or social media data in its loan application decisions. While the prompting may not be representative of the true risk of each algorithm, it shows that messaging and communication around algorithm design are important. When told that an algorithm is more accurate, consumers view it as fairer.
While the regulatory risk and potential reputational damage of being accused of algorithmic discrimination are well known, this study explicitly measures how much such accusations could cost a lender. As financial institutions increasingly adopt advanced algorithms and look to alternative data sources to improve model accuracy, it is important to consider the risk of unfair discrimination, which damages not only their brand but also their bottom line.
Deloitte Model Guardian is a tool to help organisations not only identify potential algorithmic biases but investigate why they may exist and how the risks could be mitigated. Contact Michelle Lee for more information or to arrange a demo: michellealee@deloitte.co.uk.
Michelle Seng Ah Lee works in Risk Analytics leading the AI ethics offerings at Deloitte UK. She is also pursuing a PhD at the University of Cambridge on this topic in the Computer Science and Technology department.