According to the Cambridge Dictionary, a model is an abstract representation of another thing, either as an object or a simple description. For example, by looking at a model of a house, one can get a better idea of what the future building will look like. Models are created to reflect one’s understanding of reality, and all of them are approximations and simplifications of the real world. Models can be used in many ways, including in business decision-making.

As an example, one business process within a commercial bank was a workflow comprising several activities that required especially careful handling due to close attention from regulatory authorities. Quality control was so important that 100% of the processed cases were double-checked. The process had been tracked for several years, so the percentage of quality-check failures was fairly low.

The bank started thinking about how to cut the costs of double-checking. Business analysts noticed that the quality of the performed activities depended on several features known before a case entered the workflow: the historical performance of the employee handling the case, the working schedule and environment, certain characteristics of the case, and so on. Analysts gathered statistics on the data available at the moment tasks were distributed and built a machine learning model that predicted whether the completed workflow for a case would fail the corresponding quality check. The model achieved high accuracy and was taken to the business line manager.

The business owner was happy to reduce quality checks for the “high quality” cases by 20% by setting a threshold on the predicted probability of failing a quality check. The organisation deployed the model and expected it to behave predictably and reliably in solving the business problem: to reduce the number of checks while keeping the average quality-check success rate at a sufficient level.
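The threshold-setting step can be sketched as follows. The predicted probabilities here are synthetic, and the 20% target is taken from the example above; none of this reflects the bank’s actual figures.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical predicted probabilities of quality-check failure for a
# batch of completed cases (synthetic data for illustration only).
predicted_fail_prob = rng.beta(2, 8, size=10_000)

# To skip checks on the 20% of cases predicted least likely to fail,
# set the threshold at the 20th percentile of predicted probabilities.
threshold = np.quantile(predicted_fail_prob, 0.20)
skip_check = predicted_fail_prob <= threshold

print(f"threshold = {threshold:.3f}, share skipped = {skip_check.mean():.2%}")
```

Note that this choice of threshold only skips 20% of checks in expectation if future cases resemble the sample the quantile was computed on, which is exactly the assumption that fails in the rest of the story.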

However, later in the year business analysts found that the number of quality checks had gone down by only 15%, even though the profile of the data seemed to stay the same. The reason was that the predicted probabilities of failure were not representative of the true probabilities, a consequence of the model not being calibrated. The predicted probability of failure was not a probability in the statistical sense, but rather a score ranging from zero to one that needed to be adjusted to reflect true probabilities on real-life data. Hence, the model was actually wrong and generated costs for the organisation.
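The adjustment the analysts needed is post-hoc calibration, which can be sketched with scikit-learn. This is a minimal illustration on synthetic data: the dataset, the naive Bayes base model (a classifier known to produce poorly calibrated scores), and isotonic regression are all assumptions for the sketch, not details from the bank’s case.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Raw model: scores usable for ranking, but not reliable as probabilities.
raw = GaussianNB().fit(X_train, y_train)

# Calibrated model: isotonic regression maps raw scores to probabilities
# that better match observed failure frequencies.
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

# The Brier score measures how close predicted probabilities are to
# actual outcomes; lower is better.
raw_brier = brier_score_loss(y_test, raw.predict_proba(X_test)[:, 1])
cal_brier = brier_score_loss(y_test, calibrated.predict_proba(X_test)[:, 1])
print(f"raw Brier score       : {raw_brier:.3f}")
print(f"calibrated Brier score: {cal_brier:.3f}")
```

A threshold chosen on calibrated probabilities then means what the business owner assumed it meant: a case with a predicted 5% failure probability should fail roughly 5% of the time.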

Any model can go wrong at different stages of the model life cycle and raise costs in terms of money, reputation, customer relations, regulatory breaches, and more. This cost is called model risk, and it has two main causes: 1) faulty models (flawed assumptions, data handling, modelling process, etc.), and 2) inappropriate use of models (i.e. a model applied in a context different from that envisioned by its design). For example, if during model operation the profile of the input data changes so that the model no longer works as efficiently as it used to (so-called “data shift”), that falls under the second category.
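One simple way to monitor for such data shift is a two-sample Kolmogorov–Smirnov test comparing a feature’s distribution at deployment time against the training sample. The data below is synthetic, and the 0.01 significance level is an illustrative choice, not a recommendation.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
# Hypothetical feature values: the live data's mean has drifted by 0.4
# relative to the training sample (synthetic data for illustration).
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
live_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)

# The KS statistic is the largest gap between the two empirical CDFs;
# a small p-value suggests the distributions differ.
stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"Possible data shift detected (KS statistic {stat:.3f})")
```

In practice such a test would run per feature on a schedule, with flagged features triggering a review rather than an automatic model change.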

In response to model risk, the area of model risk management emerged. An essential part of model risk management is model validation, which ensures that models behave as expected and actually solve the business problems posed to them.

In financial services, machine learning models are widely used, but the stakes of model errors are often high. For example, models are used in credit risk management, compliance, reporting, surveillance and monitoring, and pricing. Model validation in “traditional model risk management” solutions focuses on particular, well-defined types of models and the problems solved with them.

For example, trading surveillance systems used in the same bank are watched closely by the financial supervisory authorities. Lack of appropriate model calibration can trigger a wave of precautionary publications in FCA Market Watch, and further penalties may be imposed on organisations that do not follow the guidance. (For example, FCA Market Watch summarises observations from suspicious transaction reporting (STR) supervisory visits, stating that each firm is responsible for making its own judgements about the calibration of its alert systems, and that firms risk failing to comply with market abuse regulations if they assume that because a certain calibration is appropriate for their peers, it must be appropriate for them.)

However, as in the above example of quality-check sampling, within the same organisation there can exist models that are not exposed to regulatory scrutiny but still bear inherent model risk. As the contexts of modelling are as broad as modelling itself, it is particularly hard to extend “traditional” model risk management systems to embrace all the models an organisation may have. Moreover, during the “summer of AI”, data is getting larger and less structured, algorithmic solutions and technology stacks are growing more complex, and ethical issues intertwine with regulatory requirements and are aggravated by the scale of automated decision-making.

Existing (and not-yet-existing) model risk management frameworks have to adapt to changes in areas with high modelling standards while at the same time capturing the growing range of AI applications in non-traditional areas. Working together, model validation teams should be able to understand the business problem context and the conceptual model, set functional and non-functional requirements, profile the data, inspect the modelling pipeline, review the technology stack and model governance, and capture potential ethical and data protection requirements for large-scale production systems. For organisations that are not yet mature in their adoption of AI, this can be a real challenge.