In financial services, organisations have learned to solve a wide range of problems using quantitative approaches. This creates many opportunities but also introduces a broad spectrum of ways in which solutions can go wrong. Due to the inherent complexity of the financial services space and the potentially high impact of an error on stakeholders and customers, many of the models developed in the sector must undergo independent verification and validation.

When we explore a model to find out whether it works as expected and solves the stated business problem, we go through the different stages of modelling. If there were no uncertainties, so that we could observe a precise picture of the world, obtain a perfect representation of the ‘population’ data, measure all required inputs with 100% accuracy, build a perfectly specified model, and operate it in a stable, unchanging environment, then we could apply an appropriately complex quantitative model and achieve high performance.

However, in practice this is never the case. It is therefore important to understand the potential sources of uncertainty, to have practical methods for quantifying them, and to have a view on how uncertainty from different sources interacts with other characteristics of a modelling solution, such as its complexity.


Uncertainty can be thought of as ‘what is not known precisely’, that is, imperfect or incomplete information. Acquiring more information about a modelling problem therefore tends to reduce uncertainty about its formulation and solution.

Various taxonomies and classifications provide different points of view on uncertainty. “Practice-driven classification of uncertainty in ML applications” gives some common examples:

  • ‘Aleatoric’ uncertainty, due to unknowns that differ each time we run the same experiment, vs. ‘epistemic’ uncertainty, caused by lack of information or wrong assumptions;
  • ‘Irreducible’ uncertainty, due to natural variability of things that cannot be eliminated, vs. ‘reducible’ uncertainty, due to lack of specific information resulting from ignorance, scarce or misleading data, poor controls, and unknown biases;
  • ‘Inference’ uncertainty, induced by the act or process of deriving a conclusion about an entity that is unobserved, unmeasured or unavailable (e.g., predictive, statistical, or unobservable proxy uncertainties).

These classifications are often hard to use in practice: their boundaries are vague, they cannot be reasonably quantified for a generic machine learning solution, and they have no direct implications for model development, testing and operation.

The authors identify three major sources of uncertainty: ‘data uncertainty’, ‘modelling uncertainty’, and ‘model use’ uncertainty. These concepts are also the pillars of quantitative model verification and validation, which makes it easier to develop this classification further and apply it in practice:

‘Data uncertainty’ arises when data used for modelling purposes are affected by traditional data quality issues or are potentially out of context. To capture uncertainty stemming from the data, a number of methods can be used, including exploring the data for population representativeness, estimating data quality and measurement errors, assessing the impact of data quality issues, aggregating observations, and applying other use-case-specific data transformations and checks.
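As an illustrative sketch (Python with NumPy; the income figures and sample sizes are hypothetical), a representativeness check can compare a training sample against a reference population using a hand-rolled two-sample Kolmogorov-Smirnov statistic:

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    all_values = np.sort(np.concatenate([sample_a, sample_b]))
    cdf_a = np.searchsorted(np.sort(sample_a), all_values, side="right") / len(sample_a)
    cdf_b = np.searchsorted(np.sort(sample_b), all_values, side="right") / len(sample_b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(0)
population = rng.normal(loc=40_000, scale=12_000, size=50_000)   # e.g. applicant income
biased_sample = rng.normal(loc=55_000, scale=9_000, size=2_000)  # data drawn from a narrower segment
fair_sample = rng.choice(population, size=2_000, replace=False)  # representative sample

print(ks_statistic(fair_sample, population))    # small: sample represents the population
print(ks_statistic(biased_sample, population))  # large: training data are out of context
```

A large statistic on the biased sample signals that a model trained on it may not generalise to the full population.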

‘Modelling uncertainty’ relates to model specification and fitting. It is caused by the fact that statistical and broader machine learning techniques provide empirical models that are only an approximation of the real relationship between the input data and the model outcomes. It is useful to start with the problem formulation: is the problem abstract by nature? For example, if the goal is to increase the happiness of a software application’s users, it may be problematic to define happiness in measurable terms. If the modelling targets are vague and imprecise, if the conceptual model is far from the real problem, or if the model is misspecified or does not account for data uncertainties, the result may be high modelling uncertainty. Standard model evaluation measures applied to high-quality data can serve as a lower boundary for this type of uncertainty.
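To see why evaluation on high-quality data bounds modelling uncertainty from below, the sketch below (NumPy; the quadratic relationship is invented for illustration) fits a misspecified straight line to nearly noise-free data. The residual error that remains comes from misspecification, not from data quality:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=500)
y_true = 1.0 + 0.5 * x + 0.8 * x**2        # the real relationship is quadratic
y = y_true + rng.normal(0, 0.1, size=500)  # high-quality data: very little noise

# Misspecified model: a straight line fitted by least squares.
X_lin = np.column_stack([np.ones_like(x), x])
coef_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
rmse_lin = np.sqrt(np.mean((X_lin @ coef_lin - y) ** 2))

# Correctly specified model: includes the quadratic term.
X_quad = np.column_stack([np.ones_like(x), x, x**2])
coef_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)
rmse_quad = np.sqrt(np.mean((X_quad @ coef_quad - y) ** 2))

print(rmse_lin, rmse_quad)  # the gap is modelling uncertainty, not data noise
```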

All models are developed to be used in a specific context. ‘Model use’ uncertainty can be attributed to the model being applied outside of its intended scope, or to the input data not being representative of that scope. Potential remediations include monitoring conceptual drift and relevant context characteristics, and checking the boundaries of the intended scope.
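One common way to monitor whether live inputs still match the intended scope is the Population Stability Index (PSI). A minimal sketch in NumPy, with invented score distributions and the commonly quoted 0.1/0.25 rules of thumb:

```python
import numpy as np

def psi(expected, actual, n_bins=10):
    """Population Stability Index between a baseline and a live sample."""
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # catch out-of-range live values
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)           # avoid log(0) for empty bins
    a_frac = np.clip(a_frac, 1e-6, None)
    return np.sum((a_frac - e_frac) * np.log(a_frac / e_frac))

rng = np.random.default_rng(2)
training_scores = rng.beta(2, 5, size=10_000)     # score distribution at development time
live_scores_ok = rng.beta(2, 5, size=5_000)       # same population in production
live_scores_shifted = rng.beta(4, 3, size=5_000)  # the population has drifted

print(psi(training_scores, live_scores_ok))       # below the common 0.1 'stable' threshold
print(psi(training_scores, live_scores_shifted))  # above 0.25: used outside intended scope
```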

In “An evolution of uncertainty assessment and quantification”, it is suggested that during machine learning model validation all three types of uncertainty should be understood, identified and managed with relevant uncertainty reduction steps. For example, predictive models may be required both to meet high performance targets and to estimate the uncertainty remaining in their outcomes. An error rate of 5% means that the model is 5% uncertain on average; however, in high-materiality, high-impact model use cases this statement is too general, and more precise per-prediction uncertainty estimates are required. For instance, the poor quality of a scanned image of supporting financial documentation for a loan application may affect the outcome of an automated credit decision-making system.
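The difference between average and per-prediction uncertainty can be sketched as follows (the logistic scoring model and its coefficients are purely illustrative): rather than reporting only an aggregate error rate, each prediction’s confidence is computed and low-confidence cases are routed for manual review:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical credit-scoring model: probability of default from two
# standardised features, with purely illustrative fixed coefficients.
def predict_proba(features):
    logits = 1.2 * features[:, 0] - 0.8 * features[:, 1]
    return 1.0 / (1.0 + np.exp(-logits))

x_live = rng.normal(size=(1_000, 2))
p = predict_proba(x_live)

# An aggregate error rate hides how uncertain each individual case is:
confidence = np.abs(p - 0.5) * 2        # 0 = coin flip, 1 = certain
needs_review = confidence < 0.3         # route low-confidence cases to a human

print(f"{needs_review.mean():.0%} of decisions flagged for manual review")
```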


As with uncertainty, model complexity can be characterised in different ways, and there is no universal definition. For the purpose of comparison between machine learning models from the same ‘family’, complexity can refer to the number of parameters or features/terms included in a model. This definition addresses the richness of the model space and does not depend on the data. Generalized Degrees of Freedom can be used in the broader case of comparison between different models, as it measures the sensitivity of model estimates to changes in the data observations.
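A Monte-Carlo perturbation estimate of Generalized Degrees of Freedom can be sketched in a few lines (NumPy; the linear model is chosen so that the answer is known in advance): perturb each observation slightly, measure how much its fitted value moves, and sum the sensitivities. For an ordinary least-squares fit this recovers the number of parameters:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(0, 1.0, size=n)

def fitted(y_obs):
    """Least-squares fit on the fixed design X; returns in-sample predictions."""
    coef, *_ = np.linalg.lstsq(X, y_obs, rcond=None)
    return X @ coef

# Perturb each observation slightly and measure how much its own fitted
# value moves; the summed sensitivity is the Generalized Degrees of Freedom.
eps, n_rep = 0.1, 50
base = fitted(y)
sens = np.zeros(n)
for _ in range(n_rep):
    delta = rng.normal(0, eps, size=n)
    sens += delta * (fitted(y + delta) - base) / eps**2
gdf = sens.sum() / n_rep

print(gdf)  # close to 5, the number of parameters of this linear model
```

For more complex, data-sensitive models the same recipe applies, but the estimate will generally exceed the nominal parameter count.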

It is not always possible to capture all of the modelling complexity with the model itself. For example, the complexity of a Bayesian model depends on the prior information and the observed data; hence, some combinations of input data may result in more complex models than others.

Different aspects of data and model complexity can also be measured as the number and perceived depth of the data transformation steps involved, the number of different data sources, or the number and complexity of upstream algorithms/models that the current model depends on. Another aspect of complexity is the amount of resources required to train and operate the model; usually, the most important resource requirements are time and memory.

Intuitively, more complex and sensitive models provide a more flexible and better fit to the input data. However, increasingly complex models may turn out to be less interpretable, more computationally expensive, brittle, and prone to errors on previously unseen data. The ‘law of parsimony’ is usually applied for model selection: between two models with similar performance, the choice is made in favour of the simpler one.
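The law of parsimony can be made operational with a penalised criterion. The sketch below (NumPy; the data are invented so that the true signal is quadratic) uses the Bayesian Information Criterion, which charges each extra parameter log(n), to choose among polynomial fits of different complexity:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-2, 2, size=300)
y = 1.0 + 0.5 * x - 0.7 * x**2 + rng.normal(0, 0.3, size=300)  # true signal is quadratic

def bic(degree):
    """Bayesian Information Criterion for a polynomial least-squares fit."""
    X = np.vander(x, degree + 1)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    k = degree + 1                                   # number of fitted parameters
    return len(y) * np.log(rss / len(y)) + k * np.log(len(y))

scores = {d: bic(d) for d in (1, 2, 5, 9)}
best = min(scores, key=scores.get)
print(best)  # degree 2: extra terms improve the fit too little to justify themselves
```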

Balancing uncertainty and complexity

Uncertainty and complexity in machine learning problems often go hand in hand, and growing complexity can aggravate uncertainty. For example, if the data used for training a model are inaccurate, noisy, poorly representative of the real-world problem, or arrive in an unstructured form (e.g., audio or images), then the derived features used in modelling are also noisy (and often biased). Fitting a complex, sensitive model to such data can result in ‘overfitting the noise’, misrepresenting the real world and yielding poor performance and stability. Therefore, the amount of ‘data uncertainty’ may determine the best model choice.
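A minimal demonstration of overfitting the noise (NumPy; signal and noise levels are invented): a flexible degree-15 polynomial achieves a lower training error than a well-specified quadratic, but shows a much larger gap between training and test error:

```python
import numpy as np

rng = np.random.default_rng(6)
x_tr = rng.uniform(-1, 1, size=40)
y_tr = x_tr**2 + rng.normal(0, 0.3, size=40)    # simple signal, noisy measurements
x_te = rng.uniform(-1, 1, size=300)
y_te = x_te**2 + rng.normal(0, 0.3, size=300)

def train_test_rmse(degree):
    """Fit a polynomial of the given degree; return (train RMSE, test RMSE)."""
    coef = np.polyfit(x_tr, y_tr, degree)
    tr = np.sqrt(np.mean((np.polyval(coef, x_tr) - y_tr) ** 2))
    te = np.sqrt(np.mean((np.polyval(coef, x_te) - y_te) ** 2))
    return tr, te

tr2, te2 = train_test_rmse(2)     # matches the true signal
tr15, te15 = train_test_rmse(15)  # flexible enough to fit the noise

print(f"degree 2:  train={tr2:.3f}  test={te2:.3f}")
print(f"degree 15: train={tr15:.3f}  test={te15:.3f}")
```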

According to Jason Brownlee, challenges in developing the best model can to a great extent be attributed to uncertainties, and the solution is to systematically evaluate different models until a ‘good-enough’ set of features and algorithms is discovered. This amounts to a probabilistic approach to incorporating uncertainty into machine learning, and there are many examples where factoring uncertainty into machine learning algorithms results in better outcomes for a wide range of real-life problems. For instance, the Random Forest algorithm has proved to be one of the most popular and best-performing classification algorithms in cases where the problem definition or the input data are characterised by high uncertainty. The algorithm limits the number of features that are randomly chosen for training the individual ‘trees’ that are later ensembled into a ‘forest’.
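The feature-subsampling mechanism described above can be sketched from scratch (NumPy; single-split decision stumps stand in for full trees, and the data are synthetic): each weak learner sees a bootstrap resample and only a random subset of features, and predictions are made by majority vote:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic data: the class depends on three informative features out of twenty.
n, n_features = 600, 20
X = rng.normal(size=(n, n_features))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)
X_tr, y_tr, X_te, y_te = X[:400], y[:400], X[400:], y[400:]

def fit_stump(Xs, ys, feature_ids):
    """Best single-feature threshold split among the permitted features."""
    best = (0.5, feature_ids[0], 0.0, 1)          # (error, feature, threshold, polarity)
    for f in feature_ids:
        for t in np.quantile(Xs[:, f], np.linspace(0.1, 0.9, 9)):
            for pol in (1, -1):
                pred = (Xs[:, f] > t) if pol == 1 else (Xs[:, f] <= t)
                err = np.mean(pred.astype(int) != ys)
                if err < best[0]:
                    best = (err, f, t, pol)
    return best[1:]

def forest_predict(X_eval, stumps):
    """Majority vote over all weak learners."""
    votes = np.zeros(len(X_eval))
    for f, t, pol in stumps:
        votes += (X_eval[:, f] > t) if pol == 1 else (X_eval[:, f] <= t)
    return (votes / len(stumps) > 0.5).astype(int)

# Each weak learner sees a bootstrap resample and sqrt(n_features) random features.
m = int(np.sqrt(n_features))
stumps = []
for _ in range(200):
    rows = rng.integers(0, len(X_tr), size=len(X_tr))       # bootstrap resample
    feats = rng.choice(n_features, size=m, replace=False)   # random feature subset
    stumps.append(fit_stump(X_tr[rows], y_tr[rows], feats))

accuracy = np.mean(forest_predict(X_te, stumps) == y_te)
print(accuracy)
```

Despite each stump being only weakly informative, the ensemble comfortably beats chance; production systems would use full decision trees, e.g. scikit-learn's RandomForestClassifier.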

Probabilistic modelling offers a general framework for building systems that learn from data. Its advantages include better estimates of uncertainty, automatic ways of learning structure, and avoiding overfitting. For example, systems can be designed to adapt themselves while in use, as the previously missing knowledge becomes available. Such adaptive behaviour can rely on architecture-based adaptation, one of the multi-agent-based approaches, or a self-organising approach.
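A small example of a system that adapts as missing knowledge becomes available (NumPy; the event rate and batch sizes are invented) is Bayesian updating of a Beta prior over an unknown rate: the posterior mean converges on the truth and its uncertainty shrinks with every batch of observations:

```python
import numpy as np

rng = np.random.default_rng(8)
true_rate = 0.12                 # the unknown event rate the system is learning
alpha, beta = 1.0, 1.0           # uniform Beta prior: nothing known yet
sds = []

for batch in range(5):
    outcomes = rng.random(200) < true_rate           # a new batch of observations
    alpha += outcomes.sum()                          # observed events
    beta += len(outcomes) - outcomes.sum()           # observed non-events
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    sds.append(np.sqrt(var))
    print(f"after {200 * (batch + 1)} obs: mean={mean:.3f}, sd={sds[-1]:.4f}")
```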

A variety of methods have been developed for quantifying predictive uncertainty in deep neural networks. As one example, probabilistic neural networks such as mixture density networks capture the inherent uncertainty in outputs for a given input. As another example, Bayesian neural networks learn a posterior distribution over parameters that quantifies parameter uncertainty, a type of epistemic uncertainty that can be reduced through the collection of additional data.
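Besides Bayesian neural networks, a simple ensemble-style approximation of epistemic uncertainty works for any model class: refit the same model on bootstrap resamples and use the spread of predictions as the uncertainty estimate. In the sketch below (NumPy, with an invented one-dimensional problem), the spread grows away from the training data:

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.uniform(-1, 1, size=80)
y = np.sin(2 * x) + rng.normal(0, 0.1, size=80)

# Refit the same cubic model on bootstrap resamples; the spread of the
# ensemble's predictions approximates epistemic uncertainty.
grid = np.array([-2.0, 0.0, 2.0])       # far left, inside the data, far right
preds = []
for _ in range(100):
    idx = rng.integers(0, len(x), size=len(x))
    preds.append(np.polyval(np.polyfit(x[idx], y[idx], 3), grid))
spread = np.std(preds, axis=0)

print(spread)  # smallest where training data exist (x = 0)
```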


Despite its significant impact, modelling uncertainty has no single definition. One way to approach the ambiguity is to view it from the perspectives of the data, the model and the model use. A risk-based approach to uncertainty implies that the different types of uncertainty should be understood, identified and managed with relevant uncertainty reduction steps.

Uncertainty can interact with other aspects of the modelling process, such as complexity. Often, model and data complexity aggravate uncertainty, making models brittle and model outcomes unreliable (e.g., due to overfitting the training data or model misspecification).

However, some complex models can handle uncertainty effectively by design, using probabilistic approaches to deal with imperfect or incomplete information about a dynamic reality. This happens when the model’s complexity is well balanced against all the uncertainties of the modelling process.