Part 1 in a series: Introduction and Conceptual Modelling

There are many interpretations and definitions of what a model is and can do. It can be a process designed to lower cost and increase efficiency in a warehouse. It can also be a machine learning algorithm used to determine credit limits for customers of a bank. The common factor is that these models can help us to solve complex problems and make better decisions in the real world. For that to hold, however, we need to ensure that any model is reliable. Without proper consideration and validation, a model’s credibility is at risk with the community developing or using it, making its results useless at best and damaging at worst. At the heart of providing such assurance is setting up appropriate modelling targets and anchoring all validity tests to said targets.

This blog post series will discuss three common aspects that are central to modelling targets: the conceptual model, model performance measures and, finally, choosing the right dataset. Today, we'll focus on the first.

A conceptual model is a formal simplification of real-world systems and processes, and it is used as a first step in the development of more complex models. The aim of the conceptual model is to build a good representation of the real-world system or process, so that the correct understanding of the scenario is captured before the machine learning modelling exercise starts. It is important to ensure that a conceptual model is both suitable, meaning that the machine learning model can materially help address the business problem, and feasible, meaning that it can be practically implemented.

To bring this to life, consider the following real-world example: an email service provider that wants to separate spam emails from regular emails. The service provider will need to program its software to detect spam email with a high rate of accuracy and direct it to a nominated spam folder. We can come up with the following process to serve as our conceptual model:

  1. Develop and train a spam filter algorithm combining rules and machine learning techniques on a set of labelled data,
  2. Validate the algorithm’s outcomes, and
  3. Build the spam filter into the system.
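
The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the training and hold-out messages, the blocked phrase and all function names are made up for this example, and a tiny hand-written naive Bayes classifier stands in for whatever machine learning technique a real provider would choose.

```python
import math
from collections import Counter

# Hypothetical labelled data, invented for illustration only (True = spam).
TRAIN = [
    ("win a free prize now", True),
    ("claim your free money today", True),
    ("urgent offer win cash now", True),
    ("meeting agenda for monday", False),
    ("please review the attached report", False),
    ("lunch on friday with the team", False),
]
HOLDOUT = [
    ("free cash prize", True),
    ("team meeting on monday", False),
]

# Hypothetical hand-written rule: phrases that always mark a message as spam.
BLOCKED_PHRASES = ("wire transfer fee",)


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Step 1: fit a tiny multinomial naive Bayes model on labelled data."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab


def is_spam(text, model):
    """Combine the rule with the learned model."""
    if any(phrase in text.lower() for phrase in BLOCKED_PHRASES):
        return True  # the rule takes precedence over the model
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)  # class prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids log(0)
                score += math.log(
                    (word_counts[label][word] + 1) / (n_words + len(vocab))
                )
        scores[label] = score
    return scores[True] > scores[False]


def validate(model, examples):
    """Step 2: measure accuracy on held-out labelled data."""
    correct = sum(is_spam(text, model) == label for text, label in examples)
    return correct / len(examples)


def route(text, model):
    """Step 3: the decision the mail system would act on."""
    return "spam" if is_spam(text, model) else "inbox"


model = train(TRAIN)
print("hold-out accuracy:", validate(model, HOLDOUT))
print(route("please pay the wire transfer fee", model))
```

Keeping training, validation and routing as separate functions mirrors the conceptual model, so each step can be questioned and tested on its own.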

Some questions on the modelling targets we may consider during the design could be:

  • Representativeness: Do the process steps logically make sense and will they help us achieve our objectives?
  • Suitability: Is this an appropriate way to solve the problem?
  • Feasibility: Have we got sufficient data to create and validate the algorithm, and have we got the right technology stack and skills in the team to build this?

While the spam detection example is a simple one, other real-world problems should be broken down in a similar way so that we can approach them logically and achieve our goals. Most of these problems are more layered and less black-and-white than the one we just discussed. Take, for instance, a company looking to use machine learning to make sound hiring decisions and improve the efficiency of its recruitment process. The objective is broader, and there are different angles from which to approach the problem.

Once again, building a conceptual model provides a foundation around which to anchor our focus. This would entail analysing each step of the recruitment process, gathering findings and pinpointing where and how hiring efficiency or certainty could be improved. At this juncture, we would also layer in wider concerns such as ethical implications and company policies, to ensure that any suitability and feasibility considerations are anchored to the real world. This process will help us scope out the right problem to solve, identify the most appropriate route to try and, above all, indicate the right modelling targets to build.

For example, it could be determined that CV screening is by far the most onerous part of the process, but that it is against company policy to let algorithms reject candidates. The algorithm would then instead decide which CVs could be put through for interviews immediately and which should be flagged for review by a human being. In this way, the process of building a conceptual model has helped define the problem, the process and the target.
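
The CV-screening target from the hiring example can be written down as a small routing function. This is only a sketch under stated assumptions: the function name, the idea of a model score between 0 and 1 and the 0.8 threshold are all hypothetical. The point it illustrates is that the policy constraint becomes part of the target itself, because the function has no outcome that rejects a candidate.

```python
def triage_cv(score, fast_track_threshold=0.8):
    """Route a CV based on a model score in [0, 1].

    The score and the threshold are hypothetical. There is deliberately
    no "reject" outcome: anything that is not fast-tracked goes to a person.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= fast_track_threshold:
        return "interview"
    return "human_review"


for candidate_score in (0.92, 0.41):
    print(candidate_score, "->", triage_cv(candidate_score))
```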

This concludes our whistle-stop tour of conceptual modelling. In our next article, we will discuss performance targets for machine learning models.