Deloitte Risk Analytics data scientists recently won 2nd place in the Google-Deloitte Sustainability Hackathon, held over the week of 17th October, narrowly losing out to a product that recommends new electric vehicle charge points around the UK. The team proposed a new tool that combines Google Earth Engine data with open-source EPC data to estimate energy efficiency and emission levels at a postcode level. This blog post walks through our idea and approach, highlighting the potential impact across the financial services industry.
The team: Michelle Lee, Ryan Lloyd, Will Rainbow, Jem Godden, Jack Buesnel, Anson Poon, Domenico Brambati
ESG Subject Matter Experts: Mobi Shemfe, Devang Shah
The business challenge
Over the next few decades, there will need to be significant refurbishment of existing UK properties, as 60% of homes don’t meet the recommended energy efficiency standards (Open Property Group research). Energy performance certificate (EPC) ratings grade the energy efficiency of a property from A to G, with A being the most efficient and G the least efficient. Buyers of homes with poor EPC ratings may find it increasingly difficult to secure a mortgage, as more lenders are requesting EPC ratings as part of the mortgage application process. Consequently, demand for properties with poor or missing EPC ratings may plummet, leaving their owners unable to sell or lease them.
EPCs contain information on the energy efficiency (measured in kWh/m²) and environmental impact (measured in tonnes of CO2) of homes. The energy efficiency and environmental impact data in EPCs can be used to calculate a property’s transition cost – the cost of upgrading to a higher EPC rating. However, many properties don’t have EPCs.
Currently, there is limited data on the energy efficiency or emissions of individual properties. In the hackathon, we calculated that over 400K postcodes in England (with multiple properties per postcode) do not have any EPC data. This data shortage is much more significant outside the UK, as many countries have older buildings without EPC equivalents. United Nations Economic Commission for Europe (UNECE) research shows that only 6% of existing residential buildings in UNECE countries are covered by EPCs, while 18% of new residential buildings and 41% of new non-residential buildings are covered. In some countries, EPCs are not used at all.
To address this gap, we had a vision for a central database that has postcode-level energy efficiency data for properties with existing ratings. We could use this to understand the key drivers of efficiency and the typical improvement costs depending on property features.
We would also build a central database for postcode-level greenhouse gas emission data. There is a rich data set available on Google Earth Engine on various types of pollution, updated in near real-time. We could get a more holistic understanding of pollution by supplementing our existing data sets.
Additionally, we can source data on proxies of energy efficiency and greenhouse gas emissions. For example, Google Earth Engine has a surface temperature data set: can we see heat emanating from the tops of buildings to estimate energy efficiency? With enough predictors, we can build machine learning models to estimate energy usage and greenhouse gas emissions.
While we were realistic in our expectation that we would not be able to achieve all of this in a 1-week hackathon, our goal was a proof-of-concept. We wanted to show this vision was achievable and get a sense of the amount of effort it would require to build the full-scale solution.
All of this can have a significant real-world impact across stakeholders.
- Governments can get a more detailed view of pollution, as we are aiming to supplement existing data on CO2 with other environmental pollutants for each postcode.
- Mortgage lenders can better understand the risks in their portfolio: whether borrowers will be able to afford the upgrade and future payments, and what the potential future property value may be.
- Property owners can estimate the transition cost where they don’t have such data available, which is a known issue.
- Developers can assess the costs and benefits of upgrading properties.
- Finally, with good enough models on energy efficiency and emissions, commercial property owners can sense check their reporting. If what is estimated by our model is significantly different to what is reported, this can be flagged for review.
Our approach: data sources
We set out by sourcing and joining together rich open-source data sets, including:
- Existing EPC data on property level
- Postcode area polygons
- Geospatial mapping of postcode to the polygons and longitude/latitude
We engineered new features, including:
- LSOA-level gas and electricity usage and building age from Carbon Place data, mapped to postcode level
- Surface temperature data from Google Earth Engine
- Pollution data sets from Google Earth Engine
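The LSOA-to-postcode mapping in the list above can be pictured as a lookup join, where each postcode inherits the values of the LSOA that contains it. The sketch below is illustrative only and is not the hackathon code. The function name, the field names and every code and number in it are placeholders invented for the example.

```python
def map_lsoa_to_postcode(postcode_to_lsoa, lsoa_features):
    """Give each postcode the features of its parent LSOA.

    Returns (rows, unmatched): rows maps postcode -> feature dict,
    unmatched lists postcodes whose LSOA has no feature record.
    """
    rows, unmatched = {}, []
    for postcode, lsoa in postcode_to_lsoa.items():
        features = lsoa_features.get(lsoa)
        if features is None:
            unmatched.append(postcode)  # do not guess values for missing areas
            continue
        rows[postcode] = {"lsoa": lsoa, **features}
    return rows, unmatched


# Hypothetical inputs: placeholder codes and numbers, not real data.
postcode_to_lsoa = {
    "PC1 1AA": "LSOA_A",
    "PC1 1AB": "LSOA_A",
    "PC2 2ZZ": "LSOA_B",
    "PC3 3XX": "LSOA_C",  # no feature record below
}
lsoa_features = {
    "LSOA_A": {"gas_kwh": 12000.0, "electricity_kwh": 3500.0, "build_year": 1935},
    "LSOA_B": {"gas_kwh": 9000.0, "electricity_kwh": 4100.0, "build_year": 1988},
}

rows, unmatched = map_lsoa_to_postcode(postcode_to_lsoa, lsoa_features)
```

Assigning the same LSOA value to every postcode inside it is a simplification: it adds no information below LSOA level, which is one reason more granular sources matter.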
Then, we built initial models 1) to predict CO2 emissions and 2) to predict energy usage, using Google’s Vertex AI AutoML platform.
Our approach with Google Earth Engine
We were able to extract rich pollution data from Google Earth Engine. For this hackathon, we used postcode areas, which drop the last one or two characters of the full postcode, because we did not have access to the full postcode polygons. In the future, we should be able to extract more granular postcode-level data by purchasing the postcode polygons through the Ordnance Survey AddressBase product.
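As a rough illustration of the truncation described above, the sketch below drops the trailing one or two characters of a full postcode and groups full postcodes under the resulting coarser key. The function names and the example postcodes are hypothetical, and the exact truncation rule used in the hackathon is an assumption here.

```python
from collections import defaultdict


def truncate_postcode(postcode: str, drop: int = 2) -> str:
    """Normalise spacing and case, then drop the last `drop` characters."""
    normalised = " ".join(postcode.upper().split())
    if drop < 0 or drop >= len(normalised):
        raise ValueError("drop must be between 0 and len(postcode) - 1")
    return normalised[: len(normalised) - drop].strip()


def group_by_area(postcodes, drop: int = 2):
    """Map each truncated key to the sorted full postcodes it contains."""
    groups = defaultdict(list)
    for postcode in postcodes:
        groups[truncate_postcode(postcode, drop)].append(postcode)
    return {area: sorted(members) for area, members in groups.items()}


# Hypothetical postcodes, used only to show the grouping.
areas = group_by_area(["AB1 2CD", "AB1 2CE", "AB1 3XY"])
```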
Below are the postcode areas mapped to the Google Earth Engine nitrogen dioxide data set.
We were also excited to use the surface temperature data to estimate energy efficiency. In the future, we can overlay this data set with the building polygons data set from Google Earth Engine to exclude heat from nearby roads and only look at heat from the rooftops.
Using Google’s AutoML platform (Vertex AI), we built an initial model for Cardiff postcodes to predict energy consumption per postcode. Given the time limit, we selected Cardiff because it is neither too sparsely nor too densely populated for the low resolution of our input data. Our model has an RMSE of 7.52, which we can roughly interpret as the standard deviation of the variation not explained by our model. Given that energy consumption ranges from 428 to 9K, this is a fairly good model.
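For readers less familiar with RMSE, the sketch below shows how it is computed and how it can be expressed as a fraction of the target's range, which is the comparison made above. It is a generic illustration rather than the Vertex AI evaluation code. The function names are hypothetical, and reading "9K" as exactly 9,000 is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_as_share_of_range(rmse_value, low, high):
    """Express an RMSE as a fraction of the observed target range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return rmse_value / (high - low)


# Figures quoted in the post; 9K is taken as exactly 9,000 (assumption).
energy_share = rmse_as_share_of_range(7.52, 428, 9000)
```

With those figures the RMSE is under 0.1% of the range of energy consumption.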
A lot more work needs to be done. We need to ensure that the features used in the prediction, such as square footage and type of flooring, are available to the mortgage lender or property owner. Where the property owner does not usually have this information, we aim to build a clustering algorithm (unsupervised ML) that identifies “similar” properties, so that we can make statistically driven, reasonable assumptions about what these features may be for each property without EPC data.
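One way to picture the clustering idea: group properties that do have EPC data by features that are always observable, then fill a missing feature for a new property with a typical value from its nearest cluster. The sketch below uses a plain k-means with a naive initialisation. The algorithm choice, the function names, the two observed features and every number are assumptions made for illustration, not the design we settled on.

```python
import math
from statistics import median


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Plain k-means. Centroids start at the first k points, a naive but
    deterministic choice made only to keep the sketch short."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = [nearest_centroid(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels


def impute_from_similar(known, observed, k=2):
    """known: list of (observed_features, hidden_value) for properties with EPCs.
    Returns the median hidden value of the cluster nearest to `observed`."""
    points = [features for features, _ in known]
    centroids, labels = kmeans(points, k)
    target = nearest_centroid(observed, centroids)
    values = [hidden for (_, hidden), lab in zip(known, labels) if lab == target]
    if not values:  # empty cluster: fall back to all known properties
        values = [hidden for _, hidden in known]
    return median(values)


# Hypothetical data: two already-scaled observed features per property,
# paired with a floor area in square metres taken from its EPC.
known = [((1.0, 1.0), 50), ((8.0, 8.0), 120), ((1.2, 0.8), 55),
         ((8.2, 7.9), 130), ((0.9, 1.1), 60), ((7.8, 8.1), 125)]
estimated_area = impute_from_similar(known, (1.1, 1.0))
```

A production version would need feature scaling, a better initialisation and a check that the imputed values are plausible, none of which is shown here.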
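We also built a model to predict carbon emissions. It had an RMSE of 0.963 tonnes of carbon dioxide against a range of 4.5 to 104.6, or roughly 1% of the range, which also indicates a good fit. The two RMSEs are in different units, so they are best compared relative to the range of each target rather than directly.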
Key challenges with Google Earth Engine
The feature engineering exercise with Google Earth Engine turned out to be time-consuming and non-trivial.
Below is the map of Cardiff postcode areas and the nitrogen dioxide levels. The goal is to get summary statistics of NO2 levels within each of these shapes (polygons). For now, we took the estimate at the mid-point of each polygon, but it would be sensible to extract more summary statistics, such as the min, max, average and standard deviation.
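The per-polygon summary described above is often called zonal statistics: keep the raster cells whose centres fall inside the polygon, then summarise their values. The sketch below shows the idea in plain Python with a ray-casting point-in-polygon test. It is not the Earth Engine API and not the hackathon code. The function names, the toy square polygon and the cell values are all hypothetical.

```python
from statistics import mean, pstdev


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_stats(cells, polygon):
    """cells: iterable of (x, y, value) for raster cell centres.
    Returns summary statistics of the cells inside the polygon, or None."""
    values = [v for x, y, v in cells if point_in_polygon(x, y, polygon)]
    if not values:
        return None
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "std": pstdev(values),  # population standard deviation
    }


# Toy example: a 2 x 2 square with four cells inside and one outside.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
cells = [(0.5, 0.5, 1.0), (1.5, 0.5, 3.0), (0.5, 1.5, 5.0),
         (1.5, 1.5, 7.0), (2.5, 0.5, 100.0)]
stats = zonal_stats(cells, square)
```

Compared with sampling only the mid-point, this uses every cell that falls inside the shape, so an outlier just outside the boundary (the value 100.0 here) does not leak into the result.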
In addition, Google Earth Engine (EE) does not return a full data set for any one day. On each day, EE has a snapshot covering only one square. The next day, it has a different square, and sometimes these squares overlap. What we need to do is create a “mosaic” of these squares, as you can see from the map of London below, until we have enough to cover the region we want to investigate. This is why we were only able to extract the data for one region in the given timeframe.
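The mosaic step can be thought of as accumulating daily snapshots until every cell of the target region has a value. The sketch below models each daily snapshot as a dictionary of grid cells and lets later days overwrite earlier ones where squares overlap. That overwrite rule, the data layout and all names and values are assumptions for illustration. The sketch does not call Earth Engine.

```python
def build_mosaic(daily_tiles, target_cells):
    """Accumulate daily tiles until the target region is covered.

    daily_tiles: list of (day, {cell: value}) ordered oldest to newest.
    target_cells: set of cells that make up the region of interest.
    Returns (mosaic, days_used). Later days overwrite earlier ones in overlaps.
    """
    target = set(target_cells)
    mosaic, days_used = {}, []
    for day, tile in daily_tiles:
        useful = {cell: v for cell, v in tile.items() if cell in target}
        if not useful:
            continue  # this day's square misses the region entirely
        mosaic.update(useful)
        days_used.append(day)
        if target.issubset(mosaic):
            return mosaic, days_used
    raise ValueError("the available tiles do not cover the target region")


# Hypothetical 2 x 2 region and four daily snapshots.
region = {(0, 0), (0, 1), (1, 0), (1, 1)}
tiles = [
    ("day 1", {(0, 0): 1.0, (0, 1): 2.0}),
    ("day 2", {(0, 1): 9.0, (1, 1): 3.0}),  # overlaps day 1 at (0, 1)
    ("day 3", {(1, 0): 4.0, (5, 5): 8.0}),  # (5, 5) lies outside the region
    ("day 4", {(0, 0): 7.0}),               # not needed once coverage is complete
]
mosaic, days_used = build_mosaic(tiles, region)
```

The number of days needed grows with the size of the region, which is consistent with only one region being feasible in the time available.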
Almost all of our team members were fully booked on client engagements, so our hackathon entry was the result of side-of-desk work, much of it done in the evenings after the working day. It was also our first time working with Google Earth Engine.
With more time and experience, we believe we can achieve what we set out in our vision. Our next steps would be to purchase the full postcode polygons and extract more of the data sets from Google Earth Engine and other sources at a more granular level. These newly engineered features would be used to build a better predictive model of energy usage and emission levels for each postcode.
Our goal was to show that this is possible and prove the value of this central database and ML modeling – and we believe we did. We are excited to take this solution forward with the Deloitte ESG team to unlock the power of analytics and geospatial data.