openLocal Commercial Location Data for England & Wales Research Integration

30 June 2025

United Kingdom | Whythawk

openLocal.uk is a quarterly-updated commercial location database, aggregating open data on vacancies, rental valuations, rates & ratepayers, into an integrated time-series database of individual retail, industrial, office and leisure business units.

Since 2016, Whythawk has made more than 4,000 Freedom of Information requests and curated over 20 million records on individual commercial locations in England and Wales.

We have supported the Greater London Authority (GLA), the Ministry of Housing, Communities & Local Government (MHCLG), University College London (UCL) and the universities of Leeds, Northumbria and Warwick, as well as research groups such as Centre for Cities, Centre for London and the Consumer Data Research Centre (CDRC).

Our data and analysis have informed studies of the COVID lockdown period and the Levelling Up economic recovery response, as well as research into meanwhile use for empty shops, business energy consumption, the impact of rates on business vacancy, and business activity clustering maps.

openLocal tracks the history of all types of business units across England and Wales, irrespective of their proximity to active high streets or town centres. We integrate a wide variety of source data imported from thousands of openly licensed datasets published as spreadsheets by local and national government.

Data are assembled via a combination of machine-learning techniques – including regression analysis, natural language processing and pattern-matching – into a single, unified geospatial database supporting research requirements for complex queries. All sources are automatically imported and processed, save for local rates data which are processed manually and algorithmically by our data wranglers.

All our data are available under a Creative Commons Attribution Licence, ensuring you can easily share and reuse our work.

Our challenge, as our database has grown to nearly a terabyte over the last 10 years, has been to improve our methods for rapidly and effectively restructuring and integrating data into accessible reports for researchers. In 2025, we began a major infrastructure redevelopment project to rebuild our existing systems.

Our solution

We have undertaken a complete refactoring and optimisation of the data integration and validation processes, as well as splitting the workflow into three dedicated applications:

  • Transformer: produces standardised structured data from the 300 local authorities in England and Wales we track. This is served by our independent whyqd data wrangling application.
  • Integrator: produces structured and integrated data, validated against multiple test requirements.
  • Explorer: a researcher-facing application permitting complex queries and structured data downloads.
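
As a rough illustration of how such a three-stage split can fit together, the following Python sketch passes a few made-up spreadsheet rows through a transform, integrate and explore step. It is not Whythawk's implementation: the `UnitRecord` schema, the column names (`ref`, `date`, `rv`, `status`), the function names and the "Exampleshire" authority are all hypothetical, chosen only to show the shape of the workflow.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UnitRecord:
    """One business unit as reported on one date (hypothetical schema)."""
    authority: str
    unit_ref: str
    reported: date
    rateable_value: int
    vacant: bool


def transform(authority, rows):
    """Transformer step: map raw spreadsheet rows onto the shared schema."""
    return [
        UnitRecord(
            authority=authority,
            unit_ref=row["ref"].strip().upper(),
            reported=date.fromisoformat(row["date"]),
            rateable_value=int(row["rv"]),
            vacant=row["status"].strip().lower() == "empty",
        )
        for row in rows
    ]


def integrate(records):
    """Integrator step: validate records and build a time series per unit."""
    series = {}
    for rec in records:
        if rec.rateable_value < 0:
            raise ValueError(f"negative rateable value for {rec.unit_ref}")
        series.setdefault((rec.authority, rec.unit_ref), []).append(rec)
    for history in series.values():
        history.sort(key=lambda r: r.reported)  # oldest first
    return series


def explore(series, authority):
    """Explorer step: list units whose most recent record is vacant."""
    return sorted(
        ref
        for (auth, ref), history in series.items()
        if auth == authority and history[-1].vacant
    )


# Made-up sample rows, standing in for one authority's published spreadsheet.
raw = [
    {"ref": " a1 ", "date": "2025-01-01", "rv": "12000", "status": "occupied"},
    {"ref": "A1", "date": "2025-04-01", "rv": "12000", "status": "Empty"},
    {"ref": "b2", "date": "2025-04-01", "rv": "8000", "status": "occupied"},
]
series = integrate(transform("Exampleshire", raw))
vacant_now = explore(series, "Exampleshire")
```

Keeping each step as a separate function mirrors the idea of separate applications: the transform step is the only one that needs to know about source-specific column names, while the later steps work purely on the shared schema.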

In addition, we are setting up a standalone, optimised server dedicated to the database, which all other applications can access as appropriate.

Our objective is an integrated service that provides the typical features of a Valuation Office Agency explorer, extended with our unique ratepayer and rates relief data.

Outcomes

As at the end of 2025, the core report-building functionality is complete and already being used by both commercial and academic researchers, and we are developing the visual explorer application to go live in Q2 2026.

