Scaling your Data Platform with Reference Data Products
Apr 10, 2026
Wietsche Calitz
Scaling data platforms is complex—reference data products enable continuous, end-to-end validation and safer evolution of new features.
A well-designed data platform makes it safe and easy to launch the infrastructure required to build successful data products. The platform journey typically starts small: you develop your first use case while simultaneously building the platform itself, in a process of iteration and experimentation. Initially, this means foundational components such as storage, compute, and connections to BI tools. As new use cases emerge, additional capabilities become necessary: orchestration, multiple environments, CI/CD pipelines, and more.
While individual platform features can be tested in isolation, end-to-end validation becomes increasingly difficult as complexity grows. This is where reference data products become essential.
Think of your data platform as a restaurant and your reference data products as food critics. They taste everything you serve without the restaurant knowing who they are, so the kitchen cannot give them special treatment. That honest feedback is key to making things better.
A reference data product is built to exercise every platform feature in support of an imaginary use case, using public or synthetic data. For example, at one client we built a solution that processes influenza data from https://gateway.euro.who.int/en/ using Python, dbt, Airflow, AWS Athena, Power BI, and other tools the platform provisions, in order to create a production dashboard, complete with clear warnings not to use it!
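When public data is not an option, synthetic data works just as well. The sketch below is a minimal, hypothetical example, assuming you want a reproducible weekly feed shaped loosely like influenza surveillance data. The function names, column names, seasonal curve, and warning text are all illustrative assumptions, not the WHO gateway's schema. Every row carries an explicit warning, in the same spirit as the "do not use" banner on the dashboard.

```python
import csv
import io
import math
import random
from datetime import date, timedelta

# Hypothetical label stamped on every row so downstream consumers
# cannot mistake this for real surveillance data.
WARNING = "SYNTHETIC DATA - DO NOT USE FOR DECISIONS"


def synthetic_flu_weeks(start: date, weeks: int, seed: int = 42) -> list[dict]:
    """Generate deterministic weekly case counts with a winter peak (hypothetical shape)."""
    rng = random.Random(seed)  # fixed seed: every weekly run produces identical data
    rows = []
    for i in range(weeks):
        week_start = start + timedelta(weeks=i)
        iso_year, iso_week, _ = week_start.isocalendar()
        # Cosine curve peaking around ISO week 6 to mimic a winter season.
        season = 1 + math.cos(2 * math.pi * (iso_week - 6) / 52)
        cases = max(0, int(round(500 * season + rng.gauss(0, 25))))
        rows.append({
            "iso_year": iso_year,
            "iso_week": iso_week,
            "cases": cases,
            "note": WARNING,
        })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text, ready to land in the platform's storage layer."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(synthetic_flu_weeks(date(2025, 9, 1), weeks=4)))
```

Fixing the seed is a deliberate choice: because the reference product is redeployed regularly, identical input makes any change in the dashboard attributable to the platform rather than to the data.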
The reference product becomes part of the regular data product lifecycle. Whenever a new platform feature is developed, the reference data product must incorporate it. It is deployed to production once a week and passes through the same governance and operational processes as any real data product. In effect, the definition of done for any new platform capability includes a successful implementation within the reference data product.
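One way to make that definition of done enforceable is a small check in the platform's CI pipeline. The sketch below is hypothetical: it assumes the platform keeps a list of its capabilities and the reference data product declares which ones it exercises. The feature names are taken from the capabilities mentioned earlier; the registry, manifest, and function names are illustrative assumptions, not an existing tool.

```python
# Hypothetical registry of capabilities the platform provides.
PLATFORM_FEATURES = {
    "storage", "compute", "bi_connection", "orchestration", "multi_env", "ci_cd",
}

# Hypothetical manifest of features the reference data product exercises.
# "multi_env" is missing, as if that feature had just been added to the platform.
REFERENCE_PRODUCT_USES = {
    "storage", "compute", "bi_connection", "orchestration", "ci_cd",
}


def missing_features(platform: set[str], used: set[str]) -> set[str]:
    """Return platform features the reference data product does not yet use."""
    return platform - used


def definition_of_done(platform: set[str], used: set[str]) -> int:
    """Return a CI-style exit code: 0 if every feature is covered, 1 otherwise."""
    missing = missing_features(platform, used)
    if missing:
        # Stay silent on success; only report when something is not done.
        print("Not done: reference data product lacks " + ", ".join(sorted(missing)))
        return 1
    return 0


if __name__ == "__main__":
    code = definition_of_done(PLATFORM_FEATURES, REFERENCE_PRODUCT_USES)
    print(f"exit code: {code}")
```

In a real pipeline you would likely pass the returned code to `sys.exit` so the build fails, and run the check alongside the weekly production deployment of the reference product.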

It may sound like a lot of work, but if you automate as you go, it fades into the background and only makes noise when you need to hear it. Be your own critic: build a reference data product!