Rethinking the data product workbench in the age of AI

25.08.2025

Niels Claeys

Rethinking how AI empowers data teams to build and maintain better data products without replacing them.

At Dataminded, we recently surveyed over 150 data professionals to better understand the challenges they face while building and scaling data products. The full results can be found here; they surfaced a clear and comprehensive set of challenges that companies are struggling with.

In this blog post, we’ll explore those challenges and share our vision for the next generation data product workbench. Our definition of a data product workbench is as follows:

A data product workbench is a platform that enables the creation, management, and deployment of data products.

Challenges

Despite the advancements in data tooling, many organizations still struggle with building and maintaining pipelines and ensuring data quality. These issues have existed for as long as we’ve worked in data and they remain just as relevant in today’s AI-driven landscape. Let’s take a closer look at why these problems are still so hard to solve.

Building and maintaining pipelines

Over the past five years, the data tooling ecosystem has exploded. On the surface, this should make life easier for data engineers. In reality, while building a single pipeline has become simpler, managing hundreds of pipelines across a growing data landscape remains a major operational struggle.

The challenge has shifted: it’s no longer about getting pipelines to run, but about maintaining, updating, and debugging them at scale. With so many moving parts and edge cases, failures are inevitable and hard to resolve consistently.

Data quality struggles

Data quality remains a complex, multifaceted problem. It’s not something that can be solved with a single approach such as writing unit tests, adding data tests, or tracking data lineage. As use cases become more advanced, so do the expectations for data reliability and pipeline robustness. At the same time, business teams are pushing for faster delivery of insights and features. That pressure often leads to trade-offs where quality and long-term maintainability are sacrificed in favor of short-term speed.

Now that we’ve identified these challenges, the question becomes: how can a data product workbench evolve to help solve them, especially with the rise of AI?

At Dataminded, we see AI as a powerful enabler for data teams, not a replacement for them. Unlike other tech startups, we don’t believe in replacing data engineers with AI agents. Instead, we believe in empowering data professionals through intelligent tooling.


Principles for integrating AI

Our approach is grounded in the following three principles:

Human in the loop AI

AI agents can significantly boost developer productivity, but they are assistants, not replacements. The data engineer remains the bridge between technical implementation and business value. They understand the context, own the solution, and ultimately decide what code is merged.

Our job isn’t writing 100 lines of Python; it’s solving real business problems. This requires close collaboration with the business in order to correctly translate business requirements into a technical solution. AI can help write code faster and solve certain problems, but the engineer stays in control.

Sandbox-first approach

All changes to data products must be validated in a safe, isolated environment before being deployed to production. This could be a dedicated development environment where engineers can run full end-to-end tests to verify the correctness and stability of their code.

Only code that has passed this check should be considered for review. It’s the team’s responsibility to ensure their sandbox has representative data for meaningful testing.

Even in the age of AI, this principle remains: AI-generated changes must also be validated in the sandbox before a pull request is created or merged. You can’t earn trust through code changes alone; it’s the testing that proves the changes are correct.

Platform wide context through metadata

The workbench embeds a shared understanding of how data products are built across your organization. It captures metadata, including data assets, schemas, and transformations, for every data product. This contextual layer empowers both humans and AI agents to create new data products that align with your existing data landscape.
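
To make this concrete, here is a minimal sketch of the kind of metadata record a workbench could keep per data product. The field names and structure are our own illustrative assumptions, not a fixed specification; the point is that assets, schemas, and transformations live in one machine-readable place that both humans and AI agents can query.

# Minimal sketch of per-product metadata (field names are illustrative assumptions).
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    name: str
    schema: dict[str, str]  # column name -> type

@dataclass
class Transformation:
    name: str
    inputs: list[str]   # upstream asset names
    output: str         # produced asset name
    sql: str

@dataclass
class DataProductMetadata:
    product: str
    owner: str
    assets: list[DataAsset] = field(default_factory=list)
    transformations: list[Transformation] = field(default_factory=list)

orders = DataProductMetadata(
    product="orders_per_region",
    owner="sales-analytics",
    assets=[DataAsset("raw_orders", {"order_id": "string", "region": "string", "amount": "double"})],
    transformations=[Transformation(
        name="aggregate_orders",
        inputs=["raw_orders"],
        output="orders_per_region",
        sql="SELECT region, SUM(amount) AS total FROM raw_orders GROUP BY region",
    )],
)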

The next generation data product workbench

Building on the principles outlined earlier, we’ve re-imagined how a modern data product workbench should support the data product life cycle. We identified three key phases where AI can meaningfully assist data engineers and analysts.

Exploratory data analysis

The starting point for any new pipeline is understanding the available data and exploring how it can be used to meet a specific business need. This typically involves identifying relevant datasets and experimenting with how they can be joined or transformed to produce the desired output.

This is an inherently iterative process, requiring fast feedback loops. That’s why the ideal interface is a SQL/Python notebook environment, which is designed for flexible analysis and experimentation.

In most cases, this work is led by data analysts, who are domain experts with a deep understanding of the data landscape. We believe the greatest value of AI in this phase lies in conversational assistance:

  • Recommending relevant datasets based on a query or business context

  • Suggesting transformations

  • Translating natural language into SQL to enable “chat with your data” capabilities

For AI to be effective here, it needs access to rich metadata, schemas, and semantic context. With this foundation, AI becomes a valuable co-pilot, helping teams move from raw exploration to actionable pipeline designs.
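
As a rough sketch of the last point, the snippet below shows how the schemas from the metadata layer could be folded into a prompt before a question is sent to a model. The call_llm function is a placeholder for whichever model integration you use, and the remaining names are illustrative assumptions.

# Sketch of natural-language-to-SQL grounded in metadata (names are illustrative).
def call_llm(prompt: str) -> str:
    # Placeholder for a real model call; returns a canned answer here.
    return "SELECT region, SUM(amount) AS total FROM raw_orders GROUP BY region"

def build_schema_context(schemas: dict[str, dict[str, str]]) -> str:
    lines = []
    for table, columns in schemas.items():
        cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
        lines.append(f"Table {table}({cols})")
    return "\n".join(lines)

def question_to_sql(question: str, schemas: dict[str, dict[str, str]]) -> str:
    prompt = (
        "You translate business questions into SQL.\n"
        f"{build_schema_context(schemas)}\n"
        f"Question: {question}\nSQL:"
    )
    return call_llm(prompt)

schemas = {"raw_orders": {"order_id": "string", "region": "string", "amount": "double"}}
print(question_to_sql("What is the total order amount per region?", schemas))
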
The output of this phase is a clear, step-by-step guide from existing data sources to the desired output. Below, we share a few screenshots showing how this could look.

Creating a new data product and finding relevant data sources to use.

Testing out the transformations in a notebook environment.

Building data products

This is the phase where most AI-powered coding tools, such as Cursor and Windsurf, focus their efforts. The ecosystem is evolving rapidly, which is why we believe the data product workbench should not reinvent the wheel here. The real value of a data product workbench lies in how well it integrates with these tools to support efficient, context-aware development. Effective integration requires two key ingredients:

  • A clear, step-by-step plan for what needs to be built, which should be the output of the exploratory analysis phase

  • A knowledge base of organizational standards, including best practices, architectural constraints, and conventions for building data products within your organization

When your AI has access to this context, it can become a true development accelerator. Without it, the risk is high that AI-generated code will create more friction than value.
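
As a rough illustration of how those two ingredients could be handed to a coding assistant, the sketch below assembles the plan and the organizational standards into a single context block. The file layout and the assemble_agent_context function are hypothetical.

# Sketch of combining the plan and organizational standards into agent context.
# The file layout and function are hypothetical.
from pathlib import Path

def assemble_agent_context(plan_file: str, standards_dir: str) -> str:
    plan = Path(plan_file).read_text()
    standards = "\n\n".join(
        p.read_text() for p in sorted(Path(standards_dir).glob("*.md"))
    )
    return (
        "Step-by-step plan (output of exploratory analysis):\n" + plan
        + "\n\nOrganizational standards and conventions:\n" + standards
    )

# Example usage: context = assemble_agent_context("plan.md", "standards/")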

Monitoring and maintenance

Once a pipeline is built, it must run reliably in production to deliver ongoing business value. This phase focuses on monitoring, maintaining, and updating pipelines. The quality of a pipeline’s code degrades over time until it eventually fails. The trigger for changing a pipeline can be fixing a bug, applying a security patch, resolving a performance issue, or upgrading a dependency.

AI can significantly reduce the operational overhead in this stage by proactively detecting issues and proposing solutions. Rather than replacing engineers, the AI acts as a smart assistant, accelerating resolution time and freeing engineers to focus on higher-value work. A workflow we envision here is the following:

  • Issue detection: The AI agent identifies a failure or anomaly in pipeline X and proposes a fix. The fix contains code changes as well as the necessary tests.

  • Automated testing: The fix is tested in a unit test and/or in a sandbox environment to ensure that it works and doesn’t break something else.

  • Pull request creation: Once the fix passes all checks, the agent creates a pull request for human review and approval. If accepted, it’s merged and deployed to production.
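
In code, that loop could be sketched roughly as follows. Every function here is a stand-in for an integration with your monitoring, sandbox, and Git tooling, not an existing API.

# Sketch of the detect -> validate -> pull request loop (all names are stand-ins).
from dataclasses import dataclass

@dataclass
class ProposedFix:
    pipeline: str
    description: str
    branch: str

def detect_issues() -> list[ProposedFix]:
    # Stand-in for the agent scanning failures and anomalies and drafting fixes.
    return [ProposedFix("orders_pipeline", "Handle late-arriving events", "ai/fix-late-events")]

def validate_in_sandbox(fix: ProposedFix) -> bool:
    # Stand-in for running the fix's unit tests and an end-to-end sandbox run.
    return True

def create_pull_request(fix: ProposedFix) -> None:
    # Stand-in for opening a pull request for human review and approval.
    print(f"Pull request opened for {fix.pipeline}: {fix.description} ({fix.branch})")

for fix in detect_issues():
    if validate_in_sandbox(fix):
        create_pull_request(fix)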

What makes the data product workbench powerful here is its tight integration between monitoring, sandbox environments, and Git workflows. The AI not only detects issues and suggests fixes, but also runs them in an isolated environment before surfacing them to engineers. The following screenshots show how this workflow could look.

Unified dashboard with all AI-proposed fixes and improvements.

For every fix, every step performed by the AI agent is shown.


Conclusion

The rise of AI opens up new opportunities to re-imagine how we build, manage, and scale data products. At Dataminded, we believe the future lies in a human-in-the-loop approach, where AI enhances the data engineer’s workflow without replacing their expertise.

By embedding AI into a well-structured data product workbench, we can streamline exploratory analysis, automate operational tasks, and ensure high-quality pipelines, all while keeping engineers firmly in control.

Are you interested in helping us make this vision a reality? Clap, comment on the post, or book a call with us through the following link.

Originally published on Substack
