Blogs
Data Products
Stop loading bad quality data
Ingesting all data without quality checks leads to recurring issues. Prioritize data quality upfront to prevent downstream problems.
Data Product Portal Integrations 2: Helm
Data Product Portal links governance, access & tools for self-service data on AWS. Supports Terraform & API integration for automation.
Data Product Portal Integrations 1: OIDC
Integrate OIDC with the Data Product Portal for secure, user-specific access via SSO. Easy setup with AWS Cognito, Docker, or Helm.
The State of Data Products in 2024
Data Products are rising fast in 2024, focusing on user experience, collaboration, and governance—set to reach maturity within 2–3 years.
Introducing Data Product Portal: An open source tool for scaling your data products
The Data Product Portal is an open-source tool to build, manage & govern data products at scale, enabling clear access, lineage & self-service.
The Missing Piece to Data Democratization is More Actionable Than a Catalog
The Data Product Portal is the missing link for scaling data democratization: beyond catalogs, it unifies access, governance & tooling.
Data Platform
A 5-step approach to improve data platform experience
Boost data platform UX with a 5-step process: gather feedback, map user journeys, reduce friction, and continuously improve through iteration.
Source-Aligned Data Products: The Foundation of a Scalable Data Mesh
Source-Aligned Data Products ensure trusted, domain-owned data at the source—vital for scalable, governed Data Mesh success.
Why You Should Build A User Interface To Your Data Platform
Don’t give users a bag of tools—build a UI for your data platform to reduce complexity, boost adoption, and enable true self-service.
Data Strategy
A glimpse into the life of a data leader
Data leaders face pressure to balance AI hype with data landscape organization. Here’s how they stay focused, pragmatic, and strategic.
The building blocks of successful Data Teams
5 key traits of successful data teams: ownership, business focus, software best practices, self-service, and company-wide strategy.
AI/ML
From Good AI to Good Data Engineering. Or how Responsible AI interplays with High Data Quality
Responsible AI depends on high-quality data engineering to ensure ethical, fair, and transparent AI systems.
Prompt Engineering for a Better SQL Code Generation With LLMs
Boost SQL generation with LLMs using prompt engineering, schema context, user feedback & RAG for accurate, business-aware queries.
Tools & Technology
Integrating MegaLinter to Automate Linting Across Multiple Codebases. A Technical Description.
Automate code quality with MegaLinter, SQLFluff, and custom checks in Azure DevOps CI. Supports multi-language linting and dbt integration.
Monitoring thousands of Spark applications without losing your cool
Monitor Spark apps at scale by tracking CPU efficiency to cut costs. Use Dataflint for insights and track potential monthly savings.
Data Stability with Python: How to Catch Even the Smallest Changes
Detect data changes efficiently by sorting and hashing DataFrames with Python—avoid re-running pipelines and reduce infrastructure costs.
Demystifying Device Flow
Implement OAuth 2.0 Device Flow with AWS Cognito & FastAPI to enable secure logins for headless devices like CLIs and smart TVs.
Short feedback cycles on AWS Lambda
Speed up AWS Lambda dev with a Makefile: build, deploy, test, and stream logs in one loop, cutting feedback cycles to just ~15 seconds.
Age of DataFrames 2: Polars Edition
In this publication, I showcase some Polars tricks and features.