
Data engineering manifesto

This manifesto is a set of principles that ensure your projects are delivered on time, empower your people, and keep your products stable.

It is based on our collective experience across all of our projects.

[Figure: Data engineering manifesto (2021)]

1. We are software engineers. 

A data engineer creates software by writing code and following best practices: software design principles, version control, automated testing, CI/CD, cloud-native development, and DevSecOps.
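
To make "automated testing" a little more concrete, here is a minimal sketch assuming a Python codebase with pytest as the test runner. The `Order` record and the `order_totals` transformation are hypothetical. Keeping a transformation as a plain, side-effect-free function is what makes it easy to test in a CI pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical input record for illustration only.
    order_id: str
    quantity: int
    unit_price: float


def order_totals(orders: list[Order]) -> dict[str, float]:
    """Return the total amount per order_id, keeping the first record for duplicate ids."""
    totals: dict[str, float] = {}
    for order in orders:
        if order.order_id not in totals:
            totals[order.order_id] = order.quantity * order.unit_price
    return totals


# pytest discovers functions prefixed with test_ and runs them automatically.
def test_order_totals_deduplicates() -> None:
    orders = [Order("a", 2, 5.0), Order("a", 3, 5.0), Order("b", 1, 10.0)]
    assert order_totals(orders) == {"a": 10.0, "b": 10.0}


if __name__ == "__main__":
    test_order_totals_deduplicates()
    print("ok")
```

Running `pytest` on every commit turns this check into part of the CI/CD workflow rather than a manual step.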

2. Data is a product.

Without customers who actually use or buy it, data generates no value. It is a product, and we should treat it like one by measuring its success: usage, time-to-market, quality, availability, and so on.
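
Quality and availability can be turned into numbers you track over time. The sketch below is a minimal Python illustration, assuming each dataset has a load timestamp and an agreed maximum age (a hypothetical service-level target). The names `completeness` and `is_fresh` are illustrative and do not come from any particular tool.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows: list[dict], column: str) -> float:
    """Share of rows where `column` is present and not None (a simple quality metric)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def is_fresh(last_loaded_at: datetime, max_age: timedelta, now: datetime) -> bool:
    """True if the latest load is no older than the agreed max_age (a simple availability metric)."""
    return now - last_loaded_at <= max_age


# Hypothetical sample data for illustration.
rows = [{"customer_id": "c1"}, {"customer_id": None}, {"customer_id": "c3"}, {}]
now = datetime(2021, 5, 2, 8, 0, tzinfo=timezone.utc)
last_load = datetime(2021, 5, 1, 6, 0, tzinfo=timezone.utc)

print(completeness(rows, "customer_id"))  # 0.5
print(is_fresh(last_load, timedelta(days=1), now))  # False: the last load is 26 hours old
```

Publishing metrics like these next to usage figures lets consumers judge a data product the same way they would judge any other product.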

3. We enable by providing self-service solutions.

As data engineers, we build custom data pipelines for complex use cases. For common use cases, we provide self-service solutions. We are enablers within the organization.

4. We embrace cloud and managed services.


Public cloud providers have drastically changed the playing field of data engineering over the last few years. They have increased flexibility while lowering the operational burden. There are still valid exceptions, but cloud is the new default.

5. Batch processing is easier than stream processing.


While building a real-time data stream might be more interesting, it can add unneeded complexity (state management, replays, and so on). If your use case only needs data once per day, keep it simple, keep it stupid, keep it sustainable! This doesn’t mean we shy away from stream processing when it is needed.
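
As a rough illustration of why a once-per-day batch job stays simple, here is a minimal Python sketch. The event shape and the `daily_active_users` and `run_batch` functions are hypothetical. Each run recomputes one day from scratch and overwrites that day's result, so a "replay" is just a re-run, with no streaming state to manage.

```python
from datetime import date

# Hypothetical input: (event_date, user_id) pairs. In practice this would be
# one day's partition of a table or a set of files.
Event = tuple[date, str]


def daily_active_users(events: list[Event], run_date: date) -> int:
    """Count distinct users for one day. Same input and date always give the same output."""
    return len({user for day, user in events if day == run_date})


def run_batch(events: list[Event], run_date: date, output: dict[date, int]) -> None:
    # Overwrite the result for run_date instead of appending, so re-running
    # a day replaces the previous result rather than duplicating it.
    output[run_date] = daily_active_users(events, run_date)


events = [
    (date(2021, 5, 1), "u1"),
    (date(2021, 5, 1), "u2"),
    (date(2021, 5, 1), "u1"),
    (date(2021, 5, 2), "u3"),
]
output: dict[date, int] = {}
run_batch(events, date(2021, 5, 1), output)
run_batch(events, date(2021, 5, 1), output)  # re-run: same result, no duplicates
print(output)  # {datetime.date(2021, 5, 1): 2}
```

A streaming version of the same metric would need to track which users it has already seen across restarts and late events; the daily batch gets correctness from simply recomputing.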

6. We aim for simplicity through consistency.

Consistency is a powerful weapon, yet there is no golden hammer. For each use case, we try existing “boring” technologies first rather than picking a technology to expand our own knowledge and experience. We use bleeding-edge technology when we have good reason to do so.

7. Notebooks are for research, not for development. 

Notebooks bring value at the start of a data product: exploring data, looking for potential solutions, and documenting along the way. However, the structure of notebooks does not encourage good software design patterns. Once you are done exploring, a proper IDE is the right tool for the job.

8. Data scientists are our colleagues, not our mortal enemies. 


Data engineers and data scientists share a common technology stack, yet they often play different roles in the organization. This can lead to a dysfunctional marriage, but it doesn’t have to. We work together and see the best results emerge from cross-functional teams.

9. SQL is the lowest common denominator.


Over the years, we have seen the rise of domain-specific languages and drag-and-drop tools. None has come close to the expressiveness and adoption of SQL. We see it as the bridge that enables less technical profiles, and we favor it as the interface for self-service solutions.
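
A minimal sketch of SQL as the self-service interface, using Python's built-in `sqlite3` module as a stand-in for whatever warehouse is actually in use. The table, view, and data are hypothetical. The engineer hides the raw table behind a curated view, and consumers only need a `SELECT` to use it.

```python
import sqlite3

# Hypothetical self-service setup: the data engineer owns the raw table and
# publishes a curated view on top of it.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (order_id TEXT, country TEXT, amount REAL);
    INSERT INTO raw_orders VALUES
        ('o1', 'BE', 100.0),
        ('o2', 'BE', 50.0),
        ('o3', 'NL', 75.0);
    CREATE VIEW revenue_by_country AS
        SELECT country, SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY country;
    """
)

# The kind of query a less technical user would write in a self-service tool.
rows = conn.execute(
    "SELECT country, revenue FROM revenue_by_country ORDER BY revenue DESC"
).fetchall()
print(rows)  # [('BE', 150.0), ('NL', 75.0)]
```

Because the view is the contract, the engineer can change how `raw_orders` is loaded without breaking the queries consumers have already written.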

Check out our webinar

Check out the webinar for more in-depth ideas about the Data Minded Manifesto.
