Conveyor vs Databricks
Databricks is one of the most popular tools for building and running SQL, Python and R notebooks. It provides a great way to get started and experiment with your first data pipelines. Conveyor focuses primarily on delivering high-quality data products, which is not possible with a notebook environment alone.
Notebooks vs high quality code
Notebooks are great for experimenting and creating a first version of your code thanks to their interactive environment, but they have several drawbacks for writing production code:

- No modular code: it is difficult to share code and to navigate across multiple notebooks in the Databricks UI.
- No tests: there is no easy way to write tests and thus protect against regressions (see the sketch after this list).
- Not reproducible: the versions of Python, Spark and other dependencies are not specified in the notebook but in the Databricks cluster.
- No configuration parameters/files: managing configuration is difficult; Databricks only provides the dbutils package, which is not portable outside Databricks.
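As a sketch of the first two points, moving logic out of a notebook can look like a plain Python function in a module plus a pytest test that guards it against regressions. The module, function and data below are hypothetical.

```python
# my_project/transform.py (hypothetical module)
def clean_country_codes(rows: list[dict]) -> list[dict]:
    """Normalise country codes to upper case and drop rows without one."""
    return [
        {**row, "country": row["country"].strip().upper()}
        for row in rows
        if row.get("country", "").strip()
    ]


# tests/test_transform.py (hypothetical test module)
from my_project.transform import clean_country_codes


def test_clean_country_codes_normalises_and_filters():
    rows = [{"id": 1, "country": " be "}, {"id": 2, "country": "  "}]
    assert clean_country_codes(rows) == [{"id": 1, "country": "BE"}]
```

Code structured this way can be versioned, reviewed and run in CI like any other Python project.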

Conveyor
- Support for both notebooks and your IDE: use notebooks for experimentation, but your IDE to write modular and easy-to-maintain code.
- Add tests to your code.
- Docker image: code is packaged with all its dependencies, making it truly build once, deploy anywhere.
- Airflow DAGs: the Airflow configuration can use environment variables and extra arguments to customize your code (sketched below).
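As an illustration of passing configuration from a DAG to a packaged job, here is a minimal sketch using the generic Airflow DockerOperator; Conveyor provides its own operators, and the image name, module and parameters below are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="sales_ingest",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    # The job runs a packaged Docker image; configuration is passed in through
    # environment variables and command-line arguments instead of being
    # hard-coded in a notebook.
    ingest = DockerOperator(
        task_id="ingest_sales",
        image="my-registry/sales-pipeline:1.4.0",  # hypothetical image
        command=["python", "-m", "sales_pipeline.ingest", "--date", "{{ ds }}"],
        environment={"ENVIRONMENT": "production", "LOG_LEVEL": "INFO"},
    )
```
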
Databricks and creating data pipelines
Databricks has poor support for creating data pipelines:

Databricks
- Use latest source code in job: a job in Databricks runs the latest version of your notebook, which is not necessarily the latest committed code.
- Job dependencies: there is only limited support to express dependencies between notebooks.
- Notebook dependencies: many library and framework versions are defined at the cluster level instead of in your notebook.
- No overview of all jobs: it is difficult to monitor all the jobs of a given day.

Conveyor
- Docker image: all code and dependencies are packaged in a Docker image, so you know exactly which version is being executed by your job.
- Airflow: Airflow has extensive support for defining complex DAGs, as well as a UI giving an overview of all jobs (a sketch follows below).
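
As a sketch of how dependencies between jobs can be expressed in Airflow, the DAG below fans out after an extract step and fans back in before publishing; the task names and image are hypothetical, and the generic DockerOperator stands in for Conveyor's own operators.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="customer_pipeline",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    def make_task(name: str) -> DockerOperator:
        # Every step runs the same packaged image with a different entrypoint.
        return DockerOperator(
            task_id=name,
            image="my-registry/customer-pipeline:2.1.0",  # hypothetical image
            command=["python", "-m", f"customer_pipeline.{name}"],
        )

    extract = make_task("extract")
    clean_orders = make_task("clean_orders")
    clean_customers = make_task("clean_customers")
    publish = make_task("publish")

    # Fan out after extract, fan back in before publish.
    extract >> [clean_orders, clean_customers] >> publish
```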

Databricks and project governance
Managing tens or hundreds of data projects with Databricks is both challenging and costly:

Databricks
- Data access on a workspace: all notebooks in the same workspace can access the same data; the only way to separate access is to use different workspaces.
- Databricks clusters per team: to make teams independent, each team needs its own cluster, as the cluster defines the Python, Spark, ... versions.
- Databricks licence fee of 50-80%: charged on top of the raw compute cost of your cloud provider.

Conveyor
- Data access per project: we support the principle of least privilege by linking data access to the project/job.
- RBAC (role-based access control): used to define the permissions of users on projects and environments.
- Cost dashboards: give insight into costs in order to reduce them.
Databricks notebooks and collaboration
Databricks added support for Repos, which is a big improvement for managing multiple notebooks. There are, however, still two major issues when collaborating with multiple people:

Databricks
- Git is not a first-class citizen: multiple people cannot make changes to the same file at the same time.
- Cannot develop notebooks from your IDE: when working with multiple people, you need a premium cluster, which does not support connecting to it from your IDE.
Conveyor

- Native Git integration: create feature branches and merge changes when working with multiple people on the same files.