
Conveyor vs EMR

EMR is the default way to run Spark jobs on AWS. It's a stable environment with a genuinely fast runtime. The main differences from Conveyor are:

Learning curve

EMR:

  • Know and configure AWS infrastructure details
    Manually configure VPCs, IAM roles, etc.

  • No single way to use notebooks
    EMR Studio, EMR Notebooks, and SageMaker are all candidates

  • Configure a tool to manage workloads
    Use Managed Workflows for Apache Airflow, Step Functions, or a homegrown solution

Conveyor:

  • Conveyor environments with managed Airflow
    Schedule containers or Spark jobs on the cluster

  • Notebooks for experimentation
    Explore data or test ML algorithms with one command

  • Conveyor run command
    Start Spark or container jobs on the cluster from your local environment

The management model

In EMR, you have to create and manage a cluster to schedule your jobs.
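As a rough illustration of what managing a cluster yourself involves, the sketch below builds the kind of request that boto3's EMR `run_job_flow` call accepts. The helper name `build_emr_cluster_request`, the cluster name, the instance type, and the node counts are assumptions made up for this example, and the role names are the AWS default EMR roles rather than anything prescribed here. The request is only constructed, not sent, so no AWS account is needed to run it.

```python
from typing import Any, Dict


def build_emr_cluster_request(
    name: str,
    release_label: str,
    job_flow_role: str,
    service_role: str,
    instance_type: str = "m5.xlarge",  # assumed instance type, for illustration only
    core_nodes: int = 2,               # assumed fixed size: nothing here autoscales
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR `run_job_flow` call (hypothetical helper)."""
    return {
        "Name": name,
        # The release label pins the Spark/Hadoop versions for every job on this cluster.
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": instance_type,
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "InstanceType": instance_type,
                    "InstanceCount": core_nodes,
                },
            ],
            # Keep the cluster running so that several jobs can share it.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Instance profile for the cluster's EC2 nodes: shared by all jobs on the cluster.
        "JobFlowRole": job_flow_role,
        "ServiceRole": service_role,
    }


request = build_emr_cluster_request(
    name="example-spark-cluster",
    release_label="emr-6.15.0",
    job_flow_role="EMR_EC2_DefaultRole",
    service_role="EMR_DefaultRole",
)

# With AWS credentials configured, the cluster could then be created with:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

The roles, the release label, and the node counts are all fixed in the cluster definition, so jobs that need a different IAM role or Spark version would need a cluster of their own.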

EMR:

  • IAM roles are linked to a cluster
    Use one cluster per job if you want to use different IAM roles

  • Clusters do not autoscale by default

  • Clusters do not update automatically

  • Clusters are bound to Spark/Hadoop versions
    When sharing a cluster, all applications need to be updated at the same time

  • Creating a cluster takes up to 15 minutes

  • EMR on EKS
    Manage the EKS data plane with all components yourself

Conveyor:

  • Conveyor manages clusters for you

  • One cluster for all jobs
    Each job can use a separate IAM role

  • We run containers
    Run the same container locally or anywhere else

  • Mix Spark/Hadoop versions on one cluster
    All dependencies are packaged in Docker containers

Job types


EMR:

  • Any job type in the Hadoop ecosystem
    Spark, Pig, Hive, etc. are supported

  • (Too) many ways to package code
    JARs, Python files, PEX distributions, containers, ...

  • No support for non-Hadoop jobs
    Cannot run plain Python code

Conveyor:

  • Any container can be run non-distributed
    Use your favorite programming language

  • Use Spark for distributed jobs
    For processing large volumes of data

  • dbt for data warehouse transformations
    Lower the barrier for data analysts to use and process data
