Conveyor Product & Features

Can we do better?

Cloud providers like AWS, Azure and GCP provide an overwhelming number of great building blocks. When delivering data projects for our customers, we noticed the proposed stacks often:
 

  • require a lot of glue code, resulting in workable yet sub-optimal user experiences

  • don’t actively encourage software engineering practices

  • have a steep learning curve before you can configure them for your needs

In our opinion, one of the key dimensions of a better solution is high affinity with software engineering best practices. These practices, combined with automation that leads to smooth deliveries, are what make digital winners what they are today.

By using containers as the main medium to ship software to the execution environments, software engineering best practices and CI/CD are easier to put in place. So we created a Kubernetes-based, multi-cloud, multi-cluster productivity tool built by developers for developers.

How does it work?

Architecture

End-users interact with Conveyor through a command-line interface (CLI) and a web-based user interface (UI). The CLI is used throughout the building and deploying of data projects, while the UI is mostly used in the run phase.

The control plane is hosted by Data Minded. It is the middleware between the users and the data plane, and it is responsible for cross-cutting concerns such as user authentication and authorization, state management and cost aggregation.

The data plane is the Kubernetes-based managed infrastructure that runs within your own cloud environment. It consists of multiple services that take care of provisioning and scaling the needed infrastructure, and of everything related to scheduling and executing data projects.

Core concepts

Conveyor has two simple concepts: projects and environments. A project is code: a unit of deployment (architectural quantum) that describes what needs to be executed and how.

 

Environments are isolated segments in the infrastructure where a data project can be deployed and executed.
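
As a rough mental model, the two concepts and their relationship could be sketched as follows. This is an illustration only: the names `Project`, `Environment`, `image`, `deploy` and `promote` are assumptions made for this sketch and do not describe Conveyor's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Project:
    """A unit of deployment: the code plus a description of how to run it."""
    name: str
    image: str  # hypothetical: container image that packages the project code


@dataclass
class Environment:
    """An isolated segment of the infrastructure where projects are deployed."""
    name: str
    deployed: dict[str, Project] = field(default_factory=dict)

    def deploy(self, project: Project) -> None:
        # Deploying a project again replaces the previously deployed version.
        self.deployed[project.name] = project


def promote(project_name: str, source: Environment, target: Environment) -> None:
    """Deploy to `target` the exact version currently deployed in `source`."""
    target.deploy(source.deployed[project_name])
```

In this sketch, the same project can be deployed to several environments, and each environment keeps track of its own deployed version.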

Data Pipelines

Create batch pipelines, often used for analytics, that periodically collect, transform and move data to a data warehouse according to business needs.

Build

Templates for various technologies and use cases get you started with just a few keystrokes. Using the remote execution `run` command, you can execute your code remotely in the right context.

Deploy

Create persistent or throw-away environments. Select a resource size and a security context. Deploy and promote data projects with ease.

Operate

Once your data project is deployed, you want to follow up on resource utilization and cost, and troubleshoot potential failures.

Features

Multi-cluster & Multi-cloud

An environment links to a Kubernetes cluster deployed as part of the customer's data plane. Customers can have multiple clusters spread over homogeneous or heterogeneous cloud accounts.

Templates

Create paved roads, take care of boilerplate and encourage the use of best practices across teams. Clients can create their own templates based on their needs and the systems they integrate with.

Data Exploration

Jupyter notebooks are well known for making data exploration and experimentation easy. Data Minded cloud notebooks are built on the same containerization foundation as data project code. This enables other use cases, e.g. using notebooks to debug and to facilitate the iterative industrialization of experiments.

Workflow Management

Each environment has a dedicated Apache Airflow instance for batch workload orchestration. DAGs are validated automatically on the client side.
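
To give an idea of what validating a DAG before it reaches the scheduler can involve, here is a minimal sketch that checks a task dependency graph for unknown tasks and cycles using only the Python standard library. It is not Conveyor's or Airflow's implementation; the function name `validate_dag` and the input format are assumptions made for this example.

```python
from graphlib import CycleError, TopologicalSorter


def validate_dag(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a valid execution order, or raise ValueError if the graph is invalid.

    `dependencies` maps each task name to the set of tasks it depends on.
    """
    known = set(dependencies)
    for task, upstream in dependencies.items():
        missing = upstream - known
        if missing:
            raise ValueError(
                f"task {task!r} depends on unknown tasks: {sorted(missing)}"
            )
    try:
        # static_order() yields tasks so that dependencies come first.
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # The second element of args holds the list of nodes forming the cycle.
        raise ValueError(f"dependency cycle detected: {exc.args[1]}") from exc
```

Running such a check locally means a broken pipeline definition is caught at build time rather than after deployment.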

Distributed Jobs

Some projects need more power than a single node can provide. Data Minded Cloud offers Apache Spark (batch and streaming) as a first-class citizen.

Monitoring & Logging

To operate data projects, logs and metrics are centralized and available in real time. For some integrations, access to the technology's native UI is available, e.g. the Apache Spark History Server.

Cost Monitoring

Follow your cloud cost per project over time. Gain insight into cost distribution across projects and environments.
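
The kind of aggregation behind such a view can be sketched in a few lines. The record layout (`CostRecord` with `project`, `environment` and `cost` fields) and the helper `cost_by` are assumptions made for illustration; they do not reflect how Conveyor stores or computes costs.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class CostRecord(NamedTuple):
    project: str
    environment: str
    cost: float  # cost of one job run, expressed in a single currency


def cost_by(records: Iterable[CostRecord], key: str) -> dict[str, float]:
    """Sum the cost of all records, grouped by the field named `key`."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cost
    return dict(totals)
```

Calling `cost_by(records, "project")` gives the distribution across projects, and `cost_by(records, "environment")` gives it across environments.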

Single Sign-on & RBAC

Authenticate users with your own identity provider. Control the actions any user can perform on projects and environments using role-based mechanisms.
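
A role-based check of this kind boils down to mapping roles to permitted actions and looking up the role a user holds on a given resource. The sketch below uses hypothetical role names (`viewer`, `contributor`, `admin`) and actions; they are assumptions for illustration and not Conveyor's actual roles.

```python
# Hypothetical roles and the actions each one permits.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "viewer": {"read"},
    "contributor": {"read", "deploy"},
    "admin": {"read", "deploy", "delete"},
}


def is_allowed(
    assignments: dict[tuple[str, str], str],
    user: str,
    resource: str,
    action: str,
) -> bool:
    """Check whether `user` may perform `action` on `resource`.

    `assignments` maps (user, resource) pairs to a role name, where a
    resource is a project or an environment.
    """
    role = assignments.get((user, resource))
    if role is None:
        return False  # no role on this resource means no access
    return action in ROLE_PERMISSIONS.get(role, set())
```

Access is denied by default: a user without an assigned role on a project or environment cannot perform any action on it.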

Data Access Management

Each project can be linked with cloud-specific IAM credentials. This link is enforced so that each job can only access the resources it was granted access to. Combined with the RBAC model on environments and projects, data access management is in your hands.