
Running Pipelines with a Custom Image

Time required: 10 minutes


Prerequisites

You must have already:

  • Signed up for a Matatika account
  • Deployed the Matatika platform in your own cloud

Introduction

A custom pipelines image lets you customise the pipeline runtime environment for a workspace. When no custom image is set, pipelines run using the default base image, which usually involves cloning the repository, installing plugins and then executing meltano run inside the container. You might use a custom pipelines image to move these setup tasks into the container image build, which can significantly speed up pipeline execution, or to build an entirely custom runtime.

Configure the catalog container registry

Set MATATIKA_DATAFLOW_DOCKERREGISTRY in the catalog deployment environment to the container registry that hosts your custom image, for example:

# private registry
MATATIKA_DATAFLOW_DOCKERREGISTRY=matatika.azurecr.io

# Docker Hub
MATATIKA_DATAFLOW_DOCKERREGISTRY=docker.io
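If you want to confirm that an image can actually be found in the configured registry, a check like the one below may help. It is a sketch under assumptions: it assumes the catalog resolves the image as <registry host>/<image name> (the same form used in the build and push step of this guide), and the helper name image_in_registry is hypothetical. It relies on docker manifest inspect, which exits non-zero when the reference cannot be resolved; for a private registry you would need to run docker login first.

```shell
# Hypothetical helper: check that <registry>/<image name> can be resolved.
# Assumption: the catalog pulls "${MATATIKA_DATAFLOW_DOCKERREGISTRY}/<image name>".
# For a private registry, run `docker login <registry host>` beforehand.
image_in_registry() {
  local image_name="$1"
  if [ -z "${MATATIKA_DATAFLOW_DOCKERREGISTRY:-}" ]; then
    echo "MATATIKA_DATAFLOW_DOCKERREGISTRY is not set" >&2
    return 2
  fi
  # docker manifest inspect exits non-zero if the reference cannot be resolved
  docker manifest inspect "${MATATIKA_DATAFLOW_DOCKERREGISTRY}/${image_name}" >/dev/null 2>&1
}
```

For example: MATATIKA_DATAFLOW_DOCKERREGISTRY=docker.io image_in_registry myorg/my-workspace && echo "image found".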

Configure a Dockerfile for your workspace

The key requirement for a custom pipelines image is that the workspace is configured as the image working directory:

# WORKDIR creates /workspace if it does not already exist
WORKDIR /workspace

# copy the workspace (the build context) into /workspace
COPY . .
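After building, you can check that an image meets this requirement by reading the working directory recorded in its configuration with docker image inspect. This is a minimal sketch: the check_workdir helper is hypothetical and assumes the image has already been built locally.

```shell
# Hypothetical helper: confirm a locally built image uses /workspace
# as its working directory.
check_workdir() {
  local image="$1"
  local workdir
  # .Config.WorkingDir holds the effective working directory recorded in the image config
  workdir="$(docker image inspect --format '{{.Config.WorkingDir}}' "$image")" || return 1
  if [ "$workdir" != "/workspace" ]; then
    echo "expected working directory /workspace, got '${workdir}'" >&2
    return 1
  fi
}
```

For example, run check_workdir <registry host>/<image name> after docker build and before docker push.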

Examples

Pre-installed plugins

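FROM matatika.azurecr.io/matatika/matatika-catalog-shelltask:latest-dev

# WORKDIR creates /workspace if it does not already exist
WORKDIR /workspace

COPY . .

# install plugins at build time, then clean up pip temporary files in the
# same layer so they are not kept in the image (reduces image size)
RUN meltano install && rm -rf ~/.cache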

From example-github-analytics

Build and push your workspace pipelines image

docker build -t <registry host>/<image name> .
docker push <registry host>/<image name>

where <image name> should match the pipelines_image value in your workspace.yml:

pipelines_image: <image name>
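To keep the pushed tag and pipelines_image in sync, you could read the image name directly from workspace.yml. This is a minimal sketch with hypothetical helper names (read_pipelines_image, build_and_push); it assumes pipelines_image is a single top-level line in workspace.yml and does not attempt full YAML parsing (inline comments, for instance, are not handled).

```shell
# Hypothetical helper: read pipelines_image from workspace.yml.
# Assumes a simple top-level "pipelines_image: <image name>" line.
read_pipelines_image() {
  local workspace_file="${1:-workspace.yml}"
  # strip trailing whitespace, then print only the value after "pipelines_image:"
  sed -n -e 's/[[:space:]]*$//' -e 's/^pipelines_image:[[:space:]]*//p' "$workspace_file" \
    | head -n 1 | tr -d "\"'"
}

# Hypothetical helper: build and push the workspace image to the given registry host,
# tagging it as <registry host>/<pipelines_image>.
build_and_push() {
  local registry_host="$1"
  local image_name
  image_name="$(read_pipelines_image "${2:-workspace.yml}")"
  if [ -z "$image_name" ]; then
    echo "pipelines_image not found" >&2
    return 1
  fi
  docker build -t "${registry_host}/${image_name}" . &&
    docker push "${registry_host}/${image_name}"
}
```

For example, run build_and_push matatika.azurecr.io from the workspace root.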

See the example-github-analytics workspace.yml and azure-pipelines.yml for a working example.