CONCEPT

Pipeline environment

Definition

A pipeline environment is a named version of a batch pipeline — a complete set of orchestrator workflow definitions (e.g. Airflow DAGs) deployed to a single orchestrator server such that the version can be scheduled and run end-to-end independently from other versions on the same server.

Defined by Zalando:

"A pipeline environment is a version of a pipeline (set of Airflow DAGs) deployed to an Airflow server on which it can run end-to-end. Each environment contains all DAGs necessary to produce the required output (e.g. marketing ROI in our case), so multiple environments can co-exist on one server and can be used independently."

Why the abstraction matters

Airflow, like most orchestrators, has no native concept of an environment. A DAG id must be unique within an Airflow server, so a given DAG exists in exactly one version at a time. If multiple teams want to test conflicting changes to the same DAG, they must either:

  • share the test server and collide, or
  • use separate servers (expensive and slow; see MWAA, ~30 min/server).
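The collision on a shared server can be illustrated with a toy registry keyed by DAG id. This is a deliberately simplified sketch, not Airflow's actual DAG loading logic; the DagRegistry class and the version labels are hypothetical names used only for illustration.

```python
class DagRegistry:
    """Toy stand-in for a server that holds one version per DAG id.

    Hypothetical class for illustration; not part of Airflow.
    """

    def __init__(self):
        self._dags = {}

    def register(self, dag_id, version):
        """Store a DAG version and return the version it displaced, if any."""
        displaced = self._dags.get(dag_id)
        self._dags[dag_id] = version
        return displaced

    def version_of(self, dag_id):
        """Return the single version currently stored for this DAG id."""
        return self._dags[dag_id]


registry = DagRegistry()
registry.register("qu.test_dag", "team-a-change")

# A second team deploys a conflicting change under the same DAG id:
# the first team's version is no longer the one on the server.
displaced = registry.register("qu.test_dag", "team-b-change")
```

With a single id per DAG, only one of the two changes can be under test at any moment, which is the sharing problem described above.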

"Pipeline environment" is the layer Zalando adds on top of Airflow to give isolation without multi-server cost: each PR gets its own pipeline env, identified by a branch / feature name, sharing the scheduler process.

Implementation at Zalando

Each pipeline environment is a zip (DAG zip packaging) named for the feature branch (feature1.zip). An Airflow DAG id rewriter injects the branch name into every DAG id at init (qu.test_dag → qu.feature1.test_dag), so multiple zips built from the same source DAGs can coexist on one server.
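A minimal sketch of the id rewriting, assuming the environment name is taken from the zip file name and inserted after the first dot-separated segment of the DAG id (turning an id like qu.test_dag into qu.feature1.test_dag). The helper names env_name_from_zip and rewrite_dag_id are hypothetical, and the handling of ids without a dot is an assumption; Zalando's actual rewriter may work differently.

```python
from pathlib import PurePosixPath


def env_name_from_zip(zip_path: str) -> str:
    """Derive the environment name from the zip file name (hypothetical helper)."""
    return PurePosixPath(zip_path).stem


def rewrite_dag_id(dag_id: str, env: str) -> str:
    """Insert the environment name after the first dot-separated segment.

    Assumption: ids look like "<prefix>.<name>". Ids without a dot are
    simply prefixed with the environment name.
    """
    prefix, sep, rest = dag_id.partition(".")
    if not sep:
        return f"{env}.{dag_id}"
    return f"{prefix}.{env}.{rest}"


env = env_name_from_zip("dags/feature1.zip")
new_id = rewrite_dag_id("qu.test_dag", env)
```

Because the environment name becomes part of every id, two zips built from the same source produce disjoint sets of DAG ids.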

Bound to a data environment

A pipeline env must also read from and write to an isolated data layer; otherwise cross-env data conflicts recreate the original sharing problem. Zalando's model is a 1-to-1 binding between a pipeline environment and a data environment: for example, pipeline env feature1 reads/writes db_attribution_feature1.
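The 1-to-1 binding can be sketched as a pure naming function, assuming the data environment is a database whose name is a base name plus the pipeline environment name, as in the db_attribution_feature1 example. The function name data_env_for and the validation rule are hypothetical, not taken from Zalando's implementation.

```python
def data_env_for(pipeline_env: str, base: str = "db_attribution") -> str:
    """Map a pipeline environment name to its bound data environment.

    Assumption: the environment name must be a plain identifier so it can
    be embedded safely in a database name.
    """
    if not pipeline_env.isidentifier():
        raise ValueError(f"unsupported environment name: {pipeline_env!r}")
    return f"{base}_{pipeline_env}"


feature1_db = data_env_for("feature1")
feature2_db = data_env_for("feature2")
```

Since the mapping is deterministic and injective for valid names, two pipeline environments never resolve to the same database.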

Seen in

  • Zalando's Performance Marketing org runs the ROI pipeline in live / test / featureN pipeline environments, one per open PR, each on the same shared test Airflow server.