CONCEPT Cited by 1 source
Pipeline environment
Definition
A pipeline environment is a named version of a batch pipeline — a complete set of orchestrator workflow definitions (e.g. Airflow DAGs) deployed to a single orchestrator server such that the version can be scheduled and run end-to-end independently from other versions on the same server.
Defined by Zalando in sources/2022-06-09-zalando-accelerate-testing-in-apache-airflow-through-dag-versioning:
"A pipeline environment is a version of a pipeline (set of Airflow DAGs) deployed to an Airflow server on which it can run end-to-end. Each environment contains all DAGs necessary to produce the required output (e.g. marketing ROI in our case), so multiple environments can co-exist on one server and can be used independently."
Why the abstraction matters
Airflow (like most orchestrators) has no native concept of an environment. A DAG id must be unique within an Airflow server, so a given DAG can exist in only one version at a time. If multiple teams want to test conflicting changes to the same DAG, they must either:
- share the test server and collide, or
- use separate servers (expensive and slow: roughly 30 min per server on MWAA).
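To make the collision concrete, here is a toy model of one server's DAG registry, keyed by DAG id alone. `Dag`, `ServerRegistry`, and `deploy` are hypothetical names, not Airflow's API, and the overwrite-on-redeploy behavior is a simplifying assumption: depending on version and setup, real Airflow may warn about or reject a duplicate id instead. Either way, only one version of `qu.test_dag` can be live on the server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dag:
    dag_id: str
    version: str  # hypothetical: the branch this definition came from


class ServerRegistry:
    """Hypothetical stand-in for one server's DAG store (not Airflow's API)."""

    def __init__(self) -> None:
        # Keyed by dag_id only: there is no slot for a second version.
        self._dags: dict[str, Dag] = {}

    def deploy(self, dag: Dag) -> None:
        # Simplifying assumption: a later deploy with the same id replaces
        # whatever version was there before.
        self._dags[dag.dag_id] = dag

    def get(self, dag_id: str) -> Dag:
        return self._dags[dag_id]


registry = ServerRegistry()
registry.deploy(Dag("qu.test_dag", version="feature1"))
registry.deploy(Dag("qu.test_dag", version="feature2"))
print(registry.get("qu.test_dag").version)  # feature2: feature1's change is gone
```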
"Pipeline environment" is the layer Zalando adds on top of Airflow to provide isolation without the cost of multiple servers: each PR gets its own pipeline env, identified by its branch/feature name, and all envs share the same scheduler process.
Implementation at Zalando
Each pipeline environment is packaged as a zip (see DAG zip packaging) named after its feature branch, e.g. feature1.zip. At DAG initialization, the DAG id rewriter (see DAG id rewriting) injects the branch name into every DAG id (qu.test_dag → qu.feature1.test_dag), so multiple zips built from the same source DAGs can coexist on one server.
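A minimal sketch of the zip-naming and id-rewriting steps, assuming the env name is simply the zip's file stem and that the branch segment is inserted after the first dot, as in the qu.test_dag example. The helper names, the `dags/` directory, and the second DAG id `qu.load_costs` are hypothetical; Zalando's actual rewriter and where it hooks into DAG loading are not shown here.

```python
from pathlib import PurePosixPath


def env_name_from_zip(zip_path: str) -> str:
    """Hypothetical rule: the pipeline env name is the zip's file stem."""
    return PurePosixPath(zip_path).stem  # "dags/feature1.zip" -> "feature1"


def rewrite_dag_id(dag_id: str, env: str) -> str:
    """Insert the env name after the first dot-separated segment.

    Assumption: ids look like "<prefix>.<name>". Ids without a dot get the
    env prepended instead (a made-up fallback for this sketch).
    """
    prefix, sep, rest = dag_id.partition(".")
    if not sep:
        return f"{env}.{dag_id}"
    return f"{prefix}.{env}.{rest}"


# Both zips contain the same source DAGs (qu.load_costs is hypothetical).
source_dag_ids = ["qu.test_dag", "qu.load_costs"]
registry: dict[str, str] = {}  # rewritten dag_id -> owning env
for zip_path in ["dags/feature1.zip", "dags/feature2.zip"]:
    env = env_name_from_zip(zip_path)
    for dag_id in source_dag_ids:
        registry[rewrite_dag_id(dag_id, env)] = env

print(sorted(registry))
# ['qu.feature1.load_costs', 'qu.feature1.test_dag',
#  'qu.feature2.load_costs', 'qu.feature2.test_dag']
```

Because the rewritten ids no longer clash, both zips load side by side and each env's DAGs can be scheduled independently.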
Bound to a data environment
A pipeline env must also read and write an isolated data layer; otherwise, cross-env data conflicts recreate the original sharing problem. Zalando's model is a 1-to-1 binding between a pipeline environment and a data environment: for example, pipeline env feature1 reads and writes db_attribution_feature1.
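The 1-to-1 binding can be sketched as a pure naming function that every task uses to resolve its database. The `db_attribution` prefix follows the example above; `data_env_for`, `qualified_table`, and the `touchpoints` table are hypothetical, and how the live and test environments are named in Zalando's setup is not covered here.

```python
def data_env_for(pipeline_env: str, base_db: str = "db_attribution") -> str:
    """Hypothetical 1-to-1 mapping: pipeline env name -> database name."""
    return f"{base_db}_{pipeline_env}"


def qualified_table(pipeline_env: str, table: str) -> str:
    # Hypothetical helper: if every read/write resolves tables through it,
    # feature1's tasks cannot touch feature2's database.
    return f"{data_env_for(pipeline_env)}.{table}"


print(data_env_for("feature1"))                    # db_attribution_feature1
print(qualified_table("feature1", "touchpoints"))  # db_attribution_feature1.touchpoints
```

Keeping the binding a pure function of the env name means no lookup table is needed: deploying a new featureN pipeline env already determines its data env name.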
Related
- concepts/data-environment
- concepts/per-pr-ephemeral-environment
- concepts/airflow-dag-zip-packaging
- concepts/dag-id-rewriting
- patterns/per-pr-airflow-environment-via-dag-versioning
- systems/apache-airflow
- systems/zalando-marketing-roi-pipeline
Seen in
- sources/2022-06-09-zalando-accelerate-testing-in-apache-airflow-through-dag-versioning: Zalando's Performance Marketing org runs the ROI pipeline in live, test, and featureN pipeline environments, with one featureN environment per open PR, each on the same shared test Airflow server.