CONCEPT Cited by 1 source
DAG id rewriting¶
Definition¶
DAG id rewriting is mutating an orchestrator workflow's identifier at load/init time so that multiple copies of the same source workflow can coexist on a single orchestrator server under distinct ids.
At Zalando, the rewrite injects the feature-branch name between the team prefix and the rest of the DAG id, producing ids of the form {team_name}.{feature_name}.{rest_of_dag_id}.
(Source: sources/2022-06-09-zalando-accelerate-testing-in-apache-airflow-through-dag-versioning)
Why it's needed¶
In Airflow, a DAG id is globally unique per server. You cannot register two DAGs with id qu.test_dag on the same scheduler — the second load fails or overwrites the first. That's the blocker for having multiple pipeline environments on one shared server.
Rewriting the id at DAG.__init__ time makes the same source file produce a unique id in each environment, without touching the team's DAG code.
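The collision can be illustrated with a toy registry. This is only a sketch: the `registry` dict and `register` helper are hypothetical stand-ins, not Airflow's actual loading code, and the sketch models the "overwrites" outcome rather than the "fails" one. The `feature2` name is invented for the example.

```python
from typing import Dict

# Toy stand-in for a server's set of registered DAGs, keyed by DAG id.
# Hypothetical: this is not how Airflow stores DAGs internally.
registry: Dict[str, str] = {}


def register(dag_id: str, source: str) -> None:
    """Record which package produced dag_id; a repeated id replaces the earlier entry."""
    registry[dag_id] = source


# The same DAG code deployed from two branches under one id: only one entry survives.
register("qu.test_dag", "feature1.zip")
register("qu.test_dag", "feature2.zip")
collided = dict(registry)

# With the branch name injected into the id, both copies coexist.
registry.clear()
register("qu.feature1.test_dag", "feature1.zip")
register("qu.feature2.test_dag", "feature2.zip")
```

After the first pair of calls `collided` holds a single entry; after the second pair `registry` holds two.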
How Zalando does it¶
They fork Airflow's dag.py — the file that defines the DAG class. Inside __init__, the override:
- Reads the file path of the Python file that initialised the DAG (e.g. `/usr/local/airflow/dags/feature1.zip/qu/main/file.py`).
- Extracts the zip filename → `feature1` (the feature-branch / environment name).
- Rewrites `dag_id` to `{team_name}.{feature_name}.{rest_of_dag_id}`.
- Appends `feature_name` as a DAG tag so the Airflow UI can filter by environment.
The zip package name is the environment name — this is why zip packaging and DAG id rewriting are co-designed at Zalando.
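The steps above can be sketched as two standalone functions. This is a minimal illustration under stated assumptions, not Zalando's actual patch to `dag.py`: the function names `feature_from_path` and `rewrite_dag_id` are hypothetical, and the sketch assumes the file path is already available as a string.

```python
from pathlib import PurePosixPath
from typing import Optional


def feature_from_path(dag_file_path: str) -> Optional[str]:
    """Return the name of the first '*.zip' path component without its extension."""
    for part in PurePosixPath(dag_file_path).parts:
        if part.endswith(".zip"):
            return part[: -len(".zip")]
    return None  # the file is not inside a zip package


def rewrite_dag_id(dag_id: str, feature_name: str) -> str:
    """Turn '{team}.{rest}' into '{team}.{feature}.{rest}'."""
    team_name, sep, rest = dag_id.partition(".")
    if not sep:
        raise ValueError(f"dag_id {dag_id!r} has no team prefix")
    return f"{team_name}.{feature_name}.{rest}"


# Path and DAG id taken from the examples in this page.
dag_file_path = "/usr/local/airflow/dags/feature1.zip/qu/main/file.py"
feature = feature_from_path(dag_file_path)
assert feature is not None

new_id = rewrite_dag_id("qu.test_dag", feature)
tags = [feature]  # the environment name doubles as a UI filter tag
print(new_id, tags)
```

Keeping the path parsing separate from the id rewrite makes the coupling visible: the second function works for any environment name, while the first is what ties the scheme to zip packaging.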
Brittleness¶
- Assumes a team-prefixed DAG id schema (`{team}.{rest}`). Flat namespaces need a different rewrite rule.
- Forking `dag.py` is a maintenance tax: every Airflow upgrade requires re-applying the patch.
- Only works with zip-packaged deploys: the environment name is read from the zip filename.
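The first and third assumptions can be made explicit as a precondition check. The sketch below is hypothetical (the `rewrite_blockers` helper and its messages are invented for illustration and do not come from the source); it only shows which inputs fall outside the scheme.

```python
from typing import List


def rewrite_blockers(dag_id: str, dag_file_path: str) -> List[str]:
    """List the reasons the '{team}.{feature}.{rest}' rewrite cannot be applied."""
    blockers = []
    if "." not in dag_id:
        # Flat id: there is no team prefix to insert the feature name after.
        blockers.append("dag_id has no team prefix")
    if not any(part.endswith(".zip") for part in dag_file_path.split("/")):
        # No zip component in the path: no environment name to extract.
        blockers.append("file is not inside a zip package")
    return blockers


ok = rewrite_blockers(
    "qu.test_dag", "/usr/local/airflow/dags/feature1.zip/qu/main/file.py"
)
bad = rewrite_blockers("test_dag", "/usr/local/airflow/dags/file.py")
```

Here `ok` is empty, while `bad` reports both problems: a flat id and a DAG file deployed outside a zip package.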
Related¶
- concepts/pipeline-environment
- concepts/airflow-dag-zip-packaging
- patterns/per-pr-airflow-environment-via-dag-versioning
- patterns/library-fork-for-dag-id-rewrite
- systems/apache-airflow
Seen in¶
- sources/2022-06-09-zalando-accelerate-testing-in-apache-airflow-through-dag-versioning — canonical instance; the fork is the core mechanism behind Zalando's per-PR pipeline-environment design.