CONCEPT Cited by 1 source

DAG id rewriting

Definition

DAG id rewriting is the practice of mutating an orchestrator workflow's identifier at load/init time so that multiple copies of the same source workflow can coexist on a single orchestrator server under distinct ids.

At Zalando, the rewrite injects the feature-branch name between the team prefix and the DAG suffix:

source:  qu.test_dag
rewrite: qu.feature1.test_dag

(Source: sources/2022-06-09-zalando-accelerate-testing-in-apache-airflow-through-dag-versioning)
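The rewrite rule itself is a one-line string transform. A minimal sketch (hypothetical helper, not Zalando's code), assuming the `{team}.{rest}` id schema described below:

```python
def rewrite_dag_id(dag_id: str, feature_name: str) -> str:
    """Insert the feature-branch name between the team prefix and the
    rest of the DAG id, e.g. "qu.test_dag" + "feature1" ->
    "qu.feature1.test_dag". Assumes ids follow the {team}.{rest} schema.
    """
    team, rest = dag_id.split(".", 1)
    return f"{team}.{feature_name}.{rest}"

# rewrite_dag_id("qu.test_dag", "feature1") -> "qu.feature1.test_dag"
```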

Why it's needed

In Airflow, a DAG id is globally unique per server. You cannot register two DAGs with id qu.test_dag on the same scheduler — the second load fails or overwrites the first. That's the blocker for having multiple pipeline environments on one shared server.

Rewriting the id at DAG.__init__ time makes the same source file produce a unique id in each environment, without touching the team's DAG code.

How Zalando does it

They fork Airflow's dag.py — the file that defines the DAG class. Inside __init__, the override:

  1. Reads the file path of the Python file that initialised the DAG (e.g. /usr/local/airflow/dags/feature1.zip/qu/main/file.py).
  2. Extracts the zip filename → feature1 (the feature-branch / environment name).
  3. Rewrites dag_id to {team_name}.{feature_name}.{rest_of_dag_id}.
  4. Appends feature_name as a DAG tag so the Airflow UI can filter by environment.

The zip package name is the environment name — this is why zip packaging and DAG id rewriting are co-designed at Zalando.
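The four steps above can be sketched as a single function over the DAG id and file path. This is an illustrative reconstruction under the stated assumptions (zip-packaged deploys, `{team}.{rest}` ids); the function name and signature are hypothetical, not Airflow's or Zalando's API:

```python
from pathlib import PurePosixPath

def rewrite(dag_id: str, fileloc: str) -> tuple[str, list[str]]:
    """Sketch of the patched DAG.__init__ logic.

    fileloc is the path of the Python file that initialised the DAG,
    e.g. /usr/local/airflow/dags/feature1.zip/qu/main/file.py
    Returns the rewritten dag_id and the extra tags to attach.
    """
    # Steps 1-2: find the zip component of the path; its stem is the
    # feature-branch / environment name.
    feature = None
    for part in PurePosixPath(fileloc).parts:
        if part.endswith(".zip"):
            feature = part[: -len(".zip")]
            break
    if feature is None:
        # Non-zip deploy: nothing to rewrite.
        return dag_id, []
    # Step 3: rewrite to {team}.{feature}.{rest} (assumes {team}.{rest} ids).
    team, rest = dag_id.split(".", 1)
    # Step 4: expose the feature name as a DAG tag for UI filtering.
    return f"{team}.{feature}.{rest}", [feature]

# rewrite("qu.test_dag", "/usr/local/airflow/dags/feature1.zip/qu/main/file.py")
# -> ("qu.feature1.test_dag", ["feature1"])
```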

Brittleness

  • Assumes a team-prefixed DAG id schema ({team}.{rest}). Flat namespaces need a different rewrite rule.
  • Forking dag.py is a maintenance tax — every Airflow upgrade requires re-applying the patch.
  • Only works with zip-packaged deploys — the environment name is read from the zip filename.

Seen in
