CONCEPT Cited by 1 source

DAG id rewriting

Definition

DAG id rewriting is the practice of mutating an orchestrator workflow's identifier at load/init time so that multiple copies of the same source workflow can coexist on a single orchestrator server under distinct ids.

At Zalando, the rewrite injects the feature-branch name between the team prefix and the DAG suffix:

source:  qu.test_dag
rewrite: qu.feature1.test_dag

(Source: sources/2022-06-09-zalando-accelerate-testing-in-apache-airflow-through-dag-versioning)
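The rewrite rule itself is a one-line string transform. A minimal sketch (hypothetical helper, not Zalando's code), assuming the `{team}.{rest}` id schema described below:

```python
def rewrite_dag_id(dag_id: str, feature_name: str) -> str:
    """Insert the feature-branch name between the team prefix and the
    rest of the DAG id, e.g. "qu.test_dag" + "feature1" ->
    "qu.feature1.test_dag". Assumes ids follow the {team}.{rest} schema.
    """
    team, rest = dag_id.split(".", 1)
    return f"{team}.{feature_name}.{rest}"

# rewrite_dag_id("qu.test_dag", "feature1") -> "qu.feature1.test_dag"
```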

Why it's needed

In Airflow, a DAG id is globally unique per server. You cannot register two DAGs with id qu.test_dag on the same scheduler — the second load fails or overwrites the first. That's the blocker for having multiple pipeline environments on one shared server.

Rewriting the id at DAG.__init__ time makes the same source file produce a unique id in each environment, without touching the team's DAG code.

How Zalando does it

They fork Airflow's dag.py — the file that defines the DAG class. Inside __init__, the override:

  1. Reads the file path of the Python file that initialised the DAG (e.g. /usr/local/airflow/dags/feature1.zip/qu/main/file.py).
  2. Extracts the zip filename → feature1 (the feature-branch / environment name).
  3. Rewrites dag_id to {team_name}.{feature_name}.{rest_of_dag_id}.
  4. Appends feature_name as a DAG tag so the Airflow UI can filter by environment.

The zip package name is the environment name — this is why zip packaging and DAG id rewriting are co-designed at Zalando.
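The four steps above can be sketched as a single function over the DAG id and file path. This is an illustrative reconstruction under the stated assumptions (zip-packaged deploys, `{team}.{rest}` ids); the function name and signature are hypothetical, not Airflow's or Zalando's API:

```python
from pathlib import PurePosixPath

def rewrite(dag_id: str, fileloc: str) -> tuple[str, list[str]]:
    """Sketch of the patched DAG.__init__ logic.

    fileloc is the path of the Python file that initialised the DAG,
    e.g. /usr/local/airflow/dags/feature1.zip/qu/main/file.py
    Returns the rewritten dag_id and the extra tags to attach.
    """
    # Steps 1-2: find the zip component of the path; its stem is the
    # feature-branch / environment name.
    feature = None
    for part in PurePosixPath(fileloc).parts:
        if part.endswith(".zip"):
            feature = part[: -len(".zip")]
            break
    if feature is None:
        # Non-zip deploy: nothing to rewrite.
        return dag_id, []
    # Step 3: rewrite to {team}.{feature}.{rest} (assumes {team}.{rest} ids).
    team, rest = dag_id.split(".", 1)
    # Step 4: expose the feature name as a DAG tag for UI filtering.
    return f"{team}.{feature}.{rest}", [feature]

# rewrite("qu.test_dag", "/usr/local/airflow/dags/feature1.zip/qu/main/file.py")
# -> ("qu.feature1.test_dag", ["feature1"])
```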

Brittleness

  • Assumes a team-prefixed DAG id schema ({team}.{rest}). Flat namespaces need a different rewrite rule.
  • Forking dag.py is a maintenance tax — every Airflow upgrade requires re-applying the patch.
  • Only works with zip-packaged deploys — the environment name is read from the zip filename.

Seen in
