

Airflow DAG zip packaging

Definition

Packaging DAGs is an Airflow feature that deploys a DAG and all of its Python dependencies as a single zip archive dropped in Airflow's dags/ folder. The scheduler loads DAGs from inside the zip. Crucially, dependencies inside the zip take precedence over anything installed globally — each zip is its own import namespace.
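A minimal stdlib sketch of the idea (file names `feature1.zip`, `my_dag.py`, `my_dep` are illustrative, not from the source): the DAG module and its vendored dependencies sit at the zip root, and Python's zip import machinery resolves imports from inside the archive once the zip is on `sys.path` — which is what Airflow does for packaged DAGs.

```python
import os
import sys
import tempfile
import zipfile

# Build a packaged-DAG-style zip: DAG module plus a vendored dependency
# at the archive root (hypothetical names, for illustration only).
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "feature1.zip")
with zipfile.ZipFile(zip_path, "w") as z:
    z.writestr("my_dag.py", "from my_dep import helper\n")
    z.writestr("my_dep/__init__.py", "def helper():\n    return 'ok'\n")

# With the zip on sys.path, my_dep imports from inside the archive
sys.path.insert(0, zip_path)
import my_dep
print(my_dep.helper())  # ok
```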

Why it's useful

Multiple zips can ship different versions of the same Python package without conflict. At Zalando, this is the dependency-isolation half of their per-PR pipeline environments: PR A's zip can depend on pandas==2.0, PR B's zip on pandas==2.1, both on the same Airflow server.
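The isolation property can be sketched with the stdlib alone — this is a simplified stand-in for Airflow's per-zip loading, not its actual mechanism. Two zips each vendor a module named `mylib` at a different version (stand-ins for pandas==2.0 and pandas==2.1), and each import sees only its own zip's copy:

```python
import os
import sys
import tempfile
import zipfile

def make_zip(path, version):
    # each zip vendors its own copy of "mylib" at a given version
    with zipfile.ZipFile(path, "w") as z:
        z.writestr("mylib.py", f"__version__ = '{version}'\n")

tmp = tempfile.mkdtemp()
zip_a = os.path.join(tmp, "pr_a.zip")
zip_b = os.path.join(tmp, "pr_b.zip")
make_zip(zip_a, "2.0")
make_zip(zip_b, "2.1")

def version_from(zip_path):
    # put this zip first on sys.path, import, then clean up so the
    # next zip gets a fresh import (mimicking per-zip namespacing)
    sys.path.insert(0, zip_path)
    try:
        import mylib
        return mylib.__version__
    finally:
        sys.path.remove(zip_path)
        sys.modules.pop("mylib", None)

print(version_from(zip_a), version_from(zip_b))  # 2.0 2.1
```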

Gotcha: Jinja templates

Templated files (as opposed to inline template strings) don't work from inside a zip out of the box: Jinja resolves the absolute path correctly, but the file read fails because the file lives inside a zip archive rather than on the filesystem.
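The failure reduces to a plain filesystem fact, reproducible without Airflow or Jinja (names are illustrative): a path that points "through" a zip is not a real path, so the ordinary `open()`-style read that a file loader performs cannot succeed.

```python
import os
import tempfile
import zipfile

# A template file that exists only inside the archive
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "feature1.zip")
with zipfile.ZipFile(zip_path, "w") as z:
    z.writestr("queries/report.sql", "SELECT 1;\n")

# The resolved "absolute path" is syntactically fine...
template_path = os.path.join(zip_path, "queries", "report.sql")
# ...but to the OS it is not a file, so a loader's read fails
print(os.path.exists(template_path))  # False
```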

Zalando's workaround (Figure 3 of the source post) is to also deploy an unpackaged copy of the zip contents to a sibling directory, and at DAG init time add that directory to template_searchpath:

# the unpacked sibling of the zip, e.g. /usr/local/airflow/features/feature1/
feature_dir_path = get_feature_dir_path(file_path)  # helper from the post
dag = DAG(..., template_searchpath=[feature_dir_path])

So every feature has two deployments: the zip (code plus dependencies) and an unpackaged directory (for Jinja file reads), both named after the feature branch.
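The helper in the snippet above could look like this — the name `get_feature_dir_path` comes from the post, but the body is an assumption (posix-style paths assumed): given a DAG file loaded from a packaged zip, return the sibling unpacked directory.

```python
def get_feature_dir_path(file_path: str) -> str:
    # Hypothetical implementation: strip the ".zip" component from the
    # DAG file's path to get the unpacked sibling directory.
    parts = file_path.split("/")
    zip_name = next(p for p in parts if p.endswith(".zip"))
    feature = zip_name[: -len(".zip")]
    prefix = parts[: parts.index(zip_name)]
    return "/".join(prefix + [feature])

print(get_feature_dir_path("/usr/local/airflow/features/feature1.zip/my_dag.py"))
# /usr/local/airflow/features/feature1
```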

Pairs with DAG id rewriting

The zip filename is the source of truth for the environment name. DAG id rewriting reads it at init. So zip packaging + DAG id rewriting together form the per-PR pipeline environment primitive.
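A hedged sketch of the pairing — the function name and the `__` separator are assumptions for illustration, not Zalando's code: derive the environment name from the zip filename the DAG was loaded from, and suffix the dag_id with it.

```python
def rewrite_dag_id(base_id: str, dag_file: str) -> str:
    # Find the .zip component of the DAG file's path; its stem is the
    # per-PR environment name (the zip filename is the source of truth).
    for part in dag_file.split("/"):
        if part.endswith(".zip"):
            env = part[: -len(".zip")]
            return f"{base_id}__{env}"
    return base_id  # unpackaged deploy: keep the plain id

print(rewrite_dag_id("daily_report", "/usr/local/airflow/dags/pr-123.zip/daily_report.py"))
# daily_report__pr-123
```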
