CONCEPT Cited by 1 source
Airflow TaskGroup Parallelism
Definition
Airflow TaskGroup parallelism is the DAG-structuring discipline of placing N instances of the same logical pipeline into N separate TaskGroup subgraphs inside one DAG, so that each instance runs independently and in parallel (subject to Airflow's concurrency limits), with an optional final task consolidating all instances' outputs.
It's the Airflow-native answer to "run this pipeline for every market / every tenant / every shard" without spawning N separate DAGs.
(Source: sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge.)
The pattern in Zalando's framework
Zalando describes it directly:
"[Taskgroup]: We want to be able to evaluate multiple markets in parallel, where each market shares the same flow but with different test queries. Therefore we can implement each evaluation lineage as a task group and put all of them together in the same DAG. This way each task group can run independently in parallel and, once they are all finished, a final task consolidates all evaluation results together." (Source: sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge.)
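The control flow in the quote (independent per-market lineages, then one consolidation step once all of them have finished) can be sketched outside Airflow with a thread pool. This is a plain-Python analogy, not Airflow code: `MARKET_QUERIES`, `run_market_pipeline`, and `consolidate` are hypothetical stand-ins, and the pipeline body just counts queries instead of calling search or an LLM judge.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-market test queries: same flow, different inputs per market.
MARKET_QUERIES = {
    "LU": ["sneakers", "rain jacket"],
    "PT": ["tenis", "casaco"],
    "GR": ["sandals"],
}


def run_market_pipeline(market: str, queries: list[str]) -> dict:
    """Hypothetical stand-in for one TaskGroup (generate -> retrieve -> evaluate -> report)."""
    # A real lineage would query search and an LLM judge; here we only count queries.
    return {"market": market, "evaluated": len(queries)}


def consolidate(results: list[dict]) -> dict:
    """Fan-in step: called only after every per-market lineage has returned."""
    return {
        "markets": sorted(r["market"] for r in results),
        "total_evaluated": sum(r["evaluated"] for r in results),
    }


def run_all(market_queries: dict[str, list[str]]) -> dict:
    # Fan-out: one independent unit of work per market.
    with ThreadPoolExecutor(max_workers=max(1, len(market_queries))) as pool:
        futures = [
            pool.submit(run_market_pipeline, market, queries)
            for market, queries in market_queries.items()
        ]
        # .result() blocks until each lineage finishes, so consolidation waits for all.
        results = [f.result() for f in futures]
    return consolidate(results)


if __name__ == "__main__":
    print(run_all(MARKET_QUERIES))
```

The analogy stops at scheduling: in Airflow each stage is its own task with its own retries, logs, and UI status, which a single thread pool does not provide.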
Structurally:
              ┌───────── TaskGroup: market=LU ──────────┐
          ┌──►│ generate → retrieve → evaluate → report │──┐
          │   └─────────────────────────────────────────┘  │
          │   ┌───────── TaskGroup: market=PT ──────────┐  │
DAG entry─┼──►│ generate → retrieve → evaluate → report │──┼──► consolidation task (fan-in)
          │   └─────────────────────────────────────────┘  │
          │   ┌───────── TaskGroup: market=GR ──────────┐  │
          └──►│ generate → retrieve → evaluate → report │──┘
              └─────────────────────────────────────────┘
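The same shape can be written down as a dependency graph. The sketch below mimics Airflow's default of prefixing task IDs inside a TaskGroup with the group's ID (for example `LU.generate`) and uses the standard-library `graphlib.TopologicalSorter` to list which tasks become runnable together. The `entry` and `consolidate` task names are hypothetical, and this models the DAG shape only; it does not use Airflow.

```python
from graphlib import TopologicalSorter

STAGES = ["generate", "retrieve", "evaluate", "report"]
MARKETS = ["LU", "PT", "GR"]


def build_dag(markets: list[str], stages: list[str]) -> dict[str, set[str]]:
    """Return {task_id: set of upstream task_ids} for the fan-out / fan-in shape."""
    deps: dict[str, set[str]] = {"entry": set()}
    for market in markets:
        # Mimics Airflow's default group_id prefix on task IDs inside a TaskGroup.
        ids = [f"{market}.{stage}" for stage in stages]
        deps[ids[0]] = {"entry"}  # every group hangs off the shared entry task
        for upstream, downstream in zip(ids, ids[1:]):
            deps[downstream] = {upstream}  # linear chain inside the group
    # Fan-in: consolidation waits on the last stage of every group.
    deps["consolidate"] = {f"{m}.{stages[-1]}" for m in markets}
    return deps


def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves that could run concurrently given unlimited slots."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


if __name__ == "__main__":
    for i, wave in enumerate(waves(build_dag(MARKETS, STAGES))):
        print(i, wave)
```

With three markets and four stages this yields six waves: `entry`, then each stage across all three markets side by side, then `consolidate`. Adding a market widens the middle waves without lengthening the chain.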
Why TaskGroups, not separate DAGs
Two SRE-relevant properties:
- One scheduling unit. One trigger (cron, manual, external event) fans out to all markets; one status readout aggregates back. No per-tenant cron herd, no per-tenant alerting rule proliferation.
- One consolidation task. Cross-market reports ("which markets share low-scoring brand segments") require all market results in one place. Separate DAGs would force an external aggregator; TaskGroups keep the fan-in inside Airflow.
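A cross-market question like the one above only becomes answerable once every market's results sit in one place. A minimal sketch of such a consolidation step, assuming each market's group emits a mapping from brand segment to a mean judge score on a 0 to 1 scale; the data, the `shared_low_segments` helper, and the 0.5 threshold are all hypothetical. How Zalando hands results to its consolidation task (XCom, object storage, or otherwise) is not stated in the source.

```python
from collections import defaultdict

# Hypothetical per-market outputs: brand segment -> mean judge score (0..1 assumed).
RESULTS = {
    "LU": {"brand_a": 0.91, "brand_b": 0.42, "brand_c": 0.77},
    "PT": {"brand_a": 0.88, "brand_b": 0.39, "brand_c": 0.51},
    "GR": {"brand_a": 0.45, "brand_b": 0.47, "brand_c": 0.80},
}


def shared_low_segments(
    results: dict[str, dict[str, float]],
    threshold: float = 0.5,
    min_markets: int = 2,
) -> dict[str, list[str]]:
    """Return {segment: markets} for segments scoring below threshold in >= min_markets markets."""
    low: dict[str, list[str]] = defaultdict(list)
    for market, scores in results.items():
        for segment, score in scores.items():
            if score < threshold:
                low[segment].append(market)
    # Keep only segments that are weak in several markets at once.
    return {seg: sorted(ms) for seg, ms in low.items() if len(ms) >= min_markets}


if __name__ == "__main__":
    print(shared_low_segments(RESULTS))
```

No single market's group could compute this on its own, which is the argument for keeping the fan-in inside the same DAG.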
Why TaskGroups, not serial iteration
The obvious alternative — one task that loops over markets — is worse along three axes:
- No per-market retry isolation. One market's transient failure forces re-running all other markets.
- Observability collapses. The Airflow UI shows one running task, not per-market status.
- No true parallelism. A serial loop processes markets one after another; N TaskGroups can occupy up to N task slots at once, within the DAG's and the deployment's concurrency limits.
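A toy model of the retry-isolation and parallelism points, again plain Python rather than Airflow. `serial_loop` stands for one Airflow task whose retry re-executes the whole loop; `per_market` gives each market its own retry budget, which is what separate per-market tasks provide (it runs markets one after another only to keep the retry counting simple). `make_flaky` is a hypothetical failure injector, and `end_to_end_runtime` compares the two shapes' wall-clock times using made-up durations.

```python
from typing import Callable


def make_flaky(fail_first: dict[str, int]) -> tuple[Callable[[str], str], dict[str, int]]:
    """Hypothetical failure injector: market m fails its first fail_first[m] attempts."""
    calls: dict[str, int] = {}

    def run(market: str) -> str:
        calls[market] = calls.get(market, 0) + 1
        if calls[market] <= fail_first.get(market, 0):
            raise RuntimeError(f"transient failure in {market}")
        return f"{market}: ok"

    return run, calls


def serial_loop(markets: list[str], run: Callable[[str], str], max_attempts: int = 3) -> list[str]:
    """One task looping over markets: a retry re-executes the whole loop."""
    for _ in range(max_attempts):
        try:
            return [run(m) for m in markets]
        except RuntimeError:
            continue  # every market, including healthy ones, runs again
    raise RuntimeError("serial loop exhausted its retries")


def per_market(markets: list[str], run: Callable[[str], str], max_attempts: int = 3) -> list[str]:
    """One retry budget per market, as when each market has its own tasks."""
    results = []
    for m in markets:
        for attempt in range(max_attempts):
            try:
                results.append(run(m))
                break
            except RuntimeError:
                if attempt == max_attempts - 1:
                    raise  # only this market gives up
    return results


def end_to_end_runtime(durations: dict[str, float], consolidation: float, parallel: bool) -> float:
    """Serial loop sums market times; fan-out waits for the slowest (assumes free slots for all)."""
    body = max(durations.values()) if parallel else sum(durations.values())
    return body + consolidation


if __name__ == "__main__":
    run, calls = make_flaky({"PT": 1})
    serial_loop(["LU", "PT", "GR"], run)
    print("serial loop calls:", calls)

    run, calls = make_flaky({"PT": 1})
    per_market(["LU", "PT", "GR"], run)
    print("per-market calls:", calls)

    minutes = {"LU": 10, "PT": 25, "GR": 15}  # hypothetical durations
    print(end_to_end_runtime(minutes, 5, parallel=False), end_to_end_runtime(minutes, 5, parallel=True))
```

With PT failing once, the loop calls LU twice while the per-market version calls it once; with durations of 10, 25, and 15 minutes plus a 5-minute consolidation, the loop takes 55 minutes and the fan-out 30.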
What's inside each TaskGroup
In Zalando's case, each TaskGroup contains three KubernetesPodOperator stages plus an NER-analyser sidecar, all encapsulated in Docker images. See patterns/podoperator-encapsulated-evaluation-job.
Tradeoffs
- DAG complexity scales with N markets. Very large N (say, hundreds of tenants) starts to strain the Airflow scheduler's DAG-parse time and UI rendering; at that scale, dynamic task mapping (Airflow 2.3+) or separate DAGs with a trigger-all parent DAG become better fits.
- Shared resource contention at fan-out. All TaskGroups hitting the same Product API / Elasticsearch cluster at once can saturate downstream services. Zalando's cache partially mitigates this but does not eliminate it; the source does not discuss rate limiting.
- Consolidation adds a serial tail. The last TaskGroup to finish gates the consolidation, so end-to-end runtime ≈ max(TaskGroup runtimes) + consolidation runtime, assuming enough free task slots to run every group at once.
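One way to bound fan-out pressure on a shared downstream is to cap how many groups may call it at once. Airflow's native mechanism for this is pools: tasks that set the same `pool` argument share that pool's fixed number of slots. The sketch below models the idea with a `threading.BoundedSemaphore`; `CappedDownstream`, `DOWNSTREAM_SLOTS`, and the sleep standing in for a request are hypothetical, and nothing here reflects what Zalando actually does.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap on simultaneous calls to a shared downstream (e.g. a search cluster).
DOWNSTREAM_SLOTS = 2


class CappedDownstream:
    """Models a pool: at most `slots` callers may be inside call() at the same time."""

    def __init__(self, slots: int):
        self._sem = threading.BoundedSemaphore(slots)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0  # highest concurrency actually observed

    def call(self, market: str) -> str:
        with self._sem:  # blocks while all slots are taken
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            time.sleep(0.05)  # stand-in for a real downstream request
            with self._lock:
                self.in_flight -= 1
        return f"{market}: done"


def fan_out(markets: list[str], downstream: CappedDownstream) -> list[str]:
    # Every market group starts at once, but only `slots` of them reach the downstream together.
    with ThreadPoolExecutor(max_workers=max(1, len(markets))) as pool:
        return list(pool.map(downstream.call, markets))


if __name__ == "__main__":
    ds = CappedDownstream(DOWNSTREAM_SLOTS)
    print(fan_out(["LU", "PT", "GR"], ds), "peak concurrency:", ds.peak)
```

The cap trades contention for latency: with fewer slots than groups, end-to-end runtime is no longer just the slowest group plus consolidation, because some groups queue for a slot.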
Seen in
- sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge — canonical wiki instance. Zalando uses one TaskGroup per market (LU / PT / GR) in the search-quality framework DAG.