PATTERN Cited by 1 source
Per-market Parallel TaskGroup DAG¶
Intent¶
Run the same logical pipeline — test-query generation, result
retrieval, evaluation — for N markets in parallel inside
one Airflow DAG, by structuring each market's pipeline as a
TaskGroup and adding a final consolidation task that
aggregates across all TaskGroups.
One trigger → N parallel evaluation lineages → one consolidated output. The pattern is the DAG-level structural answer to multi-tenant offline evaluation.
(Source: sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge.)
Structure¶
DAG trigger
│
┌─────────────┼─────────────┐
│ │ │
▼ ▼ ▼
TaskGroup TaskGroup TaskGroup
market=LU market=PT market=GR
├─generate ├─generate ├─generate
├─retrieve ├─retrieve ├─retrieve
├─evaluate ├─evaluate ├─evaluate
├─ner-diff ├─ner-diff ├─ner-diff
└─report └─report └─report
\ │ /
└───────────┼───────────┘
│
▼
consolidation
(fan-in task)
Each TaskGroup is independent: its own retries, its own logging, its own UI status line. One failing TaskGroup does not poison the others — it only delays the consolidation task, which waits on all upstream TaskGroups.
Zalando's quote¶
"[Taskgroup]: We want to be able to evaluate multiple markets in parallel, where each market shares the same flow but with different test queries. Therefore we can implement each evaluation lineage as a task group and put all of them together in the same DAG. This way each task group can run independently in parallel and, once they are all finished, a final task consolidates all evaluation results together."
Why this over alternatives¶
- vs separate DAGs per market: one scheduling unit, one cross-market consolidation, one place to look at health.
- vs one task looping over markets: per-market retry isolation, true parallelism, per-market status in the UI.
- vs Airflow dynamic task mapping (2.3+): simpler for fixed, small N (3 markets); dynamic mapping becomes the better fit when N grows large or is runtime-determined.
Consolidation-task responsibilities¶
- Aggregate per-market reports into a cross-market view.
- Surface cross-market brand-level or category-level issues (a brand failing in three markets simultaneously is a different signal from failing in one).
- Emit to whatever downstream consumer — storage, dashboard, alert, follow-up Airflow trigger.
Variations¶
- Heterogeneous markets. If markets need slightly different pipelines (e.g. some need translation, some don't), parameterise the TaskGroup definition or conditionally skip the translation stage.
- Nested TaskGroups. Per-market, per-language sub-groups if a single market has multiple target languages.
- Partial-failure consolidation. Use Airflow's
trigger_rule='all_done'on consolidation to produce a partial report even if some markets fail.
Limitations¶
- Scheduler capacity. Very large N (hundreds of tenants) can strain DAG-parse and UI rendering; dynamic task mapping or separate DAGs with a trigger-all parent become better.
- Downstream resource contention. All TaskGroups hitting the same Product API / ES cluster / LLM API at fan-out can saturate shared downstreams. Rate-limiting or staggered starts may be needed.
- Consolidation waits for slowest. One slow market gates the final fan-in; total runtime = max(TaskGroup) + consolidation.
Seen in¶
- sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge — canonical wiki instance; Zalando runs one TaskGroup per market (LU / PT / GR) in the search-quality framework DAG.