PATTERN Cited by 1 source

Per-market Parallel TaskGroup DAG¶

Intent¶

Run the same logical pipeline — test-query generation, result retrieval, evaluation — for N markets in parallel inside one Airflow DAG, by structuring each market's pipeline as a TaskGroup and adding a final consolidation task that aggregates across all TaskGroups.

One trigger → N parallel evaluation lineages → one consolidated output. The pattern is the DAG-level structural answer to multi-tenant offline evaluation.

(Source: sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge.)

Structure¶

          DAG trigger
               │
 ┌─────────────┼─────────────┐
 │             │             │
 ▼             ▼             ▼
TaskGroup     TaskGroup     TaskGroup
 market=LU     market=PT     market=GR
 ├─generate    ├─generate    ├─generate
 ├─retrieve    ├─retrieve    ├─retrieve
 ├─evaluate    ├─evaluate    ├─evaluate
 ├─ner-diff    ├─ner-diff    ├─ner-diff
 └─report      └─report      └─report
       \            │            /
        └───────────┼───────────┘
                    │
                    ▼
              consolidation
              (fan-in task)

Each TaskGroup is independent: its own retries, its own logging, its own UI status line. One failing TaskGroup does not poison the others — it only delays the consolidation task, which waits on all upstream TaskGroups.

Zalando's quote¶

"[Taskgroup]: We want to be able to evaluate multiple markets in parallel, where each market shares the same flow but with different test queries. Therefore we can implement each evaluation lineage as a task group and put all of them together in the same DAG. This way each task group can run independently in parallel and, once they are all finished, a final task consolidates all evaluation results together."

Why this over alternatives¶

vs separate DAGs per market: one scheduling unit, one cross-market consolidation, one place to look at health.
vs one task looping over markets: per-market retry isolation, true parallelism, per-market status in the UI.
vs Airflow dynamic task mapping (2.3+): simpler for fixed, small N (3 markets); dynamic mapping becomes the better fit when N grows large or is runtime-determined.

Consolidation-task responsibilities¶

Aggregate per-market reports into a cross-market view.
Surface cross-market brand-level or category-level issues (a brand failing in three markets simultaneously is a different signal from failing in one).
Emit to whatever downstream consumer — storage, dashboard, alert, follow-up Airflow trigger.

Variations¶

Heterogeneous markets. If markets need slightly different pipelines (e.g. some need translation, some don't), parameterise the TaskGroup definition or conditionally skip the translation stage.
Nested TaskGroups. Per-market, per-language sub-groups if a single market has multiple target languages.
Partial-failure consolidation. Use Airflow's trigger_rule='all_done' on consolidation to produce a partial report even if some markets fail.

Limitations¶

Scheduler capacity. Very large N (hundreds of tenants) can strain DAG-parse and UI rendering; dynamic task mapping or separate DAGs with a trigger-all parent become better.
Downstream resource contention. All TaskGroups hitting the same Product API / ES cluster / LLM API at fan-out can saturate shared downstreams. Rate-limiting or staggered starts may be needed.
Consolidation waits for slowest. One slow market gates the final fan-in; total runtime = max(TaskGroup) + consolidation.

Seen in¶

sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge — canonical wiki instance; Zalando runs one TaskGroup per market (LU / PT / GR) in the search-quality framework DAG.