Skip to content

PATTERN Cited by 1 source

Per-market Parallel TaskGroup DAG

Intent

Run the same logical pipeline — test-query generation, result retrieval, evaluation — for N markets in parallel inside one Airflow DAG, by structuring each market's pipeline as a TaskGroup and adding a final consolidation task that aggregates across all TaskGroups.

One trigger → N parallel evaluation lineages → one consolidated output. The pattern is the DAG-level structural answer to multi-tenant offline evaluation.

(Source: sources/2026-03-16-zalando-search-quality-assurance-with-ai-as-a-judge.)

Structure

          DAG trigger
 ┌─────────────┼─────────────┐
 │             │             │
 ▼             ▼             ▼
TaskGroup     TaskGroup     TaskGroup
 market=LU     market=PT     market=GR
 ├─generate    ├─generate    ├─generate
 ├─retrieve    ├─retrieve    ├─retrieve
 ├─evaluate    ├─evaluate    ├─evaluate
 ├─ner-diff    ├─ner-diff    ├─ner-diff
 └─report      └─report      └─report
       \            │            /
        └───────────┼───────────┘
              consolidation
              (fan-in task)

Each TaskGroup is independent: its own retries, its own logging, its own UI status line. One failing TaskGroup does not poison the others — it only delays the consolidation task, which waits on all upstream TaskGroups.

Zalando's quote

"[Taskgroup]: We want to be able to evaluate multiple markets in parallel, where each market shares the same flow but with different test queries. Therefore we can implement each evaluation lineage as a task group and put all of them together in the same DAG. This way each task group can run independently in parallel and, once they are all finished, a final task consolidates all evaluation results together."

Why this over alternatives

  • vs separate DAGs per market: one scheduling unit, one cross-market consolidation, one place to look at health.
  • vs one task looping over markets: per-market retry isolation, true parallelism, per-market status in the UI.
  • vs Airflow dynamic task mapping (2.3+): simpler for fixed, small N (3 markets); dynamic mapping becomes the better fit when N grows large or is runtime-determined.

Consolidation-task responsibilities

  • Aggregate per-market reports into a cross-market view.
  • Surface cross-market brand-level or category-level issues (a brand failing in three markets simultaneously is a different signal from failing in one).
  • Emit to whatever downstream consumer — storage, dashboard, alert, follow-up Airflow trigger.

Variations

  • Heterogeneous markets. If markets need slightly different pipelines (e.g. some need translation, some don't), parameterise the TaskGroup definition or conditionally skip the translation stage.
  • Nested TaskGroups. Per-market, per-language sub-groups if a single market has multiple target languages.
  • Partial-failure consolidation. Use Airflow's trigger_rule='all_done' on consolidation to produce a partial report even if some markets fail.

Limitations

  • Scheduler capacity. Very large N (hundreds of tenants) can strain DAG-parse and UI rendering; dynamic task mapping or separate DAGs with a trigger-all parent become better.
  • Downstream resource contention. All TaskGroups hitting the same Product API / ES cluster / LLM API at fan-out can saturate shared downstreams. Rate-limiting or staggered starts may be needed.
  • Consolidation waits for slowest. One slow market gates the final fan-in; total runtime = max(TaskGroup) + consolidation.

Seen in

Last updated · 507 distilled / 1,218 read