Zalando Marketing ROI Pipeline
What
The marketing ROI (return on investment) pipeline of Zalando's Performance Marketing department is a batch data and machine-learning pipeline that measures the return on paid advertising campaigns. Compute runs on Databricks Spark, orchestration is handled by Apache Airflow, and the data layer consists of Spark databases backed by S3.
The pipeline is composed of sub-pipelines ("components") owned by different cross-functional teams (applied science, engineering, product) within Performance Marketing. Named examples in the source post:
- Input data preparation
- Marketing attribution model
- Incremental profit forecast for campaigns
Some components are built using Zalando's in-house Python SDK zFlow.
Why it's interesting on the wiki
The ROI output has no ground truth: there is no oracle to compare a new pipeline version against. To validate any change to an input or a component, the whole pipeline has to be run end-to-end and its output inspected. That is the forcing function for Zalando's per-PR Airflow environment work. When multiple teams edit different components of the same pipeline in parallel, they cannot share a single test environment without conflicts, and the alternative of provisioning a new MWAA-style (Amazon Managed Workflows for Apache Airflow) server for each PR would take about 30 minutes and incur real cost per PR.
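One way to picture how several PRs could coexist on a single Airflow server is to give every DAG an id that includes the code version under test. The sketch below is only an illustration of that idea under stated assumptions, not Zalando's implementation: the function names, the component names, and the `__` separator are all hypothetical, and it uses plain Python rather than any Airflow or zFlow API.

```python
import re


def versioned_dag_id(base_dag_id: str, version: str) -> str:
    """Return a DAG id that is unique per code version (for example, per PR branch).

    Hypothetical helper: the "__" separator and the sanitising rule are
    assumptions made for this sketch.
    """
    # Collapse every run of non-alphanumeric characters into a single "_".
    safe_version = re.sub(r"[^A-Za-z0-9]+", "_", version).strip("_").lower()
    if not safe_version:
        raise ValueError("version must contain at least one alphanumeric character")
    return f"{base_dag_id}__{safe_version}"


# Hypothetical component names, loosely following the three examples listed above.
COMPONENTS = [
    "input_data_preparation",
    "marketing_attribution",
    "incremental_profit_forecast",
]


def dag_ids_for_version(version: str) -> list[str]:
    """Build one versioned DAG id per component for a given code version."""
    return [versioned_dag_id(name, version) for name in COMPONENTS]


if __name__ == "__main__":
    # Two PRs produce disjoint sets of DAG ids, so they would not overwrite each other.
    for dag_id in dag_ids_for_version("PR-123/fix-attribution"):
        print(dag_id)
    for dag_id in dag_ids_for_version("PR-456/new-forecast"):
        print(dag_id)
```

Because the ids differ per version, two teams testing different components at the same time would each see their own end-to-end copy of the pipeline on the shared server, instead of waiting for a dedicated environment to be provisioned.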
See sources/2022-06-09-zalando-accelerate-testing-in-apache-airflow-through-dag-versioning for the full architecture.