
Apache Airflow

Apache Airflow is an open-source workflow orchestration system originally created at Airbnb in 2014 and later donated to the Apache Software Foundation, where it became a top-level project in 2019. Workflows are expressed as Python-defined Directed Acyclic Graphs (DAGs) of operator tasks; the Airflow scheduler parses the DAG files, schedules task instances according to their dependencies and their cron-like or data-aware schedule, and tracks run state in a metadata database. Tasks execute on worker processes via one of a handful of executors (LocalExecutor, CeleryExecutor, KubernetesExecutor).
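The scheduler's core job above — deciding which tasks are runnable once their upstream dependencies have succeeded — amounts to walking the DAG in topological order. A minimal sketch of that idea using only the standard library (a toy model; real Airflow tracks per-task-instance state in its metadata database and dispatches to an executor, and the task names here are hypothetical):

```python
# Toy model of dependency-ordered scheduling for a small pipeline.
# Maps each task to the set of tasks it depends on (its upstreams).
from graphlib import TopologicalSorter

deps = {
    "extract": set(),
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"transform", "quality_check"},
}

# static_order() yields tasks so that every task appears only after
# all of its upstream dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)  # e.g. ['extract', 'transform', 'quality_check', 'load']
```

In the real scheduler the analogous step is incremental — as each task instance finishes, any downstream task whose upstreams are all in a success state becomes eligible for its executor.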

For the purposes of this wiki, Airflow is the default substrate for batch orchestration across data platforms — ETL, feature pipelines, ML training jobs, data-quality jobs, periodic reports, and warehouse-to-store sync. The most common managed offering is Astronomer-hosted Airflow, a commercial SaaS built on OSS Airflow and founded by original Airflow contributors.

Common shapes it appears in

  • Hand-written DAGs — one Python file per pipeline; the classic usage.
  • Auto-generated DAGs — a higher-level configuration (YAML / JSON / SQL-plus-metadata) is compiled into Airflow DAGs by a codegen step. In this config-driven pattern, the platform owns the DAG boilerplate (monitoring, data-quality gates, alerting, metadata tagging) while customers own only the feature- or dataset-specific query and metadata.
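The config-driven shape can be sketched as a small compile step that expands a customer-owned config into a full task list, injecting the platform-owned boilerplate around it. All names and fields below are hypothetical, not a specific platform's API:

```python
# Sketch: compile one dataset config into an ordered list of task
# specs. The customer supplies only the query and checks; the
# quality gate and publish step are platform boilerplate.
def compile_pipeline(config: dict) -> list[dict]:
    name = config["dataset"]
    return [
        # Customer-owned: the dataset-specific query.
        {"id": f"{name}.run_query", "sql": config["sql"]},
        # Platform-owned: data-quality gate over declared checks.
        {"id": f"{name}.quality_gate", "checks": config.get("checks", [])},
        # Platform-owned: publish only after the gate passes.
        {"id": f"{name}.publish", "upstream": f"{name}.quality_gate"},
    ]

config = {
    "dataset": "daily_orders",
    "sql": "SELECT order_id, amount FROM orders",
    "checks": ["row_count > 0"],
}
for task in compile_pipeline(config):
    print(task["id"])
```

In a real deployment the codegen step would emit actual Airflow operators (or a DAG file) rather than dicts, but the division of ownership is the same: the config is the whole customer surface area.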
