
PATTERN Cited by 1 source

Cron-driven PR-closed cleanup

Pattern

Decouple the teardown of per-PR ephemeral environments from the PR-close event itself. A separate cron polls the version-control system's PR API on a fixed cadence; for any PR observed to have transitioned from open→closed, it deletes the corresponding environment.

At Zalando (from sources/2022-06-09-zalando-accelerate-testing-in-apache-airflow-through-dag-versioning):

"We have developed a cron job that checks the status of pull requests. Once a pull request is closed, the corresponding environment is deleted on the Airflow server. The job deletes the zip file and the folder which contains the unpackaged files. Then, it queries the Airflow metastore for all associated DAGs and deletes them via Airflow cli."

The cron deletes three things per closed PR:

  1. featureN.zip from the Airflow dags/ folder.
  2. The unpackaged sibling directory used for Jinja template_searchpath.
  3. All metastore DAG rows whose id contains the feature tag — via the Airflow CLI.
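
The three steps above can be sketched as a single idempotent function. This is only an illustration, not Zalando's implementation: the function name `cleanup_feature_env`, the `run_cli` hook, and the assumption that the unpackaged folder shares the zip's base name are all hypothetical, and the list of known DAG ids is passed in rather than queried from the metastore. The default CLI call assumes the Airflow 2.x command `airflow dags delete --yes <dag_id>`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Sequence


def _run_airflow_cli(args: Sequence[str]) -> None:
    # Assumes an Airflow 2.x `airflow` binary on PATH.
    subprocess.run(["airflow", *args], check=True)


def cleanup_feature_env(
    dags_dir: Path,
    feature_tag: str,
    known_dag_ids: Iterable[str],
    run_cli: Callable[[Sequence[str]], None] = _run_airflow_cli,
) -> List[str]:
    """Delete one per-PR environment; safe to call again after a partial run."""
    dags_dir = Path(dags_dir)

    # 1. Remove the packaged DAG zip (no error if it is already gone).
    (dags_dir / f"{feature_tag}.zip").unlink(missing_ok=True)

    # 2. Remove the unpackaged sibling directory (assumed to share the tag name).
    shutil.rmtree(dags_dir / feature_tag, ignore_errors=True)

    # 3. Delete every DAG whose id contains the feature tag via the CLI.
    deleted = []
    for dag_id in known_dag_ids:
        if feature_tag in dag_id:
            run_cli(["dags", "delete", "--yes", dag_id])
            deleted.append(dag_id)
    return deleted
```

Note that a plain substring match would also treat `feature10_etl` as belonging to `feature1`; a real job would likely need a delimiter or a stricter match on the tag.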

Why not a webhook (or Airflow hook)?

A webhook on PR-close has lower latency, but a periodic scan is more robust:

  • A webhook signal is easy to lose: one missed delivery leaves an orphaned environment forever.
  • A webhook handler needs explicit idempotency and retry logic. If the cron's deletion job crashes partway through, the next tick simply re-runs it.
  • Coupling the lifecycle to the orchestrator library (Airflow hooks) would push the logic into the same place as the DAG class fork, accumulating complexity in one file.
  • A periodic scan is self-healing: if one run goes wrong (stuck state, partial delete), the next run retries.

For per-PR env cleanup, an orphan that lingers for a few minutes is acceptable: the cost of a stuck environment is storage plus metastore rows, not an end-user-visible failure. Under those conditions, polling is a better tradeoff than a webhook.
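
A minimal sketch of one cron tick shows why the scan is self-healing. It is written as a level-triggered check (it asks "is this environment's PR closed right now?" rather than remembering transitions), which is an assumption about how such a job would be built, not something the source states. The names `reconcile_once`, `pr_is_closed`, and `delete_env` are hypothetical; the VCS API call and the actual deletion are injected as callables.

```python
from typing import Callable, Dict, List


def reconcile_once(
    env_to_pr: Dict[str, int],
    pr_is_closed: Callable[[int], bool],
    delete_env: Callable[[str], None],
) -> List[str]:
    """Run one tick; return the environments whose cleanup failed this time."""
    failed = []
    # Iterate over a sorted snapshot so delete_env may mutate env_to_pr.
    for env_name, pr_number in sorted(env_to_pr.items()):
        try:
            if pr_is_closed(pr_number):
                delete_env(env_name)
        except Exception:
            # No retry bookkeeping: the environment still exists, so the
            # next tick will see it again and try once more.
            failed.append(env_name)
    return failed
```

Because the function derives its work list from the environments that currently exist, a crash or a failed delete needs no special recovery path: whatever is left over is picked up on the next run.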

The "out-of-band lifecycle" shape

More generally: ephemeral-resource lifecycles are best managed out-of-band from the resource-creation mechanism. Creation is triggered by the human action (PR open); deletion runs on a reliable periodic reconciliation loop. This is the same architectural shape as the Kubernetes garbage collector, timer-driven cleanup jobs under systemd, and cloud reaper jobs.

Tradeoffs

  • Self-healing — missed deletes are retried automatically.
  • Simple implementation — one cron, one query, three delete steps.
  • Decoupled from orchestrator internals — a library upgrade doesn't break cleanup.
  • Non-zero orphan window — env lives on for up to one cron tick after PR close.
  • Requires read access to the VCS PR API — and resilience to rate limits.
  • Naming convention must encode the PR identity — the cron needs to map "closed PR #123" to "env named featureN".
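
The naming-convention requirement can be illustrated with a pair of helpers that map in both directions. This sketch assumes the `N` in `featureN` is the PR number, which the source does not confirm; the function names are hypothetical.

```python
import re
from typing import Optional

# Full-match pattern: "feature" followed only by digits.
_ENV_NAME = re.compile(r"feature(\d+)")


def env_name_for_pr(pr_number: int) -> str:
    """Build the environment name for a PR (assumed convention)."""
    return f"feature{pr_number}"


def pr_number_for_env(env_name: str) -> Optional[int]:
    """Recover the PR number from an environment name, or None if it does not match."""
    match = _ENV_NAME.fullmatch(env_name)
    return int(match.group(1)) if match else None
```

Returning `None` for names that do not follow the convention lets the cron skip anything it did not create, such as production DAG folders.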

Seen in
