
Agent-controlled deployment

Agent-controlled deployment is the design posture where an AI agent — not a human release manager — writes the code path, deploys it, self-enrolls into a small cohort via a feature flag, observes metrics, and ramps or reverts the rollout autonomously. The human "sets the boundaries"; the flag bounds the blast radius.

The load-bearing insight from the Cloudflare Flagship post (2026-04-17) is that this posture doesn't require a new safety primitive — it requires treating feature flags, not code pushes, as the primary agent-facing deployment primitive.

The loop

Direct from the post:

"An agent writes a new code path behind a flag and deploys it — the flag is off, so nothing changes for users. The agent then enables the flag for itself or a small test cohort, exercises the feature in production, and observes the results. If metrics look good, it ramps the rollout. If something breaks, it disables the flag. The human doesn't need to be in the loop for every step — they set the boundaries, and the flag controls the blast radius."

Steps, factored:

  1. Ship inert — agent deploys new code guarded by a flag whose default is off. Deploy ≠ release.
  2. Self-enroll — agent turns the flag on only for its own test context (specific userId, a tiny cohort).
  3. Observe — agent reads production metrics (error rate, latency, business signals) scoped to the enrolled cohort.
  4. Ramp — if green, raise the rollout percentage; the consistent-hashed bucket guarantees previously-admitted users stay admitted.
  5. Revert — if red, flip the flag off in seconds. No code rollback, no deploy.
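The five steps above can be sketched as a single control loop. This is a hypothetical illustration, not the Flagship SDK: `FlagClient`, `setRolloutPercent`, `enrollContext`, and `readCohortMetrics` are invented names, and the green thresholds stand in for whatever the human configured.

```typescript
// Sketch of ship-inert → self-enroll → observe → ramp/revert.
// FlagClient and CohortMetrics are hypothetical interfaces, not a real SDK.
interface CohortMetrics { errorRate: number; p99LatencyMs: number; }

interface FlagClient {
  setRolloutPercent(flag: string, percent: number): Promise<void>;
  enrollContext(flag: string, contextId: string): Promise<void>;
  readCohortMetrics(flag: string): Promise<CohortMetrics>;
}

// Thresholds the human set as boundaries; the agent only executes within them.
const GREEN = { maxErrorRate: 0.001, maxP99LatencyMs: 250 };
const RAMP_STEPS = [1, 5, 10, 25, 50, 100]; // percent

async function agentRollout(flags: FlagClient, flag: string): Promise<"shipped" | "reverted"> {
  await flags.enrollContext(flag, "agent-test-context");  // step 2: self-enroll
  for (const percent of RAMP_STEPS) {
    await flags.setRolloutPercent(flag, percent);         // step 4: ramp
    const m = await flags.readCohortMetrics(flag);        // step 3: observe
    if (m.errorRate > GREEN.maxErrorRate || m.p99LatencyMs > GREEN.maxP99LatencyMs) {
      await flags.setRolloutPercent(flag, 0);             // step 5: revert — no deploy
      return "reverted";
    }
  }
  return "shipped";                                       // ramped to 100%
}
```

Note that the revert path is just another flag write — the code that was deployed in step 1 stays in production, inert again.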

Why flags, not just CI/CD

CI/CD gates the code-enters-production moment. Flags gate the code-executes-for-users moment. Those are different boundaries:

  • A bad deploy caught by CI/CD never reaches users — good, but gated by pre-production signals.
  • A bad flag ramp caught by observation rolls back in seconds — gated by production signals, which is the only surface that exposes environment-specific or scale-emergent bugs.

Agents are particularly suited to the second boundary: they can run tight observation loops, apply precise metric thresholds, and sustain the repetitive "ramp, watch, revert" discipline indefinitely.

What "sets the boundaries" means for the human

The human retains:

  • Target definition — what the flag ramps toward (100% of users? one region? an internal cohort?).
  • Rate ceilings — how fast the agent is allowed to ramp.
  • Metric definitions — which signals count as green/red, and at what thresholds.
  • Kill-switch authority — the human can flip off ahead of the agent; see patterns/global-feature-killswitch.

The specific ramp decisions — "5% looks fine, go to 10% at 14:00, then 25% after an hour" — are delegated.
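One way to make "sets the boundaries" concrete is a policy object the human authors once and the agent cannot exceed. A minimal sketch, with invented names — the post doesn't prescribe this shape:

```typescript
// Hypothetical boundary policy: the human writes it, the agent ramps inside it.
interface RolloutPolicy {
  target: { percent: number; cohort?: string };  // what the flag ramps toward
  maxStepPercent: number;                        // rate ceiling per ramp decision
  minObserveMinutes: number;                     // dwell time before the next step
  metrics: { name: string; redAbove: number }[]; // which signals count as red
}

// The agent proposes a next rollout percentage; this guard clamps the
// proposal to the human's rate ceiling and overall target.
function clampStep(policy: RolloutPolicy, current: number, proposed: number): number {
  const ceiling = Math.min(policy.target.percent, current + policy.maxStepPercent);
  return Math.min(proposed, ceiling);
}
```

With `maxStepPercent: 10`, an agent at 5% asking for 25% gets clamped to 15% — the delegated "5% looks fine, go to 10%" decisions happen freely below the ceiling, never above it.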

Architectural requirements

For an agent-driven loop to be safe, the flag primitive must offer:

  • Seconds-or-less propagation from write to edge evaluation (Flagship: DO→KV sync "within seconds"; see patterns/do-plus-kv-edge-config-distribution).
  • Audit trail with diff + actor + timestamp so every ramp the agent performed is reviewable.
  • Consistent-hashed bucketing so ramps are monotonic — otherwise the agent's observations are polluted by cohort flip-flopping.
  • Default-on-error semantics — evaluation failures return the caller-supplied default, so an agent's bad rule addition doesn't cascade into a request failure.
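The consistent-hashed bucketing requirement is worth seeing in code. A minimal sketch using FNV-1a (an illustrative choice, not necessarily what Flagship uses): a user hashes to a stable bucket in 0–99, and a rollout at p% admits buckets below p, so raising p only ever adds users.

```typescript
// Minimal consistent-hash bucketing sketch (FNV-1a; not a production hash).
// Hashing flag+user keeps buckets stable per flag, so ramps are monotonic:
// a user admitted at 10% is still admitted at 25%.
function bucketOf(userId: string, flag: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (const ch of flag + ":" + userId) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept in uint32 range
  }
  return h % 100;
}

function isEnabled(userId: string, flag: string, rolloutPercent: number): boolean {
  return bucketOf(userId, flag) < rolloutPercent;
}
```

Without this property — say, re-randomizing cohorts on every ramp — the agent's metric observations would mix users who just entered with users who flip-flopped in and out, exactly the pollution the requirement rules out.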

Not a novel ML capability — a novel integration of existing ones

Nothing about the flag primitive itself changes to support agents. The shift is that the flag surface — previously a human UI (dashboards, toggles, form fields) — becomes a machine API surface that an agent calls with the same typed SDK a human release manager would. The OpenFeature-native design makes the agent-as-caller integration indistinguishable from human-as-caller, which is the point.
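The caller-agnostic evaluation surface can be sketched in the OpenFeature style, where every evaluation takes a caller-supplied default that is returned on any failure. This is an illustrative stand-in, not the actual Flagship or OpenFeature SDK — `Rule` and the rule-list shape are invented:

```typescript
// Sketch of OpenFeature-style boolean evaluation with default-on-error
// semantics. `rules` stands in for whatever targeting logic the agent wrote;
// any throw during evaluation falls back to the caller-supplied default.
type Rule = (context: Record<string, unknown>) => boolean;

function getBooleanValue(
  rules: Rule[],
  defaultValue: boolean,
  context: Record<string, unknown>,
): boolean {
  try {
    return rules.some((rule) => rule(context));
  } catch {
    return defaultValue; // an agent's bad rule can't cascade into a request failure
  }
}
```

Whether the caller is a release manager's dashboard or an agent's tool call, the evaluation contract is identical — which is what makes the agent-as-caller integration indistinguishable from human-as-caller.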
