Skip to content

CONCEPT Cited by 2 sources

Feature flag

A feature flag is a runtime switch that gates a code path by context, rollout percentage, or targeting rule — enabling release to be decoupled from deploy and cohort-scoped rollouts. The code is in production but inert until the flag turns on for some population; ramping is a config change, not a redeploy.

The basic decoupling

Without flags, deploy-day = release-day — every user gets the new code path the moment the binary/Worker goes live. With flags:

  • Deploy: new code ships behind if (flag(...)) { new } else { old }. Flag is off. Nothing changes for users.
  • Release: flip the flag on — for one user, one cohort, a percentage, a region — watch metrics.
  • Revert: flip the flag off. Seconds, not a rollback deploy.

This is why flags are the substrate for agent-controlled deployment: the blast radius is bounded by the flag state, which is itself a fast, auditable control-plane edit.

Evaluation model

A flag typically has:

  • Variations — the possible returned values (boolean, string, number, or full JSON object — see Flagship's four variation types).
  • Rules — conditions evaluated in priority order, first-match wins, each mapping a matching context to a variation (plus an optional percentage rollout).
  • Default — returned on evaluation error or when no rule matches.

Rules compose via AND/OR — Flagship allows up to 5 levels of nesting, implicit AND at the top of a rule, nested AND/OR groups below.

What's hardcoded flags

The anti-pattern is "just commit a boolean and redeploy" — works at one or two flags, collapses at ten:

"One hardcoded flag becomes ten. Ten becomes fifty, owned by different teams, with no central view of what's on or off. There's no audit trail — when something breaks, you're searching git blame to figure out who toggled what." (Source: Cloudflare Flagship, 2026-04-17)

A proper flag service adds: central catalog, dashboard UI, audit trail, rule evaluation engine, rollout ramping with stable bucketing, and vendor-neutral call-site API via OpenFeature.

Evaluation-location trade-offs

Where the flag(context) call runs matters:

  • Remote evaluation (HTTP to flag service) — every user request waits on an outbound call. Centralized, easy to update, but "sits on the critical path of every single user request. It could add considerable latency." (Source: sources/2026-04-17-cloudflare-introducing-flagship-feature-flags-built-for-the-age-of-ai.)
  • Local-evaluation SDK — rules downloaded into memory, evaluated in-process, refreshed in the background. Assumes a long-lived process; breaks on serverless isolate substrates where "a Worker isolate can be created, serve a request, and be evicted between one request and the next."
  • Edge-local evaluation against a replicated config store — evaluation in the same runtime isolate that serves the request, reading config from an edge-replicated KV cache; no outbound hop, no long-lived SDK. See patterns/in-isolate-rule-evaluation + patterns/do-plus-kv-edge-config-distribution.

Not just for humans

The 2026-04-17 Cloudflare Flagship launch is explicit that the motivating user is an AI agent, not a release manager:

"An agent writes a new code path behind a flag and deploys it — the flag is off, so nothing changes for users. The agent then enables the flag for itself or a small test cohort, exercises the feature in production, and observes the results. If metrics look good, it ramps the rollout. If something breaks, it disables the flag."

See concepts/agent-controlled-deployment.

Stale-flag cleanup as KTLO automation target

The mirror image of the agent-as-flag-author framing is agent as flag-cleaner: once a flag is fully rolled out, the inert side of the conditional becomes dead code that "impacts performance, reliability, and developer productivity." Atlassian's 2026-06-01 Jira-team post is the canonical wiki articulation of stale-flag cleanup as one of the canonical KTLO chores mapped onto an agentic workflow:

"On a large, multi-product codebase, cleanup is harder than 'just removing the code.' A flag might be fully rolled out for some customers but still active for others due to compliance requirements, release tracks, or experiment holdouts. Piecing together a flag's true state across systems was manual and error-prone. The work was tedious, time-consuming, and repetitive — perfect for agents." (Source: sources/2026-06-01-atlassian-how-we-cut-up-to-80-of-engineering-chores-using-ai-agents-in)

The automation pipeline:

  1. Daily heuristic cron scans the codebase for flag references, cross-references compliance / experiment / release-track state, and emits one Jira work item per stale flag with structured fields (flag name, type, repo, paths, line numbers, desired final state). (patterns/heuristic-cron-emits-agent-work-items)
  2. Engineer transitions the work item's status, which triggers the agent with a custom system prompt. (patterns/jira-status-transition-triggers-agent-workflow)
  3. Agent runs a three-tier skill fallback chain — repo-specific cleanup skill → flag the repo for skill creation + provide owner instructions → generic cleanup skill — and opens a draft PR. (patterns/agent-skill-with-fallback-chain)
  4. Engineer reviews + merges through the standard merge-queue path (Bitbucket Merge Queues).

Reported throughput on the Jira repo alone: 500+ merged PRs in 70 days — ≈ 7 PRs/day on a single KTLO category.

Seen in

  • Canonical framing of feature flags as per-tenant-cell blast-radius cap in a DBaaS fleet. Max Englander (PlanetScale, 2025-07-03) names feature flags as the substrate for per-database progressive delivery: "Database cluster config and binary changes are shipped database by database using feature flags". Load-bearing consequence: "A bug in Vitess or the PlanetScale Kubernetes operator rarely impacts more than 1-2 customers, thanks to our extensive use of feature flags to roll out changes." Canonical granularity: flags evaluate in a per-database context so the unit of rollout is one customer cluster at a time, bounding the blast radius of a bad ship to the customer(s) whose flag flipped just before telemetry surfaced the regression. Also composes with weekly failover drill — the flag-gated ship arrives via the failover itself.
  • sources/2026-04-17-cloudflare-introducing-flagship-feature-flags-built-for-the-age-of-ai — canonical wiki introduction; Flagship is the OpenFeature- native Cloudflare-built primitive, framed explicitly as the bounded-blast-radius substrate for agent-driven shipping.
  • feature flag as kill-switch substrate at the async-job altitude. PlanetScale uses Flipper — a general-purpose Ruby feature-flag library — as the deploy-less config channel for disabling async-job classes via naming-convention flags (disable_<classname>). Canonical datum: general feature-flag systems can serve as the config channel for specialised kill-switches, no bespoke control plane required. See patterns/feature-flagged-job-enqueue-rejection.

  • sources/2026-06-01-atlassian-how-we-cut-up-to-80-of-engineering-chores-using-ai-agents-incanonical wiki instance of stale-flag cleanup as agentic-KTLO automation target. Atlassian's Jira team runs a daily heuristic cron that emits one Jira work item per stale flag with full pre-resolved cross-system state (flag name + type + repo + paths + line numbers + desired final state), then routes through Jira-status-transition-triggered agent runs that dispatch via a three-tier repo-specific → flag-for-creation → generic skill fallback chain. Reported throughput: 500+ merged PRs in 70 days on the Jira repo alone. The post canonicalises the cross-system-state-piecing problem (compliance / experiments / release-track holdouts make "is this flag actually stale" a hard query) and demonstrates that a heuristic cron encoding the team's pattern recognition is a viable detection layer for agentic remediation.

Last updated · 542 distilled / 1,571 read