CLOUDFLARE 2026-04-17 Tier 1

Introducing Flagship: feature flags built for the age of AI

Summary

Cloudflare's 2026-04-17 Agents-Week post launches Flagship — Cloudflare's native feature flag service, in private beta — built on OpenFeature (the CNCF open standard for flag evaluation) and on Cloudflare's own developer-platform primitives: Workers, Durable Objects, and Workers KV. The post frames flags as the safety net for agent-shipped code: an AI agent writes a code path behind a flag, deploys it off, self-enrolls into a small cohort, observes metrics, ramps or disables — the human "sets the boundaries, and the flag controls the blast radius". Architecturally, a flag write goes atomically to a per-app Durable Object (SQLite-backed, single-writer, holds config + changelog), syncs within seconds to Workers KV which replicates globally, and is evaluated at the edge by an engine running inside the same isolate already handling the request — no outbound HTTP call, no long-lived SDK process (Workers isolates can be minted and evicted between requests so "local-evaluation SDK" assumptions don't hold). Rules compose via AND/OR nested up to five levels deep with priority-ordered first-match; any rule can include a percentage rollout via consistent hashing on a context attribute (stable bucketing across ramps). Non-Workers runtimes (Node.js, Bun, Deno, browser) get a standard SDK; a client-side provider pre-fetches and caches with a configurable TTL. OpenFeature adoption is explicitly positioned to let customers "adopt Flagship without rewriting your evaluation code. Leave without rewriting it either" — the same relationship OpenTelemetry has to observability.

Key takeaways

  • AI-as-deployer is the thesis (from the lede, word-for-word): "AI is writing more code than ever… Tomorrow, the agent does all of that itself. The question becomes: how do you let an agent ship to production without removing every safety net? Feature flags are the answer." Canonical wiki framing of agent-controlled deployment — the flag is the bounded-blast-radius primitive the agent operates against, not a feature-team toggling tool.

  • Hardcoded-in-Worker and remote-HTTP-call are both explicitly named as anti-patterns. Hardcoded flags work "in the beginning" but "one hardcoded flag becomes ten. Ten becomes fifty, owned by different teams, with no central view of what's on or off. There's no audit trail." Remote evaluation (the fetch("https://flags.example-service.com/...") path) "sits on the critical path of every single user request. It could add considerable latency." (Source: [[sources/2026-04-17-cloudflare-introducing-flagship-feature-flags-built-for-the-age-of-ai]])

  • Workers breaks the "local-evaluation SDK" pattern. The post names why flag SDKs that download-rules-and-evaluate-locally-in-process don't port to serverless: "On Workers, none of these assumptions hold. There is no long-lived process: a Worker isolate can be created, serve a request, and be evicted between one request and the next. A new invocation could mean re-initializing the SDK from scratch." This is the canonical wiki motivation for why in-isolate rule evaluation against an edge-cached config store is the right shape on Workers, not a long-lived in-SDK rules cache.

  • DO-as-source-of-truth + KV-as-edge-replication is the config-distribution primitive. Direct from the post: "When you create or update a flag, the control plane writes the change atomically to a Durable Object — a SQLite-backed, globally unique instance that serves as the source of truth for that app's flag configuration and changelog. Within seconds, the updated flag config is synced to Workers KV, Cloudflare's globally distributed key-value store, where it's replicated across Cloudflare's network." Canonical wiki instance of DO-plus-KV edge config distribution applied to the flag tier; sibling of the same-pattern usage in Artifacts (KV as auth-token store in front of per-repo DOs).
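
The DO-plus-KV split can be sketched as a minimal in-memory model — a single-writer source of truth that records a changelog, fanned out to a replica that edge reads hit. All names here are illustrative stand-ins, not Flagship's actual API:

```typescript
type FlagConfig = Record<string, unknown>;

// Stands in for the per-app Durable Object: single writer, atomic
// updates, append-only changelog.
class ConfigSourceOfTruth {
  private config: FlagConfig = {};
  readonly changelog: { at: number; change: FlagConfig }[] = [];

  write(change: FlagConfig): FlagConfig {
    this.config = { ...this.config, ...change }; // atomic from callers' view
    this.changelog.push({ at: Date.now(), change });
    return this.config;
  }
}

// Stands in for Workers KV: the eventually-consistent replica that
// in-isolate evaluation reads. The request path never touches the
// source of truth; writes reach the replica out of band.
class EdgeReplica {
  private snapshot: FlagConfig = {};
  sync(latest: FlagConfig): void { this.snapshot = latest; } // "within seconds"
  read(): FlagConfig { return this.snapshot; }
}
```

The hot-read / rare-write split is the whole point: `read()` is edge-local and cheap, while every `write()` is serialized through the single-writer side and logged.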

  • Evaluation is in the isolate, on edge-local data, never leaves the POP. "When a request evaluates a flag, Flagship reads the flag config directly from KV at the edge — the same Cloudflare location already handling the request. The evaluation engine then runs right there in an isolate: it matches the request context against the flag's targeting rules, resolves the rollout percentage, and returns a variation. Both the data and the logic live at the edge — nothing is sent elsewhere to be evaluated."

  • OpenFeature (CNCF) adoption is explicit as portability posture. "Flagship is built on OpenFeature, the CNCF open standard for feature flag evaluation. OpenFeature defines a common interface for flag evaluation across languages and providers — it's the same relationship that OpenTelemetry has to observability. You write your evaluation code once against the standard, and swap providers by changing a single line of configuration." Canonical wiki instance of vendor-neutral evaluation API.
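
The portability claim rests on evaluation code targeting a stable client interface while the vendor-specific provider is swapped in one place. A simplified sketch of that contract's shape — these are stand-in types, not the real OpenFeature SDK interfaces:

```typescript
interface EvaluationContext { [attr: string]: string | number | boolean; }

// The vendor-specific half: each flag service ships one of these.
interface Provider {
  resolveBooleanValue(flagKey: string, defaultValue: boolean, ctx: EvaluationContext): boolean;
}

// The stable half: application code only ever calls the client.
class FlagClient {
  constructor(private provider: Provider) {}
  getBooleanValue(key: string, def: boolean, ctx: EvaluationContext = {}): boolean {
    try {
      return this.provider.resolveBooleanValue(key, def, ctx);
    } catch {
      return def; // evaluation errors fall back to the caller's default
    }
  }
}

// Swapping vendors changes only which Provider is constructed here;
// every getBooleanValue() call site stays untouched.
const inMemoryProvider: Provider = {
  resolveBooleanValue: (key, def) =>
    ({ "new-checkout": true } as Record<string, boolean>)[key] ?? def,
};
const client = new FlagClient(inMemoryProvider);
```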

  • Worker binding vs. SDK. The Worker binding (wrangler.jsonc: { "flagship": [{ "binding": "FLAGS", "app_id": "<APP_ID>" }] }) is "no HTTP round-trip, no SDK overhead" and "Your account ID is inferred from your Cloudflare account, and the app_id ties the binding to a specific Flagship app." Typed accessors across variation types — getBooleanValue(), getStringValue(), getNumberValue(), getObjectValue() — plus *Details() variants that return "the resolved value alongside the matched variant and the reason it was selected". Failure semantics named: "On evaluation errors, the default value is returned gracefully. On type mismatches, the binding throws an exception — that's a bug in your code, not a transient failure." (type mismatch is load-bearing because JSON/object flags would silently corrupt downstream if coerced).
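
The two failure modes have deliberately different shapes, and a toy illustration makes the asymmetry concrete: evaluation problems (store unavailable, unknown flag) return the caller's default, while a type mismatch throws, since coercing a JSON-object flag into a boolean would silently corrupt downstream logic. The class and store here are illustrative, not the binding's implementation:

```typescript
type FlagValue = boolean | string | number | object;

class TypedFlags {
  // `store` is null to simulate a transient evaluation failure.
  constructor(private store: Record<string, FlagValue> | null) {}

  getBooleanValue(key: string, defaultValue: boolean): boolean {
    if (this.store === null) return defaultValue;   // transient failure → default
    const v = this.store[key];
    if (v === undefined) return defaultValue;       // unknown flag → default
    if (typeof v !== "boolean") {
      // A bug in the caller, not a transient failure — surface it loudly.
      throw new TypeError(`flag "${key}" is not a boolean`);
    }
    return v;
  }
}
```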

  • Rule model: conditions + variation + optional rollout + priority, first-match wins. Rules compose via AND/OR nested up to five levels deep. Post-supplied example: (plan == "enterprise" AND region == "us") OR (user.email.endsWith("@cloudflare.com")) → serve "premium". At the top level of a rule, multiple conditions are combined with implicit AND; within each condition, nested AND/OR groups. Lower priority number = higher priority.
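
The rule model above — priority-ordered rules, first match wins, conditions composed as nested AND/OR groups — can be sketched with the post's own example; the type shapes are illustrative, not Flagship's schema:

```typescript
type Ctx = Record<string, string>;
type Cond =
  | { attr: string; op: "eq" | "endsWith"; value: string }  // leaf condition
  | { and: Cond[] }                                          // all must match
  | { or: Cond[] };                                          // any must match

interface Rule { priority: number; when: Cond; serve: string; }

function matches(c: Cond, ctx: Ctx): boolean {
  if ("and" in c) return c.and.every((x) => matches(x, ctx));
  if ("or" in c) return c.or.some((x) => matches(x, ctx));
  const v = ctx[c.attr] ?? "";
  return c.op === "eq" ? v === c.value : v.endsWith(c.value);
}

function evaluate(rules: Rule[], ctx: Ctx, fallback: string): string {
  // Lower priority number = evaluated first; first match wins.
  const ordered = [...rules].sort((a, b) => a.priority - b.priority);
  return ordered.find((r) => matches(r.when, ctx))?.serve ?? fallback;
}

// The post's example: (plan == "enterprise" AND region == "us")
// OR email ends with "@cloudflare.com" → serve "premium".
const premiumRule: Rule = {
  priority: 1,
  when: {
    or: [
      { and: [{ attr: "plan", op: "eq", value: "enterprise" },
              { attr: "region", op: "eq", value: "us" }] },
      { attr: "email", op: "endsWith", value: "@cloudflare.com" },
    ],
  },
  serve: "premium",
};
```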

  • Percentage rollouts vs. gradual deployments are explicitly distinguished. "Unlike gradual deployments, which split traffic between different uploaded versions of your Worker, feature flags let you roll out behavior by percentage within a single version that is serving 100% of traffic." Rollouts use consistent hashing on the specified context attribute: "The same attribute value (userId, for example) always hashes to the same bucket, so they won't flip between variations across requests. You can ramp from 5% to 10% to 50% to 100% of users, so those who were already in the rollout stay in it." Canonical wiki wording for why percentage-rollout consistency matters: ramp-safety = same-bucket guarantee under increasing percentages (see concepts/percentage-rollout).
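
The same-bucket guarantee follows directly from hashing the context attribute into a fixed bucket space and comparing against the rollout percentage. A minimal sketch — FNV-1a is an arbitrary stand-in, since the post doesn't name Flagship's hash function:

```typescript
// 32-bit FNV-1a over the attribute value; any stable hash works.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Map the attribute value (e.g. userId) to a stable bucket in [0, 100).
function bucket(attrValue: string): number {
  return fnv1a(attrValue) % 100;
}

function inRollout(attrValue: string, percent: number): boolean {
  return bucket(attrValue) < percent;
}
```

Ramp safety is just arithmetic: `bucket(u) < 5` implies `bucket(u) < 10`, so raising the percentage only adds users — nobody already enrolled falls out, and no one flips between variations across requests.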

  • Flag value types explicitly include JSON objects. "Flag values can be boolean, strings, numbers, or full JSON objects — useful for configuration blocks, UI theme definitions, or routing users to different API versions without maintaining separate code paths." JSON-object flags collapse the "flag vs. config store" distinction on the edge — one distribution primitive covers both.

  • Audit trail is a first-class claim. Among the production-grade features named in the closing section: "A full audit trail. Every flag change is recorded with field-level diffs, so you know who changed what and when." Explicitly positioned against the hardcoded-flag anti-pattern where "you're searching git blame to figure out who toggled what."

Systems extracted

  • systems/cloudflare-flagship — the product. Native Cloudflare feature-flag service, built on OpenFeature, in private beta as of 2026-04-17. Control plane → DO (per-app, SQLite-backed, source of truth + changelog) → KV (replicated globally) → edge evaluation in-isolate. Worker binding (env.FLAGS) is zero-HTTP-overhead; OpenFeature SDK works across Workers, Node.js, Bun, Deno, browser. Dashboard + typed TS API (getBooleanValue / getStringValue / getNumberValue / getObjectValue + *Details()). Source code public: github.com/cloudflare/flagship.
  • systems/openfeature — the CNCF open standard for feature-flag evaluation. Flagship implements the OpenFeature Provider interface. Framed analogously to OpenTelemetry for observability — swap providers by changing one line of configuration.
  • systems/cloudflare-durable-objects — control-plane source of truth for each Flagship app (one DO per app, SQLite-backed, holds full flag configuration + changelog, writes are atomic).
  • systems/cloudflare-kv — edge-replication tier. Config syncs from DO to KV within seconds; every edge isolate reads flag config from its local KV replica. Load-bearing property: "caching is managed for you, reads are local, and you don't need a persistent connection to keep things up to date."
  • systems/cloudflare-workers — the evaluation runtime. Evaluation engine runs in the same V8 isolate already serving the request; the Flagship binding is exposed on env.FLAGS. New binding config: flagship: [{ binding, app_id }] in wrangler.jsonc.

Concepts extracted

  • concepts/feature-flag — programmable runtime switch that gates a code path by context/rollout%/rule, enabling release decoupled from deploy and per-cohort rollout.
  • concepts/percentage-rollout — ramp a variation to X% of a cohort using consistent hashing on a context attribute so increasing X monotonically grows the bucket (no flip-flops as ramps change).
  • concepts/vendor-neutral-evaluation-api — standardised API surface (OpenFeature) for flag evaluation across providers and languages; ratio to OpenTelemetry-for-observability made explicit in the post.
  • concepts/agent-controlled-deployment — the design posture where an AI agent (not a human release manager) flips flags, ramps percentages, and reverts; the flag is the blast-radius-bounding primitive the agent operates against.
  • concepts/consistent-hashing — hash function that maps the same key to the same bucket across ramps; the load-bearing property that makes rollout ramps monotonic (adding a bucket doesn't move previously-bucketed keys out).
  • concepts/audit-trail — first-class property of a configuration system where every change is recorded with diff + actor + timestamp.

Patterns extracted

  • patterns/do-plus-kv-edge-config-distribution — atomic control-plane writes to a single-writer [[systems/cloudflare-durable-objects|Durable Object]] (source of truth, changelog), with async fan-out to globally replicated Workers KV for edge-local reads. Recurring Cloudflare architecture for hot-read / rare-write config surfaces that must be globally consistent eventually but don't need per-request strong consistency.
  • patterns/in-isolate-rule-evaluation — evaluate configuration rules (feature flags, routing rules, ACLs) in the same V8 isolate already serving the request, reading config from an edge-local KV cache. No outbound evaluation call, no long-lived in-process rules cache (invalid on isolate-evicted-between-requests substrates). Failure mode: return the caller-supplied default on evaluation errors.

Operational numbers

  • "Within seconds" — DO → KV propagation target for a flag write.
  • Up to 5 levels — nested AND/OR depth limit on rule conditions.
  • 4 variation types — boolean, string, number, full JSON object.
  • 5 runtimes — Workers, Node.js, Bun, Deno, browser (client-side with configurable-TTL pre-fetch cache).
  • Lower number = higher priority — rule evaluation order.
  • 1 line of config — the advertised cost of swapping OpenFeature providers.

Caveats

  • Product-announcement post, not a postmortem or scaling retrospective. No production numbers (QPS, latency percentiles, KV propagation p99, config size limits per app) are published in the post.
  • Private beta, not GA. On pricing, the post offers only "more details as we approach general availability".
  • DO → KV propagation is eventually consistent. "Within seconds" is the advertised target but the post gives no SLO / p99 bound, so agent-driven rapid-ramp loops must budget for post-write read-your-writes lag at the edge (consequential for the agent-self-enrolls-then-reads-metrics loop). Same consistency caveat that applies to any KV-backed edge config surface — no explicit novel mitigation named.
  • No discussion of flag-rule limits. Max flags per app, max rules per flag, max size of a JSON-object variation, max context attributes per evaluation — none disclosed.
  • No discussion of evaluation-engine CPU budget in the isolate. Rule evaluation happens on the CPU-billed Worker request; deeply-nested 5-level rules on hot paths will consume billable CPU but no ceiling is quoted.
  • Client-side browser provider caches with a "configurable TTL" — staleness vs. propagation latency is a knob the developer picks, with no post-supplied guidance on defaults.
  • Audit trail records "field-level diffs", but retention window, query API, and export format are not specified.
  • No explicit comparison to LaunchDarkly / Statsig / Split / Unleash / Flipt as competitors — post stays focused on the Workers-native architecture story rather than cross-vendor comparison.
  • "AI is writing more code than ever" framing is the marketing lede — the architecture (DO + KV + isolate evaluation, OpenFeature provider, consistent-hash rollout) stands on its own regardless of whether the caller is an agent or a human.

Source

[[sources/2026-04-17-cloudflare-introducing-flagship-feature-flags-built-for-the-age-of-ai]]