CONCEPT Cited by 1 source

Feature-file size limit

A fixed-size cap on the number of rows / features / entries an internally-generated configuration file can contain, enforced at load time by the consumer. The cap typically exists for performance reasons — preallocated memory, no per-request allocation, no GC pressure, cache-friendly contiguous layout.

The cap becomes a load-bearing invariant: if data production ever exceeds the cap, the consumer must either (a) fail-open (log + fall back to a known-good prior file), (b) fail-closed (panic / error-out and return 5xx), or (c) silently truncate (risk subtle correctness bugs).
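The three options can be sketched as a load-time policy. This is an illustrative sketch, not any real consumer's code; `FeatureFile`, `LoadPolicy`, and `apply_policy` are invented names.

```rust
// Three ways a consumer can react when a freshly generated feature file
// exceeds the preallocated cap. All names here are hypothetical.

#[derive(Debug, PartialEq)]
pub enum LoadPolicy {
    FailOpen,   // keep serving with the last known-good file
    FailClosed, // refuse to load; callers will error out (e.g. 5xx)
    Truncate,   // keep the first `cap` entries; risks correctness bugs
}

#[derive(Debug, Clone, PartialEq)]
pub struct FeatureFile {
    pub features: Vec<f64>,
}

pub fn apply_policy(
    new: FeatureFile,
    prior_good: &FeatureFile,
    cap: usize,
    policy: &LoadPolicy,
) -> Result<FeatureFile, String> {
    if new.features.len() <= cap {
        return Ok(new); // under the cap: load normally
    }
    match policy {
        LoadPolicy::FailOpen => Ok(prior_good.clone()),
        LoadPolicy::FailClosed => Err(format!(
            "feature file has {} entries, cap is {}",
            new.features.len(),
            cap
        )),
        LoadPolicy::Truncate => {
            let mut f = new;
            f.features.truncate(cap);
            Ok(f)
        }
    }
}
```

The key design point is that the policy is an explicit, reviewable choice; the hazard in the canonical instance below was an implicit fail-closed.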

Canonical instance

Cloudflare's Bot Management module on the FL2 proxy capped its feature file at 200 features, preallocated for performance. Actual usage was ~60 features, leaving headroom "well above" expected growth. On 2025-11-18, an upstream ClickHouse permission migration caused the feature-file generator to produce a file with doubled rows. The over-200 file triggered a Rust .unwrap() panic on the bounds check:

thread fl2_worker_thread panicked:
  called Result::unwrap() on an Err value

Every request hitting the bots module returned 5xx. ~3 hours of core-traffic impact. Fail-closed without a fail-open path.
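The failure shape can be reproduced in a few lines. This is a hypothetical reconstruction, not Cloudflare's actual code; `FEATURE_CAP` mirrors the stated 200-feature cap, and `check_feature_count` / `load_unchecked` are invented names.

```rust
// Minimal reproduction of the failure shape: a bounds check that returns
// Result, consumed with .unwrap(), so an over-cap file panics the worker
// thread instead of falling back. Illustrative, not Cloudflare's code.

const FEATURE_CAP: usize = 200; // the FL2 module's cap per the postmortem

fn check_feature_count(rows: usize) -> Result<usize, String> {
    if rows <= FEATURE_CAP {
        Ok(rows)
    } else {
        Err(format!("{rows} rows exceeds the {FEATURE_CAP}-feature cap"))
    }
}

// Normal operation (~60 features) returns Ok. Once the upstream migration
// doubled the rows, the check returned Err, and .unwrap() on that Err is
// exactly the "called `Result::unwrap()` on an `Err` value" panic quoted above.
fn load_unchecked(rows: usize) -> usize {
    check_feature_count(rows).unwrap() // fail-closed: panics on over-cap input
}
```

Note that the bounds check itself was sound; the hazard was routing its Err into a panic rather than a fallback path.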

See sources/2025-11-18-cloudflare-outage-on-november-18-2025.

Why the cap was a legitimate design choice

Bot Management runs on every request. Per-request CPU budget is in the microseconds. Preallocating the feature array means:

  • No runtime allocation on the hot path.
  • No GC pressure on the worker thread.
  • Contiguous-layout cache locality for scoring inner loops.
  • Predictable memory footprint per FL2 worker.

A variable-size feature array would trade all of that for flexibility.
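A minimal sketch of what the preallocation buys, assuming a fixed-size scoring buffer; `CAP`, `FeatureBuffer`, and the scoring loop are illustrative, not FL2's actual layout.

```rust
// Why a fixed cap buys hot-path performance: the scoring buffer is a
// fixed-size array embedded in the worker, so loading and scoring never
// allocate. All names and types here are hypothetical.

const CAP: usize = 200;

pub struct FeatureBuffer {
    values: [f64; CAP], // contiguous, cache-friendly, preallocated
    len: usize,
}

impl FeatureBuffer {
    pub const fn new() -> Self {
        Self { values: [0.0; CAP], len: 0 }
    }

    /// Copies features in without allocating; rejects over-cap input
    /// instead of growing. This is the cap-as-invariant tradeoff.
    pub fn load(&mut self, features: &[f64]) -> Result<(), usize> {
        if features.len() > CAP {
            return Err(features.len());
        }
        self.values[..features.len()].copy_from_slice(features);
        self.len = features.len();
        Ok(())
    }

    /// Hot-path scoring over a contiguous slice: no per-request
    /// allocation, predictable footprint.
    pub fn score(&self, weights: &[f64]) -> f64 {
        self.values[..self.len]
            .iter()
            .zip(weights)
            .map(|(v, w)| v * w)
            .sum()
    }
}
```

A `Vec<f64>` would accept any size, but at the cost of hot-path allocation and an unpredictable per-worker footprint, which is exactly the trade the cap avoids.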

Why the cap was a hazard

The cap is only load-bearing if data production can exceed it. The feature file doubled in size without the generator's source code changing: the change happened three systems upstream (ClickHouse grants → system.columns metadata visibility → implicit-default-database assumption). Neither the Bot Management team nor the ClickHouse team could reasonably have caught this at code-review time.

Remediation stance

Cloudflare's stated #1 project (2025-11-18): "Hardening ingestion of Cloudflare-generated configuration files in the same way we would for user-generated input."

The load-bearing invariant needs the same ingestion discipline as customer-submitted input: validate size, shape, value ranges, and cross-field invariants before the file reaches the preallocated buffer, and fall back to a known-good prior file on any violation.
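A hedged sketch of that discipline, with invented names (`FeatureConfig`, `validate`, `ingest`): validation runs before the new file can displace the active one, and any violation keeps the known-good file serving.

```rust
// Treat internally generated config like user input: validate size, shape,
// value ranges, and cross-field invariants before load; fail open to the
// known-good prior file. Names and checks here are illustrative.

pub struct FeatureConfig {
    pub features: Vec<(String, f64)>, // (name, default value)
}

pub fn validate(cfg: &FeatureConfig, cap: usize) -> Result<(), String> {
    // size check: the load-bearing cap
    if cfg.features.len() > cap {
        return Err(format!("size {} exceeds cap {}", cfg.features.len(), cap));
    }
    for (name, value) in &cfg.features {
        if name.is_empty() {
            return Err("empty feature name".into()); // shape check
        }
        if !value.is_finite() {
            return Err(format!("non-finite default for {name}")); // range check
        }
    }
    // cross-field invariant: feature names must be unique
    let mut names: Vec<&str> = cfg.features.iter().map(|(n, _)| n.as_str()).collect();
    names.sort_unstable();
    names.dedup();
    if names.len() != cfg.features.len() {
        return Err("duplicate feature names".into());
    }
    Ok(())
}

/// Fail-open ingestion: a new file only replaces the active one if it
/// validates; otherwise the known-good prior file keeps serving.
pub fn ingest(new: FeatureConfig, active: FeatureConfig, cap: usize) -> FeatureConfig {
    match validate(&new, cap) {
        Ok(()) => new,
        Err(_) => active, // in a real system: log loudly, alert, keep serving
    }
}
```

The point is ordering: validation happens in the ingestion layer, so a malformed file can never reach the preallocated buffer, let alone panic the hot path.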

See patterns/harden-ingestion-of-internal-config and concepts/internally-generated-untrusted-input.
