CONCEPT Cited by 1 source
Feature-file size limit¶
A fixed-size cap on the number of rows / features / entries an internally-generated configuration file can contain, enforced at load time by the consumer. The cap typically exists for performance reasons — preallocated memory, no per-request allocation, no GC pressure, cache-friendly contiguous layout.
The cap becomes a load-bearing invariant: if data production ever exceeds the cap, the consumer must either (a) fail-open (log + fall back to a known-good prior file), (b) fail-closed (panic / error-out and return 5xx), or (c) silently truncate (risk subtle correctness bugs).
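The three failure policies can be sketched side by side. This is a minimal illustration, not any real system's code; `MAX_FEATURES`, `LoadOutcome`, and the function names are all invented for the example.

```rust
// Hypothetical sketch of the three policies a consumer can apply when an
// internally generated config file exceeds its fixed cap. All names here
// are illustrative, not taken from any real codebase.

const MAX_FEATURES: usize = 200;

#[derive(Debug, PartialEq)]
enum LoadOutcome {
    Loaded(usize),   // accepted: row count fits under the cap
    FellBack(usize), // (a) fail-open: kept the prior known-good file
    Truncated(usize) // (c) silent truncation: risk of subtle bugs
}

// (a) fail-open: reject the oversized file, keep serving with the prior one.
fn load_fail_open(new_rows: usize, prior_rows: usize) -> LoadOutcome {
    if new_rows <= MAX_FEATURES {
        LoadOutcome::Loaded(new_rows)
    } else {
        LoadOutcome::FellBack(prior_rows)
    }
}

// (b) fail-closed: surface an error; callers turn this into a 5xx.
fn load_fail_closed(new_rows: usize) -> Result<usize, String> {
    if new_rows <= MAX_FEATURES {
        Ok(new_rows)
    } else {
        Err(format!("feature file has {new_rows} rows, cap is {MAX_FEATURES}"))
    }
}

// (c) silent truncation: clamp the row count to the cap.
fn load_truncate(new_rows: usize) -> LoadOutcome {
    LoadOutcome::Truncated(new_rows.min(MAX_FEATURES))
}

fn main() {
    // A doubled file (e.g. 260 rows against a 200 cap) under each policy:
    assert_eq!(load_fail_open(260, 60), LoadOutcome::FellBack(60));
    assert!(load_fail_closed(260).is_err());
    assert_eq!(load_truncate(260), LoadOutcome::Truncated(200));
}
```

Only policy (a) keeps serving correct (if stale) results when production exceeds the cap; (b) trades availability for loudness, and (c) trades loudness for correctness.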
Canonical instance¶
Cloudflare's Bot Management module on the FL2 proxy had a 200-feature cap, preallocated for performance. Actual usage was ~60 features, seemingly "well above" any realistic headroom need. On 2025-11-18, an upstream ClickHouse permission migration caused the feature-file generator to produce a file with doubled rows. The over-200 file triggered a Rust .unwrap() panic on the bounds check.
Every request hitting the bots module returned a 5xx. The result was ~3 hours of core-traffic impact: fail-closed, with no fail-open path.
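The crash shape can be reproduced in miniature. This is not Cloudflare's actual code; `FeatureBuf`, `CAP`, and `append` are invented names for a sketch of the pattern: a fixed-capacity buffer whose insert returns `Err` past the cap, and a call site that `.unwrap()`s that `Result` instead of handling it.

```rust
// Minimal reproduction of the crash shape, not the real implementation.
// A bounds check that returns Err is fine; an .unwrap() at the call site
// converts that Err into a thread panic, which surfaces as a 5xx.

const CAP: usize = 200;

struct FeatureBuf {
    names: Vec<String>, // preallocated up to CAP entries
}

impl FeatureBuf {
    fn new() -> Self {
        Self { names: Vec::with_capacity(CAP) }
    }

    fn append(&mut self, name: &str) -> Result<(), String> {
        if self.names.len() >= CAP {
            return Err(format!("too many features: cap is {CAP}"));
        }
        self.names.push(name.to_string());
        Ok(())
    }
}

fn load(row_count: usize) -> Result<FeatureBuf, String> {
    let mut buf = FeatureBuf::new();
    for i in 0..row_count {
        // Propagating with `?` keeps the error recoverable; writing
        // `buf.append(...).unwrap()` here instead is the panic path.
        buf.append(&format!("feature_{i}"))?;
    }
    Ok(buf)
}

fn main() {
    assert!(load(60).is_ok());   // normal usage: well under the cap
    assert!(load(400).is_err()); // doubled file: Err, or a panic if unwrapped
}
```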
See sources/2025-11-18-cloudflare-outage-on-november-18-2025.
Why the cap was a legitimate design choice¶
Bot Management runs on every request. Per-request CPU budget is in the microseconds. Preallocating the feature array means:
- No runtime allocation on the hot path.
- No GC pressure on the worker thread.
- Contiguous-layout cache locality for scoring inner loops.
- Predictable memory footprint per FL2 worker.
A variable-size feature array would trade all of that for flexibility.
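The performance rationale can be sketched concretely. Assuming a weighted-sum scoring loop (an assumption; the real scoring logic is not public), a fixed-size array gives the inner loop a contiguous, preallocated layout with no per-request allocation:

```rust
// Sketch of what the fixed cap buys on the hot path. `Scorer` and the
// weighted-sum loop are illustrative assumptions, not the real algorithm.

const MAX_FEATURES: usize = 200;

struct Scorer {
    weights: [f32; MAX_FEATURES], // preallocated, contiguous, fixed footprint
    len: usize,                   // features actually in use (~60 in practice)
}

impl Scorer {
    fn score(&self, values: &[f32; MAX_FEATURES]) -> f32 {
        // Cache-friendly contiguous scan; no allocation, no resizing,
        // no GC pressure on the worker thread.
        let mut s = 0.0;
        for i in 0..self.len {
            s += self.weights[i] * values[i];
        }
        s
    }
}

fn main() {
    let scorer = Scorer { weights: [0.5; MAX_FEATURES], len: 2 };
    let mut vals = [0.0f32; MAX_FEATURES];
    vals[0] = 2.0;
    vals[1] = 4.0;
    assert_eq!(scorer.score(&vals), 3.0); // 0.5*2.0 + 0.5*4.0
}
```

The array's size is baked in at compile time, which is exactly why exceeding it at load time has no graceful path unless one is built deliberately.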
Why the cap was a hazard¶
The cap is only load-bearing if data production can exceed it. The feature-file-generation query grew by 2× without the generator's source code changing — the change happened three systems upstream (ClickHouse grants → system.columns metadata visibility → implicit-default-database assumption). Neither the Bot Management team nor the ClickHouse team could reasonably have caught this at code-review time.
Remediation stance¶
Cloudflare's stated #1 project (2025-11-18): "Hardening ingestion of Cloudflare-generated configuration files in the same way we would for user-generated input."
The load-bearing invariant needs the same ingestion discipline as customer-submitted input: validate size, shape, value ranges, and cross-field invariants before the data reaches the preallocated buffer, and fall back to a known-good prior file on any violation.
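That discipline can be sketched as a validate-then-ingest gate. All names here (`FeatureFile`, `validate`, `ingest`) are hypothetical; the point is that validation runs to completion before the new file can displace the known-good one.

```rust
// Hedged sketch of validate-before-load with fail-open fallback.
// The struct shape and checks are illustrative assumptions.

const MAX_FEATURES: usize = 200;

#[derive(Clone, Debug, PartialEq)]
struct FeatureFile {
    rows: Vec<(String, f32)>, // (feature name, weight)
}

fn validate(file: &FeatureFile) -> Result<(), String> {
    // Size check: the cap is enforced here, not deep inside the consumer.
    if file.rows.len() > MAX_FEATURES {
        return Err(format!("{} rows exceeds cap {MAX_FEATURES}", file.rows.len()));
    }
    // Shape and value-range checks on every row.
    for (name, weight) in &file.rows {
        if name.is_empty() {
            return Err("empty feature name".to_string());
        }
        if !weight.is_finite() {
            return Err(format!("non-finite weight for {name}"));
        }
    }
    Ok(())
}

/// Accept the new file only if it validates; otherwise keep serving
/// with the prior known-good file (fail-open).
fn ingest(new: FeatureFile, known_good: FeatureFile) -> FeatureFile {
    match validate(&new) {
        Ok(()) => new,
        Err(_reason) => known_good, // in practice: log _reason, alert, keep serving
    }
}

fn main() {
    let good = FeatureFile { rows: vec![("example_feature".to_string(), 0.3)] };
    // A "doubled" oversized file is rejected and the prior file survives.
    let doubled = FeatureFile {
        rows: (0..400).map(|i| (format!("f{i}"), 0.1)).collect(),
    };
    assert_eq!(ingest(doubled, good.clone()), good);
}
```

The design choice worth noting: the fallback file is a first-class input to ingestion, so "reject" always has a safe destination rather than a panic.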
See patterns/harden-ingestion-of-internal-config and concepts/internally-generated-untrusted-input.
Seen in¶
- sources/2025-11-18-cloudflare-outage-on-november-18-2025 — canonical wiki instance.
Related¶
- concepts/preallocated-memory-budget — the optimization that creates the cap.
- concepts/internally-generated-untrusted-input — the trust-boundary confusion that makes the cap dangerous.
- concepts/unhandled-rust-panic — the crash shape when the cap is hit without a fail-open path.
- patterns/harden-ingestion-of-internal-config — the stated remediation discipline.
- concepts/blast-radius — the reason a single bad feature file took down the fleet.