Fallback to general-purpose compressor
Problem
Format-aware compression relies on exploitable structure. Pure text (enwik, dickens, free-form prose) and unknown or unparseable formats offer no structure for a specialized compressor to exploit. In that regime a naive format-aware compressor does worse than a general-purpose one: its specialized transforms win nothing, while it still pays the specialization overhead.
Solution
Make the general-purpose compressor an available sub-plan inside the format-aware compressor. When the trainer finds no profitable specialization, it emits a Plan that reduces to "just run zstd / gzip / $generic". The format-aware compressor's worst case is then the general-purpose compressor's performance, not anything worse.
From the OpenZL post (Source: sources/2025-10-06-meta-openzl-an-open-source-format-aware-compression-framework):
"When there is no structure, there is no advantage. This is typically the case in pure text documents, such as enwik or dickens. In these cases, OpenZL falls back to zstd, offering essentially the same level of performance."
And:
"Nonetheless and not pictured here, OpenZL always has the option to fallback to the zstd codec, so its performance can be lower-bounded by zstd."
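The training step can be sketched as a selection over candidate Plans in which the generic codec is always one of the candidates. This is a minimal sketch under stated assumptions, not OpenZL's trainer or API: `Plan`, `train`, `split_planes`, and `SPLIT4` are hypothetical names, zlib stands in for zstd (zstd is not in the Python standard library), and a byte-plane split for 4-byte records plays the role of a "specialized" transform.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Plan:
    """Hypothetical stand-in for an OpenZL Plan: a named compress/decompress pair."""
    name: str
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]


# zlib stands in for zstd as the general-purpose fallback codec.
GENERIC = Plan("generic", lambda d: zlib.compress(d, 9), zlib.decompress)


def split_planes(data: bytes, width: int) -> bytes:
    """Group byte 0 of every fixed-width record, then byte 1, and so on."""
    n = len(data) - len(data) % width          # bytes covered by whole records
    planes = b"".join(data[i:n:width] for i in range(width))
    return planes + data[n:]                   # leftover tail bytes kept as-is


def join_planes(data: bytes, width: int) -> bytes:
    """Inverse of split_planes (the transform preserves length)."""
    n = len(data) - len(data) % width
    rows = n // width
    out = bytearray(n)
    for i in range(width):
        out[i::width] = data[i * rows:(i + 1) * rows]
    return bytes(out) + data[n:]


# Hypothetical specialized Plan: byte-plane split for 4-byte records, then the generic codec.
SPLIT4 = Plan(
    "split4+generic",
    lambda d: zlib.compress(split_planes(d, 4), 9),
    lambda b: join_planes(zlib.decompress(b), 4),
)


def train(samples: List[bytes], candidates: List[Plan]) -> Plan:
    """Pick the plan with the smallest total output on the samples.

    GENERIC is always in the pool and wins ties (min keeps the first minimum),
    so the chosen plan is never worse than the generic codec on these samples.
    """
    pool = [GENERIC] + [p for p in candidates if p is not GENERIC]
    return min(pool, key=lambda p: sum(len(p.compress(s)) for s in samples))


# Structured sample: consecutive little-endian uint32 values.
ints = b"".join(k.to_bytes(4, "little") for k in range(5000))
plan = train([ints], [SPLIT4])
print(plan.name)
assert plan.decompress(plan.compress(ints)) == ints
```

On the integer samples the split groups the slowly changing high bytes into long runs, so the specialized Plan wins. On prose the split tends to scramble the repeated substrings that LZ-style matching depends on, so the trainer usually returns `GENERIC`; either way, the docstring's guarantee holds on the training samples by construction.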
Canonical instance
- OpenZL (Meta, 2025) — zstd is the fallback engine. The OpenZL primitive library includes zstd as a codec choice, so the trainer can legitimately choose "just run zstd" as a Plan.
Why this pattern works
Three properties the fallback provides:
- Universally applicable worst-case bound. A specialization framework that can always punt to the generic tool has a hard floor under its worst case: zstd's performance. No dataset can make OpenZL meaningfully worse than zstd.
- Reduces the trainer's burden. The trainer doesn't have to solve every input; it only has to recognize when specialization is unprofitable and select the fallback. Recognizing that condition is easier than producing an optimal specialized Plan.
- Makes the tool broadly adoptable. Users don't have to pre-classify their data as "structured" or "unstructured" before deciding whether to use the specialized tool. They can run OpenZL everywhere: it wins on structured inputs and matches zstd on unstructured ones.
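The worst-case bound comes from the trainer being able to pick zstd. One way to make a similar guarantee hold for every individual input, including inputs unlike the training samples and inputs a parser rejects, is a per-frame guard that keeps whichever output is smaller and records the choice in a tag. This is a minimal sketch under assumptions, not how OpenZL frames data: the 1-byte tag framing is hypothetical, zlib stands in for zstd, and `zero_run` is a toy specialist invented for illustration. With this framing the output is never more than one byte larger than the generic codec's.

```python
import zlib
from typing import Callable, Optional

TAG_GENERIC = 0  # hypothetical 1-byte frame tags, not OpenZL's wire format
TAG_SPECIAL = 1


def compress_guarded(data: bytes, special: Callable[[bytes], bytes]) -> bytes:
    """Emit the specialized output only if it beats the generic codec."""
    generic = zlib.compress(data, 9)            # zlib stands in for zstd
    candidate: Optional[bytes]
    try:
        candidate = special(data)
    except ValueError:
        candidate = None                        # unparseable input: specialist declines
    if candidate is not None and len(candidate) < len(generic):
        return bytes([TAG_SPECIAL]) + candidate
    return bytes([TAG_GENERIC]) + generic       # at most len(generic) + 1 bytes


def decompress_guarded(frame: bytes, special_inverse: Callable[[bytes], bytes]) -> bytes:
    """Dispatch on the tag byte written by compress_guarded."""
    tag, payload = frame[0], frame[1:]
    if tag == TAG_GENERIC:
        return zlib.decompress(payload)
    if tag == TAG_SPECIAL:
        return special_inverse(payload)
    raise ValueError(f"unknown frame tag {tag}")


def zero_run(data: bytes) -> bytes:
    """Hypothetical specialist: understands only all-zero input, stored as its length."""
    if any(data):
        raise ValueError("not an all-zero run")
    return len(data).to_bytes(8, "little")


def zero_run_inverse(payload: bytes) -> bytes:
    return bytes(int.from_bytes(payload, "little"))


frame = compress_guarded(bytes(100_000), zero_run)        # specialist wins
prose = b"no structure here, just prose " * 10
fallback = compress_guarded(prose, zero_run)               # specialist declines
assert decompress_guarded(frame, zero_run_inverse) == bytes(100_000)
assert decompress_guarded(fallback, zero_run_inverse) == prose
```

The price of the guard is running both codecs on every input; the benefit is that the bound no longer depends on the training samples resembling production data.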
Structural limit: parse cost
Fallback covers "no structure" but not "structure that is expensive to parse." CSV is Meta's canonical example: OpenZL parses the CSV format to deliver an excellent compression ratio, but compression speed caps at "about 64 MB/s", versus zstd's "~1 GB/s" on the same input. The fallback-to-zstd path would give zstd's speed, but a worse ratio than the parsed-CSV Plan. Users choose their point on the speed/ratio Pareto curve.
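The choice can be framed as picking the best ratio under a throughput floor. At the quoted speeds, compressing 1 GB takes roughly 16 s with the parsed-CSV Plan (1024 MB / 64 MB/s) versus about 1 s with zstd. The sketch below is illustrative only: `Candidate`, `pick`, and the candidate names are hypothetical, the speeds are the figures quoted above, and the ratios are placeholders that preserve only the stated ordering (parsed CSV compresses better than zstd), not measured values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    ratio: float      # original size / compressed size (higher is better)
    mb_per_s: float   # compression throughput


def pick(cands: List[Candidate], min_mb_per_s: float) -> Optional[Candidate]:
    """Best ratio among candidates that meet the throughput floor, or None."""
    fast_enough = [c for c in cands if c.mb_per_s >= min_mb_per_s]
    return max(fast_enough, key=lambda c: c.ratio, default=None)


# Speeds from the quoted figures; ratios are PLACEHOLDERS, only their order is sourced.
CANDIDATES = [
    Candidate("openzl-parsed-csv", ratio=4.0, mb_per_s=64.0),
    Candidate("zstd-fallback", ratio=2.0, mb_per_s=1000.0),
]

print(pick(CANDIDATES, 50.0).name)    # loose budget -> openzl-parsed-csv
print(pick(CANDIDATES, 500.0).name)   # tight budget -> zstd-fallback
```

A budget below 64 MB/s selects the parsed Plan; anything between 64 MB/s and about 1 GB/s leaves only the zstd fallback.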
Sibling: the llm-cascade / llm-cost-fallback pattern
At a higher level of abstraction, the wiki has concepts/llm-cascade: a cheap specialized model runs first, with an expensive generalist model as the fallback. The shape is the same: specialized first, generalist always available, and the user gets a result at least as good as the generalist alone would give.
Seen in
- sources/2025-10-06-meta-openzl-an-open-source-format-aware-compression-framework — canonical wiki instance of the pattern in the compression domain.
Related
- systems/openzl — canonical instance.
- systems/zstandard-zstd — the fallback engine.
- concepts/format-aware-compression — the category this pattern makes safely deployable.