PATTERN
Offline train, online resolve (compression)¶
Problem¶
Format-aware compression requires picking the right transform sequence + parameters for each data shape. If that search runs in the hot path (at encode time, per frame), the compressor pays the search cost on every frame — slow + fragile. If it runs only once at integration time, the pipeline can't adapt to data drift and can't emit Pareto-curve tradeoff points for different workloads.
Solution¶
Split the work into two phases with a config object (a Plan) flowing between them:
- Offline training. Trainer consumes shape description (SDDL, parser function, or preset) + sample data; runs a budgeted search over transform choices and parameters; emits a Plan (or a Pareto-set of Plans spanning speed × ratio tradeoffs).
- Online resolve + encode. Encoder consumes the Plan + the current frame; resolves the Plan into a concrete Resolved Graph (picking branches at any control points); records the Resolved Graph into the frame; emits compressed bytes.
- Decode anywhere. Any OpenZL decoder reads the Resolved Graph from the frame and executes the inverse transform sequence. No out-of-band state.
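The three phases can be sketched end to end. This is a toy model, not OpenZL's API: the transforms, the Plan layout, and the `train`/`encode`/`decode` names are all illustrative, with `zlib` standing in for the backend codec. What it preserves is the shape of the pattern — the Plan flows from trainer to encoder, and only the resolved graph (recorded in the frame) reaches the decoder.

```python
import json
import zlib

# Toy transform library; real Plans reference OpenZL codecs, not these.
def delta(xs):
    return [xs[0]] + [b - a for a, b in zip(xs, xs[1:])] if xs else []

def undelta(xs):
    out, acc = [], 0
    for d in xs:
        acc += d
        out.append(acc)
    return out

TRANSFORMS = {"delta": (delta, undelta),
              "identity": (lambda xs: xs, lambda xs: xs)}

def train(samples):
    """Offline phase: budgeted search over transforms; emit a Plan."""
    def cost(name):
        fwd = TRANSFORMS[name][0]
        return sum(len(zlib.compress(json.dumps(fwd(s)).encode()))
                   for s in samples)
    return {"transform": min(TRANSFORMS, key=cost)}   # the Plan

def encode(plan, frame):
    """Online phase: resolve the Plan, record the resolved graph in-frame."""
    name = plan["transform"]              # trivial resolution, no branches
    header = json.dumps({"graph": [name]}).encode() + b"\n"
    fwd = TRANSFORMS[name][0]
    return header + zlib.compress(json.dumps(fwd(frame)).encode())

def decode(blob):
    """Reads only the resolved graph from the frame; no Plan, no trainer."""
    header, _, payload = blob.partition(b"\n")
    data = json.loads(zlib.decompress(payload))
    for name in reversed(json.loads(header)["graph"]):
        data = TRANSFORMS[name][1](data)  # run inverses in reverse order
    return data

plan = train([[10, 11, 12, 13], [5, 6, 7, 8]])
frame = [100, 101, 102, 103]
assert decode(encode(plan, frame)) == frame
```

Note that `decode` never sees the Plan: the frame carries everything the decoder needs, which is the "no out-of-band state" property above.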
Loop closes via Managed Compression: periodically re-sample production data, re-run the trainer, roll out updated Plans "like any other config change" (Source: sources/2025-10-06-meta-openzl-an-open-source-format-aware-compression-framework).
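One iteration of that loop can be sketched as follows. The gating policy (`min_gain`) is a hypothetical detail, not something the source describes; the "Plan" here is degenerately a zlib compression level so the example stays stdlib-only and runnable. The point is the shape: re-sample, re-train under a budget, roll out only behind a measurable win — reviewed like any other config change.

```python
import zlib

def compressed_size(plan, samples):
    # Stand-in "Plan": a zlib level. Real Plans are transform graphs.
    return sum(len(zlib.compress(s, plan)) for s in samples)

def retrain(samples, budget=range(1, 10)):
    # Budgeted offline search over the (tiny) plan space.
    return min(budget, key=lambda level: compressed_size(level, samples))

def maybe_rollout(current_plan, samples, min_gain=0.02):
    """Hypothetical rollout gate: adopt the retrained Plan only if it
    beats the current one by min_gain on fresh production samples."""
    candidate = retrain(samples)
    if (compressed_size(candidate, samples)
            <= compressed_size(current_plan, samples) * (1 - min_gain)):
        return candidate
    return current_plan

fresh = [b"drifted payload " * 50, b"new schema rows " * 50]
plan = maybe_rollout(current_plan=1, samples=fresh)
```

Because the gate compares on the same fresh samples, the rolled-out Plan can never be worse than the incumbent on that snapshot — the loop only ratchets forward.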
Canonical instance¶
- OpenZL (Meta, 2025) — "describe (SDDL) → train (produce a plan) → compress (emit frames with resolved graphs) → decode anywhere with the same binary." Precise statement of the pattern.
Why the split matters¶
Three architectural properties fall out of it:
- Training cost is amortized over many frames; the hot path just resolves, doesn't search.
- Plans are first-class config objects — stored, versioned, rolled out, compared, A/B tested.
- The decoder only needs to know how to run a Resolved Graph — it doesn't need to know the Plan format or the trainer version. This is what makes the universal decoder property feasible.
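The second property — Plans as first-class config — implies that comparing two Plans is just evaluating two serialized blobs against the same workload. A minimal sketch, with hypothetical field names (`version`, `graph`, `level`) and zlib levels standing in for graph execution:

```python
import json
import zlib

# Two hypothetical Plans serialized as plain config blobs: they can be
# stored, versioned, diffed, and rolled out like any other config.
plan_a = json.dumps({"version": 1, "graph": ["identity"], "level": 1})
plan_b = json.dumps({"version": 2, "graph": ["identity"], "level": 9})

def apply_plan(plan_blob, frame):
    # Stand-in for resolved-graph execution: only the level matters here.
    return zlib.compress(frame, json.loads(plan_blob)["level"])

workload = [b"abc" * 200, b"status=ok\n" * 100]

def total_size(plan_blob):
    return sum(len(apply_plan(plan_blob, f)) for f in workload)

# A/B comparison reduces to scoring two config blobs on one workload.
winner = min((plan_a, plan_b), key=total_size)
```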
Precedents + siblings¶
- Zstandard dictionary compression (Meta, 2018) — the earlier form of the pattern at Meta. Dictionary was trained offline against samples, then distributed; the zstd binary consumed the dictionary at encode + decode time. The "Plan" was just a dictionary; there was no DAG of transforms. OpenZL generalizes this by making the config a graph instead of a flat dictionary.
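The zstd precedent has the same two-phase shape: `zstd --train` builds the dictionary offline from samples, and both `-D` encode and decode consume it. The sketch below shows that shape using Python's stdlib zlib preset-dictionary support (`zdict`) instead of zstd, and a hand-built dictionary instead of a trained one — a stand-in, not zstd's actual trainer:

```python
import zlib

# Hand-built "dictionary" of common substrings from the samples.
# (zstd's trainer derives this automatically; zlib just takes the bytes.
# For deflate, the most useful strings should sit near the end.)
samples = [b'{"user_id": 1, "event": "click"}',
           b'{"user_id": 2, "event": "view"}']
dictionary = b'"event": "view""event": "click"{"user_id": '

def encode(payload):
    c = zlib.compressobj(zdict=dictionary)    # dictionary at encode time...
    return c.compress(payload) + c.flush()

def decode(blob):
    d = zlib.decompressobj(zdict=dictionary)  # ...and the same one at decode
    return d.decompress(blob)

msg = b'{"user_id": 3, "event": "click"}'
assert decode(encode(msg)) == msg
```

The contrast with OpenZL is visible even here: the dictionary is out-of-band shared state that both sides must already hold, whereas an OpenZL frame embeds its own Resolved Graph.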
- Query planner / executor split in databases — the query planner runs expensive optimization once against a statistics snapshot, produces a plan, and the executor runs the plan against each row. OpenZL's trainer is the query planner; its encoder is the executor.
- JIT compilation — hot-path traces get heavily optimized once, then executed many times.
Seen in¶
- sources/2025-10-06-meta-openzl-an-open-source-format-aware-compression-framework — canonical wiki source. Meta's own phrasing: "describe → train → compress → decode anywhere."
Related¶
- systems/openzl · systems/managed-compression-meta
- concepts/compression-plan — the object that flows through the pipeline.
- concepts/universal-decoder — the decoder-side invariant this pattern makes possible.
- concepts/format-aware-compression — the parent category.
- patterns/embedded-decode-recipe-in-frame — what makes "decode anywhere" work.
- patterns/graceful-upgrade-via-monoversion-decoder — the rollout discipline paired with this pattern.